Tuesday, February 16, 2010

A small Fosdem wrapup

The other weekend I was in Brussels for FOSDEM. As you know, this year we had a Mono room on Sunday, thanks to the amazing efforts of Ruben Vermeersch and Stéphane Delcroix. The conference was great, as it always is, although as usual I didn't get to see many of the talks on Saturday. I was busy preparing my own talk about Moonlight, and meeting people, which is one of the parts I enjoy most at FOSDEM. Sunday was awesome, full of Mono talks in a nicely packed room. People were very interested, we had great feedback, and everything went very well, including my demos. It was a very good day, and all in all, a great event. On Monday we had a special Mono hackday, where we got together and, well, hacked. I sat down with Lucas Meijer of Unity and we went through some of the issues they have embedding Mono, similar to what Moonlight has to do. Lucas decided to stay an extra day just for the Mono hackday, after a lot of chatting and quite a few beers the day before, and I'm so glad he did; it was a very productive, if somewhat short, day.

Over the three days of the event I had the pleasure of meeting, remeeting and chatting with a lot of wonderful people, whom I usually only get to talk to online - Jo Shields, Mirco Bauer, Alan McGovern, Jeremie Laval, Jim Purbrick, Michael Meeks, Mans Rullgard, David "Lefty" Schlesinger, Rob Taylor, Bertrand Lorentz, Massimiliano Mantione, just to name a few and not in any particular order (I just know I forgot a ton of people!). I also got to meet a bunch of Portuguese people, like Vânia Gonçalves, Miguel Azevedo, Paulo Trezentos and more - some of them I only get to see at FOSDEM these days, for some odd reason... weird country this is :)

All in all, it was great. I missed the interaction and the chats and the dinners and the talks and the general merriment and learning that is to be had when you're surrounded by a thousand geeks. I hope to see you all again soon!

PS: I somehow got Jérémie's name confused with a known beer brand... which might, or might not be, a good sign! Fixed... :)

Tuesday, February 02, 2010

Solving the gcc 4.4 strict aliasing problems

A couple of days ago Jeff Stedfast ran into some problems with gcc 4.4, strict aliasing and optimizations. Being a geeky sort of person, I found the problem really interesting, not only because it shows just how hard it is to write a good, clear standard, even when you're dealing with highly technical (and supposedly unambiguous) language, but also because I never did "get" the aliasing rules, so it was a nice excuse to read up on the subject.

Basically, the standard says that you can't do this:

int a = 0x12345678;
short *b = (short *)&a;

I'm forcing a cast here, and since the types are not compatible, they can't be aliases of each other, and therefore I'm breaking strict-aliasing rules. Note that if you compile this with -O2 -Wall, it will *not* warn you that you're breaking the rules, even though -O2 activates -fstrict-aliasing and -Wall is supposed to complain about everything (right??). Apparently, this is by design, though why anyone would not want warnings in -Wall for something that will obviously break code is beyond me. If you want to be told that you're not playing by the rules, make sure you build with -Wstrict-aliasing=2, which will say:

line 2 - warning: dereferencing type-punned pointer will break strict-aliasing rules

So now you know you're being naughty. Of course, if you did try to access the variable, even just with -Wall it will complain at you - this more complete snippet will give you several warnings with -Wall:

int a = 0x12345678;
short *b = (short *)&a;
b[1] = 0;
if (a == 0x12345678)
  printf ("error\n");
else
  printf ("good\n");

line 3 - warning: dereferencing pointer ‘({anonymous})’ does break strict-aliasing rules

The problem gets ugly when you're dealing with structs and pointers to them - then -Wall is completely silent about possible issues, and only -Wstrict-aliasing=2 will work, like in this little snippet:

typedef struct type {
  struct type *next;
  int val;
} Type;

...

Type *t1, *t2, *t3;
t1 = t2 = NULL;
t1 = (Type*) &t2;
int i;
for (i = 0; i < 2; i++) {
  t3 = malloc (sizeof (Type));
  t1->next = t3;
  t1 = t3;
}
if (!t2)
  printf ("error\n");
else
  printf ("good\n");

This doesn't emit any warnings on -Wall because the loop makes it slightly fuzzy for gcc to tell whether things are getting assigned or not. -O2 will optimize away the assignment to t1 on line 3, which will make things not work later on.

So how to fix this? The may_alias attribute allows a type to bypass the aliasing rules, just like character types do (character types are allowed to alias any other type, according to the C99 standard). Changing the definition of Type to the following will make the compiler happy:

typedef struct type {
  struct type *next;
  int val;
} __attribute__((__may_alias__)) Type;

One final note: if you mix code with aliased and non-aliased types, gcc will not apply aliasing optimizations to your non-aliased, possibly broken code... i.e., if you define this type twice, once with the attribute and once without, and then run the loop above with both types (separately, mind you, with separate variables; the code just happens to be in the same function), the non-aliased type won't fail. Aren't optimizations fun?


Update: People have pointed out that the first statement short *b = (short *)&a; is totally legal and has nothing to do with aliasing.

Yes, that's true, I should have been more precise. The statement is perfectly legal. It's when you try to access the data via the pointer that was assigned on that line that breaks the standard. So when your code blows up, it blows up accessing the data, but that's not the cause, that's the consequence. The cause of said explosion is that optimizations + strict-aliasing look at that (totally legal) statement and say "oh, dude, come on, this is bogus" and throw it away while munching on scooby snacks. Well, not sure about that last part.

Anyways, where was I? Oh yes, so, two things: if you don't want to change your code, you can use may_alias, gcc will say "that's so awesome" and everyone will make merry. Or something. The second thing is, and let me add a little emphasis to this part, because I'm sometimes a bit too subtle, and apparently some things should be said *very clearly*: when a statement is perfectly legal, and yet it IS removed via a combination of default flags with NO warnings whatsoever, something is WRONG, and in my opinion, the problem here is lack of warnings.

And that, as someone said, is that. Or not, whatever tickles your fancy. Hmmmm, tickles...

Friday, January 22, 2010

Chrome and Moonlight, or how to deadlock a browser

It's no secret that Moonlight works best on Firefox at the moment - it's our baseline browser, after all - but we've had many requests to add Chrome support, and since it supports NPAPI just like all browsers out there, it should really work out of the box, requiring only some extra code to implement/hackify stuff that Chrome/WebKit doesn't expose and that we need - basically, DOM support and some downloader tweaks.

After some initial positive reports of Chrome loading the Silverlight Chess sample successfully, I decided to run some tests and start working on the WebKit bridge code... only to find out that I couldn't make Moonlight load properly on Chrome on my laptop at all. Even the simplest of test pages would hang forever on our initial splash animation, and killing Chrome would dump stacktraces all over the place. Clearly it wasn't happy about Moonlight.

My first instinct was "I must be doing something wrong", so I tried on another machine. Same thing. Built a Chromium debug build and tried it - even worse, I hit symbol conflicts all over the place. It seems the Native Client plugin is included inside Chromium by default, and it exports all the NPAPI symbols publicly. Any plugin (like Moonlight) which uses a loader and dynamically loads the real plugin from another location will get its calls intercepted by the Native Client plugin, and things will fail badly. After fixing this, it still kept hanging on the splash animation. Asked other people to test it - same thing. 99.8% of the time it deadlocks completely, and only 0.2% of the time does it actually load properly. I guess the positive reports were just really, really lucky.

Next course of action - debug the thing. Following the instructions on how to debug Chrome on Linux, I learned about the Renderer and the Plugin processes that get spawned (and the Zygote, too :P), and how to debug them. Only it didn't work (of course not, I hear you say, that would have been way too easy), due to a missing condition on an if on the Chrome loader (I'm guessing nobody actually debugs it on Linux? :P) Patch the thing, and yay, we're debugging.

To keep plugins from blowing up and/or generally misbehaving and giving the browser a bad reputation, Chrome runs them on a separate process that communicates with the main rendering process via IPC. This, of course, is a terrain rife with potential race conditions and reentrancy issues, and that's exactly what's happening with Moonlight. Fortunately, unlike most race conditions, the problem was very reproducible under gdb as well, and I was able to get traces of both processes in the middle of the deadlock.

So what is deadlocking? Well, it's actually very simple: the renderer process calls NPP_SetWindow on the plugin, and also does a blocking call at the same time. In NPP_SetWindow, we call NPN_GetValue and NPN_GetProperty, which call back into the renderer process and block... oops.

I wasn't very confident that I could reproduce this without all the Moonlight code, but just in case, and because I wanted to have a nice clean skeleton NPAPI plugin around, I built one, which does nothing but stub out all the required methods to get an empty plugin going. When it gets to NPP_SetWindow, it calls NPN_GetValue and NPN_GetProperty - and it deadlocks pretty much 100% of the time.

I opened issue #32797 on crbug.com, with the small splash plugin test case, if you're curious. Hopefully this will get fixed fast. With all the calls to the browser that we do during execution, I really really hope we don't hit this again... but it's more likely than not that we will :/

While the idea of keeping plugins under control by shuffling them off to the side is a good one, browser devs should keep in mind that plugins already have a lot to deal with: all the limitations they are subjected to, NPAPI being very far from perfect, browsers implementing it differently, and OS differences on top of that. It's already difficult enough to have a performant plugin (and believe me, the last thing we want to do is stall the browser); we shouldn't have to worry about potential reentrancy issues and race conditions when doing something as simple as querying the browser for a property value.

Pretty please?

Wednesday, December 02, 2009

Mono Developer Room for FOSDEM 2010!

Some excellent news out of Brussels today: there is going to be a Mono Developer Room at FOSDEM 2010! The call for participation is now open, so come and join us to put together an awesome Mono day at FOSDEM!

Thank you so much to Ruben Vermeersch for spearheading this effort, together with Stéphane Delcroix. You guys rock!

Don't forget, send in your talk!

Saturday, November 21, 2009

New phone, Moonlight almost upon us and other little tidbits from the week

First off, Moonlight news: 2.0 is almost upon us (or upon you, in any case). The official release date is not set yet, but it is going to be in the next two weeks, so if you have bugs that need fixing for the release, speak now or forever hold your peace. Well, not forever forever... you know what I mean :)

A simple phone

This week I got supremely frustrated with my phone(s). I have a very friendly Nokia 6288 which is unfortunately locked-in to Vodafone, and as some of you might know, I got a new phone number from a different operator, which means I can't use my Nokia until I unlock it... and in this country, it's not an easy thing to do. In the meantime, I have some unlocked phones, but they're for emergencies only, really, when I'm travelling and just need to use a local card for a bit, not at all something that I would like to use on a regular basis. And I did try to use them, but it turns out I need a little more from a phone than what a Nokia Prism can provide... boy oh boy is that thing slow :P

Choosing a new phone is always hard for me... it beats clothes shopping in difficulty level/time spent hands down. I take *months* to make up my mind, and of course, by that time they're deprecated and new models are out, so I can never decide :P So this week I was stuck in a shopping center, getting frustrated with my crappy phone yet again, and I happened to walk into a Fnac store, with the mobile phone section right in front of me. They always say you shouldn't shop for food when you're hungry... well, it turns out that applies to looking at pretty gadgets while being frustrated with your own gadgets, too.

I started browsing, not very impressed by the selection, and came to a corner with a pretty little thing called HTC Tattoo. I don't know if it was the name, or the silly Android logo on the silvery back of the display model or what, but before I realized it, I was walking out with a new phone.


I love it, it's everything I needed and a ton more. The only gripe I have with it is the lack of a foldout keyboard, but the touch keypad is surprisingly accurate for its small size, so I'm not missing it too much. The parts I like best are having mp3 and ogg as my morning alarm and ringtones, the twitter/facebook/gtalk/gmail/imap/flickr integration with everything and the gorgeous display. Oh yes, it makes calls, too :)

Two seconds into playing with it I found a bug in the messaging software: when you send an SMS, it gives you the option to send another one, with an empty box and a send button, and if you hit the send button, it will send an empty SMS without any confirmation :P That was pretty much the only hiccup I've had with it so far; it's been a pretty smooth ride. I'm sure I'll have a lot more problems once I start hacking on it, though :D

ChromiumOS

Thursday Google released the source of ChromiumOS. That evening I basically couldn't get to sleep, so I figured I might as well go do something boring like trying it out on my EeePC - watching stuff build usually does wonders for my sleeplessness.

Building the thing was pretty straightforward, and in a short while I had it running from USB on the Eee 901. But, alas, no wireless: the 901 comes with a Ralink chip, whose drivers are still relegated to the staging area of the kernel, and it turns out ChromiumOS doesn't include those in the build. So I had to go back, reconfigure the kernel to add the Ralink drivers, build again, build the image, load it on the stick and boot again, et voilà, wireless was up and running.

Next problem... how to get it to connect to a secure AP - it's not like you can login without network on ChromiumOS (well, yes, you can, but I'm stubborn and don't wanna), and there's no way to configure anything from the login screen... fortunately, wpa_cli was available from the terminal, so after a few choice commands, it connected to the AP.

But still, no login. For some reason the connman DNS proxy refused to talk to the DNS on the router, so it wouldn't resolve any hosts (and it doesn't log anything! There are no docs for it, nothing... essential system-level software that fails without telling me anything is annoying!).

At this point I decided to get it running from the HD, to make it easier to configure stuff, and the installation of the system to HD was completely painless and fast. Another reboot, and we're up and running from HD. Still no DNS, so terminal again to hack up resolv.conf and set up nameservers. ChromiumOS doesn't save wireless configurations, so every reboot means another set of wpa_cli commands, which is annoying. But hey, I have a lot of patience. Or I'm masochistic... one of those.

Finally, net was running, tried to login and... no dice. A quick look at the logs reveals that curl is failing trying to load security certificates :/ At this point I gave in a bit and connected the eee to the ethernet, just to make sure. Same thing. Looks like it's either trying to connect to a stale server, or there's something wrong with my build. So I've seen the login screen (very blue), but nothing else. Maybe tomorrow I'll take another stab at it - I'm curious how the desktop is done.

Didn't help much with the sleep thing, but it was fun. :)

Tuesday, October 20, 2009

Doing the Wave

For the past week I've been doing the Google Wave dance. First impressions are, it's a really interesting mashup of different messaging/content concepts - Wiki meets IM meets Email threading - but it's way too cluttered. The social web evolution tells us that simpler is better: services tend to be straightforward, simple, uncluttered, fast. Google's own web page was a hit precisely because it was simple, clean and to the point, and Twitter and all related services are the same thing. Having a huge chunk of my desktop space occupied by one browser window full of stuff, because that's what it takes to interact with Google Wave, is way too intrusive. And of course, the fact that it can literally bring the browser to its knees tells me that it might be a bit too much for a web app.

I'm looking forward to seeing how it evolves, I'm sure a lot of people are getting ideas for better ways to do it... I've had a bunch already. In the meantime, I'm on it as shana.ufie@googlewave.com, feel free to add me to waves, I want to see what people are doing with that thing.

Monday, August 03, 2009

Binding C++ APIs, the COM way

A couple of days ago, during a routine "aaagh, we still don't have a nice way to do C# bindings for C++ APIs" discussion, Miguel asked me how hard it would be to leverage COM to bind C++ APIs. I've been known to mess around with COM, as when I did the Mono.WebBrowser/Gecko C# bindings, but I never did get around to writing little test apps to try and streamline the whole process of using COM to bind a C++ API, so I jumped at the chance and got some interesting results.

COM, despite all the bad connotations surrounding it, is actually really simple: it is just a contract stating that any COM-conforming C++ class has at least 3 methods: QueryInterface, AddRef and Release. No matter how many members the class might have, those 3 are always present at the top of the class' vtable, so Mono's COM interop layer always knows where they are and can invoke them directly. And since the vtable layout for the class is known, any other method on that class can also be invoked in this way, bypassing name-mangling and other issues.

COM-conforming C++ classes can be described in C# via interfaces that have the same layout as the C++ class, so Mono knows exactly where the methods are in the vtable when invoking. Furthermore, COM support is pretty much transparent in C# - once you've defined your interfaces, you don't even realize you're using a COM object, it's just another object that you invoke methods on. Mono does all the marshalling for you, so you don't have to pass IntPtrs around; you just use the types you defined and everything will be marshalled for you behind the scenes.

Show me the code!

Let's say you have a little C++ library you'd like to use from C#:

class File {
public:
  int Open();
  int Close();
};

The C++ COM class

The first thing you need to do is create a COM class which will serve as a proxy between C# and your nice little library.

class COMFile {
public:
  virtual int QueryInterface (void* id, void** result) {
    *result = this;
    return 0;
  }
  virtual int AddRef () { return 1; }
  virtual int Release () { return 0; }

  virtual int Open () { return file->Open(); }
  virtual int Close () { return file->Close(); }

  COMFile (File* f) : file(f) {}

private:
  File* file;
};

All methods that need to be "exported" are marked as virtual, and the layout is what you would expect: the 3 methods on top that make this a COM class, plus the 2 methods that are proxying the calls to the library's File class.

AddRef and Release are standard refcounting methods - these will be called by Mono as needed when you invoke things that end up creating objects of this type. I'm just returning fixed values here, but it's important to note that when Release makes the refcount go to 0, the object should free itself.

QueryInterface allows Mono's COM interop layer to figure out if a pointer can be cast to a specified type - via behind-the-scenes magic (and a little code), it enables a dynamic type system. This example is very simple and doesn't use inheritance, but with a complex binding you'll certainly have inheritance, and that is where QueryInterface comes in, for instance allowing for upcasts if your COM class inherits from several different classes.

You'll notice in the C# interface below that it is marked with a Guid - this id is unique to every class, and your C++ class definition should also have the same id. When QueryInterface is invoked, the id argument is the Guid of the type you want to cast to, so you can check if your C++ class is of the correct type by comparing ids, or if it is a subclass and you need to cast the result (or if you don't support it at all, in which case you'd set the result to null and return an error code).

The C# interface

using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

[Guid ("00000000-0000-0000-0000-000000001111")]
[InterfaceType (ComInterfaceType.InterfaceIsUnknown)]
[ComImport()]
public interface COMFile {
  [PreserveSigAttribute]
  [MethodImpl (MethodImplOptions.InternalCall, MethodCodeType = MethodCodeType.Runtime)]
  int Open();

  [PreserveSigAttribute]
  [MethodImpl (MethodImplOptions.InternalCall, MethodCodeType = MethodCodeType.Runtime)]
  int Close();
}

"Wait a minute... where are the 3 methods?", I can hear you thinking. Well, on the C# side you don't need them. The interface is marked ComImport(), so Mono already knows it's a COM class and it will add them for you, no questions asked. The C# interface only needs the definitions of the methods you want to access, and nothing else.

Putting it all together

Now that you have all your definitions in place, the only thing you need is to get a reference to a COMFile object. For this you're going to need to add a P/Invoke call to a C function on your proxy code that gives you a pointer to an instance of that class. You only need to do that for top level objects, because any objects that are returned via COM calls are directly available to you.

C++ proxy library

extern "C" {
  COMFile* getptr() {
    return new COMFile (new File ());
  }
}

C# test app

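using System;
using System.Runtime.InteropServices;

public class Test {
  [DllImport ("myglue")]
  [return:MarshalAs(UnmanagedType.Interface)]
  static extern COMFile getptr();

  public static void Main() {
    COMFile file = getptr();
    int result = file.Open(); // "return" is a C# keyword, so it can't be a variable name
    // ...
  }
}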

And there you go, the "file" variable is now talking to your COM class which is proxying all calls directly to your library. The glue code is very straightforward and can be easily autogenerated.

You can download a complete working sample here:
comtest-0.1.tar.gz
comtestsharp-0.1.tar.gz

Build and install both packages to the same prefix, then go to $prefix/lib/ and run "ln -s comtestsharp/* .". Then just run "mono comtestsharp.exe" and you should see the output of the Open and Close calls.

Update: BTW, neither QI nor AddRef/Release are actually implemented properly in this little sample. The unused parameter "id" is the Guid that is being requested, and QI should always check it against the interfaces the current instance actually supports.