Switched to mono-threading

One mutex and many OS threads

The previous version of Lubyk relied heavily on multi-threading to handle every type of input or event: network sockets, graphical interface events, mDNS notifications, timers, etc.

The model relied on one global mutex per process and looked roughly like this:

[Figure: select]
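The model above can be sketched in a few lines. This is a minimal illustration in Python rather than Lua, not Lubyk's actual code: the names `global_mutex`, `source_thread` and `run_demo` are hypothetical, and queues stand in for real sockets, GUI events and timers.

```python
import queue
import threading

global_mutex = threading.Lock()  # one lock per process, shared by every thread
log = []                         # shared state, only touched while holding the lock


def source_thread(name, events):
    """One OS thread per input source (socket, GUI, timer...)."""
    while True:
        event = events.get()     # block on this source only, without the lock
        if event is None:        # sentinel: the source is closed
            break
        with global_mutex:       # run the handler under the global mutex
            log.append((name, event))


def run_demo():
    log.clear()
    sources = {name: queue.Queue() for name in ("socket", "gui", "timer")}
    threads = [threading.Thread(target=source_thread, args=(name, events))
               for name, events in sources.items()]
    for thread in threads:
        thread.start()
    for name, events in sources.items():
        events.put(name + " event")
        events.put(None)
    for thread in threads:
        thread.join()
    # The order in which handlers ran varies between runs, so sort the result.
    return sorted(log)
```

Calling `run_demo()` returns the three events, but the order in which the handlers actually ran is decided by the OS thread scheduler, which is why the result has to be sorted before it can be compared.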

This model is easy to understand and easy to code but has many drawbacks that only appear when things become increasingly complex. The main drawbacks are:

  • Non-deterministic execution (especially problematic in case of bugs or on startup/close).
  • Possibility of deadlocks (see "Recursive mutex are great" for a solution).
  • Cannot execute arbitrary commands from any thread (e.g. GUI object creation).
  • Needs more resources (thread context switching cost).
  • Blocking I/O can result in denial of service if it is not handled properly.

One thread and many file descriptors

Thanks to discussions with the great people at Lua Workshop 2011, I finally understood how to get rid of my many OS threads. The trick is to replace the global mutex with a global select or poll loop. This also implies that all socket I/O is non-blocking.

[Figure: select2]
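The idea of one thread driving many file descriptors can be sketched as follows. This is an illustration in Python rather than Lua, with generators standing in for Lua coroutines; the `Scheduler` class and the `reader` coroutine are hypothetical and much simpler than the real Lubyk scheduler. Each coroutine yields the socket it wants to read from, and the single select loop resumes it once that socket is readable.

```python
import selectors
import socket


class Scheduler:
    """Single-threaded scheduler: one select loop, many coroutines."""

    def __init__(self):
        self.selector = selectors.DefaultSelector()
        self.waiting = 0  # number of coroutines parked on a socket

    def spawn(self, coroutine):
        self._resume(coroutine)

    def _resume(self, coroutine):
        try:
            sock = next(coroutine)  # run until the coroutine waits for input
        except StopIteration:
            return                  # coroutine finished
        self.selector.register(sock, selectors.EVENT_READ, coroutine)
        self.waiting += 1

    def run(self):
        while self.waiting:
            for key, _ in self.selector.select():
                self.selector.unregister(key.fileobj)
                self.waiting -= 1
                self._resume(key.data)


def reader(sock, name, received):
    """Coroutine: yield the socket to mean 'wake me when it is readable'."""
    while True:
        yield sock
        data = sock.recv(1024)      # does not block: select said it is ready
        if not data:                # empty read: the peer closed the socket
            break
        received.append((name, data))
    sock.close()


def run_demo():
    received = []
    scheduler = Scheduler()
    for name in ("a", "b"):
        read_end, write_end = socket.socketpair()
        read_end.setblocking(False)
        scheduler.spawn(reader(read_end, name, received))
        write_end.sendall(name.encode() + b" says hi")
        write_end.close()
    scheduler.run()
    return sorted(received)
```

Because only one coroutine runs at a time and control changes hands only at a `yield`, no mutex is needed around the shared `received` list.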

The implementation details are not trivial (code here), because the system needs easy-to-use semantics (if possible equivalent to the multi-threaded version) and the scheduler has to work in the following environments:

  • With Qt as main event loop
  • With Qt + ZeroMQ
  • With ZeroMQ only, using a single poll

Since Lubyk always starts without the graphical interface loaded, we also needed a seamless transition from a ZeroMQ-based poll to Qt’s main event loop (with lots of already running sockets and timers).
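One way to picture such a transition is a scheduler that owns the list of watched sockets and timers, so that the event loop underneath can be swapped and everything re-registered. The sketch below is in Python rather than Lua and makes no real Qt or ZeroMQ calls: `RecordingBackend`, `watch_fd`, `watch_timer` and `switch_backend` are hypothetical names, and the way Lubyk actually performs the hand-over may differ.

```python
class RecordingBackend:
    """Hypothetical stand-in for an event loop (a ZeroMQ poll or Qt's loop)."""

    def __init__(self, name):
        self.name = name
        self.fds = {}     # fd -> callback
        self.timers = {}  # timer id -> (interval in ms, callback)

    def add_fd(self, fd, callback):
        self.fds[fd] = callback

    def add_timer(self, timer_id, interval_ms, callback):
        self.timers[timer_id] = (interval_ms, callback)

    def clear(self):
        self.fds.clear()
        self.timers.clear()


class Scheduler:
    """Owns the list of watchers; the backend only mirrors it."""

    def __init__(self, backend):
        self.backend = backend
        self.fds = {}
        self.timers = {}

    def watch_fd(self, fd, callback):
        self.fds[fd] = callback
        self.backend.add_fd(fd, callback)

    def watch_timer(self, timer_id, interval_ms, callback):
        self.timers[timer_id] = (interval_ms, callback)
        self.backend.add_timer(timer_id, interval_ms, callback)

    def switch_backend(self, new_backend):
        # Re-register every running socket and timer with the new loop,
        # then detach them from the old one.
        for fd, callback in self.fds.items():
            new_backend.add_fd(fd, callback)
        for timer_id, (interval_ms, callback) in self.timers.items():
            new_backend.add_timer(timer_id, interval_ms, callback)
        self.backend.clear()
        self.backend = new_backend


def run_demo():
    poll_loop = RecordingBackend("zmq-poll")
    gui_loop = RecordingBackend("qt-loop")
    scheduler = Scheduler(poll_loop)
    scheduler.watch_fd(3, lambda: None)
    scheduler.watch_fd(4, lambda: None)
    scheduler.watch_timer("tick", 500, lambda: None)
    scheduler.switch_backend(gui_loop)  # the GUI is loaded later on
    return poll_loop, gui_loop
```

Keeping the authoritative list in the scheduler rather than in the event loop means that code calling `watch_fd` never needs to know which loop is currently running.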

[Figure: Patch and GUI running with the new Scheduler]

Using Lua coroutines instead of OS threads was not an easy transition (I had to rewrite important parts of the socket and mDNS code). The most difficult part was memory management:

  • With a central scheduler, how can threads and sockets be garbage collected, since they are always reachable from the scheduler?

The memory management issues were solved by using Lua weak tables and proper cleanup code that runs when a threaded object is garbage collected.
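The principle can be illustrated with a small sketch. It is written in Python rather than Lua, so `weakref.WeakSet` plays the role of a Lua weak table and `weakref.finalize` plays the role of a `__gc` cleanup hook; the `Thread` and `Scheduler` classes are hypothetical and do not reflect Lubyk's actual data structures.

```python
import gc
import weakref


class Thread:
    """Hypothetical scheduler-managed thread object."""

    def __init__(self, name):
        self.name = name


class Scheduler:
    def __init__(self):
        # Weak references only: being registered here does not keep a
        # thread alive once user code has dropped it.
        self.threads = weakref.WeakSet()
        self.cleaned = []

    def register(self, thread):
        self.threads.add(thread)
        # Cleanup hook that runs when the thread is collected. It must not
        # hold a strong reference to the thread itself.
        weakref.finalize(thread, self.cleaned.append, thread.name)


def run_demo():
    scheduler = Scheduler()
    keep = Thread("keep")
    drop = Thread("drop")
    scheduler.register(keep)
    scheduler.register(drop)
    del drop        # the user code forgets about this thread
    gc.collect()    # make collection explicit for the demo
    alive = sorted(thread.name for thread in scheduler.threads)
    return alive, list(scheduler.cleaned)
```

After `run_demo()`, only the thread that user code still holds remains in the scheduler, and the cleanup hook has run for the dropped one. In a real scheduler the hook would be the place to close sockets and remove them from the poll loop.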

Gaspard Bucher
