Recursive mutexes are great

Today, I committed an important change to lubyk: the global lock on the Lua state now uses a recursive mutex. If you read forums on multi-threading and the like, you will probably encounter fierce opponents of such locks, saying things like “if you know your code, you don’t need recursive mutexes” or “you don’t need mutexes at all”.

All these trolls are probably right in their own realm, but in a Lubyk process, code is multi-threaded but not concurrent (many threads access the Lua state, but only one at a time). We use multi-threading to ease GUI and I/O integration, not for faster data processing (for that we use multiprocessing). OK, so we need a global mutex. Now, what kind of mutex do we need?

The threads in a Lubyk process take hold of the Lua state and should only release their hold in well-known places, such as:

  • when waiting for data (“recv” on a network socket)
  • when their task is finished
  • when sleeping (timer or explicit “sleep”)
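
To make "multi-threaded but not concurrent" concrete, here is a minimal sketch of that discipline. It is not Lubyk's code: g_state_lock, touch_state, sleep_released, worker and run_workers are hypothetical names, and a plain std::mutex stands in for the global lock. Each worker holds the lock for its whole task and lets go only while sleeping.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the global lock protecting the Lua state.
std::mutex g_state_lock;

// How many threads are inside the state right now, and the highest
// value ever observed. With correct locking the maximum stays at 1.
std::atomic<int> g_inside{0};
std::atomic<int> g_max_inside{0};

// Stand-in for "run some Lua code"; must be called with the lock held.
void touch_state() {
    int now = ++g_inside;
    int seen = g_max_inside.load();
    while (now > seen && !g_max_inside.compare_exchange_weak(seen, now)) {
    }
    --g_inside;
}

// One of the well-known release points: drop the lock only while
// sleeping, then take it back before returning to the caller.
void sleep_released(std::unique_lock<std::mutex>& held,
                    std::chrono::milliseconds duration) {
    held.unlock();
    std::this_thread::sleep_for(duration);
    held.lock();
}

void worker(int rounds) {
    std::unique_lock<std::mutex> held(g_state_lock);
    for (int i = 0; i < rounds; ++i) {
        assert(held.owns_lock());
        touch_state();
        sleep_released(held, std::chrono::milliseconds(1));
    }
}  // task finished: the lock is released here

// Runs several workers and returns the highest number of threads that
// were ever inside the state at the same time.
int run_workers(int threads, int rounds) {
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i) pool.emplace_back(worker, rounds);
    for (auto& t : pool) t.join();
    return g_max_inside.load();
}
```

Calling run_workers(4, 20) starts four threads that interleave at the sleep points, yet the reported maximum stays at one thread inside the state at a time.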

The previous implementation used a non-recursive (fast) mutex. This went smoothly until we started working on the GUI library that you can use in Lubyk to create nice graphical applications. As long as the calling pattern was simple, our non-recursive mutex worked fine:

C++                     Lua
  paint ---> [lock] --> Lua callback (paint)
        <--- [unlock]

But what happens if the Lua callback does more complex things (such as creating widgets, hiding them, etc.) in a “click” operation?

C++                      Lua
  click  ---> [lock] --> Lua callback (click)
  ...    <---            create widget
  paint  ---> [deadlock !]

The “paint” operation is only an example (the library does not paint on widget creation), but the principle holds: as complexity grows, deadlocks eventually appear.
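
The reentrant pattern in the diagram can be reproduced in a few lines. Locking a std::mutex twice from the same thread is undefined behavior in C++ and typically hangs, so this sketch uses a hypothetical CheckedLock wrapper that remembers its owner and throws where a plain non-recursive mutex would deadlock. The function names (click_from_cpp, create_widget, paint_from_cpp, click_deadlocks) are illustrative only and are not part of Lubyk or Qt.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical non-recursive lock that records its owner, so that a second
// lock() from the owning thread throws instead of hanging forever.
class CheckedLock {
public:
    void lock() {
        if (owner_.load() == std::this_thread::get_id())
            throw std::logic_error("deadlock: thread already holds the lock");
        mutex_.lock();
        owner_.store(std::this_thread::get_id());
    }
    void unlock() {
        assert(owner_.load() == std::this_thread::get_id());
        owner_.store(std::thread::id());
        mutex_.unlock();
    }

private:
    std::mutex mutex_;
    std::atomic<std::thread::id> owner_{std::thread::id()};
};

CheckedLock g_state_lock;

// C++ side: take the lock, then (in real code) run the Lua paint callback.
void paint_from_cpp() {
    std::lock_guard<CheckedLock> guard(g_state_lock);
}

// C++ method called by Lua that ends up triggering a paint.
void create_widget() { paint_from_cpp(); }

// C++ side: take the lock, then run the Lua click callback, which here
// creates a widget while the lock is still held.
void click_from_cpp() {
    std::lock_guard<CheckedLock> guard(g_state_lock);
    create_widget();
}

// Returns true if the click path tried to lock twice on the same thread.
bool click_deadlocks() {
    try {
        click_from_cpp();
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

A direct paint_from_cpp() call goes through, while click_deadlocks() reports the second lock attempt that a fast mutex would simply block on.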

The previous solution was to release the lock in any C++ method called by Lua that might reenter the Lua state (the “new widget” method in our example). Adding tons of “ScopedUnlock” code to manage this (and having to guess where such deadlocks might occur) was a pain. But the worst part is that we opened the door to data corruption: whenever the lock was released, another thread could get hold of the Lua state. Our code is not meant to be concurrent, so weird problems could have occurred.
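
For illustration, here is roughly what such a scoped unlock looks like and why it is dangerous. This is a guess at the idea, not the original lubyk class: the ScopedUnlock below, g_shared, new_widget and click_callback are all hypothetical, and the "intruder" thread is started on purpose to show what can happen while the door is open.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

std::mutex g_state_lock;
int g_shared = 0;  // stand-in for some value living in the Lua state

// Hypothetical reconstruction of the idea: release the lock on entry,
// take it back on exit.
class ScopedUnlock {
public:
    explicit ScopedUnlock(std::mutex& m) : mutex_(m) { mutex_.unlock(); }
    ~ScopedUnlock() { mutex_.lock(); }
    ScopedUnlock(const ScopedUnlock&) = delete;
    ScopedUnlock& operator=(const ScopedUnlock&) = delete;

private:
    std::mutex& mutex_;
};

// C++ method called from Lua with the lock held. It opens the door so
// that a reentrant paint would not deadlock.
void new_widget() {
    ScopedUnlock open_door(g_state_lock);
    // While the door is open, any other thread may grab the state.
    std::thread intruder([] {
        std::lock_guard<std::mutex> guard(g_state_lock);
        g_shared = 99;
    });
    intruder.join();
}  // lock is taken back here

// Lua click callback as seen from C++: it believes it owns the state
// for its whole duration.
int click_callback() {
    std::lock_guard<std::mutex> guard(g_state_lock);
    g_shared = 1;
    assert(g_shared == 1);  // still ours before the door opens
    new_widget();
    return g_shared;  // changed behind our back
}
```

The callback writes 1, calls new_widget(), and reads back 99: nothing deadlocked, but the state changed in the middle of code that assumed exclusive access.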

A recursive mutex in our case, despite its bad reputation, does the job perfectly: it keeps our Lua state well protected, avoids the need for open doors in unwanted places and avoids deadlocks when working with the graphical interface library (the excellent Qt).
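
A minimal sketch with std::recursive_mutex shows both properties: the owning thread can lock again on the click, create widget, paint path, and other threads stay locked out the whole time. As before, the function names are hypothetical, and std::recursive_mutex is only an assumption about what a recursive global lock could look like, not necessarily what lubyk uses.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

std::recursive_mutex g_state_lock;
int g_depth = 0;      // current nesting of lock holders on this thread
int g_max_depth = 0;  // deepest nesting seen so far

// C++ side: take the lock (again, if we already own it), then paint.
void paint_from_cpp() {
    std::lock_guard<std::recursive_mutex> guard(g_state_lock);
    ++g_depth;
    if (g_depth > g_max_depth) g_max_depth = g_depth;
    --g_depth;
}

// C++ method called by Lua that ends up triggering a paint.
void create_widget() { paint_from_cpp(); }

// C++ side: take the lock, run the click callback, which creates a widget.
// Returns the deepest lock nesting reached.
int click_from_cpp() {
    std::lock_guard<std::recursive_mutex> guard(g_state_lock);
    ++g_depth;
    if (g_depth > g_max_depth) g_max_depth = g_depth;
    create_widget();
    assert(g_depth == 1);  // the nested lock has been released again
    --g_depth;
    return g_max_depth;
}

// Returns true if a second thread fails to get the lock while we hold it.
bool others_locked_out() {
    std::lock_guard<std::recursive_mutex> guard(g_state_lock);
    bool entered = true;
    std::thread probe([&entered] {
        entered = g_state_lock.try_lock();
        if (entered) g_state_lock.unlock();
    });
    probe.join();
    return !entered;
}
```

Here click_from_cpp() reaches a nesting depth of 2 without blocking, and others_locked_out() confirms that no door was opened for another thread.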

Gaspard Bucher
