Personal computers are going multi-core, meaning that a single chip can have 2, 4 or even 128 CPU cores.
This is important for software writers, since programs running up to 1024 threads will not be uncommon, and all those synchronization problems are a big roadblock on the way to thread heaven.
Synchronization problems are of two kinds:
1. Race conditions: Not getting what you want (see the sketch right after this list).
2. Contention: Getting what you want, but too late.
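To make the first kind concrete, here is a minimal sketch in C++ (the counter and the iteration count are made up for illustration): two threads increment a shared counter with no synchronization at all, and increments simply get lost.

#include <iostream>
#include <thread>

int counter = 0;                 // shared, completely unsynchronized

void hammer() {
    for (int i = 0; i < 1000000; ++i)
        ++counter;               // read-modify-write, not atomic: updates can be lost
}

int main() {
    std::thread a(hammer), b(hammer);
    a.join();
    b.join();
    // We wanted 2000000; we usually get less. That is the race condition.
    std::cout << counter << '\n';
}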
Usually race conditions are alleviated using locks: mutexes, semaphores, read-write locks, condition variables, monitors, etc. The problem with all of these is that they introduce contention in one way or another, because by design they make one thread wait while another executes the 'critical section'.
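Continuing the sketch above, the lock-based fix looks like this: the race is gone, but while one thread holds the mutex the other one just sits there waiting, and that waiting is the contention.

#include <mutex>
#include <thread>

int counter = 0;
std::mutex m;

void hammer() {
    for (int i = 0; i < 1000000; ++i) {
        std::lock_guard<std::mutex> guard(m);  // only one thread gets past this line at a time
        ++counter;                             // the 'critical section'
    }                                          // the other thread spent this time waiting: contention
}

int main() {
    std::thread a(hammer), b(hammer);
    a.join();
    b.join();
    // counter is now exactly 2000000, but the two threads ran mostly one after the other.
}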
Ok, so this seems like a problem without a solution; go home now, get a beer and enjoy The Simpsons.
Still here? Hmmm, ok, there is lock-free synchronization, which is really a form of obscuring the problem.
See: http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-639.pdf
At the heart of lock-free algorithms lies the best of lies: they do not remove the locks, they just make them smaller and smaller, until you cannot see the locks anymore.
Let me explain with an example of the database kind: in the old days of databases (1997), it was common to write software that used temporary tables. So you hired 30 guys, each wrote a piece of the action using temporary tables, and when you finally integrated everything, you realized that the whole system froze whenever a user triggered work on a temporary table.
Why? Because all databases store their metadata in tables (this is a requirement to be called a "relational database"), and every time you did an insert into a table, the table would lock until you either committed or rolled back. This "feature" exists because databases are transactional. (There was a joke in those days: "It is not a bug, it is a feature".)
The Microsoft SQL Server guys realized people were using databases like this, and instead of issuing a warning to the software makers, they released a new version that did not lock the entire temporary meta-table when somebody did an insert (the meta-table is the table of tables), so that two processes could create two temporary tables and continue to work in parallel.
Soon, this improvement was done for all tables, not just the temporary meta-table. The result was more parallelism, but please notice that a big temporary lock was just replaced with a smaller temporary lock; that, conceptually, was the real advance.
Why can't we do this for all software, that is, convert all those big locks into much smaller ones?
What if I needed to perform a really big change to a data structure but, instead of locking the whole structure, locked just the elements to be modified? Any modification could still block (because other threads were modifying the same elements) or even deadlock (because we grabbed our locks in a different order), but then we could let one of those threads succeed and restart the other thread (undoing its changes).
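Here is a sketch of that idea (the node layout and the transfer operation are hypothetical, just for illustration): each element carries its own small lock, we try to grab only the elements we need, and if we cannot get all of them we release what we hold and restart the whole operation instead of deadlocking.

#include <mutex>

struct Node {
    std::mutex lock;   // one small lock per element instead of one big lock on the structure
    int value = 0;
};

// Hypothetical modification: move an amount from one node to another.
void transfer(Node& from, Node& to, int amount) {
    for (;;) {
        from.lock.lock();
        if (to.lock.try_lock()) {       // try to grab the second element
            to.value   += amount;       // we hold both: do the change
            from.value -= amount;
            to.lock.unlock();
            from.lock.unlock();
            return;
        }
        from.lock.unlock();             // somebody else has 'to': back off,
                                        // nothing to undo because we have not written anything yet
        // ...and restart the whole operation (this sketch can spin; a real one would back off politely)
    }
}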
Undoing changes in the database world is cheap; undoing them in the systems programmer's world is hard, because neither the language nor the runtime help. But someone came up with the idea that we could just copy everything into a temporary area in RAM, and with a single atomic switch of a pointer (a compare-and-swap, or CAS), we could have instant gratification.
The copy is simply let go if the CAS operation finds that the structure was modified before the CAS had a chance to execute; therefore the only 'lock' is the CAS itself, and in case of failure it is easy to restart the whole operation.
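A sketch of that copy-and-switch trick using an atomic compare-and-swap (the Snapshot structure and the update are made up for illustration): build a fresh copy off to the side, modify it at leisure, then try to publish it with a single CAS on the root pointer; if somebody modified the structure in the meantime, throw the copy away and restart.

#include <atomic>

struct Snapshot {
    int a = 0, b = 0;                    // whatever the data structure holds
};

std::atomic<Snapshot*> root{ new Snapshot{} };

void update() {
    for (;;) {
        Snapshot* old_snap = root.load();
        Snapshot* fresh = new Snapshot(*old_snap);  // copy everything into a temporary area
        fresh->a += 1;                              // modify the copy, no locks held
        fresh->b += 1;
        // One CAS publishes the copy, but only if 'root' is still what we read.
        if (root.compare_exchange_weak(old_snap, fresh)) {
            // Success. (Safely freeing old_snap needs hazard pointers, RCU or
            // something similar; that part is deliberately left out of this sketch.)
            return;
        }
        delete fresh;                    // somebody beat us to it: let the copy go and retry
    }
}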
In fact, this means that a very big piece of software can be written as if multithreading did not exist, because it uses thread-safe data structures that are lock-free, and those lock-free data structures work by just making copies of everything and switching pointers at the end, reducing the time spent holding a lock from several milliseconds to a few nanoseconds.
This of course makes CPU manufacturers happy and software developers happy, because they can deliver software that is written as if it ran in one thread, but performs better than carefully written multithreaded software.
Wednesday, July 11, 2007