Example race condition in Node.js
I've had time to evaluate the current crop of software platforms and decided to focus my energy on JavaScript and Node.js. This decision surprises me a little, but our understanding of JavaScript in the industry has come a long way in the past few years, and the more familiar I am with Node.js, the more impressed I become.
Ghost is one of the larger open source Node.js applications, and it is a good place to find ideas on how to use the platform to create a significant application. I have been studying the Ghost code base and created a few patches to scratch some itches I had with it.
While reviewing the code I found a problem that I believe occurs in a few different places, including one of my own patches: the application has race conditions. This isn't a knock on Ghost, which is a well-written and well-tested project, but it does show how easy it is to create race conditions and how tricky they can be to rectify.
Race conditions certainly are not unique to Node.js. Multi-threaded and multi-process web platforms have similar issues, but it is important to realize that Node.js's evented, single-threaded architecture does not prevent race conditions from occurring either. Although Node.js apps are single-threaded, they are concurrent. While much of the concurrency is handled by Node.js and the operating system itself, that concurrency can still result in race conditions.
The problem: Database writes depend on the current database state
Race conditions occur most commonly in web applications when writes depend on the current state of the database.
What happens is that an application handling a request queries for the current state, makes an assumption about the system based on that state, and then performs a write based on that assumption. Meanwhile, after the state has been queried but before the new state is written, another request modifies the underlying state, invalidating the first request's assumption.
While much of the infrastructure for multi-user support exists in Ghost, multi-user support isn't ready, so Ghost limits the number of user accounts to exactly 1. This is done by first checking to see if a user exists. If it doesn't, the new user account is created.
The general logic flow is as follows:
query to see if a user exists
if no users exist then
    insert a new user record in the database and redirect to admin page
else
    provide an error message to the user that no new users can be created
Each access to the database is done asynchronously, and the application flow continues with a callback when the database operation has completed. This frees up Node.js to handle other requests while the I/O operation is occurring.
During this time, one or more other requests could query the database for the number of users, determine that no users exist in the database, and then go ahead and create a new user in the database, violating the constraint.
I created a simple Node.js application which shows this behavior using SQLite. The problem can be reproduced with a trivial concurrent client. The first time the client is run, 3 requests simultaneously create a user. The second time it is run, the error is detected and an exception is thrown, terminating the server. This can be shown to occur in a single run of the client by increasing the number of concurrent connections created.
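The interleaving can be illustrated without a real database. The sketch below is not the SQLite application mentioned above; it is a minimal stand-in that assumes a hypothetical in-memory "database" (createFakeDb, countUsers, insertUser and setupUser are illustrative names) whose calls complete on a later turn of the event loop, the way real I/O would.

```javascript
// Hypothetical in-memory stand-in for an asynchronous database.
// Every call resolves on a later event-loop turn, as real I/O would.
function createFakeDb() {
  const users = [];
  const later = (fn) =>
    new Promise((resolve) => setImmediate(() => resolve(fn())));
  return {
    countUsers: () => later(() => users.length),
    insertUser: (name) =>
      later(() => {
        users.push({ id: users.length + 1, name });
        return users.length;
      }),
  };
}

// The check-then-insert flow from the pseudocode above.
async function setupUser(db, name) {
  const count = await db.countUsers(); // 1. query the current state
  if (count > 0) {
    throw new Error('no new users can be created');
  }
  await db.insertUser(name); // 2. write based on a read that may be stale
  return name;
}

// Three "requests" are started before any of them has finished.
async function runRace() {
  const db = createFakeDb();
  const results = await Promise.allSettled([
    setupUser(db, 'alice'),
    setupUser(db, 'bob'),
    setupUser(db, 'carol'),
  ]);
  const created = results.filter((r) => r.status === 'fulfilled').length;
  return { created, total: await db.countUsers() };
}

runRace().then(({ created, total }) => {
  console.log(`requests that created a user: ${created}, rows in table: ${total}`);
});
```

All three requests read a count of zero before any insert has run, so all three pass the check and the table ends up with three rows. No threads are involved; the gap between the awaited read and the awaited write is enough.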
Depend on your database
Much has been written about atomic writes, counters, sharded counters, etc., but for an application like Ghost, which is already heavily dependent on a relational database, I'd recommend allowing the database to do the counting and/or enforce the constraints. I think the cleanest solution would be a conditional INSERT based on the number of rows in a table, but conditional INSERTs are neither standard nor universally supported.
However, every SQL database I've used has auto-incremented fields. I understand the arguments against using them, but this happens to be a special case where they can be applied, and SQLite automatically creates one for every table anyway unless developers explicitly ask it not to. The auto-incremented values can be used at the application level to detect when multiple rows have been added to a table, or to prevent multiple rows from being added by using a CHECK clause with CREATE TABLE.
To enforce the constraint with CHECK, the logic is as follows:
create the user table with an auto-incremented id and check that id is 1
query for a user
if the query returns that no users exist then
    insert the new user into the database
    if the insert returns a constraint violation
        query for the existing user and return it
else if a user exists
    return the queried user
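The steps above can be sketched in Node.js as follows. This is only an illustration under stated assumptions: createConstrainedDb is a hypothetical in-memory table that rejects any row whose id is not 1, standing in for the auto-incremented id plus CHECK constraint, and the 'CONSTRAINT' error code is invented for the example, since each real database engine reports violations differently.

```javascript
// Hypothetical in-memory table whose insert enforces "id must be 1",
// standing in for an auto-incremented id column with a CHECK constraint.
function createConstrainedDb() {
  const users = [];
  let nextId = 1; // auto-increment counter, advanced only on success
  const later = (fn) =>
    new Promise((resolve, reject) =>
      setImmediate(() => {
        try {
          resolve(fn());
        } catch (err) {
          reject(err);
        }
      }));
  return {
    findUser: () => later(() => users[0] || null),
    insertUser: (name) =>
      later(() => {
        if (nextId !== 1) {
          const err = new Error('CHECK constraint failed: id = 1');
          err.code = 'CONSTRAINT'; // assumed code; real engines differ
          throw err;
        }
        const user = { id: nextId, name };
        nextId += 1;
        users.push(user);
        return user;
      }),
  };
}

async function getOrCreateUser(db, name) {
  const existing = await db.findUser(); // query for a user
  if (existing) return existing; // a user exists: return it
  try {
    return await db.insertUser(name); // no user seen: try to insert
  } catch (err) {
    if (err.code !== 'CONSTRAINT') throw err;
    return db.findUser(); // lost the race: return the existing user
  }
}

// Three concurrent "requests" against an empty table.
async function runConstrained() {
  const db = createConstrainedDb();
  const users = await Promise.all(
    ['alice', 'bob', 'carol'].map((name) => getOrCreateUser(db, name))
  );
  return users.map((u) => u.name);
}

runConstrained().then((names) => console.log(names));
```

The stale read still happens here, but it no longer matters: the check and the write are a single step inside the table, so only one insert can succeed and the losing requests fall back to reading the row that won.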
This has one significant drawback. If the user is deleted from the database, it cannot be added back in, because the auto-incremented id may no longer be 1. The insert is purely a one-shot operation when creating the database. Ghost never deletes the user, however, so in practice this isn't a significant problem.
It also has a couple of minor drawbacks. It requires changing the schema, and it requires checking the error value returned from the database, which is different for every database engine.
Another option would be to always attempt to create the user, on the assumption that the insert will fail if it would result in more than one user.
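A minimal sketch of that insert-first variant, again using a hypothetical in-memory table (createSingleRowDb) in place of a real database constraint, and an invented 'CONSTRAINT' error code:

```javascript
// Hypothetical in-memory table that rejects a second row in one step,
// standing in for a constraint enforced by the database itself.
function createSingleRowDb() {
  const users = [];
  return {
    insertUser: (name) =>
      new Promise((resolve, reject) =>
        setImmediate(() => {
          if (users.length > 0) {
            const err = new Error('constraint violation');
            err.code = 'CONSTRAINT'; // assumed code; real engines differ
            reject(err);
            return;
          }
          users.push({ id: 1, name });
          resolve(users[0]);
        })),
    countUsers: () => Promise.resolve(users.length),
  };
}

// No query first: just attempt the insert and interpret the failure.
async function createUser(db, name) {
  try {
    await db.insertUser(name);
    return { created: true };
  } catch (err) {
    if (err.code !== 'CONSTRAINT') throw err;
    return { created: false, message: 'no new users can be created' };
  }
}

// Three concurrent "requests" against an empty table.
async function runInsertFirst() {
  const db = createSingleRowDb();
  const results = await Promise.all(
    ['alice', 'bob', 'carol'].map((name) => createUser(db, name))
  );
  return {
    created: results.filter((r) => r.created).length,
    total: await db.countUsers(),
  };
}

runInsertFirst().then((summary) => console.log(summary));
```

Exactly one of the three requests creates the user, and the table ends up with one row. Skipping the initial query saves a round trip, at the cost of treating an error as the normal path for every request after the first.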
In conclusion, the single-threaded, event-driven architecture of Node.js doesn't intrinsically prevent race conditions. While this is just one example, it follows a common pattern of updating database state based on stale assumptions about the current data. If your application uses a relational database, then use the database's atomic operations to prevent race conditions. I realize this is easier said than done, especially if you are attempting to support multiple databases, but it is worth the effort to protect your application's integrity.