Dr. Optimistic Locking

Or How I learned to stop worrying and lock the EOObjectStoreCoordinator

So those of you who read the WebObjects Developer mailing list know I’m a bit obsessed about concurrency issues. Namely concurrent data write operations, specially when you have lots of derived data on the database due to performance issues.

A few months ago, I wrote an article about handling concurrent data access in WebObjects applications that included a nice solution for the problem. The thing is… it’s wrong. It’s all wrong. It doesn’t work, in the sense that it won’t give you any guarantee that you won’t discard upgrades made by other threads or instances of the application. I have to thank Chuck that looked at the code and said “doesn’t work”. Chuck is right, by definition, so when he says something doesn’t work, it just doesn’t. And he was right again.

So, in that article, I avoided the problem created by an editing context delaying the merge of the changes made by other contexts to its objects to the moment when the context is unlocked. But I made a very important mistake: I assumed WebObjects implementation of optimistic locking worked. It doesn’t. Unfortunately, optimistic locking is broken in WO due to a design flaw.

The problem is simple: you can lock everything you want. You can lock your contexts, you can use classic Java locking, you can do whatever you want. But, as long as you don’t lock the EOObjectStoreCoordinator, you can’t trust optimistic locking. Because the row snapshots are unique to a EOF stack, you can have a thread in the middle of a critical part of your code, like I showed on the previous article, and have another thread changing those snapshots simply by doing any operation that brings fresh data from the database, like a fetch specification with the refreshesRefetchedObjects option turned on. Unless you use classic Java locking to lock everything in your code where you might eventually do a refreshing fetch (which is insane), you cannot trust the row snapshots that will be used as a base of comparison for optimistic locking to be unchanged during your code critical segments.

If this sounds too confusing, think in the following example. We have an entity with attribute ATT that we want to increment using optimistic locking. Remember that the ideia is the following: we want to make sure that, if two or more threads increment the value at the same time, only one of the threads will succeed, all the others will receive an optimistic locking exception, and effectively abort the transaction (or repeat it).

So, we have two instances, INST1 and INST2 running. Assume we increment the value inside a method like the one I wrote in the previous article. Let’s now run some steps:

INST2 reads ATT with value = 3. The row snapshot in the INST2 EOF stack has ATT = 3, as expected.
INST1 reads ATT with value = 3. Again, the row snapshot in INST1 EOF stack has ATT = 3.
INST2 increments ATT to 4, and saves the result to the database. It will naturally succeed, updating the DB and it’s own snapshot to ATT = 4. We can forget INST2 from now on.
INST1, in another thread, runs a fetch specification with refreshesRefetchedObjects on. As the ATT value in the database is now 4, that thread will have ATT = 4 on the editing context that fetched the object, and (here’s the problem) the row snapshot will be updated to what’s currently in the database, which is ATT = 4.
INST1, in the original thread, finally increments the value of ATT to 4 (remember, it was 3 on it’s editing context), and tries to save. It will use the row snapshot do to the optimistic locking comparison. So, what it will do, in a rough translation of SQL, is “update att to 4 where entityId is some Id and att is 4“. The red 4 comes from the row snapshot.
As the value on the database is now 4, the save will succeed. The final value of ATT will be 4 and not 5, as expected after being incremented in both threads.

The problem here is the assumption EOF does when saving changes of attributes with optimistic locking enabled. EOF assumes that the row snapshots represent the original status of the objects when they were fetched to the context being saved. So, it compares the current database status with the original status to detect changes. The problem is that assumption is simply not true. There can only be one row snapshot in an EOF stack for a given row, and nothing will guarantee the snapshot doesn’t change in the middle of the critical section.

Facing this, there’s only one thing you can do: make that assumption become true. And to do that, you need to either use a different EOF stack per thread (very very high resource consumption, not only in memory but in DB connections too) or use the “big” lock in the stack. That big lock is the lock in the EOObjectStoreCoordinator. If you need absolute reliability when managing concurrent data access, you’ll have to lock EOObjectStoreCoordinator before fetching (and refreshing, of course) the objects you will change and only unlock it after saving those objects. You can do that with code like this:

EOObjectStore osc = someEditingContextOnThisThread.rootObjectStore();
try {
  osc.lock();
  // Do as I wrote on the previous article:
  // create a new editing context with fetch timestamp
  // set to "now", fetch all your objects, process them,
  // and try to save them, handling OL exceptions.
} finally {
  osc.unlock();
}

Note that I put osc.unlock() call in the “finally” segment of the try/catch wrapper. You need to take extra care to make sure that, whatever happens, you will not leave the EOObjectStoreCoordinator locked, or all your application’s database access will simply be stopped forever.

Of course, there’s a downside for this solution: while you are processing the objects inside the critical code segment, no other thread will be able to access the database, in read or write mode. I can give you two advices on this.

First, the obvious one: keep that processing as simple and fast as possible. Remember that long operations should be done in background threads with their own EOF stack or, better yet, in separate “maintenance” applications. This technique should be used only for fast operations, or the users of your application will get bored waiting.

Second, create multiple EOF stacks (a reasonable amount, like 3 or 4). This will speed up things a bit, because it will reduce the probability of a thread having to wait for another one. If you have 4 EOF stacks and you enter a critical part of your code on one of them, 75% of your threads (and that means 75% of your users) won’t take the hit. The downside is more memory usage. You’ll have multiple copies of the row snapshots, plus all the objects needed to handle the database connection. And, of course, you may have to increase the number of maximum simultaneous connections on the database, which may force you to increase the maximum shared memory and semaphores on your operating system configuration. The good part is that, if you are using Wonder, creating the EOF stacks themselves is as simple as adding a property to your Properties file. That’s right! Just set er.extensions.ERXObjectStoreCoordinatorPool.maxCoordinators=x, being x the number of EOF stacks you want. That’s it, really. It just works, and it tries to minimize weirdness by creating all the editing contexts on a session in the same stack. It’s that good. It’s Wonder!