Recovering from optimistic locking exceptions

WARNING: the solution presented in this article is wrong. It’s a partial solution only, it won’t always work, and you may screw up your data. To see why, check my second article on this issue.

There are many situations, in simple web applications, where you don’t really have to deal with concurrent data updates. Imagine you are writing a blog app, like WordPress. Even if two people are changing a post title at the same time (which, by its nature, is a really rare event), you, as the developer, may simply not care about it, and rely on the “last write wins” strategy.

When you are writing more complex stuff, especially when you have really complex data being modified by different users, in different situations, at the same time, and when you have pre-calculated data, because grabbing all the objects and calculating the results in real time (fast enough for a page load) is simply not possible, you will have serious headaches (it’s already hard enough to understand this sentence without re-reading it!).

When developing with WebObjects, there are two main classes of concurrency problems to solve: intra-instance concurrent data updates (several users inside the same application instance), and inter-instance concurrent updates (several instances talking to the same database). It would appear that the second is much harder than the first… well… it’s not.

Intra-instance updates should be simplified by automatic data merging between contexts. But there are some problems. The biggest one is that it’s possible to totally overwrite a data modification done by another user without getting any notification at all. How? See this thread on the WebObjects Dev mailing list. The root cause is that, while a context is locked, data merging simply does not occur. So, imagine you lock context A and start modifying data. At the same time, someone locks context B on another thread, changes the same data that you are modifying in context A, and saves it. The data saved from B is merged into all the other contexts where the same objects are present, as long as they are unlocked. But remember, your context A isn’t, so no merging will occur for now. Now, imagine you continue working on context A, and finally save it, and unlock it. The merging will occur now, but too late, because you already saved, and probably won’t work in that context any more.

By now, you say “Naaah, that won’t happen, because WebObjects will trigger an optimistic locking exception”. No it won’t. Why? Remember that context B saved while context A was locked, right? So, when B saves, the row snapshots of the EOF stack will be updated. When you save context A, assuming that no one else (besides the A and B guys) worked on the same data at the same time, EOF will base the OL check on the row snapshots that it has in the stack. Remember, these are no longer the snapshots A started from. They are the snapshots left by B’s save, and those will be the same as what exists in the DB. So, no OL exception will occur. Congratulations, A just trashed B’s data updates, and wasn’t notified at all.
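
To make that failure mode concrete, here is a toy model in plain Java. It is not EOF code: ToyStack, LostUpdateDemo and the one-column "row" are hypothetical stand-ins, and the only thing the sketch assumes is what was described above, namely that the OL check compares the stack's shared snapshot against the database at save time.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical stand-ins, not the EOF API.
class ToyStack {
    // What is actually stored in the database.
    final Map<String, String> database = new HashMap<>();
    // Row snapshots shared by every context on this stack.
    final Map<String, String> snapshots = new HashMap<>();

    ToyStack(String key, String value) {
        database.put(key, value);
        snapshots.put(key, value);
    }

    // Save with an optimistic locking check against the SHARED snapshot.
    void save(String key, String newValue) {
        if (!snapshots.get(key).equals(database.get(key))) {
            throw new IllegalStateException("optimistic locking failure on " + key);
        }
        database.put(key, newValue);
        snapshots.put(key, newValue); // a successful save refreshes the snapshot
    }
}

class LostUpdateDemo {
    // Returns what ends up in the database after B and then A save.
    static String run() {
        ToyStack stack = new ToyStack("title", "original");
        stack.save("title", "B's edit"); // B saves while A is still locked
        stack.save("title", "A's edit"); // A saves: check passes, B's edit is gone
        return stack.database.get("title");
    }

    // Had A compared against the value it originally read, the conflict shows.
    static boolean conflictVisibleFromPrivateSnapshot() {
        ToyStack stack = new ToyStack("title", "original");
        String seenByA = stack.snapshots.get("title"); // A's own copy of the row
        stack.save("title", "B's edit");
        return !seenByA.equals(stack.database.get("title"));
    }

    public static void main(String[] args) {
        System.out.println(run());
        System.out.println(conflictVisibleFromPrivateSnapshot());
    }
}
```

In this model neither save throws, and the database ends up holding only A's value. The second method shows that the conflict would have been detectable if the check had used what A actually read, instead of the snapshot that B's save had already refreshed.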

Are you already scared? Good. So be even more scared. There are only two solutions for this. The first one is to create a lot of separate EOF stacks (one per session, assuming the sessions are being locked in the normal WO way). This sucks, because it’s too resource expensive. It will use a lot of memory (lots of repeated snapshots in memory) and it will open many connections to the DB, which may be a problem by itself. The second solution… classic Java locking (the synchronized keyword, etc). This is the time you start thinking about being a farmer, right?

Well, the news on inter-instance concurrency is better, although far from perfect. Here, there’s no data merging, and no Java synchronized stuff. Everything will be based on locking. Usually optimistic locking, but you can also use pessimistic locking (i.e., “real” database row locking). Pessimistic locking has huge problems and many experienced coders will recommend that you stay away from it. So I will not cover that, and I’ll assume you want to use optimistic locking.

The idea is simple: create context, fetch data, modify data, save. If you get a locking exception, re-fetch data, re-modify, and re-try to save it until no OL exception is thrown.
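
Stripped of anything WebObjects-specific, that loop can be sketched as follows. Everything here is a hypothetical stand-in: VersionedRow plays the database row (guarded by a version number rather than EOF snapshots), StaleRowException plays the OL exception, and RetryingUpdater is the fetch, modify, save, retry loop.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical stand-in for an optimistic locking failure.
class StaleRowException extends RuntimeException {
    StaleRowException() { super("row was changed by someone else"); }
}

// Toy "row" guarded by a version number instead of EOF snapshots.
class VersionedRow {
    private int value;
    private int version;

    synchronized int[] fetch() { return new int[] { value, version }; }

    synchronized void save(int newValue, int expectedVersion) {
        if (expectedVersion != version) throw new StaleRowException();
        value = newValue;
        version++;
    }
}

class RetryingUpdater {
    // Returns the number of attempts that were needed.
    static int update(VersionedRow row, IntUnaryOperator change, int maxTries) {
        for (int tries = 1; tries <= maxTries; tries++) {
            int[] fetched = row.fetch();                  // (re-)fetch
            int newValue = change.applyAsInt(fetched[0]); // (re-)modify
            try {
                row.save(newValue, fetched[1]);           // try to save
                return tries;
            } catch (StaleRowException e) {
                // Someone else saved first: loop and start over with fresh data.
            }
        }
        throw new IllegalStateException("Could not save after " + maxTries + " tries");
    }

    // Demo: a competing writer sneaks in during the first attempt only.
    static int[] demo() {
        VersionedRow row = new VersionedRow();
        boolean[] interfered = { false };
        int tries = update(row, v -> {
            if (!interfered[0]) {
                interfered[0] = true;
                int[] other = row.fetch();
                row.save(other[0] + 100, other[1]); // the other writer wins the race
            }
            return v + 1;
        }, 5);
        return new int[] { tries, row.fetch()[0] };
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println("tries=" + result[0] + " value=" + result[1]);
    }
}
```

The point of the demo is that the second attempt recomputes its change from the freshly fetched value, so the competing writer's update is kept instead of being overwritten.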

This is simple for simple stuff, i.e., when you have one single object causing the failure. Now imagine the following scenario (and this isn’t imaginary, it occurred to me): I am working with dozens of objects that may cause OL exceptions, and if one of them causes it, all the others will probably cause it too. Also, I need to create some objects (relationships, mostly, but also some individual objects) depending on what I’m doing. If I get an OL exception, I must go back, delete those objects, and create new ones, because whether I create a specific object or not depends on the data already present in the data store. In theory, this will not be a problem: create context, fetch data, modify data, create objects (and keep the created objects in a temporary array), save, get OL exception, delete all the created objects, re-fetch data, re-modify data, re-create objects, save. Right? Wrong. Unfortunately I’ve hit some obscure WebObjects bug, as you can see in this WODev thread. This is the time you start browsing the net looking for a farm to buy, if you don’t already have one.

After a lot of experiments, I think I have reached a reasonable way to handle this in a simple manner (remember, you will always have to think about intra-instance problems, but we are dealing with inter-instance now). Instead of fighting with WO about deleting created objects, and getting outdated data when you were supposed to get fresh data, simply don’t worry about it, and create everything from scratch every time you try to save. The code will look something like this:

synchronized (lock) { // Solving intra-instance problems,
                      // your mileage may vary on this, of course
  int tries = 0;

  // Make MAX_SAVE_ANSWERS_TRY_COUNT something reasonable, like 50
  while( tries < MAX_SAVE_ANSWERS_TRY_COUNT ) { 

    // Setup context
    EOEditingContext context = new EOEditingContext();
    context.setFetchTimestamp(System.currentTimeMillis());

    // Make sure nothing "strange" will happen, this is a simple delegate
    // that blocks merging. Probably it won't be needed but I'm paranoid.
    context.setDelegate(new NoMergingECDeletage());

    // Register the EC in my session lock manager, to handle locking and
    // garbage collection automatically. Your mileage may vary, especially
    // if using Wonder in a smarter way than I do.
    session.lockManager().registerEditingContext(context); 
                        
    // Get local copies of objects
    // Here you get all the local copies of your objects. Remember the
    // context.setFetchTimestamp(System.currentTimeMillis()); line above?
    // This will guarantee that ALL the objects you get in this context will
    // contain fresh data. You can use EOUtilities.localInstanceOfObject,
    // walking through relationships like object.otherObject().anotherOne(),
    // using fetch specifications, whatever. Everything will be fresh.

    // Let's do it.
    // Do your thing here. Create objects, modify objects, delete objects,
    // go crazy.

    // Try to save
    try {
      context.saveChanges();
      return;
    } catch (EOGeneralAdaptorException saveException) {
      ++tries;

      // isOptimisticLockingFailure is basically the method
      // with the same name in Apple docs
      if( Util.isOptimisticLockingFailure(saveException) ) {

        NSLog.out.appendln("Optimistic locking exception");
        // Note that I don't refault anything here. I don't need it.
        // On the next iteration, a new EC will be created, and all
        // the objects will be automatically refaulted.
      } else {
        // It's some other exception, handle it somewhere else
        throw saveException;
      }
    }
  }

  throw new RuntimeException("Could not save after " +
      MAX_SAVE_ANSWERS_TRY_COUNT + " tries. Help me.");
}

At the beginning, I didn’t like this approach at all. But after using it in some places, and seeing it work perfectly, I’m liking it more and more. The main advantage is, it works. No strange problems, no fighting with WO bugs, no strange refaulting behaviour. It’s a dream come true. It just works. But, of course, it has some disadvantages. All the objects are fetched from the data store on every attempt, with the consequent performance hit. So, if you are dealing with a very large number of objects, this may be impractical for you. Also, it has a more serious problem if you are lazy like me: you cannot have an object bound to the page component, and use it in the processing. The reason is that you need to bring that object “inside” the newly created EC, and localInstanceOfObject won’t copy the unsaved modifications, and will not even copy newly created objects. So you have to manually copy and reproduce all the user modifications to the local copies of objects. More work for you, more work for the CPU, more objects in memory. It’s life.
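
One way to make that manual copying less painful is to keep the user's input as plain key/value pairs and replay them on the fresh copy at every attempt. The sketch below is only an illustration of that idea: PendingEdits is a hypothetical helper, and a Map stands in for the enterprise object. In a real application the replay step would presumably go through key-value coding on the local copy, which is an assumption and not something shown in the code above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: remembers what the user typed as plain key/value pairs.
class PendingEdits {
    private final Map<String, Object> edits = new LinkedHashMap<>();

    void record(String key, Object value) { edits.put(key, value); }

    // Re-apply every recorded edit to a freshly fetched copy.
    void applyTo(Map<String, Object> freshCopy) { freshCopy.putAll(edits); }
}

class PendingEditsDemo {
    static Map<String, Object> run() {
        PendingEdits edits = new PendingEdits();
        edits.record("title", "Title typed by the user");

        // Stand-in for the object re-fetched inside the new context.
        Map<String, Object> freshCopy = new LinkedHashMap<>();
        freshCopy.put("title", "Title from the database");
        freshCopy.put("body", "Body from the database");

        edits.applyTo(freshCopy); // would run once per save attempt
        return freshCopy;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Only the keys the user touched are overwritten. Everything else keeps the value that was just fetched, which is what you want when another instance has changed the untouched attributes in the meantime.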

Written at the beautiful and peaceful town of Serpa. Not on an iPhone.