Archive for August, 2007

Recovering from optimistic locking exceptions

Friday, August 17th, 2007

WARNING: the solution presented in this article is wrong. It’s only a partial solution: it won’t always work, and you may screw up your data. To see why, check my second article on this issue.

There are many situations, in simple web applications, where you don’t really have to deal with concurrent data updates. Imagine you are writing a blog app, like WordPress. Even if two people are changing a post title at the same time (which, by its nature, is a really rare event), you, as the developer, may simply not care about it, and rely on the “last write wins” strategy.

When you are writing more complex stuff, you will have serious headaches. This is especially true when really complex data is being modified by different users, in different situations, at the same time, and when you keep pre-calculated data because grabbing all the objects and calculating the results in real time (fast enough for a page load) is simply not possible. (It’s already hard enough to understand this paragraph without re-reading it!)

When developing WebObjects applications, there are two main classes of concurrency problems to solve: intra-instance concurrent data updates, and inter-instance concurrent updates. It would appear that the second is much harder than the first… well… it’s not.

Intra-instance updates should be simplified by automatic data merging between contexts. But there are some problems. The biggest one is that it’s actually possible to totally overwrite a data modification done by another user without getting any notification at all. How? See this thread on the WebObjects Dev mailing list. The root of the problem is that, while a context is locked, data merging simply does not occur.

So, imagine you lock context A and start modifying data. At the same time, someone locks context B on another thread, changes the same data you are modifying in context A, and saves it. The data B changed is merged into all the other contexts where the same objects are present, as long as they are unlocked. But remember, your context A is locked, so no merging will occur for now. Now, imagine you continue working on context A, and finally save it and unlock it. The merging will occur now, too late, because you have already saved, and you probably won’t work in that context any more.

By now, you say “Naaah, that won’t happen, because WebObjects will trigger an optimistic locking exception”. No, it won’t. Why? Remember that context B saved while context A was locked, right? So, when B saves, the row snapshots in the EOF stack are updated. When you save context A, assuming that no one else (besides the A and B guys) worked on the same data at the same time, EOF will base the OL check on the row snapshots it has in the stack. Remember, these are no longer the snapshots that context A’s changes were based on. They are the snapshots produced by B’s save, and they match what exists in the DB. So, no OL exception will occur. Congratulations: A just trashed B’s data updates, and wasn’t notified at all.
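To see why B’s update disappears silently, here is a toy model in Java. It is not the EOF API: ToyEof, LostUpdateDemo and their methods are hypothetical, and the optimistic-lock check is reduced to “the row must still hold the value I expect”. The two runs differ only in which value A’s check uses. In the first, it uses the shared snapshot that B’s save has already refreshed. In the second, it uses the value A actually read before it started editing.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only (hypothetical names, not the EOF API).
class ToyEof {
    final Map<String, String> database = new HashMap<>();
    // Row snapshots shared by every editing context of one EOF stack.
    final Map<String, String> sharedSnapshots = new HashMap<>();

    // Optimistic-lock check: the UPDATE only succeeds if the row still holds
    // the value we believe is there (think "WHERE title = expected").
    boolean update(String key, String expected, String newValue) {
        if (!expected.equals(database.get(key))) {
            return false; // optimistic locking failure
        }
        database.put(key, newValue);
        sharedSnapshots.put(key, newValue); // snapshots follow every save
        return true;
    }
}

class LostUpdateDemo {
    // Returns whether A's save passed the check, plus the final database value.
    static String scenario(boolean checkAgainstSharedSnapshot) {
        ToyEof eof = new ToyEof();
        eof.database.put("title", "original");
        eof.sharedSnapshots.put("title", "original");

        // Context A is locked and starts editing, based on what it read.
        String readByA = eof.sharedSnapshots.get("title");

        // Meanwhile B saves; the shared snapshot is refreshed to B's value.
        eof.update("title", eof.sharedSnapshots.get("title"), "title from B");

        // A finally saves. EOF-like behaviour: check against the shared snapshot.
        String expected = checkAgainstSharedSnapshot
                ? eof.sharedSnapshots.get("title")
                : readByA;
        boolean aSaved = eof.update("title", expected, "title from A");
        return aSaved + " " + eof.database.get("title");
    }

    public static void main(String[] args) {
        System.out.println(scenario(true));  // true title from A  (B's change lost silently)
        System.out.println(scenario(false)); // false title from B (conflict detected)
    }
}
```

With the shared snapshot, A’s save passes and B’s title is gone without any error. When the check uses the value A actually read, the conflict is caught.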

Are you already scared? Good. Be even more scared. There are only two solutions for this. The first is to create lots of separate EOF stacks (one per session, assuming the sessions are locked in the normal WO way). This sucks, because it’s too resource-expensive: it will use a lot of memory (lots of repeated snapshots) and it will open many connections to the DB, which may be a problem by itself. The second solution… classic Java locking (the synchronized keyword, etc.). This is the time you start thinking about becoming a farmer, right?
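A minimal sketch of the second option. It assumes every session in the instance goes through one shared lock object. SAVE_LOCK, storedCounter and fetchModifySave are hypothetical stand-ins for your own lock, your data and your fetch/modify/save code. Because the whole fetch-modify-save sequence runs inside synchronized, two threads can never interleave inside it, so no update is lost. The price is that all those saves in the instance are serialized.

```java
class SerializedSaves {
    // One lock object shared by every session in this application instance.
    private static final Object SAVE_LOCK = new Object();
    // Stands in for a database column that several sessions update.
    private static int storedCounter = 0;

    // The whole "fetch, modify, save" sequence runs while holding the lock.
    static void fetchModifySave() {
        synchronized (SAVE_LOCK) {
            int fetched = storedCounter;  // "fetch"
            int modified = fetched + 1;   // "modify"
            storedCounter = modified;     // "save"
        }
    }

    // Two threads (think: two sessions) each perform updatesEach saves.
    static int runTwoSessions(int updatesEach) throws InterruptedException {
        synchronized (SAVE_LOCK) {
            storedCounter = 0;
        }
        Runnable session = () -> {
            for (int i = 0; i < updatesEach; i++) {
                fetchModifySave();
            }
        };
        Thread a = new Thread(session);
        Thread b = new Thread(session);
        a.start();
        b.start();
        a.join();
        b.join();
        synchronized (SAVE_LOCK) {
            return storedCounter;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTwoSessions(10_000)); // 20000: no update was lost
    }
}
```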

Well, the news on inter-instance concurrency is better, although far from perfect. Here, there’s no data merging, and no Java synchronized stuff. Everything is based on locking: usually optimistic locking, but you can also use pessimistic locking (i.e., “real” database row locking). Pessimistic locking has huge problems, and many experienced coders will recommend that you stay away from it. So I will not cover it, and I’ll assume you want to use optimistic locking.

The idea is simple: create a context, fetch data, modify data, save. If you get an optimistic locking (OL) exception, re-fetch the data, re-modify it, and retry the save until no OL exception is thrown.
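A sketch of that loop, assuming a row protected by a version number. VersionedRow and OptimisticRetry are hypothetical. compareAndSet on an AtomicReference plays the role of an “UPDATE … WHERE version = ?” statement. The beforeFirstSave hook exists only to simulate another instance saving between our fetch and our save.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical versioned row: a save succeeds only if nobody replaced it meanwhile.
final class VersionedRow {
    final int version;
    final String value;

    VersionedRow(int version, String value) {
        this.version = version;
        this.value = value;
    }
}

class OptimisticRetry {
    // Stands in for the database row.
    final AtomicReference<VersionedRow> row =
            new AtomicReference<>(new VersionedRow(0, "draft"));

    // Fetch, modify, save; on a locking failure, re-fetch and try again.
    int saveWithRetry(UnaryOperator<String> modify, int maxTries, Runnable beforeFirstSave) {
        for (int attempt = 1; attempt <= maxTries; attempt++) {
            VersionedRow fetched = row.get();               // fetch fresh data
            String modified = modify.apply(fetched.value);  // modify it
            if (attempt == 1) {
                beforeFirstSave.run();                      // simulated concurrent writer
            }
            VersionedRow updated = new VersionedRow(fetched.version + 1, modified);
            if (row.compareAndSet(fetched, updated)) {      // save if still unchanged
                return attempt;
            }
            // Optimistic locking failure: loop again, re-fetching and re-modifying.
        }
        throw new RuntimeException("Could not save after " + maxTries + " tries");
    }

    public static void main(String[] args) {
        OptimisticRetry store = new OptimisticRetry();
        int attempts = store.saveWithRetry(v -> v + " (reviewed)", 50,
                () -> store.row.set(new VersionedRow(store.row.get().version + 1, "edited elsewhere")));
        System.out.println(attempts + " " + store.row.get().value); // 2 edited elsewhere (reviewed)
    }
}
```

The first attempt fails because the simulated writer replaced the row; the second attempt works on the fresh value, so the other writer’s change is kept instead of overwritten.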

This is easy for simple stuff, i.e., when a single object is causing the failure. Now imagine the following scenario (and this isn’t imaginary, it happened to me): you are working with dozens of objects that may cause OL exceptions, and if one of them causes one, all the others will probably cause them too. Also, I need to create some objects (relationships, mostly, but also some individual objects) depending on what I’m doing. If I get an OL exception, I must go back, delete those objects, and create new ones, because whether or not I create a specific object depends on the data already present in the data store. In theory, this is not a problem: create context, fetch data, modify data, create objects (keeping the created objects in a temporary array), save, get OL exception, delete all the created objects, re-fetch data, re-modify data, re-create objects, save. Right? Wrong. Unfortunately I’ve hit some obscure WebObjects bug, as you can see in this WODev thread. This is the time you start browsing the net looking for a farm to buy, if you don’t already have one.

After a lot of experimenting, I think I have found a reasonable and simple way to handle this (remember, you will always have to think about the intra-instance problems, but we are dealing with inter-instance ones now). Instead of fighting with WO over deleting created objects, and getting outdated data when you were supposed to get fresh data, simply don’t worry about it, and create everything from scratch every time you try to save. The code will look something like this:

// At the top of the source file:
import com.webobjects.eoaccess.EOGeneralAdaptorException;
import com.webobjects.eocontrol.EOEditingContext;
import com.webobjects.foundation.NSLog;

// Inside the saving method. "lock", "session", Util and NoMergingECDelegate
// are my own objects; adapt them to your application.
synchronized (lock) { // Solving intra-instance problems,
                      // your mileage may vary on this, of course
  int tries = 0;

  // Make MAX_SAVE_ANSWERS_TRY_COUNT something reasonable, like 50
  while (tries < MAX_SAVE_ANSWERS_TRY_COUNT) {

    // Set up a brand-new context for this attempt
    EOEditingContext context = new EOEditingContext();
    context.setFetchTimestamp(System.currentTimeMillis());

    // Make sure nothing "strange" will happen: this is a simple delegate
    // that blocks merging. Probably it won't be needed, but I'm paranoid.
    context.setDelegate(new NoMergingECDelegate());

    // Register the EC in my session lock manager, to handle locking and
    // garbage collection automatically. Your mileage may vary, especially
    // if you use Wonder in a smarter way than I do.
    session.lockManager().registerEditingContext(context);

    // Get local copies of objects.
    // Here you get all the local copies of your objects. Remember the
    // context.setFetchTimestamp(System.currentTimeMillis()) line above?
    // It guarantees that ALL the objects you get in this context will
    // contain fresh data. You can use EOUtilities.localInstanceOfObject,
    // walk through relationships like object.otherObject().anotherOne(),
    // use fetch specifications, whatever. Everything will be fresh.

    // Let's do it.
    // Do your thing here: create objects, modify objects, delete objects,
    // go crazy.

    // Try to save
    try {
      context.saveChanges();
      return;
    } catch (EOGeneralAdaptorException saveException) {
      ++tries;

      // isOptimisticLockingFailure is basically the method
      // with the same name in the Apple docs
      if (Util.isOptimisticLockingFailure(saveException)) {
        NSLog.out.appendln("Optimistic locking exception");
        // Note that I don't refault anything here. I don't need to.
        // On the next iteration, a new EC will be created, and all
        // the objects will be automatically refaulted.
      } else {
        // It's some other exception, handle it somewhere else
        throw saveException;
      }
    }
  }

  throw new RuntimeException("Could not save after " +
      MAX_SAVE_ANSWERS_TRY_COUNT + " tries. Help me.");
}

At the beginning, I didn’t like this approach at all. But after using it in some places, and seeing it work perfectly, I’m liking it more and more. The main advantage is: it works. No strange problems, no fighting with WO bugs, no strange refaulting behaviour. It’s a dream come true. It just works. But, of course, it has some disadvantages. All the objects are fetched from the data store again on every attempt, with the consequent performance hit. So, if you are dealing with many, many objects, this may be unworkable for you. It also has a more serious problem if you are lazy like me: you cannot take an object bound to the page component and use it directly in the processing. The reason is that you need to bring that object “inside” the newly created EC, and localInstanceOfObject won’t copy unsaved modifications, and won’t even copy newly created objects. So you have to manually copy and reproduce all the user’s modifications on the local copies of the objects. More work for you, more work for the CPU, more objects in memory. It’s life.
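One way to do that copying, sketched in plain Java rather than with real enterprise objects: capture what the user changed as plain values when the form is submitted, then replay those values on the fresh copy fetched in each attempt. Post, PostEdit and ReplayOnRetry are hypothetical. In WebObjects, fetchFresh would correspond to fetching or calling EOUtilities.localInstanceOfObject in the newly created EC.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plain-Java stand-in for an enterprise object.
class Post {
    final int id;
    String title;
    int commentCount; // may be changed concurrently by other users

    Post(int id, String title, int commentCount) {
        this.id = id;
        this.title = title;
        this.commentCount = commentCount;
    }
}

// The user's modification, captured as plain values when the form is submitted,
// instead of relying on an object bound to the page component.
final class PostEdit {
    final int postId;
    final String newTitle;

    PostEdit(int postId, String newTitle) {
        this.postId = postId;
        this.newTitle = newTitle;
    }

    // Reproduce the user's change on a freshly fetched local copy.
    void applyTo(Post freshCopy) {
        if (freshCopy.id != postId) {
            throw new IllegalArgumentException("Edit belongs to another post");
        }
        freshCopy.title = newTitle;
    }
}

class ReplayOnRetry {
    // Stands in for "new EC + fetch": every attempt gets its own copy of the row.
    static Post fetchFresh(Map<Integer, Post> database, int id) {
        Post row = database.get(id);
        return new Post(row.id, row.title, row.commentCount);
    }

    public static void main(String[] args) {
        Map<Integer, Post> database = new HashMap<>();
        database.put(1, new Post(1, "Old title", 3));

        PostEdit edit = new PostEdit(1, "New title"); // captured from the page
        database.get(1).commentCount = 4;             // another user saved meanwhile

        Post fresh = fetchFresh(database, 1);          // next attempt: fresh data...
        edit.applyTo(fresh);                           // ...plus the user's change
        System.out.println(fresh.title + ", " + fresh.commentCount + " comments"); // New title, 4 comments
    }
}
```

Because the edit lives outside any context, it survives however many retries are needed, and each attempt combines it with whatever other users have saved in the meantime.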

Written in the beautiful and peaceful town of Serpa. Not on an iPhone.

Testing memory

Sunday, August 12th, 2007

I wrote some days ago about badblocks for testing a hard drive surface. Now, the same for memory.

As I said, I bought a second-hand PowerMac G5 to replace my old G4. When I got the new machine, I ran Apple Hardware Test (AHT) using the Extended Test. AHT tested my hardware, including the 2.5 GB of RAM, taking more than two hours (and making a hell of a noise, because during the tests the G5 cooling system runs in failsafe mode, which means full power). Everything seemed to be fine. Until I installed Retrospect. I use Retrospect for all my backups at home, and despite all its quirks, it always worked fine on the G4. Since I installed it on the G5, I got strange errors (the famous “internal consistency check”) and even crashes.

After ruling out all the other possibilities (trashing preferences and existing backup sets, reinstalling Retrospect, etc.), I suspected it could be a hardware problem, because I was told that the “internal consistency check” error appears when the backup set contents are corrupted. So, I thought, my hard drive is corrupting data. I duplicated a backup set of about 80 GB and, surprise: after duplicating it and running an md5 checksum (and diff) on both copies, the files were different! This was NOT supposed to happen, naturally. So I tried the same thing on my boot drive: same problem. Oops… it’s not the drives. So, if it’s not the drives, and assuming (more exactly, praying) that it was not a motherboard issue, it must be the memory.
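The duplicate-and-checksum check can be sketched in Java with MessageDigest. The author used the md5 command and diff, so this is a swapped-in equivalent, and CopyCheck is a hypothetical name. The file is streamed through the digest, so an 80 GB backup set is never loaded into memory at once. On healthy hardware, the original and the copy must always produce the same checksum.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class CopyCheck {
    // Streams the file through MD5 in 1 MB chunks and returns the hex digest.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[1 << 20];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                md5.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Duplicates the file, then compares the checksums of original and copy.
    static boolean copyMatches(Path original, Path copy)
            throws IOException, NoSuchAlgorithmException {
        Files.copy(original, copy, StandardCopyOption.REPLACE_EXISTING);
        return md5Hex(original).equals(md5Hex(copy));
    }

    // Usage: java CopyCheck <original file> <path for the copy>
    public static void main(String[] args) throws Exception {
        Path original = Paths.get(args[0]);
        Path copy = Paths.get(args[1]);
        System.out.println(copyMatches(original, copy)
                ? "identical"
                : "DIFFERENT: suspect memory, disk or controller");
    }
}
```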

All my colleagues on the IST System Administration team use memtest on PCs to test memory. This great distribution of memtest has a really nice touch: you can burn it on a CD and boot the PC from it. It takes less than 200K of RAM, so all the rest of the memory gets tested. Unfortunately, it can’t be booted on Macs (not even Intel Macs, I tried it!). So, you must get the Mac OS X version of memtest, and boot the OS in Single User mode (holding Command-S during the boot sequence). The OS will take about 50 MB of RAM, which is, of course, much worse than the 200K used by the PC version, because those 50 MB will simply not be tested. But it’s better than nothing.

The official site for the Mac OS X version of memtest is here, but unfortunately, the author requires you to pay a small amount for the download. I don’t like the approach very much because I don’t really know what I’m buying. The author says that, after paying, he sends a password for the encrypted DMG you downloaded. But I cannot download without paying, because the download link is nowhere to be found. So… what happens when a new version comes out? Do I have to pay again? Well, anyway, someone else is distributing memtest for OS X for free. Yes, it’s legal, because the software is under the GNU license. So, if you don’t want to pay, just click here and grab your own free copy. Happy testing!

By the way, some tests take a lot of time. Let all of them run. Don’t assume that, because all the “quick” tests passed, your memory is OK. Some problems may only be found by the more complex and slower tests; that’s why they are there. So, let it run. And if you have a G5, get the hell out of there, or use ear-plugs. It won’t be a nice office to work in during testing, trust me.

memtest will detect lots of common memory problems, and will probably identify more than 99% of the defective memory modules around. But never forget: it’s impossible to be entirely sure that a memory module is OK, simply because it’s not possible, in a reasonable time frame, to test all the possible combinations of data. Also, memory may pass all the tests one day, and fail the next. There are many factors that may trigger a hidden problem in memory modules: temperature, electrical fluctuations, the data it contains, age, etc. If you suspect you have a bad memory module, and if you have time, run memtest for several days in a row, using the option to do many passes.
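To put a number on “all the possible combinations”, assume the 2.5 GB in this machine means 2.5 × 2^30 bytes. Every bit can be 0 or 1, so the number of distinct contents the memory can hold is 2 raised to the number of bits:

```latex
N_{\text{bits}} = 2.5 \times 2^{30}\ \text{bytes} \times 8\ \tfrac{\text{bits}}{\text{byte}}
               = 21\,474\,836\,480 \approx 2.15 \times 10^{10}

N_{\text{contents}} = 2^{N_{\text{bits}}} = 2^{21\,474\,836\,480}
                    \approx 10^{6.46 \times 10^{9}}
```

That count alone has about 6.5 billion decimal digits, which is why memory testers rely on carefully chosen patterns instead of exhaustive testing.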

Mac-compatible ethernet card

Friday, August 3rd, 2007

For those of you who need an ethernet card that works with Mac OS X, this may be a useful tip: Mac OS X has a built-in driver for the RealTek RTL8139 chip. I looked around and found this Netgear card, based on that chip. It’s not Gigabit, but I wanted it to connect my “new” PowerMac G5 to the ADSL modem, so 100 Mbps is fine. It’s not PCI-X, but it complies with the PCI 2.x specification, which means it will work in the G5’s PCI-X slots (although the entire PCI bus will then run at PCI speed, not PCI-X). It’s cheap, it works, and I can have the G5 doing all the NAT routing stuff, as I like.