Promise UltraTrak woes

So, we have in our department a Promise UltraTrak SX8000 RAID system that we use for backups. This mother can drive 8 PATA hard drives in several RAID modes, and expose them through a SCSI interface. We currently have a RAID 5 array using four 500 GB Hitachi hard drives, plus a fifth drive, similar to the other four, to be used as a hot spare if one of the other drives commits suicide during the night. This is plugged into a PowerMac G4 that has the boring task of carrying out all our backups while we are all sleeping like babies.

This is a pretty old product, and it was never tested with drives this big, but the fact is that it had been working flawlessly for months (since we upgraded the original 120 GB drives). Until last week. As I had some empty bays and unused small drives, I thought about creating a second array to store some archive stuff, as the RAID 5 array is getting pretty full.

I remembered that the case controller reboots itself when the user creates a new array, so I opted for turning off the Mac and the Promise itself, and installing the disks with everything shut down. You know, this thing has hot swap, but it’s getting old, and we don’t want to push it too far. So let’s play it safe. Big mistake. Big big mistake.

I pop in the drives, press the power switch. The controller boots itself up and, as Steve Jobs would say, boom! The first two drives had the red LED of death glowing. I spent a few quite dramatic minutes looking at the damn thing, and thinking that it had happened. The least likely, the most feared of all things that could happen on a RAID 5 system had just happened. Two drives failed. At the same time. All the backups, some of them more than one year old, lost. Forever.

I got rational again and thought, no, this can’t happen, this thing did not boot correctly, there is something about the drives I’ve just inserted that is screwing this up. I powered down the case, removed the drives I had just inserted, and powered on again, this time carefully watching all the lights and bells. The Promise RAID, when booting, scans all the bays to see what’s going on there. You can see that happening by watching the drive lights: each one blinks quickly, one bay after another, in a slow sequence. Well, the first two weren’t blinking.

I thought, hmm, bigger drives than expected, too much time to spin up, the controller is testing them too soon. I powered off the damn thing, removed the first two drives, and reinserted them in lower bays. Another big mistake. The array disappeared. It was lost forever. Looks like the drive position is crucial for the arrays to be recognized. I had just killed what was left of it.

After some moments of desolation, I went to recreate the array, assuming that the backups were lost and that I had to start from the ground up. I started the process of creating the array, and then I saw the light. On the little LCD display, the controller had the best of words I could see on its first line: INITIALIZE. It allowed me to choose Yes or No. I stopped for a while, and thought, if the hard drives are OK, and if I can create an array without any kind of initialization… all my data will be there! Right?

Power off, insert the drives in their original positions, power on, create array, RAID 5, default block size, initialization OFF, gigabyte boundary on, and GO. The array was created. No activity on the drives whatsoever. Perfect. Reboot. I fired up the G4 and ran to the KVM console. The OS took ages to boot (actually it took as long as every other time, but the adrenaline was all around). The desktop appeared and… YES! There was the RAID volume, as if nothing had ever happened. I did some quick tests, but that was it. The RAID was back in all its glory. Months of backups, saved.

Knowing this, I decided to push my luck a little further and turn the RAID off again. After powering up, history repeated itself: the first two drives were “dead”. I simply destroyed the array, created a new one without initialization, and I was back in business. Then I turned it off and quickly on again, not allowing the drives to spin down to a full stop. That time, the controller booted up correctly and the array was online.

I went to the Promise site to check on this issue, and saw they had released a new firmware that claimed to support some newer drives. I installed it, and it was a terrible experience. It started with having to download an older firmware to get the updater software, as Promise forgot to pack the software together with the new firmware in the ZIP archive. It ended with an old PC with a serial cable plugged into the RAID, two floppy disks (yes, two floppy disks, and yes, we are in late 2008), one with DOS and another with the software, and about 30 tries (power cycles on the Promise RAID and software reloads on the PC) to get the serial communication working. After that, I powered off the RAID, waited for a complete spin down, powered up again, and everything worked fine. Although I don’t fully trust it, it looks like the problem might be solved.

So, lessons to learn: if this happens to you, 1) Do not panic (yet); 2) Do not change the order of the drives; 3) Use the LCD display and the buttons to obtain all the settings of the array (block size, gigabyte boundary status, etc); 4) Delete the array; 5) Create a new array with the same settings and initialization off. You should be off the hook by now, unless the problem WAS in fact two drives dying at the same time. Which, you know, doesn’t happen. It just can’t. Really.
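Steps 2 to 5 boil down to "recreate exactly what was there, and never initialize". Here is a minimal sketch of that check in Python. The UltraTrak is driven from its LCD and buttons, so nothing here talks to the controller: `ArraySettings`, `recreate_problems`, the bay numbers and the 64 KB block size are all made-up names and placeholder values, just a way to compare your notes against what you are about to enter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArraySettings:
    """Settings read off the LCD before deleting the array (illustrative fields)."""
    raid_level: int
    drive_bays: tuple        # bay numbers, in the order the drives sit in the case
    block_size_kb: int
    gigabyte_boundary: bool


def recreate_problems(recorded, planned, initialize):
    """Return the reasons a planned re-creation differs from the recorded array.

    An empty list means the plan matches the notes taken in step 3.
    """
    problems = []
    if initialize:
        problems.append("initialization must be OFF")
    if planned.drive_bays != recorded.drive_bays:
        problems.append("drives are not in their original bays")
    if planned.raid_level != recorded.raid_level:
        problems.append("RAID level differs")
    if planned.block_size_kb != recorded.block_size_kb:
        problems.append("block size differs")
    if planned.gigabyte_boundary != recorded.gigabyte_boundary:
        problems.append("gigabyte boundary setting differs")
    return problems


# Placeholder values: the post only says "default block size", so 64 is invented,
# and the bay numbers are examples rather than the real layout.
recorded = ArraySettings(raid_level=5, drive_bays=(1, 2, 3, 4),
                         block_size_kb=64, gigabyte_boundary=True)
moved = ArraySettings(raid_level=5, drive_bays=(3, 4, 5, 6),
                      block_size_kb=64, gigabyte_boundary=True)

print(recreate_problems(recorded, recorded, initialize=False))
print(recreate_problems(recorded, moved, initialize=True))
```

The first call describes the recovery that worked for me (same bays, same settings, initialization off); the second combines the two mistakes to avoid.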

6 Responses to “Promise UltraTrak woes”

  1. lastmile says:

    Hey, I’m thinking about buying one of these. What’s the total size of the array(s) you’ve got now? The manual says there’s an upper limit of 2.199 TB on total size of all arrays in the unit. I’m wondering if a firmware upgrade might raise this – no reply from Promise yet.

  2. Miguel Arroz says:

    Currently I have a RAID 5 array made up of four 500 GB hard drives (plus a hot spare), so it’s 1.5 TB of usable space. I haven’t hit any maximum volume size barrier so far, but I haven’t reached 2.199 TB either. Try to get a reply from Promise before buying, it’s safer.

  3. lastmile says:

    Thanks. I found a troubleshooting PDF online somewhere that said the 2.199 TB limit was per array. Promise did reply to me, but if I hadn’t found that PDF their answer wouldn’t have helped much; it was not too clear.

    I’d thought about doing two 4x 750 GB RAID 5s. That’d be 2.25 TB for each; either they’d be cut down to 2.199 TB before formatting or they’d be below the limit after formatting anyway. But 750 GB ATA drives are now ridiculously priced. Add adapters to SATA disks and you’re at the price of a 1 TB SATA. Use 1 TB SATA plus adapters and you can get 3x 1 TB RAID 5 (2 TB), 3x 1 TB RAID 5 (2 TB), and 2x 1 TB RAID 1 (1 TB), but that’s less space for more money.
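The capacity figures in this thread can be checked with a short Python sketch. It rests on an assumption that neither the manual quote nor Promise confirms here: that the 2.199 TB figure is the usual 32-bit sector address limit with 512-byte sectors (2^32 x 512 bytes = 2,199,023,255,552 bytes). The function names are made up for the example, and drive sizes are taken as decimal gigabytes.

```python
SECTOR_BYTES = 512
MAX_SECTORS = 2 ** 32                       # assumed 32-bit sector addressing
LIMIT_BYTES = SECTOR_BYTES * MAX_SECTORS    # 2,199,023,255,552 bytes
GB = 10 ** 9                                # drive makers use decimal gigabytes


def raid5_usable_bytes(drive_count, drive_bytes):
    """RAID 5 keeps one drive's worth of parity, so usable space is (n - 1) drives."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_bytes


def fits_under_limit(usable_bytes):
    """True if the array stays at or below the assumed per-array limit."""
    return usable_bytes <= LIMIT_BYTES


for label, count, size_gb in [("4x 500 GB", 4, 500),
                              ("4x 750 GB", 4, 750),
                              ("3x 1 TB", 3, 1000)]:
    usable = raid5_usable_bytes(count, size_gb * GB)
    print(label, usable / 10 ** 12, "TB usable, fits:", fits_under_limit(usable))
```

Under that assumption the 4x 500 GB array (1.5 TB) and the 3x 1 TB array (2 TB) fit, while 4x 750 GB (2.25 TB) is over the limit before formatting.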

    Now I’m considering buying one of these or some similar external array (old ATA->SCSI boxes are more common with HW RAID controllers than without, and cheaper than newer SATA->SCSI), putting it in JBOD mode, and trying to use software RAID. I’m not concerned with speed. I just want something on the order of 8x 1 TB RAID 5 (7 TB) or w/ hot spare (6 TB) that doesn’t require me to build an entirely new PC with more drive bays.

    Thanks again.

  4. Miguel Arroz says:

    Keep in mind the space the SATA->ATA converters will take in each bay. I don’t know if there’s enough space for them; after installing a drive in a bay there’s not much room left.

    • lastmile says:

      Yeah, it will be a tight fit, but one of the adapters that fits vertically against the back of the drive might work. I haven’t tried to hunt down the info on the actual size of the drive sleds yet.
