Category Archives: UNIX

Amanda: recovering a Mac OS X client

This is the third post about Amanda, an open source backup system for UNIX-based computers. The previous two posts were a general introduction to Amanda's inner workings and instructions for configuring a Mac OS X Amanda client.

In this post I’ll explain how to recover from a catastrophic failure, like when a hard drive dies. Although many of the steps are identical, this post focuses on recovering the entire file system, not a small set of files that a user accidentally deleted. To recover something like that, you can simply run amrecover on the server, recover the files you need, and transfer them to the client machine using SFTP or any other protocol.
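For the simple case, a recovery session on the server goes roughly like this (the config name, host and paths are made up – adjust them to your own setup):

amrecover MyConfig                  # run as root on the Amanda server
# inside the amrecover shell:
#   sethost client.example.com      # the machine whose backups you want
#   setdisk /Users                  # the disklist entry to browse
#   cd arroz/Documents
#   add LostFile.txt
#   extract
# then push the recovered files back to the client with sftp, scp, etc.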

Continue reading

Amanda: installing a Mac OS X client

In my previous article, I presented Amanda, its basic concepts, and how it compares to Time Machine. Now, I’ll give you an example of how to install and configure a Mac OS X machine as an Amanda client. The next post will explain how to properly recover after a catastrophic failure.

As with any UNIX tool, Amanda can be compiled, installed and configured in a lot of different ways. How you should do it depends on your needs, so don’t feel pressured to do everything in the same way I did, as it’s not “the right way”, just one way. Also, everything I describe here should work on the Leopard or Snow Leopard versions of either Mac OS X or Mac OS X Server on Intel or PowerPC Macs. I’m not sure about previous versions of the OS, but you may find more information about those in the Mac OS X installation notes page of the Amanda wiki.

Continue reading

Amanda on Mac OS X

Given that Retrospect 8 is essentially a piece of crap, I’ve been searching for an alternative I can use when Time Machine is not an option for backing up Macs. The two main points I’m focused on are reliability and speed. I want a backup system I can trust that won’t take the age of the universe to recover a file.

I’ve been using Amanda for a while now to back up all the Macs in our workgroup (10 machines) and so far I’m nothing but happy. Amanda is an open source backup system for UNIX-based operating systems, Mac OS X included (I believe it can also back up Windows clients, but I couldn’t care less).

Continue reading

Avoid escaping URLs in Apache rewrite rules

Today we started hacking some Apache rewrite rules to make some URLs a little more friendly. All of the URLs we are rewriting are entry points for our application, which in the WO world means direct actions.

All of them contain query strings. These are the arguments in a URL, like http://domain.com/something?arg=value&anotherArg=anotherValue. Well, everything was running fine until we accessed a URL with an escaped character. We had the following rule:


RewriteRule ^/optOut(.*) /cgi-bin/WebObjects/OurApp.woa/wa/optOut$1 [R]

When we tried to access the URL http://domain.com/optOut?email=me%40domain.com, we got:

http://domain.mac/cgi-bin/WebObjects/OurApp.woa/wa/optOut?email=me%2540domain.com

Note the last part of the URL. The problem is that the original % in %40 (the escape code for @) was itself being escaped, leading to a corrupt URL. The result of calling formValueForKey on this was me%40domain.com.

After googling for a while, I found out that many people have this problem, but surprisingly I could not find a decent solution. I saw hacks involving internal escape functions and weird PHP variables that obviously wouldn’t help me at all, and I was starting to get nervous.

Well, while reading the rewrite module documentation, I found the solution right there: the NE option. According to the docs:

‘noescape|NE’ (no URI escaping of output)
This flag prevents mod_rewrite from applying the usual URI escaping rules to the result of a rewrite. Ordinarily, special characters (such as ‘%’, ‘$’, ‘;’, and so on) will be escaped into their hexcode equivalents (‘%25’, ‘%24’, and ‘%3B’, respectively); this flag prevents this from happening. This allows percent symbols to appear in the output (…)

Well, this is it. I saw so many pages with people asking about this problem without getting any answers that I was not very confident in this, but I tried changing the rule to:


RewriteRule ^/optOut(.*) /cgi-bin/WebObjects/OurApp.woa/wa/optOut$1 [R,NE]

And that’s it. It’s now working as it should, producing the correct URL:

http://domain.mac/cgi-bin/WebObjects/OurApp.woa/wa/optOut?email=me%40domain.com
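A quick way to check the fix without a browser is asking curl for the redirect headers (any URL with an escaped character will do); the Location line should now contain %40, not %2540:

curl -sI 'http://domain.com/optOut?email=me%40domain.com' | grep -i '^Location'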

Apparently, it’s that simple.

Testing memory

A few days ago I wrote about badblocks for testing a hard drive’s surface. Now, the same for memory.

As I said, I bought a second-hand PowerMac G5 to replace my old G4. When I got the new machine, I ran Apple Hardware Test (AHT), using the Extended Test. AHT tested my hardware, including the 2.5 GB of RAM, taking more than two hours (and making a hell of a noise, because during tests the G5 ventilation system works in failsafe mode, which means full power). Everything seemed to be fine. Until I installed Retrospect. I use Retrospect to make all my backups at home, and despite all its quirks, it always worked fine on the G4. Since I installed it on the G5, I got strange errors (the famous “internal consistency check”) and even crashes.

After narrowing down all the possibilities (trashing preferences and existing backup sets, reinstalling Retrospect, etc.) I suspected it could be a hardware problem, because I was told that the “internal consistency check” appears when the backup set contents are corrupted. So, I thought, my hard drive is corrupting data. I duplicated one backup set of about 80 GB, and surprise – after duplicating it and running an md5 checksum (and diff), the files were different! This was NOT supposed to happen, naturally. So I tried the same thing on my boot drive – same problem. Oops… it’s not the drives. So, if it’s not the drives, and supposing (more exactly, praying) that it’s not a motherboard issue, it must be the memory.
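For reference, the check itself was nothing fancy – duplicate a big file and compare the copy with the original, something along these lines (the file names here are made up):

# duplicate a large file and compare the copy with the original
cp /Volumes/Backup/BigBackupSet /Volumes/Backup/BigBackupSet-copy
md5 /Volumes/Backup/BigBackupSet /Volumes/Backup/BigBackupSet-copy
# or byte by byte; no output means the files match
diff /Volumes/Backup/BigBackupSet /Volumes/Backup/BigBackupSet-copy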

All my colleagues on the IST System Administration team use memtest on PCs to test memory. This great distribution of memtest has a really nice touch: you can burn it to a CD and boot the PC from it. It takes less than 200K of RAM, so all the other memory will be tested. Unfortunately, it’s not possible to boot it on Macs (not even Intel Macs – I tried!). So, you must get the Mac OS X version of memtest and boot the OS in Single User mode (holding command-S during the boot sequence). The OS will take about 50 MB of RAM, which is, of course, much worse than the 200K used by the PC version, because those 50 MB simply won’t be tested. But it’s better than nothing.
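For reference, a Single User mode session goes roughly like this, assuming the usual memtest <amount> <passes> invocation (check the README that comes with your copy) and that the binary was copied to the root of the boot drive beforehand:

/sbin/fsck -fy       # optional, but a good idea before mounting read-write
/sbin/mount -uw /    # mount the root file system read-write
/memtest all 3       # test all available memory, three passes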

The official site for the Mac OS X version of memtest is here, but unfortunately, the author requires you to pay a small amount for the download. I don’t like that approach very much because I don’t really know what I’m buying. The author says that, after paying, he sends a password for the encrypted DMG you downloaded. But I cannot download without paying, because the link is nowhere to be found. So… what happens when a new version comes out? Do I have to pay again? Well, anyway, someone else is distributing memtest for OS X for free. Yes, it’s legal, because the software is under the GNU license. So, if you don’t want to pay, just click here and grab your own free copy. Happy testing!

By the way, some tests take a lot of time. Let all of them run. Don’t assume that because all the “quick” tests passed your memory is OK. Some problems may only be found by the more complex and slower tests – that’s why they are there. So, let it run. And if you have a G5, get the hell out of there, or use ear plugs. It won’t be a nice office to work in during testing, trust me.

memtest will detect lots of common problems in memory, and will probably identify more than 99% of the defective memory modules around. But never forget: it’s impossible to be entirely sure that a memory module is OK, simply because it’s not possible, in a reasonable time frame, to test all the possible combinations of data. Also, memory may pass all the tests one day and fail the next. There are many factors that may trigger a hidden problem in memory modules: temperature, electrical fluctuations, the data it contains, age, etc. If you suspect you have a bad memory module, and if you have time, run memtest for several days in a row, using the option to do many passes.

Bad blocks? badblocks!

There are not many things I miss from Mac OS 9. But there’s one that was really useful: the ability to test a hard drive’s surface. The OS 9 disk formatter (I don’t even recall its name) had a “Test Disk” option that would perform a surface scan of the selected hard drive. That was awesome for testing for bad blocks on the drives.

Unfortunately, that’s impossible to do with Mac OS X, at least with its built-in software. There are some commercial applications that do it (like TechTool Pro), but I get a little pissed off when I have to spend a lot of money on software that does a zillion things when all I want is surface scans, and especially when I could do it with the “old” OS and not with the new powerful UNIX-based one.

Well, Linux has the badblocks command that will do just that: test the disk surface for bad blocks. It’s a simple UNIX command, so I thought there must be a port to OS X (and, of course, I could try to compile it on OS X as a last resort). After some googling, I found out badblocks is part of the ext2fs tools. And, fortunately, Brian Bergstrand has already done the port to OS X, including a nice installer.

The installer installs all the ext2fs stuff, including an extension that will allow you to access ext2fs volumes on OS X. As always, this is a somewhat risky operation. Personally, I avoid as many extensions as I can, because they run too close to the kernel for me to feel comfortable. So, if possible, install it on a secondary OS (like a utility/recovery system on an external hard drive, or so).

The badblocks command will be installed as /usr/local/sbin/badblocks, and it will probably not be in your PATH, so you either have to type the entire path when using it, or edit your PATH environment variable.
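If you don't feel like typing the full path every time, a line like this (for a bash-style shell) adds the directory to your PATH for the current session:

export PATH="$PATH:/usr/local/sbin"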

Usage is simple. First, run the “mount” command so that you know the device names for the drives you want to test. You will get something like this:

arroz% mount
/dev/disk0s3 on / (local, journaled)
devfs on /dev (local)
fdesc on /dev (union)
<volfs> on /.vol (read-only)
automount -nsl [142] on /Network (automounted)
automount -fstab [168] on /automount/Servers (automounted)
automount -static [168] on /automount/static (automounted)

The internal hard drive is /dev/disk0 (note that /dev/disk0 is the entire drive, while /dev/disk0s3 is a single partition). Assuming you want to test the internal hard drive, you would type this command (as root):

badblocks -v /dev/disk0

This would start a read-only test on the entire drive. The -v is the typical verbose flag, so you can follow what’s happening. This will take a long time, depending on the hard drive you use. For a 160 GB hard drive, it took between 2 and 3 hours on a dual 2 GHz G5.

I mention this because time is an important factor when testing hard drives! You should run badblocks on a known-to-be-in-good-condition hard drive first, so that you can get a feeling for how fast (or slow) badblocks is. Later, if you test a possibly failing hard drive and badblocks progresses notably slower, it probably means the hard drive is in bad condition (even if it doesn’t have bad blocks).
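To get that baseline, you can simply prefix the run with time on a drive you trust (run it as root, as before; the device name is just an example):

time /usr/local/sbin/badblocks -v /dev/disk1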

After running the command, you may get one of two results: your disk either has bad blocks or it hasn’t! :) You will see plenty of outputs of successful surface scans elsewhere, so I’ll leave here an example of a not-so-successful one:

/usr/local/sbin arroz$ sudo ./badblocks -v /dev/disk0
Password:
Checking blocks 0 to 156290904
Checking for bad blocks (read-only test): 120761344/156290904
120762872/156290904
120762874/156290904
done
Pass completed, 3 bad blocks found.

This is the result of a test on a 160 GB hard drive with 3 bad blocks.

After getting something like this, you may try to run badblocks again, in write mode. Note that this will destroy all the information you have on the hard drive! badblocks won’t copy the information to memory and then back to disk. It simply destroys it. The point of running a write-enabled badblocks check is to force the hard drive to remap the damaged sectors. Hard drives have a reserved space to use when bad blocks are found. The bad blocks are remapped to that reserved space, until it fills up. And this only happens on a write. So, run badblocks in write mode, and then again in read-only mode. If badblocks finds no bad blocks, your hard drive is fine (for now). If badblocks still finds bad blocks, it means that there are so many damaged blocks on the disk surface that the reserved area is full. Forget it, and throw the disk away. It’s useless.
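For reference, the write-then-verify sequence could look like this – assuming /dev/disk0 really is the disk you want to wipe, so double-check the device name before pressing return:

# destructive write test: erases everything, but forces the drive to remap bad sectors
sudo /usr/local/sbin/badblocks -w -s -v /dev/disk0
# then another read-only pass; zero bad blocks means the remapping worked
sudo /usr/local/sbin/badblocks -v /dev/disk0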

Emulating a slow internet link for http

When developing a web application, you often need to test how it will react when operating over a slow internet connection, like a 56k modem, GPRS, or any other situation where bandwidth is tight. This is important not only to test how long those funky pictures take to load, but also because, as AJAX gets used more and more everywhere, you really need to guarantee that everything works acceptably on slow network links, including AJAX calls, timeouts, animations, and everything else that might depend on the server.

There are some graphical applications for Mac OS X that try to do that, but many don’t work as promised. Well, Mac OS X has the solution built in, you just have to configure it, and that is the internal firewall (ipfw). Since Tiger, ipfw can do QoS (Quality of Service) stuff, which means you can create channels (or pipes, in ipfw terminology) and control the priority, bandwidth, delay, and some other settings of those channels. Then, you simply have to pump the traffic you want to see slowed down through those channels.

I have written two simple scripts to configure my firewall automatically. They will slow the http port (80) down to a crawl. I do not handle the SSL port for https traffic, but that’s easily achieved by duplicating the script content and adjusting the parameters (there’s a sketch of it after the scripts below).

Here are the scripts. The first one, which slows everything down:


#!/bin/sh

/sbin/ipfw add 100 pipe 1 ip from any 80 to any out
/sbin/ipfw add 200 pipe 2 ip from any 80 to any in
/sbin/ipfw pipe 1 config bw 128Kbit/s queue 64Kbytes delay 250ms
/sbin/ipfw pipe 2 config bw 128Kbit/s queue 64Kbytes delay 250ms

And the one that brings your network back to normal:


#!/bin/sh

/sbin/ipfw delete 100
/sbin/ipfw delete 200

As you can see, this slows down all your http traffic to 128 Kbps and introduces a delay of 250 ms (which is roughly what you get on slow, modem-like or wireless connections). The delay can really impact the performance of your application, especially if you make a lot of requests to get the information to display on a web page. You can change the values according to your needs, and even bring the 128 Kbps down to 56 or even less if you have the guts! I still don’t know exactly what influence changing the queue size has, but the 64 KB value produces a nice result.
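For reference, handling https is just a matter of adding two more pipes for port 443 to the first script, something like this (the rule and pipe numbers are arbitrary, and I haven’t tested this variant):

/sbin/ipfw add 300 pipe 3 ip from any 443 to any out
/sbin/ipfw add 400 pipe 4 ip from any 443 to any in
/sbin/ipfw pipe 3 config bw 128Kbit/s queue 64Kbytes delay 250ms
/sbin/ipfw pipe 4 config bw 128Kbit/s queue 64Kbytes delay 250ms

Just remember to also delete rules 300 and 400 in the cleanup script.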

Don’t forget, if you are just making a static web page, for this to work you have to turn on the Apache web server (the easiest way is to turn on Web Sharing in the System Preferences “Sharing” panel) and browse your pages through it. Direct file access naturally won’t be affected by the firewall. Also, this slows down ALL your http traffic – not only the traffic that comes from your computer, but also the traffic that comes from the Internet. So, if you are wondering why your network is really slow, the explanation is simple: you forgot to run the second script after finishing your tests! :) If for some reason you cannot recover the normal speed, just reboot and your Mac will be back to normal.

To run these scripts, just copy them into two text files (using any text editor that saves plain text – RTF won’t do!), then make them executable (using the “chmod u+x” command in the shell). Also, you have to run them as root (using “sudo” or switching to root before running them).
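Assuming you saved them as slow-http.sh and normal-http.sh (the names are made up), the whole dance looks like this:

chmod u+x slow-http.sh normal-http.sh
sudo ./slow-http.sh      # throttle http traffic
# ... run your tests ...
sudo ./normal-http.sh    # back to full speed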

Backups, rsync, and --link-dest not working

I use Retrospect to back up most of the machines at GAEL. You may wonder why I use a commercial tool that still shows its OS 9 roots, instead of open source alternatives. Well, Retrospect has some cool advantages (namely the very good handling of laptops that may be abruptly disconnected from the network while a backup is in progress). Also, when I first did this setup, Amanda and other tools did not work reliably with the Mac OS X file format.

While this works for backing up all the desktop workstations and laptops of GAEL members, I have a problem with our Xserve. It runs Mac OS X Server, and Retrospect will not back up machines running the Server version of Mac OS X with the license we have. To do that, we would have to buy a much more expensive license.

No problem. A server, due to its nature, doesn’t have the “sudden disappearing” problem of the laptops, so I can use a “classic” UNIX approach – and my choice was rsync and the --link-dest option. You may read about this option in the rsync manpage, but in case you don’t know it, what it does is the following: instead of synchronizing a directory in the usual way, it will create a new directory with a new file tree. But, to save space, it won’t copy the unchanged files from the old tree to the new one. Instead, it creates hard links, so that both entries in the file system point to the same data on the hard drive (to the same inode). So, every time you update your backup, you will create a new tree, but you will only use the space required by the files that changed since the last backup, plus some more space for the filesystem structures that support the directory tree. You can use a command like this:


# rotate old dirs
rm -rf /Volumes/Storage/test/test.5
mv /Volumes/Storage/test/test.4 /Volumes/Storage/test/test.5
mv /Volumes/Storage/test/test.3 /Volumes/Storage/test/test.4
mv /Volumes/Storage/test/test.2 /Volumes/Storage/test/test.3
mv /Volumes/Storage/test/test.1 /Volumes/Storage/test/test.2
mv /Volumes/Storage/test/test.0 /Volumes/Storage/test/test.1

/usr/bin/rsync --rsync-path=/usr/bin/rsync -az -E -e ssh --exclude=/dev/\* --exclude=/private/tmp/\* --exclude=/Network/\* --exclude=/Volumes/\* --exclude=/private/var/run/\* --exclude=/afs/\* --exclude=/automount/\* --exclude=/.Spotlight-V100/\* --link-dest="/Volumes/Storage/test/test.1" "root@my.machine.com:/Users/arroz/TestDirectory" "/Volumes/Storage/test/test.0/"

Side note: the -E option (capital E) is an option present in the Mac OS X version of rsync that forces rsync to copy all the extended Mac file system attributes, including resource forks. It only exists in Mac OS X 10.4 (Tiger) or newer versions. If you are still using 10.3 (Panther) or older, use rsyncx. Do not use rsyncx with Tiger.

Until about a week ago, my backup machine (an old PowerMac G4) had an external 640 GB SCSI RAID and an internal RAID 0 (2 × 80 GB drives), besides the boot disk. All the Retrospect backups were being placed on the external RAID, and the server backups were going to the internal RAID 0. Now, I know it’s living on the edge to back up to a RAID 0. But there was really no more space, and it was a temporary situation, because the new drives for the external RAID had already been ordered.

When the new drives arrived, I stored all the backups wherever I could for a few days (640 GB was huge when we purchased the RAID, but today it’s relatively manageable), switched the drives, and created a fresh RAID 5. I formatted it as HFS+, copied back all the backups, including the server backups, and finally trashed the internal RAID 0.

Some path adjustments to my server backup scripts, and we were back in business. But the RAID’s free space was shrinking dramatically every day. I used the ‘ls -i’ command to compare the inodes of files that were supposed to be unchanged between one day’s backup and the next, and, as I suspected, rsync was duplicating all the files instead of hard-linking them.
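The check itself is trivial: if the hard link was created, both directory entries show the same inode number (the file name here is made up):

ls -i /Volumes/Storage/test/test.0/TestDirectory/SomeUnchangedFile
ls -i /Volumes/Storage/test/test.1/TestDirectory/SomeUnchangedFile
# same inode number in both lines means hard-linked; different numbers mean a full copy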

After Googling a lot, I could not find answers for this. I tried to see if ‘cp -la’ would successfully create hard links, but to my surprise, I found out that the Mac OS X built-in ‘cp’ command does not support the “l” option. Nice. Before installing the GNU ‘cp’ version (and because I’m lazy and I didn’t want to do that), I started thinking about everything I had done since the new drives arrived. The OS was the same, the rsync command was the same, it worked before, so it had to work now. The only reason it could not be working was that rsync, somehow, thought all the files had changed, even when they had not.

Suddenly, the solution popped into my head. Mac OS X has an option, associated with every HFS+ volume, called “Ignore ownership on this volume”. This is turned off by default on the boot drive, but it’s turned on by default on all the external drives you format. There’s a good reason for this: Mac OS X is a consumer product. And average users want to buy an external drive, store data on it, bring it to another Mac, and read their data. They don’t care whether their UID is the same on both machines or not.

But this causes serious problems for rsync. Although the file system still stores the owner of the files, it won’t report it to the applications that try to read it (or will mask it as the user who’s trying to access the files). Somewhere between the file system and application layers, this information is filtered. So, rsync was not getting the real UIDs of the files. As the files that came from the server had real UIDs, the UIDs wouldn’t match, and rsync would create a new copy because, from its point of view, the file had changed.

The solution was simple – just go to the machine’s console and “Get Info” on the external volume. I turned off the “Ignore ownership on this volume” setting, and rsync started operating normally again.
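If you prefer the command line (or the machine is headless), the same switch can, if I remember correctly, be flipped with vsdbutil – the volume path here is the one from my setup:

sudo vsdbutil -c /Volumes/Storage    # check whether ownership is currently honoured
sudo vsdbutil -a /Volumes/Storage    # activate (honour) ownership on the volume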