Archive for the ‘Mac OS X Server’ Category

Amanda on Mac OS X

Thursday, July 22nd, 2010

Given that Retrospect 8 is essentially a piece of crap, I’ve been searching for an alternative I can use when Time Machine is not an option for backing up Macs. The two main points I’m focused on are reliability and speed. I want a backup system I can trust that won’t take the age of the universe to recover a file.

I’ve been using Amanda for a while now to back up all the Macs in our workgroup (10 machines) and so far I’m nothing but happy. Amanda is an open source backup system for UNIX-based operating systems, Mac OS X included (I believe it can also back up Windows clients, but I couldn’t care less).

Amanda was originally designed to back up to tapes. Today, since hard drives have become cheap and are a great medium for backup, Amanda also supports virtual tapes. A virtual tape is a directory on disk that essentially acts as a tape, storing raw information. While configuring an Amanda server, you create a set of tapes of an arbitrary size (100 GB, for instance) and Amanda will use at least one tape per day, eventually rotating through all of them.

I’ll now lay down some considerations about how Amanda works and how that reflects on usage and backup planning. I’ll also make some comparisons with Time Machine.

Scheduling

In Amanda, you define one or more configuration files. Each configuration can have a set of hosts (a host is a backed-up machine), and for each host a set of disks (a disk is any directory you want to back up; Amanda disks don’t need to match physical disks or logical HFS+ volumes). You generally run a backup operation that will back up all the disks of all the hosts in that configuration file.

Launching a backup operation is done by simply executing a UNIX command from whatever mechanism you prefer (cron, launchd, manually, etc). This means you can define how often your backup runs, and at what time. Usually, a daily backup is performed, but you may want to back up every 6 hours or only once per week, depending on how often your data changes and how bad it is to lose a few hours of work. It also means that, in Amanda, the server is proactive and initiates the backup procedure whenever you set it to. In Time Machine, backup operations are initiated by the clients every hour. The server is simply a dumb file server where backups are stored.

Storage

Amanda uses the tar format to store data on a virtual tape. For each backed-up disk, Amanda creates a tar file with the data inside the tape. Amanda stores data in an incremental fashion, using some redundancy as well. This is implemented through the concept of backup levels. The base backup (where all the files on a disk are backed up) is level 0. From there, backup levels are incremented, and each level essentially contains the incremental changes relative to an archive in the immediately lower level. This means a level 3 backup contains the changes relative to a level 2 backup. That level 2 backup contains the changes relative to a level 1 backup which contains, as expected, the changes relative to the level 0 backup.
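To make the level chain concrete, here is a minimal Python sketch of which archives are needed to restore a given run. It is an illustration of the rule described above, not Amanda's own code: the names `restore_chain` and `HISTORY` are made up for this example, and the dates and levels are copied from the bergman listing further down.

```python
# (date, level) pairs, oldest first, taken from the bergman listing in this post.
HISTORY = [
    ("2010-07-12", 0), ("2010-07-13", 1), ("2010-07-14", 1),
    ("2010-07-15", 2), ("2010-07-16", 2), ("2010-07-17", 3),
    ("2010-07-18", 3), ("2010-07-19", 3), ("2010-07-20", 4),
]


def restore_chain(history, target):
    """Return the backups needed to restore history[target], oldest first.

    Walks backwards from the target run and, at each step, picks the most
    recent earlier backup with a lower level, until a level 0 is reached.
    """
    date, level = history[target]
    chain = [(date, level)]
    for earlier_date, earlier_level in reversed(history[:target]):
        if level == 0:
            break
        if earlier_level < level:
            chain.append((earlier_date, earlier_level))
            level = earlier_level
    if level != 0:
        raise ValueError("no level 0 backup available for this run")
    return list(reversed(chain))


if __name__ == "__main__":
    for date, level in restore_chain(HISTORY, len(HISTORY) - 1):
        print(date, "level", level)
```

Note that when several consecutive runs share the same level, only the most recent one before the target ends up in the chain, since each of them is relative to the same lower-level archive.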

Amanda decides on the level it should operate on based on a complex planner that weighs several factors. You may add some configuration options to customize the weight of some of those factors in the final planner decision. Essentially, Amanda tries to obtain the right balance between reliability (which, in this context, means the probability of recovering a backup successfully) and used disk space. Reliability decreases as the backup level increases, because if you want to recover a level 5 backup, you need consistent level 0, 1, 2, 3, 4 and obviously 5 backups, since each of those builds upon the previous one. A level 0 backup is more reliable in the sense that you only need the level 0 backup itself to recover. Of course, a level 0 backup is an “everything” backup. If you do a level 0 backup every day, you’ll use a lot of disk space.
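A quick back-of-the-envelope way to see why reliability drops with the level: if you assume (and this is purely an assumption for illustration) that each archive can be read back with some independent probability, a level N restore needs N + 1 archives to all be readable. The function name and the 0.99 figure below are hypothetical.

```python
def chain_reliability(level, p_single):
    """Probability that all level + 1 archives in a restore chain are readable,
    assuming each archive is independently readable with probability p_single."""
    return p_single ** (level + 1)


if __name__ == "__main__":
    # 0.99 is an arbitrary example value, not a measured one.
    for level in range(6):
        print("level", level, "->", round(chain_reliability(level, 0.99), 4))
```

Real failures are rarely independent (a dead disk takes every virtual tape on it), so treat this only as intuition for the trade-off the planner is making.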

This is an example of a production Amanda installation for a host called bergman and volume / (the root):


date                host    disk lv tape or file file part status
2010-07-01 06:05:24 bergman /     2 DAILYS-6       15  1/1 OK
2010-07-02 04:19:11 bergman /     3 DAILYS-7       23  1/1 OK
2010-07-04 04:47:44 bergman /     3 DAILYS-9        9  1/1 OK
2010-07-05 02:51:01 bergman /     0 DAILYS-10      21  1/1 OK
2010-07-06 02:51:34 bergman /     1 DAILYS-11      11  1/1 OK
2010-07-07 04:59:40 bergman /     1 DAILYS-12      12  1/1 OK
2010-07-08 02:59:32 bergman /     1 DAILYS-13      15  1/1 PARTIAL
2010-07-10 02:46:41 bergman /     1 DAILYS-15      20  1/1 OK
2010-07-11 05:25:49 bergman /     2 DAILYS-16       8  1/1 OK
2010-07-12 03:43:04 bergman /     0 DAILYS-17      16  1/1 OK
2010-07-13 04:25:23 bergman /     1 DAILYS-18      11  1/1 OK
2010-07-14 02:56:48 bergman /     1 DAILYS-19      15  1/1 OK
2010-07-15 02:54:01 bergman /     2 DAILYS-20      11  1/1 OK
2010-07-16 02:46:30 bergman /     2 DAILYS-1       11  1/1 OK
2010-07-17 05:05:08 bergman /     3 DAILYS-2       12  1/1 OK
2010-07-18 03:00:43 bergman /     3 DAILYS-3        9  1/1 OK
2010-07-19 05:11:36 bergman /     3 DAILYS-4        7  1/1 OK
2010-07-20 02:49:03 bergman /     4 DAILYS-5       24  1/1 OK

The backup level is indicated by the “lv” column. As you can see, I have 20 virtual tapes, and the last one to be used was DAILYS-5. The next one will be DAILYS-6 (its content will be erased and the tape will be reused). You may be asking yourself why there are two level 0 backups, and so many consecutive backups with the same level.

Consider two rules of thumb to understand that:

1) A level 0 backup is needed to recover, so Amanda must make sure at least one level 0 backup exists at any time, given the number of tapes and their rotation. Also, bad stuff can happen, like a machine being down or unavailable at the time the backup runs. If that happens during a few Amanda runs, you may hit a number of rotations where you effectively lose the level 0 backup without creating a new one. That’s why Amanda makes a few level 0 backups along the way, to make sure you still have a second level 0 backup if the first one is destroyed. You can configure the maximum number of runs that go by without a level 0 backup being created. It’s highly recommended that this number be lower than half of the total number of tapes you have, so that, in the worst case (like in my example, where I have 20 tapes), you have a level 0 backup in the “middle” of your tape recycling circuit. Now, why did Amanda create two level 0 backups, one on July 5 and another one on July 12, effectively sooner than the maximum time allowed? See below.

2) Despite what seems intuitive, Amanda does not increase the backup level every run until it gets back to zero again. I don’t know in detail all the data the planner uses to make a decision, but there are at least two interesting considerations the planner takes into account.

First: is a level N+1 backup a lot smaller than a level N? If the answer is no, Amanda decides to keep the same level. There’s no point in increasing the level (and lowering reliability) if no significant amount of disk space will be saved. This may happen if you change approximately the same set of files in each run. In that case, the incremental backup from level N to N+1 would be almost the same size as the one from N-1 to N.

Second, Amanda sometimes promotes level 0 backups ahead of schedule to spread them over time. Doing all the level 0 backups on the same day would be very slow and might not fit in the maximum number of tapes allowed for a single run (you can define that value in the config file). There’s also another factor: free space on the tape. Remember that, for now, Amanda cannot use the same tape in two consecutive runs, so there’s no point in leaving unused space on a tape. If there’s enough space to perform a level 0 backup instead of a higher level, Amanda may decide to do it.
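The two considerations above can be sketched as a toy decision function. To be clear, this is not Amanda's planner: the function name, its parameters and the 25% threshold are all made up for illustration, and the real planner weighs more factors than these.

```python
def choose_level(current_level, est_same, est_next, est_full, tape_free,
                 min_saving=0.25):
    """Toy version of the two planner considerations described in the text.

    current_level: level used in the previous run
    est_same:      estimated size of another backup at current_level
    est_next:      estimated size of a backup at current_level + 1
    est_full:      estimated size of a level 0 backup
    tape_free:     space still unused on the tape for this run
    min_saving:    hypothetical fraction of space a bump must save
    (all sizes in the same unit, e.g. GB)
    """
    # Promotion: a full backup fits in space that would otherwise be wasted.
    # The text says Amanda "may" do this; the sketch simply always does.
    if est_full <= tape_free:
        return 0
    # Bump: only go one level up if it saves a significant amount of space.
    if est_next <= est_same * (1 - min_saving):
        return current_level + 1
    # Otherwise stay where we are and keep the restore chain short.
    return current_level


if __name__ == "__main__":
    print(choose_level(1, est_same=10, est_next=9.5, est_full=80, tape_free=50))
    print(choose_level(1, est_same=10, est_next=4, est_full=80, tape_free=50))
    print(choose_level(3, est_same=10, est_next=9, est_full=40, tape_free=50))
```

With these made-up numbers, the first call keeps level 1 (bumping would barely save anything), the second bumps to level 2, and the third promotes to a level 0 because the full backup fits in the free tape space.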

As you can see by now, data storage is handled quite differently in Amanda and Time Machine. I’ll assume you know how Time Machine works, so I’ll get straight to the point: in Amanda you may have more than one copy of the entire volumes you are backing up (i.e., several level 0 backups of the same data), so you need to take that into consideration while planning storage space. On the other hand, Amanda allows gzip compression (and encryption) of backup data, so it’s not completely obvious how much more (or less) space you need for Amanda backups compared to Time Machine. In some situations, if your data is highly compressible, you may even end up with two level 0 backups taking less space than a single Time Machine backup, although the opposite will happen most of the time. What’s cool is that if you use compression, Amanda learns over time how compressible your data is, and adjusts its planning accordingly.

Another interesting note is that Amanda uses standard file formats for storing backups (tar and optionally gzip). This allows recovering data even on machines where Amanda is not present. If you navigate into a virtual tape directory on your file system and run the “head” command on one of the stored files, you’ll see something like this:

AMANDA: FILE 20100715010001 serpa /Library  lev 2 comp .gz
  program /usr/bin/gnutar
DLE=<<ENDDLE
<dle>
  <program>GNUTAR</program>
  <disk>/Library</disk>
  <level>2</level>
  <auth>ssh</auth>
  <compress>FAST</compress>
  <record>YES</record>
  <index>YES</index>
  <exclude>
    <list>.amanda-exclude.list</list>
    <optional>YES</optional>
  </exclude>
</dle>
ENDDLE
To restore, position tape at start of file and run:
	dd if=<tape> bs=32k skip=1 | /usr/bin/gzip -dc |
          /usr/bin/gnutar -xpGf - ...

The first part is metadata used by Amanda. After the metadata ends, there are two lines that tell you how to restore the data stored in that file with standard UNIX tools. This means you don’t need to worry about being able to recover your backups if Amanda development happens to stop for some reason. As long as you have a UNIX machine, you’ll be able to restore your data.
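The same idea works from any language with tar and gzip support. Below is a minimal Python sketch that mirrors the `dd bs=32k skip=1 | gzip -dc | tar` pipeline from the header: it skips the first 32 KiB block, then lists the gzip-compressed tar stream that follows. It assumes a gzip-compressed dump like the one shown above; the function name is hypothetical, and Python's `tarfile` module may not interpret GNU tar's incremental (`-G`) directory records the way gnutar does, so take it as a way to peek inside a dump rather than a full restore tool.

```python
import tarfile

HEADER_SIZE = 32 * 1024  # same as "bs=32k skip=1" in the restore hint


def list_amanda_dump(path):
    """Return (first header line, member names) for a gzip-compressed dump file."""
    with open(path, "rb") as dump:
        header = dump.read(HEADER_SIZE)
        first_line = header.split(b"\n", 1)[0].decode("ascii", "replace")
        # Stream mode ("r|gz") reads sequentially from the current position,
        # which is right after the 32 KiB header block.
        with tarfile.open(fileobj=dump, mode="r|gz") as archive:
            names = [member.name for member in archive]
    return first_line, names
```

You would call it with the path of one of the files inside a virtual tape directory; the exact file names depend on your Amanda setup.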

Security

There are two points I want to mention about security: transport and storage encryption, and system architecture. I’ll assume we’re always talking about having a backup server and several clients. If the problem you’re trying to solve is so simple that it can be fixed with a USB disk and Time Machine, you’re just wasting your time reading this. :)

Amanda supports encryption both during data transport and on data storage. Transport security is provided by ssh with public/private key pairs. Storage encryption is supported too, using either symmetric (secret key) encryption or public/private key pairs. Encrypting your backups is important, especially if you store them in an offsite location (either via network transfer to a remote data center, or by physically storing hard drives or real tapes in a safe). In the event data ends up in the wrong hands, encryption will render it useless for the bad guys. Things are substantially different in Time Machine. Data storage encryption is simply nonexistent (unless you use a lower level encryption method, like PGP). Transport encryption may be provided by the AFP protocol, used by Time Machine to connect to the backup server, depending on your server configuration.

More interesting, in my opinion, is the implicit security resulting from the client/server architecture used by Amanda. A backup server should be one of the safest and best guarded machines you have in your network. All your data will be there. If the backup server gets compromised, it means all the data from all the backed-up machines might now be in the wrong hands. You want your backup server to run as few services as possible, and to allow access to the server (like ssh) only by trusted admins from controlled networks.

In Amanda, this is possible. As I described before, when a backup operation starts, the Amanda server will contact its clients using the method you chose in the configuration (ssh in my case). So the connection is made from the server to the client, and not the opposite. This means your users’ machines (which will naturally be less secure than your server because those pesky users will run all the crap they get from the internets!) will never access the backup server, and thus cannot compromise its security. Time Machine works in the opposite direction. There’s no concept of a “backup server”. The clients run the show, and the server is simply a file server where backups are stored. This exposes the user backups to whatever malware and trojan horses they may have running on their Macs. If the server is misconfigured, or if some hacker exploits an unknown vulnerability in the AFP protocol, other users’ backups may be compromised as well.

Conclusions

Amanda may be a great option to back up always-on Macs, like Xserves or desktop machines (along with Linux or FreeBSD servers, of course). It offers a very fast, reliable and secure infrastructure upon which you may build your backup system. However, it lacks the user interface and simplicity of Time Machine (most configurations require sysadmin intervention for recovering data from a backup). Also, Time Machine may be more appropriate if you rely heavily on laptops that will not be available on the network on a predictable schedule.

Amanda pros:

  • Works on any UNIX system (and Windows clients) which may help when planning a multi-OS backup scenario.
  • Offers very good control of used disk space.
  • Allows data encryption and compression.
  • Secure client/server architecture.
  • May be used on real tapes, not just hard drive-based backups.

Amanda cons:

  • Not appropriate for laptops that may be out of network reach during backup operations.
  • Steep learning curve, a little hard for beginners to configure and get everything running smoothly.
  • Unless your users are computer experts and have access to the backup server, it requires sysadmin intervention for recovering data.
  • Recovering a single file may take some time, because the entire tar archive has to be read until the file is found.
  • Backups are usually less frequent than Time Machine.

Time Machine pros:

  • Very simple setup.
  • Non-expert users may recover lost files and easily browse filesystem history using the stunning user interface, without requiring sysadmin intervention.
  • Works fine with laptops with intermittent network connections.
  • Very well integrated with Mac OS X, makes backing up and recovering very easy and part of normal usage and new machine installation workflow.

Time Machine cons:

  • Works only on Mac OS X, and requires a Mac OS X Server as backup server (unless you go with unsupported devices and face possible consequences).
  • Offers no data storage security.
  • Very hard/impossible to control data storage strategy and used space.
  • Requires clients to access server, which decreases security.
  • Works only with disk-based backups.

There’s no clear winner here; it highly depends on your needs and restrictions. I hope this article gave you a general idea of what Amanda is, how it works, and its advantages and weaknesses. In the next article, I’ll describe how to install a Mac OS X Amanda client, and how to recover from a catastrophic drive failure.

Apple 2009 wish list

Friday, January 2nd, 2009

It’s a brand new year. So here’s my wish list for Apple:

  • Please fix the wireless driver that causes my Mac to crash about 10% of the times I turn Airport off.
  • Please fix the trackpad driver, or whatever is causing the trackpad to behave very erratically for about 30 seconds after waking the Mac up.
  • Please fix the damn copy/paste bug that makes the paste command paste the previously copied object and not the most recent one. This is especially irritating when you cut a piece of text, paste, and realize you are pasting something else, and that your supposedly cut piece of text is lost forever, unless you can undo and get it back.
  • Please fix the irritating bug that causes an iChat window to remain the active one even after I click Safari, making its window go in front of iChat’s. That’s especially annoying when I type apple-W to close the Safari window, and the iChat one goes away.
  • Please provide replacement keyboards for people who have pre-unibody MacBook Pros that, you know, actually sense a keystroke every time the key goes all the way down, without the need to almost punch the key.
  • Please fix whatever is causing my father’s MacBook Pro to keep waking up and going back to sleep when the lid is closed and the charger is on, even though I have already turned off every god damn thing that could wake it up, including the lid open event.
  • Speaking of the charger, please provide chargers where the charge light doesn’t go off for some unknown reason. It still works, but it doesn’t inspire a lot of confidence in it and its safety.
  • Please provide granular updates to Mac OS X Server. Please please please pretty please.
  • Please care a little more about the enterprise and IT markets, namely your own web application technology (WebObjects, of course).

Thank you, guys! You must hate me but you’re nice people anyway. Sometimes.

Granular updates, please!

Friday, September 26th, 2008

Today was update day for our Mac OS X Server machine. The story is quick: installed Mac OS X Server 10.5.5 update, fixed WebObjects to go back to 5.3.3, rebooted. Surprise: my customized version of PHP was replaced by a new version! Why was it a surprise? Because neither the Mac OS X Server 10.5.5 Update notes, nor the Mac OS X (Client) 10.5.5 Update notes, nor even the security notes about the 10.5.5 updates mention PHP anywhere.

Apple, please, can we have granular updates for Mac OS X Server, like we do on basically any other decent server OS? Can we, system administrators, have the privilege to make decisions on what and when we want to update? Why don’t you finally understand that the needs of IT people are a little bit different from those of the average consumer? Because, you know, the feeling I get is that you are completely clueless about IT needs and the way things work. Really.

I like FreeBSD more and more…

Upgrading Mac OS X Leopard Server

Friday, July 11th, 2008

I was expecting that. Fortunately, I was ready for it. So, I’ll describe some details of upgrading Leopard Server from 10.5.2 to 10.5.4 to save you the same trouble.

If you read my previous article about Leopard Server, you know that I had to recompile MySQL, PHP and Apache to PowerPC 32 bits, because I could not use the 64 bits version on our G5 server. The story was a bit complicated, but the idea was that I needed PostgreSQL and MySQL support on PHP. As the MySQL default installation on OS X Server doesn’t include the shared libraries, I had to recompile MySQL to be able to recompile PHP. But I could not make MySQL compile for 64 bits, so I compiled everything (including Apache) to 32 bits.

So, I was expecting it: as soon as I upgraded my server, the installers would replace Apache with a more recent version, and it would not start because the PHP and MySQL stuff was compiled for 32 bits. And it happened.

How to fix it? Simple. You have to do two things: first, remove the PowerPC 64 part from the httpd binary. Second, recompile your customized version of PHP, because PHP gets replaced too.

To remove the PowerPC 64 part of the binary, you don’t need to recompile Apache as I did before. There are two very useful commands you can use: file and lipo. If you run the following commands:


cd /usr/sbin
file httpd

The result will be:


httpd: Mach-O universal binary with 4 architectures
httpd (for architecture ppc7400): Mach-O executable ppc
httpd (for architecture ppc64): Mach-O 64-bit executable ppc64
httpd (for architecture i386): Mach-O executable i386
httpd (for architecture x86_64): Mach-O 64-bit executable x86_64

As you can see, Apple ships httpd compiled for their four possible architectures. In my case, I want to remove the ppc64 one to force httpd to run in 32 bits. That’s where lipo comes in handy:


lipo -remove ppc64 -output httpd32 httpd
mv httpd httpd64
mv httpd32 httpd

That’s it. The ppc64 part of the binary has been removed, and a new binary was created. I then just switched the binaries, and there you go, Apache running in 32 bits mode on PowerPC. Running file on the new binary confirms that ppc64 is gone:


httpd32: Mach-O universal binary with 3 architectures
httpd32 (for architecture ppc7400): Mach-O executable ppc
httpd32 (for architecture i386): Mach-O executable i386
httpd32 (for architecture x86_64): Mach-O 64-bit executable x86_64
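If you want to script this check (say, after every system update), a small Python sketch like the one below can pull the architecture names out of the `file` output. The function name is made up, and the parsing simply assumes the output format shown above.

```python
import re


def architectures(file_output):
    """Extract architecture names from the text printed by `file` on a universal binary."""
    return re.findall(r"\(for architecture (\w+)\)", file_output)


if __name__ == "__main__":
    sample = (
        "httpd32: Mach-O universal binary with 3 architectures\n"
        "httpd32 (for architecture ppc7400): Mach-O executable ppc\n"
        "httpd32 (for architecture i386): Mach-O executable i386\n"
        "httpd32 (for architecture x86_64): Mach-O 64-bit executable x86_64\n"
    )
    archs = architectures(sample)
    print(archs)
    print("ppc64 present:", "ppc64" in archs)
```

On a real machine you would feed it the captured output of `file /usr/sbin/httpd` instead of the sample string.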

Then I just needed to recompile PHP and everything was back on track.

Also, be careful because the 10.5.4 update will upgrade WebObjects to 5.4.2. If you are still using WO 5.3 like me, you’ll have to reinstall 5.3, as described in Mike’s document.

Migrating to Leopard Server

Sunday, March 23rd, 2008

This was it. I spent the Easter weekend migrating GAEL’s Xserve to Leopard Server. It all went well, although some more or less serious issues popped up.

Our server is used mainly for hosting web content and applications (php, perl, and of course, WebObjects). It also handles some Subversion repositories and some other minor stuff.

I did the standard procedure I always do when migrating a machine to a new OS version: clone the hard drive to an external FireWire disk, format the internal hard drive, install the new OS and migrate data. I usually use Migration Assistant, but of course, this is a server, so it’s a little more complicated than that.

While my memory is still fresh, here are some notes about it, in no particular order.

RAID Formatting

Our Xserve has two 80 GB hard drives configured in software RAID 1 (mirroring). As Apple sometimes does some tweaks and changes on the RAID software and drivers, I decided to destroy the RAID and create a new one. So I did: I booted from the Leopard Server DVD, destroyed the RAID and tried to make a new one. I had some problems with that, though. Disk Utility was not allowing me to create the new RAID. I don’t recall the error exactly, but it had something to do with not being able to mount a volume or RAID slice. I quit Disk Utility and launched it again, and then the RAID creation went fine. I just love to see those two blue LEDs blinking in sync!

Installation and System Updates

Installation itself went without any issues. Our Xserve has a graphics card, so it was like any regular desktop Mac, click click choose click and wait. The system installed correctly, rebooted, configuration assistants, answered all the questions, network working, etc. Perfect. Then, I went to grab all the system updates. I installed it this weekend, so I had a few updates waiting, namely the 10.5.2 combo update. Then something weird happened – after installing all the available updates, the machine rebooted 3 times in a row instead of just one. I know some updates that came out lately require 2 reboots in a row, but I had never seen 3. When the server finally came to life, I manually rebooted it 2 or 3 more times, just to see if it was booting OK. Apparently, everything is fine. I checked the logs, and they were inconclusive. So, does anyone know if 3 reboots in a row is normal for all the updates that came out so far for a G5 Xserve running Leopard?

SSH

This is a fast one, but… sshd comes with PermitRootLogin defaulting to “yes”. Oh come on, guys!

User Migration

This is one of the most serious issues that I find with Mac OS X Server migration. I had seen this when migrating from Panther Server to Tiger Server, and it’s still a problem. The thing is: you cannot migrate passwords. You can use Workgroup Manager to export all the user information… except passwords. That means all the user passwords will have to be reset on the new server. Of course, I don’t expect the real passwords to be exported – especially because they are hashed, so it’s impossible to recover them. But the hash itself could be exported and imported again.

This presents a very serious issue to system administrators and users. Of course, if you have thousands of users, you should use multiple LDAP servers dedicated to the authentication services, and you can clone them at will, making sure that you never lose information and the service never stops. But when you have about 30 users like we do, that is overkill. Even so, it’s a real pain in the ass to reset all those passwords, because some users are actually not in our office. They are external users, either from the other university campus (although that’s not too bad, I actually live closer to that campus than the one I work in, so I can drive by and take care of that stuff), or, worse, from people in some companies that are working remotely with us.

I believe migrations like this should be transparent to the user, and this little detail makes them very very opaque.

64 bits hell

Having a full 64 bits OS running on a 64 bits machine can only be a good thing, right? Well… maybe not.

I’m a little crazy and my organizational skills might be very well defined by the word “chaos”, but I’m not crazy enough to do this in the space of two days without having tested all this stuff first and documented the important details. So, before trashing our G5 Xserve, I grabbed an old PowerMac G4, installed Leopard Server and all the stuff that really needs to work. The most experienced of you should be smiling by now. Although it seems that the only important difference between both CPUs for the matters we are discussing is just speed, there’s a really important one: 32 bits vs 64 bits. The G5 is a full 64 bits CPU, and the G4 is 32 bits. Up to Tiger, this was not a problem at all, because most of the OS was also running in 32 bits. This included most services, like DBs and Apache. On Leopard, everything (or close to that) is compiled for four different architectures: PowerPC 32 and 64 bits, and Intel 32 and 64 bits. We’ll come back to this in a minute.

Mac OS X Server is bundled with MySQL, PHP and Apache, but not with PostgreSQL. As I prefer PostgreSQL to MySQL by far, I tend to use PostgreSQL with all the applications I can, including my own WebObjects applications. So, I compiled and installed PostgreSQL on the server. As I also need PHP applications to access PostgreSQL databases, I had to download PHP source code and recompile it with PostgreSQL support (you gotta love a language where you have to recompile the whole damn thing to add support to a DB…). But, to compile PHP with support to MySQL (and PostgreSQL) I need to have the MySQL headers and dynamic libraries. Well, Mac OS X Server is bundled with MySQL binaries, but not the headers or libraries. As there were no binaries available for PowerPC 10.5 on the MySQL page, I also had to grab the source and recompile all this stuff.

This is where problems started. I recompiled MySQL, and got it working after some struggle (I really hate MySQL). Then I recompiled PHP. Installed it, added the LoadModule directive to the apache config file, and restarted apache. Bum. Explosions. Apache would not start. It said that the PHP module was compiled for the wrong architecture. I started to think, WTF, are you telling me that my Xserve just compiled PHP… for Intel? Why did this work on the test G4 box? Well, what other architecture could it be? :P I started googling for the problem and I got it: apache is compiled for all the four architectures I referred to above, and it always runs with the most appropriate one for the machine. In the Xserve case, it uses the PowerPC 64 binaries. The problem is that PHP had been compiled for 32 bits only. Ok, no problem. Go to PHP dir, make clean, poke around with the environment variables, recompile the thing for 64 bits. Bum. More explosions. Guess what, MySQL was NOT compiled for 64 bits! Ok ok, one more level deep in the stack, go to MySQL directory, blablabla, recompile and… BUM! Yet another explosion. Now this one was more complicated. Apparently some of the libraries in the MySQL source code package were not being compiled for 64 bits. So, no 64 bits MySQL means no 64 bits PHP that means no runnable PHP with 64 bits apache that means falling back to Apple’s branded PHP that means… no PostgreSQL.

From what I saw on the Net, convincing MySQL to compile for 64 bits was not a road I wanted to go down. Also, one of the pages I found about the “wrong architecture” problem when starting Apache actually suggested going in the opposite direction: grab the Apache source code and recompile it in 32 bits. Using the mentioned configure command (./configure --enable-layout=Darwin --enable-mods-shared=all) I compiled the exact same Apache version that Apple bundles with Leopard Server, and installed it over the Apple branded one. That made it all work, now in 32 bits. Of course, if you follow this trick, please keep in mind that this may break in future system updates. If some Apple system update replaces apache, it will not start unless you recompile it again for 32 bits only, or remove the PHP module.

This 64 bits mess is actually a very nasty problem, and makes me wonder what I’m actually gaining in all this. And the answer is: zero. My server has one GB of RAM, and will probably never have more than 4. If, for some reason, we actually need to boost the memory that far, it certainly won’t be because of Apache. It gets me thinking about how many people will actually need apache to run in 64 bits mode. If it’s more than 1% or 2% of the Xserve users, I’ll be very amazed. And what do I lose? A lot. Not all the open source projects compile easily in 64 bits mode (I know the MySQL that comes with Mac OS X Server is compiled for 64 bits, but for some reason the needed fixes for that are not in the public MySQL source code tree), Apache may stop working at all in the next system update, and I had a lot of extra work. Maybe Apple should provide an easy way to switch this kind of stuff between 32 and 64 bits mode at will. Having only one OS version for all the architectures is interesting, but solving the problems that it creates is not.

Wrapping up

Everything is working now, after an entire weekend spent behind many terminal windows. Unfortunately, I have to say that my opinion about Mac OS X Server is not the best one. I have been working lately with FreeBSD. My experience with FreeBSD is way, way less than the experience I have with Mac OS X, so there are probably many downsides in FreeBSD I have not yet had to deal with. That being said, I think Mac OS X Server is a very easy to use OS, as long as you keep using the tools Apple provided. As soon as you need different tools, especially the ones that tinker with Apache, you’ll start regretting liking computers in the first place. And surprisingly, you start to find that it’s actually easier to do it on a FreeBSD server. Every piece of software I installed so far on FreeBSD (including WebObjects) was installed in a very easy, straightforward and painless way. Just browse the ports tree, make install clean and there it is. No crazy problems, everything is made to work with everything. And the default configurations are usually safer than Apple’s.

It makes sense: although the FreeBSD guys don’t do beautiful GUIs and assistants, they work hard to make sure the system Works. All of it, including all the ports. And most important, not only does it work, it works together. If I had to use a word to define FreeBSD, I would pick “consistency” without hesitation. Even WebObjects, which does not have an “official” port in the FreeBSD ports tree, actually installs more easily on FreeBSD than on OS X (due to the hard work of Quinton Dolan, who created a FreeBSD port of WO). And face it: probably all the software you need exists in the ports tree. It’s HUGE. And if it doesn’t, you can always install it using the classic UNIX way.

The Apple way is different. Apple picks a very small range of software, then compiles and packages it in a very easy to use OS. It’s really easy, way more than FreeBSD in many ways. The problems appear when you conclude that the bundled software is not enough, and you want to install your own. When that happens, you are completely on your own. You’ll start fighting Apple’s sometimes weird configurations and file system structure, you may run into binary architecture incompatibilities like I did, and so on. And you’ll probably need to do this, because what comes bundled with OS X Server is probably far from enough to get the job done.

Testing memory

Sunday, August 12th, 2007

I wrote some days ago about badblocks for testing a hard drive surface. Now, the same for memory.

As I said, I bought a second-hand PowerMac G5 to replace my old G4. When I got the new machine, I ran Apple Hardware Test (AHT), using the Extended Test. AHT tested my hardware, including the 2.5 GB of RAM, taking more than two hours (and making a hell of a noise, because during tests the G5 ventilation system works in failsafe mode, which means full power). Everything seemed to be fine. Until I installed Retrospect. I use Retrospect to make all my backups at home, and despite all its quirks, it always worked fine on the G4. Since I installed it on the G5, I got strange errors (the famous “internal consistency check”) and even crashes.

After ruling out all the other possibilities (trashing preferences and existing backup sets, reinstalling Retrospect, etc.) I suspected it could be a hardware problem, because I was told that the “internal consistency check” error appears when the backup set contents are corrupted. So, I thought, my hard drive is corrupting data. I duplicated one backup set of about 80 GB, and surprise: after duplicating it and running an md5 checksum (and diff) on both copies, the files were different! This was NOT supposed to happen, naturally. So I tried the same thing on my boot drive, and got the same problem. Oops… it’s not the drives. So, if it’s not the drives, and supposing (more exactly, praying) that it was not a motherboard issue, it must be the memory.
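The copy-and-compare check described above can be sketched as a small shell function. This is only an illustration, not the exact commands I ran: the function name and arguments are made up for the example, and it uses `cmp` for a byte-for-byte comparison where I used md5 and diff. A mismatch only tells you that something between the disk and the memory is corrupting data, not which component is to blame.

```shell
#!/bin/sh
# copy_and_verify SRC WORKDIR [ROUNDS]
# Hypothetical helper: copies SRC into WORKDIR several times and compares
# each copy byte-for-byte with the original. Any mismatch means data was
# corrupted somewhere along the way (disk, controller, memory...).
copy_and_verify() {
    src=$1
    workdir=$2
    rounds=${3:-3}          # default to 3 rounds when not specified
    i=1
    while [ "$i" -le "$rounds" ]; do
        copy="$workdir/copy.$i"
        cp "$src" "$copy" || return 2
        if ! cmp -s "$src" "$copy"; then
            # Keep the bad copy around so it can be inspected later
            echo "MISMATCH on round $i: $copy differs from $src"
            return 1
        fi
        rm -f "$copy"
        i=$((i + 1))
    done
    echo "OK: $rounds copies matched"
}
```

Something like `copy_and_verify /path/to/big.file /path/to/scratch 5` would run five rounds; both paths are placeholders.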

All my colleagues on the IST System Administration team use memtest on PCs to test memory. This great distribution of memtest has a really nice touch: you can burn it on a CD, and boot the PC from it. It takes less than 200K of RAM, so all the other memory will be tested. Unfortunately, it’s not possible to boot it on Macs (not even Intel Macs, I tried it!). So, you must get the Mac OS X version of memtest, and boot the OS in Single User mode (holding command-S during the boot sequence). The OS will take about 50 MB of RAM, which is, of course, much worse than the 200K used by the PC version, because those 50 MB will simply not be tested. But it’s better than nothing.

The official site for the Mac OS X version of memtest is here, but unfortunately, the author requires you to pay a small amount for the download. I don’t like the approach very much because I don’t really know what I’m buying. The author says that, after paying, he sends a password for the encrypted DMG you downloaded. But I cannot download without paying, because the download link is nowhere to be found. So… what happens when a new version comes out? Do I have to pay again? Well, anyway, someone else is distributing memtest for OS X for free. Yes, it’s legal, because the software is under the GNU license. So, if you don’t want to pay, just click here and grab your own free copy. Happy testing!

By the way, some tests take a lot of time. Let all of them run. Don’t assume that your memory is OK just because all the “quick” tests passed. Some problems may only be found by the more complex and slower tests, and that’s why they are there. So, let it run. And if you have a G5, get the hell out of there, or use ear-plugs. It won’t be a nice office to work in during testing, trust me.

memtest will detect lots of common memory problems, and will probably identify more than 99% of the defective memory modules around. But never forget: it’s impossible to be entirely sure that a memory module is OK, simply because it’s not possible, in a reasonable time frame, to test all the possible combinations of data. Also, memory may pass all the tests one day, and fail the next. There are many factors that may trigger a hidden problem in memory modules: temperature, electrical fluctuations, the data it contains, age, etc. If you suspect you have a bad memory module, and if you have time, run memtest for several days in a row, using the option to do many passes.

Backups, rsync, and --link-dest not working

Sunday, May 20th, 2007

I use Retrospect to back up most of the machines at GAEL. You may wonder why I use a commercial tool that still shows its OS 9 roots, instead of open source alternatives. Well, Retrospect has some cool advantages (namely, very good support for laptops that may be disconnected abruptly from the network while a backup is in progress). Also, when I first did this setup, Amanda and other tools did not work reliably with the Mac OS X file format.

While this works for backing up all the desktop workstations and laptops of GAEL members, I have a problem with our Xserve. It runs Mac OS X Server, and with the license we have, Retrospect will not back up machines running the Server version of Mac OS X. To do that, we would have to buy a much more expensive license.

No problem. A server, due to its nature, doesn’t have the “sudden disappearing” problem of the laptops, so I can use a “classic” UNIX approach, and my choice was rsync with the --link-dest option. You may read about this option in the rsync manpage, but in case you don’t know, what it does is the following: instead of synchronizing a directory in the usual way, it will create a new directory with a new file tree. But, to save space, it won’t copy the non-updated files from the old tree to the new one. Instead, it creates hard links, so that both entries in the file system point to the same data on the hard drive (to the same inode). So, every time you update your backup, you will create a new tree, but you will only use the space required by the files that were updated since the last backup, plus some more space for the file system structures that support the directory tree. You can use a command like this:


# rotate old dirs
rm -rf /Volumes/Storage/test/test.5
mv /Volumes/Storage/test/test.4 /Volumes/Storage/test/test.5
mv /Volumes/Storage/test/test.3 /Volumes/Storage/test/test.4
mv /Volumes/Storage/test/test.2 /Volumes/Storage/test/test.3
mv /Volumes/Storage/test/test.1 /Volumes/Storage/test/test.2
mv /Volumes/Storage/test/test.0 /Volumes/Storage/test/test.1

# pull a fresh tree into test.0, hard-linking unchanged files against test.1
/usr/bin/rsync --rsync-path=/usr/bin/rsync -az -E -e ssh --exclude=/dev/\* --exclude=/private/tmp/\* --exclude=/Network/\* --exclude=/Volumes/\* --exclude=/private/var/run/\* --exclude=/afs/\* --exclude=/automount/\* --exclude=/.Spotlight-V100/\* --link-dest="/Volumes/Storage/test/test.1" "root@my.machine.com:/Users/arroz/TestDirectory" "/Volumes/Storage/test/test.0/"

Side note: the -E option (capital E) is an option present on Mac OS X rsync version, that forces rsync to copy all the extended Mac file system attributes, including resource forks. It only exists in Mac OS X 10.4 (Tiger) or newer versions. If you are still using 10.3 (Panther) or older, use rsyncx. Do not use rsyncx with Tiger.
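If you keep more than a handful of old trees, the chain of mv commands gets tedious. Here is a minimal sketch of the same rotation written as a loop. The function name and its two arguments (the path prefix and the number of trees to keep) are my own invention for this example, so adapt them to your layout.

```shell
#!/bin/sh
# rotate_snapshots BASE KEEP
# Hypothetical helper: removes BASE.(KEEP-1), then shifts every BASE.N to
# BASE.(N+1), leaving BASE.0 free for the next rsync run.
rotate_snapshots() {
    base=$1
    keep=$2
    # Refuse to run with missing arguments: rm -rf on a bad path is no fun
    [ -n "$base" ] && [ "$keep" -gt 0 ] || return 1
    n=$((keep - 1))
    rm -rf "$base.$n"                 # drop the oldest tree
    while [ "$n" -gt 0 ]; do
        prev=$((n - 1))
        if [ -d "$base.$prev" ]; then
            mv "$base.$prev" "$base.$n"
        fi
        n=$prev
    done
}
```

Calling `rotate_snapshots /Volumes/Storage/test/test 6` should do the same as the rm and mv lines in the command above, and it simply skips trees that do not exist yet.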

Until about a week ago, my backup machine (an old PowerMac G4) had an external SCSI RAID with 640 GB, and an internal RAID 0 (2 * 80 GB drives), besides the boot disk. All the Retrospect backups were being placed on the external RAID, and the server backups were going to the internal RAID 0. Now, I know that backing up to a RAID 0 is living on the edge. But there was really no more space, and it was a temporary situation, because the new drives for the external RAID were already ordered.

When the new drives arrived, I stored all the backups wherever I could for some days (640 GB was huge when we purchased the RAID, but today it’s relatively manageable), switched the drives and created a fresh RAID 5. I formatted it with the HFS+ file system, copied back all the backups, including the server backups, and finally trashed the internal RAID 0.

Some path adjustments in my server backup scripts, and we were back in business. But the free space on the RAID was shrinking dramatically every day. I used the ‘ls -i’ command to compare the inodes of files that were supposed to be unchanged from one day’s backup to the next, and as I suspected, rsync was duplicating all the files instead of hard-linking them.
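The inode check is easy to script if you want to repeat it for several files. This is just a sketch built around ‘ls -i’; the function names are made up for the example, and both paths must live on the same volume for the comparison to mean anything.

```shell
#!/bin/sh
# inode_of FILE: print the inode number that `ls -i` reports for FILE
inode_of() {
    ls -i "$1" | awk '{print $1}'
}

# same_inode A B: succeed only when A and B both exist and are hard links
# to the same inode (i.e. they share the same data on disk)
same_inode() {
    [ -e "$1" ] && [ -e "$2" ] || return 1
    [ "$(inode_of "$1")" = "$(inode_of "$2")" ]
}
```

For an unchanged file, `same_inode test.0/some/file test.1/some/file` should succeed when --link-dest is doing its job (the paths here are placeholders). If it fails for files you know did not change, rsync is copying instead of linking.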

After Googling a lot, I could not find answers for this. I tried to see if ‘cp -la’ would successfully create hard links, but to my surprise, I found out that the Mac OS X built-in ‘cp’ command does not support the “l” option. Nice. Before installing the GNU version of ‘cp’ (and because I’m lazy and I didn’t want to do that) I started thinking about everything I had done since the new drives arrived. The OS was the same, the rsync command was the same, it worked before, so it had to work now. The only reason it could be failing was that rsync, somehow, thought that all the files were changing, even when they were not.

Suddenly, the solution popped into my head. Mac OS X has an option, associated with every HFS+ volume, called “Ignore ownership on this volume”. This is turned off by default on the boot drive, but it’s turned on by default on all the external drives you format. There’s a good reason for this: Mac OS X is a consumer product. And average users want to buy an external drive, store data on it, bring it to another Mac, and read their data. They don’t care if their UID is the same on both machines or not.

But this causes serious problems for rsync. Although the file system will store the owner of the files, it probably won’t report it to the applications that try to read it (or will mask it with the user who’s trying to access them). Somewhere between the file system and the application layers, this information is filtered. So, rsync was not getting the real UID of the files. As the files that came from the server had real UIDs, the two UIDs wouldn’t match, and rsync would create a new copy because, from its point of view, the file had changed.

The solution was simple: I just went to the machine console and did a “Get Info” on the external volume. I turned off the “Ignore ownership on this volume” setting, and rsync started operating normally again.
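If you want to check for this from a script before trusting a backup volume, one rough approach is to look at the numeric owner the system reports for a file whose real owner you know. This is a sketch with made-up function names, under the assumption that a volume ignoring ownership reports some other UID than the one stored on disk. As far as I know, `diskutil enableOwnership` followed by the mount point is the command-line counterpart of the Get Info checkbox, but treat that as an assumption and check `man diskutil` on your system first.

```shell
#!/bin/sh
# owner_uid FILE: print the numeric owner of FILE as the system reports it
# (ls -n shows numeric IDs, -d keeps directories from being listed)
owner_uid() {
    ls -ldn "$1" | awk '{print $3}'
}

# ownership_looks_preserved FILE EXPECTED_UID
# Hypothetical check: succeeds when the reported owner matches the UID
# you know the file should have.
ownership_looks_preserved() {
    [ "$(owner_uid "$1")" = "$2" ]
}
```

For example, after a backup run you could test a file that belongs to UID 501 on the server with `ownership_looks_preserved /Volumes/Storage/test/test.0/some/file 501` (path and UID are placeholders). A failure is a hint that the volume is hiding the real owners from rsync.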