Amanda on Mac OS X - an In-depth Review

Given that Retrospect 8 is essentially a piece of crap, I’ve been searching for an alternative I can use when Time Machine is not an option for backing up Macs. The two main points I’m focused on are reliability and speed. I want a backup system I can trust that won’t take the age of the universe to recover a file.

I’ve been using Amanda for a while now to back up all the Macs in our workgroup (10 machines) and so far I’m nothing but happy. Amanda is an open source backup system for UNIX-based operating systems, Mac OS X included (I believe it can also back up Windows clients, but I couldn’t care less).

Amanda was originally designed to back up to tapes. Today, since hard drives have become cheap and are a great medium for backup, Amanda also supports virtual tapes. A virtual tape is a directory on disk that essentially acts as a tape, storing raw information. While configuring an Amanda server, you create a set of tapes of an arbitrary size (100 GB, for instance) and Amanda will use at least one tape per day, eventually rotating through all of them.

I’ll now lay out some considerations about how Amanda works and how that reflects on usage and backup planning. I’ll also make some comparisons with Time Machine.

Scheduling

In Amanda, you define one or more configuration files. Each configuration can have a set of hosts (a host is a backed-up machine), and for each host a set of disks (a disk is any directory you want to back up; Amanda disks don’t need to match physical disks or logical HFS+ volumes). You generally run a backup operation that will back up all the disks of all the hosts in that configuration.
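To make the host/disk model a bit more concrete, here is a rough Python sketch that parses a simplified disk list into a mapping from host to disks. It is only an illustration, not Amanda code: I’m assuming the simplest three-column layout (host, disk, dump type), the dump type name `example-tar` is made up, and the `/Users` entry is hypothetical (the host names are borrowed from the examples later in this article).

```python
from collections import defaultdict

# Simplified disk list: one "host disk dumptype" entry per line.
# "example-tar" is a made-up dump type name, and the /Users entry is
# hypothetical; both are here only for illustration.
DISKLIST = """\
# host     disk       dumptype
bergman    /          example-tar
serpa      /Library   example-tar
serpa      /Users     example-tar
"""

def parse_disklist(text):
    """Group disks by host, skipping blank lines and comments."""
    hosts = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        host, disk, dumptype = line.split()
        hosts[host].append((disk, dumptype))
    return dict(hosts)

hosts = parse_disklist(DISKLIST)
for host, disks in hosts.items():
    print(host, [disk for disk, _ in disks])
```

A single backup run would then walk over every host in that mapping and every disk listed under it.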

Launching a backup operation is done by simply executing a UNIX command from whatever mechanism you prefer (cron, launchd, manually, etc). This means you can define how often your backup runs, and at what time. Usually, a daily backup is performed, but you may want to back up every 6 hours or only once per week, depending on how often your data changes and how bad it is to lose a few hours of work. It also means that, in Amanda, the server is proactive and initiates the backup procedure whenever you set it to. In Time Machine, backup operations are initiated by the clients every hour. The server is simply a dumb file server where backups are stored.

Storage

Amanda uses the tar format to store data on a virtual tape. For each backed-up disk, Amanda creates a tar file with the data inside the tape. Amanda stores data in incremental fashion, using some redundancy as well. This is implemented through the concept of backup levels. The base backup (where all the files on a disk are backed up) is level 0. From there, backup levels are incremented, and each level essentially contains the incremental changes relative to an archive in the immediately lower level. This means a level 3 backup contains the changes relative to a level 2 backup. That level 2 backup contains the changes relative to a level 1 backup which contains, as expected, the changes relative to the level 0 backup.

Amanda decides on the level it should operate on using a complex planner that weighs several factors. You may add some configuration options to customize the weight of some of those factors in the final planner decision. Essentially, Amanda tries to find the right balance between reliability (which, in this context, means the probability of recovering a backup successfully) and used disk space. Reliability decreases as the backup level increases because, if you want to recover a level 5 backup, you need consistent level 0, 1, 2, 3, 4 and obviously 5 backups, since each of those builds upon the previous one. A level 0 backup is more reliable in the sense that you only need the level 0 backup itself to recover. Of course, a level 0 backup is an “everything” backup. If you do a level 0 backup every day, you’ll use a lot of disk space.
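The dependency between levels can be sketched in a few lines of Python. This is my own illustration, not how Amanda implements recovery: it assumes, following the description above, that a level N backup depends on the most recent earlier level N-1 backup. The rows are copied from the end of the bergman listing shown later in this article.

```python
# (date, level, tape) rows copied from the bergman listing in this article.
HISTORY = [
    ("2010-07-12", 0, "DAILYS-17"),
    ("2010-07-13", 1, "DAILYS-18"),
    ("2010-07-14", 1, "DAILYS-19"),
    ("2010-07-15", 2, "DAILYS-20"),
    ("2010-07-16", 2, "DAILYS-1"),
    ("2010-07-17", 3, "DAILYS-2"),
    ("2010-07-18", 3, "DAILYS-3"),
    ("2010-07-19", 3, "DAILYS-4"),
    ("2010-07-20", 4, "DAILYS-5"),
]

def restore_chain(history, target_date):
    """Return the backups needed to restore the state as of target_date,
    level 0 first.

    history must be sorted by date. Assumption: a level N backup depends
    on the most recent earlier level N-1 backup.
    """
    older = [row for row in history if row[0] <= target_date]
    if not older:
        raise ValueError("no backup on or before " + target_date)
    chain = [older.pop()]                      # the newest usable backup
    while chain[-1][1] > 0:
        wanted = chain[-1][1] - 1
        while older and older[-1][1] != wanted:
            older.pop()                        # not part of the chain
        if not older:
            raise ValueError("chain is broken: no level %d backup" % wanted)
        chain.append(older.pop())
    return list(reversed(chain))

for date, level, tape in restore_chain(HISTORY, "2010-07-20"):
    print(date, "level", level, tape)
```

Restoring the July 20 state needs five archives (one per level from 0 to 4), while restoring July 14 only needs two. Losing any archive in the chain breaks everything that comes after it, which is exactly why higher levels are less reliable.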

This is an example of a production Amanda installation for a host called bergman and volume / (the root):

date                host    disk lv tape or file file part status
2010-07-01 06:05:24 bergman /     2 DAILYS-6       15  1/1 OK
2010-07-02 04:19:11 bergman /     3 DAILYS-7       23  1/1 OK
2010-07-04 04:47:44 bergman /     3 DAILYS-9        9  1/1 OK
2010-07-05 02:51:01 bergman /     0 DAILYS-10      21  1/1 OK
2010-07-06 02:51:34 bergman /     1 DAILYS-11      11  1/1 OK
2010-07-07 04:59:40 bergman /     1 DAILYS-12      12  1/1 OK
2010-07-08 02:59:32 bergman /     1 DAILYS-13      15  1/1 PARTIAL
2010-07-10 02:46:41 bergman /     1 DAILYS-15      20  1/1 OK
2010-07-11 05:25:49 bergman /     2 DAILYS-16       8  1/1 OK
2010-07-12 03:43:04 bergman /     0 DAILYS-17      16  1/1 OK
2010-07-13 04:25:23 bergman /     1 DAILYS-18      11  1/1 OK
2010-07-14 02:56:48 bergman /     1 DAILYS-19      15  1/1 OK
2010-07-15 02:54:01 bergman /     2 DAILYS-20      11  1/1 OK
2010-07-16 02:46:30 bergman /     2 DAILYS-1       11  1/1 OK
2010-07-17 05:05:08 bergman /     3 DAILYS-2       12  1/1 OK
2010-07-18 03:00:43 bergman /     3 DAILYS-3        9  1/1 OK
2010-07-19 05:11:36 bergman /     3 DAILYS-4        7  1/1 OK
2010-07-20 02:49:03 bergman /     4 DAILYS-5       24  1/1 OK

The backup level is indicated by the “lv” column. As you can see, I have 20 virtual tapes, and the last one to be used was DAILYS-5. The next one will be DAILYS-6 (its content will be erased and the tape will be reused). You may be asking yourself why there are two level 0 backups, and a lot of consecutive backups with the same level.

Consider two rules of thumb to understand that:

1) A level 0 backup is needed to recover, so Amanda must make sure at least one level 0 backup exists at any time, given the number of tapes and their rotation. Also, bad stuff can happen, like a machine being down or unavailable at the time the backup runs. If that happens during a few Amanda runs, you may hit a number of rotations where you effectively lose the level 0 backup without creating a new one. That’s why Amanda makes a few level 0 backups along the way, to make sure you still have a second level 0 backup if the first one is destroyed. You can configure the maximum number of runs that go by without a level 0 backup being created. It’s highly recommended that this number be lower than half of the total number of tapes you have, so that, in the worst case (like in my example, where I have 20 tapes), you have a level 0 backup in the “middle” of your tape recycling circuit. Now, why did Amanda create two level 0 backups, one on July 5 and another on July 12, effectively sooner than the maximum time allowed? See below.

2) Despite what seems intuitive, Amanda does not increase the backup level every run until it gets back to zero again. I don’t know in detail all the data the planner uses to make a decision, but there are at least two interesting considerations the planner takes into account.

The first one: is a level N+1 backup a lot smaller than a level N backup? If the answer is no, Amanda decides to keep the same level. There’s no point in increasing the level (and lowering reliability) if no significant amount of disk space will be saved. This may happen if you change approximately the same set of files in each run. In that case, the incremental backup from level N to N+1 would be almost the same size as the one from N-1 to N.

Second, Amanda sometimes promotes level 0 backups ahead of schedule to spread them over time. Doing all the level 0 backups on the same day would be very slow and might not fit in the maximum number of tapes allowed for a single run (you can define that value in the config file). There’s also another factor: free space on the tape. Remember that, for now, Amanda cannot use the same tape in two consecutive runs, so there’s no point in leaving unused space on a tape. If there’s enough space to perform a level 0 backup instead of a higher level, Amanda may decide to do it.
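Both rules of thumb can be played with in a toy Python model. To be clear, this is a sketch under my own simplifying assumptions, not the real planner: one tape per run, a level 0 at a fixed interval, the oldest tape overwritten first, and a made-up `min_saving` threshold for deciding whether a higher level is worth it.

```python
def min_fulls_on_tape(tape_count, runs_between_fulls, total_runs=200):
    """Simulate tape rotation and return the smallest number of level 0
    backups present on the tapes once every tape has been used.

    Toy model: one tape per run, a level 0 every runs_between_fulls runs,
    and the oldest tape is always the one that gets overwritten.
    """
    levels_on_tape = []                  # levels held by the tapes, oldest first
    fewest = None
    for run in range(total_runs):
        is_full = run % runs_between_fulls == 0
        levels_on_tape.append(0 if is_full else 1)
        if len(levels_on_tape) > tape_count:
            levels_on_tape.pop(0)        # the oldest tape is reused
        if len(levels_on_tape) == tape_count:
            fulls = levels_on_tape.count(0)
            fewest = fulls if fewest is None else min(fewest, fulls)
    return fewest

def should_bump(size_at_level, size_at_next_level, min_saving=0.2):
    """Move to the next level only if it saves at least min_saving of the
    current size. min_saving is a made-up threshold, not an Amanda default."""
    if size_at_level <= 0:
        return False
    saving = (size_at_level - size_at_next_level) / size_at_level
    return saving >= min_saving

print(min_fulls_on_tape(20, 7))     # interval below half the tape count
print(min_fulls_on_tape(20, 15))    # interval above half the tape count
print(should_bump(10.0, 9.5))       # barely any saving
print(should_bump(10.0, 4.0))       # large saving
```

With 20 tapes, an interval of 7 runs always leaves at least two level 0 backups on tape, while an interval of 15 runs can leave you with just one. In the same spirit, a 5% saving is not worth a higher level, but a 60% saving is.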

As you can see by now, data storage is handled quite differently in Amanda and Time Machine. I’ll assume you know how Time Machine works, so I’ll get straight to the point: in Amanda you may have more than one copy of the entire volumes you are backing up (i.e., several level 0 backups of the same data), so you need to take that into consideration while planning storage space. On the other hand, Amanda allows gzip compression (and encryption) of backup data, so it’s not completely obvious how much more (or less) space you need for Amanda backups compared to Time Machine. In some situations, if your data is highly compressible, you may even end up with two level 0 backups taking less space than a single Time Machine backup, although the opposite will happen most of the time. What’s cool is that if you use compression, Amanda learns over time how compressible your data is, and adjusts its planning accordingly.

Another interesting point is that Amanda uses standard file formats for storing backups (tar and optionally gzip). This allows data to be recovered even on machines where Amanda is not present. If you navigate into a virtual tape directory on your file system and run the “head” command on one of the stored files, you’ll see something like this:
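For a back-of-the-envelope feel of the space involved, here is a small Python estimate. All the numbers are hypothetical (a 100 GB disk, 2 GB changed per run, 20 tapes holding 2 level 0 backups and 18 incrementals), and the model ignores everything the real planner does. It only multiplies sizes.

```python
def amanda_storage_gb(full_gb, incremental_gb, fulls_kept, incrementals_kept,
                      compressed_fraction=1.0):
    """Rough space needed on the backup server for one disk.

    compressed_fraction is compressed size / original size
    (1.0 means no compression at all).
    """
    raw = fulls_kept * full_gb + incrementals_kept * incremental_gb
    return raw * compressed_fraction

# Hypothetical disk: 100 GB of data, 2 GB changed per run,
# 2 level 0 backups and 18 incrementals kept on 20 tapes.
print(amanda_storage_gb(100, 2, 2, 18))          # no compression
print(amanda_storage_gb(100, 2, 2, 18, 0.25))    # data compresses to a quarter
```

Without compression, those 20 tapes hold more than twice the size of the disk. If the data compresses to a quarter of its size, the whole set takes less space than one uncompressed copy of the disk.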

AMANDA: FILE 20100715010001 serpa /Library  lev 2 comp .gz
  program /usr/bin/gnutar
DLE=<<ENDDLE
<dle>
  <program>GNUTAR</program>
  <disk>/Library</disk>
  <level>2</level>
  <auth>ssh</auth>
  <compress>FAST</compress>
  <record>YES</record>
  <index>YES</index>
  <exclude>
    <list>.amanda-exclude.list</list>
    <optional>YES</optional>
  </exclude>
</dle>
ENDDLE
To restore, position tape at start of file and run:
        dd if=<tape> bs=32k skip=1 | /usr/bin/gzip -dc |
          /usr/bin/gnutar -xpGf - ...

The first part is metadata used by Amanda. After the metadata ends, the last lines tell you how to restore the data stored in that file with standard UNIX tools. This means you don’t need to worry about being able to recover your backups if Amanda development happens to stop for some reason. As long as you have a UNIX machine, you’ll be able to restore your data.
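To show how little magic there is in that file format, here is a Python sketch that reads such a file without Amanda: it parses the first header line and lists the names stored in the archive. It relies only on what the header above shows (a 32 KB header block, then a gzip-compressed tar stream); the function names are mine, and the sketch treats the data as compressed only when the header says `comp .gz`. It is meant for peeking inside a dump, not as a replacement for the gnutar command in the restore hint, which also understands incremental archives.

```python
import gzip
import re
import tarfile

HEADER_SIZE = 32 * 1024   # matches "bs=32k skip=1" in the restore hint

def read_header(path):
    """Parse the first header line, e.g.
    'AMANDA: FILE 20100715010001 serpa /Library  lev 2 comp .gz'."""
    with open(path, "rb") as f:
        block = f.read(HEADER_SIZE)
    first_line = block.split(b"\n", 1)[0].decode("ascii", "replace")
    m = re.match(r"AMANDA: FILE (\d+) (\S+) (\S+)\s+lev (\d+) comp (\S+)",
                 first_line)
    if m is None:
        raise ValueError("not an Amanda dump file header: %r" % first_line)
    stamp, host, disk, level, comp = m.groups()
    return {"timestamp": stamp, "host": host, "disk": disk,
            "level": int(level), "compressed": comp == ".gz"}

def list_members(path):
    """List the names stored in the tar archive that follows the header."""
    info = read_header(path)
    with open(path, "rb") as f:
        f.seek(HEADER_SIZE)                       # same as dd skip=1
        stream = gzip.GzipFile(fileobj=f) if info["compressed"] else f
        with tarfile.open(fileobj=stream, mode="r|") as tar:
            return [member.name for member in tar]
```

Calling `read_header()` on a file inside a virtual tape directory gives you the host, disk and level, and `list_members()` on the same file gives you the list of paths stored in it.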

Security

There are two points I want to mention about security: transport and storage encryption, and system architecture. I’ll assume we’re always talking about having a backup server and several clients. If the problem you’re trying to solve is so simple that it can be fixed with a USB disk and Time Machine, you’re just wasting your time reading this. 🙂

Amanda supports encryption both during data transport and on data storage. Transport security is provided by using ssh with public/private key pairs. Stored data can also be encrypted, using either symmetric (secret-key) encryption or public/private key pairs. Encrypting your backups is important, especially if you store them in an offsite location (either via network transfer to a remote data center, or by physically storing hard drives or real tapes in a safe). In the event data ends up in the wrong hands, encryption will render it useless for the bad guys. Things are substantially different in Time Machine. Data storage encryption is simply nonexistent (unless you use a lower level encryption method, like PGP). Transport encryption may be provided by the AFP protocol, used by Time Machine to connect to the backup server, depending on your server configuration.

More interesting, in my opinion, is the implicit security resulting from the client/server architecture used by Amanda. A backup server should be one of the safest and best-guarded machines you have in your network. All your data will be there. If the backup server gets compromised, it means all the data from all the backed-up machines might now be in the wrong hands. You would want your backup server to run as few services as possible, and to allow access to the server (like ssh) only by trusted admins from controlled networks.

In Amanda, this is possible. As I described before, when a backup operation starts, the Amanda server will contact its clients using the method you chose in the configuration (ssh in my case). So the connection is made from the server to the client, and not the opposite. This means your users’ machines (which will naturally be less secure than your server because those pesky users will run all the crap they get from the internets!) will never access the backup server, and thus cannot compromise its security. Time Machine works in the opposite direction. There’s no concept of a “backup server”. The clients run the show, and the server is simply a file server where backups are stored. This exposes the users’ backups to whatever malware and trojan horses they may have running on their Macs. If the server is misconfigured, or if some hacker exploits an unknown vulnerability in the AFP protocol, other users’ backups may be compromised as well.

Conclusions

Amanda may be a great option to back up always-on Macs, like Xserves or desktop machines (along with Linux or FreeBSD servers, of course). It offers a very fast, reliable and secure infrastructure upon which you may build your backup system. However, it lacks the user interface and simplicity of Time Machine (most configurations require sysadmin intervention for recovering data from a backup). Also, Time Machine may be more appropriate if you rely heavily on laptops that will not be available on the network on a predictable schedule.

Amanda pros:

Works on any UNIX system (and Windows clients) which may help when planning a multi-OS backup scenario.
Offers very good control of used disk space.
Allows data encryption and compression.
Secure client/server architecture.
May be used on real tapes, not just hard drive-based backups.

Amanda cons:

Not appropriate for laptops that may be out of network reach during backup operations.
Steep learning curve, a little hard for beginners to configure and get everything running smoothly.
Unless your users are computer experts and have access to the backup server, it requires sysadmin intervention for recovering data.
Recovering a single file may take some time, because the entire tar archive has to be read until the file is found.
Backups are usually less frequent than Time Machine.

Time Machine pros:

Very simple setup.
Non-expert users may recover lost files and easily browse filesystem history using the stunning user interface, without requiring sysadmin intervention.
Works fine with laptops with intermittent network connections.
Very well integrated with Mac OS X, makes backing up and recovering very easy and part of normal usage and new machine installation workflow.

Time Machine cons:

Works only on Mac OS X, and requires a Mac OS X Server as backup server (unless you go with unsupported devices and face possible consequences).
Offers no data storage security.
Very hard/impossible to control data storage strategy and used space.
Requires clients to access server, which decreases security.
Works only with disk-based backups.

There’s no clear winner here; it highly depends on your needs and restrictions. I hope this article gave you a general idea of what Amanda is, how it works, and its advantages and weaknesses. In the next article, I’ll describe how to install a Mac OS X Amanda client, and how to recover from a catastrophic drive failure.