Archive for the ‘Mac OS X Server’ Category

Upgrading Mac OS X Leopard Server

Friday, July 11th, 2008

I was expecting this one. Fortunately, I was ready for it. So, I’ll describe some details of upgrading Leopard Server from 10.5.2 to 10.5.4 to save you the same trouble.

If you read my previous article about Leopard Server, you know that I had to recompile MySQL, PHP and Apache for 32-bit PowerPC, because I could not use the 64-bit versions on our G5 server. The story was a bit complicated, but the idea was that I needed both PostgreSQL and MySQL support in PHP. As the default MySQL installation on OS X Server doesn’t include the shared libraries, I had to recompile MySQL to be able to recompile PHP. But I could not get MySQL to compile for 64 bits, so I compiled everything (including Apache) for 32 bits.

So I knew what would happen: as soon as I upgraded the server, the installer would replace my Apache with Apple’s newer build, which runs in 64-bit mode on the G5, and it would refuse to start because my PHP and MySQL were compiled for 32 bits only. And that is exactly what happened.

How do you fix it? Simple. You have to do two things: first, remove the 64-bit PowerPC (ppc64) part from the httpd binary; second, recompile your customized version of PHP, because the update replaces PHP too.

To remove the ppc64 part of the binary, you don’t need to recompile Apache as I did before. There are two very useful commands for this: file and lipo. If you run the following commands:


cd /usr/sbin
file httpd

The result will be:


httpd: Mach-O universal binary with 4 architectures
httpd (for architecture ppc7400): Mach-O executable ppc
httpd (for architecture ppc64): Mach-O 64-bit executable ppc64
httpd (for architecture i386): Mach-O executable i386
httpd (for architecture x86_64): Mach-O 64-bit executable x86_64

As you can see, Apple ships httpd compiled for all four of its supported architectures. In my case, I want to remove the ppc64 one to force httpd to run in 32-bit mode. That’s where lipo comes in handy:


lipo -remove ppc64 -output httpd32 httpd
mv httpd httpd64
mv httpd32 httpd

That’s it. lipo writes a new binary without the ppc64 part, and then I just swap the two files. Running file on the new binary shows what is left, and there you go, Apache running in 32-bit mode on PowerPC:


httpd32: Mach-O universal binary with 3 architectures
httpd32 (for architecture ppc7400): Mach-O executable ppc
httpd32 (for architecture i386): Mach-O executable i386
httpd32 (for architecture x86_64): Mach-O 64-bit executable x86_64
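Because every update that replaces Apache brings the ppc64 slice back, it helps to have the three commands above wrapped in something that is safe to re-run. A minimal sketch, assuming Apple’s file and lipo tools are on the PATH; strip_ppc64 is a hypothetical helper name, and the httpd32/httpd64 naming simply mirrors the manual steps:

```shell
# Hypothetical helper: remove the ppc64 slice from a universal binary so it
# runs in 32-bit mode on a G5, keeping the untouched original as "<name>64".
# Assumes Apple's `file` and `lipo` are on PATH; run as root for /usr/sbin/httpd.
strip_ppc64() {
    bin=$1
    # No ppc64 slice (e.g. already thinned)? Do nothing, so the function
    # can safely be run again after every system update.
    if ! file -b "$bin" | grep -qw ppc64; then
        echo "$bin: no ppc64 slice, nothing to do"
        return 0
    fi
    # Write the thinned copy first, and only swap files if lipo succeeded.
    lipo -remove ppc64 -output "${bin}32" "$bin" || return 1
    mv "$bin" "${bin}64" || return 1
    mv "${bin}32" "$bin" || return 1
    echo "$bin: ppc64 slice removed, original kept as ${bin}64"
}
```

After sourcing the file, running strip_ppc64 /usr/sbin/httpd as root does what the three commands above do; PHP still has to be rebuilt separately.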

Then I just needed to recompile PHP and everything was back on track.

Also, be careful: the 10.5.4 update will upgrade WebObjects to 5.4.2. If you are still using WO 5.3 like me, you’ll have to reinstall 5.3, as described in Mike’s document.

Migrating to Leopard Server

Sunday, March 23rd, 2008

This was it. I spent the Easter weekend migrating GAEL’s Xserve to Leopard Server. It all went well, although a few issues, some more serious than others, popped up.

Our server is used mainly for hosting web content and applications (PHP, Perl, and of course, WebObjects). It also handles some Subversion repositories and some other minor stuff.

I followed the standard procedure I always use when migrating a machine to a new OS version: clone the hard drive to an external FireWire disk, format the internal hard drive, install the new OS and migrate the data. I usually use Migration Assistant, but this is a server, so of course it’s a little more complicated than that.

While my memory is still fresh, here are some notes about it, in no particular order.

RAID Formatting

Our Xserve has two 80 GB hard drives configured in software RAID 1 (mirroring). As Apple sometimes tweaks the RAID software and drivers between releases, I decided to destroy the RAID and create a new one. So I booted from the Leopard Server DVD, destroyed the RAID and tried to make a new one. I had some problems with that, though: Disk Utility would not let me create the new RAID. I don’t recall the exact error, but it had something to do with not being able to mount a volume or RAID slice. I quit Disk Utility, launched it again, and then the RAID creation went fine. I just love to see those two blue LEDs blinking in sync!

Installation and System Updates

Installation itself went without any issues. Our Xserve has a graphics card, so it was like any regular desktop Mac: click, click, choose, click and wait. The system installed correctly, rebooted, went through the setup assistants, I answered all the questions, the network was working, etc. Perfect. Then I went to grab all the system updates. As I installed it this weekend, I had a few updates waiting, namely the 10.5.2 combo update. Then something weird happened - after installing all the available updates, the machine rebooted 3 times in a row instead of just once. I know some recent updates require 2 reboots in a row, but I had never seen 3. When the server finally came to life, I manually rebooted it 2 or 3 more times, just to see if it was booting OK. Apparently, everything is fine. I checked the logs, and they were inconclusive. So, does anyone know if 3 reboots in a row is normal with all the updates released so far for a G5 Xserve running Leopard?

SSH

This is a quick one, but… sshd ships with PermitRootLogin defaulting to “yes”. Oh, come on, guys!
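To see what a given sshd_config actually does, it helps to remember two rules from the sshd_config manual page: keywords are case-insensitive, and the first value found for a keyword is the one sshd uses. A minimal sketch built on those rules; root_login_setting is a hypothetical helper name, and it ignores Match blocks and the Keyword=value form. On Leopard the file lives at /etc/sshd_config:

```shell
# Hypothetical helper: print the PermitRootLogin value sshd will use from
# the given config file. Lines starting with '#' are comments (they only
# document defaults); the first real PermitRootLogin line wins.
# Not handled: Match blocks and the "Keyword=value" syntax.
root_login_setting() {
    awk '
        $1 ~ /^#/ { next }
        tolower($1) == "permitrootlogin" { print $2; found = 1; exit }
        END { if (!found) print "not set (compiled-in default applies)" }
    ' "$1"
}
```

If root_login_setting /etc/sshd_config prints yes, or reports that nothing is set, adding a PermitRootLogin no line before any other PermitRootLogin line closes that door.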

User Migration

This is one of the most serious issues I find with Mac OS X Server migrations. I had seen it when migrating from Panther Server to Tiger Server, and it’s still a problem. The thing is: you cannot migrate passwords. You can use Workgroup Manager to export all the user information… except passwords. That means all the user passwords have to be reset on the new server. Of course, I don’t expect the real passwords to be exported, especially because they are stored hashed, so they cannot be recovered. But the hashes themselves could be exported and imported again.

This presents a very serious issue to system administrators and users. Of course, if you have thousands of users, you should use multiple LDAP servers dedicated to authentication, which you can clone at will, making sure that you never lose information and the service never stops. But when you have about 30 users like we do, that is overkill. Even so, it’s a real pain in the ass to reset all those passwords, because some users are not in our office. They are external users, either from the university’s other campus (although that’s not too bad: I actually live closer to that campus than to the one I work in, so I can drive by and take care of it) or, worse, people at companies that are working remotely with us.

I believe migrations like this should be transparent to the user, and this little detail makes them very, very opaque.

64 bits hell

Having a full 64 bits OS running on a 64 bits machine can only be a good thing, right? Well… maybe not.

I’m a little crazy, and my organizational skills might be very well described by the word “chaos”, but I’m not crazy enough to do this in the space of two days without having tested all this stuff first and documented the important details. So, before wiping our G5 Xserve, I grabbed an old PowerMac G4, installed Leopard Server and all the stuff that really needs to work. The most experienced of you should be smiling by now. Although, for the matters we are discussing, the only important difference between the two CPUs seems to be speed, there’s a really important one: 32 bits vs. 64 bits. The G5 is a full 64-bit CPU, and the G4 is 32-bit. Up to Tiger, this was not a problem at all, because most of the OS, including most services like databases and Apache, ran in 32 bits. On Leopard, everything (or close to it) is compiled for four different architectures: 32- and 64-bit PowerPC, and 32- and 64-bit Intel. We’ll come back to this in a minute.

Mac OS X Server is bundled with MySQL, PHP and Apache, but not with PostgreSQL. As I far prefer PostgreSQL to MySQL, I tend to use PostgreSQL with every application I can, including my own WebObjects applications. So, I compiled and installed PostgreSQL on the server. As I also need PHP applications to access PostgreSQL databases, I had to download the PHP source code and recompile it with PostgreSQL support (you gotta love a language where you have to recompile the whole damn thing to add support for a DB…). But to compile PHP with MySQL (and PostgreSQL) support, I need the MySQL headers and dynamic libraries. Well, Mac OS X Server is bundled with the MySQL binaries, but not the headers or libraries. As there were no Mac OS X 10.5 PowerPC binaries available on the MySQL site, I also had to grab the source and recompile all this stuff.

This is where the problems started. I recompiled MySQL and got it working after some struggle (I really hate MySQL). Then I recompiled PHP, installed it, added the LoadModule directive to the Apache config file, and restarted Apache. Boom. Explosions. Apache would not start. It said that the PHP module was compiled for the wrong architecture. I started to think: WTF, are you telling me that my Xserve just compiled PHP… for Intel? Why did this work on the test G4 box? Well, what other architecture could it be? :P I started googling for the problem and I got it: Apache is compiled for all four architectures I mentioned above, and it always runs the most appropriate one for the machine. On the Xserve, that is the 64-bit PowerPC binary. The problem is that PHP had been compiled for 32 bits only. OK, no problem. Go to the PHP dir, make clean, poke around with the environment variables, recompile the thing for 64 bits. Boom. More explosions. Guess what, MySQL was NOT compiled for 64 bits! OK, OK, one more level down the stack: go to the MySQL directory, blah blah blah, recompile and… BOOM! Yet another explosion. This one was more complicated. Apparently some of the libraries in the MySQL source package were not being compiled for 64 bits. So, no 64-bit MySQL means no 64-bit PHP, which means no PHP that runs with a 64-bit Apache, which means falling back to Apple’s branded PHP, which means… no PostgreSQL.

From what I saw on the Net, convincing MySQL to compile in 64 bits was not a road I wanted to go down. Also, one of the pages I found about the “wrong architecture” error when starting Apache actually suggested going in the opposite direction: grab the Apache source code and recompile it in 32 bits. Using the configure command mentioned there (./configure --enable-layout=Darwin --enable-mods-shared=all), I compiled exactly the same Apache version that Apple bundles with Leopard Server, and installed it over the Apple-branded one. That made it all work, now in 32 bits. Of course, if you follow this trick, keep in mind that it may break with future system updates: if an Apple update replaces Apache, it will not start unless you recompile it again for 32 bits only, or remove the PHP module.
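Since the same failure will come back after any update that touches Apache, it is worth checking a module against httpd before restarting. A minimal sketch, assuming file -b names the architectures the way it does in the file httpd output above; has_arch and check_module are hypothetical helper names, and only one architecture is checked (ppc64 by default, the one Apache runs on a G5):

```shell
# Hypothetical helpers to catch the "wrong architecture" error early.
# httpd runs the slice that best matches the CPU (ppc64 on a G5), so any
# module it loads must contain that slice as well.

# Succeeds if `file -b` mentions the given architecture (thin or universal).
has_arch() {
    file -b "$1" | grep -qw "$2"
}

# Compare httpd and a module for one architecture (default: ppc64).
check_module() {
    httpd=$1
    module=$2
    arch=${3:-ppc64}
    if has_arch "$httpd" "$arch" && ! has_arch "$module" "$arch"; then
        echo "mismatch: $httpd has $arch, $module does not"
        return 1
    fi
    echo "ok"
}
```

For example, check_module /usr/sbin/httpd /usr/libexec/apache2/libphp5.so; the module path is an assumption, so use whatever path your LoadModule line points at.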

This 64-bit mess is actually a very nasty problem, and it makes me wonder what I’m actually gaining from all this. And the answer is: nothing. My server has one GB of RAM, and will probably never have more than 4. If, for some reason, we ever need that much memory, it certainly won’t be because of Apache. It makes me wonder how many people actually need Apache to run in 64-bit mode. If it’s more than 1% or 2% of Xserve users, I’ll be amazed. And what do I lose? A lot. Not all open source projects compile easily in 64-bit mode (I know the MySQL that comes with Mac OS X Server is compiled for 64 bits, but for some reason the fixes needed for that are not in the public MySQL source tree), Apache may stop working altogether with the next system update, and I had a lot of extra work. Maybe Apple should provide an easy way to switch this kind of stuff between 32- and 64-bit mode at will. Having a single OS build for all architectures is interesting, but solving the problems it creates is not.

Wrapping up

Everything is working now, after an entire weekend spent behind many terminal windows. Unfortunately, I have to say that my opinion of Mac OS X Server is not the best. I have been working with FreeBSD lately. My experience with FreeBSD is way, way smaller than my experience with Mac OS X, so there are probably many downsides to FreeBSD that I have not had to deal with yet. That being said, I think Mac OS X Server is a very easy OS to use, as long as you stick to the tools Apple provides. As soon as you need different tools, especially ones that tinker with Apache, you’ll start regretting ever liking computers in the first place. And surprisingly, you start finding that it’s actually easier to do it on a FreeBSD server. Every piece of software I have installed on FreeBSD so far (including WebObjects) went in easily, straightforwardly and painlessly. Just browse the ports tree, make install clean and there it is. No crazy problems; everything is made to work with everything. And the default configurations are usually safer than Apple’s.

It makes sense: although the FreeBSD guys don’t do beautiful GUIs and assistants, they work hard to make sure the system works. All of it, including all the ports. And most important, not only does it work, it works together. If I had to pick one word to define FreeBSD, I would pick “consistency” without hesitation. Even WebObjects, which does not have an “official” port in the FreeBSD ports tree, actually installs more easily on FreeBSD than on OS X (thanks to the hard work of Quinton Dolan, who created a FreeBSD port of WO). And face it: probably all the software you need exists in the ports tree. It’s HUGE. And if it doesn’t, you can always install it the classic UNIX way.

The Apple way is different. Apple picks a very small range of software, compiles it and packages it in a very easy-to-use OS. It’s really easy, way more than FreeBSD in many ways. The problems appear when you conclude that the bundled software is not enough, and you want to install your own. When that happens, you are completely on your own. You’ll start fighting Apple’s sometimes weird configurations and file system layout, you may run into binary architecture incompatibilities like I did, and so on. And you’ll probably need to do this, because what comes bundled with OS X Server is probably far from enough to get the job done.

Testing memory

Sunday, August 12th, 2007

I wrote some days ago about badblocks for testing a hard drive surface. Now, the same for memory.

As I said, I bought a second-hand PowerMac G5 to replace my old G4. When I got the new machine, I ran Apple Hardware Test (AHT), using the Extended Test. AHT tested my hardware, including the 2.5 GB of RAM, taking more than two hours (and making a hell of a noise, because during the tests the G5 ventilation system runs in failsafe mode, which means full power). Everything seemed to be fine. Until I installed Retrospect. I use Retrospect for all my backups at home, and despite all its quirks, it always worked fine on the G4. But since I installed it on the G5, I got strange errors (the famous “internal consistency check”) and even crashes.

After ruling out all the other possibilities (trashing preferences and existing backup sets, reinstalling Retrospect, etc.), I suspected it could be a hardware problem, because I was told that the “internal consistency check” error appears when the backup set contents are corrupted. So, I thought, my hard drive is corrupting data. I duplicated one backup set of about 80 GB, and surprise: after duplicating it and running an md5 checksum on both copies (and diff), the files were different! Naturally, this was NOT supposed to happen. So I tried the same thing on my boot drive: same problem. Oops… it’s not the drives. And if it’s not the drives, and supposing (more exactly, praying) that it was not a motherboard issue, it must be the memory.
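The duplicate-and-compare test is easy to script and repeat, which helps because memory errors can come and go. A minimal sketch; copy_and_verify is a hypothetical helper name, and it uses the POSIX cksum command instead of md5 (which is called md5 on Mac OS X and md5sum on most other systems); a CRC is enough to notice a copy that differs from its source:

```shell
# Hypothetical helper: copy a file several times and compare each copy's
# checksum with the original's. Prints one line per pass and returns 1 if
# any pass differs. If copies come out different on more than one drive,
# the drives themselves are unlikely to be the culprit.
copy_and_verify() {
    src=$1
    passes=${2:-3}
    want=$(cksum < "$src")
    bad=0
    i=1
    while [ "$i" -le "$passes" ]; do
        cp "$src" "$src.copy" || return 1
        if [ "$(cksum < "$src.copy")" = "$want" ]; then
            echo "pass $i: match"
        else
            echo "pass $i: MISMATCH"
            bad=1
        fi
        rm -f "$src.copy"
        i=$((i + 1))
    done
    return "$bad"
}
```

For example, copy_and_verify /path/to/large/file 10 makes ten copies next to the original, one at a time, so it needs free space for one extra copy.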

All my colleagues on the IST System Administration team use memtest to test the memory on PCs. This great distribution of memtest has a really nice touch: you can burn it on a CD and boot the PC from it. It takes less than 200K of RAM, so all the other memory gets tested. Unfortunately, it’s not possible to boot it on Macs (not even Intel Macs - I tried it!). So, you must get the Mac OS X version of memtest, and boot the OS in Single User mode (holding Command-S during the boot sequence). The OS will take about 50 MB of RAM, which is, of course, much worse than the 200K used by the PC version, because those 50 MB will simply not be tested. But it’s better than nothing.

The official site for the Mac OS X version of memtest is here, but unfortunately, the author requires you to pay a small amount for the download. I don’t like the approach very much, because I don’t really know what I’m buying. The author says that, after you pay, he sends a password for the encrypted DMG you downloaded. But I cannot download without paying, because the download link is nowhere to be found. So… what happens when a new version comes out? Do I have to pay again? Well, anyway, someone else is distributing memtest for OS X for free. Yes, it’s legal, because the software is under the GNU license. So, if you don’t want to pay, just click here and grab your own free copy. Happy testing!

By the way, some tests take a lot of time. Let all of them run. Don’t assume that because all the “quick” tests passed, your memory is OK. Some problems may only be found by the more complex and slower tests; that’s why they are there. So, let it run. And if you have a G5, get the hell out of there, or use earplugs. It won’t be a nice office to work in during testing, trust me.

memtest will detect lots of common memory problems, and will probably identify more than 99% of the defective memory modules around. But never forget: it’s impossible to be entirely sure that a memory module is OK, simply because it’s not possible to test all the possible combinations of data in a reasonable time frame. Also, memory may pass all the tests one day, and fail the next. Many factors may trigger a hidden problem in memory modules: temperature, electrical fluctuations, the data it contains, age, etc. If you suspect you have a bad memory module, and you have the time, run memtest for several days in a row, using the option to do many passes.

Backups, rsync, and --link-dest not working

Sunday, May 20th, 2007

I use Retrospect to back up most of the machines at GAEL. You may wonder why I use a commercial tool that still shows its OS 9 roots, instead of open source alternatives. Well, Retrospect has some cool advantages (namely its very good support for laptops that may be disconnected abruptly from the network while a backup is in progress). Also, when I first did this setup, Amanda and other tools did not work reliably with the Mac OS X file format.

While this works for backing up all the desktop workstations and laptops of GAEL members, I have a problem with our Xserve. It runs Mac OS X Server, and with the license we have, Retrospect will not back up machines running the Server version of Mac OS X. To do that, we would have to buy a much more expensive license.

No problem. A server, due to its nature, doesn’t have the “sudden disappearing” problem of laptops, so I can use a “classic” UNIX approach - and my choice was rsync with the --link-dest option. You can read about this option in the rsync manpage, but in case you don’t know, here is what it does: instead of synchronizing a directory in the usual way, it creates a new directory with a new file tree. But, to save space, it won’t copy the files that haven’t changed since the old tree was made. Instead, it creates hard links, so that both entries in the file system point to the same data on the hard drive (the same inode), thus saving space. So, every time you update your backup, you create a new tree, but you only use the space required by the files that were updated since the last backup, plus some more space for the filesystem structures that support the directory tree. You can use a command like this:


# Rotate the old snapshots: drop the oldest and shift the others up by one
rm -rf /Volumes/Storage/test/test.5
mv /Volumes/Storage/test/test.4 /Volumes/Storage/test/test.5
mv /Volumes/Storage/test/test.3 /Volumes/Storage/test/test.4
mv /Volumes/Storage/test/test.2 /Volumes/Storage/test/test.3
mv /Volumes/Storage/test/test.1 /Volumes/Storage/test/test.2
mv /Volumes/Storage/test/test.0 /Volumes/Storage/test/test.1

# Make the new snapshot in test.0; files unchanged since test.1 become hard links
/usr/bin/rsync --rsync-path=/usr/bin/rsync -az -E -e ssh \
    --exclude=/dev/\* --exclude=/private/tmp/\* --exclude=/Network/\* \
    --exclude=/Volumes/\* --exclude=/private/var/run/\* --exclude=/afs/\* \
    --exclude=/automount/\* --exclude=/.Spotlight-V100/\* \
    --link-dest="/Volumes/Storage/test/test.1" \
    "root@my.machine.com:/Users/arroz/TestDirectory" "/Volumes/Storage/test/test.0/"

Side note: the -E option (capital E) exists in Apple’s version of rsync on Mac OS X, and forces rsync to copy all the extended Mac file system attributes, including resource forks. It only exists in Mac OS X 10.4 (Tiger) or newer. If you are still using 10.3 (Panther) or older, use RsyncX. Do not use RsyncX with Tiger.

Until about a week ago, my backup machine (an old PowerMac G4) had an external 640 GB SCSI RAID and an internal RAID 0 (2 × 80 GB drives), besides the boot disk. All the Retrospect backups went to the external RAID, and the server backups went to the internal RAID 0. Now, I know it’s living on the edge to back up to a RAID 0. But there was really no more space, and it was a temporary situation, because the new drives for the external RAID were already ordered.

When the new drives arrived, I stored all the backups wherever I could for a few days (640 GB was huge when we purchased the RAID, but today it is relatively manageable), swapped the drives and created a fresh RAID 5. I formatted it with HFS+, copied back all the backups, including the server backups, and finally trashed the internal RAID 0.

After some path adjustments to my server backup scripts, we were back in business. But the RAID free space was shrinking dramatically every day. I used the ‘ls -i’ command to compare the inode numbers of files that should have been unchanged between one day’s backup and the next, and as I suspected, rsync was duplicating all the files instead of hard-linking them.
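Instead of comparing inode numbers file by file with ls -i, the same check can be made over a whole snapshot: with --link-dest, a file that did not change is a hard link into the previous snapshot, so its link count is at least 2. A minimal sketch using only find; unlinked_files and count_unlinked are hypothetical helper names:

```shell
# Hypothetical helpers: list the files in a snapshot that are NOT hard
# links to anything else (link count 1), i.e. files rsync copied instead
# of linking. Normally only files changed since the last backup appear.
unlinked_files() {
    find "$1" -type f -links 1
}

# Count them, e.g. for a daily report (tr strips the padding some wc add).
count_unlinked() {
    unlinked_files "$1" | wc -l | tr -d ' '
}
```

Running count_unlinked /Volumes/Storage/test/test.0 after each backup should give a number close to the number of files changed that day; if it is close to the total file count, rsync is copying instead of linking.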

After Googling a lot, I could not find any answers. I tried to see if ‘cp -la’ would successfully create hard links, but to my surprise, I found out that the Mac OS X built-in ‘cp’ command does not support the “l” option. Nice. Before installing the GNU ‘cp’ (and because I’m lazy and didn’t want to do that), I started thinking about everything I had done since the new drives arrived. The OS was the same, the rsync command was the same, and it worked before, so it had to work now. The only possible reason for it not to work was that rsync, somehow, thought all the files were changing, even when they were not.
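For the record, a hard-link copy of a tree (what GNU cp -al does) can be made without GNU cp, using find and ln. A minimal sketch; link_tree is a hypothetical helper name, it only handles directories and regular files (symbolic links and special files are skipped), and it assumes no file names contain newlines:

```shell
# Hypothetical helper: recreate the directory tree of $1 under $2,
# hard-linking every regular file instead of copying it.
link_tree() {
    src=$1
    dst=$2
    # Directories first, so every link has a parent directory to live in.
    (cd "$src" && find . -type d) | while IFS= read -r d; do
        mkdir -p "$dst/$d"
    done
    # Then one hard link per regular file: both names share one inode.
    (cd "$src" && find . -type f) | while IFS= read -r f; do
        ln "$src/$f" "$dst/$f"
    done
}
```

Comparing ls -i output for a file in the source and in the copy should then show the same inode number.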

Suddenly, the solution popped into my head. Mac OS X has an option, associated with every HFS+ volume, called “Ignore ownership on this volume”. It is turned off by default on the boot drive, but turned on by default on every external drive you format. There’s a good reason for this: Mac OS X is a consumer product. Average users want to buy an external drive, store data on it, take it to another Mac, and read their data. They don’t care whether their UID is the same on both machines or not.

But this causes serious problems for rsync. Although the file system stores the owner of the files, it probably won’t report it to the applications that read it (or it will mask it as the user who is accessing the files). Somewhere between the file system and application layers, this information is filtered. So, rsync was not getting the real UID of the files. As the files coming from the server had real UIDs, the two UIDs wouldn’t match, and rsync would create a new copy because, from its point of view, the file had changed.

The solution was simple: I went to the machine’s console and did a “Get Info” on the external volume. I turned off the “Ignore ownership on this volume” setting, and rsync started operating normally again.