Ulcer-free Solaris Upgrades

As any admin knows, OS upgrades can be painful. Despite the best intentions and efforts by the vendor, bad things happen. This is especially true with kernel upgrades. It’s one thing to run a blanket “yum update” or “apt-get upgrade” on your average Linux desktop, but even the tamest patch can ascend to truly ulcer-inducing levels when it applies to a critical system that can ill afford any extended downtime.

It was with such trepidation in our hearts today that we at $DAYJOB faced the necessity of patching a critical server running Solaris 10 11/06. The impending patch was for the kernel, and it was a dependency for almost all the other outstanding patches. It came with a big, fat, hairy disclaimer that it might do anything and everything short of setting the system on fire, just so we were aware. Gee, thanks. Granted, this was not really uber-critical, because the system in question isn’t in production yet, but its current state represents many man-hours of careful application setup and tuning, not to mention data import (its primary function is serving a half-terabyte Oracle instance). Having to start over from scratch would be a significant setback. Furthermore, once it is in production, you can bet we’ll one day be in the same position, and then it really will be critical to minimize downtime.

Enter Live Upgrade. Very briefly, LU enables the administrator to either copy the existing boot environment, or install a completely new environment, to an alternate location, all without disturbing the running system. We knew about LU and had saved a slice during jumpstart that matched the size of /. This would give us the chance to duplicate our boot environment, apply the necessary patches, and do a test boot on the updated copy, while retaining the unpatched version, in case the proverbial mass hit the cooling device.
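For the curious, the copy-patch-activate cycle described above looks roughly like the sketch below. It is a dry run: the function only prints the commands rather than executing them. The boot environment names, the spare slice, the patch directory, and the patch ID are all made-up placeholders, not the values from our system.

```shell
# Dry-run sketch of a Live Upgrade patch cycle: prints commands, runs nothing.
# Every argument value used below is a hypothetical placeholder.

lu_patch_plan() {
    cur_be=$1      # name to assign to the running boot environment
    new_be=$2      # name for the duplicate boot environment
    lu_slice=$3    # spare slice reserved for Live Upgrade
    patch_dir=$4   # directory holding the unpacked patches
    patch_id=$5    # patch to apply to the inactive copy

    # Copy the running / onto the spare slice as a new boot environment.
    echo "lucreate -c $cur_be -n $new_be -m /:$lu_slice:ufs"
    # Apply the patch to the inactive copy, not to the live system.
    echo "luupgrade -t -n $new_be -s $patch_dir $patch_id"
    # Mark the patched copy as the one to boot next.
    echo "luactivate $new_be"
    # Use init or shutdown here; a plain reboot skips the BE switch.
    echo "init 6"
}

lu_patch_plan s10_1106 s10_patched /dev/dsk/c0t0d0s4 /var/tmp/patches 123456-01
```

If the test boot goes badly, the way back is to run luactivate against the original boot environment name and do another init 6.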

Our filesystem layout did not leave enough free slices to duplicate /var and /opt, however. Not to worry, these directories actually had far less data in them than we originally budgeted for, so we could merge all three into our one LU slice for the test. What followed was very nearly a textbook application of LU. It worked like a charm, and the patches turned out to work just fine. Ulcer avoided!
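Folding separate filesystems into the copy is something lucreate can do itself, using the "merged" keyword in place of a device. A sketch of how such a command line could be assembled follows; again it only prints the command, and the BE name and slice are hypothetical.

```shell
# Dry-run sketch: build an lucreate command that merges extra filesystems
# into the new root. BE name and slice below are hypothetical placeholders.

lu_merge_cmd() {
    new_be=$1      # name for the new boot environment
    lu_slice=$2    # single slice that will hold everything
    shift 2        # remaining arguments are mount points to merge

    cmd="lucreate -n $new_be -m /:$lu_slice:ufs"
    for mp in "$@"; do
        # "merged" tells lucreate to fold this filesystem into its parent.
        cmd="$cmd -m $mp:merged:ufs"
    done
    echo "$cmd"
}

lu_merge_cmd s10_patched /dev/dsk/c0t0d0s4 /var /opt
```

The catch, as we found, is that the target slice has to be big enough to hold the combined contents of everything being merged.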

We chose not to make this the official layout of the system, because /var was still part of /, which is sub-optimal. Another stroke of fortune (or incredible foresight… yeah, that’s it) was that we had made all four slices (/, /var, /opt, LU) the same size. We could stand having /opt folded into /, because we’d already installed all the significant apps and knew it wasn’t going to grow much larger. So, giddy with success, we devised a clever scheme to redo the layout with two slices for the OS and two identical slices for future LU use. It involved another round of LU to retire the old /, /var, and /opt slices and build a boot environment that combined / and /opt on the first slice and put /var on the second, leaving the third to become a future LU location. When the dust settled, we had the layout we wanted, with two fresh LU slices for future acid-reflux relief, and the total downtime was between 20 and 30 minutes. Very cool.

Looking to the future, the prospects for LU seem even brighter with the advent of root-on-ZFS (available as of Solaris Express b62). If you have everything on ZFS, then you don’t need spare slices for anything: a snapshot becomes your LU environment. Break it, and you just roll back. Even if it’s not quite that simple, it’s still a quantum leap beyond what we can do today in mainline Solaris. I can’t wait.
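To make the snapshot-and-rollback idea concrete, here is a sketch using the ordinary zfs snapshot and zfs rollback subcommands. It is a dry run that only prints the commands, the dataset and snapshot names are invented, and how this will actually fit together with booting from a ZFS root is an open question, so treat it as an illustration of the concept rather than a procedure.

```shell
# Dry-run sketch of snapshot-before-patch on ZFS: prints commands only.
# Dataset and snapshot names below are hypothetical placeholders.

zfs_patch_plan() {
    root_ds=$1     # dataset holding the root filesystem
    snap=$2        # label for the pre-patch snapshot

    # Capture the pre-patch state; no spare slice needed.
    echo "zfs snapshot $root_ds@$snap"
    # Only if the patched system misbehaves: return to the snapshot.
    echo "zfs rollback $root_ds@$snap"
}

zfs_patch_plan rpool/root pre_kernel_patch
```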
