SLES11 and AutoYaST

After the first week passed awfully quickly, I spent the last week refining the way we do openSUSE / SUSE Linux Enterprise Server installations. Up till now, they were done by hand (without a predefined schema) and were getting ugly to maintain. Working my way through the Novell documentation on AutoYaST was pretty straightforward, but the little details were getting hairy. So I decided to write them down, in case someone ends up in the same situation as me.

1. Understanding the folder layout
This isn’t a debate about how you should organize your file system(s); this is a reminder of how to design the AutoYaST profile … If (and probably when) you’re using rules, the layout of the profile directory mentioned on your autoyast PXE line should look like this:

AutoYast folder structure

Keep in mind that profiles mentioned in your rules.xml need to be placed in the parent folder, not in the rules folder itself.
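As a plain-text sketch (every name except rules/ and rules.xml is hypothetical):

```
autoyast/                 <- the directory your PXE line points at
├── rules/
│   └── rules.xml         <- must carry exactly this name
├── master.xml            <- profiles referenced from rules.xml go
└── sles11.xml               here, one level above rules/
```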

2. Rules might not work as you think (i.e. installed_product)
If you work your way through AutoYaST, you’re gonna get to the point where you can’t get around rules. Rules match specific parameters of the target system to a (sub)set of configuration parameters. Now, say you want to create a rule for systems on which you are about to install SUSE Linux Enterprise Server 11. The documentation tells us to use something along these lines:

You’re quickly gonna notice that your installation fails. Why? Because Novell decided to drop the documented schema and use something else … The installed_product tag, when performing an SLES11 installation, doesn’t contain ‘SUSE Linux Enterprise Server‘ as you might think, but rather ‘SUSE Linux Enterprise Server 11‘ … notice the difference? Add to that the default behaviour: if no match_type tag is given, AutoYaST matches the exact phrase. So the correct AutoYaST snippet should look like this:
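A sketch of the rule in question; the profile name and the continue flag are examples, only the product string matters here:

```xml
<?xml version="1.0"?>
<autoinstall xmlns="http://www.suse.com/1.0/yast2ns"
             xmlns:config="http://www.suse.com/1.0/configns">
  <rules config:type="list">
    <rule>
      <installed_product>
        <!-- note the trailing "11"; without it the (default)
             exact match never fires -->
        <match>SUSE Linux Enterprise Server 11</match>
        <match_type>exact</match_type>
      </installed_product>
      <result>
        <profile>sles11.xml</profile>
        <continue config:type="boolean">true</continue>
      </result>
    </rule>
  </rules>
</autoinstall>
```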

Another hint when trying to match SUSE Linux Enterprise Server 10 — say you wanna deploy SLES10 SP2. Make sure you use a regex match, since — as you might have guessed, I didn’t at first — the installed_product tag seems to contain SUSE Linux Enterprise Server 10 SP2 ❗
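So for SLES10, a regex match along these lines should catch the service-pack suffix (profile name again hypothetical):

```xml
<rule>
  <installed_product>
    <!-- matches "SUSE Linux Enterprise Server 10 SP2" and friends -->
    <match>SUSE Linux Enterprise Server 10.*</match>
    <match_type>regex</match_type>
  </installed_product>
  <result>
    <profile>sles10.xml</profile>
    <continue config:type="boolean">true</continue>
  </result>
</rule>
```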

3. Asking questions during the installation
If you’re in need of asking questions (like “Please enter the hostname” or “Please supply the root password”), you’re gonna need a bit of understanding as well as a few tricks:
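The mechanism for this is the ask-list in the general section of the profile. A minimal sketch, with the paths and stages taken from the AutoYaST documentation rather than from my actual profile:

```xml
<general>
  <ask-list config:type="list">
    <ask>
      <question>Please enter the hostname</question>
      <path>networking,dns,hostname</path>
      <stage>initial</stage>
    </ask>
    <ask>
      <question>Please supply the root password</question>
      <!-- password=true gives a masked input plus confirmation -->
      <password config:type="boolean">true</password>
      <path>users,0,user_password</path>
      <stage>initial</stage>
    </ask>
  </ask-list>
</general>
```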

4. Working with scripts
Once you get to the point where you want to embed scripts into the AutoYaST profile, you might end up in a situation where YaST literally shoots you in the foot. If you (like me) utilize different profiles and define a script in each, YaST is gonna overwrite the tags from the other profile. This sucks, and I ended up placing a dummy script into my master profile.
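Roughly, the workaround looks like this (filename and contents are of course arbitrary) — a no-op script in the master profile, so the merge has something to overwrite without clobbering a real one:

```xml
<scripts>
  <post-scripts config:type="list">
    <script>
      <filename>dummy.sh</filename>
      <source><![CDATA[#!/bin/sh
# no-op placeholder; the real scripts live in the merged profiles
exit 0
]]></source>
    </script>
  </post-scripts>
</scripts>
```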

5. Debugging the rules merging process

Well, this is quite troublesome. There is no easy way to do this: no GUI or script telling you what the resulting AutoYaST XML is gonna look like. There are a few aids (like the xsltproc call mentioned in the FAQ, or looking at the output of /var/log/YaST2/y2log concerning your rules.xml), but other than that there really is just fiddling with it.
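For reference, the xsltproc call goes roughly like this; the stylesheet path is from memory, so check where your autoyast2 package installs merge.xslt:

```shell
# merge b.xml into a.xml the same way AutoYaST would
xsltproc --novalid \
    --param replace "'false'" \
    --param with "'/tmp/profiles/b.xml'" \
    /usr/share/autoinstall/xslt/merge.xslt /tmp/profiles/a.xml
```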

Right now, I’m stuck trying to determine whether or not the system YaST is about to install happens to be inside a VMware virtual machine (simply looking at the MAC address should suffice).

But again, as mentioned in the second point, Novell apparently lacks some consistency. The handbook states the rules tag mac would contain the MAC address … apparently not, the mac tag is empty … Which leads us to using a custom rules script fiddling with ip, a bit of grep and a simple echo, which works quite well. Just make sure that you redirect everything except the result of your script to /dev/null, otherwise you’re gonna sit there trying to figure out why the hell the script works on the command line, but not within an AutoYaST installation …
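A sketch of such a script — the function name and the interface name are mine, and the OUI list covers the VMware prefixes I know of (00:50:56, 00:0c:29, 00:05:69). AutoYaST takes the script’s stdout as the value to match, hence the redirects:

```shell
#!/bin/sh
# Return 0 if the given MAC starts with a known VMware OUI.
is_vmware_mac() {
    echo "$1" | grep -qiE '^(00:50:56|00:0c:29|00:05:69)'
}

# Everything except the final result goes to /dev/null -- AutoYaST
# reads the script's stdout as the rule value.
mac=$(ip link show eth0 2>/dev/null | awk '/ether/ { print $2 }')

if is_vmware_mac "$mac"; then
    printf 'vmware'
else
    printf 'physical'
fi
```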

Loooong time

It’s been very quiet around here; I’ve been rather busy with my real life. During that busy time, a lot of things happened. I switched jobs starting on October 1st, and I’m now working in Karlsruhe (as compared to Greifswald, 870 km further north). It may sound far, but it’s actually quite pleasant. You know, I was born down here (well not exactly here — 70 kilometers away) and I still had the feeling that this is my home.

My tasks haven’t changed that much, I’m still doing

  • VMware Virtual Infrastructure (as compared to vSphere)
  • IBM Storage / Brocade SAN (was IBM Storage / Cisco SAN)
  • Storage Virtualisation Controller (we were just buying that before I left)
  • SUSE Linux Enterprise Server 10/11 – Deployment and Management (is pretty much the same as before)

What I don’t do any longer is Windows. That isn’t completely right per se, since Virtual Center only runs on Windows boxen, but pretty much my whole work focuses on Linux and Storage. As I argued in the interview, it’s a step ahead, since I’m specializing in a certain direction (whether or not that works out — I can’t tell yet. Time is gonna tell me that).

In my first week I spent some time getting to know the co-workers and working my head into the SVC (I already had a somewhat theoretical — and practical — insight, but not deep enough to actually make do with it). Next on my list is the AutoYaST environment for SLES boxen / Kickstart for ESX(i), which (hopefully) enables us to standardize server installations using common schemas and partition layouts.

Also on the list is building a two-node test environment for the SVC, so we don’t break the live environment with tests we might be doing. After that comes some accounting work, to put the billing of resource utilization (based on vCPU/vMem) on a solid, up-to-date foundation.

New IBM RDAC version (or not)

A week ago (September 02nd), I received a mail detailing the release of IBM’s new multipathing device driver for the DS4x00 series, which finally works with SLES11 (the available software up till now doesn’t — as in fails with kernels > 2.6.26 iirc).

ESC+ notification detailing the release

There wouldn’t be any trouble if IBM (or rather the vendor providing the driver — LSI) had actually released the driver … up till today, I have yet to see the new version appear on the download page. I already tried to notify IBM about the trouble, but as usual there is a lack of ways to actually get this to the right person.

Well, IBM just replied to my feedback and apparently the download is available (it is right now, after two weeks hah — finally).

Tivoli Storage Manager Server 5.5.3

I spent yesterday afternoon upgrading our TS7530, and while I was at it I also upgraded TSM to 5.5.3. Once I started TSM, it quickly started complaining about the paths to the drives.

I thought maybe this was a mere device problem (we have had those before), so I rebooted the boxes. Still no luck, and I went home after about an hour of trying. In the morning, my co-worker called our trustworthy IBM service partner, and the TSM consultant said he had had the exact same problem yesterday. We had two options:

  1. Enable the option SANDISCOVERY, with the (completely undocumented) Passive setting (setopt SANDISCOVERY PASSIVE)
  2. Downgrade back to 5.5.2

For now, we implemented the first option, in the hope that it’ll solve our troubles. And it actually does.

Mass-updating Tivoli Storage Manager drive status

I was fighting with our VTL again, and TSM thought all the drives were offline. In order to update the drive status, you’d need to go into the ISC, select each drive and set it to ONLINE. Since I’m a bit click-lazy, I wrote a simple nested for-loop, which gives me the output to update all the drives at once:

Result is a list like this:

The same goes for mass-updating the path status:

Result is a list like this:
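Reconstructing the idea with hypothetical library, drive and server names (the real names obviously differ): the loops just print one TSM UPDATE command per drive or path, which you can then paste into dsmadmc or run as a macro.

```shell
#!/bin/sh
# Library, drive and server names below are placeholders.

# One "update drive" line per drive:
for lib in VTL1 VTL2; do
    for n in 0 1 2 3 4 5 6 7; do
        echo "update drive $lib drive$n online=yes"
    done
done

# Same pattern for the paths (source = the TSM server instance):
for lib in VTL1 VTL2; do
    for n in 0 1 2 3 4 5 6 7; do
        echo "update path TSMSRV drive$n srctype=server desttype=drive library=$lib online=yes"
    done
done
```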

IBM RSA II adapter and Java RE (fini)

If you remember back to July, I looked into some troubles I had with the IBM RSA II adapter’s Java interface and the latest JRE updates. I just noticed that IBM released a new firmware for the RSA yesterday. The ChangeLog states this:

Version 1.13, GFEP35A
Problem(s) Fixed:

* Suggested
o Fix for Remote Control General Exception in JRE 1.6 update 12 and above.
o Corrected a problem that DHCP renew/release may fail after a long time.
o Corrected a problem that remote control preference link disapears after creating new key buttons.
o Corrected a problem that cause event number shows only from 0 to 255 when views RSA log via telnet session.

As you can see, IBM finally decided that it isn’t a Sun problem but rather their own! Finally, after about 4 months a fix, yay!

Even if the fix is just for the x3550 for now, it puts a light at the end of the tunnel and raises hope that they’re gonna fix it for the other RSA adapters too!

rpc.statd starting before portmap

One problem gone, another one turns up. When rpc.statd (nfs-common) tries to start before portmap, it’s gonna fail. The logfile (/var/log/daemon.log) prints a rather cryptic error message:

After fixing the start order (I really hate *SUSE*/Debian* for not having init-script dependencies — like Gentoo’s baselayout/Roy’s openrc does), everything is as it should be and I’m able to put the /srv/xen mount into the fstab.
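On Debian, fixing the start order boils down to reordering the rcN.d symlinks so nfs-common starts after portmap; the sequence numbers here are just an example:

```shell
# portmap usually sits at S20; push nfs-common behind it
update-rc.d -f nfs-common remove
update-rc.d nfs-common start 21 2 3 4 5 . stop 19 0 1 6 .
```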

OFED packages for Debian

As I mentioned yesterday, I’m currently doing some project work. Said project includes InfiniBand technology.

Apparently we bought a “cheap” InfiniBand switch, which comes without a subnet manager. So, in order to communicate between the nodes, you need to install the subnet manager (opensm in my case) on each node.

In order to utilize the InfiniBand interface you need to do a few things first though:

  1. Obviously install the opensm package
  2. Add ib_umad and ib_ipoib to /etc/modules
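On Debian the two steps boil down to something like this (package and module names as above; the init script name is an assumption):

```shell
apt-get install opensm            # step 1: the subnet manager

# step 2: load the umad and IPoIB modules at boot time
cat >> /etc/modules <<'EOF'
ib_umad
ib_ipoib
EOF

modprobe -a ib_umad ib_ipoib      # or simply reboot
/etc/init.d/opensm start
```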

After installing opensm on the host as well as in the NFS root, opensm comes up just fine and the network starts automatically. The only trouble right now is that ISC’s DHCP doesn’t support InfiniBand, otherwise I could even utilize DHCP to distribute the IP addresses.

Xen dom0 failing with kernel panic

I’m building a 6-node cluster using Xen at the moment. For the last few days, I tried my setup in a virtual machine, simply because VMs boot much faster than the real hardware. However, certain things you can only replicate on the real hardware (for example the InfiniBand interfaces, as well as certain NFS stuff).

So I spent most of the day replicating my configuration onto the hardware. After getting everything done, the moment of the first boot … kaput! It doesn’t boot, it just keeps hanging before loading the real kernel. Now what? I removed the Xen vga parameters and rebooted (waiting ~2 minutes in the process) until I finally saw the root cause of my trouble:

I was like *wtf* … My tftp setup _worked_ inside the VMs, why ain’t it working here? A quick look at the pxelinux.cfg for the MAC address revealed this:

As you can see, I had assigned 64M to the dom0, which apparently wasn’t enough. After raising the memory limit to 256M, everything is hunky-dory!
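For reference, the relevant part of a pxelinux.cfg entry for a Xen boot; labels and file names here are generic, the point is the dom0_mem parameter:

```
LABEL xen
  KERNEL mboot.c32
  APPEND xen.gz dom0_mem=256M --- vmlinuz-xen root=/dev/nfs --- initrd-xen
```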

TS7530 authentication failure

Today, I had a rather troublesome morning. Once I got to work, Nagios was already complaining about lin_taped on one of our TSM servers, which apparently failed due to too many SCSI resets. Additionally, I couldn’t log in using the VE console (I could, however, log in using SSH), so I ended up opening an IBM Electronic Service Call (ESC+).

Using SSH, I can get some information on the VE’s status:

After looking a bit deeper, it seems that neither of the two TSM servers is able to see the IBMchanger devices for the first VTL. The second is perfectly visible, just not the first. After putting both VE nodes into suspended failover and gathering support data for the IBM support from both VEs and the Brocade SAN switches, apparently everything works again. I guess the library does have “self-healing” properties.