UCS – BAFM

UCS Manager 2.0.2r KVM bug

April 28, 2013April 28, 2013 Christian Leave a comment

Well, we’ve been battling with a KVM bug in our UCS installation, that’s been driving me (and apparently the Cisco L3 support and development) nuts. But lets back up a bit. If you’ve worked with UCS before, once you open up the KVM console you’ll see the KVM and a shortcut commands (Shutdown, Reset) and another tab that allows you to mount virtual media.

Once you open it up, it should look like this:

Now, when we re-installed some of our servers (mostly the XenServer’s) and out of a sudden the KVM virtual media didn’t work for some reason. The UCS KVM would suddenly reject us from switching to the virtual media tab, saying that either the Login timed out or we’d have the wrong user and/or password, even if we tried with the most powerful user the UCS has, the local admin account.

UCS Manager - KVM virtual media tab rejecting authentification — UCS Manager – KVM virtual media tab rejecting authentification

So I opened a TAC, and Cisco got to work on it immediately. After poking around in the depths of the fabric interconnect with a dplug extension from Cisco with a Cisco L3 guy, and after about two months of development I just got a call back from the Cisco support guy. Apparently development figured out why we’d get the above error message.

Once you put a hash tag (#) in the Service Profiles User Label you’d get the error message.

UCS Manager - User Label — UCS Manager – User Label

Once I removed the hash tag, the KVM started working like it’s supposed to do. So if anyone ever comes across this, that’s your solution. Apparently Cisco is going to fix this in an upcoming release, but just removing the hash tag and everything is fine.

UCS 5108 power redundancy lost

September 20, 2012June 21, 2013 Christian 2 Comments

Well, another day – another UCS error. Out of the blue, one of our chassis started displaying that one PSU had failed, however the UCS was showing no PSU had failed *shrug*

Well, as it turns out – this is yet another known bug in 2.0.2(r). You’ll either have to unplug and plug all the power cables (that’s four) in a maintainance window – or simply change the Equipment Power Policy (found in the Root of your UCS, tab Policy)

from “N+1” to “Non Redundant”; wait a minute or two till the error is “fixed” and then change it back to “N+1”. Problem solved for now … 😀

UCS 5108: VIF down

September 17, 2012June 21, 2013 Christian Leave a comment

Well, I have yet another weird UCS problem. I have a single blade, that has trouble with it’s primary fabric attachment.

The problem get’s even more weird, if you look at the details.

After looking at the IO modules, the error doesn’t become any clearer:

So far, I have tried nearly everything. I’ve tried resetting the active and passive Connectivity of the vNIC, I tried resetting the DCE adapter for the vNIC, but nothing. I even tried resetting the vHBA that’s associated with this fabric, but that didn’t result to anything. Not even the usual flogi (fibre channel login) errors, that you get when either booting/resetting the blade.

Well, I opened a TAC case and the Cisco engineer looked over the CLI of my fabric interconnects, hacked away at the thousand logs the UCS keeps, and asked me if I could switch the blade to another slot. However, since I don’t have any slots to spare – the five chassis are full – he said he’d start the RMA process for the MK81R mezzanine card.

The new MK81R arrived a few days later, however the issue still persisted. So the TAC engineer suggested I’d pull the IO module and plug in a “new” one (one that I knew was working). So I picked a Friday afternoon for the maintainance window (since I didn’t know if the blades would survive a IO module failover) and pulled one from my newly arrived chassis and reseated the one I knew to be faulty.

Guess what: after swapping the IO modules, the error is completely gone …. *shrug* I don’t have a clue how this error happened or why it was fixed by pulling and plugging the IO module. Guess another error for the odd category.

UCS 5108: Power problem

September 4, 2012June 21, 2013 Christian Leave a comment

Well, I recently had yet another UCS display/I2C communication problem. Somehow one of my chassis’ started to think, that the power redundancy was lost.

After looking at it a bit deeper, it seems only the GUI or the chassis did notice this power glitch:

As you can see, all PSU’s still have power. Now, since I had a big maintainance window the last weekend anyhow (and I spent ~14 hours at work), I decided to restart the IO modules in that chassis. And guess what: The error is gone! Another weird I2C communication issue with the firmware release 2.0.2 …

SLES11.1 and updated multipath-tools

April 7, 2012June 21, 2013 Christian Leave a comment

Well, after I scripted the installation the other day, I tried installing SLES11.1-Updates to the freshly installed systems. Guess what ? The thing broke. Initially (it was late Friday afternoon – like 6 PM – before my one week vacation) I didn’t have much time to debug the issue, so I sat down last week and looked at the issue.

During the installation, when first starting multipath via command line, the scsi-mpatha device appears, and each and every occurance of this is subsequentially being used (and other stuff replaced by this actually) during the whole installation phase.

But what is this multipath-tools update doing ? No clue what exactly, however after installing the update the system is bricked. The system is basically looking for /dev/scsi/by-id/scsi-disk-mpatha and waiting for this device to appear. But since the update robbed the device, the system is no longer starting.

So I went ahead and digged around in the /dev/disk/by-id directory. Turns out /dev/disk/by-id/scsi- is actually pointing to the right device, and thus I ended up using it. So I rewrote all my scripts (profile and chroot/post-chroot adjustments) as you can see below, and for now at least, I have a working installation that lets you install updates! (careful it gots electrolytes)

#!/bin/bash

cat << EOF > /etc/multipath.conf
defaults {
	user_friendly_names		no
	bindings_file			/etc/multipath_bindings
}

devices {
	device {
		vendor			"NETAPP"
		product			"LUN"
		getuid_callout		"/lib/udev/scsi_id -g -u --device=/dev/%n"
		features		"1 queue_if_no_path"
		hardware_handler	"1 alua"
		path_checker		tur
		path_selector		"round-robin 0"
		path_grouping_policy	group_by_prio
		failback		immediate
		rr_weight		uniform
		rr_min_io		128
		prio			alua
		max_fds			max
        }
}
EOF

sleep 1

# Disable the resume kernel
sed -e "s,resume=/dev/mapper/.*_part[0-9],noresume,g" -i 
	/etc/sysconfig/bootloader /boot/grub/menu.lst 

# Figure out the multipath ID
MD_ID="$( /sbin/multipath -l | head -n1 | cut -d  -f1 )"

# Fix wrong root-path in /etc/fstab
sed -i "s/mapper/.*_part/disk/by-id/scsi-${MD_ID}-part/" /etc/fstab

# Fix grub root-path and wrong root-partition
sed -i -e "s/mapper/.*_part/disk/by-id/scsi-${MD_ID}-part/" 
	-e "s,scsi-${MD_ID}-part2,scsi-${MD_ID}-part3," /boot/grub/menu.lst

# Fix the device.map
echo -e "(hd0)t/dev/disk/by-id/scsi-${MD_ID}" > /boot/grub/device.map

sleep 1
mkinitrd -f multipath

#!/bin/bash

cat << EOF > /etc/multipath.conf

defaults {

user_friendly_names no

bindings_file /etc/multipath_bindings

}

devices {

device {

vendor "NETAPP"

product "LUN"

getuid_callout "/lib/udev/scsi_id -g -u --device=/dev/%n"

features "1 queue_if_no_path"

hardware_handler "1 alua"

path_checker tur

path_selector "round-robin 0"

path_grouping_policy group_by_prio

failback immediate

rr_weight uniform

rr_min_io 128

prio alua

max_fds max

}

EOF

sleep 1

# Disable the resume kernel

sed -e "s,resume=/dev/mapper/.*_part[0-9],noresume,g" -i

/etc/sysconfig/bootloader /boot/grub/menu.lst

# Figure out the multipath ID

MD_ID="$( /sbin/multipath -l | head -n1 | cut -d -f1 )"

# Fix wrong root-path in /etc/fstab

sed -i "s/mapper/.*_part/disk/by-id/scsi-${MD_ID}-part/" /etc/fstab

# Fix grub root-path and wrong root-partition

sed -i -e "s/mapper/.*_part/disk/by-id/scsi-${MD_ID}-part/"

-e "s,scsi-${MD_ID}-part2,scsi-${MD_ID}-part3," /boot/grub/menu.lst

# Fix the device.map

echo -e "(hd0)t/dev/disk/by-id/scsi-${MD_ID}" > /boot/grub/device.map

sleep 1

mkinitrd -f multipath

Now what’s left to do for tomorrow is “fixing” the already (previous to those changes) installed systems, so we can install security updates on those too!

UCS blades w/ Boot-from-SAN and AutoYaST

March 3, 2012August 8, 2014 Christian 1 Comment

As I wrote before about enabling multipathing for the AutoYaST installation it’s about time I write this one here.

Sadly AutoYaST needs a little push in the right direction (as to where to actually put the root device), so here’s part of my AutoYaST profile for such a Cisco blade:

<profile xmlns="http://www.suse.com/1.0/yast2ns" xmlns:config="http://www.suse.com/1.0/configns">

	<bootloader>
		<device_map config:type="list">
			<device_map_entry>
				<firmware>hd0</firmware>
				<linux>/dev/sda</linux>
			</device_map_entry>
		</device_map>
	</bootloader>

	<partitioning config:type="list">
		<drive>
			<device>/dev/sda</device>
		</drive>
	</partitioning>

	<scripts>
		<pre-scripts config:type="list">

			<script>
				<debug config:type="boolean">false</debug>
				<feedback config:type="boolean">false</feedback>
				<filename>config-ucs.sh</filename>
				<interpreter>shell</interpreter>
				<source><![CDATA[
cat /tmp/profile/autoinst.xml | sed "s,/dev/sda,/dev/mapper/`/sbin/multipath -ll | grep dm-0 | cut -d  -f1`," > /tmp/profile/modified.xml
]]>
				</source>
			</script>

		</pre-scripts>

		<chroot-scripts config:type="list">

			<script>
				<chrooted config:type="boolean">true</chrooted>
				<debug config:type="boolean">true</debug>
				<feedback config:type="boolean">true</feedback>
				<filename>config-ucs-chroot.sh</filename>
				<interpreter>shell</interpreter>
				<location>http://install.home.barfoo.org/autoyast/scripts/config-ucs-chroot.sh</location>
			</script>

		</chroot-scripts>
	</scripts>

	<software>
		<packages config:type="list">
			<package>multipath-tools</package>
		</packages>
	</software>

</profile>

<device_map config:type="list">

<device_map_entry>

</device_map_entry>

</device_map>

</bootloader>

<drive>

</drive>

</partitioning>

<pre-scripts config:type="list">

<debug config:type="boolean">false</debug>

<feedback config:type="boolean">false</feedback>

<filename>config-ucs.sh</filename>

<interpreter>shell</interpreter>

<source><![CDATA[

cat /tmp/profile/autoinst.xml | sed "s,/dev/sda,/dev/mapper/`/sbin/multipath -ll | grep dm-0 | cut -d -f1`," > /tmp/profile/modified.xml

]]>

</source>

</script>

</pre-scripts>

<chroot-scripts config:type="list">

<filename>config-ucs-chroot.sh</filename>

<interpreter>shell</interpreter>

<location>http://install.home.barfoo.org/autoyast/scripts/config-ucs-chroot.sh</location>

</script>

</chroot-scripts>

</scripts>

<package>multipath-tools</package>

</packages>

</software>

</profile>

Now, the profile addition takes care of the placement of the root-device now (simply parses multipath -ll) and adjusts the pulled profile accordingly (/tmp/profile/modified.xml), which AutoYaST then re-reads.

Now, after installing the system, it’s gonna come up broken and shitty. That’s what the chroot-script above is for. This script looks like this (the original idea was here, look through the attachments):

#!/bin/bash

echo "defaults {
	user_friendly_names yes
	bindings_file /etc/multipath_bindings
}" > /etc/multipath.conf

sleep 1

# Fix wrong root-path in /etc/fstab
sed -i 's/mapper/.*_part/disk/by-id/scsi-mpatha-part/' /etc/fstab

# Fix grub root-path and wrong root-partition
sed -i -e 's/mapper/.*_part/disk/by-id/scsi-mpatha-part/' 
	-e 's,scsi-mpatha-part2,scsi-mpatha-part3,' /boot/grub/menu.lst

# Fix the device.map
echo -e "(hd0)t$( ls /dev/disk/by-id/scsi-* | grep -v part )" > /boot/grub/device.map

sleep 1
mkinitrd -f multipath

#!/bin/bash

echo "defaults {

user_friendly_names yes

bindings_file /etc/multipath_bindings

}" > /etc/multipath.conf

sleep 1

# Fix wrong root-path in /etc/fstab

sed -i 's/mapper/.*_part/disk/by-id/scsi-mpatha-part/' /etc/fstab

# Fix grub root-path and wrong root-partition

sed -i -e 's/mapper/.*_part/disk/by-id/scsi-mpatha-part/'

-e 's,scsi-mpatha-part2,scsi-mpatha-part3,' /boot/grub/menu.lst

# Fix the device.map

echo -e "(hd0)t$( ls /dev/disk/by-id/scsi-* | grep -v part )" > /boot/grub/device.map

sleep 1

mkinitrd -f multipath

What the script does, is 1) fix the occurances of /dev/mapper/, since this isn’t the proper way 2) create a multipathing initrd. Without the creation of the multipath-aware initrd the system will also not boot!

Enabling multipathing in autoyast installations

February 29, 2012June 21, 2013 Christian 1 Comment

As I mentioned before, we’re starting to utilize Boot-from-SAN as a means to strip the blades of their local disk. As the title says, after trying a manual installation of SLES 11.1 via CD/HTTP I wanted to automate the process, in order to get a reproducible, consistent installation method. As you might have figured, AutoYaST doesn’t have any built in support for configuring multipathing (hey, that’s what Novell says here). Now, they also provide a comprehensive how-to on how to “add” this to your AutoYaST, using a DUD (or Driver Update Disk).

Now, you can download the provided Driver Update Disk to any Linux box and unpack it using cpio. As the KB states, do the following:

cd /tmp/
gunzip multipath.DUD.gz
cd /Installsource/SLES11SP1-x86_64/
cpio -i </tmp/multipath.DUD

cd /tmp/

gunzip multipath.DUD.gz

cd /Installsource/SLES11SP1-x86_64/

cpio -i </tmp/multipath.DUD

And you might get the same as I do:

chrischie:(sles-vm.home.barfoo.org) PWD:/srv/instsrc/sles/11.1/x64
Wed Feb 29, 21:13:42 [0] > sudo cpio -i < /tmp/multipath.DUD
cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux: Cannot mkdir: No such file or directory
cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse: Cannot mkdir: No such file or directory
cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse/x86_64-sles11: Cannot mkdir: No such file or directory
cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse/x86_64-sles11/inst-sys: Cannot mkdir: No such file or directory
cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse/x86_64-sles11/inst-sys/usr: Cannot mkdir: No such file or directory
cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse/x86_64-sles11/inst-sys/usr/lib: Cannot mkdir: No such file or directory
cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse/x86_64-sles11/inst-sys/usr/lib/YaST2: Cannot mkdir: No such file or directory
cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse/x86_64-sles11/inst-sys/usr/lib/YaST2/startup: Cannot mkdir: No such file or directory
cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse/x86_64-sles11/inst-sys/usr/lib/YaST2/startup/hooks: Cannot mkdir: No such file or directory
cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse/x86_64-sles11/inst-sys/usr/lib/YaST2/startup/hooks/preFirstStage-old: Cannot mkdir: No such file or directory
cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse/x86_64-sles11/inst-sys/usr/lib/YaST2/startup/hooks/preFirstStage-old/S01start_multipath.sh: Cannot open: No such file or directory
cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse/x86_64-sles11/inst-sys/usr/lib/YaST2/startup/hooks/preFirstCall: Cannot mkdir: No such file or directory
cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse/x86_64-sles11/inst-sys/usr/lib/YaST2/startup/hooks/preFirstCall/S01start_multipath.sh: Cannot open: No such file or directory
8 blocks

chrischie:(sles-vm.home.barfoo.org) PWD:/srv/instsrc/sles/11.1/x64

Wed Feb 29, 21:13:42 [0] > sudo cpio -i < /tmp/multipath.DUD

cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux: Cannot mkdir: No such file or directory

cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse: Cannot mkdir: No such file or directory

cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse/x86_64-sles11: Cannot mkdir: No such file or directory

cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse/x86_64-sles11/inst-sys: Cannot mkdir: No such file or directory

cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse/x86_64-sles11/inst-sys/usr: Cannot mkdir: No such file or directory

cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse/x86_64-sles11/inst-sys/usr/lib: Cannot mkdir: No such file or directory

cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse/x86_64-sles11/inst-sys/usr/lib/YaST2: Cannot mkdir: No such file or directory

cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse/x86_64-sles11/inst-sys/usr/lib/YaST2/startup: Cannot mkdir: No such file or directory

cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse/x86_64-sles11/inst-sys/usr/lib/YaST2/startup/hooks: Cannot mkdir: No such file or directory

cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse/x86_64-sles11/inst-sys/usr/lib/YaST2/startup/hooks/preFirstStage-old: Cannot mkdir: No such file or directory

cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse/x86_64-sles11/inst-sys/usr/lib/YaST2/startup/hooks/preFirstStage-old/S01start_multipath.sh: Cannot open: No such file or directory

cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse/x86_64-sles11/inst-sys/usr/lib/YaST2/startup/hooks/preFirstCall: Cannot mkdir: No such file or directory

cpio: /NETSTORE/Installsource/SLES11SP1-x86_64/linux/suse/x86_64-sles11/inst-sys/usr/lib/YaST2/startup/hooks/preFirstCall/S01start_multipath.sh: Cannot open: No such file or directory

8 blocks

Now, after creating the path in question (just mkdir -p /NETSTORE/Installsource/SLES11SP1-x86_64) and retrying the cpio, you’ll get the following:

chrischie:(sles-vm.home.barfoo.org) PWD:/srv/instsrc/sles/11.1/x64
Wed Feb 29, 21:15:44 [0] > sudo cpio -i < /tmp/multipath.DUD
8 blocks

chrischie:(sles-vm.home.barfoo.org) PWD:/srv/instsrc/sles/11.1/x64

Wed Feb 29, 21:15:44 [0] > sudo cpio -i < /tmp/multipath.DUD

8 blocks

Next, according to the KB article is moving the linux directory to your actual install location (in my case /srv/instsrc/sles/11.1/x64) and then booting your system in question. And guess what you get ? Nil (as in the setup starts, but /sbin/multipath isn’t being called — which is nothing in my setup).

Next I tried copying the cpio-image (/tmp/multipath.DUD) to my install location as driverupdate (I know the SLES setup is pulling that file), which produced a warning about an unsigned driverupdate file (as it isn’t in ./content) — which can be circumvented by adding Insecure: 1 to your Info file (or passing insecure=1 as linuxrc parameter) — but after pressing “Yes” produced yet again Nil (still no call to /sbin/multipath).

After about three hours of fiddling with the original DUD (sadly the UCS blades are painfully slow to reboot — takes them about six minutes each), I decided to repack the Driver Update Disk. The Update Media HOWTO explains the structure/layout of the DUD pretty well, but fails to mention what kind of image it actually is or how to create it. Luckily there’s Google and the Internets.

The guys over at OPS East Blog, posted something that helped me create the DUD.

Basically, create /tmp/update-media and copy/move the linux folder into this folder.

mkdir /tmp/driver-update
cp -r /NETSTORE/Installsource/SLES11SP1-x86_64/linux /tmp/update-media
chown -R root.root /tmp/update-media

mkdir /tmp/driver-update

cp -r /NETSTORE/Installsource/SLES11SP1-x86_64/linux /tmp/update-media

chown -R root.root /tmp/update-media

After this, we create a Driver Update Disk configration.

echo "UpdateName: Enable multipathing config on AutoYaST boot" 
  > /tmp/update-media/linux/suse/x86_64-sles11/dud.config
echo "UpdateID: autoyast_multipath_1" 
  >> /tmp/update-media/linux/suse/x86_64-sles11/dud.config

echo "UpdateName: Enable multipathing config on AutoYaST boot"

> /tmp/update-media/linux/suse/x86_64-sles11/dud.config

echo "UpdateID: autoyast_multipath_1"

>> /tmp/update-media/linux/suse/x86_64-sles11/dud.config

Now, we create the DUD package.

mkfs.cramfs /tmp/update-media /tmp/driverupdate

1	mkfs.cramfs /tmp/update-media /tmp/driverupdate

This produces a CramFS image named /tmp/driverupdate (which you can view using mount -o loop). After moving this image to my install location and keeping the filename driverupdate, /sbin/multipath is actually being called as you can see below.