Multipath – BAFM

XenServer 6.0.2: Fixing Root-Disk-Multipathing with Boot-from-SAN

July 16, 2013July 17, 2013 Christian Leave a comment

As the title pretty much tells, I’ve been working on fixing the Root-Disk-Multipathing feature of our XenServer installations. Our XenServer boot from a HA-enabled NetApp controller, however we recently noticed that during a controller fail-over some, if not all, paths would go offline and never come back. If you do a cf takeover and cf giveback in short succession, you’ll end up with a XenServer host that is unusable, as the Root-Disk would be pretty much non-responsive.

Guessing from that, there don’t seem to be that many people using XenServer with Boot-from-SAN. Otherwise Citrix/NetApp would have fixed that by now…. Anyhow, I went around digging in our XenServer’s. What I already did, was adjust the /etc/multipath.conf according to a bug report (or TR-3373). For completeness sake I’ll list it here:

# Multipathing configuration for XenServer on NetApp ALUA
# enabled storage.
# TR-3732, revision 5

defaults {
        user_friendly_names no
        queue_without_daenon no
        flush_on_last_del yes
}

## some vendor specific modifications
devices {
        device {
                vendor "NETAPP"
                product "LUN"
                path_grouping_policy group_by_prio
                features "1 queue_if_no_path"
                getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
                prio_callout "/sbin/mpath_prio_alua /dev/%n"
                path_checker directio
                failback immediate
                hardware handler "0"
                rr_weight uniform
                rr_min_io 128
        }
}

# Multipathing configuration for XenServer on NetApp ALUA

# enabled storage.

# TR-3732, revision 5

defaults {

user_friendly_names no

queue_without_daenon no

flush_on_last_del yes

}

## some vendor specific modifications

devices {

device {

vendor "NETAPP"

product "LUN"

path_grouping_policy group_by_prio

features "1 queue_if_no_path"

getuid_callout "/sbin/scsi_id -g -u -s /block/%n"

prio_callout "/sbin/mpath_prio_alua /dev/%n"

path_checker directio

failback immediate

hardware handler "0"

rr_weight uniform

rr_min_io 128

}

And as it turns out, this is the reason why we’re having such difficulties with the Multipathing. The information in TR-3373 is a bunch of BS (no, not everything but a single path is wrong, the getuid_callout) and thus the whole concept of Multipathing, Failover and High-Availibility (yeah, I know – if you want HA, don’t use XenServer :P) is gone.

SLES11.1 and updated multipath-tools

April 7, 2012June 21, 2013 Christian Leave a comment

Well, after I scripted the installation the other day, I tried installing SLES11.1-Updates to the freshly installed systems. Guess what ? The thing broke. Initially (it was late Friday afternoon – like 6 PM – before my one week vacation) I didn’t have much time to debug the issue, so I sat down last week and looked at the issue.

During the installation, when first starting multipath via command line, the scsi-mpatha device appears, and each and every occurance of this is subsequentially being used (and other stuff replaced by this actually) during the whole installation phase.

But what is this multipath-tools update doing ? No clue what exactly, however after installing the update the system is bricked. The system is basically looking for /dev/scsi/by-id/scsi-disk-mpatha and waiting for this device to appear. But since the update robbed the device, the system is no longer starting.

So I went ahead and digged around in the /dev/disk/by-id directory. Turns out /dev/disk/by-id/scsi- is actually pointing to the right device, and thus I ended up using it. So I rewrote all my scripts (profile and chroot/post-chroot adjustments) as you can see below, and for now at least, I have a working installation that lets you install updates! (careful it gots electrolytes)

#!/bin/bash

cat << EOF > /etc/multipath.conf
defaults {
	user_friendly_names		no
	bindings_file			/etc/multipath_bindings
}

devices {
	device {
		vendor			"NETAPP"
		product			"LUN"
		getuid_callout		"/lib/udev/scsi_id -g -u --device=/dev/%n"
		features		"1 queue_if_no_path"
		hardware_handler	"1 alua"
		path_checker		tur
		path_selector		"round-robin 0"
		path_grouping_policy	group_by_prio
		failback		immediate
		rr_weight		uniform
		rr_min_io		128
		prio			alua
		max_fds			max
        }
}
EOF

sleep 1

# Disable the resume kernel
sed -e "s,resume=/dev/mapper/.*_part[0-9],noresume,g" -i 
	/etc/sysconfig/bootloader /boot/grub/menu.lst 

# Figure out the multipath ID
MD_ID="$( /sbin/multipath -l | head -n1 | cut -d  -f1 )"

# Fix wrong root-path in /etc/fstab
sed -i "s/mapper/.*_part/disk/by-id/scsi-${MD_ID}-part/" /etc/fstab

# Fix grub root-path and wrong root-partition
sed -i -e "s/mapper/.*_part/disk/by-id/scsi-${MD_ID}-part/" 
	-e "s,scsi-${MD_ID}-part2,scsi-${MD_ID}-part3," /boot/grub/menu.lst

# Fix the device.map
echo -e "(hd0)t/dev/disk/by-id/scsi-${MD_ID}" > /boot/grub/device.map

sleep 1
mkinitrd -f multipath

#!/bin/bash

cat << EOF > /etc/multipath.conf

defaults {

user_friendly_names no

bindings_file /etc/multipath_bindings

}

devices {

device {

vendor "NETAPP"

product "LUN"

getuid_callout "/lib/udev/scsi_id -g -u --device=/dev/%n"

features "1 queue_if_no_path"

hardware_handler "1 alua"

path_checker tur

path_selector "round-robin 0"

path_grouping_policy group_by_prio

failback immediate

rr_weight uniform

rr_min_io 128

prio alua

max_fds max

}

EOF

sleep 1

# Disable the resume kernel

sed -e "s,resume=/dev/mapper/.*_part[0-9],noresume,g" -i

/etc/sysconfig/bootloader /boot/grub/menu.lst

# Figure out the multipath ID

MD_ID="$( /sbin/multipath -l | head -n1 | cut -d -f1 )"

# Fix wrong root-path in /etc/fstab

sed -i "s/mapper/.*_part/disk/by-id/scsi-${MD_ID}-part/" /etc/fstab

# Fix grub root-path and wrong root-partition

sed -i -e "s/mapper/.*_part/disk/by-id/scsi-${MD_ID}-part/"

-e "s,scsi-${MD_ID}-part2,scsi-${MD_ID}-part3," /boot/grub/menu.lst

# Fix the device.map

echo -e "(hd0)t/dev/disk/by-id/scsi-${MD_ID}" > /boot/grub/device.map

sleep 1

mkinitrd -f multipath

Now what’s left to do for tomorrow is “fixing” the already (previous to those changes) installed systems, so we can install security updates on those too!

IBM RDAC and Windows Cluster Service

May 28, 2008June 21, 2013 Christian 1 Comment

Okay, so we received a brand new x3650 the other day entitled to replace one (or better two) of our NAS frontend servers. We installed Windows on it the other day (had to create a custom Windows Server 2003 CD first, since the default one doesn’t recognize the integrated ServeRAID), and we prepped the box during the week with the usual things.

On Monday I started installing the “IBM StorageManager RDAC” MultiPath driver (since the box got two single port PCIe FC-HBA’s) and figured I’d be nice if we had this. I asked a IBM Systems Engineer of one of our partners, which told me generally there wouldn’t be a problem with Microsoft Cluster Services (MSCS) and the IBM MPIO driver. Only requirement would be that I’d install the new storport.sys driver (version 5.2.3790.4021) first (as in Microsoft KB932755).

Now, yesterday I finished the zoning, did the mappings on the storage arrays and then figured the box should see the hard disks. So I started adding another node to our existing Microsoft Cluster.

Result: Zip (as in MSCS telling me not all nodes could see the quorum disk)

Reason: a combination of two things. First, said IBM Storage Manager RDAC. The first time I installed it, I forgot about the storage mappings, thus the box seeing zero disks. After uninstalling it, I was seeing 121 (that’s right, one hundred and twenty one) new devices.

Visible volumes previous to installing the RDAC driver

That is basically a result of the zoning I did for this particular device, which has *all* controllers present in a single SAN zone, thus the HBA’s seeing devices eight (or nine) times .. Update: yes, I’m missing one controller … 😀

Now, as I reinstalled the RDAC *after* the host discovered the volumes, it’s showing only a dozen drives.

Visible volumes after installing the RDAC driver

Now, as I figured this out, I told myself “Hey, adding the third node to the Windows Cluster should now work without a clue …” … guess what ?

It’s Microsoft and it doesn’t. Now why doesn’t it work ? ‘Cause the Cluster Setup Wizard is getting confused in Typical mode, as it’s creating a “local quorum disk” which naturally isn’t present in the cluster it’s joining. Now, switching the wizard to “Advanced (minimum) configuration” as suggested in Q331801, just works … *shrug*