Cisco – BAFM

MDS9100 firmware updates – generating copy commands

October 20, 2013October 20, 2013 Christian Leave a comment

Well, I went to work today … yeah, I know it’s Sunday right ? I ended up updating two MDS9148 switches and I didn’t want to figure out everything all over again. So I put the system image and kickstart onto one of our FTP servers and ran a short bash loop on it:

root:(ftp.daheim.heimdaheim.de) PWD:~
Sun Oct 20, 08:57:24 [0] > for file in `ls /srv/ftp/firmware/mds9148/*6.2.3*`; do
echo "copy ftp://10.0.0.55/`echo $file | sed 's,/srv/ftp/,,'` bootflash:/`echo $file | sed 's,/srv/ftp/firmware/mds9148/,,'`"
done

root:(ftp.daheim.heimdaheim.de) PWD:~

Sun Oct 20, 08:57:24 [0] > for file in `ls /srv/ftp/firmware/mds9148/*6.2.3*`; do

echo "copy ftp://10.0.0.55/`echo $file | sed 's,/srv/ftp/,,'` bootflash:/`echo $file | sed 's,/srv/ftp/firmware/mds9148/,,'`"

done

Now that’ll generate me two lines, which in turn I can use on the MDS’n:

copy ftp://10.0.0.55/firmware/mds9148/m9100-s3ek9-kickstart-mz.6.2.3.bin bootflash:/m9100-s3ek9-kickstart-mz.6.2.3.bin
copy ftp://10.0.0.55/firmware/mds9148/m9100-s3ek9-mz.6.2.3.bin bootflash:/m9100-s3ek9-mz.6.2.3.bin

1 2	copy ftp://10.0.0.55/firmware/mds9148/m9100-s3ek9-kickstart-mz.6.2.3.bin bootflash:/m9100-s3ek9-kickstart-mz.6.2.3.bin copy ftp://10.0.0.55/firmware/mds9148/m9100-s3ek9-mz.6.2.3.bin bootflash:/m9100-s3ek9-mz.6.2.3.bin

UCS Manager 2.0.2r KVM bug

April 28, 2013April 28, 2013 Christian Leave a comment

Well, we’ve been battling with a KVM bug in our UCS installation, that’s been driving me (and apparently the Cisco L3 support and development) nuts. But lets back up a bit. If you’ve worked with UCS before, once you open up the KVM console you’ll see the KVM and a shortcut commands (Shutdown, Reset) and another tab that allows you to mount virtual media.

Once you open it up, it should look like this:

Now, when we re-installed some of our servers (mostly the XenServer’s) and out of a sudden the KVM virtual media didn’t work for some reason. The UCS KVM would suddenly reject us from switching to the virtual media tab, saying that either the Login timed out or we’d have the wrong user and/or password, even if we tried with the most powerful user the UCS has, the local admin account.

UCS Manager - KVM virtual media tab rejecting authentification — UCS Manager – KVM virtual media tab rejecting authentification

So I opened a TAC, and Cisco got to work on it immediately. After poking around in the depths of the fabric interconnect with a dplug extension from Cisco with a Cisco L3 guy, and after about two months of development I just got a call back from the Cisco support guy. Apparently development figured out why we’d get the above error message.

Once you put a hash tag (#) in the Service Profiles User Label you’d get the error message.

UCS Manager - User Label — UCS Manager – User Label

Once I removed the hash tag, the KVM started working like it’s supposed to do. So if anyone ever comes across this, that’s your solution. Apparently Cisco is going to fix this in an upcoming release, but just removing the hash tag and everything is fine.

MDS9000: Setting summer time for CET

October 17, 2012June 21, 2013 Christian Leave a comment

After rebuilding two MDS9148, I wanted them to correctly switch the summer/winter time for my time zone. Currently I’m in CET (or CEST during the summer), so I googled for that. The search came up with Cisco-FAQ, however that needed a slight adjustment.

Apparently the NXOS doesn’t support the feature “recurring” in the clock configuration. So I had to slightly adapt the configuration line:

clock timezone MET 2 0
clock summer-time MET 5 Sun Mar 02:00 5 Sun Oct 03:00 60

1 2	clock timezone MET 2 0 clock summer-time MET 5 Sun Mar 02:00 5 Sun Oct 03:00 60

UCS 5108 power redundancy lost

September 20, 2012June 21, 2013 Christian 2 Comments

Well, another day – another UCS error. Out of the blue, one of our chassis started displaying that one PSU had failed, however the UCS was showing no PSU had failed *shrug*

Well, as it turns out – this is yet another known bug in 2.0.2(r). You’ll either have to unplug and plug all the power cables (that’s four) in a maintainance window – or simply change the Equipment Power Policy (found in the Root of your UCS, tab Policy)

from “N+1” to “Non Redundant”; wait a minute or two till the error is “fixed” and then change it back to “N+1”. Problem solved for now … 😀

UCS 5108: VIF down

September 17, 2012June 21, 2013 Christian Leave a comment

Well, I have yet another weird UCS problem. I have a single blade, that has trouble with it’s primary fabric attachment.

The problem get’s even more weird, if you look at the details.

After looking at the IO modules, the error doesn’t become any clearer:

So far, I have tried nearly everything. I’ve tried resetting the active and passive Connectivity of the vNIC, I tried resetting the DCE adapter for the vNIC, but nothing. I even tried resetting the vHBA that’s associated with this fabric, but that didn’t result to anything. Not even the usual flogi (fibre channel login) errors, that you get when either booting/resetting the blade.

Well, I opened a TAC case and the Cisco engineer looked over the CLI of my fabric interconnects, hacked away at the thousand logs the UCS keeps, and asked me if I could switch the blade to another slot. However, since I don’t have any slots to spare – the five chassis are full – he said he’d start the RMA process for the MK81R mezzanine card.

The new MK81R arrived a few days later, however the issue still persisted. So the TAC engineer suggested I’d pull the IO module and plug in a “new” one (one that I knew was working). So I picked a Friday afternoon for the maintainance window (since I didn’t know if the blades would survive a IO module failover) and pulled one from my newly arrived chassis and reseated the one I knew to be faulty.

Guess what: after swapping the IO modules, the error is completely gone …. *shrug* I don’t have a clue how this error happened or why it was fixed by pulling and plugging the IO module. Guess another error for the odd category.

UCS 5108: Power problem

September 4, 2012June 21, 2013 Christian Leave a comment

Well, I recently had yet another UCS display/I2C communication problem. Somehow one of my chassis’ started to think, that the power redundancy was lost.

After looking at it a bit deeper, it seems only the GUI or the chassis did notice this power glitch:

As you can see, all PSU’s still have power. Now, since I had a big maintainance window the last weekend anyhow (and I spent ~14 hours at work), I decided to restart the IO modules in that chassis. And guess what: The error is gone! Another weird I2C communication issue with the firmware release 2.0.2 …

UCS blades w/ Boot-from-SAN and AutoYaST

March 3, 2012August 8, 2014 Christian 1 Comment

As I wrote before about enabling multipathing for the AutoYaST installation it’s about time I write this one here.

Sadly AutoYaST needs a little push in the right direction (as to where to actually put the root device), so here’s part of my AutoYaST profile for such a Cisco blade:

<profile xmlns="http://www.suse.com/1.0/yast2ns" xmlns:config="http://www.suse.com/1.0/configns">

	<bootloader>
		<device_map config:type="list">
			<device_map_entry>
				<firmware>hd0</firmware>
				<linux>/dev/sda</linux>
			</device_map_entry>
		</device_map>
	</bootloader>

	<partitioning config:type="list">
		<drive>
			<device>/dev/sda</device>
		</drive>
	</partitioning>

	<scripts>
		<pre-scripts config:type="list">

			<script>
				<debug config:type="boolean">false</debug>
				<feedback config:type="boolean">false</feedback>
				<filename>config-ucs.sh</filename>
				<interpreter>shell</interpreter>
				<source><![CDATA[
cat /tmp/profile/autoinst.xml | sed "s,/dev/sda,/dev/mapper/`/sbin/multipath -ll | grep dm-0 | cut -d  -f1`," > /tmp/profile/modified.xml
]]>
				</source>
			</script>

		</pre-scripts>

		<chroot-scripts config:type="list">

			<script>
				<chrooted config:type="boolean">true</chrooted>
				<debug config:type="boolean">true</debug>
				<feedback config:type="boolean">true</feedback>
				<filename>config-ucs-chroot.sh</filename>
				<interpreter>shell</interpreter>
				<location>http://install.home.barfoo.org/autoyast/scripts/config-ucs-chroot.sh</location>
			</script>

		</chroot-scripts>
	</scripts>

	<software>
		<packages config:type="list">
			<package>multipath-tools</package>
		</packages>
	</software>

</profile>

<device_map config:type="list">

<device_map_entry>

</device_map_entry>

</device_map>

</bootloader>

<drive>

</drive>

</partitioning>

<pre-scripts config:type="list">

<debug config:type="boolean">false</debug>

<feedback config:type="boolean">false</feedback>

<filename>config-ucs.sh</filename>

<interpreter>shell</interpreter>

<source><![CDATA[

cat /tmp/profile/autoinst.xml | sed "s,/dev/sda,/dev/mapper/`/sbin/multipath -ll | grep dm-0 | cut -d -f1`," > /tmp/profile/modified.xml

]]>

</source>

</script>

</pre-scripts>

<chroot-scripts config:type="list">

<filename>config-ucs-chroot.sh</filename>

<interpreter>shell</interpreter>

<location>http://install.home.barfoo.org/autoyast/scripts/config-ucs-chroot.sh</location>

</script>

</chroot-scripts>

</scripts>

<package>multipath-tools</package>

</packages>

</software>

</profile>

Now, the profile addition takes care of the placement of the root-device now (simply parses multipath -ll) and adjusts the pulled profile accordingly (/tmp/profile/modified.xml), which AutoYaST then re-reads.

Now, after installing the system, it’s gonna come up broken and shitty. That’s what the chroot-script above is for. This script looks like this (the original idea was here, look through the attachments):

#!/bin/bash

echo "defaults {
	user_friendly_names yes
	bindings_file /etc/multipath_bindings
}" > /etc/multipath.conf

sleep 1

# Fix wrong root-path in /etc/fstab
sed -i 's/mapper/.*_part/disk/by-id/scsi-mpatha-part/' /etc/fstab

# Fix grub root-path and wrong root-partition
sed -i -e 's/mapper/.*_part/disk/by-id/scsi-mpatha-part/' 
	-e 's,scsi-mpatha-part2,scsi-mpatha-part3,' /boot/grub/menu.lst

# Fix the device.map
echo -e "(hd0)t$( ls /dev/disk/by-id/scsi-* | grep -v part )" > /boot/grub/device.map

sleep 1
mkinitrd -f multipath

#!/bin/bash

echo "defaults {

user_friendly_names yes

bindings_file /etc/multipath_bindings

}" > /etc/multipath.conf

sleep 1

# Fix wrong root-path in /etc/fstab

sed -i 's/mapper/.*_part/disk/by-id/scsi-mpatha-part/' /etc/fstab

# Fix grub root-path and wrong root-partition

sed -i -e 's/mapper/.*_part/disk/by-id/scsi-mpatha-part/'

-e 's,scsi-mpatha-part2,scsi-mpatha-part3,' /boot/grub/menu.lst

# Fix the device.map

echo -e "(hd0)t$( ls /dev/disk/by-id/scsi-* | grep -v part )" > /boot/grub/device.map

sleep 1

mkinitrd -f multipath

What the script does, is 1) fix the occurances of /dev/mapper/, since this isn’t the proper way 2) create a multipathing initrd. Without the creation of the multipath-aware initrd the system will also not boot!

Nagios: Integrating Cisco switches

February 13, 2009June 21, 2013 Christian 1 Comment

Well, as I wrote recently, we received a new BladeCenter a few weeks back. Now, as we slowly take it into service I was interested in watching the utilization of the back planes as well as the CPU utilization of the Cisco Catalyst 3012 network switches.

The first mistake I made, was to trust Cisco with their guide about how to get the utilization from the device using SNMP. They stated some OID’s, which I tried with snmpwalk and got a result from.

; html-script: false ]snmpwalk -v1 -c public -O n 10.0.0.35 .1.3.6.1.4.1.9.5.1.1.8
.1.3.6.1.4.1.9.5.1.1.8.0 = INTEGER: 0

1 2	; html-script: false ]snmpwalk -v1 -c public -O n 10.0.0.35 .1.3.6.1.4.1.9.5.1.1.8 .1.3.6.1.4.1.9.5.1.1.8.0 = INTEGER: 0

Now, as I tried retrieving the SNMP data by means of the check_snmp plugin, I got some flaky results:

; html-script: false ]/usr/lib/nagios/plugins/check_snmp -H 10.0.0.35 -C public 
                                   .1.3.6.1.4.1.9.5.1.1.8
SNMP problem - No data received from host
CMD: /usr/bin/snmpget -t 1 -r 5 -m &#039;&#039; -v 1 [authpriv] 10.0.0.35:161

; html-script: false ]/usr/lib/nagios/plugins/check_snmp -H 10.0.0.35 -C public

.1.3.6.1.4.1.9.5.1.1.8

SNMP problem - No data received from host

CMD: /usr/bin/snmpget -t 1 -r 5 -m '' -v 1 [authpriv] 10.0.0.35:161

Those of you, who read the excerpts carefully will notice the difference between snmpwalk and the OID I passed on to check_snmp.

The point being, the OID’s Cisco gave in their Design tech notes are either old, or just not accurate at all. After passing on the .0 to each value given by Cisco, the check_snmp is all honky dory and integrated into Nagios.

As usual, the Nagios definitions are further down, for those interested. Read More

Setting up the BladeCenter H

January 28, 2009June 21, 2013 Christian 2 Comments

Well, we finally had our maintenance window today, in which we planned the hardware exchange for our current Dell Blade Chassis (don’t ask!). The exchange went fine, but as we started exploring the components (like the IBM BladeCenter SAN switches — which are in fact Cisco MDS 9100) we hit a few road blocks.

First, the default user name/password combo for the Cisco MDS 9100 for the BladeCenter is USERID/PASSW0RD (just as the rest of the password combinations).

Next, we started tinkering around with the Catalyst Switch modules. A hint to myself:

Whenever setting up the switch via the WebGUI, make sure you setup both passwords. The password for the switch itself (when prompted by the WebGUI, enter “admin” as well as the password you just entered.

Now, you should be able to connect to the switch with telnet and be able to access the EXEC mode (and unlike me who struggled ~30 minutes till one of my trainees told me to enter a switch password — out of curiosity).

Now, here the list of commands I needed to setup the switch’s “basics”:

service password-encryption
username admin privilege 15 password 0 &lt;password&gt;
clock timezone CET 1
clock summer-time CET recurring last Sun Mar 2:00 last Sun Oct 3:00
ip domain-name home.barfoo.org
ip name-server 10.0.0.2
ip name-server 10.0.0.1
line con 0
 login local
line vty 0 4
 password &lt;password&gt;
 login local
line vty 5 15
 password &lt;password&gt;
 login
ntp server 160.45.10.8

service password-encryption

username admin privilege 15 password 0 <password>

clock timezone CET 1

clock summer-time CET recurring last Sun Mar 2:00 last Sun Oct 3:00

ip domain-name home.barfoo.org

ip name-server 10.0.0.2

ip name-server 10.0.0.1

line con 0

line vty 0 4

password <password>

line vty 5 15

password <password>

ntp server 160.45.10.8

Shibboleth (WTF is that?)

February 28, 2007June 21, 2013 Christian Leave a comment

OK, I’m sitting now again in train (hrm, I get the feeling I’ve done that already in the last few days – oh wait, I was doing that just on Monday) this time to Berlin.

My boss ordered me to attend a workshop covering the implementation of Shibboleth (for those of you, who can’t associate anything with that term – it’s an implementation for single sign-on, also covering distributed authorization and authentication) somewhere in Berlin Spandau (Evangelisches Johannesstift Berlin).

Yesterday was quite amazing workwise, we lifted the 75kg Blade Chassis into the rack (*yuck* there was a time I was completely against Dell stuff, but recently that has changed), plugged all four C22 plugs into the rack’s PDU’s and into the chassis, patched the management interface (which is *waaay* to slow for a dedicated management daughter board) and for the first time started the chassis. *ugh* That scared me .. that wasn’t noise like a xSeries or any other rack-based server we have around, more like a starting airplane. You can literally stand in behind of the chassis, and get your hairs dried (if you need to). So I looked at the blades together with my co-worker and we figured, that they don’t have any coolers anymore, they are just using the cooling the chassis provides.

Another surprise awaited us, when we thought, we could use the integrated switch to provide network for both integrated network cards (Broadcome NetExtreme II). *sigh* You need two seperate switches to serve two network cards, even if you only have two blades in the chassis (which provides space for 10 blades). *sigh* That really sucks, but its the same with the FC stuff …

So, we are waiting yet again for Dell to make us an offer, and on top of that, the sales representative doesn’t have the slightest idea if the FC passthrough module includes SFP’s or not … *yuck*

I must say, I’m impressed by the Dell hardware, but I’m really disappointed by their sales representative.