UCS 5108 power redundancy lost

Well, another day – another UCS error. Out of the blue, one of our chassis started displaying that one PSU had failed, however the UCS was showing no PSU had failed *shrug*

Well, as it turns out – this is yet another known bug in 2.0.2(r). You’ll either have to unplug and plug all the power cables (that’s four) in a maintainance window – or simply change the Equipment Power Policy (found in the Root of your UCS, tab Policy)

from “N+1” to “Non Redundant”; wait a minute or two till the error is “fixed” and then change it back to “N+1”. Problem solved for now … 😀

NetApp LUN creation/vol sizing

Well, as you might know I’ve been tinkering with a NetApp FAS at work. The last few months, I’ve been trying to figure out a few things, which I actually did.

One “error” I ran into with creating the lun’s and volumes by hand was that the volumes were running out of space. Even if the volume was a bit larger than the LUN. After that happened a few times, I decided to see how to fix that. As it turns out, the GUI “fixes” that already in a way I wouldn’t have expected.

The GUI wizard for creating a new LUN simply enlarges the hosting volume by three percent (that’s 3%!). So if you create a 300GiB LUN, the GUI will create a volume with 309GiB (well about that – the GUI calculates in KiB thus you’ll see something like 324009984k in the output of vol size).

I also wrote a short script, which will sum up the space of all LUNs contained inside a volume and then based on your snap reserve and the actual LUN space give you the current vol size and the vol size it should be. I’ll post the script later on.

UCS 5108: VIF down

Well, I have yet another weird UCS problem. I have a single blade, that has trouble with it’s primary fabric attachment.

UCS 5108: Server overview
UCS 5108: Error Message

The problem get’s even more weird, if you look at the details.

UCS 5108: VIF Paths

After looking at the IO modules, the error doesn’t become any clearer:

UCS 5108: IOM port states

So far, I have tried nearly everything. I’ve tried resetting the active and passive Connectivity of the vNIC, I tried resetting the DCE adapter for the vNIC, but nothing. I even tried resetting the vHBA that’s associated with this fabric, but that didn’t result to anything. Not even the usual flogi (fibre channel login) errors, that you get when either booting/resetting the blade.

Well, I opened a TAC case and the Cisco engineer looked over the CLI of my fabric interconnects, hacked away at the thousand logs the UCS keeps, and asked me if I could switch the blade to another slot. However, since I don’t have any slots to spare – the five chassis are full – he said he’d start the RMA process for the MK81R mezzanine card.

The new MK81R arrived a few days later, however the issue still persisted. So the TAC engineer suggested I’d pull the IO module and plug in a “new” one (one that I knew was working). So I picked a Friday afternoon for the maintainance window (since I didn’t know if the blades would survive a IO module failover) and pulled one from my newly arrived chassis and reseated the one I knew to be faulty.

Guess what: after swapping the IO modules, the error is completely gone …. *shrug* I don’t have a clue how this error happened or why it was fixed by pulling and plugging the IO module. Guess another error for the odd category.

UCS 5108: Power problem

Well, I recently had yet another UCS display/I2C communication problem. Somehow one of my chassis’ started to think, that the power redundancy was lost.

F0408: Power redundancy failed

After looking at it a bit deeper, it seems only the GUI or the chassis did notice this power glitch:

UCS 5108: Power status

As you can see, all PSU’s still have power. Now, since I had a big maintainance window the last weekend anyhow (and I spent ~14 hours at work), I decided to restart the IO modules in that chassis. And guess what: The error is gone! Another weird I2C communication issue with the firmware release 2.0.2 …