XenServer 6.0.2: Fixing Root-Disk-Multipathing with Boot-from-SAN

As the title pretty much says, I’ve been working on fixing the Root-Disk-Multipathing feature of our XenServer installations. Our XenServer hosts boot from an HA-enabled NetApp controller pair; however, we recently noticed that during a controller failover some, if not all, paths would go offline and never come back. If you do a cf takeover and a cf giveback in quick succession, you end up with an unusable XenServer host, because the root disk becomes pretty much unresponsive.

Judging from that, there don’t seem to be many people using XenServer with Boot-from-SAN; otherwise Citrix/NetApp would have fixed this by now. Anyhow, I went digging around in our XenServers. What I had already done was adjust /etc/multipath.conf according to a bug report (or TR-3373). For completeness’ sake I’ll list it here:

And as it turns out, this is the reason why we’re having such difficulties with multipathing. The information in TR-3373 is a bunch of BS (no, not all of it, but a single path is wrong: the getuid_callout), and with that the whole concept of multipathing, failover and high availability (yeah, I know: if you want HA, don’t use XenServer :P) is gone.

Generate Nagios config for check_netapp-api.pl

As so often, I wanted a script that crawls my filers and regenerates the configuration whenever a volume, SnapVault or SnapMirror has been added or removed.

Generate Nagios config for NetApp filers

At some point in the last few weeks, I repeatedly had to recreate my Nagios config for what are currently six filers. After doing that a few times, I ended up (like sooo often) writing a short Bash script that does this for me, without any fuss.

The only thing the script needs is that the filers are registered in DNS … Here’s an example:

With that done, the script will create the necessary Nagios config for those filers.
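To give a rough idea of what such a generator does, here is a minimal sketch in Python (not the Bash script from the post). It resolves each filer name through DNS and renders one Nagios host definition per filer. The function names, the `generic-host` template and the filer names in the usage note are assumptions made for illustration only.

```python
import socket
from typing import Callable, Iterable


def render_host(name: str, address: str, template: str = "generic-host") -> str:
    """Render a single Nagios host object definition as text."""
    return (
        "define host {\n"
        f"    use        {template}\n"
        f"    host_name  {name}\n"
        f"    address    {address}\n"
        "}\n"
    )


def generate_config(
    filers: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> str:
    """Build host definitions for every filer that resolves in DNS."""
    blocks = []
    for name in filers:
        try:
            address = resolve(name)
        except OSError:
            # The filer is not registered in DNS, so it is skipped.
            continue
        blocks.append(render_host(name, address))
    return "\n".join(blocks)
```

Calling `generate_config(["filer01", "filer02"])` with hypothetical filer names returns the text you would write into a Nagios config file. Passing the resolver in as a parameter keeps the sketch testable without a working DNS.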


NetApp: Establishing SnapMirror relationships

After figuring out the SnapVault stuff, I needed to implement a whole bunch of SnapMirror relations. As I am lazy (as in click-lazy), I ended up writing a somewhat short Bash script that establishes either a whole bunch of SnapMirror relations (for a single host) or just the one for a single volume.

The script expects that SSH public key authentication has been set up and that the source for the SnapMirror exists and is online (not restricted).
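As an illustration of the idea only (in Python, not the Bash script from the post), the sketch below builds the SSH command lines for a list of volumes without running them. It assumes Data ONTAP 7-Mode, where `snapmirror initialize -S source destination` is issued on the destination filer, and it assumes the destination volume has the same name as the source and already exists. The function name and the filer and volume names are hypothetical.

```python
from typing import Iterable, List


def snapmirror_commands(
    src_filer: str, dst_filer: str, volumes: Iterable[str]
) -> List[List[str]]:
    """Build one 'snapmirror initialize' SSH command per volume.

    Assumption: the destination volume carries the same name as the
    source volume. Nothing is executed here.
    """
    commands = []
    for vol in volumes:
        commands.append(
            [
                "ssh", dst_filer,
                "snapmirror", "initialize",
                "-S", f"{src_filer}:{vol}",
                f"{dst_filer}:{vol}",
            ]
        )
    return commands
```

Each returned list could be handed to `subprocess.run` once you have checked it. Building the commands first makes it easy to print them as a dry run, for a single volume as well as for a whole host.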


NetApp: SnapVault snapshot retention for non-standard snapshot names

Well, the title pretty much says it. Once you rename the snapshot on the SnapVault destination from daily.0 to something else, the whole built-in SnapVault snapshot retention isn’t gonna work anymore.

Back when I started all the code-writing, I wasn’t aware of this. One of my co-workers complained to me on Wednesday that there was an assfull of snapshots on the SnapVault destination (one snapshot each day since the end of October, meaning more than 50 snapshots per volume, across 12 or so FlexVolumes, for a total of well over 500 snapshots).

So I took the time to write this little Bash script (yeah, I know I’m mixing a bunch of languages; I really like the KISS principle), which gets the necessary information from the filer (snapvault snap sched needs to be set) and then deletes the over-aged snapshots.
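The selection step can be sketched roughly as follows, in Python rather than Bash. It assumes the snapshot names and creation times have already been read from the filer and that the number of snapshots to keep comes from the snapvault snap sched setting. The function name and the data layout are made up for this example.

```python
from datetime import datetime
from typing import List, Tuple


def select_expired(
    snapshots: List[Tuple[str, datetime]], keep: int
) -> List[str]:
    """Return the names of snapshots that fall outside the retention count.

    The newest 'keep' snapshots are retained; everything older is
    returned, newest first, as a deletion candidate.
    """
    if keep < 0:
        raise ValueError("keep must not be negative")
    newest_first = sorted(snapshots, key=lambda item: item[1], reverse=True)
    return [name for name, _created in newest_first[keep:]]
```

Sorting by creation time instead of by name is the point here: once the snapshots no longer follow the daily.N naming, the name says nothing reliable about their order.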


NetApp: Monitoring of SnapVault/SnapMirror/LUN/Snapshot information with Nagios

As I wrote before, we have a bunch of filers (and a ton of volumes with LUNs on them) that I need to monitor. At first, I tried the existing NetApp Nagios plugin(s), but they all use SNMP, and with that I can either watch all volumes or none. And that didn’t satisfy me.

Don’t get me wrong, the existing plugins are okay and I still use them for stuff (like GLOBALSTATUS or FAN/CPU/POWER) that isn’t present in the API or is really hard to get at. However, I wanted more, so I looked at the NetApp API and ended up writing a “short” plugin for Nagios in Perl.

Maybe if I’m ever bored, I’ll rewrite it using C, but for now the Perl plugin has to suffice.

So far the plugin supports the following things:

  • Monitoring FlexVolumes (simply watching the free space)
  • Monitoring LUN space (the allocated space inside a FlexVolume for iSCSI/FC LUNs)
  • Monitoring Snapshot space (the allocated space inside a FlexVolume for Snapshots)
  • Monitoring SnapVault relations (and their age)
  • Monitoring SnapMirror relations (and their age)

The plugin will return performance data for most (if not all) of those classes. It needs a user on the filer you wish to monitor – which sadly needs to have the admin role.
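For readers who haven’t written a Nagios plugin: the status line, the performance data after the pipe character and the exit code follow a fixed convention (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The sketch below shows that convention for a volume space check in Python. It is not the Perl plugin from the post; the function name, the thresholds and the numbers are assumptions, and the call to the NetApp API that would supply the byte counts is left out.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_volume(name, used_bytes, total_bytes, warn_pct=80.0, crit_pct=90.0):
    """Return (exit_code, status_line) for one volume.

    The status line carries performance data after the '|' in the form
    label=value[UOM];warn;crit;min;max.
    """
    if total_bytes <= 0:
        return UNKNOWN, f"VOLUME UNKNOWN - {name} reports no capacity"
    used_pct = 100.0 * used_bytes / total_bytes
    if used_pct >= crit_pct:
        state = CRITICAL
    elif used_pct >= warn_pct:
        state = WARNING
    else:
        state = OK
    perfdata = f"{name}_used={used_pct:.1f}%;{warn_pct:g};{crit_pct:g};0;100"
    line = f"VOLUME {STATE_NAMES[state]} - {name} is {used_pct:.1f}% full | {perfdata}"
    return state, line
```

A real plugin would print the returned line and exit with the returned code, for example via `sys.exit(state)`, so that Nagios picks up both the state and the performance data.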


NetApp: Archive SnapManager SQL snapshots

As I wrote before, we’re using SnapManager (for SQL/Oracle) to create consistent snapshots. However, my database guys don’t want to name their snapshots daily.<increment> (which I can understand), because once you archive those snapshots to a secondary (and tertiary) system, the names become junk.

So, they’re naming the snapshots like snap__vcsrv_29_12_2012-10.00.01. Sadly, SnapVault expects names of the form daily.<increment>; otherwise you won’t be able to transfer the snapshots with the CLI (at least not in any way I have found).

But we didn’t want to move away from naming the snapshots the way they are, so I ended up writing a PowerShell script that, once triggered, archives the snapshots needed for a set of databases. It took me a while to figure a bunch of stuff out, but in the end I think I have a working way of archiving custom-named snapshots.
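One building block for handling custom-named snapshots is reading the timestamp back out of the name. The sketch below does that in Python (the script in the post is PowerShell and is not reproduced here). It assumes the name ends in DD_MM_YYYY-HH.MM.SS, which is inferred from the single example name above; the function name is hypothetical.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed layout of the trailing timestamp: DD_MM_YYYY-HH.MM.SS
_STAMP = re.compile(r"\d{2}_\d{2}_\d{4}-\d{2}\.\d{2}\.\d{2}$")


def snapshot_time(name: str) -> Optional[datetime]:
    """Extract the creation time encoded at the end of a snapshot name.

    Returns None if the name does not end in the assumed pattern or
    the digits do not form a valid date.
    """
    match = _STAMP.search(name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(0), "%d_%m_%Y-%H.%M.%S")
    except ValueError:
        return None
```

With the timestamp parsed, picking the newest snapshot of a database becomes a plain `max(names, key=snapshot_time)` over the names that did parse.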
