NetApp – BAFM

XenServer 6.0.2: Fixing Root-Disk-Multipathing with Boot-from-SAN

July 16, 2013July 17, 2013 Christian Leave a comment

As the title pretty much tells, I’ve been working on fixing the Root-Disk-Multipathing feature of our XenServer installations. Our XenServer boot from a HA-enabled NetApp controller, however we recently noticed that during a controller fail-over some, if not all, paths would go offline and never come back. If you do a cf takeover and cf giveback in short succession, you’ll end up with a XenServer host that is unusable, as the Root-Disk would be pretty much non-responsive.

Guessing from that, there don’t seem to be that many people using XenServer with Boot-from-SAN. Otherwise Citrix/NetApp would have fixed that by now…. Anyhow, I went around digging in our XenServer’s. What I already did, was adjust the /etc/multipath.conf according to a bug report (or TR-3373). For completeness sake I’ll list it here:

# Multipathing configuration for XenServer on NetApp ALUA
# enabled storage.
# TR-3732, revision 5

defaults {
        user_friendly_names no
        queue_without_daenon no
        flush_on_last_del yes
}

## some vendor specific modifications
devices {
        device {
                vendor "NETAPP"
                product "LUN"
                path_grouping_policy group_by_prio
                features "1 queue_if_no_path"
                getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
                prio_callout "/sbin/mpath_prio_alua /dev/%n"
                path_checker directio
                failback immediate
                hardware handler "0"
                rr_weight uniform
                rr_min_io 128
        }
}

# Multipathing configuration for XenServer on NetApp ALUA

# enabled storage.

# TR-3732, revision 5

defaults {

user_friendly_names no

queue_without_daenon no

flush_on_last_del yes

}

## some vendor specific modifications

devices {

device {

vendor "NETAPP"

product "LUN"

path_grouping_policy group_by_prio

features "1 queue_if_no_path"

getuid_callout "/sbin/scsi_id -g -u -s /block/%n"

prio_callout "/sbin/mpath_prio_alua /dev/%n"

path_checker directio

failback immediate

hardware handler "0"

rr_weight uniform

rr_min_io 128

}

And as it turns out, this is the reason why we’re having such difficulties with the Multipathing. The information in TR-3373 is a bunch of BS (no, not everything but a single path is wrong, the getuid_callout) and thus the whole concept of Multipathing, Failover and High-Availibility (yeah, I know – if you want HA, don’t use XenServer :P) is gone.

Dealing with SnapVault replication issues

April 9, 2013April 9, 2013 Christian Leave a comment

Well, for the past two months I had a case open with NetApp to figure out this SnapVault replication issue we were seeing. The initial transfer of the SnapVault relation would complete with a hick up, manual snapshot transfers also work – just the scheduled, auto-created Snapshots won’t replicate.

At first I (and the NetApp support) thought this was an issue with SnapVault itself, however after being away for the last four weeks I looked at the issue with fresh eyes. After a short peek into the logs, I found what I had found back when I first looked into this.

Mon Apr  1 21:16:46 CEST [fas01: lun.offline:warning]: LUN /vol/flex_windows_boot/sv/{6f3899ab-6c8a-402e-bbb7-d7b7298d254f}.aux has been taken offline
Mon Apr  1 21:16:46 CEST [fas01: lun.offline:warning]: LUN /vol/flex_windows_boot/sv/{6f3899ab-6c8a-402e-bbb7-d7b7298d254f}.rws has been taken offline
Mon Apr  1 23:06:35 CEST [fas01: rpl.src.lun.invalid_clone:error]: Replication source transfer failed due to an invalid LUN clone with fileid 32466 in snapid 67 in volume flex_windows_boot.
Mon Apr  1 23:06:35 CEST [fas01: replication.src.err:error]: SnapVault: source transfer from /vol/flex_windows_boot/sv/ to fas03:/vol/flex_windows_boot/sv : replication source found an invalid lun clone.
Mon Apr  1 23:08:35 CEST [fas01: rpl.src.lun.invalid_clone:error]: Replication source transfer failed due to an invalid LUN clone with fileid 32466 in snapid 67 in volume flex_windows_boot.
Mon Apr  1 23:08:36 CEST [fas01: replication.src.err:error]: SnapVault: source transfer from /vol/flex_windows_boot/sv/ to fas03:/vol/flex_windows_boot/sv : replication source found an invalid lun clone.

Mon Apr 1 21:16:46 CEST [fas01: lun.offline:warning]: LUN /vol/flex_windows_boot/sv/{6f3899ab-6c8a-402e-bbb7-d7b7298d254f}.aux has been taken offline

Mon Apr 1 21:16:46 CEST [fas01: lun.offline:warning]: LUN /vol/flex_windows_boot/sv/{6f3899ab-6c8a-402e-bbb7-d7b7298d254f}.rws has been taken offline

Mon Apr 1 23:06:35 CEST [fas01: rpl.src.lun.invalid_clone:error]: Replication source transfer failed due to an invalid LUN clone with fileid 32466 in snapid 67 in volume flex_windows_boot.

Mon Apr 1 23:06:35 CEST [fas01: replication.src.err:error]: SnapVault: source transfer from /vol/flex_windows_boot/sv/ to fas03:/vol/flex_windows_boot/sv : replication source found an invalid lun clone.

Mon Apr 1 23:08:35 CEST [fas01: rpl.src.lun.invalid_clone:error]: Replication source transfer failed due to an invalid LUN clone with fileid 32466 in snapid 67 in volume flex_windows_boot.

Mon Apr 1 23:08:36 CEST [fas01: replication.src.err:error]: SnapVault: source transfer from /vol/flex_windows_boot/sv/ to fas03:/vol/flex_windows_boot/sv : replication source found an invalid lun clone.

SnapVault would create the daily snapshot on the SnapVault Primary and start the replication. However something (or someone, wasn’t clear at this point) then created a FlexClone of a volume … And as, back when we first encountered this, I was kinda puzzled.

But then I decided (please don’t ask me what made me look there) to look at the logs of the NetApp Filer on our logserver. As it turns out, back when I enabled syslogging to an external logserver I seem to have enabled debug logging … and it was great to have that! Below you’ll find the log I found – and as you can see there’s at least a clue as to from where that ghost snapshot is coming from.

Apr  1 21:16:46 fas01 [fas01: lun.offline:warning]: LUN /vol/flex_windows_boot/sv/{6f3899ab-6c8a-402e-bbb7-d7b7298d254f}.aux has been taken offline
Apr  1 21:16:46 fas01 [fas01: lun.destroy:info]: LUN /vol/flex_windows_boot/sv/{6f3899ab-6c8a-402e-bbb7-d7b7298d254f}.aux destroyed
Apr  1 21:16:46 fas01 [fas01: lun.offline:warning]: LUN /vol/flex_windows_boot/sv/{6f3899ab-6c8a-402e-bbb7-d7b7298d254f}.rws has been taken offline
Apr  1 21:16:48 fas01 [fas01: lun.map:info]: LUN /vol/flex_windows_boot/sv/{6f3899ab-6c8a-402e-bbb7-d7b7298d254f}.rws was mapped to initiator group viaRPC.20:00:00:25:b5:02:0a:4c.b230-5=3
Apr  1 21:16:48 fas01 [fas01: lun.map:info]: LUN /vol/flex_windows_boot/sv/{6f3899ab-6c8a-402e-bbb7-d7b7298d254f}.rws was mapped to initiator group viaRPC.20:00:00:25:b5:02:0b:4c.b230-5=3
Apr  1 22:05:31 fas01 [fas01: lun.map.unmap:info]: LUN /vol/flex_windows_boot/sv/{6f3899ab-6c8a-402e-bbb7-d7b7298d254f}.rws unmapped from initiator group viaRPC.20:00:00:25:b5:02:0a:4c.b230-5
Apr  1 22:05:31 fas01 [fas01: lun.map.unmap:info]: LUN /vol/flex_windows_boot/sv/{6f3899ab-6c8a-402e-bbb7-d7b7298d254f}.rws unmapped from initiator group viaRPC.20:00:00:25:b5:02:0b:4c.b230-5
Apr  1 22:05:35 fas01 [fas01: lun.destroy:info]: LUN /vol/flex_windows_boot/sv/{6f3899ab-6c8a-402e-bbb7-d7b7298d254f}.rws destroyed
Apr  1 22:05:37 fas01 [fas01: wafl.snap.delete:info]: Snapshot copy {25b8da84-9351-4f20-987c-e7b02d76f15e} on volume flex_windows_boot NetApp was deleted by the Data ONTAP function zapi_snapshot_
delete. The unique ID for this Snapshot copy is (64, 3471969).
Apr  1 23:06:35 fas01 [fas01: rpl.src.lun.invalid_clone:error]: Replication source transfer failed due to an invalid LUN clone with fileid 32466 in snapid 67 in volume flex_windows_boot.
Apr  1 23:06:35 fas01 [fas01: snapdiff.abnormal.abort:debug]: Encountered unexpected error while computing differences between Snapshot copies.
Apr  1 23:06:35 fas01 [fas01: replication.src.err:error]: SnapVault: source transfer from /vol/flex_windows_boot/sv/ to fas03:/vol/flex_windows_boot/sv : replication source found an invalid
lun clone.
Apr  1 23:08:35 fas01 [fas01: rpl.src.lun.invalid_clone:error]: Replication source transfer failed due to an invalid LUN clone with fileid 32466 in snapid 67 in volume flex_windows_boot.
Apr  1 23:08:35 fas01 [fas01: snapdiff.abnormal.abort:debug]: Encountered unexpected error while computing differences between Snapshot copies.
Apr  1 23:08:36 fas01 [fas01: replication.src.err:error]: SnapVault: source transfer from /vol/flex_windows_boot/sv/ to fas03:/vol/flex_windows_boot/sv : replication source found an invalid
lun clone.

Apr 1 21:16:46 fas01 [fas01: lun.offline:warning]: LUN /vol/flex_windows_boot/sv/{6f3899ab-6c8a-402e-bbb7-d7b7298d254f}.aux has been taken offline

Apr 1 21:16:46 fas01 [fas01: lun.destroy:info]: LUN /vol/flex_windows_boot/sv/{6f3899ab-6c8a-402e-bbb7-d7b7298d254f}.aux destroyed

Apr 1 21:16:46 fas01 [fas01: lun.offline:warning]: LUN /vol/flex_windows_boot/sv/{6f3899ab-6c8a-402e-bbb7-d7b7298d254f}.rws has been taken offline

Apr 1 21:16:48 fas01 [fas01: lun.map:info]: LUN /vol/flex_windows_boot/sv/{6f3899ab-6c8a-402e-bbb7-d7b7298d254f}.rws was mapped to initiator group viaRPC.20:00:00:25:b5:02:0a:4c.b230-5=3

Apr 1 21:16:48 fas01 [fas01: lun.map:info]: LUN /vol/flex_windows_boot/sv/{6f3899ab-6c8a-402e-bbb7-d7b7298d254f}.rws was mapped to initiator group viaRPC.20:00:00:25:b5:02:0b:4c.b230-5=3

Apr 1 22:05:31 fas01 [fas01: lun.map.unmap:info]: LUN /vol/flex_windows_boot/sv/{6f3899ab-6c8a-402e-bbb7-d7b7298d254f}.rws unmapped from initiator group viaRPC.20:00:00:25:b5:02:0a:4c.b230-5

Apr 1 22:05:31 fas01 [fas01: lun.map.unmap:info]: LUN /vol/flex_windows_boot/sv/{6f3899ab-6c8a-402e-bbb7-d7b7298d254f}.rws unmapped from initiator group viaRPC.20:00:00:25:b5:02:0b:4c.b230-5

Apr 1 22:05:35 fas01 [fas01: lun.destroy:info]: LUN /vol/flex_windows_boot/sv/{6f3899ab-6c8a-402e-bbb7-d7b7298d254f}.rws destroyed

Apr 1 22:05:37 fas01 [fas01: wafl.snap.delete:info]: Snapshot copy {25b8da84-9351-4f20-987c-e7b02d76f15e} on volume flex_windows_boot NetApp was deleted by the Data ONTAP function zapi_snapshot_

delete. The unique ID for this Snapshot copy is (64, 3471969).

Apr 1 23:06:35 fas01 [fas01: rpl.src.lun.invalid_clone:error]: Replication source transfer failed due to an invalid LUN clone with fileid 32466 in snapid 67 in volume flex_windows_boot.

Apr 1 23:06:35 fas01 [fas01: snapdiff.abnormal.abort:debug]: Encountered unexpected error while computing differences between Snapshot copies.

Apr 1 23:06:35 fas01 [fas01: replication.src.err:error]: SnapVault: source transfer from /vol/flex_windows_boot/sv/ to fas03:/vol/flex_windows_boot/sv : replication source found an invalid

lun clone.

Apr 1 23:08:35 fas01 [fas01: rpl.src.lun.invalid_clone:error]: Replication source transfer failed due to an invalid LUN clone with fileid 32466 in snapid 67 in volume flex_windows_boot.

Apr 1 23:08:35 fas01 [fas01: snapdiff.abnormal.abort:debug]: Encountered unexpected error while computing differences between Snapshot copies.

Apr 1 23:08:36 fas01 [fas01: replication.src.err:error]: SnapVault: source transfer from /vol/flex_windows_boot/sv/ to fas03:/vol/flex_windows_boot/sv : replication source found an invalid

lun clone.

Now, with knowing from which corner this issue originated it dawned on me, we have had a similar issue before. A quick peek into TSM Manager and I knew I was on the right track. The daily system backup starts around 21:15. Now our TSM backup includes the System State backup (which in turn utilizes VSS – which triggers the NetApp Snapshot!).

After excluding the System State from the Daily Backup the SnapVault stuff worked without a hickup. I ended up removing SnapDrive from the Server in question, since we don’t really need it there. Snapshots created from SnapDrive of the boot lun are gonna be inconsistent anyhow (doesn’t matter if I do ’em from SnapDrive or the NetApp CLI).

That restored the default VSS handler, which enables TSM to backup the System State again.

Generate Nagios config for check_netapp-api.pl

February 5, 2013June 21, 2013 Christian Leave a comment

As so often, I wanted a script, that’ll crawl my filers and regenerate the configuration if there are any new volumes/snapvaults/snapmirrors or if one of them has been removed.

#!/bin/bash

FAS_HOSTS="$( ls /etc/nagios/objects/hosts/san/fas*{a,b}.cfg | cut -d/ -f7 | cut -d. -f1 )"

for host in $FAS_HOSTS; do
	OUTPUT_FILE=/etc/nagios/objects/hosts/san/$host-vol.cfg

	# Clear the output file
	echo "" > $OUTPUT_FILE

	# Get the volume list
	for volume in `ssh $host vol status | awk '{ print $1 }' | grep ^vol | sort -u | grep -v vol0$`; do
		user="$( grep "USER=" /etc/netapp-sdk/$host | cut -d= -f2 )"
		pass="$( grep "PASS=" /etc/netapp-sdk/$host | cut -d= -f2 )"
#		echo "define service {"
#		echo "	use				generic-service"
#		echo ""
#		echo "	check_command			check_netapp-volfree!$user!$pass!${volume}!92!98"
#		echo "	check_interval			5"
#		echo "	host_name			${host}"
#		echo "	notifications_enabled		0"
#		echo "	notification_interval		720"
#		echo "	service_description		VOLSPACE ${volume}"
#		echo "}"
		echo
		echo "define service {"
		echo "	use				generic-service-san-perfdata"
		echo ""
		echo "	check_command			check_netapp-lunspace!$user!$pass!${volume}"
		echo "	check_interval			5"
		echo "	host_name			${host}"
		echo "	notifications_enabled		0"
		echo "	notification_interval		720"
		echo "	service_description		LUNSPACE ${volume}"
		echo "}"
		echo

		SR="$( ssh $host snap reserve $volume | cut -d  -f7 )"
		if [ "$SR" != "0%" ] ; then
			echo "define service {"
			echo "	use				generic-service-san-perfdata"
			echo ""
			echo "	check_command			check_netapp-snapreserve!$user!$pass!${volume}"
			echo "	check_interval			10"
			echo "	host_name			${host}"
			echo "	notifications_enabled		0"
			echo "	notification_interval		720"
			echo "	# SR:				$SR"
			echo "	service_description		SNAPRESERVE ${volume}"
			echo "}"
			echo
		fi
	done | tee -a $OUTPUT_FILE

	# Check snapvault foo
	for sv in `ssh $host snapvault status -l 2>/dev/null | awk '{ print $2 }' | grep vol`; do
		# only do the checks on sv_secondary
		if [ "$( echo $sv | grep $host | cut -d: -f1 )" == "${host}" ]; then
			vol="$( echo $sv | cut -d/ -f3 )"
			user="$( grep "USER=" /etc/netapp-sdk/$host | cut -d= -f2 )"
			pass="$( grep "PASS=" /etc/netapp-sdk/$host | cut -d= -f2 )"
			echo "define service {"
			echo "	use				generic-service-san-perfdata"
			echo ""
			echo "	check_command			check_netapp-snapvault!$user!$pass!$vol!38!42!"
			echo "	check_interval			60"
			echo "	host_name			${host}"
			echo "	notifications_enabled		0"
			echo "	notification_interval		720"
			echo "	service_description		SNAPVAULT ${vol}"
			echo "}"
			echo
		fi
	done | tee -a $OUTPUT_FILE

	# Check snapmirror foo
	for sm in `ssh $host snapmirror status 2>/dev/null | awk '{ print $2 }' | grep vol | grep $host`; do
		# only do the checks on sm_secondary
		if [ "$( echo $sm | grep $host | cut -d: -f1 )" == "${host}" ]; then
			vol="$( echo $sm | cut -d/ -f3 | cut -d: -f2 )"
			user="$( grep "USER=" /etc/netapp-sdk/$host | cut -d= -f2 )"
			pass="$( grep "PASS=" /etc/netapp-sdk/$host | cut -d= -f2 )"
			echo "define service {"
			echo "	use				generic-service-san-perfdata"
			echo ""
			echo "	check_command			check_netapp-snapmirror!$user!$pass!$vol!38!42!"
			echo "	check_interval			60"
			echo "	host_name			${host}"
			echo "	notifications_enabled		0"
			echo "	notification_interval		720"
			echo "	service_description		SNAPMIRROR ${vol}"
			echo "}"
			echo
		fi
	done | tee -a $OUTPUT_FILE
done

#!/bin/bash

FAS_HOSTS="$( ls /etc/nagios/objects/hosts/san/fas*{a,b}.cfg | cut -d/ -f7 | cut -d. -f1 )"

for host in $FAS_HOSTS; do

OUTPUT_FILE=/etc/nagios/objects/hosts/san/$host-vol.cfg

# Clear the output file

echo "" > $OUTPUT_FILE

# Get the volume list

for volume in `ssh $host vol status | awk '{ print $1 }' | grep ^vol | sort -u | grep -v vol0$`; do

user="$( grep "USER=" /etc/netapp-sdk/$host | cut -d= -f2 )"

pass="$( grep "PASS=" /etc/netapp-sdk/$host | cut -d= -f2 )"

# echo "define service {"

# echo " use generic-service"

# echo ""

# echo " check_command check_netapp-volfree!$user!$pass!${volume}!92!98"

# echo " check_interval 5"

# echo " host_name ${host}"

# echo " notifications_enabled 0"

# echo " notification_interval 720"

# echo " service_description VOLSPACE ${volume}"

# echo "}"

echo

echo "define service {"

echo " use generic-service-san-perfdata"

echo ""

echo " check_command check_netapp-lunspace!$user!$pass!${volume}"

echo " check_interval 5"

echo " host_name ${host}"

echo " notifications_enabled 0"

echo " notification_interval 720"

echo " service_description LUNSPACE ${volume}"

echo "}"

echo

SR="$( ssh $host snap reserve $volume | cut -d -f7 )"

if [ "$SR" != "0%" ] ; then

echo "define service {"

echo " use generic-service-san-perfdata"

echo ""

echo " check_command check_netapp-snapreserve!$user!$pass!${volume}"

echo " check_interval 10"

echo " host_name ${host}"

echo " notifications_enabled 0"

echo " notification_interval 720"

echo " # SR: $SR"

echo " service_description SNAPRESERVE ${volume}"

echo "}"

echo

done | tee -a $OUTPUT_FILE

# Check snapvault foo

for sv in `ssh $host snapvault status -l 2>/dev/null | awk '{ print $2 }' | grep vol`; do

# only do the checks on sv_secondary

if [ "$( echo $sv | grep $host | cut -d: -f1 )" == "${host}" ]; then

vol="$( echo $sv | cut -d/ -f3 )"

user="$( grep "USER=" /etc/netapp-sdk/$host | cut -d= -f2 )"

pass="$( grep "PASS=" /etc/netapp-sdk/$host | cut -d= -f2 )"

echo "define service {"

echo " use generic-service-san-perfdata"

echo ""

echo " check_command check_netapp-snapvault!$user!$pass!$vol!38!42!"

echo " check_interval 60"

echo " host_name ${host}"

echo " notifications_enabled 0"

echo " notification_interval 720"

echo " service_description SNAPVAULT ${vol}"

echo "}"

echo

done | tee -a $OUTPUT_FILE

# Check snapmirror foo

for sm in `ssh $host snapmirror status 2>/dev/null | awk '{ print $2 }' | grep vol | grep $host`; do

# only do the checks on sm_secondary

if [ "$( echo $sm | grep $host | cut -d: -f1 )" == "${host}" ]; then

vol="$( echo $sm | cut -d/ -f3 | cut -d: -f2 )"

user="$( grep "USER=" /etc/netapp-sdk/$host | cut -d= -f2 )"

pass="$( grep "PASS=" /etc/netapp-sdk/$host | cut -d= -f2 )"

echo "define service {"

echo " use generic-service-san-perfdata"

echo ""

echo " check_command check_netapp-snapmirror!$user!$pass!$vol!38!42!"

echo " check_interval 60"

echo " host_name ${host}"

echo " notifications_enabled 0"

echo " notification_interval 720"

echo " service_description SNAPMIRROR ${vol}"

echo "}"

echo

done | tee -a $OUTPUT_FILE

done

Generate Nagios config for NetApp filers

January 1, 2013June 21, 2013 Christian Leave a comment

At some point in the last few weeks, I repeatedly had to recreate my Nagios config for currently six filers. After doing that a few times, I ended up (like sooo often) writing a short Bash script, that’ll do this for me – without any fuss.

The only thing the script needs, is that the filers and the filers are registered in DNS … Here’s an example:

fas3240a      IN   A     172.31.76.150
fas3240a-sp   IN   A     172.31.74.150
fas3240b      IN   A     172.31.76.151
fas3240b-sp   IN   A     172.31.74.151

fas3240a IN A 172.31.76.150

fas3240a-sp IN A 172.31.74.150

fas3240b IN A 172.31.76.151

fas3240b-sp IN A 172.31.74.151

With that done, the script will create the necessary Nagios config for those filers.

NetApp: Establishing SnapMirror relationships

December 31, 2012August 8, 2014 Christian Leave a comment

After figuring out the SnapVault stuff, I needed to implement a whole bunch of SnapMirror relations. As I am lazy (as in click-lazy), I ended up writing a somewhat short Bash script, that’ll either establish a bunch of SnapMirror relations (for a single host) or just for a single volume.

The script expects, that SSH public key authentification has been set up, and that the source for the SnapMirror exists and is online/not-restricted.

NetApp: Establishing SnapVault relations

December 30, 2012August 8, 2014 Christian 1 Comment

I’ve been spending a lot of my time the last week on getting SnapVault with out FAS-filers to work. Out came a script, which does this for a given volume (and of course SnapVault Primary and Secondary).

The script expects, that SSH public key authentification has been set up.

NetApp: SnapVault snapshot retention for non-standard snapshot names

December 29, 2012June 21, 2013 Christian Leave a comment

Well, the name says it pretty much. Once you rename the snapshot on the SnapVault destination from daily.0 to something else, the whole builtin SnapVault snapshot retention isn’t gonna work anymore.

Back when I started all the code-writing, I wasn’t aware of this. One of my co-worker complained to me about it on Wednesday that there are an assfull of snapshots on the SnapVault destination (one snapshot each day since the end of October, meaning more than 50 snapshots per volume, in a total of 12 or so FlexVolumes, making the total about 500 snapshots).

So I took the time to write this little Bash script (yeah, I know I’m mixing a bunch of languages – I really like the KISS principle), which will get the necessary information from the filer (snapvault snap sched needs to be set) and then deletes the over-aged snapshots.

NetApp: Monitoring of SnapVault/SnapMirror/LUN/Snapshot information with Nagios

December 29, 2012September 10, 2014 Christian 4 Comments

As I wrote before, we have a bunch of filers (and a ton of volumes w/ luns on them), that I need to monitor. At first, I tried the existing NetApp Nagios-Plugin(s), but they all use SNMP and with that I can either watch all volumes or none. And that didn’t satisfy me.

Don’t get me wrong, the existing plugins are okay and I still use them for stuff (like GLOBALSTATUS or FAN/CPU/POWER) which isn’t present in the API or real hard to get at, however I wanted more. So I ended up looking at the NetApp API, and ended up writing a “short” plugin for Nagios using Perl.

Maybe if I’m ever bored, I’ll rewrite it using C, but for now the Perl plugin has to suffice.

So far the plugin supports the following things:

Monitoring FlexVolumes (simply watching the free space)
Monitoring LUN space (the allocated space inside a FlexVolume for iSCSI/FC LUNs)
Monitoring Snapshot space (the allocated space inside a FlexVolume for Snapshots)
Monitoring SnapVault relations (and their age)
Monitoring SnapMirror relations (and their age)

The plugin will return performance data for most (if not all) of those classes. It needs a user on the filer you wish to monitor – which sadly needs to have the admin role.

NetApp: Archive SnapManager SQL Snapinfo

December 29, 2012August 8, 2014 Christian Leave a comment

The MSSQL admins decided to dump the SMSQL Snapinfo stuff on a separate volume, that SMSQL also snapshots. Same as before, I need a PowerShell script that’ll archive the snapshot and rename it.

NetApp: Archive SnapManager Oracle Snapshots

December 29, 2012June 21, 2013 Christian Leave a comment

And here’s the script for SMO. However, since different people administrate the Oracle Databases, they don’t want me to tinker with the database like the MSSQL admins. They give me a CSV-list of volumes, that should be backed up and I work with that.