NetApp: Archive SnapManager SQL snapshots

As I wrote before, we’re using SnapManager (for SQL/Oracle) to create consistent snapshots. However, my database guys don’t want to name their snapshots daily.<increment> (which I can understand), because once you archive those snapshots to a secondary (and tertiary) system, the names become junk.

So, they’re naming the snapshots like snap__vcsrv_29_12_2012-10.00.01. Sadly, SnapVault expects the names in the form of daily.<increment>; otherwise you won’t be able to transfer the snapshots with the CLI (at least I haven’t found a way).
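To make the naming scheme concrete, here is a minimal Python sketch (not the PowerShell script mentioned in this post) that pulls the server name and timestamp out of such a snapshot name. The layout snap__<server>_<DD>_<MM>_<YYYY>-<HH>.<MM>.<SS> is an assumption inferred from the single example above, and the function names are made up for illustration.

```python
import re
from datetime import datetime

# Assumed layout, inferred from one example name:
#   snap__<server>_<DD>_<MM>_<YYYY>-<HH>.<MM>.<SS>
SNAP_RE = re.compile(
    r"^snap__(?P<server>.+)_(?P<stamp>\d{2}_\d{2}_\d{4}-\d{2}\.\d{2}\.\d{2})$"
)


def parse_snapshot_name(name):
    """Return (server, datetime) for a custom-named snapshot, or None."""
    match = SNAP_RE.match(name)
    if match is None:
        return None
    stamp = datetime.strptime(match.group("stamp"), "%d_%m_%Y-%H.%M.%S")
    return match.group("server"), stamp


def newest_snapshot(names):
    """Return the most recent custom-named snapshot, ignoring other names."""
    candidates = []
    for name in names:
        parsed = parse_snapshot_name(name)
        if parsed is not None:
            candidates.append((parsed[1], name))
    return max(candidates)[1] if candidates else None
```

Sorting on the parsed timestamp rather than on the raw name matters here, because day-first names do not sort chronologically as plain strings.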

But we didn’t want to move away from naming the snapshots the way they are, so I ended up writing a PowerShell script that, once triggered, archives the snapshots needed for a set of databases. It took me a while to figure a bunch of stuff out, but in the end I think I have a working way of archiving custom-named snapshots.


Implementing SnapVault backups – the hard way

Well, I recently had the pleasant task of implementing SnapVault backups that are shipped to an offsite location with SnapMirror.

That in itself isn’t the bad thing; however, we decided against Protection Manager (since it was a paid product back when we decided on this). So I basically had three tasks:

  1. Actually implement the SnapVault stuff (and learn my way around it and also document it)
  2. Write a bunch of scripts that help us create scheduled backups of our databases
  3. Create a monitoring script that fits into our existing Nagios environment

Well, two months later (sadly it still has some kinks; there’s this one bug I can’t figure out for the life of me) and a few hundred hours of working on/with it, out came the following:

  1. Bash-scripts to create the SnapVault/SnapMirror relations
  2. PowerShell scripts to trigger the SnapVault updates
  3. a Nagios plugin, based on NetApp’s SDK for Data ONTAP (even if the API is crap from time to time – it’s still better than using SNMP)

I’ll post those things one after another, once I’ve written up all the articles.

vCenter: Removing VSC custom attributes

Well, yesterday I got pissed off by those Virtual Storage Console custom attributes.

Currently we don’t use the Provisioning & Cloning feature of the VSC, thus we don’t need the custom attributes. After poking around, I decided to write a short PowerCLI script to do the task.

It’s really rather simple, so here goes:

NetApp: Changing DS4243 shelf ID

I’m working on a project right now, providing a SnapVault target for our “big” NetApp. So we moved our 3240 to its target location, and I spent most of my time yesterday doing the cabling (SAS and ACP, as well as power).

I’m still not finished: I still need to “beautify” the power cables, fix the network cables (currently I don’t have any ….) and take care of some other minor stuff. But lemme skip back a bit.

The 3240 initially had only two shelves, one with the ID 10 and the other with the ID 50. When reimplementing the thing, I wanted to do two things:

  1. Make the shelves “proper” (i.e. adjacent shelf IDs)
  2. Make sure it’s done right

So, I ended up googling the topic (or rather NetApping, since the NOW page isn’t being indexed), and found a NetApp Community post. As I had already done a complete wipe/cleanconfig of both filers, I was left with this:

  1. Halt both controllers (don’t power them off!)
  2. Change the shelf ID using the front panel of the DS4243
  3. Power-cycle each shelf
  4. Wait at least 30 seconds
  5. Boot both controllers

And that actually did it: my HA controllers are up and running with the new shelf IDs.

NetApp: Migrating FCP luns with ndmpcopy to another controller

Well, I’m in a situation where I need to move all volumes from one controller to two others. So I looked at the options I had available:

  1. Freshly implementing everything: No option at all!
  2. vol copy: Is rather slow, thus no option
  3. ndmpcopy: That’s exactly what I needed!

ndmpcopy is a great way to copy a whole volume including its files (and thus its FCP LUNs) to another volume/controller.

First I threw in a crossover cable, since at around 6 PM our backup system starts its daily run, and everything else running via IP between 6 PM and 6 AM is seriously impaired by this. I configured the additional ports on all three controllers (picked a private, non-routed range just in case) and then kicked off a simple bash script that ran the following:

Now, that in itself worked like a charm as you can see from the output below.

However, once I switched the UCS into the correct VSAN and modified the Boot Policy, the XenServer would boot, but didn’t find *any* Storage Repository. So I went ahead and looked at the CLI of the XenServer, looked at /var/log/messages and saw that apparently the PBDs weren’t there yet (for whatever reason).

I poked around in /dev/disk/by-id, looked at the output of xe pbd-list and found that the SCSI IDs used in the PBDs were actually not present yet. So I was like *wtf* for a moment, but then took a quick peek at the output of lun show -v /vol/vol_xen_boot on both NetApp controllers and found the cause of my troubles:

As you can see, the LUN itself is available and mapped with the correct LUN ID. However, if you look closely at the serials of both LUNs, you might notice what I noticed: they differ. So it turns out that ndmpcopy does the copying, but you need to adjust the LUN serial on the destination controller to match the one from the source controller; otherwise any host that identifies the LUN by its serial will be thrown out of whack.
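As a rough sketch of that adjustment, the following Python snippet compares the serials per LUN path and builds the commands to run on the destination controller. It assumes the 7-Mode commands lun offline, lun serial and lun online, and that the LUN has to be offline while its serial is changed; the serial values in the usage example are made up. It only generates text, so nothing is sent to a controller.

```python
def serial_fix_commands(source_serials, dest_serials):
    """Build commands that align destination LUN serials with the source.

    Both arguments map a LUN path to its serial, e.g. collected by hand
    from the output of `lun show -v` on each controller.
    """
    commands = []
    for path, serial in sorted(source_serials.items()):
        # Skip LUNs missing on the destination or already matching.
        if path not in dest_serials or dest_serials[path] == serial:
            continue
        commands.append(f"lun offline {path}")
        commands.append(f"lun serial {path} {serial}")
        commands.append(f"lun online {path}")
    return commands


if __name__ == "__main__":
    # Hypothetical path and serials, for illustration only.
    source = {"/vol/vol_xen_boot/lun0": "AAAAAAAAAAA1"}
    dest = {"/vol/vol_xen_boot/lun0": "BBBBBBBBBBB2"}
    for line in serial_fix_commands(source, dest):
        print(line)
```

Reviewing the generated lines before pasting them into the console keeps the risky part (taking LUNs offline) a deliberate manual step.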

After adjusting that, everything came up just fine. And I’m finished with my first XenServer environment, only the big one is still copying.

NetApp LUN creation/vol sizing

Well, as you might know I’ve been tinkering with a NetApp FAS at work. The last few months, I’ve been trying to figure out a few things, which I actually did.

One “error” I ran into when creating the LUNs and volumes by hand was that the volumes were running out of space, even if the volume was a bit larger than the LUN. After that happened a few times, I decided to see how to fix it. As it turns out, the GUI already “fixes” that in a way I wouldn’t have expected.

The GUI wizard for creating a new LUN simply enlarges the hosting volume by three percent (that’s 3%!). So if you create a 300GiB LUN, the GUI will create a volume with 309GiB (well about that – the GUI calculates in KiB thus you’ll see something like 324009984k in the output of vol size).
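The arithmetic behind that number can be checked with a few lines of Python. This is only a worked example of the 3% rule described above, not the wizard's actual logic, and the function name is made up.

```python
KIB_PER_GIB = 1024 * 1024


def gui_volume_size_kib(lun_size_gib, overhead_percent=3):
    """Volume size in KiB for a LUN of the given size plus a percentage overhead."""
    lun_kib = lun_size_gib * KIB_PER_GIB
    # Integer math avoids floating-point rounding on large sizes.
    return lun_kib * (100 + overhead_percent) // 100


if __name__ == "__main__":
    print(f"{gui_volume_size_kib(300)}k")  # prints 324009984k
```

A 300GiB LUN is 314572800 KiB, and adding 3% (9437184 KiB) gives exactly the 324009984k shown by vol size.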

I also wrote a short script, which will sum up the space of all LUNs contained inside a volume and then based on your snap reserve and the actual LUN space give you the current vol size and the vol size it should be. I’ll post the script later on.
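Until then, here is a minimal Python sketch of the kind of calculation such a script performs; it is not the script itself. It assumes the snap reserve is a percentage of the total volume size, so the LUNs plus the 3% overhead have to fit into what remains. The function names and the default overhead are assumptions for illustration.

```python
def required_volume_kib(lun_sizes_kib, snap_reserve_percent, overhead_percent=3):
    """Smallest volume size (KiB) that fits all LUNs plus overhead
    outside the snap reserve."""
    if not 0 <= snap_reserve_percent < 100:
        raise ValueError("snap reserve must be between 0 and 99 percent")
    lun_total = sum(lun_sizes_kib)
    numerator = lun_total * (100 + overhead_percent)
    denominator = 100 - snap_reserve_percent
    # Integer ceiling division, so the result never comes out too small.
    return -(-numerator // denominator)


def resize_delta_kib(current_volume_kib, lun_sizes_kib, snap_reserve_percent):
    """How much the volume has to grow (0 if it is already big enough)."""
    needed = required_volume_kib(lun_sizes_kib, snap_reserve_percent)
    return max(0, needed - current_volume_kib)
```

For example, a single 300GiB LUN (314572800 KiB) with a 20% snap reserve would need a volume of 405012480 KiB under these assumptions, which is 81002496 KiB more than the 324009984k the GUI creates.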

SnapManager for SQL Server: Service fails to respond in a timely fashion

Well, we recently upgraded the SnapManager version on our test box to 6.4.1. Now however, after restarting the box the SnapManager service failed to start … The error was something like this:

SnapManager: Failed to start

Now, first I stumbled upon this NetApp Community post, which only contained the “solution” of increasing the global (!) wait time for services. That didn’t sit well with me.

So after looking through NOW! for a bit, I actually found the correct way. The fix is described in KB2010835. Yet again, another certificate error. Why do vendors deploy SSL certificates when they use untrusted ones? That defeats the purpose of SSL certs, or at least trains users to ignore any error message they get concerning SSL certificates.

As the KB article describes, you need to remove the following settings:

SnapManager for SQL Server: Internet Explorer fixes

NetApp – Remove LUN mappings

As promised in the earlier post, for completeness’ sake, here’s the counterpart for removing the LUN mappings.

With that, you can simply run it against a NetApp controller and remove every LUN map except the one with LUN ID 0 (which is pretty handy when installing/reinstalling ESX servers).
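The selection logic boils down to something like the following Python sketch. It is not the script from this post: it assumes you already have the mappings as (LUN path, igroup, LUN ID) tuples, and that the 7-Mode command lun unmap <lun_path> <igroup> removes a mapping. The paths and igroup names in the usage example are made up.

```python
def unmap_commands(mappings, keep_lun_id=0):
    """Build `lun unmap` commands for every mapping except the kept LUN ID.

    `mappings` is an iterable of (lun_path, igroup, lun_id) tuples.
    """
    return [
        f"lun unmap {path} {igroup}"
        for path, igroup, lun_id in mappings
        if lun_id != keep_lun_id
    ]


if __name__ == "__main__":
    # Hypothetical mappings, for illustration only.
    example = [
        ("/vol/vol_esx_boot/lun0", "esx01", 0),
        ("/vol/vol_data1/lun0", "esx01", 1),
        ("/vol/vol_data2/lun0", "esx01", 2),
    ]
    for line in unmap_commands(example):
        print(line)
```

Keeping LUN ID 0 mapped leaves the boot LUN visible, so the installer only sees the disk it is supposed to install to.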

NetApp – Copy LUN mappings

Well, today I had another idea (basically like the one I wrote for SVC’s VDisk mappings a while back) for a script:

I’ll post the counterpart of the script (to remove the LUNs) in a second post later on.

NetApp – Get a list of volumes containing too many LUNs

Well, after figuring things out (and realizing that if you create a LUN of the same size as the volume, it’ll break), I decided to write yet another script to figure out which volumes needed fixing.

And with that you have a list of volumes, along with the amount of space by which each needs to be resized in order to accommodate the contained LUNs and the snapshots.