VMs in Alarm state after scheduled maintenance

Well, I’m back at work after three weeks of vacation (some pictures to follow). The provider hosting our disaster datacenter had their annual (or is it monthly now?) SAN maintenance, so we shut down everything over there by 9:00 am.

After things were back up around 5 pm, I booted the ESX hosts. However, the VMs were all displaying the alert state, as if each VM had either had an HA event or was using too much CPU time. It didn’t matter whether the VM was running or not; the state persisted.

vCenter - VM in alarm state

Lucky me, someone else had already run into this issue. After simply vMotion’ing the VMs to another host, they were no longer in that alert state.

Extending vMotion compatibility (continued)

Remember my last post about CPU masking? Well, turns out that you can do it to a “template”.

The only step you skip is actually marking the VM as a “template”. You can still clone it and move it around and all that other stuff, but the good part is that the cloned VM keeps the CPU mask set on the “template”. *shrug*

I don’t know why VMware didn’t build that feature into templates, since this is a really freaky way to do it.

Extending vMotion compatibility

Today I did something horrible. I yet again noticed that I bought the wrong CPUs (basically I bought Xeon DPs with four cores). Those apparently have a feature called SSSE3, which makes vMotion with our old Xeon DPs (dual cores) fail before even trying.

But as we had a cooling outage today (basically ’cause it broke), I needed to turn off some ESX servers. That left me with the new ones and one of the old ones. *yuck*

So after a bit of googling, I found this VMware KB entry, which luckily lists the registers (on level 1) you need to zero out.
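For anyone who hasn’t seen one of these masks before, here is a rough Python sketch of what such a string looks like. It is only an illustration of the notation, not the exact setting from the KB entry: I’m assuming the usual layout of 32 characters in colon-separated groups of four, highest bit first, where `-` leaves a bit alone and `0` hides it from the guest. The helper name `cpuid_mask` is made up; the one hard fact in here is that SSSE3 is reported in bit 9 of ECX on CPUID level 1.

```python
def cpuid_mask(hidden_bits):
    """Build a 32-character CPUID mask string, bit 31 first, bit 0 last.

    Assumed notation: '-' keeps the default for a bit, '0' forces the
    guest to see that bit as 0. Groups of four are joined with ':'.
    """
    chars = ["0" if bit in hidden_bits else "-" for bit in range(31, -1, -1)]
    nibbles = ["".join(chars[i:i + 4]) for i in range(0, 32, 4)]
    return ":".join(nibbles)


SSSE3_BIT = 9  # CPUID level 1, ECX bit 9 reports SSSE3

mask = cpuid_mask({SSSE3_BIT})
# Printed in the style of a per-VM config entry (key name is an assumption).
print(f'cpuid.1.ecx = "{mask}"')
# prints: cpuid.1.ecx = "----:----:----:----:----:--0-:----:----"
```

Hiding more bits is just a matter of passing more bit numbers into the set.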

Only problem after that was that it still wasn’t enough. So back to the drawing board. The final solution came rather quickly and looks like this:

The only stupid things about this are that

  1. it ain’t supported by VMware (as in if you’re having trouble with your ESX/VC and you have a VM running with this, you’re shit outta luck!)
  2. you have to define this on a *per VM basis*, which really is a pain in the ass for larger installations

True, I should’ve just bought vMotion-compatible CPUs, which would have spared me the hassle … but it’s too late now, I have to live with these ones.