Using the integrated kickstart generator

VMware built a kickstart generator into ESX 3.5. You just need to enable it by editing an XML configuration file and restarting the webAccess service. Edit /usr/lib/vmware/webAccess/tomcat/apache-tomcat-5.5.26/webapps/ui/WEB-INF/struts-config.xml and look for the line saying:

This line needs to be commented out (wrapped in <!-- and -->), and the comment marks around the lines following it need to be removed.

After doing that, restart the webAccess service and then access your ESX host again.

If that worked, you should see the Login to Script Installer link on the Dashboard of the Web interface.

VMware vSphere and templates

I just converted one of my (old) templates, as I wanted to refresh the updates and the virus scanner. After converting, I was asked about the UUID (no clue why), and expected to be done with it. But after looking at the console, I got the following, completely cryptic message:

Unable to connect to MKS

After digging a bit deeper (that is, looking at the vmware.log of the virtual machine, since the GUI message is *really* cryptic), I’m a bit wiser:

After softly shutting the VM down and then powering it back up, everything is back in working order.

Fixing vmkernel symlinks

Since I pretty often end up in the situation where the kernel inside a VM is newer than what VMware currently has in their tools (as in the SUSE kernel is newer than the binary modules built by VMware), here’s a quick reminder for myself on how to fix the .ko symlinks.
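A minimal sketch for spotting the symlinks that need attention, assuming they live somewhere under /lib/modules/<kernel release> (that location and the function name `find_dangling_ko_symlinks` are assumptions for this illustration, not something the VMware Tools document). It only lists .ko symlinks whose target no longer exists and changes nothing:

```python
import os
import platform


def find_dangling_ko_symlinks(root):
    """Return the sorted paths of *.ko symlinks under root whose target is missing."""
    dangling = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # os.path.exists() follows the link, so it is False for a dangling symlink
            if name.endswith(".ko") and os.path.islink(path) and not os.path.exists(path):
                dangling.append(path)
    return sorted(dangling)


if __name__ == "__main__":
    # Assumed module directory of the running kernel (hypothetical for this sketch)
    module_root = os.path.join("/lib/modules", platform.release())
    for link in find_dangling_ko_symlinks(module_root):
        print(link, "->", os.readlink(link))
```

Run inside the guest after a kernel update, this prints each broken link together with its stale target, which is the list you then have to re-point by hand.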

SUSE Linux Enterprise Server 10 on VMware ESX (continued)

Well, after some searching today (we applied the VMware Update 2 today, thus the VMware Tools update too), I finally found out what is causing that problem.

Though the problem does not seem to be limited to virtual systems alone, I just browsed through this Novell Forum thread which pretty much describes my problem. I found the same error in the VM’s where I tried to mount a CD image.

The only difference between the behaviour I see and the one described is that the virtual machine is switched off immediately after you try to mount a CD image.

Now, this guy is saying Novell is working on it … But you’re gonna have to ask the question, why in God’s name did such an update get through QA ? Or ain’t there no QA for updates ? *shrug*

SUSE Linux Enterprise Server 10 on VMware ESX

We’re currently having a *really* weird problem with our VM’s. Sometime last week, SUSE released a kernel update. Now, once you install it and you reboot the selected VM with a DVD/CD image present, you’re gonna see this:

msg.vmxaiomgr.retrycontabort.unkown

The only workaround so far has been to unmount *any* images from the CD drives attached to the VM. And yes, this is reproducible; even reinstalling from scratch doesn’t change the fact that after installing the patch the VM quits working.

I also know SLES10 SP2 ain’t officially supported yet by VMware, but I’d still expect it to just work and not produce such weird errors. The only thing I found so far is this VMTN thread ..

Lucky us, VMware just today released Update 2 for VirtualCenter and ESX, wherein SLES10SP2 should be officially supported!

Extending vMotion compatibility

Today I did something horrible. I yet again noticed that I bought the wrong CPU’s (basically I bought Xeon DP’s with four cores). Those apparently have a feature called SSSE3, which makes vMotion with our old Xeon DP’s (dual cores) fail before even trying.

But as we had a cooling outage today (basically ’cause it broke), I needed to turn off some ESX servers. Thus leaving me with the new ones and one of the old ones. *yuck*

So after a bit of googling, I found this VMware KB entry, which luckily lists the registers (on level 1) you need to zero out.
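To illustrate what zeroing out a bit means here: SSSE3 is reported in CPUID level 1, register ECX, bit 9. The sketch below builds a mask string that forces that one bit to 0 and leaves every other bit alone. The option name `cpuid.1.ecx` and the colon-separated, bit-31-first layout are assumptions about the per-VM configuration syntax, and `cpuid_mask` is a hypothetical helper, so check the KB entry for the exact form before using anything like this.

```python
def cpuid_mask(zero_bits):
    """Build a 32-character register mask, bit 31 first.

    '0' forces a bit off, '-' leaves it untouched; nibbles are joined
    with ':' (assumed layout for this sketch).
    """
    chars = ["0" if bit in zero_bits else "-" for bit in range(31, -1, -1)]
    nibbles = ["".join(chars[i:i + 4]) for i in range(0, 32, 4)]
    return ":".join(nibbles)


SSSE3_BIT = 9  # CPUID level 1, ECX bit 9

# Assumed option name; prints a single configuration line for one VM
print('cpuid.1.ecx = "%s"' % cpuid_mask({SSSE3_BIT}))
```

Passing more bit numbers in the set hides more features with the same line, which is handy once the list of differing bits between two CPU generations grows.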

The only problem after that was that it still wasn’t enough. So back to the drawing board. The final solution came rather quickly and looks like this:

The only stupid thing about this is, that

  1. it ain’t supported by VMware (as in if you’re having trouble with your ESX/VC and you have a VM running with this, you’re shit outta luck!)
  2. you have to define this on a *per VM basis*, which really is a pain in the ass for larger installations

True, I just should’ve bought vMotion compatible CPU’s, that would have spared me the hassle … but it’s too late now, I have to live with those ones.

VMware design rules

I just got back from four days in Rostock over at S&N, where I was attending a VMware design course, and here’s a list of questions I asked the trainer:

  1. What’s the disadvantage of having a 1016 ported vSwitch ?
  2. Any clues on how to exchange the default certificate of the Virtual Center ?
  3. Are there any tools to stress test the virtual system ?
  4. Are there any performance impacts of having more than 10 users in Virtual Center ?
  5. Any clues and/or guides on how to do time synchronization in VMware guests, especially Linux guests ?
  6. What’s the preferred NIC type for Linux guests ?
  7. Any clues to using Raw Device Mappings with VMotion ?
  8. Is there a way of defining CPU masks on a global level ?

Answers:

  1. There might be a small overhead, though that’s limited to a really small, non-measurable amount
  2. Hasn’t done it yet.
  3. Yes, there are free stress test tools like cpubusy.vbs, cpubusy.pl, iometer.exe, ..
  4. Nope, you should only experience load problems starting at 25 or so users
  5. Select *one* variant, either time synchronization by use of the VMware tools or ntpupdate; if ntpupdate, select a single time source for your whole environment
  6. For ESX 3.5.0 that would be “Flexible” (as per VMware Knowledgebase), as the vmxnet type is a leftover from ESX 3.0
  7. Raw device mappings are *absolutely* supported by VMware, and also work without any troubles (when mapping/zoning is correctly configured)
  8. Currently there’s no known way of doing this
  • When adjusting the CPU affinity of a VM, *always* completely stop the virtual machine afterwards
  • When trying to figure out CPU bottlenecks, check whether or not hyperthreading is enabled. The hyperthreaded (second) core only gives you a CPU with about 15% of the capacity of the first.
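As a rough worked example of that hyperthreading rule of thumb (the 15% figure is the trainer’s estimate from above; the function name and the 8-core host are made up for illustration):

```python
HT_SIBLING_FRACTION = 0.15  # trainer's rule of thumb, not a measured value


def effective_cores(physical_cores, hyperthreading):
    """Estimate usable CPU capacity in units of one full physical core."""
    extra = HT_SIBLING_FRACTION if hyperthreading else 0.0
    return physical_cores * (1.0 + extra)


# Hypothetical host with 8 physical cores
print("without HT:", effective_cores(8, False))
print("with HT:   ", effective_cores(8, True))
```

So a host that shows 16 logical CPUs with hyperthreading enabled should be planned as roughly 9 cores’ worth of capacity, not 16.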

Also, here are some guidelines on how the trainer extended the defaults:

ESX Server:

  • Extend the “/” size to 10GiB
  • Extend the “swap” partition to about 1GiB
  • Extend the “/var/log” partition to about 4 GiB
  • don’t mess around with creating too many vSwitches; just keep it simple
  • set the duplex mode manually if the ESX is giving you any trouble
  • disable the Traffic Shaping, unless you *really* need it

VirtualCenter:

  • There are two options when installing VirtualCenter: either install it on a physical box or simply put it into a virtual machine itself
  • A problem with putting it into a virtual machine is that when the VM is shut down or powered off due to isolation of the ESX running it, any ESX Server powering up isn’t going to start any virtual machines, as that in turn requires the License Server (as Michael pointed out in #c1, the VM is still gonna start, as the HA agent is able to start virtual machines on the basis of the 14-day grace period)
  • Only use the SQL Server Express variant if you really have to. It’s limited to 4GB database size, so if your installation grows above say 50 hosts and 2000 VM’s, this is gonna break the limits of SQL Server Express