Extending VMotion compatibility (continued)

Remember my last post about cpu masking ? Well, turns out that you can do it to a “template”.

The one step you skip is marking the VM as a “template”. You can still clone it, move it around and all that other stuff, and the good part is that the cloned VM keeps the CPU mask set on the “template”. *shrug*

I don’t know why VMware didn’t build that feature into templates, since this is a really freaky way of doing it.

Nagios3 with Active Directory authorization on SLES10

Well, it seems to be becoming a “trend” for me to integrate stuff into our Active Directory. Now that I know why, and how easy it is, I expect to add more stuff. The good thing about the integration is that you only need to maintain a single source for authorization.

The bad thing is that stuff becomes dependent on the Active Directory (we do have four domain controllers, so that should be fine).

Now, here’s the ssl-(only) apache2 configuration file for my vhost:

As you can see, AuthLDAPURL holds the four LDAP servers separated by spaces (that’s what the Apache2 documentation says to do), and that actually works.
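For illustration only, here is a small Python sketch of how such a multi-host directive could be put together. The host names, base DN and the sAMAccountName attribute are made-up placeholders, not the values from my vhost; the only thing taken from the Apache documentation is that several hosts go into one URL, separated by spaces.

```python
def build_auth_ldap_url(hosts, base_dn, attribute="sAMAccountName",
                        scope="sub", ldap_filter="(objectClass=*)"):
    """Return an AuthLDAPURL directive listing several redundant LDAP hosts."""
    if not hosts:
        raise ValueError("at least one LDAP host is required")
    # mod_authnz_ldap accepts several hosts in one URL, separated by spaces.
    host_part = " ".join(hosts)
    url = f"ldap://{host_part}/{base_dn}?{attribute}?{scope}?{ldap_filter}"
    # The surrounding quotes are needed because the URL contains spaces.
    return f'AuthLDAPURL "{url}"'


# Hypothetical domain controllers and base DN, purely for illustration.
directive = build_auth_ldap_url(
    ["dc1.example.com", "dc2.example.com", "dc3.example.com", "dc4.example.com"],
    "DC=example,DC=com",
)
print(directive)
```

The printed line can be pasted into a vhost; whether the attribute and filter fit your directory is something you would have to check yourself.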

The only additional Nagios change was in /etc/nagios/cgi.cfg, to allow everyone to issue system commands. Also, if you ever stumble upon stray characters in the check_nrpe output, update to a newer NRPE version; that fixed it for me (on the receiver side, as in the box running the NRPE agent).

Extending vMotion compatibility

Today I did something horrible. I yet again noticed that I bought the wrong CPUs (basically I bought Xeon DPs with four cores). Those apparently have a feature called SSSE3, which makes vMotion with our old Xeon DPs (dual cores) fail before even trying.

But as we had a cooling outage today (basically ’cause it broke), I needed to turn off some ESX servers, leaving me with the new ones and one of the old ones. *yuck*

So after a bit of googling, I found this VMware KB entry, which luckily lists the registers (on level 1) you need to zero out.
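To give an idea of what “zeroing out” a bit on level 1 means, here is a rough Python sketch that builds a mask string hiding a single feature bit. It is not the setting I ended up with. The only hard fact in it is that SSSE3 is reported in CPUID leaf 1, register ECX, bit 9; the option name cpuid.1.ecx and the colon-grouped string format are written from memory and should be treated as assumptions, so check them against the KB entry.

```python
def cpuid_mask(zero_bits):
    """Build a 32-flag mask string that forces the given bits to 0.

    The leftmost flag is bit 31 and the rightmost is bit 0.
    '-' leaves a bit untouched, '0' hides it from the guest.
    """
    flags = ["0" if bit in zero_bits else "-" for bit in range(31, -1, -1)]
    # Group the flags into nibbles separated by ':' for readability.
    groups = ["".join(flags[i:i + 4]) for i in range(0, 32, 4)]
    return ":".join(groups)


SSSE3_BIT = 9  # CPUID leaf 1, ECX, bit 9

mask = cpuid_mask({SSSE3_BIT})
# Assumed .vmx syntax, see the note above.
vmx_line = f'cpuid.1.ecx = "{mask}"'
print(vmx_line)
```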

The only problem after that was that it still wasn’t enough. So back to the drawing board. The final solution came rather quickly and looks like this:

The only stupid things about this are that

  1. it ain’t supported by VMware (as in if you’re having trouble with your ESX/VC and you have a VM running with this, you’re shit outta luck!)
  2. you have to define this on a *per VM basis*, which really is a pain in the ass for larger installations

True, I should’ve just bought vMotion compatible CPUs, that would have spared me the hassle … but it’s too late now, I have to live with these.

Managing unixODBC connections on SLES10

Recently I got the task of setting up unixODBC/freetds on one (well, it’s really three) of our web servers, as someone wanted to use Microsoft SQL Server 2005 with PHP (without using the MSSQL functions, which PHP provides soo nicely; don’t ask me why).

With that I also got a set of “instructions” on how to install freetds from source (remember, I was a Gentoo dev, so I know my way around, when it comes to building from source), as well as a small set of instructions on how to create the connection.

Well, after trying to figure out why the hell the connection wasn’t working with tsql and PHP’s odbc functions, while a plain connection using telnet worked … *shrug* turns out it was a simple mistake …

The “howto” said something like this:

See the difference ? If not, I’ll show you a diff:

Something as simple as adding another part of a word (as in “name”) to Server makes the whole thing go wonko. Well, it ain’t going wonko per se, as Servername means something different from Server, at least when it comes to freetds.

Servername is the name of a server entry defined in the freetds configuration, while Server is the DNS name .. figures.
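Since the two keys are that easy to mix up, a quick check of which one each DSN actually uses can save some head scratching. The following Python sketch does that with the standard configparser module; the DSN names, host name and driver value in the sample are invented and are not the ones from the howto.

```python
import configparser

# Invented sample in odbc.ini style, for illustration only.
SAMPLE_ODBC_INI = """
[mssql_by_host]
Driver = FreeTDS
Server = sqlhost.example.com
Port = 1433

[mssql_by_name]
Driver = FreeTDS
Servername = sqlhost
"""


def describe_dsn_targets(ini_text):
    """Map each DSN to the server key it uses (Server or Servername)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    result = {}
    for dsn in parser.sections():
        # configparser lowercases option names by default.
        has_server = parser.has_option(dsn, "server")
        has_servername = parser.has_option(dsn, "servername")
        if has_server and has_servername:
            result[dsn] = "ambiguous: both Server and Servername set"
        elif has_servername:
            result[dsn] = "Servername: " + parser.get(dsn, "servername")
        elif has_server:
            result[dsn] = "Server: " + parser.get(dsn, "server")
        else:
            result[dsn] = "no server key"
    return result


for dsn, target in describe_dsn_targets(SAMPLE_ODBC_INI).items():
    print(dsn, "->", target)
```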

subversion on WebDAV with Active Directory authorization on SLES10

Okay, so I ended up toying with subversion via WebDAV on SLES today (I know, I know .. it’s bloody Sunday). It wasn’t much of a hassle though, after reading this. Sure, I made a few errors at first (I simply confused the logic behind “Location” and “Directory”), but after that, plain subversion commits via WebDAV (thus utilizing Apache) worked fine.

For POC or as a hint to myself, here’s where and what I needed to add/change:

Add the following modules to APACHE_MODULES in /etc/sysconfig/apache2:

  1. dav_svn (dav_svn needs dav, thus the need to add it too)
  2. dav
  3. authnz_ldap (authnz_ldap needs ldap, so again we need that too!)
  4. ldap
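If you’d rather script that edit than do it by hand, something along these lines should do. It’s a sketch that assumes the variable sits on a single line of the form APACHE_MODULES="..."; the sample content below is made up and is not my actual sysconfig file.

```python
import re

# The four modules from the list above, dependencies first.
REQUIRED = ["dav", "dav_svn", "ldap", "authnz_ldap"]


def add_apache_modules(sysconfig_text, modules=REQUIRED):
    """Append missing module names to the APACHE_MODULES="..." line."""
    pattern = re.compile(r'^(APACHE_MODULES=")([^"]*)(")', re.MULTILINE)

    def merge(match):
        current = match.group(2).split()
        for name in modules:
            if name not in current:
                current.append(name)
        return match.group(1) + " ".join(current) + match.group(3)

    new_text, count = pattern.subn(merge, sysconfig_text, count=1)
    if count == 0:
        raise ValueError("no APACHE_MODULES line found")
    return new_text


# Made-up sample content, for illustration only.
sample = 'APACHE_SERVER_FLAGS=""\nAPACHE_MODULES="alias dav php5"\n'
print(add_apache_modules(sample))
```

Modules that are already listed are left alone, so running it twice doesn’t duplicate anything.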

After that, we can add our repository (or our multi-repository folder) to /etc/apache2/conf.d/subversion.conf:

Now, as you can see, my goal was to not rely on a separate authorization database, but to use our already existing Active Directory at work. Generally this should work just fine, but it didn’t. I tried various things, like trying another user, changing the group (as in the “require ldap-group”) as well as changing my own password. Zip.

All I got was this line in the error_log of Apache:

Now, that itself does tell you what is happening, but not why. So again, I ended up googling till I found this:

The suggested step was to add “REFERRALS off” to /etc/ldap/ldap.conf. Surprise, that file doesn’t exist. Heck, there is one at /etc/ldap.conf, though. I added the line there, still zip.

Did I get the wrong file ? Absolutely.

/etc/ldap.conf is used by nsswitch and pam_ldap, but not by openldap2 (which is what Apache is using). So after reading this comment and adding the line to /etc/openldap2/ldap.conf: *kaching*! Works.
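As a note to myself, a tiny Python check for whether a given ldap.conf really turns referrals off. It is only a sketch: it treats the keyword as case-insensitive, accepts off/false/no as “off”, and assumes that the last REFERRALS line in the file is the one that counts. That last point is my assumption, not something I verified.

```python
def referrals_disabled(ldap_conf_text):
    """Return True if the last REFERRALS setting in the text switches them off."""
    setting = None
    for raw in ldap_conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if parts[0].upper() == "REFERRALS" and len(parts) >= 2:
            setting = parts[1].lower()
    return setting in ("off", "false", "no")


# Made-up file content, for illustration only.
sample_conf = "# client defaults\nBASE dc=example,dc=com\nREFERRALS off\n"
print(referrals_disabled(sample_conf))
```

Point it at the content of whichever file your Apache’s LDAP library actually reads; as described above, picking the wrong file was my whole problem.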

Now I just need to install redmine (already installed ruby, rubygems and rubygem-rails from the SDK Addon), but I’ll leave that for tomorrow, today I’m gonna watch Band of Brothers.

The clue to building ppc64 RPMs

Remember, I talked about building RPMs on SLES10SP2 on ppc64 ? Well, turns out I was rather stupid .. and it was rather simple (don’t ask me why I didn’t think of that). I tried asking solar and I used Google (apparently with the wrong search parameters), nothing though. Not a clue.

Today it bugged me again, so I used Google again. This time with “ppc64 suse rpmbuild”, and guess what I saw within the preview of the second hit ..

And here I thought I was missing something, turns out I was really stupid though .. *shrug* Building stuff like nagios works with that just fine ..

Update: or not. It worked only a single time and has been broken again ever since. Guess I’m gonna reload the box on Tuesday.

VMware design rules

I just got back from four days in Rostock over at S&N, where I was attending a VMware design course, and here’s a list of questions I asked the trainer:

  1. What’s the disadvantage of having a 1016-port vSwitch ?
  2. Any clues on how to exchange the default certificate of the Virtual Center ?
  3. Are there any tools to stress test the virtual system ?
  4. Are there any performance impacts of having more than 10 users in Virtual Center ?
  5. Any clues and/or guides on how to do time synchronization in VMware guests, especially Linux guests ?
  6. What’s the preferred NIC type for Linux guests ?
  7. Any clues to using Raw Device Mappings with VMotion ?
  8. Is there a way of defining CPU masks on a global level ?

Answers:

  1. There might be a small overhead, though it’s limited to a really small, non-measurable amount
  2. Hasn’t done it yet.
  3. Yes, there are free stress test tools like cpubusy.vbs, cpubusy.pl, iometer.exe, ..
  4. Nope, you should only experience load problems starting at 25 or so users
  5. Select *one* variant, either time synchronization by use of the VMware Tools or NTP; if NTP, select a single time source for your whole environment
  6. For ESX 3.5.0 that would be “Flexible” (as per VMware Knowledgebase), as the vmxnet type is a leftover from ESX 3.0
  7. Raw device mappings are *absolutely* supported by VMware, and also work without any trouble (when mapping/zoning is correctly configured)
  8. Currently there’s no known way of doing this
  • When adjusting the CPU affinity of a VM, *always* completely stop the virtual machine afterwards
  • When trying to figure out CPU bottlenecks, check whether or not hyperthreading is enabled. The hyperthreaded (second) core only gives you a CPU with 15% of the performance of the first.

Also, here are some guidelines on how the trainer extended the defaults:

ESX Server:

  • Extend the “/” size to 10 GiB
  • Extend the “swap” partition to about 1 GiB
  • Extend the “/var/log” partition to about 4 GiB
  • don’t mess around with creating too many vSwitches; just keep it simple
  • set the duplex mode manually if the ESX is giving you any trouble
  • disable the Traffic Shaping, unless you *really* need it

VirtualCenter:

  • There are two options when installing VirtualCenter: either install it on a physical box or simply put it into a virtual machine itself
  • A problem with putting it into a virtual machine: when that VM is shut down or powered off due to isolation of the ESX running it, any ESX Server powering up isn’t going to start any virtual machines, as that in turn requires the License Server (as Michael pointed out in #c1, the VM is still gonna start, as the HA agent is able to start virtual machines on the basis of the 14-day grace period)
  • Only use the SQL Server Express variant if you really have to. It’s limited to 4GB database size, so if your installation grows above say 50 hosts and 2000 VMs, this is gonna break the limits of SQL Server Express

Building RPMs on SLES10SP2-ppc64

Well, it turns out that building stuff on ppc64 is a *real* pain in the ass, at least on anything SUSE related. I do have to tweak every damn spec to include this:

Otherwise, ld is gonna fail when linking, as it’s gonna try linking the generated 64bit code (-m64 is passed on via RPM_OPT_FLAGS to CFLAGS) as 32bit code, which ain’t gonna work at all …

On top of that, stuff ain’t building due to multiple problems (for example nagios and vim, ’cause ld is unable to find a fitting -lperl (for nagios) and -lXt (for vim)) as well as source errors …

GPO (behind the scenes)

Well, to begin with, we had this really weird problem that the thin clients as well as the terminal server would only load user-based group policy if you were a member of the local administrators group. While that’s ok for the thin clients (users can’t actually change anything unless they log in as “Administrator”, don’t ask me why), it’s a real no-no on the terminal server.

We tried redoing *everything* (that is, starting with the domain, then the terminal server and after that the thin clients) and yet nothing changed, it still didn’t work. That’s what I’ve been doing for the last 2 weeks. Up till now, I always thought a user would have access to the ntuser.dat (that is HKEY_CURRENT_USER) if his NTFS permissions were correct. But nooooooooooooooooooooo, Microsoft had to introduce another layer of permissions.

Old permissions on HKEY_CURRENT_USER
Once you change it to be proper (as in remove the dead user entry and add a group that actually gets you somewhere), it all starts to work!

New permissions on HKEY_CURRENT_USER