Nagios Hostgroup Inheritance

As I wrote earlier, I recently virtualized our nagios. Along with that came a complete “redesign” of how checks are applied. Up till now, I defined checks for each and every single server, ending up with ~25 files, each holding roughly 6 checks, sorted only by hostname.

As you can imagine, it gets quite confusing with that many checks (~150). So I spent the last two days reorganizing (with Visio), working out on which object/hostgroup placing each check would make sense. This is the first result of two days of planning, reordering and moving hosts into different hostgroups.

[Figure: Nagios Hostgroup Inheritance - Linux]
[Figure: Nagios Hostgroup Inheritance - Windows]

Thanks to Josh (and Chris I think), implementing the above is gonna be quite easy. Gonna talk about the config layout itself once I have it all wrapped up. Stay tuned!
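
To make the idea concrete, here is a minimal Python sketch of what “placing a check on a hostgroup” buys you: a host in a group picks up that group's checks plus everything defined further up the tree. The group names and checks are made up for illustration and are not the layout from the diagrams; it is only a model of the idea, not how nagios resolves things internally.

```python
from typing import Dict, List, Set

# Hypothetical hostgroup tree: group -> parent groups (all names are made up).
PARENTS: Dict[str, List[str]] = {
    "linux-web": ["linux"],
    "linux-db": ["linux"],
    "linux": ["all-servers"],
    "windows": ["all-servers"],
    "all-servers": [],
}

# Checks attached directly to a hostgroup (also made up).
CHECKS: Dict[str, List[str]] = {
    "all-servers": ["PING"],
    "linux": ["LOAD", "SSH"],
    "linux-web": ["HTTP"],
    "windows": ["RDP"],
}


def effective_checks(group: str) -> Set[str]:
    """Collect the checks of a group plus everything its ancestors define."""
    seen: Set[str] = set()
    checks: Set[str] = set()
    stack = [group]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited (guards against cycles and diamonds)
        seen.add(current)
        checks.update(CHECKS.get(current, []))
        stack.extend(PARENTS.get(current, []))
    return checks


if __name__ == "__main__":
    for name in ("linux-web", "linux-db", "windows"):
        print(name, sorted(effective_checks(name)))
```

With this toy tree, a web server ends up with HTTP, LOAD, SSH and PING even though only HTTP is defined on its own group, which is the whole point of moving checks up to the hostgroups.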

Nagios virtualization

As virtualization seems to be a trendy thing to do, I went ahead and virtualized our nagios (while reinstalling the whole thing …).

Now as I went into work today and started my email client, I received 4 nagios warnings about a LOAD service reaching critical state. Looked at the nagios box itself, opened up the VM console, looked into the syslog. Nothing.

Yet over 3/4 of the services were flapping, and some ping checks were critical (for whatever reason). So I opened the nagios web interface again, and noticed it dropping the connection over and over again (I had to reauthenticate again and again).

So I opened up PuTTY, which established the connection without a single problem, but dropped me like a stone after a short while. I restarted the session and got a security warning from PuTTY (the sshd host key differed from the saved one). That raised my suspicion. So I took a look at the hostname, and lookie there.

Somehow my old nagios box (which is a physical box) had been powered on again, and it still had the same IP address as my virtualized one. So the virtualized nagios wasn’t really dropping my connection; I was being directed to the old nagios.

Walked over into the data center, turned off the old box (well, I kept the power button pressed for a short time), and away went my troubles.

Extending VMotion compatibility (continued)

Remember my last post about CPU masking? Well, turns out that you can do it to a “template”.

The only thing you don’t do is actually mark the VM as a “template”. You can still clone it and move it around and all that other stuff, and the good part is that the cloned VM keeps the CPU mask set on the “template”. *shrug*

I don’t know why VMware didn’t build that feature into real templates, since this is a really freaky way to do it.

Nagios3 with Active Directory authorization on SLES10

Well, it seems to be becoming a “trend” for me to integrate stuff into our Active Directory. Now that I know why, and how easy it is, I expect to add more stuff. The good thing about the integration is that you only need to maintain a single source for authorization.

The bad thing is that stuff becomes dependent on the Active Directory (we do have four domain controllers, so that should be fine).

Now, here’s the ssl-(only) apache2 configuration file for my vhost:

As you can see, AuthLDAPURL holds the four LDAP servers separated by spaces (that’s what the Apache2 documentation says to do), and that actually works.
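
For illustration, here is a small Python sketch of how such a multi-host URL value is put together: the hosts go space-separated into the host part of a single ldap:// URL, followed by base DN, attribute, scope and filter. The host names, base DN, attribute and filter are placeholders and not the values from my vhost; the helper itself is hypothetical.

```python
from typing import Sequence


def build_auth_ldap_url(
    hosts: Sequence[str],
    base_dn: str,
    attribute: str = "sAMAccountName",  # assumed login attribute for AD
    scope: str = "sub",
    ldap_filter: str = "(objectClass=user)",  # assumed filter, adjust as needed
) -> str:
    """Build an ldap:// URL whose host part lists several servers."""
    if not hosts:
        raise ValueError("at least one LDAP host is required")
    host_part = " ".join(hosts)  # redundant servers are separated by spaces
    return f"ldap://{host_part}/{base_dn}?{attribute}?{scope}?{ldap_filter}"


if __name__ == "__main__":
    # Placeholder domain controllers, not real host names.
    dcs = [f"dc{i}.example.com" for i in range(1, 5)]
    print(build_auth_ldap_url(dcs, "dc=example,dc=com"))
```

Because the value contains spaces, it has to be wrapped in double quotes when it goes into the Apache configuration.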

The only additional thing I had to change on the nagios side is in /etc/nagios/cgi.cfg, to allow everyone to issue system commands. Also, if you ever stumble upon extraneous chars in the check_nrpe output, update to a newer NRPE version; that fixed it for me (that is, on the receiver side, as in the box running the NRPE agent).

Extending vMotion compatibility

Today I did something horrible. I yet again noticed that I bought the wrong CPUs (basically I bought Xeon DPs with four cores). Those apparently have a feature called SSSE3, which makes vMotion to our old Xeon DPs (dual cores) fail before even trying.

But as we had a cooling outage today (basically ’cause it broke), I needed to turn off some ESX servers. Thus leaving me with the new ones and one of the old ones. *yuck*

So after a bit of googling, I found this VMware KB entry, which luckily lists the registers (on level 1) you need to zero out.

The only problem after that was that it still wasn’t enough. So back to the drawing board. The final solution came rather quickly and looks like this:

The stupid things about this are that

  1. it ain’t supported by VMware (as in if you’re having trouble with your ESX/VC and you have a VM running with this, you’re shit outta luck!)
  2. you have to define this on a *per VM basis*, which really is a pain in the ass for larger installations

True, I just should’ve bought vMotion compatible CPUs, that would have spared me the hassle … but it’s too late now, I have to live with those ones.
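
To give a rough idea of what such a mask looks like, here is a small Python sketch that builds a 32-bit mask string with selected bits forced to zero. It is not the mask I ended up with. The only hard fact in it is that SSSE3 is reported in bit 9 of ECX for CPUID level 1; the string layout (eight colon-separated nibbles, highest bit first, “-” meaning “leave alone”) and the `cpuid.1.ecx` option name are assumptions based on how I understand VMware's mask format, so check them against the KB entry before using anything like this.

```python
from typing import Iterable

SSSE3_ECX_BIT = 9  # CPUID level 1, ECX bit 9 reports SSSE3


def cpuid_mask(zero_bits: Iterable[int]) -> str:
    """Return a 32-bit mask string with the given bits forced to '0'.

    Assumed layout: bit 31 is the leftmost character, bits are grouped
    in nibbles separated by ':', and '-' leaves a bit untouched.
    """
    chars = ["-"] * 32
    for bit in zero_bits:
        if not 0 <= bit <= 31:
            raise ValueError(f"bit {bit} is outside a 32-bit register")
        chars[31 - bit] = "0"  # index 0 holds bit 31
    nibbles = ["".join(chars[i:i + 4]) for i in range(0, 32, 4)]
    return ":".join(nibbles)


if __name__ == "__main__":
    # Assumed option name for the level 1 ECX register of a VM.
    print(f'cpuid.1.ecx = "{cpuid_mask([SSSE3_ECX_BIT])}"')
```

Hiding a single bit like this was exactly the part that “still wasn’t enough” for me, so treat it as a starting point for reading masks, not as a working recipe.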

Managing unixODBC connections on SLES10

Recently I got the task of setting up unixODBC/freetds on one (well, it’s really three) of our web servers, as someone wanted to use Microsoft SQL Server 2005 with PHP (without using the MSSQL functions, which PHP provides soo nicely; don’t ask me why).

With that I also got a set of “instructions” on how to install freetds from source (remember, I was a Gentoo dev, so I know my way around, when it comes to building from source), as well as a small set of instructions on how to create the connection.

Well, after trying to figure out why the hell the connection wasn’t working with freetds’ tsql and PHP’s odbc functions, while a plain connection using telnet worked … *shrug* turns out it was a simple mistake …

The “howto” said something like this:

See the difference? If not, I’ll show you a diff:

Something as simple as adding another part of a word (as in “name”) to Server makes the whole thing go wonko. Well, it ain’t going wonko per se, as Servername means something different from Server, at least when it comes to freetds.

Servername refers to a server entry defined in freetds.conf (not a DNS name), while Server is the DNS name .. figures.
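
Since the two keys differ by just four letters, a quick check over the DSN definitions can show which one each DSN actually uses. Below is a minimal Python sketch using the standard configparser module; the sample DSNs, host names and the helper are made up for illustration and are not the connection from the howto.

```python
import configparser
from typing import Dict

# Made-up sample in odbc.ini style; not the DSN from the howto.
SAMPLE_ODBC_INI = """
[mssql_by_host]
Driver = FreeTDS
Server = sql01.example.com
Port = 1433

[mssql_by_entry]
Driver = FreeTDS
Servername = sql01
"""


def classify_dsns(ini_text: str) -> Dict[str, str]:
    """Report per DSN whether it sets Server, Servername, both or neither."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    result: Dict[str, str] = {}
    for dsn in parser.sections():
        keys = set(parser[dsn].keys())  # configparser lower-cases option names
        has_server = "server" in keys
        has_servername = "servername" in keys
        if has_server and has_servername:
            result[dsn] = "both"
        elif has_server:
            result[dsn] = "Server"
        elif has_servername:
            result[dsn] = "Servername"
        else:
            result[dsn] = "neither"
    return result


if __name__ == "__main__":
    for dsn, kind in classify_dsns(SAMPLE_ODBC_INI).items():
        print(f"{dsn}: {kind}")
```

A DSN that comes back as “both” or “neither” is probably worth a second look before blaming the database.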