Nagios virtualization

As virtualization seems to be a trendy thing to do, I went ahead and virtualized our nagios (while reinstalling the whole thing …).

Now as I went into work today and started my email client, I received 4 nagios warnings about a LOAD service reaching critical state. Looked at the nagios box itself, opened up the VM console, looked into the syslog. Nothing.

Yet over 3/4 of the services were flapping, and some ping checks were critical (for whatever reason). So I opened the nagios web interface again and noticed it dropping the connection over and over (I had to re-authenticate again and again).

So I opened up PuTTY, which established the connection without a single problem, but dropped me like a stone after a short amount of time. I restarted the session and got a security warning from PuTTY (the sshd host key differed from the one it had saved). That raised my suspicion. So I took a look at the hostname, and lookie there.

Somehow my old nagios box (which is a physical box) got powered on again, and it still had the same IP address as my virtualized one. So the virtualized nagios wasn’t really dropping my connection; I was being directed to the old nagios every now and then.

Walked over into the data center, turned off the old box (well, I kept the power button pressed for a short time), and away went my troubles.

Nagios3 with Active Directory authorization on SLES10

Well, it seems to be becoming a “trend” for me to integrate stuff into our Active Directory. Now that I know why, and how easy that is, I expect to add more stuff. The good thing about the integration is that you only need to maintain a single source for authorization.

The bad thing is that stuff becomes dependent on the Active Directory (we do have four domain controllers, so that should be fine).

Now, here’s the ssl-(only) apache2 configuration file for my vhost:

As you can see, AuthLDAPUrl holds the four LDAP servers separated by spaces (that’s what the Apache2 documentation says about that), and that actually works.
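The vhost file itself isn't reproduced here, so the following is only a rough sketch of what such a setup could look like with mod_authnz_ldap on Apache 2.2. Every hostname, path, base DN and bind account below is a made-up placeholder, not the value from my actual config.

```
# Hypothetical SSL-only vhost; all names and paths are placeholders.
<VirtualHost *:443>
    ServerName nagios.example.com
    SSLEngine on
    SSLCertificateFile    /etc/apache2/ssl.crt/nagios.crt
    SSLCertificateKeyFile /etc/apache2/ssl.key/nagios.key

    # The CGI path is an assumption; adjust to wherever nagios lives.
    <Directory "/usr/lib/nagios/cgi">
        AuthType Basic
        AuthName "Nagios"
        AuthBasicProvider ldap
        # Several LDAP servers go into the host part, separated by
        # spaces; the whole URL has to be quoted because of that.
        AuthLDAPURL "ldap://dc1.example.com dc2.example.com dc3.example.com dc4.example.com/DC=example,DC=com?sAMAccountName?sub?(objectClass=user)"
        # Active Directory usually refuses anonymous searches, so a
        # bind account is assumed here.
        AuthLDAPBindDN "CN=nagios-bind,OU=Service,DC=example,DC=com"
        AuthLDAPBindPassword "changeme"
        Require valid-user
    </Directory>
</VirtualHost>
```

With a URL like this, Apache tries the listed servers in order, so a single dead domain controller shouldn't lock you out of the web interface.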

The only additional change on the nagios side was in /etc/nagios/cgi.cfg, to allow everyone to issue system commands. Also, if you ever stumble upon extraneous chars in the check_nrpe output, update to a newer NRPE version; that fixed it for me (on the receiving side, as in the box running the NRPE agent).
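For the cgi.cfg part, a minimal sketch of the kind of line I mean. I'm assuming here that the wildcard form is what you want, which hands these rights to every authenticated user:

```
# /etc/nagios/cgi.cfg (fragment)
# "*" means every user who passed the Apache/LDAP login.
authorized_for_system_commands=*
```

Whether that is acceptable depends on who is able to log in at all; the alternative is a comma-separated list of user names instead of the asterisk.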

Nagios & plugins

Since we started utilizing Nagios's power two months ago, I finally came up with a C-based RAM plugin for nagios. The biggest problem I had with the Python- and Perl-based plugins was that some distributions (yes, SLES and Debian) don’t install either Python or Perl by default.

Since I wanted a manageable setup (as in unified code base across all distributions), I wanted it to work without installing too much. So I took the swap plugin and basically removed what wasn’t necessary and voila!
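To give an idea of how little is needed, here is a stripped-down sketch of such a check. This is not the plugin derived from the swap plugin, just an illustration written from scratch: it assumes a Linux /proc/meminfo, counts buffers and cache as available, and uses default thresholds of 80/90 percent that I picked arbitrarily. The function names and the CHECK_RAM_MAIN guard are made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Standard Nagios plugin exit codes. */
enum { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2, STATE_UNKNOWN = 3 };

/* Read MemTotal, MemFree, Buffers and Cached (all in kB) from a
 * meminfo-style stream. Returns 0 on success, -1 if MemTotal or
 * MemFree could not be found. */
static int read_meminfo(FILE *fp, unsigned long *total_kb, unsigned long *avail_kb)
{
    char line[256];
    unsigned long value, total = 0, mfree = 0, buffers = 0, cached = 0;
    int have_total = 0, have_free = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (sscanf(line, "MemTotal: %lu", &value) == 1) {
            total = value;
            have_total = 1;
        } else if (sscanf(line, "MemFree: %lu", &value) == 1) {
            mfree = value;
            have_free = 1;
        } else if (sscanf(line, "Buffers: %lu", &value) == 1) {
            buffers = value;
        } else if (sscanf(line, "Cached: %lu", &value) == 1) {
            /* "SwapCached:" does not match, it starts with 'S'. */
            cached = value;
        }
    }
    if (!have_total || !have_free || total == 0)
        return -1;

    *total_kb = total;
    *avail_kb = mfree + buffers + cached;
    if (*avail_kb > total)
        *avail_kb = total;
    return 0;
}

/* Percentage of RAM in use, treating buffers and cache as available. */
static double used_percent(unsigned long total_kb, unsigned long avail_kb)
{
    return 100.0 * (double)(total_kb - avail_kb) / (double)total_kb;
}

/* Map a usage percentage onto a Nagios state. */
static int ram_state(double used_pct, double warn_pct, double crit_pct)
{
    if (used_pct >= crit_pct)
        return STATE_CRITICAL;
    if (used_pct >= warn_pct)
        return STATE_WARNING;
    return STATE_OK;
}

#ifdef CHECK_RAM_MAIN
/* Usage: check_ram [warn_pct [crit_pct]] */
int main(int argc, char **argv)
{
    static const char *names[] = { "OK", "WARNING", "CRITICAL" };
    double warn = argc > 1 ? atof(argv[1]) : 80.0;
    double crit = argc > 2 ? atof(argv[2]) : 90.0;
    unsigned long total, avail;
    double used;
    int state;
    FILE *fp = fopen("/proc/meminfo", "r");

    if (fp == NULL || read_meminfo(fp, &total, &avail) != 0) {
        if (fp != NULL)
            fclose(fp);
        printf("RAM UNKNOWN - cannot read /proc/meminfo\n");
        return STATE_UNKNOWN;
    }
    fclose(fp);

    used = used_percent(total, avail);
    state = ram_state(used, warn, crit);
    /* Status text, then performance data after the pipe. */
    printf("RAM %s - %.1f%% used (%lu of %lu kB)|ram_used=%.1f%%;%.1f;%.1f;0;100\n",
           names[state], used, total - avail, total, used, warn, crit);
    return state;
}
#endif
```

Compiled with something like `gcc -DCHECK_RAM_MAIN -o check_ram check_ram.c`, this needs nothing but libc on the target box, which was the whole point of moving away from the Python and Perl plugins.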

Here we go, yay ME!

The only thing I need to finish sometime soon is getting NSClient++ to work on my Windows boxen (of which I have quite a few: the domain controllers, nas-cluster, …)