Firefox: Hosting Xmarks (formerly Foxmarks) on lighttpd

Well, I am an enthusiastic user of Xmarks (formerly Foxmarks) and have played with hosting it myself again and again. So this weekend I finally decided to do it properly: I sat down and recreated the whole WebDAV stuff (even if I cribbed from this HowtoForge article).

Always redirect traffic to HTTPS, since transmitting usernames and passwords via plain HTTP ain’t that secure (MITM).

Okay, so here are the shortened setup instructions:

  1. Enable mod_access, mod_auth, mod_redirect and mod_webdav in /etc/lighttpd/lighttpd.conf
  2. Create the necessary directories
  3. Create the htpasswd-file
  4. Configure the redirections
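
Steps 2 and 3 boil down to something like this. The path, the user/group lighttpd runs as and the account name are just examples here, so adjust them to your setup (htpasswd ships with apache2-utils if you don’t have it yet):

    # directory the WebDAV share will live in, owned by the user lighttpd runs as
    mkdir -p /srv/webdav
    chown -R lighttpd:lighttpd /srv/webdav

    # htpasswd-file with a single user for basic auth
    htpasswd -c /etc/lighttpd/webdav.htpasswd xmarks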

Since we just created the necessary directories, as well as an htpasswd-file containing a user, we should be able to change the configuration now:
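
The configuration block that belongs here got lost along the way, so this is a minimal sketch of what it looks like. The hostname, document root, realm and htpasswd path are placeholders, and the SSL setup on :443 (ssl.engine, ssl.pemfile) is assumed to exist already:

    # /etc/lighttpd/lighttpd.conf (excerpt)
    server.modules += ( "mod_access", "mod_auth", "mod_redirect", "mod_webdav" )

    # step 4: everything arriving via plain HTTP gets bounced over to HTTPS
    $SERVER["socket"] == ":80" {
        $HTTP["host"] =~ "(.*)" {
            url.redirect = ( "^/(.*)" => "https://%1/$1" )
        }
    }

    # the WebDAV vhost itself, protected with basic auth
    $HTTP["host"] == "webdav.example.com" {
        server.document-root = "/srv/webdav"
        webdav.activate      = "enable"
        webdav.is-readonly   = "disable"
        auth.backend         = "htpasswd"
        auth.backend.htpasswd.userfile = "/etc/lighttpd/webdav.htpasswd"
        auth.require = ( "" => ( "method"  => "basic",
                                 "realm"   => "webdav",
                                 "require" => "valid-user" ) )
    }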

Now, just restart the lighttpd service and watch your WebDAV shine. Seriously though, there are a couple of things you should be aware of:

  1. When using a home-grown WebDAV server with HTTPS (meaning a self-signed certificate), Firefox is gonna block the site at first (and Xmarks is gonna fail with a rather cryptic “Error 8172”). Navigate to the URL manually and add an exception for the certificate.
  2. Before changing the URLs in Xmarks, I made the mistake of manually creating directories named “bookmarks” and “passwords”, which I then entered in the respective dialog boxes in the settings window. That however made Xmarks cry horribly when running the synchronization.

After deleting the folders, it works just fine.

TSM: Restoring the database/recovery log to a point-in-time

Well, my co-worker just called on my cell (it’s Friday, 16:00), and asked me which start-up script he needed to change in order to restore the database. My first response was, “ummm, that’s gonna be hard, we’re using heartbeat”.

Okay, so after a bit of asking I got out of him what he wanted to achieve by changing the start-up script. Apparently he did something to crash Tivoli Storage Manager (or rather, repeatedly crash it) and wanted to restore the database. He talked to one of the systems partners we have (and I’m happy we have them, most of the time), who in turn told him how to do it, but he forgot it a minute after hanging up the phone.

So I went digging while he was still telling me how he got Tivoli to kick its own ass … After a bit I thought, “hrrrrrm, shouldn’t this be covered in the Tivoli documentation?”, and surprisingly enough, it actually is covered in the documentation.

It’s actually rather simple.

  1. Stop the dsmserv Linux-HA cluster service (tsm-control ha stop tsm1)
  2. Set up the environment (since we’re running multiple instances of Tivoli Storage Manager – export DSMSERV_DIR and DSMSERV_CONFIG)
  3. Change into the server instance’s directory
  4. Run dsmserv restore db
  5. Wait some time (took about half an hour to restore the 95G database and the 10G recovery log)
  6. Start the dsmserv Linux-HA cluster service (tsm-control ha start tsm1)
  7. Update the server-to-server communication, since the restore db changes the communication verification token
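
Condensed into commands, the whole thing looks roughly like this. The instance name tsm1 and tsm-control come from our setup, the paths are assumptions, and DSMSERV_DIR/DSMSERV_CONFIG are the standard TSM server environment variables:

    # 1. stop the clustered TSM instance
    tsm-control ha stop tsm1

    # 2./3. environment for this instance, then change into its directory (paths assumed)
    export DSMSERV_DIR=/opt/tivoli/tsm/server/bin
    export DSMSERV_CONFIG=/tsm/tsm1/dsmserv.opt
    cd /tsm/tsm1

    # 4./5. restore to the most current database backup and wait; for a real
    #       point-in-time restore add todate= (and totime=) to the command
    dsmserv restore db

    # 6. bring the clustered instance back up
    tsm-control ha start tsm1

    # 7. from the admin command line, re-sync the server-to-server verification
    #    token, e.g. "update server <partner> forcesync=yes"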

Nagios: Service Check Timed Out

Since I have the pleasure of watching some Windows boxen with Nagios, I took the Windows Update plugin from Michal Jankowski and implemented it. It took me some time initially to set up nsclient++ correctly so it just works, but even so the check plugin sometimes reported the usual “Service Check Timed Out”.

Usually I ended up increasing the cscript timeout or the nsclient++ socket timeout, but the error still kept showing up. Since I rely heavily on my monitoring, I want as few false positives as possible. So I ended up chasing down this error today, and in the end it turned out to be quite simple.

In my case it wasn’t cscript (that timeout is set to 300 seconds), nor nsclient++ (the socket timeout is set to 300 seconds too), nor the NRPE plugin itself (that has 300 seconds as well).

As it turns out, Nagios has an additional setting controlling these things, called service_check_timeout, which defaults to 60 seconds. Sadly the plugin, or rather Windows, needs longer than those 60 seconds to figure out whether or not it needs updates, so Nagios kills the plugin and returns a CRITICAL message.

After increasing the value of service_check_timeout, that should hopefully be fixed.
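
For reference, the setting lives in the main configuration file (the path depends on your distribution), not in any service definition, and Nagios needs a restart to pick it up. 300 seconds simply matches the other timeouts mentioned above:

    # /etc/nagios/nagios.cfg
    service_check_timeout=300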

SLES10: zypper.log

Well, I just stumbled upon something .. My Nagios at work wasn’t working anymore, and I went looking.

After that, zip – nada. Next thing: check whether or not the device is really full. Okay, df:
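
The output that used to sit here is gone, but the check itself is nothing more than:

    df -h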

So it actually is completely filled up. Now we need to find out who’s hogging the space. Since I had a suspect in mind (pnp4nagios), I went straight for /var/lib:
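
Again the original output is missing, but the hunt was something along these lines:

    du -xs /var/lib/* 2>/dev/null | sort -n     # sizes in KB, biggest hog last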

That wasn’t it … so on to the next place that’s suspicious most of the time, /var/log:
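
Same game, something like:

    du -xs /var/log/* 2>/dev/null | sort -n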

I was like “WTF? 5.2G for YaST2 logs?” when I initially saw that output … As of now, I have a crontab emptying /var/log/YaST2 every 24 hours:
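
The crontab entry is nothing fancy; this is a sketch of it (the file name, the time of day and the decision to simply delete instead of rotating are mine):

    # /etc/cron.d/clean-yast2-logs
    # wipe the YaST2/zypper logs once a day before they fill up /var again
    0 3 * * *   root   rm -f /var/log/YaST2/*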

Nagios: SNMP OIDs for IBM’s RSA II adapter

Well, after some poking around I finally found some OIDs for the RSAs (only through these two links: check_rsa_fan and check_rsa_temp).

For Nagios I dismissed the fans, since the fan speed is only reported in percent values. So I only added this:
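
Essentially a check_snmp-based command plus a service using it. The OID below is deliberately a placeholder (grab the real temperature OID from the links and lists in this post), and the community string, host name, service template and thresholds are made up for this sketch:

    # commands.cfg
    define command {
        command_name  check_rsa_temp
        command_line  $USER1$/check_snmp -H $HOSTADDRESS$ -C public -o <RSA_TEMP_OID> -w $ARG1$ -c $ARG2$
    }

    # services.cfg (warn at 35 degrees, go critical at 40)
    define service {
        use                  generic-service
        host_name            rsa-card
        service_description  Ambient Temperature
        check_command        check_rsa_temp!35!40
    }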

Oh, and if anyone else is curious like me, here’s the list of OIDs, courtesy of Gerhard Gschlad and Leonardo Calamai.

For the fans:

And for the temperatures:

I just found a proper list of OIDs for the IBM RSA adapter. That’s rather nice, since I was really looking for the VRM failure OID and other warning/critical events.

Nagios: Watching Clustered environments (the other way)

Well, recently I stepped up to watch our cluster environments … Michael has a good howto on how to watch Windows Cluster environments in the NSclient++ wiki.

Now, this has its own perks … which I stumbled upon when trying to write a Linux-HA OCF resource agent for the Nagios NRPE server. Combining Linux-HA with SLES10 is generally a good thing, but using startproc in that resource agent is not such a good idea.

Apparently Novell (or SuSE GmbH) thought it might be wise to include some additional logic in the wrapper: startproc, checkproc and killproc check for the name of the executable. So if you try to start an additional process using the same executable, you need to dig a bit deeper.

For this to work, you need two additional things (quotations directly from man 8 startproc):

-p pid_file
(Former option -f changed due to the LSB specification.) Use an alternate pid file instead of the default (/var/run/<basename>.pid). The pid read from this file is being matched against the pid of running processes that have an executable with specified path of the program. In order to avoid confusion with stale pid files, a not up-to-date pid will be ignored.

Now, apparently this alone isn’t enough; startproc still refuses to start a second process.

-i ignore_file
The pid found in this file is used as session id of the same binary program which should be ignored by startproc.
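
Put together, starting the second NRPE instance from the resource agent then looks roughly like this (the pid file locations and the config path are whatever your agent uses, these are just examples):

    # ignore the distribution's nrpe via its pid file, and track the
    # cluster-controlled instance through its own, alternate pid file
    startproc -i /var/run/nrpe.pid -p /var/run/nrpe-cluster.pid \
        /usr/sbin/nrpe -c /etc/nagios/nrpe-cluster.cfg -d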


Linux-HA: Creating a random authkey

I just looked over the slides of a presentation one of my trainees brought back from Chemnitz, and there was this nifty one-line command with which you can generate a random SHA1 key for your authkeys file.

Now, since I’m a bit lazy, here’s the full command line to fill /etc/ha.d/authkeys for you.
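
Mind you, this is my reconstruction rather than the literal line from the slides, but it does the same job (and sets the permissions heartbeat insists on):

    ( echo "auth 1"; echo -n "1 sha1 "; \
      dd if=/dev/urandom bs=512 count=1 2>/dev/null | sha1sum | awk '{ print $1 }' ) \
      > /etc/ha.d/authkeys
    chmod 600 /etc/ha.d/authkeys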

TSM client: Backing up files with umlauts on SLES

In the past I always had problems with SLES and our Tivoli Storage Manager clients when backing up files with German umlauts. Well, today I looked a bit harder and quite quickly found a solution.

As it turns out, SLES9/10 ain’t setting LANG or LC_ALL (which I searched for first), but it is setting LC_CTYPE.

So, simply changing the LC_CTYPE in the init-script and/or prepending the dsmc command line with a new LC_CTYPE fixes my umlaut problems!
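
For illustration, on a German box that means something like this (the exact locale name depends on what locale -a lists on your system):

    # one-off backup run from the shell
    LC_CTYPE=de_DE.ISO-8859-1 dsmc incremental

    # or permanently, near the top of the scheduler/dsmcad init-script
    export LC_CTYPE=de_DE.ISO-8859-1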

Well, I had a long’ish talk with one of my trustworthy IBM senior consultants the day after writing this …

He told me something along the lines of this:

If you would like to back up files with names containing characters with a code > 127 please ensure that you have chosen a SBCS character set for your locale. The default code page C or the code page POSIX supports characters up to 127 only. Files whose names contain special characters will be skipped if C or POSIX is used. It is strongly recommended to perform a system backup by using a SBCS character set to prevent any file or directory from being skipped. This behavior for different locales is intended.

And this:

The UTF-8 locale is default on some Linux platforms. However, TSM Client currently does not support running under UTF-8 locales (such as en_US.UTF-8 and ja_JP.UTF-8). Export your LANG and LC_ALL environment variables to the iso8859-1 or EUC versions of your locale and then start a new xterm (or mlterm) session prior to running TSM Client.

That basically means that, at least for the TSM Client Java Interface (dsmj) and the scheduler/client acceptor daemon, you have to switch your locales to something that is _not_ UTF-8.

He also mentioned that IBM doesn’t have a real solution for this problem, and that there is no real workaround either. You need to invest some time into figuring out the “right” locale setting for your system(s), since after writing the above I came to the conclusion that changing LC_CTYPE alone ain’t enough …

You need to do the following:
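
As far as I can reconstruct the missing bit here, it boils down to exporting the whole set of locale variables to a non-UTF-8 locale before the client or the scheduler starts (de_DE is my pick for a German system; it has to be one that locale -a actually lists on your box):

    # in the init-script for the scheduler / client acceptor daemon, and in your
    # shell before running dsmc or dsmj
    export LANG=de_DE
    export LC_ALL=de_DE
    export LC_CTYPE=de_DE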

After doing so, the scheduler and the command-line client work …