More MD weirdness

Well, at last I’m getting somewhere with my troubles. The problem only seems to occur when creating a RAID5 multiple device with four disks; it doesn’t happen with three.

Next, I tried creating a three-disk array, adding the fourth disk as a spare and then growing the array onto that fourth disk. After that, all these errors appeared again *yuck*. So either I have rather faulty disks, or something else is fishy, since I already run another four-disk RAID5 array with the old disks …

MD (Multiple Devices) weirdness

Well, I don’t think my problem has anything to do with the DawiControl card anymore. I did a little experiment today: I created a 1TB EXT3 file system on a single drive (one of the new 1TB drives, obviously) and started syncing data over to it (roughly 800MiB).

Then I unmounted the drive(s), ran fsck -C -f /dev/sd${deviceletter}1, and it went through without any trouble. Next, I removed the partition, created a 1GiB partition on each drive, and used those to build a new MD (software) RAID5 array (with EXT3 on top …).

And guess what happened after I copied the data over, unmounted the file system and ran fsck? Sure, the same thing as yesterday. So either it’s an mdadm bug when creating the array, or it really is MD’s fault (which I can rule out, since the same thing happens on 2.6.25 as well as on 2.6.28) … *shrug*
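When comparing runs like these, fsck’s exit status is easier to compare than its console output: per fsck(8), the status is a bitmask (1 = errors corrected, 4 = errors left uncorrected, and so on). A minimal sketch follows; `check_partition` is a hypothetical helper, and its `-n` flag (passed through to e2fsck: open read-only, answer “no” to every repair prompt) is a substitution for the `-C` progress bar used above, so the check never modifies the array.

```python
import subprocess

# Exit status bits documented in fsck(8); the actual status is the sum of these.
FSCK_BITS = {
    1: "file system errors corrected",
    2: "system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def describe_fsck_status(status):
    """Translate an fsck exit status into a list of reasons (empty list = clean)."""
    return [text for bit, text in sorted(FSCK_BITS.items()) if status & bit]


def check_partition(device):
    """Run a forced, read-only check of an unmounted device and decode the result."""
    result = subprocess.run(["fsck", "-f", "-n", device])
    return describe_fsck_status(result.returncode)
```

A clean file system yields an empty list; an exit status of 4 on the freshly built array would show up as `["file system errors left uncorrected"]`.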

SIL 3114 barfing

Well, after I had so much trouble with the USB converter (which isn’t really suited for Linux), I went ahead and bought a DawiControl DC-154 controller (which uses a SIL3114 chip) to migrate my stuff.

After fucking up the new RAID array with the 1TB disks on the old controller (luckily I still had the old hard disks lying around, which still contained the RAID array), I plugged the 1TB disks into the new controller and started building the array. After 760 minutes (that’s nearly 13 hours) of synchronizing the newly created array, I was finally able to create the file system. That should be trouble-free, right?

Well, yeah … it was … So I started putting the data on the newly created array (using rsync). Only problem: something seems to be corrupting data (as in EXT3 is barfing up a lot of file system errors).

(fsck.ext3 is returning much, much more ..)

After putting the blame on EXT3, I tried out reiserfs (yeah, yeah I know .. baaaad idea). Well, at first it didn’t put out any errors, but running fsck.reiserfs turned up errors that looked a lot like the ones fsck.ext3 returned.

Then I looked at the array size (out of curiosity), and it reported ~760GB for the new array on four 1TB disks. Now, according to my improper math, four 1000GB drives in RAID5 should give a usable capacity of something like 2793.96GiB, not ~760GB. *shrug*
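The arithmetic behind that figure, as a small sketch: RAID5 spends one disk’s worth of capacity on parity, so four drives sold as 1000GB (decimal, 10^9 bytes per GB) leave three drives’ worth of data, which comes to about 2794 GiB. The sketch ignores partitioning and md metadata overhead, which shave off a little more.

```python
GIB = 2 ** 30  # bytes per binary gigabyte (GiB)


def raid5_usable_bytes(disk_count, disk_bytes):
    """RAID5 spends one disk's worth of space on parity, leaving n - 1 disks for data."""
    return (disk_count - 1) * disk_bytes


# Four drives sold as "1000 GB" (decimal gigabytes, 10**9 bytes each).
usable = raid5_usable_bytes(4, 1000 * 10 ** 9)
print("%.2f GiB usable" % (usable / GIB))  # prints "2793.97 GiB usable"
```

Either way, ~760GB is nowhere near what four such disks should yield.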

I’m out of ideas right now, and I’m gonna wait till January before I do anything else.

USB weirdness

Well, I was at work briefly, where I grabbed one of our SATA->USB bridges, since I need to migrate some data (~750GB) off the old RAID array and onto a new one. The trouble is simply that the current RAID controller only supports four attached devices, which is why I have to use something like this … Sure, I could have bought a new RAID controller, but why spend 45+ EUR on something you can solve differently?

Well, after figuring out that I needed to change my kernel config yet again (I didn’t have USB support until Tue Dec 23 ~16:45:00 CET 2008), I attached the adapter to two adjacent USB ports. Shortly after copying 4-10MB, the transfer would end in a read-only EXT3 file system, with something like this in the syslog:

Well, now what? I googled for a bit; apparently this happens when EHCI tries to write to the device and gets a timeout because the device is rather slow, or whatever (or the device drops down to USB 1.1). So I disabled EHCI; the transfer has now been running for about three hours, and only roughly 1/12 of the data has made it to the external disk. The only trouble is that USB 1.1 is kinda slow for transferring 750GiB ❗

Followup: Well, with USB 1.1 being slow as a snail, I went looking for alternatives on Windows (since I know the bridge does full USB 2.0 under Windows without any trouble). And guess what I found?
There’s an EXT2/3 file system driver for Windows XP, yay! So I’m copying at full 100Mbit speed right now *shrug*
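For a sense of scale, here is a rough lower bound on how long 750GiB takes over each link mentioned above, using nominal signalling rates (12 Mbit/s for USB 1.1 full speed, 480 Mbit/s for USB 2.0 high speed, 100 Mbit/s for Fast Ethernet). Real throughput is lower because of protocol overhead, so treat these numbers as best cases, not predictions.

```python
GIB = 2 ** 30  # bytes per GiB

# Nominal signalling rates in bit/s; actual throughput is noticeably lower.
LINK_RATES = {
    "USB 1.1 full speed": 12_000_000,
    "USB 2.0 high speed": 480_000_000,
    "Fast Ethernet (100 Mbit)": 100_000_000,
}


def best_case_hours(size_gib, bits_per_second):
    """Lower bound on transfer time: payload bits divided by the raw link rate."""
    return size_gib * GIB * 8 / bits_per_second / 3600


for name, rate in LINK_RATES.items():
    print("%-26s %7.1f h" % (name, best_case_hours(750, rate)))
```

At the nominal USB 1.1 rate this works out to roughly 149 hours, versus just under 18 hours at 100Mbit and under 4 hours at the USB 2.0 rate.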

Short vacation

Well, Arne recently (not really recently though .. 😛 ) complained about my blog being waaay too technical, so I ended up writing this lil’ anecdote.

I’m finally on my long-awaited vacation, which lasts for the rest of the year. Last week was interrupted by a short job interview in Nuremberg, and also by the flu (not “again”: I still have it in me and haven’t been able to shake it for about three months now).

Today I’m spending the night on Spiekeroog, a smallish island in the North Sea off the German coast. It’s a real neat island, and if I had the choice of moving here for the same salary, I’d probably do it. Spiekeroog certainly does have a certain flair, which I ain’t gonna deny. But it also has its drawbacks.

One “neat” thing about Spiekeroog is that it’s completely isolated from CO₂ pollution. There isn’t a single combustion-engine car on the island (well, besides emergency services like police and fire/rescue); only electric cars.

It’s located in the North Sea, so it’s exposed to the raw Atlantic weather, of which I got a good taste today. When I was planning the trip a few weeks back, I allotted about 8 hours for the 400km’ish trip to Neuharlingersiel. Due to my driving “skills” ( 😀 others would say I have a rather heavy foot: 170km/h, or 105.6331 mph for those not able to deal with the metric system ❗ ), I got there about two hours early. So I went and had some crab soup (which was really delicious) and then went down to the harbour, where the hotel boat was gonna pick me up. And then it started raining (as in pouring), at an air temperature of only about 2°C. So basically I was freezing my bollocks off.

Anyway, after a fifty-buck dinner in the restaurant (well, don’t forget the expensive rosé wine, which was a bit sour actually, as well as three bucks’ worth of delicious tea), I’m now all cosy in my bed and tucked in.

Cheerio for now!

IBM TS7530 and DNS

Well, we had our TS7530 delivered in late September, and the day after, the IBM service guys came by to prep the VTL for our needs (IBM sells the thing as a black box). From that day on, they fought with the Call Home functionality. The trouble was simply that the Call Home Service running on the Virtualization Engines just didn’t start.

After about 6 weeks of trial and error (and the IBM service guys popping in every second week), they finally found why the Call Home Service couldn’t start: domain name resolution. Neither the IP addresses of the VEs nor that of the VE console were registered in our DNS or in the local hosts files.

After I walked over to the networking department and had them register those IP addresses, everything has been honky donkey.
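A quick check along these lines can flag missing entries before anyone blames the appliance: for each address, confirm that a reverse lookup (DNS PTR record or hosts-file entry) exists and that the returned name resolves back to the same address. This is a hypothetical helper, not what the IBM technicians ran; the resolver functions are parameters only so they can be swapped out for testing.

```python
import socket


def dns_problems(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname):
    """Return a list of DNS problems for one IPv4 address (empty list if fine).

    Checks that the address has a reverse entry and that the name it maps to
    resolves forward to the same address again.
    """
    try:
        hostname = reverse(ip)[0]
    except (socket.herror, socket.gaierror):
        return ["%s: no reverse lookup entry" % ip]
    try:
        resolved = forward(hostname)
    except socket.gaierror:
        return ["%s: %s does not resolve forward" % (ip, hostname)]
    if resolved != ip:
        return ["%s: %s resolves to %s instead" % (ip, hostname, resolved)]
    return []
```

Usage would be something like `for ip in ("192.0.2.10", "192.0.2.11"): print(dns_problems(ip))`, where the addresses are placeholders from the documentation range, not the VEs’ real ones.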

Nagios and check_ram yet again

As some people know, I previously “created” check_ram in C (mostly by modifying the check_swap plug-in to print RAM usage). One of my problems over the past few months has been getting the C plug-in and our “supported” environment under one hat. Today I had another look at the plug-ins available on NagiosExchange. There are quite a few, but since I have some experience with Python, I went with the one written in Python.

Hacking performance data support into it was rather easy. Someone else had already posted a non-unified diff adding performance data support, but it ain’t quite right according to the Nagios plug-in development guidelines.
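Roughly, the guidelines want a status line, a pipe, and then performance data in the form `label=value[UOM];[warn];[crit];[min];[max]`, plus exit codes 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN. The following is a minimal standalone sketch of that output format, not the NagiosExchange plug-in or its patch. It assumes plain percentage thresholds (not the full Nagios range syntax) and counts MemFree + Buffers + Cached as free memory.

```python
# Exit codes defined by the Nagios plug-in development guidelines.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def parse_meminfo(text):
    """Return /proc/meminfo fields as a dict of integers (values are in kB)."""
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def check_ram(meminfo_text, warn_pct, crit_pct):
    """Return (exit_code, output_line) for used-RAM thresholds given in percent."""
    mem = parse_meminfo(meminfo_text)
    try:
        total = mem["MemTotal"]
        # Assumption: buffers and page cache count as free memory.
        free = mem["MemFree"] + mem.get("Buffers", 0) + mem.get("Cached", 0)
    except KeyError:
        return UNKNOWN, "RAM UNKNOWN - could not parse /proc/meminfo"
    used = total - free
    used_pct = 100.0 * used / total
    if used_pct >= crit_pct:
        code = CRITICAL
    elif used_pct >= warn_pct:
        code = WARNING
    else:
        code = OK
    # Perfdata thresholds use the same unit as the value (KB here).
    warn_kb = int(total * warn_pct / 100)
    crit_kb = int(total * crit_pct / 100)
    perfdata = "used=%dKB;%d;%d;0;%d" % (used, warn_kb, crit_kb, total)
    text = "RAM %s - %.1f%% used (%d of %d KB)" % (STATUS_NAMES[code], used_pct, used, total)
    return code, "%s | %s" % (text, perfdata)


def main(path="/proc/meminfo", warn_pct=80, crit_pct=90):
    """Print the plug-in output line and return the exit code for sys.exit()."""
    with open(path) as fh:
        code, line = check_ram(fh.read(), warn_pct, crit_pct)
    print(line)
    return code
```

A plug-in entry point would call `sys.exit(main())`, so that Nagios sees both the output line and the exit code.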


IBM TS7530 engine failover and HBA mode

Well, when they delivered the VTL about four weeks ago, nobody figured this thing would be such a mess. Apparently IBM hasn’t set up that many VTLs with engine failover.

The point being: the VEs have eight HBA ports (four inside the black box, four outside). When they configured the VTL, the ports were all in initiator mode. We needed the fourth port in target mode as well, since it’s better to have four independent paths to the VTL. The only problem was that the VE console didn’t think so.

There is no way in hell you can switch the darn HBA port to target mode. Well, IBM just called and told us the solution.

Dissolve the failover group, reconfigure the HBA port and then recreate the failover group. Tada …

IBM TS7530 zoning

At first, when we prepped the zoning for the VTL, we did it WWN-based. The trouble with the VTL’s HBAs is simply that they present different WWPNs under the same WWN, and WWN-based zoning simply doesn’t allow access to that.

So off we went and switched to switch-port-based zoning, and lo and behold: it just works *shrug*

MessPC Ethernetbox 2 and Nagios

When I talked to Tobi yesterday, we got to talking about our Ethernet Box thermometer. It’s a neat device that works pretty much out of the box. Integrating it with Nagios, however, is a bit of a bummer.

Ethernetbox 2

That’s what the ~300 EUR box looks like. It’s basically a small black box with an RJ45 jack and four RJ11 jacks for attaching external devices. The box itself only functions as a “management station” and doesn’t come with a sensor.
Normally, you can attach up to four RJ11 sensors to it. But MessPC also sells RJ11 port splitters, which let you attach up to eight RJ11 sensors to the box.

Thermometer RJ45 jacks

As you can see, the box has an RJ45 jack on one side, which you hook up to your network before configuring an IP address (DHCP works too, if you fancy it for those things).

Thermometer RJ11 jacks

On the opposite side are the RJ11 jacks for the sensors. As you can see, we currently have four splitters attached to the box, allowing up to eight sensors to be read.
Once you have it up and running, you can look at the web interface and you’ll be able to see the state of the sensors right on the first page.