Distribution running on IBM TS7530 Virtualization Engine

Well, I was a bit curious earlier which distribution might be running on our IBM TS7530 Virtualization Engines, so I just had a look-see ..

The main differences from a “normal” SUSE Linux Enterprise Server 10 installation (there’s about zip normal with that kind of installation, thus the quotes) so far are:

  1. the build for the VE uses busybox as init
  2. IBM stripped man/info
  3. they are running Xorg/Fluxbox on it

Just don’t ask me why there’s a DE (desktop environment) running; the box isn’t even hooked up to a monitor. The only reason would be the RSA’s remote monitor stuff … *lala*


IBM RDAC and Windows Cluster Service

Okay, so we received a brand new x3650 the other day, meant to replace one (or better, two) of our NAS frontend servers. We installed Windows on it right away (had to create a custom Windows Server 2003 CD first, since the default one doesn’t recognize the integrated ServeRAID), and we prepped the box during the week with the usual things.

On Monday I started installing the “IBM StorageManager RDAC” MultiPath driver (since the box got two single-port PCIe FC HBAs) and figured it’d be nice if we had this. I asked an IBM Systems Engineer at one of our partners, who told me that generally there wouldn’t be a problem with Microsoft Cluster Services (MSCS) and the IBM MPIO driver. The only requirement would be that I install the new storport.sys driver (version 5.2.3790.4021) first (as in Microsoft KB932755).

Now, yesterday I finished the zoning, did the mappings on the storage arrays and then figured the box should see the hard disks. So I started adding another node to our existing Microsoft Cluster.

Result: Zip (as in MSCS telling me not all nodes could see the quorum disk)

Reason: a combination of two things. First, said IBM Storage Manager RDAC. The first time I installed it, I forgot about the storage mappings, thus the box seeing zero disks. After uninstalling it, I was seeing 121 (that’s right, one hundred and twenty one) new devices.

Visible volumes previous to installing the RDAC driver

That is basically a result of the zoning I did for this particular device, which has *all* controllers present in a single SAN zone, so the HBAs see each device eight (or nine) times .. Update: yes, I’m missing one controller … 😀

SAN zoning for the box

Now, as I reinstalled the RDAC *after* the host discovered the volumes, it’s showing only a dozen drives.

Visible volumes after installing the RDAC driver

Now, as I figured this out, I told myself “Hey, adding the third node to the Windows Cluster should now work without a hitch …” … guess what ?

It’s Microsoft and it doesn’t. Now why doesn’t it work ? ‘Cause the Cluster Setup Wizard is getting confused in Typical mode, as it’s creating a “local quorum disk” which naturally isn’t present in the cluster it’s joining. Now, switching the wizard to “Advanced (minimum) configuration” as suggested in Q331801, just works … *shrug*

SLES, ZendOptimizer and IBM PowerPC(4)+

What would you figure from the above ? Hopefully the rather obvious, that it’s a *really* shitty combination.

So we figured it would be a nice thing to test our new setup before going into pre-production testing or production, but we don’t have a spare box. So we took one of the power4 boxes we have mounted in the rack, basically consuming energy all day (that’s about 38kWh a day), and installed SLES10 onto it. Which wasn’t all that bad (at first the box repeatedly booted back into AIX, both from CD and after we had convinced the SMS with a hammer to boot from the first hard disk; SMS, also known as System Management Services, is basically the BIOS on the power* boxes).

The real bad part started later. First the box committed suicide sometime on the weekend (the last one that is), which is rather not so good.

So we installed the ocfs2-tools (which are obviously needed if you want to do writes on a SAN volume mounted on two separate boxes), configured the o2cb thing to start automatically on boot and added the entry to /etc/fstab.

So far so good, but as we slowly activated the apache-vhosts, we finally came to what cost me about three damned hours of my life:

Now guess what … ZendOptimizer just went bye-bye … Damn and what now ? So I looked at the Knowledgebase on zend.com, even found an Article stating it’d do that from time to time

And attached was the usual crap .. “Please update to the latest version”. The only problem with that is that the latest version is indeed available for x86_64 (meaning amd64 in Gentoo terms), but isn’t for ppc (even if the product page states it should be).

So I went home, knowing what the problem is – since it was already past 4pm – swearing a short “frack that“.

Now that I’m home, ate something (a rather good salad), listening to some Korn/Kid Rock/Offspring and after doing some undertakers work, I asked myself “Why exactly do we need that crappy application anyway ?” (beyond the obvious point, that the ZendOptimizer is like/ is a php-compiler cache).

It turns out, one of my co-workers wrote a TYPO3-plugin interfacing our local research database .. and the catchy thing is, guess what …

He “guarded” it with ZendGuard, thus we need to use the ZendOptimizer thingy; otherwise we couldn’t use it either … 😯

O RLY ?

SLES10 on pSeries

Okay, yet another day passed by blazing fast. I had a good day at work, spent nearly the whole day trying to get my bloody systems hooked up to our SAN (which was interrupted by a non-working SAN-switch, disappearing WWN’s, lunch and my trainees), messing around with our internal network, hacking our Blade Chassis switches to get me what I want and some random paperwork.

But first things first .. We installed SLES10 on a pSeries box the other day (I think on Monday), and now I’m trying to get the WWN of its Emulex HBA out of either sysfs or procfs. But whatcha’ thinking ?

I can’t get the dreaded WWN out of anything. Emulex’s hbacmd (from their HBAnyware utility) tells me there is no HBA and/or I don’t have the lpfc driver loaded (which can’t be, since I see IBM Tape Drives and my DS4300/FAStT900 via the lpfc), which is like … 😡

So if any Emulex/pSeries expert is reading this, *please* (I beg you) tell me how the frack I get the WWN squashed out of it without looking either at the back of the rack or into the BIOS.
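For what it’s worth, here is a minimal sketch of the sysfs route, assuming the kernel’s FC transport class is active and exposes each HBA port under /sys/class/fc_host/hostN/port_name (whether the lpfc driver on this particular SLES10/pSeries box does so is an assumption I haven’t verified). The function name read_fc_wwpns is made up for illustration.

```python
from pathlib import Path


def read_fc_wwpns(sysfs_root="/sys/class/fc_host"):
    """Return {host_name: wwpn} for every FC host found under sysfs_root.

    Assumption: the FC transport class exposes one directory per HBA port,
    each containing a 'port_name' file with the WWPN as a hex string.
    """
    wwpns = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return wwpns  # no FC transport class visible on this system
    for host_dir in sorted(root.iterdir()):
        port_name = host_dir / "port_name"
        if not port_name.is_file():
            continue
        raw = port_name.read_text().strip()
        # Drop an optional leading "0x" before splitting into bytes.
        digits = raw[2:] if raw.lower().startswith("0x") else raw
        # Colon-separated byte pairs, the way SAN switches usually show it.
        pairs = [digits[i:i + 2] for i in range(0, len(digits), 2)]
        wwpns[host_dir.name] = ":".join(pairs).lower()
    return wwpns


if __name__ == "__main__":
    for host, wwpn in read_fc_wwpns().items():
        print(host, wwpn)
```

If the directory doesn’t exist the function simply returns an empty dict, which would at least tell you that the transport class isn’t there.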

And here’s, just for the record (my own, so I don’t need to look it up again), how to reset the attention indicators (basically LEDs) on the front of a pSeries box running Linux, which get turned on when either resetting the box or killing it during startup:

That’s it, the LED is off.

Waiting

We are still waiting for the money promised by the state and the country for our HBFG (again, it’s “Hochschulbauförderungsgesetz”), which will hopefully reduce or eliminate the storage/SAN problem we currently have. Right now we have two Cisco MDS9216 (that’s a 16-port 2Gbps SAN switch, two for redundancy), which means we only have 16 SAN ports. That is too few, as we have 30 machines or so that *really* need access to the SAN, so we either end up unplugging some of them from the SAN or merging them onto some big machines (like our x366).

The other side of the problem is the storage .. Currently that isn’t redundant, which means we’re fucked if the storage decides to not come up, or one of the controllers smokes .. So we’re looking at two DS4700 with 2 enclosures each, filled with 300GB 2Gbps FC disks. That will hopefully also solve our constant lack of rackspace.

Apart from that, we took a look at the terminal server market, heard someone from Citrix, looked ourselves at 2X (and I think we are going with the 2X solution – even if they don’t support the authentication passthrough – yet). We might want to consider buying dedicated hardware for the terminal servers, as I implemented them running on the ESX which isn’t a permanent solution, as at least the students will work on those terminal servers 0700-2200, that means continuous load in that time, which isn’t good for the ESX Cluster, as they are pretty loaded already.

We’re also looking into buying a third box for the ESX Cluster, probably the same kind as we have currently (that is an x366 with 2 DC Xeons, 16GB RAM, 2×73 GB SAS, 2x dual-port Intel NIC, 2x dual-port FC HBA) to get some extra capacity.

Recently I did some experiments with Gentoo as a MySQL cluster (master<->master replication for our upcoming database servers – that’s what the blade chassis and the two blades are for) and noticed that the Gentoo VMs were sucking up RAM and didn’t release it, so I had to reset them every morning in order to free some RAM. I guess I should poke Chris a bit about that, as he told me back at FOSDEM that he was doing some load testing with a similar setup not so long ago.

AIX 5.3 Linux Toolkit

OK, so I skipped rebuilding a newer RPM version (for now) and I’m currently rebuilding anything that fits into app-dev according to IBM …

The list reads like this:

OK, I’m not exactly rebuilding these old versions; I’m actually using their old specs to compile newer versions of these. I’m currently at coreutils-6.7, which really takes ages. But we’ll see about the rest.

Oh, and btw .. if anyone happens to search for a way to extend a logical volume on AIX, use chfs.

That’s what I used to enlarge the logical volume containing /opt by 4 (what kind of unit is that ?).
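On the unit question: as far as I know, chfs reads a bare number in its size attribute as 512-byte blocks, while AIX 5.3 also accepts an M or G suffix, and a leading + means “grow by”. Here is a small sketch that only builds the command line under that assumption; the helper name chfs_grow_command is made up, and nothing gets executed.

```python
def chfs_grow_command(mount_point, amount, unit="G"):
    """Build (but do not run) a chfs call that grows a filesystem.

    Assumptions: unit "" means 512-byte blocks, "M" megabytes and
    "G" gigabytes; the "+" prefix asks chfs to add to the current size.
    """
    if unit not in ("", "M", "G"):
        raise ValueError("unit must be '', 'M' or 'G'")
    if amount <= 0:
        raise ValueError("amount must be positive")
    return ["chfs", "-a", f"size=+{amount}{unit}", mount_point]


if __name__ == "__main__":
    # Show the command that would grow /opt by 4 GB.
    print(" ".join(chfs_grow_command("/opt", 4)))
```

Being explicit about the suffix avoids the guessing game: a bare 4 would, under the assumption above, mean a measly four blocks.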

AIX-5.3 & rpm-4.4.7

OK, so I tried to install the AIX Toolkit today, to build some newer rpm’s (yaaaaah, I *hate* RPMS myself, still it’s way better than distributing plain tar.gz archives) but looks like either AIX or rpm-4.4.7 doesn’t like me.

Now I’ve to figure out how to get libm (that’s /lib/libm.so) installed on AIX. Will see about that later and/or tomorrow.

IBM

We just received the long-awaited shipment of sixteen 300GB FC HDDs (2Gbps with 10000rpm) for our SAN (a pretty old DS4500/FAStT900).

But there’s still the software option missing that we ordered in the same breath. So I called our trustworthy IBM distributor (hah!) and asked the guy responsible for sales what the ETA on this software option is (if someone is interested, it’s VolumeCopy/FlashCopy).

He told me, that we’ll receive a letter with the license key about 4 weeks after commission !!!!!!

I nearly fell from my chair when he told me that.

I still can’t believe it, that sending a license key printed upon a simple page is taking 4 weeks 😮

Luca had a nice comment about that:

Mood sucks

christel, you remember the mood-swings we were talking about ?

I think I’m undergoing just another 🙁 I’m currently pretty much pissed. Basically everything is pestering me currently (except #gentoo-dev and Gentoo work).

Work just ripped another piece out of me (hah, thanks VMware & BigBlue). I started the day with an ice-cold shower (if I’m talking about ice cold, it was ice cold); they’re currently replacing our old gas heating and unfortunately that means no warm water at all! *arg*

Summer – finally

Well, it’s mid-July and the weather seems to be my friend. 25°C ain’t that bad. I really liked the weather last week (although everyone at work was bitching about it being tooo warm :P) and would like to keep it (for the rest of the year of course!).

Hrm, for everyone who loved the music within Kill Bill – Volume 1: Tomoyasu Hotei really rocks (playing Battle without Honor or Humanity).

Work is finally getting interesting. The x366 have been delivered, as well as the new Netbay 42U rack and of course the optional Cisco Catalyst 3650. Hopefully my co-worker will let me work on those babies in a short time (I would really like to :))

And now to my Gentoo related work:

I had an interesting conversation with Hendrik yesterday about some of his packages, and I’m taking over some of them …

I bumped sys-cluster/vzctl, hopefully all bugs related to the upstream changes (well I missed a config var in the init-script #138469) are now fixed. Oh yeah, they finally decided to switch to a sane versioning scheme. Thanks Kir and Igor!

Linux VServer stuff is also approaching its final release (even if Herbert is always saying “When it’s done”). I’ll prepare the patch tarballs for 2.{0.2,1.1}_rc26 later.

Meh, I nearly forgot to say thank you. Thanks a lot Mr. Bush. They just showed how the local waste disposal company is blocking the road with dumpsters while being protected by ~20 policemen … And only to prevent anybody driving into the city with a car which could endanger Mr. President!