July 2007 – BAFM

Bloody cluster solutions (continued)

July 12, 2007 (updated August 16, 2014) | Christian | 1 Comment

So, since the previous attempt at getting the teamix people to fix the bloody LoadBalancer (as in sending at least an identification string for the SSH check) didn’t work so well (they told me I should configure MASQuerading/ROUTEing on the PacketPro, which is kinda icky), I went ahead today and looked at what SLES10 installs as its default logger.

Surprisingly, they install a rather new syslog-ng (well, syslog-ng-1.6.8 is what they ship), so it was rather easy to work around the situation.

Here’s what was already in the syslog-ng.conf.in (more on that later):

  filter f_iptables { facility(kern) and match("IN=") and match("OUT="); };
  filter f_messages { not facility(news, mail) and not filter(f_iptables); };

which I just extended with the following:

  filter f_iptables { facility(kern) and match("IN=") and match("OUT="); };
  filter f_messages { not facility(news, mail) and not filter(f_iptables)
                      and not match ("Did not receive identification string from 172.16.(123|234)");
  };

Afterwards just a quick SuSEconfig --module syslog-ng, a restart of the syslog daemon, and the messages were gone. Sure, I know it’s a rather ugly hack 😆 , but since they refused to provide a “true” fix, and it seems that question has been asked more than once, it works for me, so *shrug* 😛
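Before touching the live config, the pattern can be sanity-checked against a few sample lines. This is only a minimal sketch: it assumes that syslog-ng's match() treats the string as an extended regular expression that may match anywhere in the message, which grep -E approximates. The helper name is_filtered is made up for illustration.

```shell
# Same pattern as in the extended f_messages filter.
# Note: the unescaped dots match any character, so it is slightly looser than it looks.
pattern='Did not receive identification string from 172.16.(123|234)'

# Hypothetical helper: exit status 0 if the given log line would be dropped.
is_filtered() {
    printf '%s\n' "$1" | grep -Eq "$pattern"
}

if is_filtered 'Jul  4 18:27:29 dbc-mysql1 sshd[7313]: Did not receive identification string from 172.16.234.11'; then
    echo "load balancer probe: dropped"
fi

if ! is_filtered 'Jul  4 18:27:29 dbc-mysql1 sshd[7313]: Did not receive identification string from 10.0.0.1'; then
    echo "other host: kept"
fi
```

The second case is the one to keep an eye on: a missing identification string from any host outside the two load balancer subnets still ends up in the log.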

But now you’d ask why syslog-ng.conf.in ? Simply because Novell figured it would be too easy to just invent things like CONFIG_PROTECT for RPM/YaST, so they placed yet another file in there, from which the syslog-ng.conf file is generated every time SuSEconfig is executed (that’s like every time you install a package using YaST).

Fujitsu Siemens, onboard NIC’s, Quality assurance and vendors

July 12, 2007 (updated June 21, 2013) | Christian

So we bought some Fujitsu Siemens P5916 Intel vPro boxes back in January/February for the Boss and his secretary.

These boxes are quite nice and come with a Core 2 Duo (which is waaay overkill for simple business applications like Word, Excel, Access and Outlook), but he insisted on having Windows Vista Ultimate ready PCs.

We got them, as expected, completely *blank*. That wasn’t much of a problem though, since we have a Select ~~5.0~~ 6.0 contract with M$. Only problem was, the boxes refused to install Vista (as in freezing after prepping the HDD). So I called our local vendor, who told me “Go, grab the latest BIOS from the support page and perform a BIOS update!” – which I wasn’t too happy to hear, let alone do … That didn’t work either; the box would now freeze on boot …

So we reprimanded our local vendor, who pushed the liability away from themselves and onto Fujitsu Siemens Computers (since they labeled these things Vista Ready). Next thing I know, I was talking to the sales person responsible for R&D (F&L in German) in Mecklenburg-Vorpommern, who claimed “It would have been better if you bought these with Vista preinstalled – eh ?”, which I doubted (and still doubt), since drivers can’t change whether you can install Vista on a box when Vista itself considers the BIOS “not ACPI compatible” … 👿

That was about the time when I stopped listening and thought about buying Dell desktops from now on … since I’m completely sick and tired of being treated like the last low-tech moron by a) sales representatives, b) vendors, c) lvl2 technical support and d) engineering.

Anyway – I was trying to tell today’s story .. So the Boss called me in around 9, asking me to take a look at his Outlook, since it complained about “H:\Outlook.pst” not being present (H: is the drive for the roaming profiles and the private data of every employee). So I looked a bit further, into the Event log of this Vista box, where I found something like “No Logon Server found, your last locally saved profile is being reused, please contact your administrator”. From there on, I was rather – err – puzzled about the way Windows Vista handles roaming profiles.

I opened up a command prompt, tried ping’ing the router in the subnet and got a garbled response from ping (which I’ve never seen before). First I checked whether the cable was OK (it was); afterwards I hauled his computer back to the workroom and plugged in a separate NIC, which worked, but which Vista didn’t have drivers for. So I plugged in the next one, googled for Vista drivers (which I luckily found), plugged in my pendrive and hoped they’d work with Vista .. but NOOOOOOOOOOOOOO.

So I pulled the NIC again, only to see that the model numbers differed in the second digit (I had plugged in a 500TX while I had a 530TX in my hands to look at the model number). Plugged in the NIC in my hands, played the same game again .. and voilà, “Houston, we have lifted off … ”.

Carried the PC back into his office, plugged it in, told him he could try to log in now … and finally, around 10:30’ish, he had his PC back in working condition, and at least it seemed as if he was rather happy about it 😆

SLES, ZendOptimizer and IBM PowerPC(4)+

July 10, 2007 (updated June 21, 2013) | Christian | 2 Comments

What would you figure from the above ? Hopefully the rather obvious, that it’s a *really* shitty combination.

So we figured it would be a nice thing to test our new setup before going into pre-production testing or production, but we don’t have a spare box. So we took one of the power4 boxes we have mounted in the rack, basically consuming energy all day (that’s about 38kWh a day), and installed SLES10 onto it. Which wasn’t all that bad (at first the box repeatedly booted back into AIX, both when it should have booted from CD and after we had convinced the SMS with a hammer to boot from the first hard disk; SMS being the System Management Services, basically the BIOS on the power* boxes).

The real bad part started later. First the box committed suicide sometime on the weekend (the last one that is), which is rather not so good.

So we installed the ocfs2-tools (which are obviously needed if you want to do writes on a SAN volume mounted on two separate boxes), configured the o2cb thing to start automatically on boot and added the entry to /etc/fstab.

So far so good, but as we slowly activated the apache-vhosts, we finally came to what cost me about three damned hours of my life:

child pid ### exit signal Segmentation fault (11)

Now guess what … ZendOptimizer just went bye-bye … Damn and what now ? So I looked at the Knowledgebase on zend.com, even found an Article stating it’d do that from time to time …

Attached was also the usual crap .. “Please update to the latest version”. Only problem with that is that the latest version is indeed available for x86_64 (meaning amd64 in Gentoo terms), but not for ppc (even if the product page states it should be).

So I went home, knowing what the problem is – since it was already past 4pm – swearing a short “frack that”.

Now that I’m home, have eaten something (a rather good salad) and am listening to some Korn/Kid Rock/Offspring, and after doing some undertaker’s work, I asked myself “Why exactly do we need that crappy application anyway ?” (beyond the obvious point that the ZendOptimizer is like, or rather is, a PHP compiler cache).

It turns out, one of my co-workers wrote a TYPO3-plugin interfacing our local research database .. and the catchy thing is, guess what …

He “guarded” it with ZendGuard, thus we need to use the ZendOptimizer thingy; otherwise we couldn’t use it either … 😯

Handling files/directories with spaces in `for'-loops

July 7, 2007 | Christian | 2 Comments

So I have a few files that each need to be extracted into a directory. And why not name the directory after the archive itself .. Only problem with it is the way bash handles variables …

Try it yourself: stuff some names with spaces in them into a variable, and use something like this:

epimetheus tmp [0] $ mkdir files
epimetheus tmp [0] $ touch files/"I hate directories.archive" files/"Me luuv you looong time.archive"
epimetheus tmp [0] $ for i in $( /bin/ls --color=none files/ ); do mkdir "${i/.archive/}"; done

And now take a look at the output of that ..

epimetheus tmp [0] $ ls
I/  Me/  directories/  files/  hate/  looong/  luuv/  time/  you/

Means, the `mkdir' created a directory for every whitespace-separated word in the `ls' output … and I’ve no frickin clue how to get that thing right … 😡

Update:
Thanks to Roy I know how one handles such things … 😳 It’s rather simple *g*

epimetheus tmp [0] $ for i in files/*; do mkdir "$( basename "${i/.archive/}" )"; done

That should give you the desired effect ❗
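The same idea wrapped into a small function, as a sketch. The name make_archive_dirs and the source/destination arguments are my own invention, not part of Roy's fix, and it only creates the directories; the actual extraction is left out.

```shell
# Hypothetical helper: create one directory in $2 per *.archive file in $1.
make_archive_dirs() {
    src=$1
    dest=$2
    for f in "$src"/*.archive; do
        # If nothing matched, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        # basename strips the leading path and the .archive suffix in one go.
        name=$(basename "$f" .archive)
        # The quotes keep names with spaces in one piece.
        mkdir -p -- "$dest/$name"
    done
}

# Usage: make_archive_dirs files .
```

The glob hands each file name to the loop as a single word, which is exactly what the `ls' in a command substitution could not do.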

Dell PowerEdge 1855, DRAC/MC, firmware updates, telnet and csr’s

July 6, 2007 (updated June 21, 2013) | Christian

Today I played a bit with our PE Chassis, or more specifically the DRAC/MC (remote management console). One of the things I’ve been experiencing was that the DRAC/MC was rather slow when browsing on the web interface (as in waiting a minute for the jnlp for the KVM to download). So I went ahead, fired up net-misc/atftp on my notebook, put the firmware update provided by Dell in the TFTPROOT and executed this in my telnet session on the DRAC/MC:

DRAC/MC # racadm fwupdate -a <ip-ADDRESS> -d mgmt.bin

You may ask now, wtf does he use telnet for on that box ? It’s simply that Dell doesn’t provide anything else to use; the switches come w/ ssh, but the management console doesn’t. The only way to get ssh is to buy a new one, which is like 500 EUR.

Waited a few minutes impatiently for the DRAC/MC to come back up (and it finally did). The good thing is, the DRAC/MC is now at least a bit faster (at least I feel it’s a bit faster) and we’re up at mgmt-1.4.2.

Now, since we are a member of the DFN CA, we are able to generate signed certificates (at least Internet Explorer recognizes them through the DTAG Root certificate – which Mozilla products sadly don’t have by default). For that I need a 2048 bit PKCS#10 (or CSR), which I tried to squash out of the DRAC/MC. But what the hell ❓

The DRAC/MC only gives me a 1024 bit one without the possibility to choose what kind of CSR I want to generate … 😡

miimon, arp_interval and the code

July 4, 2007 (updated August 16, 2014) | Christian

After today’s adventure with the kernel bonding, I just took a look at the code …

         if (miimon) {
                 printk(KERN_INFO DRV_NAME
                        ": MII link monitoring set to %d ms\n",
                        miimon);
         } else if (arp_interval) {
                 int i;

                 printk(KERN_INFO DRV_NAME
                        ": ARP monitoring set to %d ms, validate %s, with %d target(s):",
                        arp_interval,
                        arp_validate_tbl[arp_validate_value].modename,
                        arp_ip_count);

                 for (i = 0; i < arp_ip_count; i++)
                         printk (" %s", arp_ip_target[i]);

                 printk("\n");

         } else {
                 /* miimon and arp_interval not set, we need one so things
                  * work as expected, see bonding.txt for details
                  */
                 printk(KERN_WARNING DRV_NAME
                        ": Warning: either miimon or arp_interval and "
                        "arp_ip_target module parameters must be specified, "
                        "otherwise bonding will not detect link failures! see "
                        "bonding.txt for details.\n");
         }

If I read it right, you get the KERN_WARNING for “either miimon or arp_interval” only if neither miimon nor arp_interval is set … but at least my config says it is .. *shrug* .. bed time for me 🙄
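To see what the driver actually ended up with, the same decision can be replayed from userspace. This is a sketch under the assumption that the running bonding driver exposes its settings as files named miimon and arp_interval under /sys/class/net/bond0/bonding; treat that path as an assumption for your box. check_bond_monitoring is a made-up name.

```shell
# Hypothetical helper mirroring the if / else-if / else in the driver code above.
# $1: directory holding the 'miimon' and 'arp_interval' files.
check_bond_monitoring() {
    dir=${1:-/sys/class/net/bond0/bonding}
    # A missing file counts as 0 (not set); only the first field is used.
    miimon=$(awk '{ print $1; exit }' "$dir/miimon" 2>/dev/null)
    arp_interval=$(awk '{ print $1; exit }' "$dir/arp_interval" 2>/dev/null)
    miimon=${miimon:-0}
    arp_interval=${arp_interval:-0}

    if [ "$miimon" -gt 0 ]; then
        echo "MII link monitoring set to $miimon ms"
    elif [ "$arp_interval" -gt 0 ]; then
        echo "ARP monitoring set to $arp_interval ms"
    else
        echo "neither miimon nor arp_interval set"
        return 1
    fi
}

# Usage: check_bond_monitoring            (uses the assumed sysfs path)
#        check_bond_monitoring /some/dir  (any directory with the two files)
```

If this reports miimon as set while the warning still shows up at boot, the value was probably applied only after the module had been loaded; that is a guess, not something verified here.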

Bloody cluster solutions

July 4, 2007 (updated June 21, 2013) | Christian | 1 Comment

In preparation to get our website (and all those other websites – like www.fh-neubrandenburg.de or www.hmt-rostock.de) clustered, someone bought the cluster version of the PacketPro 450. These things are nice, especially considering you don’t need to fiddle around with LVS yourself (which is a *real* pain in the ass).

The only problem I currently have with them is that they scan the database and web nodes every 30 seconds, and since we have an active node and a hot standby, both do this, producing this:

Jul  4 18:27:29 dbc-mysql1 sshd[7313]: Did not receive identification string from 172.16.234.11
Jul  4 18:27:30 dbc-mysql1 sshd[7350]: Did not receive identification string from 172.16.234.12
Jul  4 18:27:59 dbc-mysql1 sshd[7363]: Did not receive identification string from 172.16.234.11
Jul  4 18:28:01 dbc-mysql1 sshd[7364]: Did not receive identification string from 172.16.234.12
Jul  4 18:28:31 dbc-mysql1 sshd[7393]: Did not receive identification string from 172.16.234.11
Jul  4 18:28:33 dbc-mysql1 sshd[7394]: Did not receive identification string from 172.16.234.12
Jul  4 18:29:04 dbc-mysql1 sshd[7417]: Did not receive identification string from 172.16.234.11
Jul  4 18:29:05 dbc-mysql1 sshd[7418]: Did not receive identification string from 172.16.234.12
Jul  4 18:29:36 dbc-mysql1 sshd[7419]: Did not receive identification string from 172.16.234.11
Jul  4 18:29:37 dbc-mysql1 sshd[7420]: Did not receive identification string from 172.16.234.12
Jul  4 18:30:06 dbc-mysql1 sshd[7419]: Did not receive identification string from 172.16.234.11
Jul  4 18:30:07 dbc-mysql1 sshd[7420]: Did not receive identification string from 172.16.234.12

That’s only the logs from three minutes … now imagine you have it running for like four days and figure out how much log that crap adds up to … But at least it looks solvable, though I’m gonna have to call them tomorrow and ask for a patch/update to get their ssh-scan to send some banner when performing the service check.
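The four-day figure is easy enough to put a number on. A back-of-the-envelope sketch, assuming exactly one log line per load balancer node per 30-second scan and using one of the lines above as the typical line length; lines_per_day is a made-up helper name.

```shell
# Log lines per day: one line per probing node per scan interval.
# $1: number of probing load balancer nodes, $2: scan interval in seconds
lines_per_day() {
    echo $(( $1 * 86400 / $2 ))
}

sample='Jul  4 18:27:29 dbc-mysql1 sshd[7313]: Did not receive identification string from 172.16.234.11'
per_day=$(lines_per_day 2 30)
days=4

echo "lines per day:     $per_day"
echo "lines in $days days:   $(( per_day * days ))"
# +1 accounts for the newline at the end of each log line
echo "bytes in $days days:   $(( per_day * days * (${#sample} + 1) ))"
```

With two nodes probing every 30 seconds that makes 5760 lines a day on every monitored host, or 23040 lines after four days.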

Adapter teaming on SLES10

July 4, 2007 (updated June 21, 2013) | Christian | 2 Comments

Since one of the requirements for my current project is NIC redundancy, I couldn’t get around looking at the available “adapter teaming” (or adapter bonding) solutions for Linux/SLES.

First I tried to dig into the Broadcom solution (since the Blade I first implemented this on uses a Broadcom NetXtreme II card), but found out pretty soon that the basp configuration tool, which is *only* available on the Broadcom driver CDs shipped with the Blade itself, pretty much doesn’t work.

Some hours of googling later for how to get the frickin’ Broadcom crap working, I stumbled upon a file linked as bonding.txt. Turns out that the kernel already supports adapter teaming (only that it’s called adapter bonding) by itself. No need for the Broadcom solution anymore.

Setting it up was rather easy (except that the lazy SUSE admin can’t do it via yast; it has to be done on the file layer, since “yast lan” is too stupid to even show the thing): simply create the interface configs via said “yast lan”, copy one of the “ifcfg-eth-id” files to another file called “ifcfg-bond0”, remove some stuff from it and clean out the other interface configs.

Then simply shove the following into the ifcfg-bond0 in /etc/sysconfig/network:

IPADDR="141.53.5.141"
NETMASK="255.255.255.0"
NETWORK="141.53.5.0"
MTU=""
REMOTE_IPADDR=""
STARTMODE="auto"
BONDING_MASTER="yes"
BONDING_MODULE_OPTS="miimon=100 mode=balance-alb"
BONDING_SLAVE0="bus-pci-0000:02:00.0"
BONDING_SLAVE1="bus-pci-0000:06:00.0"

IPADDR_int="172.16.234.41"
NETMASK_int="255.255.255.0"
NETWORK_int="172.16.234.0"
LABEL_int="int"

That’s it .. We just defined an adapter IP (the 141.53.5.x) and a virtual interface labeled “int”. We also configured the MII monitor to check the link of each interface (those defined in BONDING_SLAVEx) every 100 ms (miimon is given in milliseconds) for being up or down, as well as adaptive load balancing (“mode=balance-alb”).

Only thing annoying me with that solution is the following entry in /var/log/messages:

Jul  4 18:32:00 dbc-mysql1 kernel: Ethernet Channel Bonding Driver: v3.0.1 (January 9, 2006)
Jul  4 18:32:00 dbc-mysql1 kernel: bonding: Warning: either miimon or arp_interval and arp_ip_target module parameters must be specified, otherwise bonding will not detect link failures! see bonding.txt for details.
Jul  4 18:32:00 dbc-mysql1 kernel: bonding: bond0: Setting MII monitoring interval to 100.
Jul  4 18:32:00 dbc-mysql1 kernel: bonding: bond0: setting mode to balance-alb (6).
Jul  4 18:32:00 dbc-mysql1 kernel: bnx2: eth1: using MSI
Jul  4 18:32:00 dbc-mysql1 kernel: bonding: bond0: enslaving eth1 as an active interface with a down link.
Jul  4 18:32:00 dbc-mysql1 kernel: bnx2: eth1 NIC Link is Up, 1000 Mbps full duplex
Jul  4 18:32:00 dbc-mysql1 kernel: bonding: bond0: link status definitely up for interface eth1.
Jul  4 18:32:00 dbc-mysql1 kernel: bonding: bond0: making interface eth1 the new active one.
Jul  4 18:32:00 dbc-mysql1 kernel: bnx2: eth0: using MSI
Jul  4 18:32:00 dbc-mysql1 kernel: bonding: bond0: enslaving eth0 as an active interface with a down link.
Jul  4 18:32:00 dbc-mysql1 kernel: bnx2: eth0 NIC Link is Up, 1000 Mbps full duplex
Jul  4 18:32:01 dbc-mysql1 kernel: bonding: bond0: link status definitely up for interface eth0.

See the warning ? I can’t get it to shut up .. I also tried loading the mii.ko module, but it won’t shut up … damn 🙁

Well, at least the adapter teaming works as desired (still haven’t measured the performance impact of this setup – really need a clever way to do that) and I can unplug one of the two cables connected to this box and still have one interface online and a continuous connection. yay ❗
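For watching the failover while pulling cables, something like the following can help. It is a sketch that assumes the driver's status file /proc/net/bonding/bond0 lists each slave as a “Slave Interface:” line followed by an “MII Status:” line; bond_slave_status is a made-up name.

```shell
# Hypothetical helper: print "<slave> <MII status>" for every slave listed
# in a /proc/net/bonding/<bond> style status file.
bond_slave_status() {
    awk -F': ' '
        /^Slave Interface:/ { slave = $2 }
        # The bond-wide "MII Status" line comes before any slave and is skipped.
        /^MII Status:/ && slave != "" { print slave, $2; slave = "" }
    ' "${1:-/proc/net/bonding/bond0}"
}

# Usage: bond_slave_status
#        bond_slave_status /path/to/a/saved/status/file
```

Run it once with both cables in and once with one pulled; the pulled interface should flip from up to down while the bond itself stays reachable.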

SLES10 on pSeries

July 4, 2007 (updated June 21, 2013) | Christian

Okay, yet another day passed by blazing fast. I had a good day at work, spent nearly the whole day trying to get my bloody systems hooked up to our SAN (which was interrupted by a non-working SAN switch, disappearing WWNs, lunch and my trainees), messing around with our internal network, hacking our Blade Chassis switches to get what I want, and some random paperwork.

But first things first .. We installed SLES10 on a pSeries box the other day (I think on Monday), and now I’m trying to get the WWN of its Emulex HBA out of either sysfs or procfs. But whatcha’ thinking ?

I can’t get the dreaded WWN out of anything. Emulex’s hbacmd (from their HBAnyware utility) tells me there is no HBA and/or I don’t have the lpfc driver loaded (which can’t be, since I see IBM Tape Drives and my DS4300/FAStT900 via the lpfc), which is like … 😡

So if any Emulex/pSeries expert is reading this, *please* (I beg you) tell me how the frack I get the WWN squashed out of it without looking either at the back of the rack or into the BIOS.
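One place that might be worth a look, as an assumption rather than something verified on this box: kernels with the Fibre Channel transport class expose one directory per HBA port under /sys/class/fc_host, each with a port_name file holding the WWPN. A sketch, with list_wwpns as a made-up name:

```shell
# Hypothetical helper: print "<host> <WWPN>" for each FC host found.
# $1: base directory, defaults to the FC transport class in sysfs (assumed).
list_wwpns() {
    base=${1:-/sys/class/fc_host}
    for h in "$base"/host*; do
        # Skip hosts without a readable port_name (or an unmatched glob).
        [ -r "$h/port_name" ] || continue
        echo "$(basename "$h") $(cat "$h/port_name")"
    done
}

# Usage: list_wwpns
```

If the directory does not exist at all, the lpfc driver on this kernel does not register with the transport class and the search goes on.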

And here’s, just for the record (my own – so I don’t need to look it up again), how to reset the attention indicators (basically LEDs) on the front of a pSeries box running Linux, which get turned on when either resetting the box or killing it during startup:

# Make sure we have powerpc-utils installed ..
pSeries ~ [0] $ rpm -qa | grep powerpc-utils
powerpc-utils-1.0.0-5.4

# Tell us, which LEDs have which address/status
pSeries ~ [0] $ usysattn
U0.1    [on]

# Turn off the given LED
pSeries ~ [0] $ usysattn -l U0.1 -s normal

That’s it, the LED is off.
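With more than one indicator lit, the two steps can be chained. A sketch that assumes every usysattn output line looks like the one above, i.e. a location code followed by the state in brackets; leds_on and reset_all_leds are made-up names.

```shell
# Hypothetical helper: read usysattn-style output ("<location> [on]") on stdin
# and print the location codes of all indicators that are on.
leds_on() {
    awk '$2 == "[on]" { print $1 }'
}

# Hypothetical wrapper: switch every lit indicator back to normal,
# using the same usysattn call as shown above.
reset_all_leds() {
    usysattn | leds_on | while read -r loc; do
        usysattn -l "$loc" -s normal
    done
}

# Usage (on the pSeries box): reset_all_leds
```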