I came back today from a six-day vacation in the warm south (that is, Stuttgart), and back at work I found three sheets of paper on my desk. Two tell me about things I haven’t done yet; the third tells me about something I haven’t seen yet.
One of my colleagues had to restart one of our web nodes, and now the thing can’t mount the logging volume (and thus logrotate and awstats failed to do their jobs). OCFS2 isn’t spitting out any error messages; when you try to mount the volume, you can see the node joining the volume’s domain on the other nodes. So at first glance .. nothing is wrong?
One thing I have to add: you can’t reboot the box cleanly (as in, you have to use the power button), so I figure something is either stuck or malfunctioning .. *shrug*
Well, it’s been quite a while since most people last heard a word from me. For the last few months I’ve been extremely busy with work-related tasks (and, as a side effect, didn’t want to spend much time in front of the computer after nine hours of work). I also started spending more and more time in the gym, nearly two hours every Tuesday and Thursday.
I finally fixed our replication issues; we now have a working(!) MySQL Multi-Master replication setup (1. Node, 2. Node) as the database back end for our TYPO3 vHosts. Bear in mind that these boxes are *only* serving MySQL and nothing else, so don’t use these configurations on mixed setups.
all the web nodes are now serving their content from a clustered, shared SAN volume (is that a good thing? Don’t know yet …)
our VI environment is gaining more and more acceptance (even if you hear some complaints now and then, like “awww, damn, my 4GiB RAM, 2×3.0GHz Windows 2008 is running soooo choppy”; simple answer: don’t use Windows Server 2008 and/or Windows Vista!)
I finished prepping our VM templates (at least the Windows ones)
we’re still putting together the plans on whether or not to invest in a VDI solution.
The next few weeks are gonna be as frantic as the weeks before. I still have to migrate a lot of TYPO3 installations to our new cluster (which sadly takes time, as we need to wait for DNS changes to propagate). Honestly, I might end up extending the SAN volume for the MySQL data storage, as even with only three somewhat busy sites, the binary log of the last 5 days is about 2GiB in size. And we still have ~20 other busy sites on a separate box.
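A rough back-of-the-envelope sketch of what those numbers could mean once the remaining sites move over. It assumes (purely as an assumption) that binlog volume scales linearly with the number of busy sites; the function name and the 14-day retention window are made up for illustration.

```python
def binlog_gib_per_day(observed_gib, observed_days, observed_sites, total_sites):
    """Extrapolate daily binlog growth linearly by site count (an assumption)."""
    per_site_per_day = observed_gib / observed_days / observed_sites
    return per_site_per_day * total_sites


# Figures from the post: ~2 GiB in 5 days with 3 busy sites,
# plus ~20 more busy sites still sitting on the other box.
daily = binlog_gib_per_day(2.0, 5, 3, 3 + 20)
retention_days = 14  # hypothetical retention window, not from the post
print(f"~{daily:.1f} GiB/day, ~{daily * retention_days:.0f} GiB for {retention_days} days")
```

Real traffic will not be spread that evenly across sites, so treat the result as an order of magnitude rather than a sizing figure.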
Lucky me, I created the MySQL data storage on a logical volume, so I can easily extend the volume in the SAN manager semi-online (the file system needs to be unmounted, which means stopping the MySQL process), then extend the physical volume (LVM2 PV) and the logical volume (LV), and finally the EXT3 file system on top.
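The sequence above could be scripted roughly like this. It is only a dry-run sketch: the device path, volume names, mount point, size and the /etc/init.d/mysql init script are placeholders I made up, it assumes an LVM2 recent enough to ship pvresize, and the SAN-side resize (plus any device rescan the kernel may need) still happens by hand between the umount and the pvresize.

```python
import shlex


def extend_plan(pv, lv, mountpoint, grow_by):
    """Return the commands, in order, for growing an ext3 file system on LVM2."""
    return [
        ["/etc/init.d/mysql", "stop"],           # MySQL has to let go of the volume
        ["umount", mountpoint],
        # (grow the LUN in the SAN manager here, by hand)
        ["pvresize", pv],                        # let LVM see the bigger PV
        ["lvextend", "-L", "+" + grow_by, lv],   # grow the LV
        ["e2fsck", "-f", lv],                    # resize2fs wants a fresh check first
        ["resize2fs", lv],                       # grow ext3 to fill the LV
        ["mount", mountpoint],                   # relies on the /etc/fstab entry
        ["/etc/init.d/mysql", "start"],
    ]


# Hypothetical names; nothing is executed, the plan is only printed.
plan = extend_plan("/dev/sdb", "/dev/vg_mysql/lv_data", "/mysql", "10G")
for command in plan:
    print(" ".join(shlex.quote(part) for part in command))
```

Printing the plan instead of running it keeps the sketch harmless; each line can be pasted into a root shell once the names are right.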
As some of you know by now, I am on extended leave for now. I don’t have tree access (at my own request), though I’m gonna try to keep up with Chris and 2008.0 … So long!
Well, the title nearly says everything .. I managed to lose my second pair of car keys, and today I somehow found out that I’ve been driving without a driver’s license, so I have to go to the registration office and apply for a new one, which hopefully should be done in about 4-6 weeks. Oh hell, and I have to spend about 40 € on it ..
Well, life kinda sucks if you’re oblivious. Anyway, work is giving me an ass-load of fun right now, so I’m kinda happy, even though it’s Saturday evening and I’m sitting at home, having just lost all my custom-built Debian packages (yes, I happen to use that at work, right after SLES), and listening to Hed PE.
So, the previous attempt at getting the teamix people to fix the bloody LoadBalancer (as in sending at least an identification string for the SSH check) didn’t work so well: they told me I should configure MASQuerading/ROUTEing on the PacketPro (which is kinda icky). So I went ahead today and looked at what SLES10 installs as its default logger.
Surprisingly, they install a rather new syslog-ng (well, syslog-ng-1.6.8 is what they ship), so it was rather easy to work around the situation.
Here’s what already was in the syslog-ng.conf.in (more on that later):
and not match("Did not receive identification string from 172.16.(123|234)");
};
Afterwards just a quick SuSEconfig --module syslog-ng, a restart of the syslog daemon, and the messages were gone. Sure, I know it’s a rather ugly hack, but since they refused to provide a “true” fix, and it seems that question has been asked more than once, it works for me, so *shrug*
But now you’d ask: why syslog-ng.conf.in? Simply because Novell figured it would be too easy to just invent something like CONFIG_PROTECT for RPM/YaST, so they placed yet another file in there, from which the syslog-ng.conf file is generated every time SuSEconfig is executed (that’s like every time you install a package using YaST).
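Before restarting the daemon, the pattern from the filter above can be tried against a few lines. This is only a sketch in Python: its re module is not the regex engine syslog-ng uses, although for a pattern this simple the two should agree, and the sample messages and the helper name are made up.

```python
import re

# Pattern copied from the filter above. The unescaped dots match any
# character, which is harmless here but worth knowing.
PATTERN = re.compile(r"Did not receive identification string from 172.16.(123|234)")


def is_dropped(message):
    """True if the 'not match(...)' clause would filter this message out."""
    return PATTERN.search(message) is not None


# Made-up sample lines, only for trying the pattern.
samples = [
    "sshd[4242]: Did not receive identification string from 172.16.123.10",
    "sshd[4243]: Did not receive identification string from 10.0.0.5",
]
for line in samples:
    print("drop" if is_dropped(line) else "keep", "-", line)
```

Only probes coming from the two load balancer networks get silenced; the same message from any other address still ends up in the log.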
What would you figure from the above ? Hopefully the rather obvious, that it’s a *really* shitty combination.
So we figured it would be a nice thing to test our new setup before going into pre-production testing or production, but we don’t have a spare box. So we took one of the power4 boxes we have mounted in the rack, basically just consuming energy all day (that’s about 38kWh a day), and installed SLES10 onto it. Which wasn’t all that bad, although at first the box repeatedly booted back into AIX instead of from CD, and later we had to convince the SMS with a hammer to boot from the first hard disk (SMS, or System Management Services, is basically the BIOS on the power* boxes).
The really bad part started later. First the box committed suicide sometime over the weekend (the last one, that is), which is rather not so good.
So we installed the ocfs2-tools (which are obviously needed if you want to do writes on a SAN volume mounted on two separate boxes), configured the o2cb thing to start automatically on boot and added the entry to /etc/fstab.
So far so good, but as we slowly activated the apache-vhosts, we finally came to what cost me about three damned hours of my life:
And attached was also the usual crap .. “Please update to the latest version”. Only problem with that is that the latest version is indeed available for x86_64 (meaning amd64 in Gentoo terms), but not for ppc (even if the product page states it should be).
So I went home, knowing what the problem was (since it was already past 4pm), swearing a short “frack that”.
Now that I’m home, having eaten something (a rather good salad), listening to some Korn/Kid Rock/Offspring and after doing some undertaker’s work, I asked myself “Why exactly do we need that crappy application anyway?” (beyond the obvious point that the ZendOptimizer is, or is like, a PHP compiler cache).
It turns out, one of my co-workers wrote a TYPO3-plugin interfacing our local research database .. and the catchy thing is, guess what …
He “guarded” it with ZendGuard, thus we need to use the ZendOptimizer thingy; otherwise we couldn’t use it either …
": ARP monitoring set to %d ms, validate %s, with %d target(s):",
arp_interval,
arp_validate_tbl[arp_validate_value].modename,
arp_ip_count);
        for (i = 0; i < arp_ip_count; i++)
                printk(" %s", arp_ip_target[i]);

        printk("\n");
} else {
        /* miimon and arp_interval not set, we need one so things
         * work as expected, see bonding.txt for details
         */
        printk(KERN_WARNING DRV_NAME
               ": Warning: either miimon or arp_interval and "
               "arp_ip_target module parameters must be specified, "
               "otherwise bonding will not detect link failures! see "
               "bonding.txt for details.\n");
}
If I read it right, you get the KERN_WARNING about “either miimon or arp_interval” only if neither miimon nor arp_interval is set … but at least my config says miimon is .. *shrug* .. bed time for me
Since one of the requirements for my current project is NIC redundancy, I couldn’t get around looking at the “adapter teaming” (or adapter bonding) solutions available for Linux/SLES.
First I tried to dig into the Broadcom solution (since the Blade I first implemented the stuff on uses a Broadcom NetXtreme II card), but found out pretty soon that the basp configuration tool, which is *only* available on the Broadcom driver CDs shipped with the Blade itself, pretty much doesn’t work.
After some hours of googling how to get the frickin’ Broadcom crap working, I stumbled upon a file linked as bonding.txt. Turns out that the kernel already supports adapter teaming by itself (only that it’s called adapter bonding). No need for the Broadcom solution anymore.
Setting it up was rather easy (except that the lazy SUSE admin in me can’t do it all via yast; it has to be done on the file level, since “yast lan” is too stupid to even show the thing). It’s simply a matter of creating the interface configs via said “yast lan”, copying one of the “ifcfg-eth-id” files to another file called “ifcfg-bond0”, removing some stuff from it and cleaning out the other interface configs.
Then simply shove the following into ifcfg-bond0 in /etc/sysconfig/network:
That’s it .. We just defined an adapter IP (the 141.53.5.x) and a virtual interface labeled “int”. We also configured the MII monitor to check the link of each interface (those defined in BONDING_SLAVEx) every 100 ms to see whether it is up or down, as well as adaptive load balancing (“mode=balance-alb”).
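For reference, here is a small sketch that prints what the bonding-related part of such a file might look like. It is not the original file: the key names follow the SLES sysconfig conventions as far as I know them, BOOTPROTO and STARTMODE are my guesses, the slave names are taken from the log further down, the IP stays elided as in the text, and the “int” alias is left out entirely. The helper name is made up.

```python
def render_ifcfg(settings):
    """Render KEY='value' lines in the style of /etc/sysconfig/network/ifcfg-*."""
    return "\n".join(f"{key}='{value}'" for key, value in settings.items()) + "\n"


bond0 = {
    "BOOTPROTO": "static",        # assumption
    "STARTMODE": "onboot",        # assumption
    "IPADDR": "141.53.5.x",       # left elided, as in the post
    "BONDING_MASTER": "yes",
    "BONDING_MODULE_OPTS": "mode=balance-alb miimon=100",
    "BONDING_SLAVE0": "eth0",     # slave names taken from the log
    "BONDING_SLAVE1": "eth1",
}
print(render_ifcfg(bond0), end="")
```

The two settings the paragraph above talks about both live in BONDING_MODULE_OPTS: the balancing mode and the MII polling interval in milliseconds.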
Only thing annoying me with that solution is the following entry in /var/log/messages:
Jul  4 18:32:00 dbc-mysql1 kernel: bonding: Warning: either miimon or arp_interval and arp_ip_target module parameters must be specified, otherwise bonding will not detect link failures! see bonding.txt for details.
Jul  4 18:32:00 dbc-mysql1 kernel: bonding: bond0: Setting MII monitoring interval to 100.
Jul  4 18:32:00 dbc-mysql1 kernel: bonding: bond0: enslaving eth1 as an active interface with a down link.
Jul  4 18:32:00 dbc-mysql1 kernel: bnx2: eth1 NIC Link is Up, 1000 Mbps full duplex
Jul  4 18:32:00 dbc-mysql1 kernel: bonding: bond0: link status definitely up for interface eth1.
Jul  4 18:32:00 dbc-mysql1 kernel: bonding: bond0: making interface eth1 the new active one.
Jul  4 18:32:00 dbc-mysql1 kernel: bnx2: eth0: using MSI
Jul  4 18:32:00 dbc-mysql1 kernel: bonding: bond0: enslaving eth0 as an active interface with a down link.
Jul  4 18:32:00 dbc-mysql1 kernel: bnx2: eth0 NIC Link is Up, 1000 Mbps full duplex
Jul  4 18:32:01 dbc-mysql1 kernel: bonding: bond0: link status definitely up for interface eth0.
See the warning? I can’t get it to shut up .. I also tried loading the mii.ko module, but it won’t shut up … damn
Well, at least the adapter teaming works as desired (I still haven’t measured the performance impact of this setup; I really need a clever way to do that), and I can unplug one of the two cables connected to this box and still have one interface online and a continuous connection. yay
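When pulling cables, it helps to see what the driver itself thinks of each slave. The sketch below parses the status text the bonding driver exposes under /proc/net/bonding/; the sample is a trimmed, made-up excerpt in that layout (not output from this box), and the function name is my own.

```python
def slave_status(text):
    """Map each slave interface to its MII status from /proc/net/bonding/<bond> text."""
    status, current = {}, None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            # The bond-wide "MII Status" line comes before any slave and is skipped.
            status[current] = value
    return status


# Trimmed, made-up sample in the layout the bonding driver uses.
SAMPLE = """\
Bonding Mode: adaptive load balancing
Currently Active Slave: eth1
MII Status: up
MII Polling Interval (ms): 100

Slave Interface: eth1
MII Status: up
Link Failure Count: 0

Slave Interface: eth0
MII Status: down
Link Failure Count: 1
"""
print(slave_status(SAMPLE))
```

On the real box the input would come from open("/proc/net/bonding/bond0").read() instead of the sample string.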
Okay, yet another day passed by blazing fast. I had a good day at work, spent nearly the whole day trying to get my bloody systems hooked up to our SAN (which was interrupted by a non-working SAN switch, disappearing WWNs, lunch and my trainees), messing around with our internal network, hacking our Blade Chassis switches to get me what I want and some random paperwork.
But first things first .. We installed SLES10 on a pSeries box the other day (I think on Monday), and now I’m trying to get the WWN of its Emulex HBA out of either sysfs or procfs. But whatcha thinking?
I can’t get the dreaded WWN out of anything. Emulex’s hbacmd (from their HBAnyware utility) tells me there is no HBA and/or I don’t have the lpfc driver loaded (which can’t be, since I see IBM tape drives and my DS4300/FAStT900 via the lpfc), which is like …
So if any Emulex/pSeries expert is reading this, *please* (I beg you) tell me how the frack I get the WWN squashed out of it without looking either at the back of the rack or into the BIOS.
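One place that may be worth a look is the fc_host class in sysfs. The sketch below assumes the lpfc driver on that kernel registers with the FC transport class, so that /sys/class/fc_host/hostN/port_name exists; if the directory is missing, it simply prints nothing. The function name is made up.

```python
import glob
import os


def fc_port_names(sysfs_root="/sys/class/fc_host"):
    """Return {host: port_name} for every FC host the kernel exposes, if any."""
    result = {}
    for path in sorted(glob.glob(os.path.join(sysfs_root, "host*", "port_name"))):
        host = os.path.basename(os.path.dirname(path))
        with open(path) as handle:
            result[host] = handle.read().strip()
    return result


for host, wwpn in fc_port_names().items():
    print(host, wwpn)
```

The sysfs_root parameter only exists so the function can be pointed at a test directory; on the box itself the default should do.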
And just for the record (my own, so I don’t need to look it up again), here’s how to reset the attention indicators (basically LEDs) on the front of a pSeries box running Linux, which get turned on when you either reset the box or kill it during startup:
OK, it turns out that I was rather stupid when configuring the my.cnf. The effect I was seeing was due to the presence of two log-bin lines, which looked like the following:
[mysqld]
port=3306
datadir=/mysql/dbase
log=/mysql/logs/dbc-mysql1.log
log-error=/mysql/logs/dbc-mysql1.err
socket=/var/lib/mysql/mysql.sock
bind=172.16.234.31
# custom paths for binary logs
log-bin=/mysql/binlogs/dbc-mysql1
log-bin-index=/mysql/binlogs/dbc-mysql1.idx
relay-log=/mysql/binlogs/dbc-mysql1.relay
And some lines down there was this:
# custom paths for binary logs
log-bin
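A duplicate like that is easy to miss in a long my.cnf, so here is a small sketch of a checker that lists every option set more than once per section. The function name and the shortened sample are mine; it only does a naive line-by-line parse and knows nothing about !include directives or the other option files MySQL reads.

```python
from collections import defaultdict


def duplicate_options(text):
    """Return {(section, option): [line numbers]} for options set more than once."""
    seen = defaultdict(list)
    section = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        # MySQL treats '-' and '_' in option names alike, so normalise them.
        option = line.split("=", 1)[0].strip().replace("_", "-")
        seen[(section, option)].append(lineno)
    return {key: lines for key, lines in seen.items() if len(lines) > 1}


# Shortened version of the two snippets above.
SAMPLE = """\
[mysqld]
log-bin=/mysql/binlogs/dbc-mysql1
log-bin-index=/mysql/binlogs/dbc-mysql1.idx
# custom paths for binary logs
log-bin
"""
print(duplicate_options(SAMPLE))
```

Running it over the real file would be duplicate_options(open("/etc/my.cnf").read()), assuming that is where the config lives.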
The next thing I ran into came up while importing our old databases (they are about 1.1GiB each, 25 databases in total). The second MySQL master (and its slave) will choke as soon as you dump the data into the first master too fast, as the binlog events seem to be too big for MySQL to transfer via TCP (something like “Packet too large – try increasing max_allowed_packet” in the error log; the only problem was that max_allowed_packet was already at 1GiB, which is the absolute maximum for MySQL 5.0 according to the handbook).
A way around this (thanks to a co-worker who pushed me down this road) is to disable all the MySQL master/slave stuff in your my.cnf, start the mysql daemon as a simple, dumb database, import all your databases, stop the mysql daemon, tar up the whole BASEDIR and scp/rssh it to your second master.
Clean out the BASEDIR on the second master, untar your tarball, edit your my.cnf again to include the whole master/slave portions on both boxes and you should be up and running.
I haven’t run any tests on the Master-Master replication yet, but I’ll do that as soon as I’m back at work (which is the 27th of June, as I’ve been on vacation since yesterday, yay!)
Here I am, sitting at my desk on a Thuesday evening thinking about what happened the last few days.
1. I finally got to play around with our PacketPro 450 Cluster (nifty LoadBalancing appliance)
2. We reworked the network the way *we* want it (and not the way that tool of a wannabe sysadmin wanted it)
3. We mostly figured out how to do the LoadBalancing right; we just need to track down some bugs in the LoadBalancer software (like the thing failing over to its slave from time to time, but keeping the IP address for itself) or let the guys at teamix do their work and hopefully get a working release within the next week or so
4. I figured out how to set up interface bonding with SLES10 (it was quite straightforward, thanks to the excellent in-kernel documentation), and we’re using an active-backup mode for now
5. I still need to figure out how to do the MySQL Master<->Master replication right .. I’m currently building fresh RPMs on one of those Dell blades (yes, they ROCK!), which will hopefully be finished by the time I’m at the office tomorrow.
Pt. 5 also includes figuring out how to pass MySQL a custom location for the binary log; at least the handbook says it’s possible in chapter “5.11.3. The Binary Log” …
When started with the --log-bin[=base_name] option, mysqld writes a log file containing all SQL commands that update data. If no base_name value is given, the default name is the name of the host machine followed by -bin. If the basename is given, but not as an absolute pathname, the server writes the file in the data directory. It is recommended that you specify a basename; see Section B.1.8.1, “Open Issues in MySQL”, for the reason.
That behavior works for --log-bin-index (like log-bin-index=/mysql/binlogs/$HOSTNAME.idx), but doesn’t for --log-bin. *shrug* I’ll see if that is fixed with something >5.0.18 (that’s what SLES10 currently ships).
I’m also looking for a network topology drawing program (preferably free), as Microsoft Visio (either 2003 or 2007, Standard or Professional) is nice, but still can’t draw shit correctly. So I stumbled upon yEd, which looks nice (I haven’t really tried it yet, but will tomorrow) and hopefully gives me the opportunity to draw/visualize my setup at work.