Windows Server 2003: taskmgr giving “Logon failure”

I had myself a lot of fun today. I ended up patching a Windows Server 2003 x64 SP1 where the Task Manager wouldn’t start anymore. It simply failed (or, when right-clicking the task bar, wouldn’t even appear), so I went downstairs and pulled a hard disk out of the RAID1 array, just to be sure.

Really weird Windows error

I went ahead and installed SP2 (as you can see in the picture above) while having the jitters. I also installed the VirusScan I was scheduled to install, and the system came back online. Phewww.

After my maintenance window was over, I looked into this issue a bit deeper. First I tried copying over a taskmgr.exe (both 32-bit and 64-bit) from another Windows Server 2003 x64 SP2 system, with no luck. The next step was looking at PATH. As it turns out, it has something to do with that …

As you can see, after fixing up the PATH environment variable, it apparently works. Weirdly though, this issue doesn’t come up on another (identical) system with the same PATH modifications. Main difference: calling taskmgr.exe from the Run dialog works there, while it doesn’t on this particular system.
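
When PATH is the suspect, it can help to list every PATH directory that holds a copy of the executable (in search order) and every PATH entry that doesn’t exist at all. Here is a minimal Python sketch of that check. The function name find_on_path is made up for illustration, and it says nothing about what was actually wrong on this particular box:

```python
import os

def find_on_path(exe_name, path_value=None):
    """Return (hits, missing): PATH directories that contain exe_name, in
    search order, and PATH entries that are not existing directories."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits, missing = [], []
    for entry in path_value.split(os.pathsep):
        entry = entry.strip().strip('"')
        if not entry:
            continue  # ignore empty entries such as a trailing separator
        entry = os.path.expandvars(entry)  # expands %SystemRoot% etc. on Windows
        if not os.path.isdir(entry):
            missing.append(entry)
        elif os.path.isfile(os.path.join(entry, exe_name)):
            hits.append(entry)
    return hits, missing

if __name__ == "__main__":
    hits, missing = find_on_path("taskmgr.exe")
    print("found in:", hits)
    print("dead PATH entries:", missing)
```

More than one hit, or a dead entry in front of the system directory, would at least be a starting point for comparing the two systems.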

*Shrug* Gonna have to talk to my SAP guys tomorrow … 🙂

VBscript: Query remote OS and SP info (continued)

After some more crunching on my VBscript, I think I finally have a working script that runs through a CSV list I point it to, walks through each system (by IP address only, sadly) and queries the OS and the Service Pack that is installed. The CSV may look like this:

After saving that one and running cscript //NoLogo win_sp_level.vbs, you should find a completed list like this:

The final script looks like this:

The only thing I still need to improve is the error handling (as in notify when a system is being skipped due to RPC being unavailable).
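
The missing error handling could look roughly like the sketch below. It is Python, not the VBscript from this post, and it only shows the loop: query_os is a hypothetical callable standing in for the remote WMI query (the OS name and Service Pack live in Win32_OperatingSystem as Caption and CSDVersion). I’m also assuming a comma-separated file with the IP address in the first column, and that an unreachable host surfaces as an OSError:

```python
import csv
import io

def survey(csv_text, query_os):
    """Call query_os(ip) for every IP in the first CSV column.

    Returns (results, skipped): results holds 'ip,os,service pack' lines,
    skipped holds (ip, reason) pairs for hosts that could not be queried.
    """
    results, skipped = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or not row[0].strip():
            continue  # blank line
        ip = row[0].strip()
        try:
            os_name, service_pack = query_os(ip)
        except OSError as err:  # e.g. the RPC server being unavailable
            skipped.append((ip, str(err)))  # remember it instead of dropping it silently
            continue
        results.append(",".join([ip, os_name, service_pack]))
    return results, skipped
```

Printing the skipped list at the end of a run would give the “system was skipped” notification I’m still missing.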

VBscript: Query remote OS and SP info

As I wrote on Thursday, I am battling with Windows Server 2003. Now I got a list out of our change management database, which sadly ain’t that accurate. So in order to get reliable information about the target systems (in order to do some accurate planning), I ended up writing a small vbscript which simply takes the hostname on the command line (cscript //NoLogo win_sp_level.vbs 10.0.0.5) and returns a csv-like element.

We may have to tune the script a bit more for our use, but it should show the basic functions I need.

Windows Server 2003 SP1, WSUS and Security Updates

Recently, we found some systems (sadly, customer systems) that weren’t getting any Security Updates anymore. Even more sadly, they are running Windows Server 2003, and as you know Security Updates are pretty important for Windows systems.

At the time of finding this, I had no clue as to why they were not getting any updates. At first we thought it had something to do with the WSUS server, so I upgraded the WSUS 3.0 SP1 to SP2. Since that didn’t solve anything, I went searching for an internal VM that showed the same symptoms, and I quickly found one.

After cloning said VM (since that one is running in the production environment), a bit of hacking on it (you know, disabling the network of the VM, switching IP and Hostname, running NewSID, …) I went cracking at the problem.

Stopped the Windows Update service, cleaned out %WINDIR%\SoftwareDistribution, and started the Windows Update service again; then triggered a wuauclt.exe /detectnow /reportnow. Yet again the same result: “0 updates detected”. Shite.
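
For the record, that reset sequence can be scripted. The following is a minimal Python sketch, assuming the Windows Update service is registered as wuauserv and that “cleaning” means removing the whole SoftwareDistribution folder; the function name and the injectable run parameter are my own additions so the steps can be dry-run first:

```python
import os
import shutil
import subprocess

def reset_windows_update(windir, run=subprocess.check_call):
    """Stop the update service, remove the SoftwareDistribution folder,
    restart the service and trigger a new detection cycle."""
    run(["net", "stop", "wuauserv"])
    # Assumption: the whole folder may go; ignore_errors skips anything that cannot be removed.
    shutil.rmtree(os.path.join(windir, "SoftwareDistribution"), ignore_errors=True)
    run(["net", "start", "wuauserv"])
    run(["wuauclt.exe", "/detectnow", "/reportnow"])

# On the affected box (needs administrative rights):
#   reset_windows_update(os.environ["WINDIR"])
```

It didn’t change anything in my case, as you’ll see, but it saves some clicking when testing this on more than one system.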

Went ahead and tried what Microsoft suggests in their “If you have trouble with Windows Update” knowledge base article, but then again: same result.

Another try was simply reinstalling the Windows Update Agent, which also resulted in the same old “0 updates detected”.

After some discussion with my co-workers, I ended up clicking through a Microsoft KB for a recently released patch. What I found was that every newer update I looked at only had “Windows Server 2003 with Service Pack 2” listed as download element. Shite.

Somehow, I stumbled over a link (in the same KB article) detailing the Support Lifecycle for Service Packs in general, as well as the Lifecycle announcements for each Service Pack.

The end of the story, and basically the solution to my problem: Microsoft ended the support lifecycle for Windows Server 2003 SP1 on 14.04.2009, the date after which Security and Critical Updates are no longer issued for systems running SP1.

In the end, I don’t really blame them, since SP2 was already released in 2007. But what I would’ve expected is some kind of press release or a public note, that Security releases are gonna end. Another construction area identified, more work for me!

Tivoli Storage Manager Client and Microsoft Cluster Services (continued)

As you might recall from my first article about this topic, I had some trouble with the Microsoft Cluster Services and the registry replication. Today, as we tried switching the TSM server for some resources, we ran into this again.

We were using the service install tool (dsmcutil install scheduler) as well as the GUI to set the new password. When we brought the resource online with the local service manager, everything was hunky-dory. But as soon as we brought it online using the Cluster Manager, it failed horribly. Why?

Well, as I read the Microsoft KB the last time, I started remembering something about the replication.

  • When the resource goes online, the registry keys are updated with the previously checkpointed information.
  • When the resource is brought offline, all the checkpoints associated with this resource are saved.

If you manually update these registry keys while the application or service is offline, the changes may not be replicated or may be lost. To prevent this from happening, make any manual changes while the service or application resource is online.

Simply put, when you toggle the resource offline, the cluster saves the registry from the currently running node onto the quorum (checkpoints). As we changed those settings while the resource was offline, the cluster discarded them when we toggled it back online with the Cluster Manager.

Simple solution: just remove the registry replication parameter while the resource is offline (and click “Apply” and “OK” afterwards). After that, update the registry on the cluster node currently owning the physical disk drive (either using the GUI or dsmcutil). Afterwards, re-add the registry replication key and you should be able to “force” the Microsoft Cluster into thinking that the registry you have on this cluster node is the valid one.
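
To make the checkpoint behaviour easier to picture, here is a toy model in Python. It is only a simulation of the rules quoted above, not anything that talks to a real cluster; the class and method names are invented, and the idea that re-adding the replication key snapshots the current node’s registry is my assumption, based on the workaround behaving that way:

```python
class ClusterResource:
    """Toy model of MSCS registry checkpointing for a single resource."""

    def __init__(self, registry):
        self.registry = dict(registry)    # keys on the node owning the resource
        self.checkpoint = dict(registry)  # copy the cluster keeps on the quorum
        self.replicated = True            # is registry replication configured?

    def take_offline(self):
        if self.replicated:
            self.checkpoint = dict(self.registry)  # checkpoint saved when going offline

    def bring_online(self):
        if self.replicated:
            self.registry = dict(self.checkpoint)  # registry overwritten when going online

    def remove_replication(self):
        self.replicated = False

    def add_replication(self):
        self.replicated = True
        self.checkpoint = dict(self.registry)  # assumption: fresh checkpoint from this node


# What bit us: editing the registry while the resource is offline.
res = ClusterResource({"password": "old"})
res.take_offline()
res.registry["password"] = "new"
res.bring_online()
print(res.registry["password"])  # the edit is gone

# The workaround: drop replication, edit, re-add replication, go online.
res.take_offline()
res.remove_replication()
res.registry["password"] = "new"
res.add_replication()
res.bring_online()
print(res.registry["password"])  # the edit survives
```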

Restarting the NSclient++ service without the management applet

For people who are as point-and-click lazy as me, here is how you restart the service without using the service management applet.
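
If you want that wrapped for use in other scripts, a minimal Python sketch could look like this. It simply calls net stop and net start; the service name NSClientpp in the usage line is an assumption (check the name your installation actually registered), and the run parameter only exists so the calls can be dry-run:

```python
import subprocess

def restart_service(name, run=subprocess.check_call):
    """Restart a Windows service by calling net stop, then net start."""
    run(["net", "stop", name])
    run(["net", "start", name])

# Usage on the monitored box (service name is an assumption):
#   restart_service("NSClientpp")
```

Note that check_call raises if net stop fails, for instance when the service is already stopped, so the start is not attempted in that case.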

Windows Server 2003 Terminal services

Well, just when you think you don’t have any more problems, another one pops up. I’m currently bashing my head against the wall over why the hell the forwarded (or is it redirected?) drives are not shown in the “My Computer” explorer view. I’m pretty sure I have an idea why (basically, HKEY_CURRENT_USER\Software\Classes isn’t writeable, but that’s where Windows, or rather the Terminal Services, or whatever is creating the associations, wants to put them), I just don’t know a clever way around it.

It’s basically a dead end. The user has no access to that particular subkey, and I can’t change the permissions by changing it in ntuser.dat apparently. Neither do the inherited permissions apply, so I’m basically stuck. 🙁

Microsoft Cluster Services powered by IBM

If you think back, I talked about my problems with MSCS while utilizing the IBM RDAC Multipath driver for Windows.

Everyone I talked to about this, including our IBM business partner and its systems engineers, as well as an IBM systems engineer (who in fact was a freelance guy hired by IBM), told me it had to do with how we did the zoning (stuffing every controller into a single zone), and that this would be the reason why the x3650 was seeing that many drives.

When the freelance SE came to visit us, we redid the zoning, separating each endpoint connection (each HBA port to each controller port) into a different zone.

Additionally he told me, that was the only IBM™ supported configuration.

SAN Zoning (Overview)

As you can see, I had to create ten different zones for each single port of the dual port fibre channel HBA and its corresponding endpoint (I guess I still have to create more, since the DS4700 has *two* ports per controller).

SAN Zoning (Detailed)

After we finished that, we rebooted the x3650 and hoped that would have fixed it. Afterwards the IBM SE was baffled. Still seeing ~112 devices. What the heck? He ranted about how awful this was and did some mumbo jumbo with his notebook, uploaded the ds4?00 configuration files to some web interface, but shortly afterwards said the storage configuration seemed to be fine at first glance.

So we had another look at the storage configuration and he quickly found that the other cluster ports were set to “Windows Cluster 2003 (Supporting DMP)” in the port configuration, and said that’d be the reason why stuff still ain’t working (I think he guessed wildly, since he had no clue either). After I told him I just can’t change those ports right now (since the remaining part of the cluster is in full production), we agreed that I’d do it some other time and tell him about my results.

Anyways, the next day my co-workers suggested trying a newer Storage Manager version on the x3650, at the same level as the highest firmware version on the storage systems (that being the DS4700 and v09.23). Now guess what?

That fucking works. The cluster is still behaving weird sometimes (now the other boxen seem to have trouble bringing resources online, but only sometimes).

So here’s my hint: always keep an old version of the Storage Manager around, you can’t get them from IBM anymore *shrug*

GPO (behind the scenes)

Well, to begin with we had this really weird problem that the thin clients as well as the terminal server would only load user based group policy if you are a member of the group of local administrators. While that’s ok for the thin clients (users can’t actually change something unless they log in as “Administrator” – don’t ask me why), it’s a real no-no on the terminal server.

We tried redoing *everything* (that is, starting with the domain, then the terminal server and after that the thin clients) and yet nothing changed, it still didn’t work. That’s what I’ve been doing the last 2 weeks. Up till now, I always thought a user would have access to the ntuser.dat (that is HKEY_CURRENT_USER) if his NTFS permissions were correct. But nooooooooooooooooooooo, Microsoft had to introduce another layer of permissions.

Old permissions on HKEY_CURRENT_USER
Once you change it to be proper (as in remove the dead user entry and add a group that actually gets you somewhere), it’s all starting to work!

New permissions on HKEY_CURRENT_USER

Windows XP Embedded, Windows Server 2003 and GPO settings (the solution)

OK, so about an hour ago (yeah, yeah, I know .. I shouldn’t be working at that time, but it really gave me sleepless nights), I finally figured out why the hell both my Windows XP Embedded thin clients and my Windows Server 2003 systems were showing this really *weird* behaviour when applying group policies, or more precisely the user-based configuration of a group policy.

The inspiration came to me after reading this and taking a look at regedit myself, where I noticed the entry “Permissions” for the first time since I started using regedit. I also noticed that the regedit permissions seem to use the same groups one would assign to NTFS resources.

That said, it really all boils down to the ntuser.dat (which *IS* HKEY_CURRENT_USER). As I created the profile with a different user than the ones I am using it with (basically, I want ~12,000 users to use this one profile), I needed to change the permissions *INSIDE* regedit to include a group containing all these users. After that, any user could again merge the settings from ntuser.pol into HKEY_CURRENT_USER\Software\Policies, which in turn gives you the joy of your fucking policies working again.

TADAAAAAA! About two weeks worth of work spent for such a shitty thing, and noticing it when you’re off work — priceless!