If you think back, I talked about my problems with MSCS while utilizing the IBM RDAC Multipath driver for Windows.
Everyone I talked to about this, including our IBM business partner and it’s systems engineers; as well as some IBM systems engineer (who in fact was an freelance guy hired by IBM), told me it had to do with how we did the zoning (stuffing every controller into a single zone), and that would be the reason why the x3650 was seeing that many drives.
When the freelance SE came to visit us, we redid the zoning, separating each endpoint connection (each HBA port to each controller port) into a different zone.
Additionally he told me, that was the only IBM™ supported configuration.
As you can see, I had to create ten different zones for each single port of the dual port fibre channel HBA and it’s corresponding endpoint (I guess, I still have to create more, since the DS4700 is having *two* ports per controller).
After we finished that, we rebooted the x3650 and hoped that would have fixed. Afterwards the IBM SE was baffled. Still seeing ~112 devices. What the heck ? He ranted about how awful this was and did some mumbo jumbo with his notebook, uploaded the ds4?00 configuration files to some web interface, but shortly afterwards said the storage configuration seemed to be fine on the first glance.
So we had another look at the storage configuration and he quickly found, that the other cluster ports were set to “Windows Cluster 2003 (Supporting DMP)” in the port configuration and said that’d be the cause why stuff still ain’t working (I think he guessed wildly, since he had no clue either). After I told him, I just can’t change those ports right now (since the remaining part of the cluster is in full production), we agreed that I’d do it some other time and tell him about my results.
Anyways, the next day my co-workers suggested, trying a newer Storage Manager version on the x3650, at the same level with the highest firmware version on the storages (thus being the DS4700 and v09.23). Now guess what ?
That fucking works. The cluster is still behaving weird sometimes (now the other boxen seem to have trouble bringing resources online, but only sometimes).
So here my hint: Always keep an old version of the Storage Manager around, you can’t get them from IBM anymore *shrug*