We make extensive use of Hyper-V within Basefarm, and I recently encountered a strange problem while doing maintenance on a 5-node Windows cluster running the Hyper-V role. For reasons I won’t bore you with here (pre-production testing, basically), I had been evicting nodes from my host-level Windows cluster and adding them back, but when I tried to add one node back I got errors in the validation tests. This was strange because the node had previously been in the cluster and nothing on it had changed, so I knew it had already passed validation and run successfully as a cluster node!
Cluster validation reported the error in the storage test named
Validate SCSI device Vital Product Data (VPD)
It looks like the above picture when you view the validation report.
The actual errors returned were like this:
Failed to get SCSI page 83h VPD descriptors for cluster disk 1 from node <nodename> status 2
(I’ve removed the node name here, obviously, but the report does say specifically which node.)
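If you have several validation reports to check, a small script can pull out which disks and nodes hit this error. The following is only a minimal sketch in Python: the pattern is modelled on the single error line quoted above, so the exact wording in other reports is an assumption, and the node name `hv-node-05` is a made-up placeholder.

```python
import re

# Pattern based on the one error line quoted in this post; other reports
# may word it differently (assumption).
VPD_ERROR = re.compile(
    r"Failed to get SCSI page 83h VPD descriptors for cluster disk (?P<disk>\d+) "
    r"from node (?P<node>\S+) status (?P<status>\d+)"
)


def find_vpd_errors(report_text):
    """Return a (disk, node, status) tuple for each matching error line."""
    return [
        (int(m.group("disk")), m.group("node"), int(m.group("status")))
        for m in VPD_ERROR.finditer(report_text)
    ]


# Hypothetical sample line; "hv-node-05" is a placeholder node name.
sample = (
    "Failed to get SCSI page 83h VPD descriptors for cluster disk 1 "
    "from node hv-node-05 status 2"
)
print(find_vpd_errors(sample))
```

Feeding it the text of a saved validation report would list every affected disk and node in one go, which is handy when the same error repeats for each cluster disk.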
Fortunately, I soon found that this was due to a bug in Hyper-V validation, for which a hotfix is available here
Downloading and installing the hotfix on all existing nodes and any potential new nodes resolves the error.
This goes to show the value of pre-production testing, as the aim of this cluster is to provide a dynamically expanding virtualisation service for one of our largest customers. If it came to a production situation where I needed to add a node quickly, this is not the type of problem I would want to troubleshoot live!