Hard Disk Failure - Germany host node

24/05/2011 00:06

Last night we performed a kernel upgrade on the host node. For this to take effect a reboot was required.

Unfortunately, the host server did not come back online following the reboot, and we have been working to restore services since approximately 2300 BST. We have established that one of the hard disks in the RAID array has failed completely.

We decided that it would be best to copy all data from the remaining hard disk to a remote location before getting the failed hard disk replaced. We need to do it this way because, with this type of failure, the identities of hard disk 1 and hard disk 2 sometimes get swapped. It is therefore possible that the DC would unwittingly replace the wrong drive, and without a remote copy everything would be lost.

We estimate that the backup to a remote location will complete in the next 1 to 2 hours. We will then be able to instruct the DC to replace the hard disk. Once that has been done, we will need to re-install everything and restore client VPSs. We are of course trying to get this done ASAP, but our best estimate, based on the time it has taken to get to this point, is that it will take another 12 to 15 hours, assuming no further complications are encountered.


BST 0000 24/05/2011 The hard disk has been replaced and we are working on the re-install of the OS and SolusVM, and on restoration where necessary. Further update around 0900 if not resolved by then.

BST 0907 24/05/2011 There are still issues with booting the server. We are waiting for the datacentre to attach a full KVM device.

BST 1035 24/05/2011 This should now be resolved.