Meant to blog this a while back, but work has been keeping me busy. A friend of mine in the US, Joe Landman, runs a business making serious HPC storage gear and has found a rather disturbing problem with Corsair CMFSSD-32D1 SSDs. Here is how he describes it, after Corsair went silent on him about the issue (ellipses are his):
We are experiencing about a 70% failure rate, within 3 months of acquisition. In many different chassis, in many different parts of the world, with many different power supplies, many different motherboards. This is a time correlated failure. I have never … ever … in 25+ years doing this stuff … ever … seen anything like this. It's either a really … really bad silicon error in a controller chip or a firmware bug … or some other crappy part.
It came right out of the blue and the failure mode is pretty scary:
Imagine for a moment, you have these in a RAID 1 configuration. And because of the failure, the unit refuses to get past the POST section. So there you are, with a remote machine, say, I dunno, 6000 miles away from you, and an SSD, with a putative 100+ year MTBF fails, and fails in a way that stops POST. So the system on reboot, freezes at the drive detection phase.
Remember that with a two-drive RAID1 mirror and a 70% failure rate (plus Murphy) you’re looking at a real risk of a double disk failure, which Joe has already seen happen at some of his customers. He’s got a neat workaround: use a loopback device backed by a file on a spinning disk as an extra member of the RAID1 set, so there is at least one copy of the data somewhere it can be recovered from.
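To put a rough number on that double-failure risk, here is a back-of-the-envelope sketch in Python. It is my own illustration, not anything Joe ran. It assumes each SSD in the mirror fails within the window with probability 0.7 (Joe's reported rate) and models the link between the two failures with a single correlation coefficient `rho`, which is a made-up knob: 0 means the drives fail independently, 1 means they always fail together. Since Joe calls this a time correlated failure, the independent case is probably the optimistic end.

```python
def both_fail_probability(p: float, rho: float = 0.0) -> float:
    """Probability that both members of a two-drive mirror fail.

    p   -- per-drive failure probability within the window (assumed equal
           for both drives)
    rho -- assumed correlation between the two failure events
           (0 = independent, 1 = they always fail together)

    For two 0/1 events with the same probability p and correlation rho:
        P(both) = p*p + rho * p * (1 - p)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be between 0 and 1 in this sketch")
    return p * p + rho * p * (1.0 - p)


if __name__ == "__main__":
    p = 0.7  # failure rate quoted above
    for rho in (0.0, 0.5, 1.0):  # hypothetical correlation levels
        chance = both_fail_probability(p, rho)
        print(f"rho={rho:.1f}: chance of losing both drives = {chance:.1%}")
```

Even in the friendliest case (fully independent failures) that works out to roughly a 49% chance of losing both halves of the mirror, and it only climbs towards 70% as the failures become more correlated. Hence the appeal of a third copy on a completely different kind of device.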
So tell your friends: just say “NO” to Corsair CMFSSD-32D1 SSDs.