Everything You Know About Disks Is Wrong
http://storagemojo.com/?p=383
"The Google engineers just published a paper on Failure Trends in a Large Disk Drive Population. Based on a study of 100,000 disk drives over 5 years they find some interesting stuff. To quote from the abstract: 'Our analysis identifies several parameters from the drive's self monitoring facility (SMART) that correlate highly with failures. Despite this high correlation, we conclude that models based on SMART parameters alone are unlikely to be useful for predicting individual drive failures. Surprisingly, we found that temperature and activity levels were much less correlated with drive failures than previously reported.'"
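Google engineers Eduardo Pinheiro, Wolf-Dietrich Weber, and Luiz André Barroso (edpin, wolf, and luiz) studied how 100,000 hard drives fared over 5 years and published a paper on the results; the paper is attached.
Drive temperature and drive workload (activity levels) turn out to have next to nothing to do with drive failure, or at least far less than previously reported.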
* Expensive 'enterprise' drives don't have notably better reliability than their 'consumer' counterparts (consider this conclusion in the context of my past recommendation of Western Digital 10,000 RPM Raptor SATA HDDs as a credible alternative to other manufacturers' much more costly SAS drives)
* S.M.A.R.T. error reporting only encompasses a fraction of all experienced HDD failure mechanisms, and, specifically to this writeup's theme,
* RAID 1 and 5 are less robust than might appear to be the case at first glance...particularly when (as in my case...ahem) all of the drives in the RAID array come from the same manufacturer, and especially when they come from the same manufacturing lot. If one drive fails, the likelihood that a second drive will fail shortly thereafter is uncomfortably...likely.
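* Compared with consumer drives, expensive enterprise drives do not show notably better reliability, so the Western Digital 10,000 RPM Raptor SATA drives I recommended earlier remain a credible alternative to far more expensive enterprise SAS (Serial-Attached SCSI) drives.
* S.M.A.R.T. error reporting covers only a fraction of the ways a drive can fail, which matters particularly for the case described here.
* RAID 1 and RAID 5 are more fragile than they look, especially when all the drives come from the same manufacturer, and above all from the same production lot: once one drive fails, the chance of a second failing soon afterwards is uncomfortably high.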
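To put a rough number on the second-failure risk in the RAID bullet, here is a minimal sketch. It assumes drives fail independently at a constant rate, which is exactly the assumption the bullet calls into question. The function name, the 3% annualized failure rate, the 24-hour rebuild window, and the 10x same-lot multiplier are all illustrative assumptions, not figures from the paper or the quoted posts.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def p_second_failure(surviving_drives: int, afr: float,
                     rebuild_hours: float,
                     hazard_multiplier: float = 1.0) -> float:
    """Chance that at least one surviving drive fails during the rebuild.

    Assumes each drive fails independently at a constant rate derived
    from its annualized failure rate (AFR). hazard_multiplier is a
    hypothetical knob that scales that rate to mimic same-lot drives.
    """
    # Convert the AFR into an hourly failure rate (exponential model).
    hourly_rate = -math.log(1.0 - afr) / HOURS_PER_YEAR
    exposure = surviving_drives * hourly_rate * hazard_multiplier * rebuild_hours
    return 1.0 - math.exp(-exposure)


# Illustrative numbers only: 4-drive RAID 5 with one drive already dead,
# 3% AFR per drive, 24-hour rebuild.
independent = p_second_failure(3, 0.03, 24.0)
# Same array, but assume same-lot drives are 10x more likely to fail.
same_lot = p_second_failure(3, 0.03, 24.0, hazard_multiplier=10.0)

print(f"independent drives: {independent:.4%}")
print(f"same-lot drives:    {same_lot:.4%}")
```

The independent figure looks reassuring, but it is only as good as the independence assumption. The multiplier is there purely to show how quickly the odds shift once failures within an array are correlated.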
Things everyone "knows" about disks that, as the title says, turn out to be wrong:
* Costly FC and SCSI drives are more reliable than cheap SATA drives.
* RAID 5 is safe because the odds of two drives failing in the same RAID set are so low.
* After infant mortality, drives are highly reliable until they reach the end of their useful life.
* Vendor MTBF figures are a useful yardstick for comparing drives.
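For a sense of what a vendor MTBF figure actually claims, here is a minimal sketch that converts MTBF into an annualized failure rate (AFR) and an expected number of failures per year. It assumes a constant failure rate (exponential model) and drives that run around the clock. The 1,000,000-hour MTBF is a hypothetical datasheet-style value used only for illustration; the 100,000-drive fleet size is borrowed from the study quoted above.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def mtbf_to_afr(mtbf_hours: float) -> float:
    """Annualized failure rate implied by an MTBF, assuming a constant
    failure rate and 24x7 operation."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_failures_per_year(n_drives: int, mtbf_hours: float) -> float:
    """Expected yearly failures in a fleet where failed drives are replaced."""
    return n_drives * HOURS_PER_YEAR / mtbf_hours


# Hypothetical datasheet value, not taken from any specific vendor.
mtbf = 1_000_000.0
afr = mtbf_to_afr(mtbf)
fleet_failures = expected_failures_per_year(100_000, mtbf)

print(f"implied AFR: {afr:.3%}")
print(f"expected failures per year in a 100,000-drive fleet: {fleet_failures:.0f}")
```

Comparing the implied AFR with the replacement rate actually seen in your own fleet is a quick way to judge how much weight a vendor's MTBF deserves.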
[ This post was last edited by winding on 2008-03-03 15:51 ]