This post started life earlier this year as a piece on the death of RAID-5 being signaled by the arrival of 3TB drives. The point being that you can’t afford to be exposed to a second drive failure for two or three whole days, especially given the stress the surviving drives are under during that rebuild period.
But the more I thought about RAID rebuild times, the more I realized how little I actually knew about them, and how little most other people know either. What I knew was based a little too much on snippets of data, unreliable sources, and too many assumptions and extrapolations. Everybody thinks they know something about disk rebuilds, but most of us don’t really know much at all, and thinking you know something is worse than knowing you don’t.
Reading this back so far, it started to remind me of an old Abbott and Costello sketch.
Anyway, you’d think the folks who know the real answers would be operational IT staff, who watch rebuilds nervously to make sure their systems stay up, and maybe vendor lab staff, who you would imagine get the time and resources to test these things. But I have found it surprisingly hard to find any systematic information.
I plan to add to this post as information comes to hand (new content in green) but let’s examine what I have been able to find so far:
1. The IBM N Series MS Exchange 2007 best practices whitepaper mentions a RAID-DP (RAID6) rebuild of a 146GB 15KRPM drive in a 14+2 array taking 90 minutes (best case).
Netapp points out that there are many variables to consider, including the setting of raid.reconstruct.perf_impact at either low, medium or high, and they warn that a single reconstruction effectively doubles the I/O occurring on the stack/loop, which becomes a problem when the baseline workload is more than 50%.
Netapp also says that rebuild times of 10-15 hours are normal for 500GB drives, and 10-30 hours for 1TB drives.
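Those Netapp rebuild windows imply fairly modest average throughput to the spare drive. A quick back-of-envelope sketch, assuming decimal gigabytes and a constant rebuild rate over the whole window:

```python
# Back-of-envelope: average throughput to the spare implied by a quoted
# rebuild window. Assumes decimal gigabytes (10**9 bytes) and a constant
# rebuild rate for the whole window.

def implied_mb_per_sec(capacity_gb, hours):
    """MB/s needed to fill the spare drive in the given time."""
    return capacity_gb * 1000 / (hours * 3600)

# Netapp's quoted windows for a 500GB drive: 10-15 hours
print(round(implied_mb_per_sec(500, 10), 1))  # ~13.9 MB/s best case
print(round(implied_mb_per_sec(500, 15), 1))  # ~9.3 MB/s worst case
```

Even the best case is well below a drive’s raw sequential speed, which is consistent with rebuilds being throttled to protect host I/O.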
2. The IBM DS5000 Redpiece “Considerations for RAID-6 Availability and Format/Rebuild Performance on the DS5000” shows the following results for array rebuild times on 300GB drives as the arrays get bigger:
I’m not sure how we project this onto larger drive sizes without more lab data. In these two examples there was little difference between N Series 14+2 146GB and DS5000 14+2 300GB, but common belief is that rebuild times rise proportionally to drive size. The 2008 Hitachi whitepaper “Why Growing Businesses Need RAID 6 Storage” however, mentions a minimum of 24 hours for a rebuild of an array with just 11 x 1TB drives in it on an otherwise idle disk system.
What both the IBM and Netapp data seem to suggest is that rebuild time is fairly flat until you get above 16 drives, although Netapp seems to be increasingly comfortable with larger RAID sets as well.
3. A 2008 post from Tony Pearson suggests that “In a typical RAID environment, say 7+P RAID-5, you might have to read 7 drives to rebuild one drive, and in the case of a 14+2 RAID-6, reading 15 drives to rebuild one drive. It turns out the performance bottleneck is the one drive to write, and today’s systems can rebuild faster Fibre Channel (FC) drives at about 50-55 MB/sec, and slower ATA disk at around 40-42 MB/sec. At these rates, a 750GB SATA rebuild would take at least 5 hours.”
Extrapolating linearly from that would suggest that a RAID5 1TB rebuild is going to take at least 7 hours, 2TB 14 hours, and 3TB 21 hours. The Hitachi whitepaper figure seems to be a high outlier, perhaps dependent on something specific to the Hitachi USP architecture.
Tony does point out that his explanation is a deliberate over-simplification for the purposes of accessibility; perhaps that’s why it doesn’t explain why there might be step increases in drive rebuild times at 8 and 16 drives.
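Tony’s write-bottleneck model is easy to sketch: rebuild time is simply drive capacity divided by the sustained write rate to the single spare. A minimal sketch using his quoted SATA rate, ignoring host workload and any per-architecture overhead:

```python
# Tony Pearson's simplified model: the rebuild is bottlenecked on writing
# the one spare drive, so time = capacity / sustained write rate.
# 40 MB/s is his quoted slower-SATA figure; decimal gigabytes assumed.

def rebuild_hours(capacity_gb, write_mb_per_sec):
    """Best-case rebuild time for a single-spare rebuild."""
    return capacity_gb * 1000 / write_mb_per_sec / 3600

for size_gb in (750, 1000, 2000, 3000):
    print(size_gb, round(rebuild_hours(size_gb, 40), 1))
# 750GB ~5.2h, 1TB ~6.9h, 2TB ~13.9h, 3TB ~20.8h at 40 MB/s
```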
4. The IBM DS8000 Performance Monitoring and Tuning redbook states “RAID 6 rebuild times are close to RAID 5 rebuild times (for the same size disk drive modules (DDMs)), because rebuild times are primarily limited by the achievable write throughput to the spare disk during data reconstruction.” and also “For array rebuilds, RAID 5, RAID 6, and RAID 10 require approximately the same elapsed time, although RAID 5 and RAID 6 require significantly more disk operations and therefore are more likely to impact other disk activity on the same disk array.”
The image below just came to hand. It shows how the new predictive rebuilds feature on DS8000 can reduce rebuild times. Netapp does something similar, I believe. Interestingly, it shows a much higher rebuild rate than the 50MB/sec that is usually talked about.
5. The EMC whitepaper “The Effect of Priorities on LUN Management Operations” focuses on the effect of assigned priority, as one would expect, but is nonetheless very useful in helping to understand generic rebuild times (although it does contain a strange assertion that SATA drives rebuild faster than 10KRPM drives, which I assume must be a transposition error). Anyway, the doc broadly reinforces the data from IBM and Netapp, including this table.
This seems to show that the increase in rebuild times is more linear as the RAID sets get bigger, compared to IBM’s data, which showed steps at 8 and 16 drives. One person with CX4 experience reported to me that you’d be lucky to get close to 30MB/sec on a RAID5 rebuild on a typical working system, and that when a vault drive is rebuilding with priority set to ASAP not much else gets done on the system at all. It remains unclear to me how much of the vendor variation I am seeing is due to reporting differences and detail levels versus architectural differences.
6. IBM SONAS 1.3 reports a rebuild time of only 9.8 hours for a 3TB drive in a RAID6 8+2 array on an idle system, and 6.1 hours for a 2TB drive (down from 12 hours in SONAS 1.2). That improvement from 12 hours to 6.1 comes simply from a code update, which highlights that not all constraints on rebuild times are physical or vendor-generic.
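Working backwards from the SONAS figures gives a sense of the effective rebuild rates involved. A quick sketch, assuming decimal drive capacities and a steady rate on an idle system:

```python
# Effective rebuild rates implied by the SONAS figures quoted above.
# Assumes decimal terabytes and a constant rate over the rebuild.

def mb_per_sec(capacity_tb, hours):
    """Average rebuild rate implied by capacity and elapsed time."""
    return capacity_tb * 1_000_000 / (hours * 3600)

print(round(mb_per_sec(3, 9.8)))   # SONAS 1.3, 3TB drive: ~85 MB/s
print(round(mb_per_sec(2, 6.1)))   # SONAS 1.3, 2TB drive: ~91 MB/s
print(round(mb_per_sec(2, 12)))    # SONAS 1.2, 2TB drive: ~46 MB/s
```

The near-doubling of effective rate on the same 2TB hardware between SONAS 1.2 and 1.3 is the software-constraint point in action.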
7. March 2012: I just found this pic from the IBM Advanced Technical Skills team in the US. It gives me the clearest measure yet of rebuild times on IBM’s Storwize V7000. Immediately obvious is that the Nearline drive rebuild times stretch out a lot when the target rebuild rate is limited to reduce host I/O impact, but the SAS and SSD drive rebuild times are pretty impressive. The table also came with a comment estimating that 600GB SAS drives would take twice the rebuild time of the 300GB SAS drives shown.
In 2006 Hu Yoshida posted that “it is time to replace 20 year old RAID architectures with something that does not impact I/O as much as it does today with our larger capacity disks. This is a challenge for our developers and researchers in Hitachi.”
I haven’t seen any sign of that from Hitachi, but IBM’s XIV RAID-X system is perhaps the kind of thing he was contemplating. RAID-X achieves re-protection rates of more than 1TB of actual data per hour, and there is no real reason why other disk systems couldn’t implement the scattered approach that XIV uses to bring a large number of drives into play on data rebuilds. There, protection is about making another copy of data blocks as quickly as possible, not about drive substitution.
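To see why a scattered approach rebuilds so much faster, consider a toy model in which every surviving drive shares the re-protection work, rather than one spare absorbing all the writes. The 50MB/sec per-drive rate and 180-drive system below are illustrative assumptions, not XIV measurements:

```python
# Toy model of scattered ("declustered") RAID rebuild. In a traditional
# rebuild, one spare drive absorbs every write; in a scattered layout,
# the reconstructed copies fan out across all surviving drives. The
# 50 MB/s per-drive rate and drive counts are illustrative assumptions.

def traditional_rebuild_hours(data_gb, spare_write_mb_s=50):
    """All rebuild writes funnel into a single spare drive."""
    return data_gb * 1000 / spare_write_mb_s / 3600

def scattered_rebuild_hours(data_gb, surviving_drives, per_drive_mb_s=50):
    """Re-protection writes are spread across every surviving drive."""
    return data_gb * 1000 / (surviving_drives * per_drive_mb_s) / 3600

# 1TB of actual data: single spare vs 179 surviving drives
print(round(traditional_rebuild_hours(1000), 1))       # ~5.6 hours
print(round(scattered_rebuild_hours(1000, 179) * 60))  # about 2 minutes
```

The single-spare write bottleneck simply disappears, which is why XIV talks in terms of data re-protected per hour rather than per-drive rebuild times.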
So that’s about as much as I know about RAID rebuilds. Please feel free to send me your own rebuild experiences and measurements if you have any.