RAID-5 was long hailed as the enterprise-level storage solution and a fit for nearly every application. The truth is, RAID-5 was designed back in the ’80s to cut costs without completely sacrificing redundancy. Back then, storage on enterprise-class drives was so expensive per byte that researchers were scrambling to find a way to store more data for less money.
Let’s say you needed 100MB of storage space with disk-level redundancy, and a 20MB SCSI drive cost $1,000.00. Before RAID-5, you’d buy 10 drives, create 5 RAID-1 arrays of 20MB each, and split your data set across those 5 separate arrays. Not only is this expensive at $10,000.00, but the storage space you need is split across 5 arrays. With RAID-5, six 20MB disks gave you 100MB of usable space (one disk’s worth of capacity goes to parity) plus redundancy, for $6,000.00. That saves $4,000.00 per storage unit implemented! Sure, there were caveats, but with savings like that, nobody was paying attention.
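The arithmetic above is easy to reproduce. The short Python sketch below uses the example’s hypothetical figures (20MB drives at $1,000.00 each) and assumes the usual rules: a mirror needs two physical drives per drive of usable space, while RAID-5 needs one extra drive for parity with a minimum of three drives. The function names are illustrative only.

```python
import math

# Hypothetical figures from the example above, not historical prices.
DRIVE_MB = 20
DRIVE_COST = 1_000  # dollars per drive

def raid1_drives(required_mb: int, drive_mb: int = DRIVE_MB) -> int:
    """Mirrored pairs: two physical drives per drive's worth of usable space."""
    return 2 * math.ceil(required_mb / drive_mb)

def raid5_drives(required_mb: int, drive_mb: int = DRIVE_MB) -> int:
    """Single parity: one extra drive's worth of capacity, minimum of three drives."""
    return max(3, math.ceil(required_mb / drive_mb) + 1)

if __name__ == "__main__":
    need_mb = 100
    r1, r5 = raid1_drives(need_mb), raid5_drives(need_mb)
    print(f"RAID-1: {r1} drives, ${r1 * DRIVE_COST:,}")  # RAID-1: 10 drives, $10,000
    print(f"RAID-5: {r5} drives, ${r5 * DRIVE_COST:,}")  # RAID-5: 6 drives, $6,000
    print(f"Savings: ${(r1 - r5) * DRIVE_COST:,}")       # Savings: $4,000
```

Because RAID-5’s overhead is a single drive no matter how large the set is, while mirroring always costs half the raw capacity, the savings grow with the array, which is exactly what made RAID-5 so attractive.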
Welcome to the 21st century. The database is king, and everyone wants performance! Unfortunately, performance is exactly what RAID-5 sacrifices, and developers and admins are finally starting to notice. Let’s take a look at the 5 biggest caveats of the RAID level that was synonymous with enterprise storage for so many years:
- Performance, Performance, Performance! RAID-5 pays a significant penalty on every write because parity must be kept up to date: changing a single block means reading the old data and the old parity, recalculating parity, then writing both the new data and the new parity, so one logical write becomes four disk I/Os. Most implementations also deliver disappointing read performance, even though RAID-5 proponents consider reads one of its “strengths.”
- Rebuild times are horrifyingly slow. Expect days instead of hours for large arrays, because rebuilding a failed disk means reading every surviving disk, recalculating the missing data from parity, and writing it to the replacement, stripe by stripe. Depending on the I/O performance the storage needs in order to be usable, this can literally translate into days of downtime from a single disk failure.
- Double trouble. Because rebuilds take so long and involve so many reads and writes, the risk of a second disk failing mid-rebuild is very real with large drives. A second failed disk in a RAID-5 array always means goodbye to all your data.
- Trashed Bits. Most controllers do not check parity when reading data from disk; if they did, RAID-5 reads would be as slow as its writes. Unfortunately, this means that if a disk ever returns garbage on a read, and that data is then modified and rewritten, the garbage is written back and fresh parity is calculated from it. Since most reads touch blocks on every disk in the array, the corruption ends up spread across the entire array. This is the #1 killer of RAID-5 arrays, and I bet you’ve never heard of it.
- Wasted Cache. With RAID-5, controllers hold unmodified reads in cache as long as possible to mitigate the write penalty, so the old data is already on hand when parity must be recalculated. Almost every other RAID configuration puts controller cache to much smarter use for reads.
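The write penalty and the rebuild cost both fall out of how RAID-5 parity works. The Python sketch below models a single stripe, assuming the textbook XOR parity scheme and a read-modify-write path for small writes; real controllers rotate parity across disks, use far larger stripes, and can skip the extra reads on full-stripe writes. `xor_blocks`, `small_write` and `rebuild` are hypothetical helpers, not any controller’s API.

```python
from functools import reduce

# Toy model of one RAID-5 stripe: each "disk" holds one equally sized block.

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks (the RAID-5 parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def small_write(data: list[bytes], parity: bytes, index: int, new: bytes):
    """Read-modify-write update of one data block; returns (data, parity, disk I/Os)."""
    old = data[index]                              # I/O 1: read old data
    old_parity = parity                            # I/O 2: read old parity
    new_parity = xor_blocks(old_parity, old, new)  # remove old block, add new one
    data = data.copy()
    data[index] = new                              # I/O 3: write new data
    return data, new_parity, 4                     # I/O 4: write new parity

def rebuild(data: list[bytes], parity: bytes, failed: int) -> bytes:
    """Recover a lost block by XORing parity with every surviving block."""
    survivors = [block for i, block in enumerate(data) if i != failed]
    return xor_blocks(parity, *survivors)

if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xa0\x0b"]
    parity = xor_blocks(*stripe)
    stripe, parity, ios = small_write(stripe, parity, 1, b"\xff\x00")
    print("disk I/Os for one small write:", ios)  # disk I/Os for one small write: 4
    print(rebuild(stripe, parity, 1) == b"\xff\x00")  # True
```

Note that `rebuild` needs a block from every surviving disk just to recover one block on the failed disk, which is why a rebuild touches the whole array, and why a second failure during the rebuild leaves two unknowns that a single parity block cannot resolve.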
Using RAID-10, or multiple RAID-1 arrays if your data set permits, will yield much higher I/O performance in nearly all cases, and RAID-10 can survive multiple drive failures as long as no two failed disks belong to the same mirrored pair. There’s still a cost penalty in raw storage capacity, but with today’s drive prices it is minimal, and if your primary goal is performance, RAID-10 is actually cheaper overall, because you need fewer drives to hit a given I/O target!
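The performance argument can be roughed out with the common write-penalty rule of thumb: a small random host write costs about 4 back-end I/Os on RAID-5 (read old data, read old parity, write both back) and 2 on RAID-10 (one write to each half of the mirror). The Python sketch below assumes a purely random workload, ignores controller cache and full-stripe writes, and uses placeholder numbers (eight disks, 150 IOPS per disk, 70% reads) that do not come from this article. It also checks RAID-10 survivability under an assumed layout in which disks 0 and 1, 2 and 3, and so on are mirror pairs.

```python
# Back-end disk I/Os per small random host write (common rule of thumb).
WRITE_PENALTY = {"RAID-5": 4, "RAID-10": 2}

def effective_iops(level: str, disks: int, disk_iops: float, read_fraction: float) -> float:
    """Approximate host IOPS for a random workload; each read costs one disk I/O."""
    raw = disks * disk_iops
    write_fraction = 1.0 - read_fraction
    return raw / (read_fraction + write_fraction * WRITE_PENALTY[level])

def usable_disks(level: str, disks: int) -> int:
    """Usable capacity in disks: RAID-5 loses one to parity, RAID-10 loses half to mirrors."""
    return disks - 1 if level == "RAID-5" else disks // 2

def raid10_survives(failed: set[int], disks: int) -> bool:
    """Assumes disks (0, 1), (2, 3), ... are mirror pairs.

    The array survives as long as no pair has lost both of its members.
    """
    return not any({2 * p, 2 * p + 1} <= failed for p in range(disks // 2))

if __name__ == "__main__":
    for level in ("RAID-5", "RAID-10"):
        iops = effective_iops(level, disks=8, disk_iops=150, read_fraction=0.7)
        print(f"{level}: ~{iops:.0f} IOPS, {usable_disks(level, 8)} disks usable")
    print(raid10_survives({0, 2, 5}, disks=8))  # True: three failures, all in different pairs
    print(raid10_survives({2, 3}, disks=8))     # False: both halves of one mirror
```

With these placeholder inputs, RAID-10 delivers about 46% more host IOPS from the same eight disks. Turned around, reaching a fixed IOPS target on RAID-5 takes more drives, which is the sense in which RAID-10 can end up cheaper despite giving up half its raw capacity.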
So when do you use RAID-5?
RAID-5 is still a legitimate fit for high-capacity storage with few reads and writes and little or no I/O performance requirement. Archives and low-utilization file stores are good examples. If you need to archive 10TB of data a year and access to it is sporadic or light, RAID-5 is probably worth considering and will still protect you from a single-drive failure.


