In 1978 IBM employee Norman Ken Ouchi was awarded patent 4092732 for a “System for recovering data stored in failed memory unit,” technology that would later be known as RAID 5 with full stripe writes.
Hands up who’s still doing that or its RAID 6 derivative 33 years later?
I have a particular distaste for technologies that need to be manually tuned. Maybe it stems from having spent my early career in the 1980s doing things like nightly delete-and-define operations on an IBM mainframe to eliminate file fragmentation, and weekly disk formats on Data General RDOS and AOS minicomputers. The latter meant carefully positioning the bitmap by temporarily creating large empty files at the start of the drive, then placing the bitmap, then leaving more empty space for new files, before restoring from tape to eliminate file fragmentation.
Anything that feels like it is even vaguely related to those procedures of the past belongs in the past as far as I’m concerned.
I have recently been reminded a couple of times of the continuing issue of sizing RAID arrays to take account of full stripe writes, and of using offsets to align writes to block boundaries. I’m not an expert on the details of start block alignment, but it’s an issue that affects most systems deployed today. Essentially what you’re trying to achieve is to have the system write each chunk of data (e.g. 64KiB) onto a single drive rather than having to split it across drives.
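To make the alignment point concrete, here is a minimal sketch of the arithmetic. The function name and the two partition offsets are my own illustrative assumptions (63 sectors was a common legacy partition start, 2048 sectors is a common 1MiB-aligned start); real controllers and file systems add more layers than this.

```python
SECTOR = 512          # bytes per sector (assumed for this example)
STRIP = 64 * 1024     # 64KiB strip on each drive, as in the text


def strips_touched(start_bytes: int, length_bytes: int, strip_bytes: int) -> int:
    """Count how many strips a single contiguous I/O lands on."""
    first = start_bytes // strip_bytes
    last = (start_bytes + length_bytes - 1) // strip_bytes
    return last - first + 1


# A 64KiB write at the very start of a partition that begins at sector 63
# straddles a strip boundary, so it is split across two drives.
misaligned = strips_touched(63 * SECTOR, STRIP, STRIP)    # 2

# The same write on a partition that begins at sector 2048 fits one strip.
aligned = strips_touched(2048 * SECTOR, STRIP, STRIP)     # 1

print(misaligned, aligned)
```

Every chunk written through the misaligned partition pays that split, which is why the starting offset matters more than it looks.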
I started to put together a blog entry on how HBA max I/O size and strip size and file system block size affect each other but I bored myself so much I had to stick a needle in my eye to get myself to snap out of it.
So instead you’ll have to make do with this old blog entry from 2008 on disk alignment.
Full stripe writes are mainly relevant to large sequential writes, e.g. when you’re setting up d2d backup storage, or maybe a rich media archive, but also maybe general document filing and data warehousing. Sequential writes are not always the stuff of everyday OLTP tier 1 storage. The notable exception is NetApp’s Data ONTAP, which coalesces random writes into sequential writes. SVC 6.1 and Storwize V7000 have also had some recent tweaks to get them to exploit this and thereby improve SATA performance.
In RAID systems like RAID4/5/6, in order to change part of a stripe you need to take the existing data and parity into account (which requires extra I/Os) before writing the new data and parity changes. If you are doing sequential writes, you get the opportunity to bundle up the writes and do a whole RAID stripe at a time, which means you don’t need to muck around reading the existing data and parity. You just write the whole new stripe, which uses fewer I/Os and is therefore faster. Cached disk systems will try to do full stripe writes if they can, but in order to take full advantage of full stripe writes, your I/O size needs to match the RAID stripe size.
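A rough back-of-envelope sketch of the I/O counts involved, assuming a plain read-modify-write for partial updates and no help from cache. The function names and the 7+1 array with 64KiB strips are hypothetical, picked only for illustration; real controllers have other tricks (such as reconstruct writes) that change the numbers.

```python
def small_write_ios(parity_drives: int) -> int:
    """Disk I/Os to update one strip via read-modify-write.

    Read the old data and each old parity strip, then write the new
    data and each new parity strip.
    """
    return 2 * (1 + parity_drives)


def full_stripe_ios(data_drives: int, parity_drives: int) -> int:
    """Disk I/Os to write a whole stripe: one write per drive, no reads."""
    return data_drives + parity_drives


def stripe_size_kib(data_drives: int, strip_kib: int) -> int:
    """User data held by one full stripe (parity not counted)."""
    return data_drives * strip_kib


# Hypothetical 7+1 RAID 5 array with 64KiB strips.
data, parity, strip = 7, 1, 64

print(stripe_size_kib(data, strip))      # 448 (KiB of I/O to fill a stripe)
print(data * small_write_ios(parity))    # 28 I/Os if each strip is updated separately
print(full_stripe_ios(data, parity))     # 8 I/Os as a single full stripe write
```

Under these assumptions the same 448KiB of new data costs 28 disk I/Os when written strip by strip and 8 when written as one full stripe. With two parity drives (RAID 6) a single-strip update goes from 4 I/Os to 6, so the gap widens.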
So by now you can see what I mean about the needle thing. Is all this low level complexity being exposed to the storage admin really the best we can do???