[Note some additional info on RtC core usage added in blue 12th June 2012]
[Note also that testing shows that RtC works best with small block I/O, e.g. databases at 4K or 8K, and has a higher performance impact on larger I/O sizes. 13 March 2013]
Code level 6.4 has just been announced for IBM SAN Volume Controller and Storwize V7000 and among the new features is Realtime Compression (RtC) for primary data.
Comparing IBM RtC to post-process compression offerings from other vendors is a bit like comparing freshly squeezed orange juice to a box of reconstituted juice. IBM’s Realtime Compression is made fresh as you need it, but sooner or later the other vendors always rely on a big batch process. As it turns out, not only is reconstituted juice not very fresh, but neither is that box of not-from-concentrate juice. Only freshly squeezed is freshly squeezed. I found this quite interesting, so let’s digress for a moment…
What they don’t tell you about Not-from-Concentrate juice – Not-from-concentrate juice can be stored for up to a year. First they strip the juice of oxygen so it doesn’t oxidize, but that also strips it of its flavour-providing chemicals. Juice companies hire fragrance companies to engineer flavour packs to add back to the juice to try to make it taste fresh. Flavour packs aren’t listed as ingredients on the label because they are derived from orange essential oil. The packs added to juice earmarked for the US market contain a lot of ethyl butyrate. Mexicans and Brazilians favour the decanals (aldehydes), or terpene compounds such as valencene.
You can read this and a little more about the difference between freshly squeezed and boxed juice here. http://civileats.com/2009/05/06/freshly-squeezed-the-truth-about-orange-juice-in-boxes/
IBM’s Realtime Compression is based on the Random Access Compression Engine (RACE) that IBM acquired a couple of years ago. The unique offering here is that RtC is designed to work with primary data, not just archival data. It is, as the name implies, completely real-time for both reads and writes. A compressed volume is just another volume type, similar to a thin-provisioned volume, and new metrics are provided to monitor the performance overhead of the compression.
The system will report the volume size as seen by the host, the thin provisioned space assuming there was no compression, and the real space used nett of thin provisioning and compression savings. Also presented is a quick bubble showing savings from compression across the whole system. Space saving estimates are as per the following table:
Capacity Magic 5.7.4 now supports compression and caters for the variety of data types. Disk Magic will also be updated to take account of compression, and a new Redbook will be available shortly to cover it as well.
Most performance modelling I have seen on Storwize V7000 up until now shows controllers that are less than 10% busy, which is a good thing as RtC will use [up to] 3 out of 4 (Storwize V7000, SVC CF8) or 4 out of 6 (SVC CG8) CPU cores and 2GB of RAM. The GUI and other services still get to use the cores that RtC ‘owns’, but non-compressed I/O gets routed to the other cores. There has always been some hard-wiring of SVC cores, but we just haven’t talked about it before. The GUI can’t run on more than 2 out of 6 cores, for example, and non-compressed I/O will never use more than 4 cores. That’s the way it’s always been, and RtC doesn’t change that.
Anyway, if you are more than 20% CPU busy on your current SVC or Storwize V7000 systems [extremely unlikely as SVC is a very low-CPU consumption architecture] the best way to deploy RtC would be to add another I/O group to your clustered system. I expect future hardware enhancements will see more cores per system. Storwize V7000 is currently a single 4 core processor per node, so there’s plenty of scope for increase there.
RtC is a licensed feature – licensed per enclosure on Storwize V7000 and per TB on SVC. In the coming weeks we will see how the pricing works out and that will determine the practical limits on the range of use cases. [Looks like it’s pretty cost-effective from what I’ve seen so far].
RACE sits below the copy services in the stack, so they all get to take advantage of the compression. RACE is integrated into the Thin Provisioning layer of the code so all of the usual Thin Provisioning capabilities like auto-expand are supported.
When you add a volume mirror you can choose to add the mirror as a compressed volume, which will be very useful for converting your existing volumes.
IBM’s patented approach to compression is quite different from the other vendors’.
Fixed Input : Variable Output – NetApp takes 32K chunks and spits them out into some number of 4K compressed chunks with some amount of padding, but NetApp block re-writes are not compressed in real-time, so the volume will grow as it’s used. Most workloads need to be run as post-process compression, and you will need to be very careful of the interactions with Snapshots because of the way NetApp stores snaps inside the original volume.
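To make the fixed-input, variable-output idea a bit more concrete, here is a minimal Python sketch. It is not NetApp's code: zlib simply stands in for whatever compressor is really used, and the function name is made up for illustration. The 32K input and 4K output sizes come from the description above.

```python
import zlib

CHUNK_IN = 32 * 1024   # fixed input chunk size (from the text)
BLOCK_OUT = 4 * 1024   # output block granularity (from the text)

def compress_fixed_input(data: bytes):
    """Hypothetical helper: return (blocks_used, padding_bytes) per 32K input chunk."""
    layout = []
    for off in range(0, len(data), CHUNK_IN):
        chunk = data[off:off + CHUNK_IN]
        comp = zlib.compress(chunk)          # zlib is an assumption, for illustration only
        blocks = -(-len(comp) // BLOCK_OUT)  # ceiling division: whole 4K blocks needed
        padding = blocks * BLOCK_OUT - len(comp)  # unused tail of the last block
        layout.append((blocks, padding))
    return layout

if __name__ == "__main__":
    sample = b"some fairly repetitive database page " * 4000
    for blocks, padding in compress_fixed_input(sample):
        print(f"{blocks} x 4K blocks, {padding} bytes of padding")
```

The point to notice is that the input size is the constant and the output size is whatever falls out, rounded up to the next 4K boundary, which is where the padding comes from.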
Variable Input : Fixed Output – IBM’s RtC is designed for use on primary data. It takes a large variable input stream e.g. up to 160K in some cases (so has a larger scope to look for a repeated bit stream = better compression rates) and spits the compressed data out into a 32K fully allocated compressed chunk. Writing out a fixed 32K with no padding is more efficient and a key benefit is that all re-writes continue to be compressed. This is a completely real-time solution.
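The variable-input, fixed-output approach can be sketched the other way around. This is only an illustration of the principle and not the RACE algorithm: zlib is a stand-in compressor, the 4K growth step and the function name are my own assumptions, and the sketch simply stops at the largest input window that still fits rather than filling the 32K chunk exactly. The 32K output and 160K input ceiling come from the description above.

```python
import zlib

OUT_CHUNK = 32 * 1024    # fixed output chunk size (from the text)
MAX_INPUT = 160 * 1024   # upper bound on input per chunk (from the text)
STEP = 4 * 1024          # hypothetical growth step, chosen for illustration

def pack_fixed_output(data: bytes):
    """Hypothetical helper: return (input_bytes_consumed, compressed_bytes) per output chunk."""
    chunks = []
    pos = 0
    while pos < len(data):
        limit = min(MAX_INPUT, len(data) - pos)
        take = min(STEP, limit)
        best_len, best_size = 0, 0
        while True:
            size = len(zlib.compress(data[pos:pos + take]))
            if size > OUT_CHUNK:
                break                      # window too big, keep the previous one
            best_len, best_size = take, size
            if take == limit:
                break                      # hit the input ceiling or end of data
            take = min(take + STEP, limit)
        # The first 4K step always fits in 32K, so best_len is never zero here.
        chunks.append((best_len, best_size))
        pos += best_len
    return chunks

if __name__ == "__main__":
    sample = b"some fairly repetitive database page " * 20000
    for consumed, size in pack_fixed_output(sample):
        print(f"consumed {consumed} input bytes -> {size} compressed bytes in one 32K chunk")
```

Here the output chunk is the constant and the amount of input swallowed per chunk varies with how compressible the data is, which is why highly compressible data gets a wider window to find repeats in.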
Note that RtC is not yet available on Storwize V7000 Unified.