Freshly Squeezed

[Note some additional info on RtC core usage added in blue 12th June 2012]

[Note also that testing shows that RtC works best with small block I/O e.g. databases, 4K, 8K, and has higher performance impact on larger I/O sizes. 13 March 2013]

Code level 6.4 has just been announced for IBM SAN Volume Controller and Storwize V7000 and among the new features is Realtime Compression (RtC) for primary data.

Comparing IBM RtC to post-process compression offerings from other vendors is a bit like comparing freshly squeezed orange juice to a box of reconstituted juice. IBM’s Realtime Compression is made fresh as you need it, but sooner or later the other vendors always rely on a big batch process. As it turns out, not only is reconstituted juice not very fresh, but neither is that box of not-from-concentrate juice. Only freshly squeezed is freshly squeezed. I found this quite interesting, so let’s digress for a moment…

What they don’t tell you about Not-from-Concentrate juice – Not-from-concentrate juice can be stored for up to a year. First they strip the juice of oxygen so it doesn’t oxidize, but that also strips it of its flavour-providing chemicals. Juice companies hire fragrance companies to engineer flavour packs to add back to the juice to try to make it taste fresh. Flavour packs aren’t listed as ingredients on the label because they are derived from orange essential oil. The packs added to juice earmarked for the US market contain a lot of ethyl butyrate. Mexicans and Brazilians favour the decanals (aldehydes), or terpene compounds such as valencene.

You can read this and a little more about the difference between freshly squeezed and boxed juice here. http://civileats.com/2009/05/06/freshly-squeezed-the-truth-about-orange-juice-in-boxes/

IBM’s Realtime Compression is based on the Random Access Compression Engine (RACE) that IBM acquired a couple of years ago. The unique offering here is that RtC is designed to work with primary data, not just archival data. It is, as the name implies, completely real-time for both reads and writes. A compressed volume is just another volume type, similar to a thin provisioned volume, and new metrics are provided to monitor the performance overhead of the compression.

The system will report the volume size as seen by the host, the thin provisioned space assuming there was no compression, and the real space used nett of thin provisioning and compression savings. Also presented is a quick bubble showing savings from compression across the whole system. Space saving estimates are as per the following table:
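To make the three reported figures concrete, here is a minimal sketch of the arithmetic that connects them. The class name, field names and sample numbers are purely illustrative assumptions; they are not the names the GUI or CLI uses, and the figures are not measurements.

```python
from dataclasses import dataclass


@dataclass
class VolumeCapacity:
    """Hypothetical container for the three figures described above."""
    host_size_gb: float          # volume size as seen by the host
    uncompressed_used_gb: float  # thin provisioned space assuming no compression
    real_used_gb: float          # real space used after thin provisioning and compression

    def thin_saving_gb(self) -> float:
        # Space never written by the host, so never allocated.
        return self.host_size_gb - self.uncompressed_used_gb

    def compression_saving_gb(self) -> float:
        # Space saved by compression alone, on top of thin provisioning.
        return self.uncompressed_used_gb - self.real_used_gb

    def compression_saving_pct(self) -> float:
        # Compression saving as a percentage of the data actually written.
        if self.uncompressed_used_gb == 0:
            return 0.0
        return 100.0 * self.compression_saving_gb() / self.uncompressed_used_gb


# Made-up example: a 1000 GB volume with 400 GB written, stored in 160 GB.
vol = VolumeCapacity(host_size_gb=1000.0, uncompressed_used_gb=400.0, real_used_gb=160.0)
print(vol.thin_saving_gb(), vol.compression_saving_gb(), vol.compression_saving_pct())
```

Keeping the thin provisioning saving separate from the compression saving matters for sizing, because only the second one depends on how compressible your data is.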

Capacity Magic 5.7.4 now supports compression and caters for the variety of data types. Disk Magic will also be updated to take account of compression, and a new Redbook will be available shortly to cover it as well.

Most performance modelling I have seen on Storwize V7000 up until now shows controllers that are less than 10% busy, which is a good thing as RtC will use [up to] 3 out of 4 (Storwize V7000, SVC CF8) or 4 out of 6 (SVC CG8) CPU cores and 2GB of RAM. The GUI and other services still get to use the cores that RtC ‘owns’, but non-compressed I/O gets routed to the other cores. There has always been some hard-wiring of SVC cores, but we just haven’t talked about it before. The GUI can’t run on more than 2 out of 6 cores, for example, and non-compressed I/O will never use more than 4 cores. That’s the way it’s always been, and RtC doesn’t change that.

Anyway, if you are more than 20% CPU busy on your current SVC or Storwize V7000 systems [extremely unlikely as SVC is a very low-CPU consumption architecture] the best way to deploy RtC would be to add another I/O group to your clustered system. I expect future hardware enhancements will see more cores per system. Storwize V7000 currently has a single 4-core processor per node, so there’s plenty of scope for increase there.

RtC is a licensed feature – licensed per enclosure on Storwize V7000 and per TB on SVC. In the coming weeks we will see how the pricing works out and that will determine the practical limits on the range of use cases. [Looks like it’s pretty cost-effective from what I’ve seen so far].

RACE sits below the copy services in the stack, so they all get to take advantage of the compression. RACE is integrated into the Thin Provisioning layer of the code so all of the usual Thin Provisioning capabilities like auto-expand are supported.

When you add a volume mirror you can choose to add the mirror as a compressed volume, which will be very useful for converting your existing volumes.

IBM’s patented approach to compression is quite different from the other vendors’.

Fixed Input : Variable Output – NetApp takes 32K chunks and spits them into some number of 4K compressed chunks with some amount of padding, but NetApp block re-writes are not compressed in real-time, so the volume will grow as it’s used. Most workloads need to be run as post-process compression, and you will need to be very careful of the interactions with Snapshots because of the way NetApp stores snaps inside the original volume.

Variable Input : Fixed Output – IBM’s RtC is designed for use on primary data. It takes a large variable input stream e.g. up to 160K in some cases (so has a larger scope to look for a repeated bit stream = better compression rates) and spits the compressed data out into a 32K fully allocated compressed chunk. Writing out a fixed 32K with no padding is more efficient and a key benefit is that all re-writes continue to be compressed. This is a completely real-time solution.
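The difference between the two chunking styles can be sketched in a few lines. This is only a toy illustration under loud assumptions: Python's `zlib` stands in for the real compression engines, the function names are made up, and the greedy loop is not how RACE or any NetApp code actually works. The 32K, 4K and 160K figures are simply the ones quoted above, and the second function only fills each output chunk as close to 32K as its 4K input steps allow.

```python
import zlib

KB = 1024


def fixed_in_variable_out(data: bytes, in_size: int = 32 * KB, block: int = 4 * KB):
    """Compress each fixed-size input chunk; round the output up to whole blocks.

    Returns a list of (compressed_bytes, padded_bytes) per input chunk.
    """
    results = []
    for off in range(0, len(data), in_size):
        comp = zlib.compress(data[off:off + in_size])
        blocks = -(-len(comp) // block)  # ceiling division
        results.append((len(comp), blocks * block))
    return results


def variable_in_fixed_out(data: bytes, out_size: int = 32 * KB,
                          step: int = 4 * KB, max_in: int = 160 * KB):
    """Greedily grow the input window while the compressed output fits out_size.

    Returns a list of (input_bytes_consumed, compressed_bytes) per output chunk.
    """
    chunks = []
    pos = 0
    while pos < len(data):
        take = min(step, len(data) - pos)
        best = zlib.compress(data[pos:pos + take])
        while take < max_in and pos + take < len(data):
            nxt = min(take + step, max_in, len(data) - pos)
            cand = zlib.compress(data[pos:pos + nxt])
            if len(cand) > out_size:
                break  # the larger window no longer fits in one output chunk
            take, best = nxt, cand
        chunks.append((take, len(best)))
        pos += take
    return chunks


# Made-up, fairly compressible sample data (19 bytes per row).
sample = b"".join(b"row %06d value=%d\n" % (i, i % 7) for i in range(20000))

for comp, padded in fixed_in_variable_out(sample)[:3]:
    print("fixed in   : 32K in ->", comp, "bytes, padded to", padded)
for consumed, comp in variable_in_fixed_out(sample):
    print("variable in:", consumed, "bytes in ->", comp, "bytes out")
```

In the first style every 32K of input produces a different amount of output plus padding; in the second the output size is bounded and the amount of input consumed is what varies, which is the property the post attributes to RtC.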

Note that RtC is not yet available on Storwize V7000 Unified.


12 Responses

  1. Interesting, is this some kind of deduplication? I’ve been working with V7000 and SVC and I was thinking these 2 products did not have any kind of data compression….

    • RtC is more like the LZW compression that you get when you write to a tape drive. It looks for savings within a time-contiguous stream. Dedup by comparison tries to find savings across a volume so it has to compare more data. Dedup (e.g. in TSM) is probably best suited to storing a large number of full backup copies to disk. RtC is more suited to providing everyday savings.

  2. Finally we know that IBM bought Storwize not only to have a nice name for new storage :)
    On a serious note, I assume that compressed volumes will be natively “thin”, but will we be able to monitor not only the final, compressed size of a volume, but also the size already written by the host (as with normal thin provisioned volumes)? It’s important, so we can see the real gain from compression as well as do our future sizing properly.

    • Yes indeed – “The system will report the volume size as seen by the host, the thin provisioned space assuming there was no compression, and the real space used nett of thin provisioning and compression savings.”

  4. Thanks for the writeup.

    How many concurrent compression processes can be run on the V7000? i.e. how many volumes can be compressed simultaneously, considering it is in-line compression?

    And what is license cost of compression on V7000? What is the total cost including maintenance per enclosure per year?

    • There is currently a maximum of 200 compressed volumes supported per iogroup (i.e. per Storwize V7000 head unit, or SVC node pair). Pricing you will need to get from your local IBM Business Partner.

  5. This is a great feature and I bet EMC, HDS and NetApp engineers must now be working overtime to get something similar.

  6. The only downside I see is that RtC is currently not supported on Easy Tier LUNs.

    However, it would be great for File/Exchange type LUNs.

    • The foundation has been laid and I expect that over time the RtC/Easy Tier thing will be addressed, along with other refinements and increases in performance and scalability etc.

    • One thing that Jim didn’t mention is that with RtC alone you may well get a performance increase (there being less data to read/write etc.). I’m not suggesting it would be in the realm of Easy Tier performance though. We did a little modelling on Capacity Magic and found a pleasant increase with RtC enabled. Now we just have to convince our customer :-)

      Also look for the Comprestimator tool to run against your actual data. http://www-01.ibm.com/support/docview.wss?uid=ssg1S4001012

      You can try RtC for ‘free’ for 45 (?) days. Just get to V6.4, create a compressed mirror, and see how it goes.

  7. […] Compression (RtC) for Storwize disk systems I covered the technology in a post entitled “Freshly Squeezed“. The challenge with RtC in practice turned out to be that on many workloads it just […]
