Dedup is happening fast all around us and the vendors are lining up, but it’s not always easy to compare what’s going on.
Netapp purchased Alacritus in April 2005 and delivered batch post-process dedup (ASIS) for its mainstream filers in 2007. Since then it has really been the only vendor to successfully apply the technology to production workloads, and it’s there ‘free’ with every filer you buy. The original Netapp Nearstor VTL (which didn’t run ONTAP), however, was a slug, and Netapp withdrew it from active marketing in 2009. Netapp tried to acquire Data Domain in 2009, but was beaten by EMC.
EMC acquired dedup technology when it bought Data Domain in June 2009 after a tussle with Netapp. Prior to the purchase of Data Domain, EMC used Quantum dedup technology in its VTLs.
Dell has just announced its intention to purchase Ocarina, with mention that their dedup technology will be integrated into Dell’s Equallogic product.
HP offer a VTL based on Sepaton’s product, but I barely hear it mentioned out in the market, which suggests to me that they are probably just quietly selling it into their installed base accounts.
IBM acquired its ProtecTIER dedup technology when it bought Diligent in April 2008. Diligent was another one of those companies based in Israel and operating under the guidance of Moshe Yanai. ProtecTIER’s inline dedup ‘HyperFactor’ technology is a little different than most others in that it doesn’t use hash addressing to assume two blocks are identical, but does an actual comparison to guarantee two blocks are identical before deduping. Interestingly, this makes the I/O profile on the ProtecTIER repository random-read intensive (rather than sequential-write intensive, as many people assume).
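To make that distinction concrete, here is a minimal Python sketch of the two approaches. It is only an illustration and not HyperFactor or any other vendor's actual code: the class names, the use of SHA-256 and the in-memory dictionaries are all my own assumptions. The first store trusts the hash, while the second uses the hash only to find a candidate and then reads the stored block back for a byte-for-byte comparison, which is where the read-heavy I/O comes from.

```python
import hashlib


class HashOnlyStore:
    """Hypothetical store that trusts the hash: equal digests are assumed to mean equal blocks."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes

    def write(self, block: bytes) -> str:
        digest = hashlib.sha256(block).hexdigest()
        # If the digest is already known, the new block is never compared or stored.
        self.blocks.setdefault(digest, block)
        return digest


class VerifyingStore:
    """Hypothetical store that uses the digest only to find candidates, then compares bytes."""

    def __init__(self):
        self.candidates = {}  # digest -> list of stored blocks
        self.reads = 0        # counts stored blocks read back for comparison

    def write(self, block: bytes):
        digest = hashlib.sha256(block).hexdigest()
        bucket = self.candidates.setdefault(digest, [])
        for index, stored in enumerate(bucket):
            self.reads += 1  # each check means reading an existing block
            if stored == block:
                return (digest, index)  # proven identical, so dedupe
        bucket.append(block)  # no byte-identical match, so store it
        return (digest, len(bucket) - 1)
```

Writing the same 4 KB block twice to `VerifyingStore` stores it once and bumps `reads` to 1. On a real repository that read lands wherever the original block happens to live on disk, hence the random-read profile.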
Meanwhile the IBM Tivoli folks built their own dedup capability into Tivoli Storage Manager, as both a post-process option (TSM 6.1) and a client-side dedup option (TSM 6.2). Both are included in the base TSM product (for ‘free’ as I love to say and as people love to correct me on).
HDS has had a partnership with Diligent and now IBM to resell ProtecTIER since 2006.
IBM initially offered its ProtecTIER (TS7650) product only as a gateway, ideally paired with XIV as a disk repository. Then in 2009 IBM offered a range of appliances, which were still really targeted at the enterprise space.
Specs on the IBM gateway and enterprise appliances are:
- Gateway – 500MB/sec single, 1000MB/sec clustered
- 7TB nett Appliance – 100MB/sec
- 18TB nett Appliance – 250MB/sec
- 36TB nett Appliance – 500MB/sec
This week IBM enhanced that by bringing its dedup appliances down to the entry-level space previously owned by the Data Domain DD600 family.
The new IBM TS7610:
- 4.4 TB nett Appliance, with optional non-disruptive upgrade to 5.9 TB nett
- Up to 80 MB/sec (=~300GB/hr) and more
- Up to 150 TB deduped, weekly full of 3TB, daily incremental of 1TB
- Restore speed is typically faster than the quoted backup speed
- Larger upgrade capacity promised soon
Meanwhile from the Data Domain DD600 spec sheet we get the following info on DD610 and DD630:
- DD610 “Up to 6 TB” raw decimal, max throughput 675 GB/hr backup or double that with “boost” and dedup capacity 75 TB decimal (195 TB “redundant”)
- DD630 “Up to 12 TB” raw decimal, max throughput 1.1 TB/hr backup or almost double that with “boost” and dedup capacity 165 TB decimal (420 TB “redundant”)
- Am I correct in thinking that restore speed is typically slower than backup speed?
EMC uses the “up to” phrase. Both vendors also talk in speculative terms about dedup capacity. Throughput is quoted by IBM in MB/sec (which I have translated roughly into GB/hr) while EMC quotes GB/hr.
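For anyone wanting to line the two sets of numbers up, the conversion is simple enough to sketch in Python. This assumes decimal units throughout (1 GB = 1000 MB, 1 TB = 1,000,000 MB), which is my assumption rather than anything either vendor states for its throughput figures, and the function names are just ones I made up.

```python
def mb_per_sec_to_gb_per_hr(mb_per_sec: float) -> float:
    """Convert MB/sec to GB/hr, assuming decimal units (1 GB = 1000 MB)."""
    return mb_per_sec * 3600 / 1000


def gb_per_hr_to_mb_per_sec(gb_per_hr: float) -> float:
    """Convert GB/hr to MB/sec, assuming decimal units (1 GB = 1000 MB)."""
    return gb_per_hr * 1000 / 3600


def backup_hours(size_tb: float, mb_per_sec: float) -> float:
    """Hours needed to move size_tb at a sustained mb_per_sec (1 TB = 1,000,000 MB)."""
    return size_tb * 1_000_000 / mb_per_sec / 3600


print(mb_per_sec_to_gb_per_hr(80))      # TS7610 quoted rate in GB/hr
print(gb_per_hr_to_mb_per_sec(675))     # DD610 quoted rate in MB/sec
print(round(backup_hours(3, 80), 1))    # hours for a 3 TB full at 80 MB/sec
```

On those assumptions, 80 MB/sec works out to 288 GB/hr (hence my rough 300), the DD610's 675 GB/hr is 187.5 MB/sec, and a 3 TB weekly full at a sustained 80 MB/sec would take a bit over 10 hours.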
IBM also uses the interesting “up to 80MB/sec and more” phrase, while internal presos show slightly faster rates. The IBM internal presos make mention that the figures quoted in public are pretty conservative – IBM likes to play it safe.
I’ll leave the final word on dedup capacity to xkcd.com