This is a very brief update designed to help clarify a few things about IBM’s ProtecTIER dedup VTL solutions. The details of the software functions I will leave to the redbooks (see links below).
What is ProtecTIER?
The dedup algorithm in ProtecTIER is HyperFactor, which detects recurring data in multiple backups. HyperFactor is unique in that it avoids the risk of data corruption due to hash collisions, a risk that is inherent in products based on hashing algorithms. HyperFactor uses a memory resident index, rather than disk-resident hash tables and one consequence of this is that ProtecTIER’s restore times are shorter than backup times, in contrast to other products where restore times are generally much longer.
The amount of space saved is mainly a function of the backup policies and retention periods, and the variance of the data between them, but in general HyperFactor can deliver slightly better dedup ratios than hash-based systems. The more full-backups retained on ProtecTIER, and the more intervening incremental backups, the more space that will be saved overall.
One of the key advantages of ProtecTIER is the ability to replicate deduped data in a many to many grid. ProtecTIER also supports SMB/CIFS and NFS access.
While Tivoli Storage Manager also includes many of the same capabilities as ProtecTIER, the latter will generally deliver higher performance dedup, by offloading the process to a dedicated system, leaving TSM or other backup software to concentrate on selecting and copying files.
For more information on the software functionality etc, please refer to these links:
- TS7650G Product Page
- TS7620 Product Page
- ProtecTIER 3.3 Redbook
- ProtecTIER Implementation Redbook
- Tips on using ProtecTIER with NFS
In the past IBM has offered three models of ProtecTIER systems, two of which are now withdrawn, and a new one has since appeared.
- TS7610 (withdrawn) – entry level appliance up to 6 TB and 80 MB/sec.
- TS7620 – new entry level system. Up to 35 TB of deduped capacity. Backup speed of 300 MB/sec was originally quoted, but with recent capacity increases I am still trying to confirm if the rated throughput has changed.
- TS7650A (withdrawn) – the midrange appliance which was rated at up to 36 TB and 500 MB/sec. This appliance was based on a back-end IBM (LSI/Netapp) DS4700 disk system with 450GB drives in RAID5 configuration.
- TS7650G – the enterprise gateway, which is currently rated at 9 TB per hour backup and up to 11.2 TB per hour restore. Each TS7650G has support for multiple Storwize V7000 or XIV disk systems, both of which offer non-disruptive drive firmware update capability.
There are a couple of rules of thumb I try to use when doing an initial quick glance sizing with the TS7650G with V7000 disk.
- Every V7000 disk will give you another 20 GB per hour of ProtecTIER backup throughput. The I/O profile for files is approx 80/20 random R/W with a 60KB block size and we generally use RAID6 for that. Metadata is generally placed on separate RAID10 drives and is more like 20/80 R/W.
- Backup storage (traditionally on tape) can be five to ten times the production storage capacity, so assuming a 10:1 dedup ratio, you might need a dedup disk repository between half and the same size as your production disk. However, if you know you are already storing x TB of backups on tape, don’t plan on buying less than x/10 dedup capacity. The dedup ratio can sometimes be as high as 25:1 but more typically it will be closer to 10:1.
- It’s probably not a good idea to buy a dedup system that can’t easily grow to double the sized initial capacity. Dedup capacity is notoriously hard to predict and it can turn out to need more than you expected.
Those rules of thumb are not robust enough to be called a formal sizing, but they do give you a place to start in your thinking.