HSM is essentially a way to push disk files to lower tiers, mainly tape, while leaving behind a stub-file on disk, so that the file maintains it’s accessibility and its place in the directory tree.
I say tape because there are other ways to do it between disk tiers that don’t involve stub files. e.g. IBM’s SONAS uses it’s built-in virtualization capabilites to move files between disk tiers, without changing their place in the directory tree, but SONAS can also use Tivoli Space Management to migrate those files to tape using HSM.
HSM started life as DFHSM [DFSMShsm] on IBM mainframe and I use it most weeks in that context when I log into one of IBM’s mainframe apps and wait a minute or two for it to recall my database query files to disk. That’s some pretty aggressive archiving that’s going on, and yes it’s bullet-proof.
I know of a couple of instances in the early 2000’s when companies got excited about file-based Information Lifecycle Management, and implemented HSM products (not IBM ones) on Microsoft Windows. Both of those companies removed HSM not long after, having experienced blue screens of death and long delays. The software was flaky and the migration policies probably not well thought out (probably too aggressive given the maturity of open systems HSM at the time). Being conservative, IBM came a little late to the game with Open Systems HSM, which is not necessarily a bad thing, but when it came, it came to kick butt.
Tivoli Space Management is a pretty cool product. Rock solid and feature rich. It runs on *NIX and our customers rely on it for some pretty heavy-duty workloads, migrating and recalling files to and from tape at high speed. I know one customer with hundreds of terabytes under HSM control in this way. TSM HSM for Windows is another slightly less sophisticated product in the family, but one I’m not so familiar with.
One could argue that Space Management has been limited as a product by its running on *NIX operating systems only, when most file servers out in the world were either Windows or Netapp, but things are changing. HSM is most valuable in really large file envionments – yes, the proverbial BIG DATA, and BIG DATA is not typically running on either Windows or Netapp. IBM’s SONAS for example, scalable to 14 Petabytes of files, is an ideal place for BIG DATA, and hence an ideal place for HSM.
As luck would have it, IBM has integrated Space Management into SONAS. SONAS will feed out as much CIFS, NFS, FTP, HTTP etc as you want, and if you install a Space Management server it will also provide easy integration to HSM policies that will migrate and recall data from tape based on any number of file attributes, but I guess most typically ‘time last accessed’ and file size.
Tape is by far the cheapest way to store large amounts of data, the trick is in making the data easily accessible. I have in the past tried to architect HSM solutions for both Netapp and Windows environments, and both times it ended up in the too hard basket, but with SONAS, HSM is easy. SONAS is going to be a really big product for IBM over the coming years as the BIG DATA explosion takes hold, and the ability to really easily integrate HSM to tape, from terabytes to petabytes, and have it perform so solidly is a feature of SONAS that I really like.
Tape has many uses…