IBM Software-defined Storage

The phrase ‘Software-defined Storage’ (SDS) has quickly become one of the most widely used marketing buzz terms in storage. It seems to have originated with Nicira’s use of the term ‘Software-defined Networking’ and to have been adopted by VMware when it bought Nicira in 2012, where it evolved into the ‘Software-defined Data Center’, including ‘Software-defined Storage’. VMware’s VSAN technology therefore holds the top-of-mind position when we talk about SDS. I really wish they’d called it something other than VSAN though, to avoid the clash with the ANSI T11 VSAN standard developed by Cisco.

I have seen IBM regularly use the term ‘Software-defined Storage’ to refer to:

  1. GPFS
  2. Storwize family (which would include FlashSystem V840)
  3. Virtual Storage Center / Tivoli Storage Productivity Center

I recently saw someone at IBM referring to FlashSystem 840 as SDS even though to my mind it is very much a hardware/firmware-defined ultra-low-latency system with a very thin layer of software so as to avoid adding latency.

Interestingly, IBM does not seem to market XIV as SDS, even though it is clearly a software solution running on commodity hardware that has been ‘applianced’ so as to maintain reliability and supportability.

Let’s take a quick look at the contenders:

1. GPFS: GPFS is a file system with a lot of storage features built in or added on, including de-clustered RAID, policy-based file tiering, snapshots, block replication, support for NAS protocols, WAN caching, continuous data protection, single-namespace clustering, HSM integration, TSM backup integration, and even a nice new GUI. GPFS is the current basis for IBM’s NAS products (SONAS and V7000U) as well as the GSS (GPFS Storage Server), which is currently targeted at HPC markets but which I suspect is likely to re-emerge as a more broadly targeted product in 2015. I get the impression that GPFS may well be the basis of IBM’s SDS strategy going forward.

2. Storwize: The Storwize family is derived from IBM’s SAN Volume Controller technology and it has always been a software-defined product, but one tightly integrated with hardware so as to control reliability and supportability. In the Storwize V7000U we see the coming together of Storwize and GPFS, and at some point IBM will need to make the call whether to stay with the DS8000-derived RAID that is currently in Storwize, or move to GPFS-based de-clustered RAID. I’d be very surprised if GPFS hasn’t already won that long-term strategy argument.

3. Virtual Storage Center: The next contender in the great SDS shootout is IBM’s Virtual Storage Center and its sub-component Tivoli Storage Productivity Center. Within some parts of IBM, VSC is talked about as the key to SDS. What VSC includes is edition-dependent, but it usually comprises the SAN Volume Controller / Storwize code developed by IBM Systems and Technology Group, the TPC and FlashCopy Manager code developed by IBM Software Group, plus some additional TPC analytics and automation. VSC gives you a tremendous amount of functionality to manage a large, complex site, but it requires real commitment to secure that value. I think of VSC and XIV as the polar opposites of IBM’s storage product line, even though some will suggest you do both: XIV drives out complexity based on a kind of 80/20 rule, while VSC is designed to let you manage and automate a complex environment.

Commodity Hardware: Many proponents of SDS will claim that it’s not really SDS unless it runs on pretty much any commodity server. GPFS and VSC qualify by this definition, but Storwize does not, unless you count the fact that SVC nodes are x3650 or x3550 servers. However, we are already seeing the rise of certified VMware VSAN-ready nodes as a way to control reliability and supportability, so perhaps we are heading for a happy medium between the two extremes of a traditional HCL menu and a fully buttoned down appliance.

Product Strategy: While IBM has been pretty clear in defining its focus markets of Cloud, Analytics, Mobile, Social and Security (the ‘CAMSS’ message that is repeatedly referred to inside IBM), I think it has been somewhat less clear in articulating a consistent storage strategy, and I am finding that as the storage market matures, smart people increasingly want to know what the vendors’ strategies are. I say vendors plural because I see the same lack of strategic clarity when I look at EMC and HP, for example. That’s not to say the products aren’t good, or the roadmaps are wrong, just that the long-term strategy is either not well defined or not clearly articulated.

It’s easier for new players and niche players of course, and VMware’s Software-defined Storage strategy, for example, is both well-defined and clearly articulated, which will inevitably make it a baseline for comparison with the strategies of the traditional storage vendors.

A/NZ STG Symposium: For the A/NZ audience, if you want to understand IBM’s SDS product strategy, the 2014 STG Tech Symposium in August is the perfect opportunity. Speakers include Sven Oehme from IBM Research, who is deeply involved with GPFS development; Barry Whyte from IBM STG in Hursley, who is deeply involved in Storwize development; and Dietmar Noll from IBM in Frankfurt, who is deeply involved in the development of Virtual Storage Center.

Melbourne – August 19-22

Auckland – August 26-28

My name is Storage and I’ll be your Server tonight…

Ever since companies like Data General moved RAID control into an external disk sub-system back in the early ’90s, it has been received wisdom that servers and storage should be separate.

While the capital cost of storage in the server is generally lower than for an external centralised storage subsystem, having storage as part of each server creates fragmentation and higher operational management overhead. Asset life-cycle management is also a consideration – servers typically last 3 years and storage can often be sweated for 5 years since the pace of storage technology change has traditionally been slower than for servers.

When you look at some common storage systems however, what you see is that they do include servers that have been ‘applianced’ i.e. closed off to general apps, so as to ensure reliability and supportability.

  • IBM DS8000 includes two POWER/AIX servers
  • IBM SAN Volume Controller includes two IBM System x x3650 Intel/Linux servers
  • IBM Storwize is a custom variant of the above SVC
  • IBM Storwize V7000U includes a pair of x3650 file heads running RHEL with Tivoli Storage Manager (TSM) and Space Management (HSM) clients
  • IBM GSS (GPFS Storage Server) also uses a pair of x3650 servers, running RHEL

At one point the DS8000 was available with LPAR separation into two storage servers (intended to cater to a split production/non-production environment), and there was talk at the time of the possibility of other apps, such as TSM, being loadable into an LPAR (a feature that was never released).

Apps or features?: There are a bunch of apps that could be run on storage systems, and in fact many already are, except they are usually called ‘features’ rather than apps. The clearest examples are probably in the NAS world, where TSM, Space Management, Samba/CTDB, Ganesha/NFS, and maybe LTFS, for example, could all be treated as features.

I also recall NetApp once talking about a Fujitsu-only implementation of ONTAP that could be run in a VM on a blade server, and EMC has talked up the possibility of running apps on storage.

GPFS: In my last post I illustrated an example of using IBM’s GPFS to construct a server-based shared storage system. The challenge with these kinds of systems is that they put the onus on the installer/administrator to get things right, rather than the traditional storage appliance approach where the vendor pre-constructs the system.

Virtualization: Reliability and supportability are vital, but virtualization allows the possibility of ring-fenced partitions for core storage functions, while still providing server capacity for a range of other data-oriented functions (e.g. MapReduce, Hadoop, OpenStack Cinder and Swift) as well as apps like TSM and HSM, and maybe even things like compression, dedup, anti-virus and LTFS. These would be treated not so much as storage system features, but as genuine apps that you can buy from third parties or write yourself, just as you would with traditional apps on servers.

The question is not so much ‘can this be done’, but more, ‘is it a good thing to do’? Would it be a good thing to open up storage systems and expose the fact that these are truly software-defined systems running on servers, or does that just make support harder and add no real value (apart from providing a new fashion to follow in a fashion-driven industry)? My guess is that there is a gradual path towards a happy medium to be explored here.

IBM GPFS – Software Defined Storage

GPFS (General Parallel File System) is one of those very cool technologies that you can do so much with that it’s actually fun to design solutions with it (provided you’re the kind of person who also gets a kick out of a nice, elegant mathematical proof by induction).

Back in 2010 I was asked by an IBM systems software strategist for my opinion as to whether GPFS had potential as a mainstream product, or if it was best kept back as an underlying component in mainstream solutions. I was strongly in the component camp, but now I almost regret that, because it may be that really the only thing that was holding GPFS back was the lack of its own comprehensive GUI. That is something I still hope will be addressed in the not too distant future.

Anyway, this is a sample design that attempts to show some of the things you can do with GPFS by way of building a software-defined storage and server environment.

The central box shows GPFS servers (virtualized in this example) and the left and right boxes show GPFS clients. GPFS also supports ILM policies between disk tiers and out to LTFS tape, as well as optional integration with HSM (via Tivoli Space Management) and fast efficient backup with Tivoli Storage Manager.

[Diagram: GPFS software-defined storage]

There are of course a few caveats and restrictions. Check out the GPFS infocenter for the technical details.

This second diagram shows a simpler view of how to build a highly available software-defined storage environment. The example shows two physical servers, but you can add many servers and still have a single storage pool. Mirroring is on a per-volume basis. You could also use GPFS native RAID to build, for example, a RAID6 array in each server.

[Diagram: VMware with GPFS]

IBM FlashSystem 840 for Legacy-free Flash

Flash storage is at an interesting place and it’s worth taking the time to understand IBM’s new FlashSystem 840 and how it might be useful.

A traditional approach to flash is to treat it like a fast disk drive with a SAS interface, and to assume that a faster version of traditional systems is the way of the future. This is not a bad idea, and with auto-tiering technologies this kind of approach was mastered by the big vendors some time ago; it can be seen, for example, in IBM’s Storwize family and DS8000, and as a cache layer in the XIV. Using auto-tiering we can perhaps expect large quantities of storage to deliver latencies around 5 milliseconds, rather than a more traditional 10 ms or higher (e.g. MS Exchange’s Jetstress test only fails when you get to 20 ms).

[Image: No SSDs]

Some players want to use all SSDs in their disk systems, which you can do with Storwize for example, but this is again really just a variation on a fairly traditional approach, and you’re generally looking at storage latencies down around one or two milliseconds. That sounds pretty good compared to 10 ms, but there are ways to do better, and I suspect that SSD-based systems will not be where it’s at in five years’ time.

The IBM FlashSystem 840 is a little different: it uses flash chips, not SSDs. Its primary purpose is to be very, very low latency. We’re talking as low as 90 microseconds for writes and 135 microseconds for reads. This is not a traditional system with a soup-to-nuts software stack. FlashSystem has a new Storwize GUI, but it is stripped back to keep it simple and to avoid anything that would impact latency.

This extreme low latency is a unique IBM proposition, since it turns out that even when other vendors use MLC flash chips instead of SSDs, by their own admission they generally still end up with latency close to 1 ms, presumably because of their controller and code-path overheads.
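To put those numbers in perspective, here’s a little back-of-the-envelope Python sketch (my own arithmetic, not an IBM benchmark) showing the theoretical ceiling on IOPS for a single stream of serially-dependent I/Os at various latencies:

```python
# With one I/O outstanding at a time, a workload can complete at most
# 1 / latency operations per second, so lower latency directly raises the
# ceiling for serially-dependent work such as database log writes.

def max_iops_single_stream(latency_seconds):
    return 1.0 / latency_seconds

for label, latency_us in [
    ("10 ms traditional disk", 10_000),
    ("1 ms SSD-based array", 1_000),
    ("135 us FlashSystem 840 read", 135),
    ("90 us FlashSystem 840 write", 90),
]:
    iops = max_iops_single_stream(latency_us / 1_000_000)
    print(f"{label:28s} -> ~{iops:>9,.0f} IOPS per stream")
```

Real workloads queue plenty of I/Os in parallel of course, but for latency-sensitive serial work the jump from 1 ms to around 100 microseconds is a full order of magnitude.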

FlashSystem 840

  • 2U appliance with hot-swap modules, power supplies, cooling, controllers, etc.
  • Concurrent firmware upgrade and call-home support
  • Encryption is standard
  • Choice of 16G FC, 8G FC, 40G IB and 10G FCoE interfaces
  • Choice of upgradeable capacity, net of 2-D RAID5 (the implied overhead is worked through in the sketch after this list):
      – 2 TB modules: 4 TB with 4 modules, 12 TB with 8 modules, 20 TB with 12 modules
      – 4 TB modules: 8 TB with 4 modules, 24 TB with 8 modules, 40 TB with 12 modules
  • Also a 2 TB starter option with RAID0
  • Each module has 10 flash chips and each chip has 16 planes
  • RAID5 is applied both across modules and within modules
  • Variable stripe RAID within modules is self-healing
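Out of curiosity, here’s a quick Python sketch that works out how much of the nominal raw flash the 2-D RAID5 protection consumes at each of the capacity points quoted above (my own arithmetic from the published net figures, nothing more):

```python
# Nominal raw capacity (modules x module size) versus the quoted net capacity
# after 2-D RAID5, using the figures from the capacity list above.

quoted_net_tb = {
    (2, 4): 4,  (2, 8): 12,  (2, 12): 20,   # 2 TB modules
    (4, 4): 8,  (4, 8): 24,  (4, 12): 40,   # 4 TB modules
}

for (module_tb, modules), net_tb in sorted(quoted_net_tb.items()):
    raw_tb = module_tb * modules
    overhead = 1 - net_tb / raw_tb
    print(f"{modules:2d} x {module_tb} TB modules: raw {raw_tb:2d} TB, "
          f"net {net_tb:2d} TB, protection overhead ~{overhead:.0%}")
```

The overhead drops from about half of raw at four modules to roughly a sixth at twelve, which is worth remembering when choosing a starting configuration.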

I’m thinking that prime targets for these systems include databases and VDI, but also folks looking to future-proof their general performance. If you’re making a 5-year purchase, not everyone will want to buy a ‘mature’ SSD legacy-style flash solution when they could instead buy into a disk-free architecture of the future.

But, as mentioned, FlashSystem does not have a full traditional software stack, so let’s consider the options if you need some of that stuff:

  • IMHO, databases are usually best replicated using log shipping, Oracle Data Guard, etc.
  • VMware volumes can be replicated with native VMware server-based tools.
  • AIX volumes can be replicated using AIX Geographic Mirroring.
  • On AIX and some other systems you can use logical volume mirroring to set up a mirror of your volumes, with preferred read set to the FlashSystem 840 and writes mirrored to a V7000 (or DS8000, XIV, etc.), thereby allowing full software-stack functions on the volumes (on the V7000) without slowing down reads from the FlashSystem.
  • You can also virtualize FlashSystem behind SVC or V7000
  • Consider using Tivoli Storage Manager deduplicated disk-to-disk backup to create a DR environment

Right now, FlashSystem 840 is mainly about screamingly low latency and high performance, with some reasonable data center class credentials, and all at a pretty good price. If you have a data warehouse, or a database that wants that kind of I/O performance, or a VDI implementation that you want to de-risk, or a general workload that you want to future-proof, then maybe you should talk to IBM about FlashSystem 840.

Meanwhile I suggest you check out these docs:

A Quick IBM ProtecTIER (Dedup VTL) Update

This is a very brief update designed to help clarify a few things about IBM’s ProtecTIER dedup VTL solutions. The details of the software functions I will leave to the redbooks (see links below).

What is ProtecTIER?

The dedup algorithm in ProtecTIER is HyperFactor, which detects recurring data across multiple backups. HyperFactor is unique in that it avoids the risk of data corruption due to hash collisions, a risk that is inherent in products based on hashing algorithms. HyperFactor uses a memory-resident index rather than disk-resident hash tables, and one consequence of this is that ProtecTIER’s restore times are shorter than its backup times, in contrast to other products where restore times are generally much longer.
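For some context on the hash-collision point, here’s a small Python sketch of the standard birthday-bound estimate for the chance of any two distinct chunks colliding under an n-bit hash. The chunk size and repository size are purely illustrative:

```python
import math

def collision_probability(num_chunks, hash_bits):
    """Birthday-bound approximation for the chance of at least one collision."""
    pairs = num_chunks * (num_chunks - 1) / 2.0
    return -math.expm1(-pairs / 2.0 ** hash_bits)

chunks_in_a_petabyte = (10 ** 15) // (8 * 1024)   # ~1.2e11 chunks of 8 KB, illustrative

for bits in (128, 160, 256):
    p = collision_probability(chunks_in_a_petabyte, bits)
    print(f"{bits:3d}-bit hash over ~1 PB of 8 KB chunks: p(collision) ~ {p:.1e}")
```

The numbers are vanishingly small, but they are not zero, and that residual risk is exactly what HyperFactor is designed to avoid.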

The amount of space saved is mainly a function of the backup policies and retention periods, and the variance of the data between them, but in general HyperFactor can deliver slightly better dedup ratios than hash-based systems. The more full-backups retained on ProtecTIER, and the more intervening incremental backups, the more space that will be saved overall.
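As a purely illustrative bit of arithmetic (this is not HyperFactor’s actual algorithm, just generic dedup maths), you can see why retention drives the ratio: if each successive full backup changes only a small fraction of its data, every extra full you retain adds almost nothing new to the repository.

```python
def nominal_dedup_ratio(fulls_retained, change_rate):
    """Illustrative only: nominal backup data versus unique data stored,
    assuming each full changes `change_rate` of its data since the previous one."""
    nominal = fulls_retained                          # N fulls' worth of nominal data
    unique = 1 + (fulls_retained - 1) * change_rate   # first full plus the changes
    return nominal / unique

for fulls in (4, 12, 30):
    print(f"{fulls:2d} fulls retained at 5% change -> ~{nominal_dedup_ratio(fulls, 0.05):.1f}:1")
```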

One of the key advantages of ProtecTIER is the ability to replicate deduped data in a many-to-many grid. ProtecTIER also supports SMB/CIFS and NFS access.

While Tivoli Storage Manager also includes many of the same capabilities as ProtecTIER, the latter will generally deliver higher-performance dedup by offloading the process to a dedicated system, leaving TSM or other backup software to concentrate on selecting and copying files.

For more information on the software functionality etc, please refer to these links:

 

ProtecTIER Systems

In the past IBM has offered three models of ProtecTIER systems, two of which are now withdrawn, and a new one has since appeared.

  • TS7610 (withdrawn) – entry level appliance up to 6 TB and 80 MB/sec.
  • TS7620 – new entry level system. Up to 35 TB of deduped capacity. Backup speed of 300 MB/sec was originally quoted, but with recent capacity increases I am still trying to confirm if the rated throughput has changed.
  • TS7650A (withdrawn) – the midrange appliance which was rated at up to 36 TB and 500 MB/sec. This appliance was based on a back-end IBM (LSI/Netapp) DS4700 disk system with 450GB drives in RAID5 configuration.
  • TS7650G – the enterprise gateway, which is currently rated at 9 TB per hour backup and up to 11.2 TB per hour restore. Each TS7650G has support for multiple Storwize V7000 or XIV disk systems, both of which offer non-disruptive drive firmware update capability.

Sizing

There are a few rules of thumb I try to use when doing an initial quick-glance sizing of a TS7650G with V7000 disk.

  • Every V7000 disk will give you another 20 GB per hour of ProtecTIER backup throughput. The I/O profile for the file data is approximately 80/20 random R/W with a 60 KB block size, and we generally use RAID6 for that. Metadata is generally placed on separate RAID10 drives and is more like 20/80 R/W.
  • Backup storage (traditionally on tape) can be five to ten times the production storage capacity, so assuming a 10:1 dedup ratio, you might need a dedup disk repository between half and the same size as your production disk. However, if you know you are already storing x TB of backups on tape, don’t plan on buying less than x/10 of dedup capacity. The dedup ratio can sometimes be as high as 25:1, but more typically it will be closer to 10:1.
  • It’s probably not a good idea to buy a dedup system that can’t easily grow to double its initial capacity. Dedup capacity is notoriously hard to predict and you can turn out to need more than you expected. (A rough worked example follows this list.)
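To make those rules of thumb a bit more concrete, here’s a rough Python sketch that simply strings them together. The function and the example inputs are mine, and this is a place to start thinking, not a formal ProtecTIER sizing:

```python
def quick_protectier_sizing(v7000_disks, production_tb,
                            existing_backup_tb=0.0, dedup_ratio=10.0):
    """Rough TS7650G with V7000 starting point, using the rules of thumb above."""
    # ~20 GB per hour of backup throughput per V7000 disk
    throughput_tb_per_hour = v7000_disks * 20 / 1000.0

    # Backup data is typically 5-10x production capacity; divide by the dedup ratio
    repo_low_tb = 5 * production_tb / dedup_ratio
    repo_high_tb = 10 * production_tb / dedup_ratio

    # If you already know how much backup data you hold, never plan below that / ratio
    floor_tb = existing_backup_tb / dedup_ratio
    repo_low_tb = max(repo_low_tb, floor_tb)
    repo_high_tb = max(repo_high_tb, floor_tb)

    return {
        "backup_throughput_tb_per_hour": round(throughput_tb_per_hour, 1),
        "repository_tb_range": (round(repo_low_tb, 1), round(repo_high_tb, 1)),
        # Leave headroom to at least double, since dedup capacity is hard to predict
        "plan_to_grow_to_tb": round(2 * repo_high_tb, 1),
    }

print(quick_protectier_sizing(v7000_disks=120, production_tb=200, existing_backup_tb=1500))
```

With 120 disks, 200 TB of production data and 1,500 TB already on tape, that comes out at roughly 2.4 TB per hour of throughput and a repository in the 150-200 TB range, with room to grow to 400 TB.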

Those rules of thumb are not robust enough to be called a formal sizing, but they do give you a place to start in your thinking.

[Image: ProtecTIER]

What do you get at an IBM Systems Technical Symposium?

What do you get at an IBM Systems Technical Symposium? Well, for the event in Auckland, New Zealand, November 13-15, I’ve tried to make the storage content as interesting as possible. If you’re interested in attending, send me an email at jkelly@nz.ibm.com and I will put you in contact with Jacell, who can help you get registered. There is of course content from our server teams as well, but my focus has been on the storage content, planned as follows:

Erik Eyberg, who has just joined IBM in Houston from Texas Memory Systems following IBM’s recent acquisition of TMS, will be presenting “RAMSAN – The World’s Fastest Storage”. Where does IBM see RAMSAN fitting in, and what is the future of flash? Check out RAMSAN on the web, on Twitter, on Facebook and on YouTube.

Fresh from IBM Portugal and recently transferred to IBM Auckland, we also welcome Joao Almeida, who will deliver a topic that is sure to be one of the highlights, but unfortunately I can’t tell you what it is since the product hasn’t been announced yet (although if you click here you might get a clue).

Zivan Ori, head of XIV software development in Israel, knows XIV at a very detailed level, possibly better than anyone, so come along and bring all your hardest questions! He will be here and presenting on:

  • XIV Performance – What you need to know
  • Looking Beyond the XIV GUI

John Sing will be flying in from IBM San Jose to demonstrate his versatility and expertise in all things to do with Business Continuance, presenting on:

  • Big Data – get IBM’s take on where Big Data is heading, the challenges it presents, and how some of IBM’s products are designed to meet those challenges.
  • ProtecTIER Dedup VTL options, sizing and replication
  • Active/Active datacentres with SAN Volume Controller Stretched Cluster
  • Storwize V7000U/SONAS Global Active Cloud Engine multi-site file caching and replication

Andrew Martin will come in from IBM’s Hursley development labs to give you the inside details you need on three very topical areas:

  • Storwize V7000 performance
  • Storwize V7000 & SVC 6.4 Real-time Compression
  • Storwize V7000 & SVC Thin Provisioning

Senaka Meegama will be arriving from Sydney with three hot topics around VMware and FCoE:

  • Implementing SVC & Storwize V7000 in a VMware Environment
  • Implementing XIV in a VMware Environment
  • FCoE Network Design with IBM System Storage

Jacques Butcher is also coming over from Australia to provide the technical details you all crave on Tivoli storage management:

  • Tivoli FlashCopy Manager 3.2 including VMware Integration
  • TSM for Virtual Environments 6.4
  • TSM 6.4 Introduction and Update plus TSM Roadmap for 2013

Maurice McCullough will join us from Atlanta, Georgia to speak on:

  • The new high-end DS8870 Disk System
  • XIV Gen3 overview and tour

Sandy Leadbeater will be joining us from Wellington to cover:

  • Storwize V7000 overview
  • Scale-Out NAS and V7000U overview

I will be reprising my Sydney presentations with updates:

  • Designing Scale Out NAS & Storwize V7000 Unified Solutions
  • Replication with SVC and Storwize V7000

And finally, Mike McKenzie will be joining us from Brocade in Australia to give us the skinny on IBM/Brocade FCIP Router Implementation.

Hierarchical Storage Management (HSM)

HSM is essentially a way to push disk files to lower tiers, mainly tape, while leaving behind a stub file on disk, so that the file maintains its accessibility and its place in the directory tree.

I say tape because there are other ways to do it between disk tiers that don’t involve stub files, e.g. IBM’s SONAS uses its built-in virtualization capabilities to move files between disk tiers without changing their place in the directory tree, but SONAS can also use Tivoli Space Management to migrate those files to tape using HSM.
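If you haven’t come across HSM before, the stub-file idea is easy to picture with a toy sketch. This is purely conceptual Python, nothing to do with how the real mainframe or Tivoli Space Management implementations work:

```python
import json
from pathlib import Path

ARCHIVE = Path("tape_pool")   # stand-in for the tape tier in this toy example

def migrate(path: Path) -> None:
    """Move a file's data to the 'tape' tier, leaving a small stub in its place."""
    ARCHIVE.mkdir(exist_ok=True)
    archived = ARCHIVE / path.name
    archived.write_bytes(path.read_bytes())
    # The stub keeps the file's place in the directory tree and records where
    # the real data went, so that a later access can trigger a recall.
    path.write_text(json.dumps({"stub": True, "location": str(archived)}))

def recall(path: Path) -> bytes:
    """Bring the data back from the 'tape' tier when the stub is accessed."""
    meta = json.loads(path.read_text())
    data = Path(meta["location"]).read_bytes()
    path.write_bytes(data)    # re-populate the original file in place
    return data
```

In a real HSM the recall is triggered transparently by the file system when an application opens the stub; the toy above just makes the two halves of that dance explicit.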

HSM started life as DFHSM (now DFSMShsm) on the IBM mainframe, and I use it most weeks in that context when I log into one of IBM’s mainframe apps and wait a minute or two for it to recall my database query files to disk. That’s some pretty aggressive archiving going on, and yes, it’s bullet-proof.

I know of a couple of instances in the early 2000s when companies got excited about file-based Information Lifecycle Management and implemented HSM products (not IBM ones) on Microsoft Windows. Both of those companies removed HSM not long after, having experienced blue screens of death and long delays. The software was flaky and the migration policies probably not well thought out (probably too aggressive given the maturity of open-systems HSM at the time). Being conservative, IBM came a little late to the game with open-systems HSM, which is not necessarily a bad thing, but when it came, it came to kick butt.

Tivoli Space Management is a pretty cool product: rock solid and feature rich. It runs on *NIX, and our customers rely on it for some pretty heavy-duty workloads, migrating and recalling files to and from tape at high speed. I know one customer with hundreds of terabytes under HSM control in this way. TSM HSM for Windows is another, slightly less sophisticated, product in the family, but one I’m not so familiar with.

One could argue that Space Management has been limited as a product by its running on *NIX operating systems only, when most file servers out in the world were either Windows or NetApp, but things are changing. HSM is most valuable in really large file environments – yes, the proverbial BIG DATA – and BIG DATA is not typically running on either Windows or NetApp. IBM’s SONAS, for example, scalable to 14 petabytes of files, is an ideal place for BIG DATA, and hence an ideal place for HSM.

As luck would have it, IBM has integrated Space Management into SONAS. SONAS will serve out as much CIFS, NFS, FTP, HTTP, etc. as you want, and if you install a Space Management server it will also provide easy integration with HSM policies that migrate and recall data from tape based on any number of file attributes – most typically, I’d guess, ‘time last accessed’ and file size.

Tape is by far the cheapest way to store large amounts of data; the trick is in making the data easily accessible. I have in the past tried to architect HSM solutions for both NetApp and Windows environments, and both times it ended up in the too-hard basket, but with SONAS, HSM is easy. SONAS is going to be a really big product for IBM over the coming years as the BIG DATA explosion takes hold, and the ability to really easily integrate HSM to tape, from terabytes to petabytes, and have it perform so solidly, is a feature of SONAS that I really like.

[Image: Tape has many uses…]
