Decoupling Storage Performance from Capacity

Decoupling storage performance from storage capacity is an interesting concept that has gained extra attention in recent times. Decoupling is predicated on a desire to scale performance when you need performance and to scale capacity when you need capacity, rather than having traditional spindle-based scaling deliver both at once.

Also relevant is the idea that today’s legacy disk systems are holding back app performance. For example, VMware apparently claimed that 70% of all app performance support calls were caused by external disk systems.

The Business Value of Storage Performance

IT operations have spent the last 10 years trying to keep up with capacity growth, with less focus on performance growth. The advent of flash has shown, however, that even if you don’t have a pressing storage performance problem, adding flash generally makes your whole app environment run faster, and that can mean business advantages ranging from better customer experiences to more accurate business decision making.

A Better Customer Experience

My favorite example of performance affecting customer experience is from my past dealings with an ISP of which I was a residential customer. I was talking to a call centre operator who explained to me that ‘the computer was slow’ and that it would take a while to pull up the information I was seeking. We chatted as he slowly navigated the system, and while we waited, one of the things he was keen to talk about was how much he disliked working for that ISP :o

I have previously referenced a mobile phone company in the US that replaced all of its call centre storage with flash, specifically so as to deliver a better customer experience. The challenge with that is cost. The CIO was quoted as saying that the cost to go all-flash was not much more per TB than he had paid for tier-1 storage in the previous buying cycle (i.e. 3 or maybe 5 years earlier). So effectively he was conceding that he was paying more per TB now than he was some years ago. Because the environment deployed did not decouple performance from capacity, however, that company has almost certainly significantly over-provisioned storage performance, hence the cost per TB being higher than on the last buying cycle.

More Accurate Business Decision Making

There are many examples of storage performance improvements leading to better business decisions, most typically in the area of data warehousing. When business intelligence reports have more up to date data in them, and they run more quickly, they are used more often and decisions are more likely to be evidence-based rather than based on intuition. I recall one CIO telling me about a meeting of the executive leadership team of his company some years ago where each exec was asked to write down the name of the company’s largest supplier – and each wrote a different name – illustrating the risk of making decisions based on intuition rather than on evidence/business intelligence.

Decoupling Old School Style

Of course we have always been able to decouple performance and capacity to some extent, and it was traditionally called tiering. You could run your databases on small fast drives with RAID10 and your less demanding storage on larger drives with RAID5 or RAID6. What that didn’t necessarily give you was a lot of flexibility.

Products like IBM’s SAN Volume Controller introduced flexibility to move volumes around between tiers in real-time, and more recently VMware’s Storage vMotion has provided a sub-set of the same functionality.

And then sub-lun tiering (Automatic Data Relocation, Easy Tier, FAST, etc) reduced the need for volume migration as a means of managing performance, by automatically promoting hot chunks to flash, and dropping cooler chunks to slower disks. You could decouple performance from capacity somewhat by choosing your flash to disk ratio appropriately, but you still typically had to be careful with these solutions since the performance of, for example, random writes that do not go to flash would be heavily dependent on the disk spindle count and speed.
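By way of illustration, here is a minimal sketch of the general idea behind a sub-lun tiering pass, deciding which chunks to promote and demote based on recent access counts. This is purely conceptual and not how Easy Tier, FAST or any other product actually implements it; the chunk names, capacities and heat counts are all made up:

    # Conceptual sketch only: a toy sub-lun tiering pass. Real implementations
    # (Easy Tier, FAST, ADR, etc.) use more sophisticated heat metrics, history
    # windows and migration throttling.

    FLASH_CAPACITY_CHUNKS = 4          # pretend the flash tier holds 4 chunks

    def tiering_pass(chunk_heat, currently_on_flash):
        """chunk_heat: chunk id -> recent I/O count. Returns (promote, demote) sets."""
        ranked = sorted(chunk_heat, key=chunk_heat.get, reverse=True)
        should_be_on_flash = set(ranked[:FLASH_CAPACITY_CHUNKS])
        promote = should_be_on_flash - currently_on_flash   # hot chunks not yet on flash
        demote = currently_on_flash - should_be_on_flash    # cooled chunks to push down
        return promote, demote

    heat = {"c1": 900, "c2": 15, "c3": 620, "c4": 80, "c5": 710, "c6": 430}
    print(tiering_pass(heat, {"c1", "c2", "c3", "c4"}))     # promotes c5/c6, demotes c2/c4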

So for the most part, decoupling storage performance and capacity in an existing disk system has been about adding flash and trying not to hit internal bottlenecks.

Traditional random I/O performance is therefore a function of the following (a rough illustrative model is sketched after the list):

  1. the amount/percentage of flash compared with the data block working set size
  2. the number and speed of disk spindles
  3. bus and cache (and sometimes CPU) limitations
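To make those three factors concrete, here is a rough back-of-envelope model. It is purely illustrative: the hit-rate assumption, per-spindle IOPS and controller ceiling are my own placeholder numbers, not vendor figures.

    def est_random_read_iops(working_set_gb, flash_gb, spindles,
                             iops_per_spindle=150, flash_iops=100_000,
                             controller_limit_iops=200_000):
        """Very rough estimate of sustainable random-read IOPS.
        All default numbers are illustrative assumptions, not vendor figures."""
        hit_rate = min(1.0, flash_gb / working_set_gb)      # factor 1: flash vs working set
        disk_iops = spindles * iops_per_spindle             # factor 2: spindle count/speed
        # The disk tier gates the misses, the flash tier gates the hits,
        # and the controller (bus/cache/CPU) caps everything (factor 3).
        disk_bound  = disk_iops / (1 - hit_rate) if hit_rate < 1 else float("inf")
        flash_bound = flash_iops / hit_rate      if hit_rate > 0 else float("inf")
        return min(disk_bound, flash_bound, controller_limit_iops)

    # e.g. 2 TB working set, 400 GB of flash (20% hit rate), 96 spindles
    print(est_random_read_iops(working_set_gb=2000, flash_gb=400, spindles=96))  # ~18,000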

Two products that bring their own twists to the game:

Nimble Storage

CASL (Cache Accelerated Sequential Layout)

Nimble Storage uses flash to accelerate random reads, and accelerates writes through compression into sequential 4.5MB stripes (compare this to IBM’s Storwize RtC which compresses into 32K chunks and you can see that what Nimble is doing is a little different).
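As a purely conceptual sketch of what write coalescing of this kind looks like (the 4.5 MB stripe size is from the paragraph above; the class and everything else is my own illustration, not Nimble’s code):

    # Conceptual sketch of write coalescing into large sequential stripes,
    # in the spirit of a log-structured layout.

    STRIPE_BYTES = int(4.5 * 1024 * 1024)

    class WriteCoalescer:
        def __init__(self):
            self.buffer = bytearray()

        def write(self, data: bytes):
            """Accept a small random write; emit full stripes sequentially."""
            self.buffer += data
            while len(self.buffer) >= STRIPE_BYTES:
                stripe = bytes(self.buffer[:STRIPE_BYTES])
                del self.buffer[:STRIPE_BYTES]
                self.flush_stripe(stripe)

        def flush_stripe(self, stripe: bytes):
            # In a real system this would be one large sequential write across
            # the RAID set, which is why spindle count matters far less.
            print(f"flushing {len(stripe)} bytes as one sequential stripe")

    w = WriteCoalescer()
    for _ in range(1200):              # ~1200 x 4 KB random writes fill one 4.5 MB stripe
        w.write(b"\x00" * 4096)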

Nimble performance is therefore primarily a function of:

  1. the amount of flash (read cache)
  2. the CPU available to do the compression/write coalescing

The number of spindles is not quite so important when you’re writing 4.5MB stripes. Nimble systems generally support at least 190 TB nett (if I assume 1.5x compression average, or 254 TB if you expect 2x) from 57 disks and they claim that performance is pretty much decoupled from disk space since you will generally hit the wall on flash and CPU before you hit the wall on sequential writes to disk. Also this kind of decoupling allows you to get good performance and capacity in a very small amount of rack space. Nimble also offers CPU scaling in the form of a scale-out four-way cluster.
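For what it’s worth, the arithmetic I’m using to relate those figures is simply:

    # Working back from the quoted figures; the compression ratios are my assumptions.
    usable_tb = 190 / 1.5        # ≈ 127 TB usable on disk before compression
    print(usable_tb * 2.0)       # ≈ 253 TB effective at 2x, in line with the ~254 TB above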

Nimble have come closer to decoupling performance and capacity than any other external storage vendor I have seen.

PernixData FVP

PernixData Flash Virtualization Platform (FVP) is a software solution designed to build a flash read/write cache inside a VMware ESXi cluster, thereby accelerating I/Os without needing to add anything to your external disk system. PernixData argue that it is more cost-effective and efficient to add flash into the ESXi hosts than it is to add it into external storage systems. This has something in common with the current trend for converged scale-out server/storage solutions, but PernixData also works with existing external SAN environments.

There is criticism that flash technologies deployed in external storage are too far away from the app to be efficient. I recall Amit Dave (IBM Distinguished Engineer) recounting an analogy of I/O to eating, for which I have created my own version below:

  • Data in the CPU cache is like food in your spoon
  • Data in the server RAM is like food on your plate
  • Data in the shared Disk System cache is like food in the serving bowl in the kitchen
  • Data on the shared Disk System SSDs is like food you can get from your garden
  • Data on hard disks is like food in the supermarket down the road

PernixData works by keeping your data closer to the CPU – decoupling performance and capacity by focusing on a server-side caching layer that scales alongside your compute ESXi cluster. So this is analogous to getting food from your table rather than food from your garden. With PernixData you tend to scale performance as you add more compute nodes, rather than when you add more back-end capacity.
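To hang some very rough numbers off that analogy, these are the order-of-magnitude access latencies I would assume for each tier (illustrative only, not measurements of any particular product):

    # Order-of-magnitude access latencies for each "distance" in the analogy.
    typical_latency = {
        "CPU cache (spoon)":                   "~1-10 ns",
        "Server RAM (plate)":                  "~100 ns",
        "Server-side flash cache (table)":     "~0.1-0.2 ms",
        "Shared disk system cache (kitchen)":  "~0.5-1 ms",
        "Shared disk system SSD (garden)":     "~1-2 ms",
        "Spinning disk (supermarket)":         "~5-10 ms",
    }
    for tier, latency in typical_latency.items():
        print(f"{tier:38} {latency}")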

To Decouple or not to Decouple?

Decoupling as a theoretical concept is surely a good thing – independent scaling in two dimensions – and it is especially nice if it can be done without introducing significant extra cost, complexity or management overhead.

It is however probably also fair to say that many other systems can approximate the effect, albeit with a little more complexity.

———————————————————————————————————-

Disclosures:

Jim Kelly holds PernixPrime accreditation from PernixData and is a certified Nimble Storage Sales Professional. ViFX is a reseller of both Nimble Storage and PernixData.

IBM Software-defined Storage

The phrase ‘Software-defined Storage’ (SDS) has quickly become one of the most widely used marketing buzz terms in storage. It seems to have originated with Nicira’s use of the term ‘Software-defined Networking’, which VMware adopted when it bought Nicira in 2012, where it evolved into the ‘Software-defined Data Center’, including ‘Software-defined Storage’. VMware’s VSAN technology therefore has the top-of-mind position when we are talking about SDS. I really wish they’d called it something other than VSAN though, so as to avoid the clash with the ANSI T11 VSAN standard developed by Cisco.

I have seen IBM regularly use the term ‘Software-defined Storage’ to refer to:

  1. GPFS
  2. Storwize family (which would include FlashSystem V840)
  3. Virtual Storage Center / Tivoli Storage Productivity Center

I recently saw someone at IBM referring to FlashSystem 840 as SDS even though to my mind it is very much a hardware/firmware-defined ultra-low-latency system with a very thin layer of software so as to avoid adding latency.

Interestingly, IBM does not seem to market XIV as SDS, even though it is clearly a software solution running on commodity hardware that has been ‘applianced’ so as to maintain reliability and supportability.

Let’s take a quick look at the contenders:

1. GPFS: GPFS is a file system with a lot of storage features built in or added on, including de-clustered RAID, policy-based file tiering, snapshots, block replication, support for NAS protocols, WAN caching, continuous data protection, single namespace clustering, HSM integration, TSM backup integration, and even a nice new GUI. GPFS is the current basis for IBM’s NAS products (SONAS and V7000U) as well as the GSS (GPFS Storage Server), which is currently targeted at HPC markets but I suspect is likely to re-emerge as a more broadly targeted product in 2015. I get the impression that GPFS may well be the basis of IBM’s SDS strategy going forward.

2. Storwize: The Storwize family is derived from IBM’s SAN Volume Controller technology and it has always been a software-defined product, but tightly integrated with hardware so as to control reliability and supportability. In the Storwize V7000U we see the coming together of Storwize and GPFS, and at some point IBM will need to make the call whether to stay with the DS8000-derived RAID that is in Storwize currently, or move to the GPFS-based de-clustered RAID. I’d be very surprised if GPFS hasn’t already won that long-term strategy argument.

3. Virtual Storage Center: The next contender in the great SDS shootout is IBM’s Virtual Storage Center and its sub-component Tivoli Storage Productivity Center. Within some parts of IBM, VSC is talked about as the key to SDS. VSC’s content is edition-dependent but usually includes the SAN Volume Controller / Storwize code developed by IBM Systems and Technology Group, as well as the TPC and FlashCopy Manager code developed by IBM Software Group, plus some additional TPC analytics and automation. VSC gives you a tremendous amount of functionality to manage a large complex site, but it requires real commitment to secure that value. I think of VSC and XIV as the polar opposites of IBM’s storage product line, even though some will suggest you do both: XIV drives out complexity based on a kind of 80/20 rule, while VSC is designed to let you manage and automate a complex environment.

Commodity Hardware: Many proponents of SDS will claim that it’s not really SDS unless it runs on pretty much any commodity server. GPFS and VSC qualify by this definition, but Storwize does not, unless you count the fact that SVC nodes are x3650 or x3550 servers. However, we are already seeing the rise of certified VMware VSAN-ready nodes as a way to control reliability and supportability, so perhaps we are heading for a happy medium between the two extremes of a traditional HCL menu and a fully buttoned down appliance.

Product Strategy: While IBM has been pretty clear in defining its focus markets – Cloud, Analytics, Mobile, Social, Security (the ‘CAMSS’ message that is repeatedly referred to inside IBM) – I think it has been somewhat less clear in articulating a consistent storage strategy, and I am finding that as the storage market matures, smart people increasingly want to know what the vendors’ strategies are. I say vendors plural because I see the same lack of strategic clarity when I look at EMC and HP, for example. That’s not to say the products aren’t good, or the roadmaps are wrong, but just that the long-term strategy is either not well defined or not clearly articulated.

It’s easier for new players and niche players of course, and VMware’s Software-defined Storage strategy, for example, is both well-defined and clearly articulated, which will inevitably make it a baseline for comparison with the strategies of the traditional storage vendors.

A/NZ STG Symposium: For the A/NZ audience, if you want to understand IBM’s SDS product strategy, the 2014 STG Tech Symposium in August is the perfect opportunity. Speakers include Sven Oehme from IBM Research who is deeply involved with GPFS development, Barry Whyte from IBM STG in Hursley who is deeply involved in Storwize development, and Dietmar Noll from IBM in Frankfurt who is deeply involved in the development of Virtual Storage Center.

Melbourne – August 19-22

Auckland – August 26-28

Steve Wozniak’s Birthday

Just a quick post to let readers know that I have resigned from IBM after 14 years with the company and I’m looking forward to starting work at ViFX on Monday 11th August, which it seems also happens to be Steve Wozniak‘s birthday.

I will work out in time what this means for the blog (my move to ViFX, not Steve’s birthday) but it’s pretty likely that I will also start looking at some non-IBM technologies – maybe including such things as VMware, Nutanix, Commvault, Actifio, Violin and Nimble Storage.

And having failed to create any meaningful link whatsoever between my move and the birth of the Woz I will leave it at that… until the 11th : )

IBM’s Scale-out FlashSystem Solution

IBM’s Flash strategy is a two-pronged approach, targeting the two segments that IDC labels as:

  1. Absolute Performance Flash
  2. Enterprise Flash

Last week I outlined the new FlashSystem 840 and focused mainly on the Absolute Performance aspect. Absolute Performance for IBM means latencies down around 95 microseconds write and 135 microseconds read, whereas most Flash storage systems in the market are talking 500+ microseconds best case. I’m guessing that in the new world of I/O bound applications, having 3 or 4 times the latency overhead could be a real problem for those vendors at some stage.

This week however I’d like to focus on the Enterprise Flash market segment.

Enterprise Flash

When we and IDC talk about Enterprise we are more concerned with the software stack and how it is used to address issues of:

  • Scalability
  • Snapshots & Clones
  • Replication
  • Storage Efficiency
  • Interoperability

The short answer to all of these is IBM’s SAN Volume Controller. Folks who are not very familiar with SVC often assume that SVC adds latency to storage. In the case of spinning disk systems, my experience has been that SVC reduces latency (due to intelligent caching effects) but takes about 5% off the top of maximum native IOPS. In the real world that means that things will almost always go faster with SVC than without it.

Scale-out Flash Latency

In the case of Flash, the picture is slightly different. The latencies of the FlashSystem 840 are so low that SVC caching does not fully compensate for other effects and the nett is that putting SVC in front of your FlashSystem 840 is likely to add around 100 micro-seconds of latency.

Yes that’s right, only 100 micro-seconds. I should add that I have not personally verified this, but have been told that is what we are seeing in IBM’s internal lab tests.

When you add 100 micro-seconds to the low latency of the FlashSystem 840 (95 microseconds write, 135 microseconds read) you still have numbers down below 250 microseconds, which is twice as fast as the numbers quoted on products like XtremIO and Violin 6200.

Even way back in 2008 we announced a benchmark result of 1 million IOPS with SVC and Flash, code-named Quicksilver. At the time the IBM statement said that IBM was planning a complete end-to-end systems approach to Flash and…

“Performance improvements of this magnitude can have profound implications for business, allowing two to three times the work to [be completed] in a given time frame for . . . time-sensitive applications like reservations systems, and financial program trading systems, and creating opportunity for entirely new insights in information-warehouses and analytics solutions”

So this is not new for IBM. The recently announced FlashSystem Solution with SVC is the culmination of six years of preparation (including SVC tuning) by IBM.

Full Enterprise Software Function Set

So you can understand now why IBM does not need to reinvent a whole separate scale-out offering of the sort that Whiptail Invicta (Cisco’s new EMC killer) and XtremIO Cluster (EMC’s new fat-boy SSD system) have tried to create. IBM can deliver a much more mature and feature-rich solution with consistent management and feature functions right across the board from the small V3700 with Easy Tier Flash right through to high-end SVC Flash Solutions like the one implemented by Sprint in 2013.

An Elegant Scale-Out Flash Solution

SVC brings proven data center credentials to scale-out Flash, delivering the full Storwize software stack while adding as little as 100 microseconds of latency. That is a good story and one that will not be easily matched by any competitor, and if the market would prefer something that is more tightly coupled from a hardware point of view then I don’t see why IBM couldn’t also deliver that in future if it wanted to.

So IBM has avoided the need to reinvent, develop, or buy-in a new immature scale-out mechanism for Flash. By using SVC you get FlashCopy snapshots and clones, as well as volume replication over IP, and Real-time Compression. But possibly most important of all is the full SVC interoperability matrix. How’s that for a software defined storage strategy that delivers rapid time-to-value in exactly the way it’s meant to.

For more info you can check out the IBM FlashSystem product page and the IBM Redbook Solution Guide “Implementing FlashSystem 840 with SAN Volume Controller”.


IBM FlashSystem 840 for Legacy-free Flash

Flash storage is at an interesting place and it’s worth taking the time to understand IBM’s new FlashSystem 840 and how it might be useful.

A traditional approach to flash is to treat it like a fast disk drive with a SAS interface, and to assume that a faster version of a traditional system is the way of the future. This is not a bad idea, and with auto-tiering technologies this kind of approach was mastered by the big vendors some time ago; it can be seen, for example, in IBM’s Storwize family and DS8000, and as a cache layer in the XIV. Using auto-tiering we can perhaps expect large quantities of storage to deliver latencies around 5 milliseconds, rather than a more traditional 10 ms or higher (e.g. MS Exchange’s Jetstress test only fails when you get to 20 ms).


Some players want to use all SSDs in their disk systems, which you can do with Storwize for example, but this is again really just a variation on a fairly traditional approach and you’re generally looking at storage latencies down around one or two milliseconds. That sounds pretty good compared to 10 ms, but there are ways to do better and I suspect that SSD-based systems will not be where it’s at in 5 years’ time.

The IBM FlashSystem 840 is a little different: it uses flash chips, not SSDs. Its primary purpose is to be very, very low latency. We’re talking as low as 90 microseconds write, and 135 microseconds read. This is not a traditional system with a soup-to-nuts software stack. FlashSystem has a new Storwize GUI, but it is stripped back to keep it simple and to avoid anything that would impact latency.

This extreme low latency is a unique IBM proposition, since it turns out that even when other vendors use MLC flash chips instead of SSDs, by their own admission they generally still end up with latency close to 1 ms, presumably because of their controller and code-path overheads.

FlashSystem 840

  • 2u appliance with hot swap modules, power and cooling, controllers etc
  • Concurrent firmware upgrade and call-home support
  • Encryption is standard
  • Choice of 16G FC, 8G FC, 40G IB and 10G FCoE interfaces
  • Choice of upgradeable capacity
    Nett capacity after 2-D RAID5    4 modules    8 modules    12 modules
    2 TB flash modules               4 TB         12 TB        20 TB
    4 TB flash modules               8 TB         24 TB        40 TB
  • Also a 2 TB starter option with RAID0
  • Each module has 10 flash chips and each chip has 16 planes
  • RAID5 is applied both across modules and within modules
  • Variable stripe RAID within modules is self-healing

I’m thinking that prime targets for these systems include Databases and VDI, but also folks looking to future-proof their general performance. If you’re making a 5 year purchase, not everyone will want to buy a ‘mature’ SSD legacy-style flash solution, when they could instead buy into a disk-free architecture of the future.

But, as mentioned, FlashSystem does not have a full traditional software stack, so let’s consider the options if you need some of that stuff:

  • IMHO, when it comes to replication, databases are usually best handled using log shipping, Oracle Data Guard, etc.
  • VMware volumes can be replicated with native VMware server-based tools.
  • AIX volumes can be replicated using AIX Geographic Mirroring.
  • On AIX and some other systems you can use logical volume mirroring to set up a mirror of your volumes with preferred read set to the FlashSystem 840, and writes mirrored to a V7000 (or DS8000, XIV, etc.), thereby allowing full software-stack functions on the volumes (on the V7000) without slowing down reads off the FlashSystem.
  • You can also virtualize FlashSystem behind SVC or V7000
  • Consider using Tivoli Storage Manager dedup disk to disk to create a DR environment

Right now, FlashSystem 840 is mainly about screamingly low latency and high performance, with some reasonable data center class credentials, and all at a pretty good price. If you have a data warehouse, or a database that wants that kind of I/O performance, or a VDI implementation that you want to de-risk, or a general workload that you want to future-proof, then maybe you should talk to IBM about FlashSystem 840.

Meanwhile I suggest you check out these docs:

The Latent Heat of Flash

The market is heating up and things are about to change.


When budgets are tight the focus often goes onto the price per terabyte and that can mean storage that is just responsive enough to stop the application owners from beating down IT’s door to complain.

Latency is about end-user Productivity

Over the years, application owners have been conditioned to run their apps on unbalanced infrastructures, usually with storage being the slowest part of the system. A 10 millisecond delay per online transaction has been generally considered acceptable for storage, while the app, the CPU, and the RAM all sit and wait. Not only that, but the way we size for transaction loads is often based on knowing the current peak transaction rate and response time and then allowing a percentage for headroom and growth. The sluggish old system is used to define the speed of the new one.

If we instead fully address the slowest link, improving it 10-fold, the apps run much faster, and as someone who often spends time each day waiting for IT systems to respond, I know that faster systems lead to better productivity.

In my last post I looked at the way some systems respond to the SPC-1 benchmark, often hitting 5 milliseconds read latency at less than half their rated IOPS. With the maturation of flash storage, the time is fast looming when 5 to 10 milliseconds will be considered unacceptable for online transaction processing, and sub-millisecond response will be expected for important apps. 
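Part of the reason systems hit those latencies well below their rated IOPS is simple queueing behaviour. A toy single-queue model (an assumption for illustration, not how any specific array behaves) shows how quickly response time climbs with utilisation:

    # Toy single-server (M/M/1-style) queueing model; illustrative only.
    def response_time_ms(service_time_ms, offered_iops, rated_iops):
        utilisation = offered_iops / rated_iops
        return service_time_ms / (1.0 - utilisation)

    for pct in (10, 30, 50, 70, 90):
        print(f"{pct}% of rated IOPS -> {response_time_ms(1.0, pct, 100):.1f} ms")
    # At 50% of rated IOPS the response time has already doubled;
    # at 90% it is ten times the unloaded service time.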

At what point does the cost equation move away from being based on $/tolerable-TB to $/high-productivity-TB for mainstream transactional apps? It’s hard to quantify the productivity gain from a storage system that is 10 times more responsive. Is it worth double the $/TB? 50% more? 33% more?
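One way to frame that question is to compare the storage premium with the value of end-user time saved. Every number in the sketch below is an assumption chosen purely for illustration:

    # Illustrative only: every figure below is an assumption, not a measurement.
    users                 = 200      # operators using the transactional app
    transactions_per_day  = 300      # per user
    seconds_saved_per_txn = 2.0      # wait time removed by more responsive storage
    loaded_cost_per_hour  = 40.0     # fully loaded cost of an operator, $/hour
    working_days          = 230

    hours_saved = users * transactions_per_day * seconds_saved_per_txn * working_days / 3600
    print(f"~{hours_saved:,.0f} hours/year of waiting removed, "
          f"worth ~${hours_saved * loaded_cost_per_hour:,.0f}")
    # Compare that figure with the $/TB premium of the more responsive tier.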

An easy place to start is with transactional apps that need up to 20 TB of space, because that’s now relatively easy and cost-effective on Flash, but if you’re like Sprint Nextel and you need 150 TB of Flash then IBM can handle that as well using multiple 1u FlashSystem 820s behind SAN Volume Controller. Sprint Nextel are number one for customer service in their market and the purchase was designed to allow call-center operatives to respond rapidly to customer queries. They are visionary enough to see Flash as a competitive business advantage.

In my earlier post on Flash called Feeding The Hogs I focused on the traditional sweet spots for Flash, but what I’m hearing out in the world seems to be slightly different – the idea that every transactional app deserves Flash performance and a dawning realisation that there are real productivity gains to be had.

For more information on the IBM FlashSystem 820 check out IBM’s Flash product page.

IBM FlashSystem: Feeding the Hogs

IBM has announced its new FlashSystem family following on from the acquisition of Texas Memory Systems (RAMSAN) late last year.

The first thing that interests me is where FlashSystem products are likely to play in 2013 and this graphic is intended to suggest some options. Over time the blue ‘candidate’ box is expected to stretch downwards.

(Graphic: Flash candidates)

For the full IBM FlashSystem family you can check out the product page at http://www.ibm.com/storage/flash

Probably the most popular product will be the FlashSystem 820, the key characteristics of which are as follows:

Usable capacity options with RAID5

  • 10.3 TB per FlashSystem
  • 20.6 TB per FlashSystem
  • Up to 865 TB usable in a single 42u rack

Latency

  • 110 usec read latency
  • 25 usec write latency

IOPS

  • Up to 525,000 4KB random read
  • Up to 430,000 4KB 70/30 read/write
  • Up to 280,000 4KB random write

Throughput

  • up to 3.3 GB/sec FC
  • up to 5 GB/sec IB

Physical

  • 4 x 8 Gbps FC ports
  • or 4 x 40 Gbps QDR Infiniband ports
  • 300 VA
  • 1,024 BTU/hr
  • 13.3 Kg
  • 1 rack unit

High Availability including 2-Dimensional RAID

  • Module level Variable Stripe RAID
  • System level RAID5 across flash modules
  • Hot swap modules
  • eMLC (10 x the endurance of MLC)

For those who like to know how things plug together under the covers, the following three graphics take you through conceptual and physical layouts.

(Graphics: FlashSystem logical layout; FlashSystem hardware; 2-D Flash RAID)

With IBM’s Variable Stripe RAID, if one die fails in a ten-chip stripe, only the failed die is bypassed, and then data is restriped across the remaining nine chips.
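Here is a minimal conceptual sketch of that behaviour (not IBM’s implementation): on a die failure the stripe simply narrows and is rebuilt across the survivors, rather than the whole module being failed.

    # Conceptual sketch only, not IBM's Variable Stripe RAID implementation.
    # A stripe starts 10 dies wide; on a die failure it is rebuilt one die
    # narrower instead of the whole module being failed.

    def rebuild_stripe(dies, failed_die):
        """Return the narrower set of dies that now carries the stripe."""
        survivors = [d for d in dies if d != failed_die]
        data, parity = len(survivors) - 1, 1
        print(f"restriping {data} data + {parity} parity across {len(survivors)} dies")
        return survivors

    stripe = [f"die{i}" for i in range(10)]     # 10-die stripe (9 data + 1 parity)
    stripe = rebuild_stripe(stripe, "die3")     # now 9 dies wide; module stays in service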

Integration with IBM SAN Volume Controller (and Storwize V7000)

The IBM System Storage Interoperation Center is showing these as supported with IBM POWER and IBM System X (Intel) servers, including VMware 5.1 support.

The IBM FlashSystem is all about being fast and resilient. The system is based on FPGA and hardware logic so as to minimize latency. For those customers who want advanced software features like volume replication, snapshots (ironically called FlashCopy), thin provisioning, broader host support etc, the best way to achieve all of that is by deploying FlashSystem 820 behind a SAN Volume Controller (or Storwize V7000). This can also be used in conjunction with Easy Tier, with the SVC/V7000 automatically promoting hot blocks to the FlashSystem.

I’ll leave you with this customer quote:

“With some of the other solutions we tested, we poked and pried at them for weeks to get the performance where the vendors claimed it should be.  With the RAMSAN we literally just turned it on and that’s all the performance tuning we did.  It just worked out of the box.”
