Hypervisor / Storage Convergence

This is simply a re-blog of an interesting discussion by James Knapp at http://www.vifx.co.nz/testing-the-hyper-convergence-waters/ looking at VMware Virtual SAN. Even more interesting than the blog post, however, is the whitepaper “How hypervisor convergence is reinventing storage for the pay-as-you-grow era”, which ViFX has published as a contribution to the debate around hypervisor storage.

I would recommend going to the first link for a quick read of what James has to say and then downloading the whitepaper from there for a more detailed view of the technology.


IBM Software-defined Storage

The phrase ‘Software-defined Storage’ (SDS) has quickly become one of the most widely used marketing buzz terms in storage. It seems to have originated with Nicira’s use of the term ‘Software-defined Networking’ and was then adopted by VMware when it bought Nicira in 2012, where it evolved into the ‘Software-defined Data Center’, including ‘Software-defined Storage’. VMware’s VSAN technology therefore holds the top-of-mind position when we talk about SDS. I really wish they’d called it something other than VSAN though, so as to avoid the clash with the ANSI T11 VSAN standard developed by Cisco.

I have seen IBM regularly use the term ‘Software-defined Storage’ to refer to:

  1. GPFS
  2. Storwize family (which would include FlashSystem V840)
  3. Virtual Storage Center / Tivoli Storage Productivity Center

I recently saw someone at IBM referring to FlashSystem 840 as SDS even though to my mind it is very much a hardware/firmware-defined ultra-low-latency system with a very thin layer of software so as to avoid adding latency.

Interestingly, IBM does not seem to market XIV as SDS, even though it is clearly a software solution running on commodity hardware that has been ‘applianced’ so as to maintain reliability and supportability.

Let’s take a quick look at the contenders:

1. GPFS: GPFS is a file system with a lot of storage features built in or added on, including de-clustered RAID, policy-based file tiering, snapshots, block replication, support for NAS protocols, WAN caching, continuous data protection, single-namespace clustering, HSM integration, TSM backup integration, and even a nice new GUI. GPFS is the current basis for IBM’s NAS products (SONAS and V7000U) as well as the GSS (GPFS Storage Server), which is currently targeted at HPC markets but I suspect is likely to re-emerge as a more broadly targeted product in 2015. I get the impression that GPFS may well be the basis of IBM’s SDS strategy going forward.

2. Storwize: The Storwize family is derived from IBM’s SAN Volume Controller technology and it has always been a software-defined product, but one tightly integrated with hardware so as to control reliability and supportability. In the Storwize V7000U we see the coming together of Storwize and GPFS, and at some point IBM will need to make the call whether to stay with the DS8000-derived RAID that is in Storwize currently, or move to the GPFS-based de-clustered RAID. I’d be very surprised if GPFS hasn’t already won that long-term strategy argument.

3. Virtual Storage Center: The next contender in the great SDS shootout is IBM’s Virtual Storage Center (VSC) and its sub-component Tivoli Storage Productivity Center (TPC). Within some parts of IBM, VSC is talked about as the key to SDS. The exact content is edition dependent, but VSC usually includes the SAN Volume Controller / Storwize code developed by IBM Systems and Technology Group, as well as the TPC and FlashCopy Manager code developed by IBM Software Group, plus some additional TPC analytics and automation. VSC gives you a tremendous amount of functionality to manage a large, complex site, but it requires real commitment to secure that value. I think of VSC and XIV as the polar opposites of IBM’s storage product line, even though some will suggest you do both: XIV drives out complexity based on a kind of 80/20 rule, while VSC is designed to let you manage and automate a complex environment.

Commodity Hardware: Many proponents of SDS will claim that it’s not really SDS unless it runs on pretty much any commodity server. GPFS and VSC qualify by this definition, but Storwize does not, unless you count the fact that SVC nodes are x3650 or x3550 servers. However, we are already seeing the rise of certified VMware VSAN-ready nodes as a way to control reliability and supportability, so perhaps we are heading for a happy medium between the two extremes of a traditional HCL menu and a fully buttoned down appliance.

Product Strategy: While IBM has been pretty clear in defining its focus markets – Cloud, Analytics, Mobile, Social, Security (the ‘CAMSS’ message repeatedly referred to inside IBM) – I think it has been somewhat less clear in articulating a consistent storage strategy, and I am finding that as the storage market matures, smart people increasingly want to know what the vendors’ strategies are. I say vendors plural because I see the same lack of strategic clarity when I look at EMC and HP, for example. That’s not to say the products aren’t good, or the roadmaps are wrong, just that the long-term strategy is either not well defined or not clearly articulated.

It’s easier for new players and niche players of course, and VMware’s Software-defined Storage strategy, for example, is both well-defined and clearly articulated, which will inevitably make it a baseline for comparison with the strategies of the traditional storage vendors.

A/NZ STG Symposium: For the A/NZ audience, if you want to understand IBM’s SDS product strategy, the 2014 STG Tech Symposium in August is the perfect opportunity. Speakers include Sven Oehme from IBM Research, who is deeply involved in GPFS development; Barry Whyte from IBM STG in Hursley, who is deeply involved in Storwize development; and Dietmar Noll from IBM in Frankfurt, who is deeply involved in the development of Virtual Storage Center.

Melbourne – August 19-22

Auckland – August 26-28

Steve Wozniak’s Birthday

Just a quick post to let readers know that I have resigned from IBM after 14 years with the company and I’m looking forward to starting work at ViFX on Monday 11th August, which, it seems, also happens to be Steve Wozniak’s birthday.

I will work out in time what this means for the blog (my move to ViFX, not Steve’s birthday) but it’s pretty likely that I will also start looking at some non-IBM technologies – maybe including such things as VMware, Nutanix, Commvault, Actifio, Violin and Nimble Storage.

And having failed to create any meaningful link whatsoever between my move and the birth of the Woz I will leave it at that… until the 11th : )


IBM Storwize V7000 RtC: “Freshly Squeezed” Revisited

Back in 2012, after IBM announced Real-time Compression (RtC) for Storwize disk systems, I covered the technology in a post entitled “Freshly Squeezed“. The challenge with RtC in practice turned out to be that on many workloads it just couldn’t get the CPU resources it needed, and I/O rates were very disappointing, especially in its newly released, un-tuned state.

We quickly learned that lesson, and IBM’s Disk Magic became an essential tool to warn us about unsuitable workloads. Even in August 2013, when I was asked at the Auckland IBM STG Tech Symposium “Do you recommend RtC for general use?”, my answer was “Wait until mid-2014”.

Now that the new V7000 (I’m not sure we’re supposed to call it Gen2, but that works for me) is out, I’m hoping that time has come.

The New V7000: I was really impressed when we announced the new V7000 in May 2014, with its 504 disk drives, faster CPUs, two RtC offload engines (Intel Coleto Creek comms/encryption processors) per node canister, and extra cache resources (up to 64GB RAM per node canister, of which 36GB is dedicated to RtC). But having been caught out in 2012, I wanted to see what Disk Magic had to say about it before I started recommending it to people. That’s why this post has taken until now to happen – Disk Magic 9.16.0 has just been released.

[Image: Coleto Creek RtC offload engine]

After a quick look at Disk Magic I almost titled this post “Bigger, Better, Juicier than before” but I felt I should restrain myself a little, and there are still a few devils in the details.

50% Extra: I have been working on the conservative assumption of getting an extra 50% nett space from RtC across an entire disk system when little is known about the data. If you have the access to do so, however, it is best to run IBM’s Comprestimator so you can get a better picture.

Getting an extra 50% is the same as setting Capacity Magic to use 33% compression. Until now I believed that this was a very conservative position, but one thing I really don’t enjoy is setting an expectation and then being unable to deliver on it.
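To make that arithmetic explicit, here is a minimal sketch (plain Python, nothing to do with Capacity Magic itself) of how a compression-savings percentage maps to extra nett space:

```python
def extra_space_from_savings(savings_fraction: float) -> float:
    """Fraction of extra usable capacity gained if compression removes
    `savings_fraction` of the data as it is written to disk."""
    return 1.0 / (1.0 - savings_fraction) - 1.0

# 33% savings stores 1TB of data in roughly 0.67TB of physical capacity,
# which is the same as getting about 50% extra nett space from the system.
print(round(extra_space_from_savings(0.33), 2))   # 0.49, i.e. ~50% extra
```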

Easy Tier: The one major deficiency in Disk Magic 9.16.0 is that you can’t model Easy Tier and RtC in the same model. That is pretty annoying, since on the new V7000 you will almost certainly want both. So unfortunately Disk Magic 9.16.0 is still a bit of a waste of time for testing most real-life configurations that include RtC, and the real measure will have to wait until the next release, due in August 2014.

What you can use 9.16.0 for, however, is validating the performance of RtC (without Easy Tier) and looking at the usage on the new offload engines. What I found was that the load on the RtC engines is still very dependent on the I/O size.

I/O Size: When doing general modelling I used to use 16KB as a default I/O size, since that is the kind of figure I had generally seen in mixed-workload environments, but more recently I have gone back to using the default of 4KB, because the automatic cache modelling in Disk Magic takes a lot of notice of the I/O size when deciding how random the workload is likely to be. Using 4KB forces Disk Magic to assume that the workload is very random, and that once again builds in some headroom (all part of my under-promise+over-deliver strategy). If you use 16KB, or even 24KB as I have seen in some VMware environments, then Disk Magic will assume there are a lot of sequential I/Os, and I’m not entirely comfortable with the huge modelled performance improvement you get from that assumption. (For the same reason, these days I tend to model Easy Tier using the ‘Intermediate’ setting rather than the default/recommended ‘High Skew’ setting.)

However, using a small I/O size in your Disk Magic modelling has the exact opposite effect when modelling RtC. RtC runs really well when the I/O size is small, and not so well when the I/O size is large. So my past conservative practice of modelling a small I/O size might not be so conservative when it comes to RtC.

Different Data Types: In the past I have also tended to build Disk Magic models with a single server, because my testing showed that several servers or a single server gave the same result: all Disk Magic cared about was the number of I/O requests coming in over a given number of fibres. Now, however, we might need to take more careful account of data types, and focus less on the overall average I/O size and more on the individual workloads and which of them are suitable for RtC and which are not.

50% Busy: Just as we should all be aware that going over 50% busy on a dual-controller system is a recipe for problems should we lose a controller for any reason (and faults are more likely to happen when the system is being pushed hard), going over 50% busy on your Coleto Creek RtC offload engines will similarly lead to problems if you lose a controller.
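The reasoning is simple enough to sketch; this is just the arithmetic of failover headroom, not anything Disk Magic specific:

```python
def utilisation_after_failover(per_engine_busy: float) -> float:
    """Busy fraction of the surviving engine of a pair, assuming the load was
    shared evenly and carries over in full when its partner fails."""
    return per_engine_busy * 2

print(utilisation_after_failover(0.45))  # 0.9  -> tight, but it survives
print(utilisation_after_failover(0.60))  # 1.2  -> the survivor is overcommitted
```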

I always recommend that you use all 4 compression engines plus the extra cache on each dual-controller V7000, and I’m now planning to work on the assumption that, yes, I can get 1.5:1 compression overall, but that it is more likely to come from 50% being without compression and 50% being at 2:1 compression, and my Disk Magic models will reflect that. So I still expect to need 66% physical nett to get to a 100% target, but I’m now going to treat each model as being made up of at least two pools, one compressed and one not.
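As a worked example of that split (my own illustrative numbers, reading the 50/50 as a split of the physical capacity between the two pools):

```python
def overall_compression_ratio(uncompressed_tb: float,
                              compressed_tb: float,
                              pool_ratio: float) -> float:
    """Overall ratio for a system split into an uncompressed pool and a
    compressed pool, where pool_ratio is the compressed pool's ratio (2.0 = 2:1)."""
    logical = uncompressed_tb + compressed_tb * pool_ratio
    physical = uncompressed_tb + compressed_tb
    return logical / physical

# 50TB left uncompressed plus 50TB compressed at 2:1 holds 150TB of data on
# 100TB of physical capacity: 1.5:1 overall, or ~66% physical for a 100% target.
print(overall_compression_ratio(50, 50, 2.0))   # 1.5
```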

Transparent Compression: RtC on the new Gen2 V7000 is a huge improvement over the Gen1 V7000. The hardware has been specifically designed to support it, and remember that it is truly transparent and doesn’t lose compression over time or require any kind of batch processing. That all goes to make it a very nice technology solution that most V7000 buyers should take advantage of.

My name is Storage and I’ll be your Server tonight…

Ever since companies like Data General moved RAID control into an external disk sub-system back in the early ’90s, it has been standard received wisdom that servers and storage should be separate.

While the capital cost of storage in the server is generally lower than for an external centralised storage subsystem, having storage as part of each server creates fragmentation and higher operational management overhead. Asset life-cycle management is also a consideration – servers typically last 3 years and storage can often be sweated for 5 years since the pace of storage technology change has traditionally been slower than for servers.

When you look at some common storage systems, however, what you see is that they do include servers that have been ‘applianced’, i.e. closed off to general apps, so as to ensure reliability and supportability:

  • IBM DS8000 includes two POWER/AIX servers
  • IBM SAN Volume Controller includes two IBM SystemX x3650 Intel/Linux servers
  • IBM Storwize is a custom variant of the above SVC
  • IBM Storwize V7000U includes a pair of x3650 file heads running RHEL and Tivoli Storage Manager (TSM) clients and Space Management (HSM) clients
  • IBM GSS (GPFS Storage Server) also uses a pair of x3650 servers, running RHEL

At one point the DS8000 was available with LPAR separation into two storage servers (intended to cater to a split production/non-production environment), and there was talk at the time of other apps, such as TSM, possibly being loaded onto an LPAR (a feature that was never released).

Apps or features?: There are a bunch of apps that could be run on storage systems, and in fact many already are, except they are usually called ‘features’ rather than apps. The clearest examples are probably in the NAS world, where TSM and Space Management and SAMBA/CTDB and Ganesha/NFS, and maybe LTFS, for example, could all be treated as features.

I also recall NetApp once talking about a Fujitsu-only implementation of ONTAP that could be run in a VM on a blade server, and EMC has talked up the possibility of running apps on storage.

GPFS: In my last post I illustrated an example of using IBM’s GPFS to construct a server-based shared storage system. The challenge with these kinds of systems is that they put the onus on the installer/administrator to get things right, rather than the traditional storage appliance approach where the vendor pre-constructs the system.

Virtualization: Reliability and supportability are vital, but virtualization does allow the possibility that we could have ring-fenced partitions for core storage functions and still provide server capacity for a range of other data-oriented functions e.g. MapReduce, Hadoop, OpenStack Cinder & Swift, as well as apps like TSM and HSM, and maybe even things like compression, dedup, anti-virus, LTFS etc., but treated not so much as storage system features, but more as genuine apps that you can buy from 3rd parties or write yourself, just as you would with traditional apps on servers.

The question is not so much ‘can this be done’, but more, ‘is it a good thing to do’? Would it be a good thing to open up storage systems and expose the fact that these are truly software-defined systems running on servers, or does that just make support harder and add no real value (apart from providing a new fashion to follow in a fashion-driven industry)? My guess is that there is a gradual path towards a happy medium to be explored here.

IBM GPFS – Software Defined Storage

GPFS (General Parallel File System) is one of those very cool technologies that you can do so much with that it’s actually fun to design solutions with it (provided you’re the kind of person that also gets a kick from a nice elegant mathematical proof by induction).

Back in 2010 I was asked by an IBM systems software strategist for my opinion as to whether GPFS had potential as a mainstream product, or if it was best kept back as an underlying component in mainstream solutions. I was strongly in the component camp, but now I almost regret that, because it may be that really the only thing that was holding GPFS back was the lack of its own comprehensive GUI. That is something I still hope will be addressed in the not too distant future.

Anyway, this is a sample design that attempts to show some of the things you can do with GPFS by way of building a software defined storage and server environment.

The central box shows GPFS servers (virtualized in this example) and the left and right boxes show GPFS clients. GPFS also supports ILM policies between disk tiers and out to LTFS tape, as well as optional integration with HSM (via Tivoli Space Management) and fast efficient backup with Tivoli Storage Manager.

[Diagram: GPFS Software Defined Storage]

There are of course a few caveats and restrictions. Check out the GPFS infocenter for the technical details.
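As an illustration of the policy-based tiering mentioned above, here is a hedged sketch of a migration policy being applied from a script. The pool names (‘system’, ‘nearline’), the thresholds and the file system name (‘gpfs0’) are made-up placeholders, and the rule is only an indicative fragment – check the GPFS infocenter for the exact policy grammar.

```python
# Hedged sketch: write a simple GPFS migration policy and apply it with
# mmapplypolicy. Pool names, thresholds and the file system name are
# hypothetical; placement (SET POOL) rules would be installed separately
# with mmchpolicy.
import subprocess
from pathlib import Path

POLICY = """
/* When the fast 'system' pool passes 80% full, migrate the least recently
   accessed files down to the 'nearline' pool until it falls to 60%. */
RULE 'tier_down' MIGRATE FROM POOL 'system'
     THRESHOLD(80,60)
     WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
     TO POOL 'nearline'
"""

def apply_tiering_policy(filesystem: str = "gpfs0") -> None:
    policy_file = Path("/tmp/tiering.policy")
    policy_file.write_text(POLICY)
    # mmapplypolicy scans the file system and carries out the MIGRATE rules.
    subprocess.run(["mmapplypolicy", filesystem, "-P", str(policy_file)],
                   check=True)

if __name__ == "__main__":
    apply_tiering_policy()
```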

This second diagram shows a simpler view of how to build a highly available software defined storage environment. The example shows two physical servers, but you can add many servers and still have a single storage pool. Mirroring is on a per-volume basis. You could also use GPFS Native RAID to build, for example, a RAID-6 array in each server.

[Diagram: two-server GPFS design with VMware]
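For the two-server mirrored layout above, here is a hedged sketch of how the pieces might be created. The host names (gpfs-a, gpfs-b), device paths, NSD names and file system name are all hypothetical; the idea is simply that each server’s disk sits in its own failure group and the file system keeps two copies of data and metadata.

```python
# Hedged sketch of creating a two-way mirrored GPFS file system across two
# servers. All names and device paths are hypothetical placeholders.
import subprocess
from pathlib import Path

STANZAS = """\
%nsd: nsd=nsd_a1 device=/dev/sdb servers=gpfs-a usage=dataAndMetadata failureGroup=1
%nsd: nsd=nsd_b1 device=/dev/sdb servers=gpfs-b usage=dataAndMetadata failureGroup=2
"""

def build_mirrored_filesystem(device: str = "gpfs1") -> None:
    stanza_file = Path("/tmp/disks.stanza")
    stanza_file.write_text(STANZAS)
    # Register the local disks as NSDs, one failure group per server.
    subprocess.run(["mmcrnsd", "-F", str(stanza_file)], check=True)
    # Create the file system with two data and two metadata replicas, so a
    # full copy lives in each failure group (i.e. on each server).
    subprocess.run(["mmcrfs", device, "-F", str(stanza_file),
                    "-m", "2", "-r", "2"], check=True)

if __name__ == "__main__":
    build_mirrored_filesystem()
```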

Building Scale-out NAS with IBM Storwize V7000 Unified

If you need scalable NAS and you are primarily interested in capacity scaling, with less emphasis on performance and more on a cost-effective entry price, then you might be interested in building a scale-out NAS from Storwize V7000 Unified systems, especially if you have some block I/O requirements to go with your NAS requirements.

There are three ways that I can think of doing this and each has its strengths. The documentation on these options is not always easy to find, so these diagrams and bullets might help to show what is possible.

One key point that is not well understood is that clustering V7000 systems to a V7000U allows SMB2 access to all of the capacity – a feature that you don’t get if you virtualize secondary systems rather than cluster them.

[Diagram: V7000U clustering]

[Diagram: V7000U virtualization]

[Diagram: V7000U Active Cloud]

And of course systems management is made relatively simple with the Storwize GUI.

