Thank you for your I.T. Support

Back in 2011 I blogged about buying a new car, in a post entitled 'the anatomy of a purchase'. Well, the transmission on the Jag has given out and I am now the proud owner of a Toyota Mark X.

Toyota Mark-X

The anatomy of the purchase was however a little different this time. Over the last 4 years I found that the official Jaguar service agents (25 km away) offered excellent support. 25 km is not always a convenient distance however, so I did try using local neighbourhood mechanics for minor things, but quickly realized that they were going to struggle with anything more complicated.

Support became my number one priority

When it came to buying a replacement, the proximity of a fully trained and equipped service agent became my number one priority. There is only one such agency in my neighbourhood, and that is Toyota, so my first decision was that I was going to buy a Toyota.

I.T. Support

Coming from a traditional I.T. vendor background, my approach to I.T. support has always been that it should be fully contracted 7×24, preferably with a 2-hour response time, for anything the business depended on. But something has changed.

Scale-Out Systems

The support requirements for software haven’t really changed, but hardware is now a different game. Clustered, scale-out and web-scale systems, including hyper-converged (server/storage) systems, will typically re-protect data quickly after a node failure, removing the need for a panic-level hardware support response. Scale-out systems have a real advantage over standalone servers and dual-controller storage systems in this respect.

It has taken me some time to get used to not having 7×24 on-site hardware support, but the message from customers is that next-business-day service or next+1 is a satisfactory hardware support model for clustered mission-critical systems.


Nutanix gold-level support, for example, offers next-business-day on-site service (after failure confirmation), or next+1 if the call is logged after 3pm. Given a potential delay of a day or two, it is worth asking the question “What happens if a second node fails?”

If the second node failure occurs after the data from the first node has been re-protected, then the impact is the same as if only one host had failed. You can continue to lose nodes in a Nutanix cluster, provided each failure happens after the short re-protection time, until you run out of physical space to re-protect the VMs. (Readers familiar with the IBM XIV distributed cache grid architecture will also recognise this rinse-and-repeat approach to re-protection.)
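
To make the rinse-and-repeat idea concrete, here is a toy Python sketch. It is my own illustration, not Nutanix's actual placement logic: it simply checks whether a two-copy cluster can keep absorbing sequential node losses while the survivors still have room to hold two copies of everything.

```python
# Toy model of "rinse and repeat" re-protection in a two-copy scale-out cluster.
# This is NOT Nutanix's actual algorithm - just an illustration of the idea that
# you can keep losing nodes, one at a time, as long as each failure is followed
# by a successful re-protection and the survivors still have room for two copies.

def survives_sequential_failures(node_capacity_tb, data_tb, failures):
    """Return True if the cluster can re-protect after each of `failures` losses."""
    nodes = list(node_capacity_tb)
    for _ in range(failures):
        if len(nodes) < 3:            # need at least 2 nodes left to hold 2 copies
            return False
        nodes.pop()                   # lose one node (assume re-protection completed first)
        if 2 * data_tb > sum(nodes):  # every block needs space for two copies
            return False              # out of physical space to re-protect
    return True

# Example: four 20 TB nodes holding 25 TB of logical data (50 TB with two copies)
print(survives_sequential_failures([20, 20, 20, 20], 25, failures=1))  # True
print(survives_sequential_failures([20, 20, 20, 20], 25, failures=2))  # False - only 40 TB raw left
```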

Nutanix CVM failure

This is discussed in more detail in a Nutanix blog post by Andre Leibovici.

To find out more about options for scale-out infrastructure, try talking to ViFX.

Toyota Support

Storage Spaghetti Anyone?

I recall Tom West (Chief Scientist at Data General, and star of Soul of a New Machine) once saying to me when he visited New Zealand that there was an old saying “Hardware lasts three years, Operating Systems last 20 years, but applications can go on forever.”

Over the years I have known many application developers and several development managers, and one thing they seem to agree on is that it is almost impossible to maintain good code structure inside an app over a period of many years. The pressures of feature deadlines; changes in the market, in fashion and in the way people use applications; the occasional weak programmer or weak dev manager; and temporary lapses in discipline due to other pressures all contribute to fragmentation over time. It is generally by this slow attrition that apps end up full of structural compromises, with the occasional corner that is complete spaghetti.

I am sure there are exceptions, and there can be periodic rebuilds that improve things, but rebuilds are expensive.

If I think about the OS layer, I recall Data General rebuilding much of their DG/UX UNIX kernel to make it more structured because they considered the System V code to be pretty loose. Similarly IBM rebuilt UNIX into a more structured AIX kernel around the same time, and Digital UNIX (OSF/1) was also a rebuild based on Mach. Ironically HPUX eventually won out over Digital UNIX after the merger, with HPUX rumoured to be the much less structured product, a choice that I’m told has slowed a lot of ongoing development. Microsoft rebuilt Windows as NT and Apple rebuilt Mac OS to base it on the Mach kernel.

So where am I heading with this?

Well I have discussed this topic with a couple of people in recent times in relation to storage operating systems. If I line up some storage OS’s and their approximate date of original release you’ll see what I mean:

Netapp Data ONTAP – 1992 (22 years)
EMC VNX / CLARiiON – 1993 (21 years)
IBM DS8000 (assuming ESS code base) – 1999 (15 years)
HP 3PAR – 2002 (12 years)
IBM Storwize – 2003 (11 years)
IBM XIV / Nextra – 2006 (8 years)
Nimble Storage – 2010 (4 years)

I’m not trying to suggest that this is a line-up in reverse order of quality, and no doubt some vendors might claim rebuilds or superb structural discipline, but knowing what I know about software development, the age of the original code is certainly a point of interest.

With the current market disruption in storage, cost pressures are bound to take their toll on development quality, and the problem is amplified if vendors try to save money by out-sourcing development to non-integrated teams in low-cost countries (e.g. build your GUI in Romania, or your iSCSI module in India).

Spaghetti

How well do you know your scale-out storage architectures?

The clustered/scale-out storage world keeps getting more and more interesting and, some would say, more and more confusing.

There are too many to list them all here, but here are block diagrams depicting seven interesting storage or converged hardware architectures. See if you can decipher my diagrams and match the labels by choosing between the three sets of options in the multi-choice poll at the bottom of the page:

 

A VMware EVO: RACK
B IBM XIV
C VMware EVO: RAIL
D Nutanix
E Nimble
F IBM GPFS Storage Server (GSS)
G VMware Virtual SAN

 

Clusters

 


 

You can read more on VMware’s EVO:RAIL here.

IBM Software-defined Storage

The phrase ‘Software-defined Storage’ (SDS) has quickly become one of the most widely used marketing buzz terms in storage. It seems to have originated with Nicira’s use of the term ‘Software-defined Networking’ and to have been adopted by VMware when they bought Nicira in 2012, where it evolved into the ‘Software-defined Data Center’, including ‘Software-defined Storage’. VMware’s VSAN technology therefore has the top-of-mind position when we are talking about SDS. I really wish they’d called it something other than VSAN though, so as to avoid the clash with the ANSI T11 VSAN standard developed by Cisco.

I have seen IBM regularly use the term ‘Software-defined Storage’ to refer to:

  1. GPFS
  2. Storwize family (which would include FlashSystem V840)
  3. Virtual Storage Center / Tivoli Storage Productivity Center

I recently saw someone at IBM referring to FlashSystem 840 as SDS even though to my mind it is very much a hardware/firmware-defined ultra-low-latency system with a very thin layer of software so as to avoid adding latency.

Interestingly, IBM does not seem to market XIV as SDS, even though it is clearly a software solution running on commodity hardware that has been ‘applianced’ so as to maintain reliability and supportability.

Let’s take a quick look at the contenders:

1. GPFS: GPFS is a file system with a lot of storage features built in or added-on, including de-clustered RAID, policy-based file tiering, snapshots, block replication, support for NAS protocols, WAN caching, continuous data protection, single namespace clustering, HSM integration, TSM backup integration, and even a nice new GUI. GPFS is the current basis for IBM’s NAS products (SONAS and V7000U) as well as the GSS (gpfs storage server) which is currently targeted at HPC markets but I suspect is likely to re-emerge as a more broadly targeted product in 2015. I get the impression that gpfs may well be the basis of IBM’s SDS strategy going forward.

2. Storwize: The Storwize family is derived from IBM’s SAN Volume Controller technology and it has always been a software-defined product, but tightly integrated to hardware so as to control reliability and supportability. In the Storwize V7000U we see the coming together of Storwize and gpfs, and at some point IBM will need to make the call whether to stay with the DS8000-derived RAID that is in Storwize currently, or move to the gpfs-based de-clustered RAID. I’d be very surprised if gpfs hasn’t already won that long-term strategy argument.

3. Virtual Storage Center: The next contender in the great SDS shootout is IBM’s Virtual Storage Center and its sub-component Tivoli Storage Productivity Center. Within some parts of IBM, VSC is talked about as the key to SDS. VSC is edition dependent but usually includes the SAN Volume Controller / Storwize code developed by IBM Systems and Technology Group, as well as the TPC and FlashCopy Manager code developed by IBM Software Group, plus some additional TPC analytics and automation. VSC gives you a tremendous amount of functionality to manage a large complex site but it requires real commitment to secure that value. I think of VSC and XIV as the polar opposites of IBM’s storage product line, even though some will suggest you do both. XIV drives out complexity based on a kind of 80/20 rule and VSC is designed to let you manage and automate a complex environment.

Commodity Hardware: Many proponents of SDS will claim that it’s not really SDS unless it runs on pretty much any commodity server. GPFS and VSC qualify by this definition, but Storwize does not, unless you count the fact that SVC nodes are x3650 or x3550 servers. However, we are already seeing the rise of certified VMware VSAN-ready nodes as a way to control reliability and supportability, so perhaps we are heading for a happy medium between the two extremes of a traditional HCL menu and a fully buttoned down appliance.

Product Strategy: While IBM has been pretty clear in defining its focus markets of Cloud, Analytics, Mobile, Social and Security (the ‘CAMSS’ message that is repeatedly referred to inside IBM), I think it has been somewhat less clear in articulating a consistent storage strategy, and I am finding that as the storage market matures, smart people increasingly want to know what the vendors’ strategies are. I say vendors plural because I see the same lack of strategic clarity when I look at EMC and HP, for example. That’s not to say the products aren’t good, or the roadmaps are wrong, just that the long-term strategy is either not well defined or not clearly articulated.

It’s easier for new players and niche players of course, and VMware’s Software-defined Storage strategy, for example, is both well-defined and clearly articulated, which will inevitably make it a baseline for comparison with the strategies of the traditional storage vendors.

A/NZ STG Symposium: For the A/NZ audience, if you want to understand IBM’s SDS product strategy, the 2014 STG Tech Symposium in August is the perfect opportunity. Speakers include Sven Oehme from IBM Research who is deeply involved with gpfs development, Barry Whyte from IBM STG in Hursley who is deeply involved in Storwize development, and Dietmar Noll from IBM in Frankfurt who is deeply involved in the development of Virtual Storage Center.

Melbourne – August 19-22

Auckland – August 26-28

My name is Storage and I’ll be your Server tonight…

Ever since companies like Data General moved RAID control into an external disk sub-system back in the early ’90s it has been standard received knowledge that servers and storage should be separate.

While the capital cost of storage in the server is generally lower than for an external centralised storage subsystem, having storage as part of each server creates fragmentation and higher operational management overhead. Asset life-cycle management is also a consideration – servers typically last 3 years and storage can often be sweated for 5 years since the pace of storage technology change has traditionally been slower than for servers.

When you look at some common storage systems however, what you see is that they do include servers that have been ‘applianced’ i.e. closed off to general apps, so as to ensure reliability and supportability.

  • IBM DS8000 includes two POWER/AIX servers
  • IBM SAN Volume Controller includes two IBM SystemX x3650 Intel/Linux servers
  • IBM Storwize is a custom variant of the above SVC
  • IBM Storwize V7000U includes a pair of x3650 file heads running RHEL and Tivoli Storage Manager (TSM) clients and Space Management (HSM) clients
  • IBM GSS (GPFS Storage Server) also uses a pair of x3650 servers, running RHEL

At one point the DS8000 was available with LPAR separation into two storage servers (intended to cater to a split production/non-production environment) and there was talk at the time of the possibility of other apps such as TSM being able to be loaded onto an LPAR (a feature that was never released).

Apps or features?: There are a bunch of apps that could be run on storage systems, and in fact many already are, except they are usually called ‘features’ rather than apps. The clearest examples are probably in the NAS world, where TSM and Space Management and SAMBA/CTDB and Ganesha/NFS, and maybe LTFS, for example, could all be treated as features.

I also recall Netapp once talking about a Fujitsu-only implementation of ONTAP that could be run in a VM on a blade server, and EMC has talked up the possibility of running apps on storage.

GPFS: In my last post I illustrated an example of using IBM’s GPFS to construct a server-based shared storage system. The challenge with these kinds of systems is that they put onus onto the installer/administrator to get it right, rather than the traditional storage appliance approach where the vendor pre-constructs the system.

Virtualization: Reliability and supportability are vital, but virtualization does allow the possibility that we could have ring-fenced partitions for core storage functions and still provide server capacity for a range of other data-oriented functions e.g. MapReduce, Hadoop, OpenStack Cinder & Swift, as well as apps like TSM and HSM, and maybe even things like compression, dedup, anti-virus, LTFS etc., but treated not so much as storage system features, but more as genuine apps that you can buy from 3rd parties or write yourself, just as you would with traditional apps on servers.

The question is not so much ‘can this be done’, but more, ‘is it a good thing to do’? Would it be a good thing to open up storage systems and expose the fact that these are truly software-defined systems running on servers, or does that just make support harder and add no real value (apart from providing a new fashion to follow in a fashion-driven industry)? My guess is that there is a gradual path towards a happy medium to be explored here.

IBM FlashSystem 840 for Legacy-free Flash

Flash storage is at an interesting place and it’s worth taking the time to understand IBM’s new FlashSystem 840 and how it might be useful.

A traditional approach to flash is to treat it like a fast disk drive with a SAS interface, and assume that a faster version of traditional systems is the way of the future. This is not a bad idea, and with auto-tiering technologies this kind of approach was mastered by the big vendors some time ago; it can be seen for example in IBM’s Storwize family and DS8000, and as a cache layer in the XIV. Using auto-tiering we can perhaps expect large quantities of storage to deliver latencies around 5 milliseconds, rather than a more traditional 10 ms or higher (e.g. MS Exchange’s jetstress test only fails when you get to 20 ms).

No SSDs

Some players want to use all SSDs in their disk systems, which you can do with Storwize for example, but this is again really just a variation on a fairly traditional approach and you’re generally looking at storage latencies down around one or two milliseconds. That sounds pretty good compared to 10 ms, but there are ways to do better and I suspect that SSD-based systems will not be where it’s at in 5 years’ time.

The IBM FlashSystem 840 is a little different: it uses flash chips, not SSDs. Its primary purpose is to be very, very low latency. We’re talking as low as 90 microseconds for writes and 135 microseconds for reads. This is not a traditional system with a soup-to-nuts software stack. FlashSystem has a new Storwize GUI, but it is stripped back to keep it simple and to avoid anything that would impact latency.

This extreme low latency is a unique IBM proposition, since it turns out that even when other vendors use MLC flash chips instead of SSDs, by their own admission they generally still end up with latency close to 1 ms, presumably because of their controller and code-path overheads.

FlashSystem 840

  • 2u appliance with hot swap modules, power and cooling, controllers etc
  • Concurrent firmware upgrade and call-home support
  • Encryption is standard
  • Choice of 16G FC, 8G FC, 40G IB and 10G FCoE interfaces
  • Choice of upgradeable capacity
Nett of 2-D RAID5     4 modules    8 modules    12 modules
2TB flash modules     4 TB         12 TB        20 TB
4TB flash modules     8 TB         24 TB        40 TB
  • Also a 2 TB starter option with RAID0
  • Each module has 10 flash chips and each chip has 16 planes
  • RAID5 is applied both across modules and within modules
  • Variable stripe RAID within modules is self-healing

I’m thinking that prime targets for these systems include Databases and VDI, but also folks looking to future-proof their general performance. If you’re making a 5 year purchase, not everyone will want to buy a ‘mature’ SSD legacy-style flash solution, when they could instead buy into a disk-free architecture of the future.

But, as mentioned, FlashSystem does not have a full traditional software stack, so let’s consider the options if you need some of that stuff:

  • IMHO, when it comes to replication, databases are usually best replicated using log shipping, Oracle Data Guard etc.
  • VMware volumes can be replicated with native VMware server-based tools.
  • AIX volumes can be replicated using AIX Geographic Mirroring.
  • On AIX and some other systems you can use logical volume mirroring to set up a mirror of your volumes with preferred read set to the FlashSystem 840, and writes mirrored to a V7000 (or DS8000 or XIV etc.), thereby allowing full software-stack functions on the volumes (on the V7000) without slowing down the reads off the FlashSystem.
  • You can also virtualize FlashSystem behind SVC or V7000
  • Consider using Tivoli Storage Manager dedup disk to disk to create a DR environment

Right now, FlashSystem 840 is mainly about screamingly low latency and high performance, with some reasonable data center class credentials, and all at a pretty good price. If you have a data warehouse, or a database that wants that kind of I/O performance, or a VDI implementation that you want to de-risk, or a general workload that you want to future-proof, then maybe you should talk to IBM about FlashSystem 840.

Meanwhile I suggest you check out these docs:

A Quick IBM ProtecTIER (Dedup VTL) Update

This is a very brief update designed to help clarify a few things about IBM’s ProtecTIER dedup VTL solutions. The details of the software functions I will leave to the redbooks (see links below).

What is ProtecTIER?

The dedup algorithm in ProtecTIER is HyperFactor, which detects recurring data across multiple backups. HyperFactor is unique in that it avoids the risk of data corruption due to hash collisions, a risk that is inherent in products based on hashing algorithms. HyperFactor uses a memory-resident index rather than disk-resident hash tables, and one consequence of this is that ProtecTIER’s restore times are shorter than its backup times, in contrast to other products where restore times are generally much longer.

The amount of space saved is mainly a function of the backup policies and retention periods, and the variance of the data between them, but in general HyperFactor can deliver slightly better dedup ratios than hash-based systems. The more full-backups retained on ProtecTIER, and the more intervening incremental backups, the more space that will be saved overall.

One of the key advantages of ProtecTIER is the ability to replicate deduped data in a many to many grid. ProtecTIER also supports SMB/CIFS and NFS access.

While Tivoli Storage Manager also includes many of the same capabilities as ProtecTIER, the latter will generally deliver higher performance dedup, by offloading the process to a dedicated system, leaving TSM or other backup software to concentrate on selecting and copying files.

For more information on the software functionality etc, please refer to these links:

 

ProtecTIER Systems

In the past IBM has offered three models of ProtecTIER systems, two of which are now withdrawn, and a new one has since appeared.

  • TS7610 (withdrawn) – entry level appliance up to 6 TB and 80 MB/sec.
  • TS7620 – new entry level system. Up to 35 TB of deduped capacity. Backup speed of 300 MB/sec was originally quoted, but with recent capacity increases I am still trying to confirm if the rated throughput has changed.
  • TS7650A (withdrawn) – the midrange appliance which was rated at up to 36 TB and 500 MB/sec. This appliance was based on a back-end IBM (LSI/Netapp) DS4700 disk system with 450GB drives in RAID5 configuration.
  • TS7650G – the enterprise gateway, which is currently rated at 9 TB per hour backup and up to 11.2 TB per hour restore. Each TS7650G has support for multiple Storwize V7000 or XIV disk systems, both of which offer non-disruptive drive firmware update capability.

Sizing

There are a couple of rules of thumb I try to use when doing an initial quick glance sizing with the TS7650G with V7000 disk.

  • Every V7000 disk will give you another 20 GB per hour of ProtecTIER backup throughput. The I/O profile for files is approx 80/20 random R/W with a 60KB block size and we generally use RAID6 for that. Metadata is generally placed on separate RAID10 drives and is more like 20/80 R/W.
  • Backup storage (traditionally on tape) can be five to ten times the production storage capacity, so assuming a 10:1 dedup ratio, you might need a dedup disk repository between half and the same size as your production disk. However, if you know you are already storing x TB of backups on tape, don’t plan on buying less than x/10 dedup capacity. The dedup ratio can sometimes be as high as 25:1 but more typically it will be closer to 10:1.
  • It’s probably not a good idea to buy a dedup system that can’t easily grow to at least double the initial capacity. Dedup capacity is notoriously hard to predict and you can end up needing more than you expected.

Those rules of thumb are not robust enough to be called a formal sizing, but they do give you a place to start in your thinking.
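
To show how those rules of thumb combine, here is a rough back-of-envelope calculator in Python. The 20 GB per hour per disk and 10:1 dedup figures are the ones above; the function name and example inputs are mine for illustration, and this is in no way an IBM sizing tool.

```python
# Rough back-of-envelope sizing based on the rules of thumb above.
# The 20 GB/hr-per-disk and 10:1 dedup figures come from the post; everything
# else (names, the example inputs) is illustrative only.

def protectier_quick_size(backup_tb_on_tape, target_tb_per_hour,
                          dedup_ratio=10.0, gb_per_hour_per_disk=20.0):
    repository_tb = backup_tb_on_tape / dedup_ratio        # don't buy less than this
    data_disks = target_tb_per_hour * 1000 / gb_per_hour_per_disk
    return {
        "min_repository_tb": repository_tb,
        "growth_headroom_tb": repository_tb * 2,            # plan to at least double
        "v7000_data_disks_for_throughput": round(data_disks),
    }

# Example: 400 TB of backups currently on tape, aiming for 4 TB/hr of ingest
print(protectier_quick_size(400, 4))
# {'min_repository_tb': 40.0, 'growth_headroom_tb': 80.0, 'v7000_data_disks_for_throughput': 200}
```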

ProtecTIER

IBM XIV Gen3 and SPC-1

IBM has just published an SPC-1 benchmark result for XIV. The magic number is 180,020 low latency IOPS in a single rack. This part of my blog post was delayed by my waiting for the official SPC-1 published document so I could focus in on an aspect of SPC-1 that I find particularly interesting.

XIV has always been a work horse rather than a race horse, being fast enough, and beating other traditional systems by never going out of tune, but 180,020 is still a lot of IOPS in a single rack.

SPC-1 has been criticised occasionally as being a drive-centric benchmark, but it’s actually more true to observe that many modern disk systems are drive-centric (XIV is obviously not one of those). Things do change and there was a time in the early 2000’s when, as I recall, most disk systems were controller-bound, and as systems continue to evolve I would expect SPC-1 to continue to expose some architectural quirks, and some vendors will continue to avoid SPC-1 so that their quirks are not exposed.

For example, as some vendors try to scale their architectures, keeping latency low becomes a challenge, and SPC-1 reports give us a lot more detail than just the topline IOPS number if we care to look.

The SPC-1 rules allow average response times up to 30 milliseconds, but generally I would plan real-world solutions around an upper limit of 10 milliseconds average, and for tier1 systems you might sometimes even want to design for 5 milliseconds.

I find read latency interesting because not only does SPC-1 allow for a lot of variance, but different architectures do seem to give very different results. Write latency on the other hand seems to stay universally low right up until the end. Let’s use the SPC-1 reports to look at how some of these systems stack up to my 5 millisecond average read latency test:

DS8870 – this is my baseline as a low-latency, high-performance system

  • 1,536 x 15KRPM drives RAID10 in four frames
  • 451,000 SPC-1 IOPS
  • Read latency hits 4.27 milliseconds at 361,000 IOPS

HP 3PAR V800

  • 1,920 x 15KRPM drives RAID10 in seven frames [sorry for reporting this initially as 3,840 – I was counting the drives and also the drive support package for the same number of drives]
  • 450,000 SPC-1 IOPS
  • Average read latency hits 4.23 milliseconds at only 45,000 IOPS

Pausing for a moment to compare DS8870 with 3PAR V800 you’d have to say DS8870 is clearly in a different class when it comes to read latency.

Hitachi VSP

  • 1,152 x 15KRPM drives RAID10 in four frames
  • 270,000 SPC-1 IOPS
  • Average read latency hits 3.76 ms at only 27,000 IOPS and is well above 5 ms at 135,000

Hitachi HUS-VM

  • 608 x 15KRPM drives RAID10 in two frames
  • 181,000 SPC-1 IOPS
  • Average read latency hits 3.72 ms at only 91,000 IOPS and is above 5 ms at 145,000

Netapp FAS3270A

  • 2 x 512GB Flash Cache
  • 120 x 15KRPM drives RAID-DP in a single frame
  • 68,034 SPC-1 IOPS
  • Average read latency hits 2.73 ms at 34,000 IOPS and is well over 6 ms at 54,000

So how does XIV stack up?

  • 15 x 400GB Flash Cache
  • 180 x 7200RPM drives RAID-X in a single frame
  • 180,020 SPC-1 IOPS
  • Average read latency hits 4.08 milliseconds at 144,000 IOPS

And while I know that there are many ways to analyse and measure the value of things, it is interesting that the two large IBM disk systems seem to be the only ones that can keep read latency down below 5 ms when they are heavily loaded.
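
For anyone who wants to reproduce that comparison, the snippet below simply re-arranges the numbers already quoted above, expressing the load at which each system reaches its quoted read-latency point as a fraction of its maximum SPC-1 throughput. Nothing here is new data.

```python
# Re-arranging the data points quoted above: at what fraction of its maximum
# SPC-1 throughput does each system reach the read-latency figure quoted for it
# (roughly 3-4 ms, depending on the system)?

systems = {                        # (max SPC-1 IOPS, IOPS at the quoted read-latency point)
    "IBM DS8870":      (451_000, 361_000),
    "HP 3PAR V800":    (450_000,  45_000),
    "Hitachi VSP":     (270_000,  27_000),
    "Hitachi HUS-VM":  (181_000,  91_000),
    "Netapp FAS3270A":  (68_034,  34_000),
    "IBM XIV Gen3":    (180_020, 144_000),
}

for name, (max_iops, loaded_iops) in sorted(systems.items(),
                                            key=lambda kv: kv[1][1] / kv[1][0],
                                            reverse=True):
    print(f"{name:16s} reaches its quoted read latency at {loaded_iops / max_iops:4.0%} of max SPC-1 IOPS")
```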

[SPC-1 capacity data removed on 130612 as it wasn’t adding anything, just clutter]

Update 130617: I have just found another comment from HP in my spam filter, pointing out that the DS8870 had 1,536 drives not 1,296. I will have to remember not to write in such a rush next time. This post was really just an add-on to the more important first half of the post on the new XIV features, and was intended to celebrate the long-awaited SPC-1 result from the XIV team.

IBM XIV Gen3 Latest Announcements

Recently announced XIV 11.3 adds several valuable new features…

  • 48GB cache per grid module (15 x 48 = 720GB RAM cache per system standard)
  • 4TB drives (325 TB in a single rack) encryption-ready
  • Consistent rebuild times of 13 minutes per TB of data on a busy 15 module system
  • The on-the-go XIV monitoring for iPhone is now also provided for Android
  • Support for OpenStack Grizzly (the latest release of open source software for building your own cloud, similar to Amazon EC2, Azure, etc)
  • Hyperscale Mobility (non-disruptive volume migration between XIVs). No need for monolithic expansion when you can build a grid of grids : )
  • Support for vCenter Operations Manager
  • Host kit enhancements to make best-practice host connectivity easier

Plus a Statement of Direction: “IBM intends to add support for self-service capacity provisioning of block storage, including IBM XIV Storage System, through use of IBM SmartCloud Storage Access.”

Sales Manual

Announcement

IBM has also just published an SPC-1 benchmark result for XIV. Because the document hasn’t quite made it to the SPC-1 web site,  and because I wanted to focus on a particular detail of SPC-1 that I find interesting, I have split this blog post into two parts and I will delay the second part until the XIV result appears in public.

Meanwhile you can check out the new IBM XIV Performance Whitepaper here.

XIV 11.2 Quick Update: The Best Just Became Awesome…

Not only is XIV Gen3 proving now to be just about the most robust thing you could ever wish to own, with significant improvements over Gen2, but IBM has just announced some interesting additional enhancements to Gen3, both new hardware and new version 11.2 firmware.

  • A major improvement in performance through improved SSD caching algorithms (including storing checksums in RAM rather than on SSD)
  • A refresh to new 6-core Intel E5645 CPUs (15 x 6 = 90 physical cores) and optimisation for hyper-threading (180 logical cores), including some processor affinity optimization for iSCSI.
  • Up to twelve 10G iSCSI ports and 9K jumbo MTU support with tested performance up to 13.7GB/sec sequential read
  • A lot of work has been done on the light-weight IP stack, using Infiniband techniques for DMA so as to remove locking and CPU overhead. This driver runs in user space with very low CPU overhead and can drive iSCSI at full line rate (12 x 10Gbps).
  • The work on iSCSI also has benefits for IP replication, with multiple sessions being used to improve robustness and improve performance, as well as enhancements to concurrent code load.

10G

Some of the other cool things in 11.2 include:

  • The rebuild time for 3TB data (3TB drive 100% full) used to be 76 minutes, which was industry leading, now with 11.2 of the firmware that time has been halved to just 38 minutes, and the rebuild time is virtually unaffected by system user load!
  • Space reclamation enhancements.
  • More efficient power supplies.
  • An export to csv option is now available on every information table in the system

XIV export

So in summary you could say the big points are:

  • Availability is now best in industry
  • Real-world IOPS performance is well into six figures with single digit latency, and it just keeps getting better
  • iSCSI has been made awesome/enterprise-class – quite unlike some other iSCSI implementations around
  • The rebuild time for 3TB of data is so far beyond what the opposition can do that it looks like sorcery

 If you haven’t thought about XIV for a while, it’s time you took another look.

 

Storage Complexity…

This week I’m on a summer camping holiday, so why not head over to Storagebod’s blog and read what The Bod has to say on the critical topic of storage complexity…

What do you get at an IBM Systems Technical Symposium?

What do you get at an IBM Systems Technical Symposium? Well for the event in Auckland, New Zealand November 13-15 I’ve tried to make the storage content as interesting as possible. If you’re interested in attending, send me an email at jkelly@nz.ibm.com and I will put you in contact with Jacell who can help you get registered. There is of course content from our server teams as well, but my focus has been on the storage content, planned as follows:

Erik Eyberg, who has just joined IBM in Houston from Texas Memory Systems following IBM’s recent acquisition of TMS, will be presenting “RAMSAN – The World’s Fastest Storage”. Where does IBM see RAMSAN fitting in and what is the future of flash? Check out RAMSAN on the web, on twitter, on facebook and on youtube.

Fresh from IBM Portugal and recently transferred to IBM Auckland we also welcome Joao Almeida who will deliver a topic that is sure to be one of the highlights, but unfortunately I can’t tell you what it is since the product hasn’t been announced yet (although if you click here you might get a clue).

Zivan Ori, head of XIV software development in Israel, knows XIV at a very detailed level – possibly better than anyone – so come along and bring all your hardest questions! He will be here and presenting on:

  • XIV Performance – What you need to know
  • Looking Beyond the XIV GUI

John Sing will be flying in from IBM San Jose to demonstrate his versatility and expertise in all things to do with Business Continuance, presenting on:

  • Big Data – Get IBM’s take on where Big Data is heading and the challenges it presents and also how some of IBM’s products are designed to meet that challenge.
  • ProtecTIER Dedup VTL options, sizing and replication
  • Active/Active datacentres with SAN Volume Controller Stretched Cluster
  • Storwize V7000U/SONAS Global Active Cloud Engine multi-site file caching and replication

Andrew Martin will come in from IBM’s Hursley development labs to give you the inside details you need on three very topical areas:

  • Storwize V7000 performance
  • Storwize V7000 & SVC 6.4 Real-time Compression
  • Storwize V7000 & SVC Thin Provisioning

Senaka Meegama will be arriving from Sydney with three hot topics around VMware and FCoE:

  • Implementing SVC & Storwize V7000 in a VMware Environment
  • Implementing XIV in a VMware Environment
  • FCoE Network Design with IBM System Storage

Jacques Butcher is also coming over from Australia to provide the technical details you all crave on Tivoli storage management:

  • Tivoli FlashCopy Manager 3.2 including VMware Integration
  • TSM for Virtual Environments 6.4
  • TSM 6.4 Introduction and Update plus TSM Roadmap for 2013

Maurice McCullough will join us from Atlanta, Georgia to speak on:

  • The new high-end DS8870 Disk System
  • XIV Gen3 overview and tour

Sandy Leadbeater will be joining us from Wellington to cover:

  • Storwize V7000 overview
  • Scale-Out NAS and V7000U overview

I will be reprising my Sydney presentations with updates:

  • Designing Scale Out NAS & Storwize V7000 Unified Solutions
  • Replication with SVC and Storwize V7000

And finally, Mike McKenzie will be joining us from Brocade in Australia to give us the skinny on IBM/Brocade FCIP Router Implementation.

XIV: “They call me Flash, ’cause I can run fast, really fast.”

IBM XIV 11.1 has just announced support for SSD Flash Cache. The title of this post is taken from DC Comics Flash Annual Vol 2 #3 and it’s all about running fast. Not everyone is going to need the XIV Flash Cache performance kicker, but if you want a couple of hundred TiB of massively fast storage in a single rack then XIV Gen3 with distributed Flash Cache is your dream come true.

To deliver this amount of capacity and extreme performance in a single rack with the industry’s best ease of use is a real game changer. You need never buy an old style multi-frame box with hundreds of 15K disk drives in it ever again.

The XIV SSD Flash Cache implementation has some at-first-glance conceptual similarities to Netapp’s FlashCache. Both XIV and Netapp are unusual in that they are natively write-optimized architectures (albeit massively different architectures) so using Flash Cache to optimize read performance gives a disproportionately good return on investment compared to other vendors’ products. But there the similarity ends.

XIV is a grid so there are 15 SSDs operating independently rather than everything being funnelled to and from a centralised cache.

…so this diagram is only one module out of 15 in each system.

The SSDs in XIV Flash Cache are at least 400GB each, but I won’t promise which exact drive since they may be multi-sourced.

IBM does some things differently, courtesy of IBM Research, when it comes to wear-levelling tricks, plus some innovative thinking from the XIV team on caching. You have to be careful how you use SSDs or their efficiency and performance can degrade with use. SSD manufacturers have some tricks to try to minimize that, but IBM goes one step further than other vendors on this front. XIV buffers 512KB chunks in cache and then writes them sequentially onto SSD in a circular log format, thereby avoiding the random writes to the SSDs that are the main cause of degradation in other vendors’ implementations.
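
Here is a purely conceptual sketch of that circular-log destaging idea. It is my illustration, not XIV code; the 512KB destage size is the one mentioned above, and everything else (class name, SSD size, print statements) is made up for the example.

```python
# Conceptual sketch of a circular-log flash cache writer: buffer small writes
# into large chunks and destage them sequentially, wrapping around at the end
# of the SSD. Not XIV code - names and sizes are illustrative only.

CHUNK = 512 * 1024           # destage unit mentioned in the post
SSD_SIZE = 400 * 1024**3     # one 400GB cache SSD (illustrative)

class CircularLogCache:
    def __init__(self):
        self.buffer = bytearray()   # RAM staging buffer
        self.head = 0               # next sequential write offset on the SSD

    def cache_write(self, data):
        self.buffer += data
        while len(self.buffer) >= CHUNK:
            self._destage(self.buffer[:CHUNK])
            del self.buffer[:CHUNK]

    def _destage(self, chunk):
        # Always write the next CHUNK bytes at the current head and wrap at the
        # end of the device - the SSD never sees small random writes, which is
        # the degradation mechanism the post describes.
        offset = self.head
        self.head = (self.head + CHUNK) % SSD_SIZE
        print(f"destage {len(chunk) // 1024} KB sequentially at offset {offset}")

cache = CircularLogCache()
cache.cache_write(b"x" * (3 * CHUNK // 2))   # 768 KB staged -> one 512 KB destage
```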

15 Flash Caches – not a centralised funnelled cache

You can add Flash Caches non-disruptively to any XIV Gen3 system. XIV will bypass Flash Cache for sequential reads, and you can set specific volumes to bypass Flash Cache if you want to. This can be used to complement the per-host QoS capabilities of XIV, but we usually suggest letting the system work out how best to use Flash Cache in most cases.

Flash Cache data is persistent across both hot and cold reboot, so there is no need to wait for the cache to fill again before it’s effective.

The SSDs now make an appearance throughout the GUI, including in the performance tools where you can see the read-hit split between SSD and Memory.

There are many other feature enhancements in 11.1 like mirroring between Gen2 and Gen3 XIVs. Check out the XIV product page for more details, including the ability to see 9 XIVs on a screen and manage up to 81 XIVs from one console. This is starting to become important as we now have 59 customers who have more than 1PB of XIV storage (nett usable) and 16 of them have more than 2PB (nett usable). Also, I’m a Blackberry man myself (courtesy of IBM) but if you’re an Apple fanboy you might like the new iPhone 4S XIV Mobile Dashboard app (to add to the iPad app already released).

From what I have seen, the performance improvement from Flash Cache is more dramatic on XIV than on other disk systems. The IOPS kick can be 3x on a 70/30/50 style OLTP workload, and in some extreme cases could go as high as 6x. Response times are also dramatically improved. XIV without Flash Cache already has really good write performance (remember the screenshot from July last year showing almost 500,000 IOPS of 4KB write hits at around 1 millisecond latency?) and now with Flash Cache the read performance gets to share in that awesomeness as well : )

But, bragging rights aside, I like to be a little conservative with real-world performance claims. This graph shows 2x IOPS for an internally tested CRM and Financial ERP database workload, 70/30 read/write with an 8KB I/O size – the most conservative of the internal test results.

Back in October I speculated that we might see an industry-standard OLTP style benchmark published by the XIV team once Flash Cache was available. I’m still hoping that will happen. It would be interesting to see how it stacks up. It seems like everyone’s doing Flash these days.

And now one more Big Bang Theory ‘Flash’ link just for fun…


Hu’s on first, Tony’s on second, I Don’t Know’s on third

This post started life earlier this year as a post on the death of RAID-5 being signaled by the arrival of 3TB drives. The point being that you can’t afford to be exposed to a second drive failure for 2 or 3 whole days especially given the stress those drives are under during that rebuild period.

But the more I thought about RAID rebuild times the more I realized how little I actually knew about it and how little most other people know about it. I realized that what I knew was based a little too much on snippets of data, unreliable sources and too many assumptions and extrapolations. Everybody thinks they know something about disk rebuilds, but most people don’t really know much about it at all and thinking you know something is worse than knowing you don’t.

Reading this back, it started to remind me of an old Abbott and Costello sketch.

Anyway you’d think that the folks who should know the real answers might be operational IT staff who watch rebuilds nervously to make sure their systems stay up, and maybe vendor lab staff who you would think might get the time and resources to test these things, but I have found it surprisingly hard to find any systematic information.

I plan to add to this post as information comes to hand (new content in green) but let’s examine what I have been able to find so far:

1. The IBM N Series MS Exchange 2007 best practices whitepaper mentions a RAID-DP (RAID6) rebuild of a 146GB 15KRPM drive in a 14+2 array taking 90 minutes (best case).

Netapp points out that there are many variables to consider, including the setting of raid.reconstruct.perf_impact at either low, medium or high, and they warn that a single reconstruction effectively doubles the I/O occurring on the stack/loop, which becomes a problem when the baseline workload is more than 50%.

Netapp also says that rebuild times of 10-15 hours are normal for 500GB drives, and 10-30 hours for 1TB drives.

2. The IBM DS5000 Redpiece “Considerations for RAID-6 Availability and Format/Rebuild Performance on the DS5000” shows the following results for array rebuild times on 300GB drives as the arrays get bigger:

I’m not sure how we project this onto larger drive sizes without more lab data. In these two examples there was little difference between N Series 14+2 146GB and DS5000 14+2 300GB, but common belief is that rebuild times rise proportionally to drive size. The 2008 Hitachi whitepaper “Why Growing Businesses Need RAID 6 Storage” however, mentions a minimum of 24 hours for a rebuild of an array with just 11 x 1TB drives in it on an otherwise idle disk system.

What both IBM and Netapp seem to advise is that rebuild time is fairly flat until you get above 16 drives, although Netapp seems to be increasingly comfortable with larger RAID sets as well.

3. A 2008 post from Tony Pearson suggests that “In a typical RAID environment, say 7+P RAID-5, you might have to read 7 drives to rebuild one drive, and in the case of a 14+2 RAID-6, reading 15 drives to rebuild one drive. It turns out the performance bottleneck is the one drive to write, and today’s systems can rebuild faster Fibre Channel (FC) drives at about 50-55 MB/sec, and slower ATA disk at around 40-42 MB/sec. At these rates, a 750GB SATA rebuild would take at least 5 hours.”

Extrapolating from that would suggest that a RAID5 1TB rebuild is going to take at least 9 hours, 2TB 18 hours, and 3TB 27 hours. The Hitachi whitepaper figure seems to be a high outlier, perhaps dependent on something specific to the Hitachi USP architecture.
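
The arithmetic behind that kind of extrapolation is simple enough to write down. The sketch below just makes the division explicit, using the single-drive write bottleneck Tony describes; the 42 MB/sec rate is the ATA figure quoted above and the 30 MB/sec rate is the real-world figure a CX4 user reports further down this post.

```python
# The arithmetic behind these extrapolations, using the single-drive write
# bottleneck described above. The rates (MB/s) and capacities are the ones
# quoted in this post; this is just the division made explicit.

def rebuild_hours(capacity_gb, write_mb_per_sec):
    """Best-case rebuild time when the spare drive's write rate is the bottleneck."""
    return capacity_gb * 1000 / write_mb_per_sec / 3600

for cap in (750, 1000, 2000, 3000):
    best = rebuild_hours(cap, 42)    # quoted ATA rebuild rate
    worst = rebuild_hours(cap, 30)   # real-world CX4 figure mentioned later in the post
    print(f"{cap} GB SATA: roughly {best:.0f} to {worst:.0f} hours, before any host workload")
```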

Tony does point out that his explanation is a deliberate over-simplification for the purposes of accessibility; perhaps that’s why it doesn’t explain why there might be step increases in drive rebuild times at 8 and 16 drives.

4. The IBM DS8000 Performance Monitoring and Tuning redbook states “RAID 6 rebuild times are close to RAID 5 rebuild times (for the same size disk drive modules (DDMs)), because rebuild times are primarily limited by the achievable write throughput to the spare disk during data reconstruction.” and also “For array rebuilds, RAID 5, RAID 6, and RAID 10 require approximately the same elapsed time, although RAID 5 and RAID 6 require significantly more disk operations and therefore are more likely to impact other disk activity on the same disk array.”

The below image just came to hand. It shows how the new predictive rebuilds feature on DS8000 can reduce rebuild times. Netapp do a similar thing I believe. Interesting that it does show a much higher rebuild rate than the 50MB/sec that is usually talked about.

5. The EMC whitepaper “The Effect of Priorities on LUN Management Operations” focuses on the effect of assigned priority, as one would expect, but is nonetheless very useful in helping to understand generic rebuild times (although it does contain a strange assertion that SATA drives rebuild faster than 10KRPM drives, which I assume must be a transposition error). Anyway, the doc broadly reinforces the data from IBM and Netapp, including this table.

This seems to show that the increase in rebuild times is more linear as the RAID sets get bigger, compared to IBM’s data which showed steps at 8 and 16. One person with CX4 experience reported to me that you’d be lucky to get close to 30MB/sec on a RAID5 rebuild on a typical working system, and that when a vault drive is rebuilding with priority set to ASAP not much else gets done on the system at all. It remains unclear to me how much of the vendor variation I am seeing is due to reporting differences and detail levels versus architectural differences.

6. IBM SONAS 1.3 reports a rebuild time of only 9.8 hours for a 3TB drive RAID6 8+2 on an idle system, and 6.1 hours on a 2TB drive (down from 12 hours in SONAS 1.2). This change from 12 hours down to 6.1 comes simply from a code update, so I guess this highlights that not all constraints on rebuild are physical or vendor-generic.

7. March 2012: I just found this pic from the IBM Advanced Technical Skills team in the US. This gives me the clearest measure yet of rebuild times on IBM’s Storwize V7000. Immediately obvious is that the Nearline drive rebuild times stretch out a lot when the target rebuild rate is limited so as to reduce host I/O impact, but the SAS and SSD drive rebuild times are pretty impressive. The table also came with a comment estimating that 600GB SAS drives would take twice the rebuild time of the 300GB SAS drives shown.

~

In 2006 Hu Yoshida posted that “it is time to replace 20 year old RAID architectures with something that does not impact I/O as much as it does today with our larger capacity disks. This is a challenge for our developers and researchers in Hitachi.”

I haven’t seen any sign of that from Hitachi, but IBM’s XIV RAID-X system is perhaps the kind of thing he was contemplating. RAID-X achieves re-protection rates of more than 1TB of actual data per hour, and there is no real reason why other disk systems couldn’t implement the scattered RAID-X approach that XIV uses to bring a large number of drives into play on data rebuilds, where re-protection is about making another copy of data blocks as quickly as possible, not about rebuilding onto a substitute drive.
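
A simplified comparison of the two recovery models makes the point. In the sketch below the 50 MB/sec spare-drive rate is the figure discussed earlier in this post, while the per-drive contribution and drive count for the distributed case are my own illustrative assumptions, chosen only to land near XIV's quoted 13 minutes per TB.

```python
# Simplified comparison of the two recovery models discussed in this post:
# traditional RAID, throttled by the write rate of a single spare drive, versus
# a scattered scheme where many drives share the copy work and only the data
# actually written needs re-protecting. Rates and drive counts are assumptions.

def traditional_rebuild_hours(drive_capacity_tb, spare_write_mb_s=50):
    # one spare drive absorbs the whole rebuild, regardless of array width
    return drive_capacity_tb * 1_000_000 / spare_write_mb_s / 3600

def distributed_reprotect_minutes(data_tb, surviving_drives, per_drive_mb_s=8):
    # every surviving drive contributes a modest share of reads and writes
    aggregate_mb_s = surviving_drives * per_drive_mb_s
    return data_tb * 1_000_000 / aggregate_mb_s / 60

print(f"3 TB drive onto a single spare at 50 MB/s: {traditional_rebuild_hours(3):.0f} hours")
print(f"1 TB of data spread across 167 drives:     {distributed_reprotect_minutes(1, 167):.0f} minutes")
```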

So that’s about as much as I know about RAID rebuilds. Please feel free to send me your own rebuild experiences and measurements if you have any.

XIV Gen3 Sequential Performance

Big Data can take a variety of forms but what better way to get a feeling for the performance of a big data storage system than using a standard audited benchmark to measure large file processing, large query processing, and video streaming.

From the www.storageperformance.org website:

“SPC-2 consists of three distinct workloads designed to demonstrate the performance of a storage subsystem during… large-scale, sequential movement of data…

  • Large File Processing: Applications… which require simple sequential process of one or more large files such as scientific computing and large-scale financial processing.
  • Large Database Queries: Applications that involve scans or joins of large relational tables, such as those performed for data mining or business intelligence.
  • Video on Demand: Applications that provide individualized video entertainment to a community of subscribers by drawing from a digital film library.”

The Storage Performance Council also recently published its first SPC-2E benchmark result. “The SPC-2/E benchmark extension consists of the complete set of SPC-2 performance measurement and reporting plus the measurement and reporting of energy use.”

It uses the same performance test as the SPC-2 so the results can be compared. It does look as though only IBM and Oracle are publishing SPC-2 numbers these days however and the IBM DS5300 and DS5020 are the same LSI OEM boxes as the Oracle 6780 and 6180, so that doesn’t really add a lot to the mix. HP and HDS seem to have fled some time ago, and although Fujitsu and Texas Memory do publish, I have never encountered either of those systems out in the market. So the SPC-2 right now is mainly a way to compare sequential performance among IBM systems.

XIV is certainly interesting, because in its Generation 2 format it was never marketed as a box for sequential or single-threaded workloads. XIV Gen2 was a box for random workloads, and the more random and mixed the workload the better it seemed to be. With XIV Generation 3 however we have a system that is seen to be great with sequential workloads, especially Large File Processing, although not quite so strong for Video on Demand.

The distinguishing characteristic of LFP is that it is a read/write workload, while the others appear to be read-only. XIV’s strong write performance comes through on the LFP benchmark.

Drilling down one layer deeper we can look at the components that make up Large File Processing. Sub-results are reported for reads, writes, and mixed read/write, as well as for 256 KiB and 1,024 KiB I/O sizes in each category.

So what we see is that XIV is actually slightly faster than DS8800 on the write workloads, but falls off a little when the read percentage of the I/O mix is higher.

A Small Challenge with NAS Gateways

SAN Volume Controller

Late in 2010, Netapp quietly announced they were not planning to support V Series (and by extension IBM N Series NAS Gateways) to be used with any recent version of IBM’s SAN Volume Controller.

This was discussed more fully on the Netapp communities forum (you’ll need to create a login) and the reason given was insufficient sales revenue to justify on-going support.

This is to some extent generically true for all N Series NAS gateways. For example, if all you need is basic CIFS access to your disk storage, most of the spend still goes on the disk and the SVC licensing, not on the N Series gateway. This is partly a result of the way Netapp prices their systems – the package of the head units and base software (including the first protocol) is relatively cheap, while the drives and optional software features are relatively expensive.

Netapp however did not withdraw support for V Series NAS gateways on XIV or DS8000, and nor do they seem to have any intention to, as best I can tell, considering that support to be core capability for V Series NAS Gateways.

I also note that Netapp occasionally tries to position V Series gateways as a kind of SVC-lite, to virtualize other disk systems for block I/O access.

Anyway, it was interesting that what IBM announced was a little different to what Netapp announced: “NetApp & N Series Gateway support is available with SVC 6.2.x for selected configurations via RPQ [case-by-case lab approval] only”.

Storwize V7000

What made this all a bit trickier was IBM’s announcement of the Storwize V7000 as its new premier midrange disk system.

Soon after on the Netapp communities forum it was stated that there was a “joint decision” between Netapp and IBM that there would be no V Series NAS gateway support and no PVRs [Netapp one-off lab support] for Storwize V7000 either.

Now the Storwize V7000 disk system, which is projected to have sold close to 5,000 systems in its first 12 months, shares the same code-base and features as SVC (including the ability to virtualize other disk systems). So think about that for a moment: that’s two products and only one set of testing and interface support – that sounds like the support ROI just improved, so maybe you’d think that the original ROI objection might have faded away at this point? It appears not.

Anyway, once again, what IBM announced was a little different to the Netapp statement: “NetApp & N Series Gateway support is available with IBM Storwize V7000 6.2.x for selected configurations via RPQ only”.

Whither from here?

The good news is that IBM’s SONAS gateways support XIV and SVC (and other storage behind SVC), and SONAS delivers some great features that N Series doesn’t have (such as file-based ILM to disk or tape tiers), so SVC is pretty well catered for when it comes to NAS gateway functionality.

When it comes to Storwize V7000 the solution is a bit trickier. SONAS is a scale-out system designed to cater for hundreds of TBs up to 14 PB. That’s not an ideal fit for the midrange Storwize V7000 market. So the Netapp gateway/V Series announcement has created potential difficulties for IBM’s midrange NAS gateway portfolio… hence the title of this blog post.

XIV Gen3 at full speed

Don’t try this at home on your production systems… but it’s nice to see the XIV flying at 455 thousand IOPS. It actually peaked above 460K on this lab test but what’s 5,000 IOPS here or there…

Thanks to Mert Baki

 

XIV Gen3 & MS Exchange 2010 ESRP

So here’s a quick comparison of XIV Gen3 and Gen2 with some competitors. Note that ESRP is designed to be more of a proof of concept than a benchmark, but it has a performance component which is relevant. Exchange 2010 has reduced disk I/O over Exchange 2007 which has allowed vendors to switch to using 7200 RPM drives for the testing.

The ESRP reports are actually quite confusing to read since they test a fail-over situation so require two disk systems, but some of the info in them relates to a single disk system. I have chosen to include both machines in everything for consistency. The XIV report may not be up on the website for a few days.

Once again XIV demonstrates its uniqueness in not being a just another drive-dominated architecture. Performance on XIV is about intelligent use of distributed grid caches:

  • XIV Gen 3 returns 2.5 times the IOPS from a NL-SAS drive that a VNX5700 does.
  • XIV Gen 3 returns 1.8 times the IOPS from NL-SAS 7200RPM drives that a CX4 can get out of FC 10KRPM drives.
  • Even XIV Gen2 with SATA drives can get 25% more IOPS per SATA drive than VMAX.

And to answer a question asked on my earlier post. No these XIV results do not include SSD drives, although the XIV is now SSD-ready and IBM has issued a statement of direction saying that up to 7.5TB of PCIe-SSD cache is planned for 1H 2012. Maybe that’s 15 x 500GB SSDs (one per grid node).

XIV Gen3: Both Hands Clapping

xiv

Pronunciation:/zɪv/

noun

  1. the sound storage makes as it zooms past its competitors:

                 there was a loud xiv as the new IBM system arrived and the other vendors’ disk systems all collapsed under the weight of their own complexity

XIV Generation 3 is here and XIV Generation 2 remains in the family. Here is a quick sampler of Gen3 Vs Gen2 performance:

For more information on today’s announcements check out the XIV product page on ibm.com and the general overview on youtube.

Some of you might also consider that XIV Gen3’s use of Infiniband interconnect and NL-SAS drives brings new relevance to my two recent blog posts on those subjects : )

To Infiniband… and Beyond!

Nearline-SAS: Who Dares Wins

Nearline-SAS: Who Dares Wins

Maybe you think NL-SAS is old news and it’s already swept SATA aside?

Well if you check out the specs on FAS, Isilon, 3PAR, or VMAX, or even the monolithic VSP, you will see that they all list SATA drives, not NL-SAS on their spec sheets.

Of the serious contenders, it seems that only VNX, Ibrix, IBM SONAS, IBM XIV Gen3 and IBM Storwize V7000 have made the move to NL-SAS so far.

First we had PATA (Parallel ATA) and then SATA drives, and then for a while we had FATA drives (Fibre Channel attached ATA) or what EMC at one point confusingly  marketed as “low-cost Fibre Channel”. These were ATA drive mechanics, with SCSI command sets handled by a FC front-end on the drive.

Now we have drives that are being referred to as Capacity-Optimized SAS, or Nearline SAS (NL-SAS) both of which terms once again have the potential to be confusing. NL-SAS is a similar concept to FATA – mechanically an ATA drive (head, media, rotational speed) – but with a SAS interface (rather than a FC bridge) to handle the SCSI command set.

When SCSI made the jump from parallel to serial the designers took the opportunity to build in compatibility with SATA via a SATA tunneling protocol, so SAS controllers can support both SAS and SATA drives.

The reason we use ATA drive mechanics is that they have higher capacity and a lower price. So what are some of the advantages of using NL-SAS drives, over using traditional SATA drives?

  1. SCSI offers more sophisticated command queuing (which leads directly to reduced head movement) although ATA command queuing enhancements have closed the gap considerably in recent years.
  2. SCSI also offers better error handling and reporting.
  3. One of the things I learned the hard way when working with Engenio disk systems is that bridge technology to go from FC to SATA can introduce latency, and as it turns out, so does the translation required from a SAS controller to a SATA drive. Doing SCSI directly to a NL-SAS drive reduces controller latency, reduces load on the controller and also simplifies debugging.
  4. Overall performance can be anything from slightly better to more than double, depending on the workload.

And with only a small price premium over traditional SATA, it seems pretty clear to me that NL-SAS will soon come to dominate and SATA will be phased out over time.

NL-SAS drives also offer the option of T10 PI (SCSI Protection Information), which adds an 8-byte data integrity field to each 512-byte disk block. The 8 bytes are split into three chunks: a guard tag (a cyclic redundancy check of the data), an application tag (e.g. RAID information), and a reference tag that ties each block to its intended address so that data blocks end up where they belong. I expect 2012 to be a big year for PI deployment.
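
As a rough sketch of that 8-byte layout (using the common Type 1 field widths; the helper names below are mine, not from any vendor toolkit), the trailer packs a 2-byte guard CRC, a 2-byte application tag and a 4-byte reference tag:

    # Pack/unpack the 8-byte T10 PI trailer appended to a 512-byte block.
    import struct

    PI_TRAILER = struct.Struct(">HHI")   # guard CRC (2B), application tag (2B), reference tag (4B)

    def pack_pi(guard_crc, app_tag, ref_tag):
        return PI_TRAILER.pack(guard_crc, app_tag, ref_tag)

    # Example: a hypothetical CRC, a RAID-group tag, and the block's logical address
    trailer = pack_pi(0xBEEF, 0x0007, 1234)
    assert len(trailer) == 8
    print(PI_TRAILER.unpack(trailer))    # (48879, 7, 1234)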

I’m assured that the photograph below is of a SAS engineer – maybe he’s testing the effectiveness of the PI extensions on the disk drive in his pocket?

To Infiniband… and Beyond!

Not here this time… over there >>>

This week I’m doing a guest blogging spot over at Barry Whyte’s storage virtualization blog, so if you want to read this week’s post head over to: https://www.ibm.com/developerworks/mydeveloperworks/blogs/storagevirtualization/entry/infinity_and_beyond?lang=en

P.S. Infiniband is the new interconnect being used in XIV Gen3.


Storwize V7000 four-fold Scalability takes on VMAX & 3PAR

IBM recently announced that two Storwize V7000 systems can be clustered, in pretty much exactly the same way that two I/O groups are clustered in a SAN Volume Controller environment. Clustering two Storwize V7000s creates a system with up to 480 drives, and any of the paired controllers can access any of the storage pools. Barry Whyte went one step further and said that if you apply for an RPQ you can cluster up to four Storwize V7000s (up to 960 drives). Continue reading

Am I boring you? Full stripe writes and other complexity…

In 1978 IBM employee Norman Ken Ouchi was awarded patent 4092732 for a “System for recovering data stored in failed memory unit”, technology that would later be known as RAID 5 with full stripe writes.

Hands up who’s still doing that or its RAID6 derivative 33 years later?

I have a particular distaste for technologies that need to be manually tuned. Continue reading
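
For anyone who hasn’t thought about it since 1978, here’s a toy sketch (my own code, not any array’s implementation) of the idea behind a full stripe write: when all the data chunks in a stripe are written together, parity is one XOR pass over the new data, with no read-modify-write of old data and old parity:

    # Toy RAID 5 parity for a 3+1 stripe: parity is the XOR of the data chunks.
    from functools import reduce

    def xor_chunks(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    data = [b"AAAA", b"BBBB", b"CCCC"]          # full stripe of new data chunks
    parity = reduce(xor_chunks, data)           # one pass, no old data/parity needed

    # The same XOR property rebuilds a lost chunk from the survivors plus parity.
    rebuilt = reduce(xor_chunks, [data[0], data[2], parity])
    assert rebuilt == data[1]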

You can’t always get what you want

A raft of new storage efficiency features has been brought to market in the last few years, but what has become obvious is that you can’t yet get them all in one product. Continue reading

Maximum Fibre Channel Distances

Just a quick hit-and-run blog post for today… This table, authored by Karl Hohenauer, just came into my inbox. With the changes in cable quality (OM3, OM4), the supported Fibre Channel distances have confused a few people, so this will be a good reference doc to remember. Continue reading

Favourite Product of 2010 that Never Was…

With everyone announcing best-of type choices for 2010 I thought I’d take a slightly less serious approach and announce my favourite product of 2010 that never was – a product so cool that either no-one but me thought of it, or more likely, it somehow doesn’t stack up technically or cost-wise. Continue reading

Exploiting the Intelligence of Inventors

In Tracy Kidder’s book “Soul of a New Machine” I recall Data General’s Tom West saying that the design his team at Data General came up with for the MV/8000 minicomputer was so complex that he was worried. He had a friend who had just purchased a first-run Digital Equipment Corp VAX, so Tom went to visit him and picked through the VAX main boards, counting and recording the IDs of all the components used. He then realised that his design wasn’t so complex after all compared to the VAX, and proceeded to build the MV/8000 with confidence.

In this example, deconstruction of one product helped Tom to understand another product, and to sanity-check that he wasn’t making things too complicated. It didn’t tell him whether the MV/8000 would be better than the VAX, however.

I have many times seen buyers take a deconstructionist approach to evaluating storage solutions. Once a solution is broken down into its isolated elements, it can be compared at a component level to another very different solution. In most cases it’s a pointless exercise. Continue reading

IBM’s New Midrange with Easy Tier & External Virtualization

Yes, IBM has announced a new midrange virtualized disk system, the Storwize V7000. A veritable CLARiiON-killer : ) Continue reading

Does my midrange look big in this?

IDC defines three categories of external disk. The midrange market leaders are EMC, Netapp and IBM (followed by Dell and HP, both slipping slightly over the last 12 months). Netapp is almost entirely a midrange business, while EMC and IBM are the market leaders in high-end. Over the last 4 quarters midrange has accounted for almost half of the spending on external disk (cf. just over a quarter on high-end), so clearly midrange is where the action is. Continue reading

ALL YOUR BASE ARE BELONG TO US

There are four reasons I can think of why a company wants to buy another:

  1. To take a position in a market you didn’t expect to be in but has suddenly become important to you (e.g. EMC buying VMware)
  2. To take a position in a market you did expect to be in, but the internal projects to get you where you wanted have failed (e.g. HP buying 3PAR)
  3. To gain mass in a market in which you already play successfully (e.g. Oracle buying JDE and PeopleSoft)
  4. To prevent your competitor gaining an asset that they could use to attack your market (e.g. Oracle buying Sun/MySQL) Continue reading

Choice or Clutter?

Vendors often struggle to be strong in all market segments and address the broad range of customer requirements with a limited range of products. Products that fit well into one segment don’t always translate well to others, especially when trying to bridge both midrange and enterprise requirements. Continue reading

When Space, Time & Vendor Charges Collide…

Well, the whole snapshot and replication thing got me thinking about vendor licensing. Licensing is a way to get a return on one’s R&D. It doesn’t really matter whether customers pay x for hardware and y for software, or x+y for the hardware ‘solution’ and zero for software functions, as long as the vendor gets the return it needs to keep its investors happy.

Vendor charges are like taxes: most of us appreciate that they are needed, but there are many different ways to levy the tax, e.g. flat rate, progressive, regressive, or goods and services (GST/VAT/sales tax).

I suspect that charging large licence fees for snapshot and replication functions has held IT back and IMHO the time has now come to set these functions free. Continue reading

Bow ties are cool – When time and space collide

Every storage vendor has sales slides that tell us that data growth rates are accelerating and the world will explode soon unless you buy their product to manage that…

…and yet the average IT shop is still mostly doing backups the old-fashioned way, with weekly fulls and daily incrementals, and scratching their heads about how they are going to cope next year, given that the current full is already taking 48 hours. They probably have a whole bunch of SATA disk somewhere that acts as the initial backup target, but it doesn’t go any faster than tape (which is something they probably assumed it would when they bought it), yet somehow they feel that their backups to disk are probably a good thing anyway, even though they’re more expensive… Continue reading

Is it time for the Enterprise Linux Server?

IBM’s Z10 Enterprise Linux Server is an interesting alternative to a large-scale VMware deployment. Essentially, any Linux workload that is a good fit for being virtualised with VMware is a good fit for being virtualised on Z10. Continue reading

Hey this Gibibyte stuff is really taking off!

So you know we’re making progress on the binary units thing (see my post entitled “How many fingers am I holding up“) when Piratebay.org starts using GiB…

7,368,671,232 Bytes = 7.37 GB or 6.86 GiB

Now if we can only get the IT vendor community to consistently follow Piratebay’s excellent example  : )
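
For anyone who wants to check the arithmetic above, it is just a choice of divisor:

    # The same byte count expressed in decimal (GB) and binary (GiB) units.
    size_bytes = 7_368_671_232
    print(f"{size_bytes / 1000**3:.2f} GB")    # 7.37 GB  (base 10)
    print(f"{size_bytes / 1024**3:.2f} GiB")   # 6.86 GiB (base 2)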

XIV Async (Snapshot) Replication

Snapshot-based Replication/Mirroring:

I thought it might be worth taking a quick look at async (snapshot) replication/mirroring, which was released for XIV earlier this year with 10.2.0.a of the firmware. XIV async is similar in concept to Netapp’s async SnapMirror: both are snapshot-based and both consume snapshot space as part of the mirroring process. One difference, of course, is that with XIV both async and sync replication are included in the base price of the XIV; there is no added licence fee or maintenance fee to pay. I’d call it ‘free’ but I’d just get another bunch of people on twitter telling me they still haven’t received their free XIVs yet… Continue reading
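
As a purely conceptual sketch (toy code, not XIV’s or Netapp’s actual mechanism), a snapshot-based async cycle works roughly like this: snapshot the source, ship only the blocks that changed since the previous snapshot, then retire the old snapshot, which is why snapshot space gets consumed while mirroring:

    # Toy model of one async replication cycle on a block map.
    source = {0: "A", 1: "B", 2: "C"}        # block number -> contents
    remote = dict(source)                    # remote copy, currently in sync
    last_snap = dict(source)                 # snapshot taken at the previous cycle

    source[1] = "B2"                         # the application overwrites a block

    new_snap = dict(source)                                  # take a new snapshot
    delta = {blk: val for blk, val in new_snap.items()
             if last_snap.get(blk) != val}                   # changed blocks only
    remote.update(delta)                                     # ship and apply the delta
    last_snap = new_snap                                     # retire the old snapshot

    assert remote == source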

How many fingers am I holding up?

The base2 Vs base10 nett capacity question is an interesting one. It remains a source of confusion for customers, and that’s not surprising as it remains a source of confusion for vendors too. Continue reading

What Happens When a Controller Fails

Whether XIV is a visionary way to reduce your storage TCO, or just a bizarre piece of foolishness as some bloggers would have you believe, is being tested daily. Every day that passes with large customers enjoying XIV’s ease of use, performance and reliability is another vote for it being a visionary product.

XIV has been criticised because there is a chance that it might break, just like every other storage system ever invented, yet because XIV is a little different, the naysayers somehow feel that non-perfection is a sin.

So let’s talk about non-perfection in an old-style storage architecture. Continue reading

Layered Storage Monitoring Tools

If you’re a big XIV fan, one of the things you might love about it is the built-in (i.e. ‘free’) monitoring tools, which are really easy to use.

There is also the xivtop utility, which will be immediately familiar to anyone who has used ‘top’ on Linux or UNIX systems.

But for many others in the non-XIV world, layered monitoring tools are a pain in the wallet and also a pain in the administrative butt. Continue reading

Application-aware snapshots for IBM Storage

Something strange has happened. IBM’s Tivoli group has produced some low-priced high-value storage software that’s easy to understand and easy to use! FlashCopy Manager provides fast application-aware backups and restores, leveraging the snapshot features of IBM storage systems. Continue reading

Some quotes from the web about XIV

There are of course the official IBM references, but below are a few unofficial public comments from customers and analysts that I found after a quick sweep of the web. I was looking for something else and started stumbling across these, so I thought I would post them. Continue reading

XIV & MS Exch 2007 Replicated Performance

Vendors typically only benchmark their fastest systems in any one class, which means that a bit of careful reflection is required to get a good understanding from a handful of results. The usual “nyah nyah ours is faster” kind of analysis and comment that seems to permeate the blogosphere doesn’t really achieve anything, that’s for sure.

Let’s talk about benchmarking more generally… Continue reading

Survival in the Blogosphere

Hey, here we are a few days in, and so far I hope I haven’t rubbished anyone else’s product. I’m even trying to be more respectful of XIV’s denigrators : )

That second post was a marathon of detailed research, I can tell you.

The third had a lot more research in it than appears on the surface, but because the essence of the debate was always going to be around product positioning, I left the maths out. Anyone can add up numbers from vendors’ brochures and get kVA and BTU/hr results.

The aim of those two posts was to get a ‘fair suck of the sav’ (as we say in A/NZ) for XIV. It works really well in the field, and it could do with just a little more respect out in the blogosphere.

Comparing XIV Power Consumption

XIV relies on lots of Intel cores and distributed caches to deliver performance. People who get hung up on disk systems being all about disk drives have trouble understanding XIV in so many ways, including power consumption. Continue reading

XIV Drive Management

One issue that urgently needs accuracy and clarity is the disk management technology behind XIV. Continue reading