Back in 2011 I blogged about buying a new car, in a post entitled “The anatomy of a purchase”. Well, the transmission on the Jag has given out and I am now the proud owner of a Toyota Mark X.
The anatomy of the purchase was, however, a little different this time. Over the last 4 years I found that the official Jaguar service agents (25 km away) offered excellent support. 25 km is not always a convenient distance, however, so I did try using local neighbourhood mechanics for minor things, but quickly realized that they were going to struggle with anything more complicated.
Support became my number one priority
When it came to buying a replacement, the proximity of a fully trained and equipped service agent became my number one priority. There is only one such agency in my neighbourhood, and that is Toyota, so my first decision was that I was going to buy a Toyota.
Coming from a traditional I.T. vendor background, my approach to I.T. support has always been that it should be fully contracted 7×24, preferably with a 2-hour response time, for anything that the business depended on. But something has changed.
The support requirements for software haven’t really changed, but hardware is now a different game. Clustered, scale-out and web-scale systems, including hyper-converged (server/storage) systems, will typically re-protect themselves quickly after a node failure, thereby removing the need for panic-level hardware support response. Scale-out systems have a real advantage over standalone servers and dual-controller storage systems in this respect.
It has taken me some time to get used to not having 7×24 on-site hardware support, but the message from customers is that next-business-day service or next+1 is a satisfactory hardware support model for clustered mission-critical systems.
Nutanix gold-level support, for example, offers next-business-day on-site service (after failure confirmation), or next+1 if the call is logged after 3pm. Given a potential delay of a day or two, it is worth asking the question “What happens if a second node fails?”
If the second node fails after the data from the first node has been re-protected, the impact is no greater than that of a single host failure. You can continue to lose nodes in a Nutanix cluster, provided each failure happens after the short re-protection time has elapsed and there is still enough physical space to re-protect the VMs. (Readers familiar with the IBM XIV distributed cache grid architecture will also recognise this approach to rinse-and-repeat re-protection.)
This is discussed in more detail in a Nutanix blog post by Andre Leibovici.
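To make the rinse-and-repeat idea concrete, here is a rough back-of-envelope sketch in Python. It is not how Nutanix (or XIV) actually calculates anything. It assumes identical nodes, a fixed number of copies of every piece of data (two by default), failures that arrive one at a time with re-protection completing in between, and it ignores metadata, reserved space and any cluster minimum-size rules. The function name and parameters are made up for illustration.

```python
def sequential_failures_tolerated(nodes, node_capacity_tb, logical_data_tb,
                                  replication_factor=2):
    """Count one-at-a-time node failures a simplified cluster can absorb.

    Hypothetical model: after each failure the survivors must hold
    `replication_factor` full copies of the data on distinct nodes.
    """
    if nodes < replication_factor:
        raise ValueError("need at least one node per copy of the data")

    # Raw space consumed once every copy of the data is in place.
    raw_needed_tb = logical_data_tb * replication_factor
    if raw_needed_tb > nodes * node_capacity_tb:
        raise ValueError("data does not fit on the healthy cluster")

    failures = 0
    surviving = nodes
    while True:
        after_failure = surviving - 1
        # Copies must sit on different nodes, so we need enough nodes left.
        if after_failure < replication_factor:
            break
        # The survivors must have room for every copy of the data.
        if after_failure * node_capacity_tb < raw_needed_tb:
            break
        surviving = after_failure
        failures += 1
    return failures


if __name__ == "__main__":
    # Six 10 TB nodes holding 15 TB of data with two copies of everything.
    print(sequential_failures_tolerated(6, 10, 15))
    # Four 10 TB nodes that are already 90% full.
    print(sequential_failures_tolerated(4, 10, 18))
```

Under these assumptions the half-full six-node cluster can ride out three successive failures, while the four-node cluster at 90% full cannot re-protect after even one. That is the practical point when relying on next-business-day hardware support: the free space you keep in the cluster is what buys you the time.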
To find out more about options for scale-out infrastructure, try talking to ViFX.