Panzura – Distributed Locking & Cloud Gateway for CAD

I have been watching the multi-site distributed NAS space for some years now. There have been some interesting products, including NetApp’s FlexCache, which looked nice but never really seemed to gain market traction, and IBM Global Active Cloud Engine (Panache), which was released as a feature of SONAS and Storwize V7000 Unified. Microsoft have played on the edge of this field more successfully with DFS Replication, although that does not handle locking. Other technologies that encroach on this space include Microsoft SharePoint and WAN acceleration technologies such as Microsoft BranchCache and Riverbed.

What none of these has been very good at, however, is solving the problem of distributed collaborative authoring of large, complex, multi-layered documents (for example, cross-referenced CAD drawings) with high performance and robust locking.


It’s no surprise that the founders of Panzura came from a networking background (Aruba, Alteon), since the issues to be solved are those introduced by the network. Panzura is a global file system tuned for CAD files, and it’s not unusual for Panzura sites to see file load times fall to one tenth, or sometimes even one hundredth, of what they were before Panzura was deployed.

Rather than just providing efficient file locking, however, Panzura has taken the concept to the Cloud: caching appliances can be deployed at each work site, while the main data repository sits in Amazon S3 or Azure, for example. Panzura now claims to be the only proven global file locking solution that solves the cross-site collaboration issues of applications like Revit, AutoCAD, Civil3D and Bentley MicroStation, as well as SOLIDWORKS CAD and Siemens NX PLM applications. The problems of collaboration in these environments are well known to CAD users.


Panzura has been growing rapidly, with 400% revenue growth in 2013, and has just come off another record quarter and a record year for 2014. Back in 2013 the company decided to focus its energies on the Architecture, Engineering & Construction (AEC) market, since that was where the technology delivered the greatest return on customer investment. In that space it has been growing more than 1000% per year.

ViFX recently supplied Panzura to an international engineering company based in New Zealand. If you have problems with shared CAD file locking, please contact ViFX to see how we can solve them using Panzura.


Out of Space?

My wife has been complaining that we don’t have enough cupboard space, both in the kitchen, and also for linen. On the weekend we bought a dining room cabinet, and that allowed my wife to reorganize the kitchen cupboards and pantry.

What came to light was that the pantry in particular was so overloaded that it was very difficult to tell what was in it, and as a result we discovered six bottles of cooking oil (three of rice bran oil, three of olive oil), three containers of standard flour, two of high-grade flour, two of rice, two of brown sugar, two of white sugar, two opened packets of malt biscuits, two opened packets of crackers, and so on.

More capacity is always nice. My wife’s solution involved spending money on additional capacity, effort to select and install the cabinet, and hours sorting through the existing cupboards, drawers and pantry to work out what was there and decide where best to put things.

I have however always maintained that the real problem is that we own too much stuff. If the cupboards had been better organised in the first place, we would have owned fewer duplicates, and the odds are we would not have needed the new cabinet. But new capacity is always nice.

I am sure you have realised by now that the parallel with the world of IT Storage did not escape me. If I had to pay for ongoing support on the new cabinet and I knew it was only going to last 5 years, I would have been less keen on the acquisition and would have pushed back harder with the “we own too much stuff” line.

It seems that it’s easier to add more capacity than to ask the hard questions, but that’s not always a wise use of money.

To read more about right-sizing check out

Thank you for your I.T. Support

Back in 2011 I wrote a blog post about buying a new car, entitled “The Anatomy of a Purchase”. Well, the transmission on the Jag has given out and I am now the proud owner of a Toyota Mark X.


This time, however, the anatomy of the purchase was a little different. Over the last 4 years I found that the official Jaguar service agents (25 km away) offered excellent support. Because 25 km is not always a convenient distance, I did try using local neighbourhood mechanics for minor things, but quickly realized that they were going to struggle with anything more complicated.

Support became my number one priority

When it came to buying a replacement, the proximity of a fully trained and equipped service agent became my number one priority. There is only one such agency in my neighbourhood, and that is Toyota, so my first decision was that I was going to buy a Toyota.

I.T. Support

Coming from a traditional I.T. vendor background, my approach to I.T. support has always been that anything the business depends on should be covered by a fully contracted 7 x 24 service, preferably with a 2-hour response time. But something has changed.

Scale-Out Systems

The support requirements for software haven’t really changed, but hardware is now a different game. Clustered, scale-out and web-scale systems, including hyper-converged (server/storage) systems, will typically re-protect data quickly after a node failure, removing the need for a panic-level hardware support response. In this respect scale-out systems have a real advantage over standalone servers and dual-controller storage systems.

It has taken me some time to get used to not having 7×24 on-site hardware support, but the message from customers is that next-business-day service or next+1 is a satisfactory hardware support model for clustered mission-critical systems.


Nutanix gold level support, for example, offers next-business-day on-site service (after failure confirmation), or next+1 if the call is logged after 3pm. Given a potential delay of a day or two, it is worth asking: “What happens if a second node fails?”

If the second node fails after the data from the first node has been re-protected, the impact is the same as if only one host had failed. You can continue to lose nodes in a Nutanix cluster provided each failure happens after the short re-protection time has elapsed, and until you run out of physical space to re-protect the VMs. (Readers familiar with the IBM XIV distributed cache grid architecture will also recognise this rinse-and-repeat approach to re-protection.)


This is discussed in more detail in a Nutanix blog post by Andre Leibovici.
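The rinse-and-repeat re-protection argument can be put into rough numbers. The following Python sketch is illustrative only: the function name is hypothetical (it is not a Nutanix tool), and it assumes equal usable capacity per node, two copies of every block (replication factor 2), and failures spaced further apart than the re-protection time. It ignores metadata and reserved space, and it does not model a second failure occurring while re-protection is still in progress, which with only two copies could affect data that has not yet been re-protected.

```python
def max_sequential_failures(nodes: int, node_capacity_tb: float,
                            data_tb: float, rf: int = 2) -> int:
    """Count how many node failures a cluster can absorb one after another.

    Simplifying assumptions (illustrative only):
      * all nodes have the same usable capacity;
      * every block is kept as `rf` copies on different nodes;
      * each failure happens only after the previous one has been fully
        re-protected, so the cluster is back to `rf` copies every time.
    """
    failures = 0
    while True:
        survivors = nodes - (failures + 1)
        # The next failure is survivable only if the remaining nodes can still
        # hold `rf` distinct copies and have enough raw space for all of them.
        if survivors < rf or survivors * node_capacity_tb < data_tb * rf:
            return failures
        failures += 1


# 8 nodes x 10 TB usable, 20 TB of data (40 TB with two copies).
print(max_sequential_failures(nodes=8, node_capacity_tb=10, data_tb=20))  # prints 4
```

With these assumed figures the cluster can absorb four spaced-out failures; after that the remaining four nodes (40 TB) would have no room left to re-protect a fifth.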

To find out more about options for scale-out infrastructure, try talking to ViFX.


The Rise of I.T. as a Service Broker

Just a quick blog post today in the run-up to Christmas week. I thought I’d briefly summarize some of the things I have been dealing with recently, and also touch on the role of the I.T. department as we move boldly into a cloudy world.

We have seen I.T. move through the virtualization phase to deliver greater efficiency and some have moved on to the Cloud phase to deliver more automation, elasticity and metering. Cloud can be private, public, community or hybrid, so Cloud does not necessarily imply an external service provider.

Iterative Right-Sizing

One thing that has become clear is the need for right-sizing as part of any move to an external provider. External provision has a low base cost and a high metered cost, so you get the best value by making sure your allowances for CPU, RAM and disk are a reasonably tight fit with your actual requirements, and relying on service elasticity to expand as needed. The traditional approach of building lots of advance headroom into everything will cost you dearly. You cannot expect an external provider to deliver “your mess for less”; in fact, if you don’t right-size, what you will get is “your mess for more”.
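As a rough illustration of right-sizing, the Python sketch below sizes an allocation from observed demand rather than from a headroom rule of thumb, then compares metered costs. The usage samples, the nearest-rank 95th-percentile target, the 10% buffer and the per-vCPU price are all hypothetical assumptions; real providers sell fixed instance sizes, so in practice you would round up to the nearest available size.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample covering pct% of observations."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def right_size(samples: list[float], pct: int = 95, buffer: float = 0.10) -> float:
    """Allocate for the pct-th percentile of observed demand plus a small buffer,
    leaving rare peaks to the provider's elasticity."""
    return percentile(samples, pct) * (1 + buffer)


def monthly_cost(allocated: float, unit_price_per_month: float) -> float:
    """Metered cost modelled as allocation x unit price (base fee ignored)."""
    return allocated * unit_price_per_month


# Hypothetical hourly vCPU usage for one VM, sampled over a busy day.
usage = [2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 9]
traditional_alloc = 16            # "lots of advance headroom" sizing
tight_alloc = right_size(usage)   # 6 vCPU at p95, plus 10%
price = 20.0                      # hypothetical $ per vCPU per month

print(monthly_cost(traditional_alloc, price), round(monthly_cost(tight_alloc, price), 2))
# prints: 320.0 132.0
```

The single spike to 9 vCPU is deliberately left to elasticity rather than paid for every month.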


And it’s not necessarily true that all of your services are best served by the one or two tiers that a single Cloud provider offers. This is where the Hybrid Cloud comes in and, beyond that, where a Cloud Management Platform (CMP) comes in.

“Any substantive cloud strategy will ultimately require using multiple cloud services from different providers, and a combination of both internal and external cloud.” Gartner, September 2013, (Hybrid Cloud Is Driving the Shift From Control to Coordination).

A CMP such as VMware’s vRealize Automation, RightScale or Scalr can take you one step further than a simple Hybrid Cloud, allowing you to right-locate your services in a policy-driven, centrally managed way. This might mean keeping some services in-house, some in an enterprise I.T. focused Cloud with a high level of performance and wrap-around services, and some in a race-to-the-bottom Public Cloud focused primarily on price.
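One way to picture policy-driven right-location is as a set of rules that filter candidate venues and then choose the cheapest venue that qualifies. The Python sketch below illustrates that idea only; it is not the API of vRealize Automation, RightScale or Scalr, and the venue names, attributes and relative costs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Venue:
    name: str
    onshore: bool           # data stays in-country
    managed_services: bool  # DBA and other wrap-around services available
    elastic: bool           # capacity can burst on demand
    relative_cost: int      # lower is cheaper


@dataclass(frozen=True)
class Workload:
    name: str
    must_stay_onshore: bool
    needs_managed_services: bool
    needs_elasticity: bool


# Hypothetical venues matching the three options described above.
VENUES = [
    Venue("in-house", onshore=True, managed_services=True, elastic=False, relative_cost=2),
    Venue("enterprise-cloud", onshore=True, managed_services=True, elastic=True, relative_cost=3),
    Venue("public-cloud", onshore=False, managed_services=False, elastic=True, relative_cost=1),
]


def place(w: Workload, venues: list[Venue] = VENUES) -> str:
    """Pick the cheapest venue that satisfies every policy requirement of the workload."""
    eligible = [
        v for v in venues
        if (v.onshore or not w.must_stay_onshore)
        and (v.managed_services or not w.needs_managed_services)
        and (v.elastic or not w.needs_elasticity)
    ]
    if not eligible:
        raise ValueError(f"no venue satisfies the policy for {w.name}")
    return min(eligible, key=lambda v: v.relative_cost).name


for wl in [
    Workload("payroll-db", must_stay_onshore=True, needs_managed_services=True, needs_elasticity=False),
    Workload("customer-portal", must_stay_onshore=True, needs_managed_services=False, needs_elasticity=True),
    Workload("dev-test", must_stay_onshore=False, needs_managed_services=False, needs_elasticity=False),
]:
    print(wl.name, "->", place(wl))
```

With these assumed venues the payroll database stays in-house, the customer portal lands in the enterprise cloud because it must stay onshore and needs elasticity, and dev/test goes to the cheapest public cloud.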


Some organisations are indeed consuming multiple services from multiple providers, but very few are managing this in a co-ordinated, policy-driven manner. The kinds of problems that can arise are:

  • Offshore Public Cloud instances may be started up for temporary use and then forgotten rather than turned off, incurring unnecessary cost.
  • Important SQL database services might be running on a low cost IaaS with database administration duties neglected, creating unnecessary risk.
  • Low value test systems might be running on a high-service, high-performance enterprise cloud service, incurring unnecessary cost.
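The same kind of policy can also be used to audit what is already running. This hypothetical Python sketch flags each of the three problems listed above in a simple inventory; the field names, tier labels, roles and 30-day idle threshold are assumptions for illustration, not any particular CMP’s data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Instance:
    name: str
    tier: str            # "enterprise" or "low-cost" (hypothetical labels)
    role: str            # e.g. "sql", "test", "web"
    dba_assigned: bool
    last_used: date


def audit(instances: list[Instance], today: date, idle_days: int = 30) -> list[tuple[str, str]]:
    """Flag forgotten instances, unmanaged databases and over-served test systems."""
    findings = []
    for inst in instances:
        if (today - inst.last_used).days > idle_days:
            findings.append((inst.name, "idle: turn off or justify"))
        if inst.role == "sql" and inst.tier == "low-cost" and not inst.dba_assigned:
            findings.append((inst.name, "database on low-cost IaaS without DBA cover"))
        if inst.role == "test" and inst.tier == "enterprise":
            findings.append((inst.name, "test system on premium enterprise tier"))
    return findings


inventory = [
    Instance("temp-analytics", "low-cost", "web", False, date(2014, 9, 1)),
    Instance("orders-db", "low-cost", "sql", False, date(2014, 12, 15)),
    Instance("uat-01", "enterprise", "test", False, date(2014, 12, 18)),
    Instance("intranet", "enterprise", "web", False, date(2014, 12, 18)),
]
for name, issue in audit(inventory, today=date(2014, 12, 19)):
    print(f"{name}: {issue}")
```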

I.T. as a Service Broker

This layer of policy and management has a natural home in the I.T. department, acting as an enabler of enterprise-wide, in-policy consumption rather than as an obstacle.


With a Service Brokering capability, I.T. becomes the central point of control, provisioning, self-service and integration for all IT services, whether they are sourced internally or externally. This allows an organisation to mitigate the risks and take the opportunities associated with Cloud.


I will be enjoying the Christmas break and extending that well into January as is traditional in this part of the world where Christmas coincides with the start of Summer.

Happy holidays to all.

What is Cloud Computing?

I remember being entertained by Larry Ellison’s Cloud Computing rant back in 2009, in which he pointed out that cloud was really just processors, memory, operating systems, databases, storage and the internet. Larry was making a valid point (he also made a point about IT being a fashion-driven industry), but the positive goals of Cloud Computing should by now be much clearer to everyone.

When we talk about Cloud Computing it’s probably important that we try to work from a common understanding of what Cloud is and what the terms mean, and that’s where NIST comes in.

The National Institute of Standards and Technology (NIST) is an agency of the US Department of Commerce. In 2011, two years after Larry Ellison’s outburst, and after many drafts and years of research and discussion, NIST published its ‘Cloud Computing Definition’, stating:

“The definition is intended to serve as a means for broad comparisons of cloud services and deployment strategies, and to provide a baseline for discussion from what is cloud computing to how to best use cloud computing”.

“When agencies or companies use this definition they have a tool to determine the extent to which the information technology implementations they are considering meet the cloud characteristics and models. This is important because by adopting an authentic cloud, they are more likely to reap the promised benefits of cloud—cost savings, energy savings, rapid deployment and customer empowerment.”

The definition lists five essential characteristics, three service models and four deployment models. I have summarized them in this blog post to do my small bit in encouraging the widest possible adoption of this definition, giving us a common language and measuring stick for assessing the value of Cloud Computing.

The Five Essential Characteristics

  1. On-demand self-service.
    • A consumer can unilaterally provision computing capabilities without requiring human interaction with the service provider.
  2. Broad network access.
    • Support for a variety of client platforms including mobile phones, tablets, laptops, and workstations.
  3. Resource pooling.
    • The provider’s computing resources are pooled under a multi-tenant model, with physical and virtual resources dynamically assigned according to demand. There is a sense of location independence in that the customer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter).
  4. Rapid elasticity.
    • Capabilities can be elastically provisioned and released commensurate with demand. Scaling is rapid and can appear to be unlimited.
  5. Metering.
    • Service usage (e.g., storage, processing, bandwidth, active user accounts) can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the service.

The Three Service Models

  1. Software as a Service (SaaS).
    • The consumer uses the provider’s applications, accessible from client devices through either a thin client interface, such as a web browser (e.g., web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
  2. Platform as a Service (PaaS).
    • The consumer deploys consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment.
  3. Infrastructure as a Service (IaaS).
    • The consumer provisions processing, storage, networks and other fundamental computing resources, and can run a range of operating systems and applications. The consumer does not manage the underlying infrastructure but has control over operating systems, storage, and deployed applications, and possibly limited control of networking components (e.g., host firewalls).

Note that NIST has resisted the urge to define additional services such as Backup as a Service (BaaS), Desktop as a Service (DaaS), Disaster Recovery as a Service (DRaaS), etc., arguing that these are already covered in one way or another by the three ‘standard’ service models. This does lead to an interesting situation where one vendor will offer DRaaS or BaaS effectively as an IaaS offering, while another will offer it under more of a SaaS or PaaS model.

The Four Deployment Models

  1. Private cloud.
    • The cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units). It may be owned, managed, and operated by the organization, a third party, or some combination of them, and it may exist on or off premises.
  2. Community cloud.
    • The cloud infrastructure is provisioned for exclusive use by a specific community of consumers from organizations that have shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be owned, managed, and operated by one or more of the organizations in the community, a third party, or some combination of them, and it may exist on or off premises.
  3. Public cloud.
    • The cloud infrastructure is provisioned for open use by the general public. It exists on the premises of the cloud provider.
  4. Hybrid cloud.
    • The cloud infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are connected to enable data and application portability (e.g., cloud bursting for load balancing between clouds).

The NIST reference architecture also talks about the importance of the brokering function, which allows you to seamlessly deploy across a range of internal and external resources according to the policies you have set (e.g. cost, performance, sovereignty, security).

The NIST definition of Cloud Computing is the one adopted by ViFX and it is the simplest, clearest and best-researched definition of Cloud Computing I have come across.

2014 Update

On 22nd October 2014 NIST published a new two-volume document, “US Government Cloud Computing Technology Roadmap”, which identifies ten high-priority requirements for Cloud Computing adoption across five areas:

  • Security
  • Interoperability
  • Portability
  • Performance
  • Accessibility

The purpose of the document is to provide a cloud roadmap for US Government agencies, highlighting ten high-priority requirements to ensure that the benefits of cloud computing can be realized. Requirements seven and eight are particular to the US Government, but the others are generally applicable. My interpretation of NIST’s ten requirements is as follows:

  1. Standards-based products, processes, and services are essential to ensure that:
    • Technology investments do not become prematurely obsolete
    • Agencies can easily change cloud service providers
    • Agencies can economically acquire or develop private clouds
  2. Security technology solutions must be able to accommodate a wide range of business rules.
  3. Service-Level Agreements for performance and reliability should be clearly defined and enforceable.
  4. Multi-vendor consistent descriptions are required to make it easier for agencies to compare apples to apples.
  5. Federation in a community cloud environment needs more mature mechanisms to enable mutual sharing of resources.
  6. Data location and sovereignty policies are required so as to avoid technology limits becoming the de facto drivers of policy.
  7. US Federal Government requires special solutions that are not currently available from commercial cloud services.
  8. US Federal Government requires nation-scale non-proprietary technology including high security and emergency systems.
  9. High availability design goals, best practices, measurement and reporting are required to avoid catastrophic failures.
  10. Metrics need to be standardized so services can be sized and consumed with a high degree of predictability.

These are all worthwhile requirements, and there’s also a loopback here to some of Larry Ellison’s comments. Larry spoke about seeing value in rental arrangements, but also touched on the importance of innovation. NIST is trying to standardize and level the playing field to maximize value for consumers, but history tells us that vendors will try to innovate to differentiate themselves. For example, with the launch of VMware’s vCloud Air we are seeing the dominant player in infrastructure management software today staking its claim to establish itself as the de facto software standard for hybrid cloud. But that is really a topic for another day…


Storage Spaghetti Anyone?

I recall Tom West (Chief Scientist at Data General, and star of The Soul of a New Machine) telling me, when he visited New Zealand, of an old saying: “Hardware lasts three years, Operating Systems last 20 years, but applications can go on forever.”

Over the years I have known many application developers and several development managers, and one thing they seem to agree on is that it is almost impossible to maintain good code structure inside an app over a period of many years. The pressure of feature deadlines, changes in markets, fashion and the way people use applications, the occasional weak programmer or weak dev manager, and temporary lapses in discipline due to other pressures all contribute to fragmentation over time. It is generally through this slow attrition that apps end up full of structural compromises, with the occasional corner that is complete spaghetti.

I am sure there are exceptions, and there can be periodic rebuilds that improve things, but rebuilds are expensive.

If I think about the OS layer, I recall Data General rebuilding much of their DG/UX UNIX kernel to make it more structured, because they considered the System V code to be pretty loose. Similarly, IBM rebuilt UNIX into a more structured AIX kernel around the same time, and Digital UNIX (OSF/1) was also a rebuild based on Mach. Ironically, HP-UX eventually won out over Digital UNIX after the merger, even though HP-UX was rumoured to be the much less structured product, a choice that I’m told has slowed a lot of ongoing development. Microsoft rebuilt Windows as NT, and Apple rebuilt Mac OS on the Mach kernel.

So where am I heading with this?

Well, I have discussed this topic with a couple of people recently in relation to storage operating systems. If I line up some storage OSs and their approximate date of original release, you’ll see what I mean:

Storage OS                             Original release   Age (as of 2014)
NetApp Data ONTAP                      1992               22 years
EMC VNX / CLARiiON                     1993               21 years
IBM DS8000 (assuming ESS code base)    1999               15 years
HP 3PAR                                2002               12 years
IBM Storwize                           2003               11 years
IBM XIV / Nextra                       2006               8 years
Nimble Storage                         2010               4 years

I’m not trying to suggest that this is a line-up in reverse order of quality, and no doubt some vendors might claim rebuilds or superb structural discipline, but knowing what I know about software development, the age of the original code is certainly a point of interest.

With the current market disruption in storage, cost pressures are bound to take their toll on development quality, and the problem is amplified if vendors try to save money by outsourcing development to non-integrated teams in low-cost countries (e.g. building your GUI in Romania, or your iSCSI module in India).


Decoupling Storage Performance from Capacity

Decoupling storage performance from storage capacity is an interesting concept that has gained extra attention in recent times. Decoupling is predicated on a desire to scale performance when you need performance and to scale capacity when you need capacity, rather than relying on traditional spindle-based scaling to deliver both performance and capacity.

Also relevant is the idea that today’s legacy disk systems are holding back app performance. For example, VMware apparently claimed that 70% of all app performance support calls were caused by external disk systems.

The Business Value of Storage Performance

IT operations have spent the last 10 years trying to keep up with capacity growth, with less focus on performance growth. The advent of flash has shown, however, that even if you don’t have a pressing storage performance problem, adding flash will generally make your whole app environment run faster, and that can mean business advantages ranging from better customer experiences to more accurate business decision making.

A Better Customer Experience

My favorite example of performance affecting customer experience comes from my past dealings with an ISP of which I was a residential customer. I was talking to a call centre operator who explained that ‘the computer was slow’ and that it would take a while to pull up the information I was seeking. We chatted as he slowly navigated the system, and one of the things he was keen to talk about while we waited was how much he disliked working for that ISP :o

I have previously referenced a US mobile phone company that replaced all of its call centre storage with flash, specifically to deliver a better customer experience. The challenge with that is cost. The CIO was quoted as saying that going all-flash cost not much more per TB than he had paid for tier-1 storage in the previous buying cycle (i.e. 3 or maybe 5 years earlier). In effect he was conceding that he was now paying more per TB for tier-1 storage than he had some years ago. Because the environment deployed did not decouple performance from capacity, however, that company has almost certainly over-provisioned storage performance significantly, hence the cost per TB being higher than in the last buying cycle.

More Accurate Business Decision Making

There are many examples of storage performance improvements leading to better business decisions, most typically in the area of data warehousing. When business intelligence reports contain more up-to-date data and run more quickly, they are used more often, and decisions are more likely to be evidence-based rather than based on intuition. I recall one CIO telling me about a meeting of his company’s executive leadership team some years ago where each exec was asked to write down the name of the company’s largest supplier, and each wrote a different name, illustrating the risk of making decisions based on intuition rather than on evidence and business intelligence.

Decoupling Old School Style

Of course we have always been able to decouple performance and capacity to some extent, and it was traditionally called tiering. You could run your databases on small, fast drives in RAID10 and your less demanding storage on larger drives in RAID5 or RAID6. What that didn’t necessarily give you was a lot of flexibility.

Products like IBM’s SAN Volume Controller introduced the flexibility to move volumes between tiers in real time, and more recently VMware’s Storage vMotion has provided a subset of the same functionality.

And then sub-LUN tiering (Automatic Data Relocation, Easy Tier, FAST, etc.) reduced the need for volume migration as a means of managing performance, by automatically promoting hot chunks to flash and dropping cooler chunks to slower disks. You could decouple performance from capacity somewhat by choosing your flash-to-disk ratio appropriately, but you still typically had to be careful with these solutions, since the performance of random writes that do not go to flash, for example, would depend heavily on the disk spindle count and speed.
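The promote/demote decision behind sub-LUN tiering can be sketched in a few lines of Python. This is a generic illustration, not the actual algorithm used by Easy Tier, FAST or Automatic Data Relocation; real implementations typically smooth access statistics over time and rate-limit migrations. The extent IDs, the monitoring window and the function name are assumptions.

```python
def rebalance(access_counts: dict[int, int], on_flash: set[int],
              flash_slots: int) -> tuple[set[int], set[int]]:
    """Decide which extents to promote to flash and which to demote to disk.

    access_counts maps extent id -> I/Os seen in the last monitoring window
    (a hypothetical statistic). The hottest `flash_slots` extents should
    live on flash; everything else belongs on disk.
    """
    hottest = sorted(access_counts, key=access_counts.get, reverse=True)[:flash_slots]
    target = set(hottest)
    promote = target - on_flash   # hot extents currently on disk
    demote = on_flash - target    # cooled-off extents currently on flash
    return promote, demote


counts = {0: 5, 1: 120, 2: 40, 3: 300, 4: 2}
promote, demote = rebalance(counts, on_flash={0, 1}, flash_slots=2)
print(sorted(promote), sorted(demote))  # prints [3] [0]
```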

So for the most part, decoupling storage performance and capacity in an existing disk system has been about adding flash and trying not to hit internal bottlenecks.

Traditional random I/O performance is therefore a function of:

  1. the amount (or percentage) of flash relative to the working set size of the data blocks
  2. the number and speed of disk spindles
  3. bus and cache (and sometimes CPU) limitations
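These three factors can be combined into a crude back-of-envelope model. The Python sketch below is an assumption-laden illustration, not a vendor sizing tool: it treats the flash hit ratio as the fraction of the working set that fits in flash (uniform access), lets the disks serve only the misses, and uses a single controller limit to stand in for bus, cache and CPU bottlenecks. RAID write penalties and queueing effects are ignored.

```python
def sustainable_random_iops(flash_tb: float, working_set_tb: float,
                            spindles: int, iops_per_spindle: float,
                            controller_limit_iops: float) -> float:
    """Crude upper bound on random I/O for a flash-plus-disk system."""
    # Factor 1: flash relative to working set (uniform access assumed).
    hit_ratio = min(1.0, flash_tb / working_set_tb)
    miss_ratio = 1.0 - hit_ratio
    if miss_ratio == 0:
        # Everything fits in flash, so only factor 3 limits throughput.
        return controller_limit_iops
    # Factor 2: the disks must absorb every miss.
    disk_bound = spindles * iops_per_spindle / miss_ratio
    # Factor 3: bus, cache and CPU, modelled as one controller ceiling.
    return min(disk_bound, controller_limit_iops)


print(round(sustainable_random_iops(2, 10, 100, 150, 200_000)))  # prints 18750
```

For example, with 2 TB of flash against a 10 TB working set, 100 spindles at an assumed 150 IOPS each and a 200,000 IOPS controller limit, the sketch gives about 18,750 IOPS, because 80% of I/Os still go to disk. Real workloads are usually skewed, so actual hit ratios tend to be better than this uniform-access estimate.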

Two products that bring their own twists to the game:

Nimble Storage


Nimble Storage uses flash to accelerate random reads, and accelerates writes by compressing them into sequential 4.5MB stripes (compare this with IBM Storwize Real-time Compression (RtC), which compresses into 32K chunks, and you can see that Nimble is doing something a little different).

Nimble performance is therefore primarily a function of

  1. the amount of flash (read cache)
  2. the CPU available to do the compression/write coalescing

The number of spindles is not quite so important when you’re writing 4.5MB stripes. Nimble systems generally support at least 190 TB nett (assuming an average compression ratio of 1.5x, or 254 TB if you expect 2x) from 57 disks, and Nimble claims that performance is largely decoupled from disk space, since you will generally hit the limits of flash and CPU before you hit the limit of sequential writes to disk. This kind of decoupling also allows you to get good performance and capacity in a very small amount of rack space. Nimble also offers CPU scaling in the form of a four-way scale-out cluster.
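The write-coalescing idea can be illustrated with a toy Python model: small random writes are compressed and buffered, then written out as one large sequential stripe, so the disks see a few big sequential writes instead of many small random ones, while the compression work shifts to the CPU. This is a hypothetical sketch using the standard zlib module, not Nimble’s implementation; the class name, the address index and the stripe-size default are assumptions.

```python
import zlib


class WriteCoalescer:
    """Buffer small random writes, compress them, and flush them as one large
    sequential stripe. A toy model of the idea, not a vendor implementation."""

    def __init__(self, stripe_bytes: int = 4_500_000):  # roughly the 4.5MB stripe
        self.stripe_bytes = stripe_bytes
        self.pending: list[tuple[int, bytes]] = []          # (address, compressed data)
        self.pending_bytes = 0
        self.stripes: list[bytes] = []                      # each = one sequential disk write
        self.index: dict[int, tuple[int, int, int]] = {}    # address -> (stripe, offset, length)

    def write(self, address: int, data: bytes) -> None:
        compressed = zlib.compress(data)  # CPU does the work here
        self.pending.append((address, compressed))
        self.pending_bytes += len(compressed)
        if self.pending_bytes >= self.stripe_bytes:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        stripe_no, offset = len(self.stripes), 0
        for address, compressed in self.pending:
            # Later writes to the same address overwrite earlier index entries.
            self.index[address] = (stripe_no, offset, len(compressed))
            offset += len(compressed)
        self.stripes.append(b"".join(c for _, c in self.pending))
        self.pending, self.pending_bytes = [], 0

    def read(self, address: int) -> bytes:
        for pending_address, compressed in reversed(self.pending):
            if pending_address == address:  # newest unflushed copy wins
                return zlib.decompress(compressed)
        stripe_no, offset, length = self.index[address]
        return zlib.decompress(self.stripes[stripe_no][offset:offset + length])


c = WriteCoalescer(stripe_bytes=1_000)  # tiny stripe so the demo flushes
for addr in range(100):
    c.write(addr, bytes([addr % 256]) * 4096)  # highly compressible 4 KiB blocks
c.flush()
print(len(c.stripes), "sequential writes instead of 100 random ones")
```

The number of disk writes tracks the number of stripes rather than the number of incoming writes, which is why spindle count matters less and CPU matters more in this kind of design.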

Nimble have come closer to decoupling performance and capacity than any other external storage vendor I have seen.

PernixData FVP

PernixData Flash Virtualization Platform (FVP) is a software solution designed to build a flash read/write cache inside a VMware ESXi cluster, thereby accelerating I/Os without needing to add anything to your external disk system. PernixData argue that it is more cost-effective and efficient to add flash to the ESXi hosts than to external storage systems. This has something in common with the current trend towards converged scale-out server/storage solutions, but PernixData also works with existing external SAN environments.

There is criticism that flash deployed in external storage is too far away from the app to be efficient. I recall Amit Dave (IBM Distinguished Engineer) recounting an analogy between I/O and eating, for which I have created my own version below:

  • Data in the CPU cache is like food in your spoon
  • Data in the server RAM is like food on your plate
  • Data in the shared Disk System cache is like food in the serving bowl in the kitchen
  • Data on the shared Disk System SSDs is like food you can get from your garden
  • Data on hard disks is like food in the supermarket down the road

PernixData works by keeping your data closer to the CPU, decoupling performance and capacity by focusing on a server-side caching layer that scales alongside your ESXi compute cluster. This is analogous to getting food from your table rather than from your garden. With PernixData you tend to scale performance as you add more compute nodes, rather than when you add more back-end capacity.
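The eating analogy is really about average access latency: the more requests served close to the CPU, the lower the average. The Python sketch below computes a weighted average through a cache hierarchy. All latencies and hit ratios are hypothetical round numbers chosen for illustration, not PernixData or disk-array measurements, and the cost of checking levels that miss is ignored.

```python
def average_latency_us(levels: list[tuple[str, float, float]]) -> float:
    """Average read latency through a cache hierarchy.

    levels: (name, hit_ratio, latency_us), ordered from closest to the CPU
    to furthest. hit_ratio is the fraction of requests reaching that level
    that are served there; the last level must serve everything (1.0).
    """
    remaining, total = 1.0, 0.0
    for _name, hit_ratio, latency_us in levels:
        served = remaining * hit_ratio
        total += served * latency_us
        remaining -= served
    if remaining > 1e-9:
        raise ValueError("the last level must serve all remaining requests")
    return total


# Hypothetical figures: serving bowl, garden and supermarket...
array_only = [("array cache", 0.3, 300), ("array SSD", 0.4, 600), ("hard disk", 1.0, 8000)]
# ...versus putting food on the table first (host-side flash).
with_host_flash = [("host flash", 0.7, 100)] + array_only

print(round(average_latency_us(array_only)), round(average_latency_us(with_host_flash)))
# prints: 3618 1155
```

With these assumed numbers, serving 70% of reads from host-side flash cuts the average read latency from about 3.6 ms to about 1.2 ms.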

To Decouple or not to Decouple?

Decoupling as a theoretical concept is surely a good thing – independent scaling in two dimensions – and it is especially nice if it can be done without introducing significant extra cost, complexity or management overhead.

It is however probably also fair to say that many other systems can approximate the effect, albeit with a little more complexity.



Jim Kelly holds PernixPrime accreditation from PernixData and is a certified Nimble Storage Sales Professional. ViFX is a reseller of both Nimble Storage and PernixData.
