IBM N Series (Netapp) Capacity Sizing

One thing I left off this post is a discussion of fractional reserve (FR). This can be a major issue and I should have covered it. FR is about guaranteeing that you have space to write changed blocks in your active filesystem, and some people allow 100% extra space for it when provisioning LUNs out of ONTAP. It’s hard to explain clearly – I’ve seen many try and fail. I find it confusing and complicated myself, and just when I think I understand it, I end up with more questions. So I will just issue a general warning: if you are creating LUNs under ONTAP and you have snapshotting enabled, discuss with your installer how much space needs to be set aside for the FR.

[Now updated to include base2 results]

I thought a quick post on calculating nett capacity with IBM N Series might be in order, since we have been caught out once or twice with this in the past. Hopefully this post will help others avoid problems of accidental under-capacity.

There are two tools available internally for these calculations: Netapp’s Synergy and IBM’s Capacity Magic. Like the man with two watches, if you use both tools you end up with different answers, so you’re not quite sure which is correct.

Let me pick two examples:

1. N3400 (FAS2040 HA) with 36 x 600GB SAS 15K drives

I will specify two arrays of 16 drives each (14+2 RAID-DP) plus 4 hot spares. I’m not deducting snapshot space, and I want ONTAP to be on a Flexvol (i.e. a logical allocation rather than dedicating three drives for each ONTAP image).

  • Capacity Magic tells me that the nett useable is 13,531.50 GB (12,602 GiB) or 70% of raw (hot spares excluded)
  • Netapp’s Synergy Ad Hoc Capacity Visualizer gives 14.8 TB or 77% of raw (hot spares excluded)

I’m not quite sure where the extra 1.3TB comes from since both tools report 28 data drives and 4 parity drives.
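
As a rough check, the percentages and the gap for this first example can be reproduced in a few lines of Python. This is only a sketch: the variable and function names are illustrative, drive sizes are taken at their nominal decimal value, and the 32-drive denominator assumes the percentages exclude the four hot spares.

```python
# Sketch: usable capacity as a percentage of raw for example 1 (decimal GB).
# All names here are illustrative; the figures are the ones quoted in the post.
DRIVE_GB = 600        # nominal drive size
TOTAL_DRIVES = 36
HOT_SPARES = 4


def pct_of_raw(usable_gb, drives, drive_gb=DRIVE_GB):
    """Return usable capacity as a percentage of the raw capacity of `drives`."""
    return 100.0 * usable_gb / (drives * drive_gb)


capacity_magic_gb = 13_531.50   # Capacity Magic nett useable
synergy_gb = 14_800.0           # Synergy figure of 14.8 TB

in_arrays = TOTAL_DRIVES - HOT_SPARES   # 32 drives in the two RAID-DP groups

cm_pct = pct_of_raw(capacity_magic_gb, in_arrays)
syn_pct = pct_of_raw(synergy_gb, in_arrays)
cm_pct_all = pct_of_raw(capacity_magic_gb, TOTAL_DRIVES)  # spares counted as raw
gap_tb = (synergy_gb - capacity_magic_gb) / 1000

print(f"Capacity Magic: {cm_pct:.1f}% of 32 drives, {cm_pct_all:.1f}% of all 36")
print(f"Synergy:        {syn_pct:.1f}% of 32 drives")
print(f"Gap between the two tools: {gap_tb:.2f} TB")
```

Counting the hot spares as raw capacity lowers the Capacity Magic percentage noticeably, so it is worth stating which denominator a quoted efficiency figure uses.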

2. N6060 (FAS3160 HA) with 224 x 1TB SATA

I will use 14 arrays of 16 drives each (14+2) and ignore hot spares. I’m not deducting snapshot space, and for ease of comparison with Synergy I will assume that ONTAP is on a flexvol rather than on a dedicated 1+2 array (which would generally be my preference on a system this large).

  • Capacity Magic reports base10 nett useable of 144,866 GB = 144.866 TB
  • Capacity Magic reports base2 nett useable of 134,917 GiB = 131.75 TiB
  • Netapp’s Synergy Ad Hoc Capacity Visualizer gives 156.768 TB (base10)
  • By popular demand, here also is the base2 result from Netapp’s Synergy Ad Hoc Capacity Visualizer: 142.579 TiB

Again, I’m not quite sure where the extra 12TB or 10.8TiB came from since both tools report 196 data drives and 28 parity drives.

Capacity Magic output is as follows (reports both base2 and base10):

Synergy output is as follows for base10:

Synergy output is as follows for base2:

So what we see is that Capacity Magic tells us 134,917 GiB (base2), which in TiB is 134,917 / 1024 = 131.754 TiB

…and Synergy tells us 142.579 TiB which is 8.2% higher.
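
The unit conversions behind these figures are easy to get wrong by hand, so here is a minimal Python sketch of them. The helper names are illustrative, and the inputs are simply the Capacity Magic and Synergy figures quoted above.

```python
# Sketch: decimal/binary unit conversions for the example 2 figures.
GB = 10**9      # decimal gigabyte, in bytes
GIB = 2**30     # binary gibibyte, in bytes
TIB = 2**40     # binary tebibyte, in bytes


def gb_to_gib(gb):
    """Convert decimal gigabytes to binary gibibytes."""
    return gb * GB / GIB


def gib_to_tib(gib):
    """Convert gibibytes to tebibytes (equivalent to dividing by 1024)."""
    return gib * GIB / TIB


cm_gb = 144_866        # Capacity Magic, base10
cm_gib = 134_917       # Capacity Magic, base2
synergy_tib = 142.579  # Synergy, base2

# Capacity Magic's two figures should describe the same amount of space.
print(f"{cm_gb} GB is about {gb_to_gib(cm_gb):,.0f} GiB")

cm_tib = gib_to_tib(cm_gib)
pct_higher = 100 * (synergy_tib / cm_tib - 1)
print(f"Capacity Magic: {cm_tib:.3f} TiB")
print(f"Synergy is {pct_higher:.1f}% higher")
```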

What installers tell me anecdotally, however, is that the Capacity Magic figures are closer to reality. They suggest that even the Capacity Magic figures might fail to include the additional 5% that aggregates require for the internal metadata snaps that maintain integrity, and they also tell me it is best to avoid filling flexvols past 95%.

I’m not an expert on precisely where the space is allocated (e.g. 10% WAFL, 12.5% block checksums etc) but whatever the ins and outs of it all, after one or two incidents of ending up with less storage than we expected, I have now taken to using Capacity Magic’s figures and then deducting 10% 5%, which leads to a rule of thumb of “about 60%” of raw.
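
To see how this kind of derating lands near the rule of thumb, here is a hedged Python sketch applied to the second example. It assumes the installers' two suggestions (a 5% aggregate reserve and a 95% fill ceiling on flexvols) both apply and compound multiplicatively, and that raw capacity is 224 drives at a nominal 1 TB each. The function name and parameters are illustrative, and the exact deductions behind the rule of thumb may differ.

```python
# Sketch: derate a sizing-tool figure by an aggregate reserve and a fill ceiling.
# Assumption: both deductions apply and multiply; adjust to match your site.

def derate(usable, aggr_reserve=0.05, max_fill=0.95):
    """Return the planning capacity after the reserve and the fill ceiling."""
    return usable * (1 - aggr_reserve) * max_fill


raw_tb = 224 * 1.0             # 224 x 1 TB SATA, nominal decimal TB
capacity_magic_tb = 144.866    # Capacity Magic base10 figure for example 2

planning_tb = derate(capacity_magic_tb)
pct_of_raw = 100 * planning_tb / raw_tb

print(f"Planning capacity: {planning_tb:.1f} TB ({pct_of_raw:.0f}% of raw)")
```

Under these assumptions the result comes out a little under 60% of raw, which is in the same territory as the rule of thumb.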

The above discrepancies have nothing to do with the question of counting in binary or decimal.
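
One way to check this is to compare the ratio between the two tools in each base. The sketch below uses only the example 2 figures quoted in this post; the dictionary names are illustrative.

```python
# Sketch: the Synergy / Capacity Magic ratio, computed separately in each base.
capacity_magic = {"TB": 144.866, "TiB": 134_917 / 1024}
synergy = {"TB": 156.768, "TiB": 142.579}

ratios = {unit: synergy[unit] / capacity_magic[unit] for unit in capacity_magic}

for unit, ratio in ratios.items():
    print(f"{unit}: Synergy / Capacity Magic = {ratio:.4f}")
```

If the gap were a units mix-up, the two ratios would differ by roughly the 9 to 10% that separates TB from TiB. Instead they agree to within rounding, at a little over 1.08 in both bases.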

As I said, hopefully this post will help others avoid problems of accidental under-capacity. The moral of the story is to be conservative, since the calculation of nett space is clearly not as simple as it might appear at first glance.

7 Responses

  1. Hi Jim,

    Maybe you want to read this, for what I think is a pretty comprehensive explanation:

    http://bit.ly/b4hBig

    I’d say, since you’re with IBM, that you probably should look at getting lab access to a real system and check those figures for real. From what you write it seems this hasn’t happened.

    For your first example (2040 w/ 36x 600GB) Synergy shows me 13.4TB (4 spares, 4 parity, 28 data).

    70% efficiency based on the real, base2 size of the drives.

    Adding the (OPTIONAL) 5% reserve shows 12.7TB. Which is pretty much what your Capacity Magic shows.

    66.5% efficiency based on the real, base2 size of the drives.

    You can also go down to 2 spares (which is fine), and increase the size of each RG to 17 (just 1 more drive). This takes you to 14.4TB usable (base2) or 75% efficiency.

    Even adding the (OPTIONAL) 5% snap reserve, you’re at 13.6 (71% efficiency).

    Similar story with the bigger system in your example, Synergy doesn’t give me anywhere near the number you posted. I get 132TB usable with the 5% reserve (73% usable). Or, 139.6TB without (77% usable).

    Having an optional separate root aggr per controller (similar to EMC’s 5-drive FLARE) reduces this by 6 drives. Which I’m not counting towards usable space. You’re then at 128.5TB with the aggr snap reserve and 135TB without (71% and 74.7% respectively).

    So, it looks as if you’re not using Synergy quite correctly. I’d be happy to provide a tutorial, it would help you sell more of our systems.

    D


    • @Dimitris Krekoukias

      I have posted the screenshot from Synergy so you can see where the figures come from.

      I didn’t count spares when I calculated percentages, but I do think you need to count root drives if you separate them out.

      What about the suggestion that you shouldn’t fill a flexvol more than 95%?


    • Dimitris, I am the N Series FTSS for the northeast.

      If you have time in your schedule, I would be interested in speaking
      with you about Synergy. I am responsible for all N Series sales in a 13 state region.

      Any additional insight that you can provide would be most greatly appreciated.

      Best regards,
      Denis

      demond@us.ibm.com
      Cell: 508-816-0727


  2. Jim, please stop using base10 at the top left – you need to switch to base2 :) That’s the source of your problems… simple as that.

    Then you’ll get real numbers. Base10 in the GUI is just for comparison purposes when fighting vendors that quote base10 (you know who you are!)

    I had all necessary spares for the numbers in my config, and even had separate root drives as an option for the larger one (check my reply). Pointless quoting storage any other way – when I say usable, I mean usable after it’s all said and done!

    In the smaller box you wouldn’t separate out the root aggr.

    You can fill a flexvol if you’d like, check my post (linked in my first reply) on usable space for all the info. If it’s totally 100% full and it’s not set to auto-grow, trying to do some things like dedup and snaps might fail.

    Also, your original entry creates an unnecessary negative vibe around NetApp usable space, simply because you made a mistake in one of the sizer options.

    http://bit.ly/b4hBig is all you need…

    D


    • As long as one base is chosen consistently the numbers should still match. Or are you saying there’s a bug in Synergy that miscalculates the base10 figures? I am confident that if I do a base2 comparison I will get the same discrepancy unless there is a bug in Synergy.

      base10 is the SI, IEEE, and SNIA standard, so no I won’t stop using base10.

      By saying “You can fill a flexvol… If it’s totally 100% full and it’s not set to auto-grow, trying to do some things like dedup and snaps might fail.” you are confirming that it would be most unwise to assume you can fill a flexvol to 100%. Thanks for clearing that up for me.

      We have installed a few largish N series sites this year and I am involved in finalizing three smaller sites as we speak. I am not creating an unnecessary negative vibe around Netapp, I am trying to understand the actual situation re capacity and avoid over-promising and under-delivering. IBM Business Conduct Guidelines regularly remind us that we should be very clear with customers about things like this.


  3. I never use base10 – it causes exactly what you describe, people buy a box (ANYONE’S BOX) and don’t see the advertised space. Computers deal in base2…

    So when NetApp engineers are asked to provide a certain capacity, it will always be base2.

    All the numbers I did were base2 and seem to agree a lot more with capacity magic than what you did, so I think you’d be better served by going base2 for sizing of any storage system.

    Frankly, I can’t even believe we actually had this argument.

    D


  4. Let’s call it a discussion. I managed to find time to re-run the numbers with base2, and as expected the discrepancy remains. (see red text in the updated post)

    So to summarise:
    1. Our installers tell us that customers can use less than Capacity Magic advises.
    2. My testing shows that Capacity Magic is around 8% more conservative than Synergy.
    3. Therefore if you size using Synergy you are likely to significantly over-estimate real usable capacity, and so under-size the system.

    Remember that the reason for this post in the first place is that we have had just this experience, where the customer ended up disappointed in the usable capacity.

