Out there in IBM land the field technical and sales people are often given a guideline of between 5% and 10% of total NAS capacity being allocated for metadata on SONAS or Storwize V7000 Unified systems. I instinctively knew that 10% was too high, but like an obedient little cog in the machine I have been dutifully deducting 5% from the estimated nett capacity that I have sized for customers – but no more!
Being able to size metadata more accurately becomes especially important when a customer wants to place the metadata on SSDs to speed up file creation and deletion and, more particularly, the inode scans associated with replication or anti-virus.
[updated slightly on 130721]
The theory of GPFS metadata sizing is explained here. The really short version is that in most cases you will be OK allowing 1 KiB per file per copy of metadata, but the worst-case metadata sizing (when using extended attributes, for things like HSM) should be 16.5 KiB * (filecount + directorycount) * 2, where the factor of 2 covers GPFS HA mirroring.
- if you have 20,000 files and directories the metadata space requirement should be no more than 16.5 * 20,000 * 2 = 660,000 KiB = 645 MiB
- if you have 40 million files and directories the metadata space requirement should be no more than 16.5 * 40,000,000 * 2 = 1,320,000,000 KiB = 1.23 TiB
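The two worked examples above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in this post: the function name and its parameters are my own labels, not anything from SONAS or GPFS, and the defaults simply encode the worst-case 16.5 KiB per object and the two mirrored copies.

```python
KIB_PER_MIB = 1024
KIB_PER_TIB = 1024 ** 3


def worst_case_metadata_kib(object_count, kib_per_object=16.5, copies=2):
    """Metadata space in KiB for a given count of files plus directories.

    kib_per_object=16.5 is the worst-case figure quoted in this post;
    copies=2 accounts for the mirrored second copy of the metadata.
    (Hypothetical helper, named for illustration only.)
    """
    return kib_per_object * object_count * copies


small = worst_case_metadata_kib(20_000)
large = worst_case_metadata_kib(40_000_000)

print(f"{small:,.0f} KiB = {small / KIB_PER_MIB:.0f} MiB")   # 660,000 KiB = 645 MiB
print(f"{large:,.0f} KiB = {large / KIB_PER_TIB:.2f} TiB")   # 1,320,000,000 KiB = 1.23 TiB
```

Passing kib_per_object=1.0 gives the more relaxed "1 KiB per file per copy" estimate instead of the worst case.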
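So why isn’t 5% a good assumption? What I am tending to see is that average file size on a general purpose NAS is around 5 MB rather than the default assumption of 1 MB or lower. At 5 MB per file, even the worst case of 16.5 KiB * 2 = 33 KiB of metadata per file is well under 1% of the data it describes, and at 1 MB per file it is still only a little over 3%.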
So it’s more important to have a conservative estimate of your filecount (and directory count) than it is to know your capacity.
The corollary for me is that budget-conscious customers are more likely to be able to afford to buy enough SSDs to host their metadata, because we may be talking 1% rather than 5%.
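To see how the percentage moves with average file size, here is a rough Python sketch. It makes some simplifying assumptions of my own: directories are ignored, MB is treated as MiB, and the percentage is metadata relative to the data it describes rather than to total capacity. The function name is hypothetical.

```python
def metadata_percent_of_data(avg_file_size_mib, kib_per_object=16.5, copies=2):
    """Worst-case metadata as a percentage of the data it describes.

    Assumes one metadata entry per file (directories ignored) and
    treats the average file size as MiB for simplicity.
    """
    metadata_kib = kib_per_object * copies      # per file, both mirrored copies
    data_kib = avg_file_size_mib * 1024         # average file size in KiB
    return 100.0 * metadata_kib / data_kib


for size_mib in (1, 5):
    pct = metadata_percent_of_data(size_mib)
    print(f"average file size {size_mib} MiB -> about {pct:.2f}% metadata")
# average file size 1 MiB -> about 3.22% metadata
# average file size 5 MiB -> about 0.64% metadata
```

The result depends only on the average file size, which is another way of saying that file count, not raw capacity, is what drives the metadata requirement.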
Note: When designing SSD RAID sets for metadata, SONAS/V7000U/GPFS will want to mirror the metadata across two volumes, so ideally those volumes should be on different RAID sets.
Because of the big difference between the 16.5 KiB formula and the 5% to 10% guideline, I’d be keen to get additional validation of the formula from other real users of Storwize V7000 Unified or SONAS (or maybe even general GPFS users). Let me know what you are seeing on your own systems out there. Thanks.