My more recent blog articles are now appearing on LinkedIn

Please note that my more recent blog articles are now being posted on LinkedIn:

https://www.linkedin.com/in/jimkellynz/detail/recent-activity/posts/

The Cool Way to do Security Analytics

https://www.linkedin.com/pulse/cool-way-do-security-analytics-jim-kelly/

[Image: ETP Elastic Stack Appliance]

Shared Nothing or Loosely Coupled?

https://www.linkedin.com/pulse/shared-nothing-loosely-coupled-jim-kelly

S3 on-premises for your devs @<$10K

https://www.linkedin.com/pulse/s3-on-premises-your-devs-10k-jim-kelly

Four Ideas to Modernize your Network Strategy

https://www.linkedin.com/pulse/four-ideas-modernize-your-network-strategy-jim-kelly

The Economics of Software Defined Storage

I’ve written before about object storage and scale-out software-defined storage. These seem to be ideas whose time has come, but I have also learned that the economics of these solutions need to be examined closely.

If you look to buy high-function storage software, with per-TB licensing and premium support, running on premium Intel servers with their own premium support, then my experience is that you have just cornered yourself into old-school economics. I have made this mistake before. Great solution, lousy economics. This is not what Facebook or Google does, by the way.

If you’re going to insist on premium-on-premium then, unless you have very specific drivers for SDS, or extremely large scale, you might be better to go and buy an integrated storage-controller-plus-expansion-trays solution from a storage hardware vendor (and make sure it’s one that doesn’t charge per TB).

With workloads such as analytics and disk-to-disk backups, we are not dealing with transactional systems of record, and we should not be applying old-school economics to the solutions. Well-managed risk should be in proportion to the criticality and availability requirements of the data. Which brings me to Open Source.

Open Source software has sometimes meant complexity, poorly tested features and bugs that require workarounds, but the variety, maturity and general usability of Open Source storage software has been steadily improving, and feature/bug risks can be managed. The pay-off is software at $0 per usable TB instead of US$1,500 or US$2,000 per usable TB (seriously folks, I’m not just making these vendor prices up).
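As a rough sketch of what that difference means, here is a back-of-envelope comparison. The 500 TB usable figure is just an assumed scale, and the per-TB prices are the ones quoted above:

# Back-of-envelope licence cost for an assumed 500 TB usable pool
USABLE_TB=500
echo "Proprietary SDS at US\$1,500 per usable TB: US\$$((USABLE_TB * 1500))"   # prints US$750000
echo "Proprietary SDS at US\$2,000 per usable TB: US\$$((USABLE_TB * 2000))"   # prints US$1000000
echo "Open Source storage software:               US\$0"

That gap is the headroom you have for hardware, integration effort and (if you want it) a paid support contract.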

It should be noted that open source software without vendor support is not the same as unsupported. Community support is at the heart of the Open Source movement. There are also some Open Source storage software solutions that offer an option for full support, so you have choice about how far you want to go.

It’s taken us a while to work out that we can and should be doing all of this, rather than always seeking the most elegant solution, or the one that comes most highly recommended by Gartner, or the one that has the largest market share, or the newest thing from our favorite big vendors.

It’s not always easy, and a big part of the success is making sure we can contain the cost of the underlying hardware. Documentation, quoting and design are all considerably harder in this world, because you’re going to have to work out a bunch of it for yourself. Most integrators just don’t have the patience or skill to make it happen reliably, but those that do can deliver significant benefits to their customers.

Right now we’re working on solutions based on S3, iSCSI or NFS scale-out storage, with options for community or full support. Ideal use cases are analytics, backup target storage, migration off AWS S3 to on-premises to save cost, and test/dev environments for those who are deploying to Amazon S3, but I’m sure you can think of others.

Read Ahead, Dead Ahead…

Just a short one to relate an experience and sound a warning about the wonderful modern invention of the read-ahead cache.

Let me start by quoting an Ars Technica post from 2010:

I have this long-running job (i.e. running for MONTHS) which happens to be I/O-bound. I have 8 threads, each of which sequentially reads from an 80GB file, loads it into a specialized database, and then moves on to the next 80GB file. The machine has four CPUs, but the concurrency level was chosen empirically to get the maximum I/O throughput.

Today I was pondering how I could make this job finish before I die, and after some googling around I found you can jack up Linux’s read-ahead buffers to improve sequential reads. Basically it makes the kernel seek less, and slurp in more data before it moves on to the next operation. This is a good trade if you have tons of free memory.

Well, needless to say I was shocked at the improvement this brings. I set the readahead from the default of 256 (== 128KiB) to 65536 (== 32MiB) and the IO jumped way, way up. According to sar, in the ten-minute period before I made the change the input rate was 39.3MiB/s. In the first ten-minute period falling entirely after I made the change, the input rate was 90.0MiB/s. Output rate (to the database) leaped from 6MiB/s to 20MiB/s. CPU iowait% dropped from 49% to 0%, idle% dropped from 13% to 0%, and user% jumped from 37% to 97%.

In other words, this one simple command changed my workload from IO-bound to CPU-bound. I am using RHEL5, Linux 2.6.18.

blockdev --setra 65536 /dev/md0

Sounds great!

Why not make that the default setting for everything?

So here’s why not.

Without going into customer-specific details, I can tell you that right here in 2017 some workloads are very random, and truly random reads benefit very little from read-ahead cache. In fact, what can happen is that the storage just gets jammed up feeding data to the read-ahead cache. If every 128 KiB random read gets translated into a 32 MiB read ahead and you start hitting high I/O rates, then you can expect latency to go through the roof, and no amount of tuning at the storage end is going to help you.
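To put rough numbers on that worst case, here is a sketch. The 1,000 reads per second rate is just an assumed figure; the 128 KiB and 32 MiB sizes are the ones from the example above:

# Back-of-envelope read amplification at an assumed 1,000 random reads/sec
IOPS=1000                # assumed random reads per second
REQ_KIB=128              # application read size, in KiB
RA_KIB=$((32 * 1024))    # 32 MiB of read ahead triggered per read, in KiB
echo "Useful data delivered:     $((IOPS * REQ_KIB / 1024)) MiB/s"   # 125 MiB/s
echo "Back-end read-ahead load:  $((IOPS * RA_KIB / 1024)) MiB/s"    # 32000 MiB/s
echo "Amplification:             $((RA_KIB / REQ_KIB))x"             # 256x

The back end is being asked for roughly 256 times the data the application actually wants, which is exactly the jam-up described above.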

So, if you’re diagnosing latency problems on a heavy random read workload, remember to ask your server admins about their read-ahead settings.
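A quick way to check is sketched below. The device name is just an example; blockdev reports the value in 512-byte sectors, while the sysfs file reports KiB:

# Current read ahead for one device, in 512-byte sectors
blockdev --getra /dev/sda

# The same setting for every block device via sysfs, in KiB
grep -H . /sys/block/*/queue/read_ahead_kb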

[Figure 6: read performance with varying read ahead]
