Read Ahead, Dead Ahead…

Just a short one to relate an experience and sound a warning about the wonderful modern invention of read ahead cache.

Let me start by quoting an arstechnica post from 2010:

I have this long-running job (i.e. running for MONTHS) which happens to be I/O-bound. I have 8 threads, each of which sequentially reads from an 80GB file, loads it into a specialized database, and then moves on to the next 80GB file. The machine has four CPUs, but the concurrency level was chosen empirically to get the maximum I/O throughput.

Today I was pondering how I could make this job finish before I die, and after some googling around I found you can jack up Linux’s read-ahead buffers to improve sequential reads. Basically it makes the kernel seek less, and slurp in more data before it moves on to the next operation. This is a good trade if you have tons of free memory.

Well, needless to say I was shocked at the improvement this brings. I set the readahead from the default of 256 (== 128KiB) to 65536 (== 32MiB) and the IO jumped way, way up. According to sar, in the ten-minute period before I made the change the input rate was 39.3MiB/s. In the first ten-minute period falling entirely after I made the change, the input rate was 90.0MiB/s. Output rate (to the database) leaped from 6MiB/s to 20MiB/s. CPU iowait% dropped from 49% to 0%, idle% dropped from 13% to 0%, and user% jumped from 37% to 97%.

In other words, this one simple command changed my workload from IO-bound to CPU-bound. I am using RHEL5, Linux 2.6.18.

blockdev --setra 65536 /dev/md0

Sounds great!

Why not make that the default setting for everything?

So here’s why not.

Without going into customer-specific details, I can tell you that right here in 2017 some workloads are very random, and truly random reads benefit very little from read-ahead cache. In fact, the storage can end up jammed just feeding data to the read-ahead cache. If every 128 KiB random read gets translated into a 32 MiB read-ahead and you start hitting high I/O rates, then you can expect latency to go through the roof, and no amount of tuning at the storage end is going to help you.
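To put rough numbers on that, here is a small sketch of the arithmetic. It relies on the fact that blockdev counts read-ahead in 512-byte sectors, and it assumes the worst case described above, where every random read pulls in a full read-ahead window. The function names and the 1000 reads per second figure are made up for illustration; they are not from the quoted post or from any customer system.

```shell
#!/bin/sh
# blockdev --setra / --getra values are counts of 512-byte sectors.
ra_sectors_to_kib() {
    echo $(( $1 * 512 / 1024 ))
}

# Worst-case amplification factor, ASSUMING every random read of
# $2 KiB triggers a full read-ahead window of $1 sectors.
ra_amplification() {
    ra_kib=$(ra_sectors_to_kib "$1")
    echo $(( ra_kib / $2 ))
}

# Worst-case back-end throughput in MiB/s for $1 reads per second
# (hypothetical rate) with a read-ahead window of $2 sectors.
ra_backend_mib_per_s() {
    ra_kib=$(ra_sectors_to_kib "$2")
    echo $(( $1 * ra_kib / 1024 ))
}

ra_sectors_to_kib 256            # default window: 128 (KiB)
ra_sectors_to_kib 65536          # tuned window: 32768 (KiB), i.e. 32 MiB
ra_amplification 65536 128       # 256 times more data than requested
ra_backend_mib_per_s 1000 65536  # 32000 (MiB/s) for 1000 reads/s
```

Under those assumptions, a modest 1000 random reads per second would ask the storage for something like 32000 MiB/s, which is why the latency has nowhere to go but up.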

So, if you’re diagnosing latency problems on a heavy random read workload, remember to ask your server admins about their read ahead settings.
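If you have shell access yourself, a quick way to check is to read the setting back. `blockdev --getra /dev/md0` reports the value in 512-byte sectors for a single device, and on Linux the same setting is exposed per device in sysfs as `queue/read_ahead_kb`, in KiB. The following is a minimal sketch that lists it for every block device; the `list_read_ahead` name and its optional directory argument are my own, added only so the function can be pointed at a different directory.

```shell
#!/bin/sh
# Print the read-ahead window of each block device found in sysfs.
# The optional first argument (hypothetical, for illustration) overrides
# the sysfs directory; it defaults to /sys/block.
list_read_ahead() {
    sysfs_root=${1:-/sys/block}
    for f in "$sysfs_root"/*/queue/read_ahead_kb; do
        # Skip if the glob matched nothing or the file is unreadable.
        [ -r "$f" ] || continue
        dev=${f#"$sysfs_root"/}   # strip the leading directory
        dev=${dev%%/*}            # keep only the device name
        printf '%s %s KiB\n' "$dev" "$(cat "$f")"
    done
    return 0
}

list_read_ahead
```

Anything far above the usual 128 KiB on a device serving heavy random reads is worth a conversation.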


