The Cool Way to do Security Analytics

https://www.linkedin.com/pulse/cool-way-do-security-analytics-jim-kelly/

[Image: ETP Elastic Stack appliance]

 


Shared Nothing or Loosely Coupled?

https://www.linkedin.com/pulse/shared-nothing-loosely-coupled-jim-kelly

S3 on-premises for your devs @<$10K

https://www.linkedin.com/pulse/s3-on-premises-your-devs-10k-jim-kelly

Four Ideas to Modernize your Network Strategy

https://www.linkedin.com/pulse/four-ideas-modernize-your-network-strategy-jim-kelly

The Economics of Software Defined Storage

I’ve written before about object storage and scale-out software-defined storage. These seem to be ideas whose time has come, but I have also learned that the economics of these solutions need to be examined closely.

If you look to buy high-function storage software with per-TB licensing and premium support, running on premium Intel servers that also carry premium support, then my experience is that you have just cornered yourself into old-school economics. I have made this mistake before: great solution, lousy economics. This is not what Facebook or Google does, by the way.

If you’re going to insist on premium-on-premium then, unless you have very specific drivers for SDS or operate at extremely large scale, you might be better off buying an integrated storage-controller-plus-expansion-trays solution from a storage hardware vendor (and make sure it’s one that doesn’t charge per TB).

With workloads such as analytics and disk-to-disk backups, we are not dealing with transactional systems of record, and we should not be applying old-school economics to the solutions. Well-managed risk should be in proportion to how critical the availability of the data really is. Which brings me to Open Source.

Open Source software has sometimes meant complexity, poorly tested features and bugs that require workarounds, but the variety, maturity and general usability of Open Source storage software has been steadily improving, and feature and bug risks can be managed. The pay-off is software at $0 per usable TB instead of US$1,500 or US$2,000 per usable TB (seriously folks, I’m not just making these vendor prices up).
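To put those numbers in perspective, here is a quick back-of-the-envelope calculation in Python; the 500 TB usable capacity is just an assumed example, while the per-TB prices are the ones quoted above.

# Back-of-the-envelope: software licensing cost at per-TB list prices.
# The 500 TB usable figure is an assumed example, not a customer number.
usable_tb = 500
for price_per_tb in (1500, 2000):
    print(f"{usable_tb} TB usable at ${price_per_tb}/TB = ${usable_tb * price_per_tb:,} in licensing alone")

At that scale the licensing alone runs to US$750,000 to US$1,000,000 before you have bought a single disk or a single hour of premium support.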

It should be noted that open source software without vendor support is not the same as unsupported. Community support is at the heart of the Open Source movement. There are also some Open Source storage software solutions that offer an option for full support, so you have choice about how far you want to go.

It’s taken us a while to work out that we can and should be doing all of this, rather than always seeking the most elegant solution, or the one that comes most highly recommended by Gartner, or the one that has the largest market share, or the newest thing from our favorite big vendors.

It’s not always easy, and a big part of success is making sure we can contain the cost of the underlying hardware. Documentation, quoting and design are all considerably harder in this world, because you’re going to have to work a lot of it out for yourself. Most integrators just don’t have the patience or skill to make it happen reliably, but those that do can deliver significant benefits to their customers.

Right now we’re working on solutions based on S3, iSCSI or NFS scale-out storage, with options for community or full support. Ideal use cases are analytics, backup target storage, migration off AWS S3 to on-premises to save cost, and test/dev environments for teams that deploy to Amazon S3, but I’m sure you can think of others.

Read Ahead, Dead Ahead…

Just a short one to relate an experience and sound a warning about the wonderful modern invention of read ahead cache.

Let me start by quoting an Ars Technica post from 2010:

I have this long-running job (i.e. running for MONTHS) which happens to be I/O-bound. I have 8 threads, each of which sequentially reads from an 80GB file, loads it into a specialized database, and then moves on to the next 80GB file. The machine has four CPUs, but the concurrency level was chosen empirically to get the maximum I/O throughput.

Today I was pondering how I could make this job finish before I die, and after some googling around I found you can jack up Linux’s read-ahead buffers to improve sequential reads. Basically it makes the kernel seek less, and slurp in more data before it moves on to the next operation. This is a good trade if you have tons of free memory.

Well, needless to say I was shocked at the improvement this brings. I set the readahead from the default of 256 (== 128KiB) to 65536 (== 32MiB) and the IO jumped way, way up. According to sar, in the ten-minute period before I made the change the input rate was 39.3MiB/s. In the first ten-minute period falling entirely after I made the change, the input rate was 90.0MiB/s. Output rate (to the database) leaped from 6MiB/s to 20MiB/s. CPU iowait% dropped from 49% to 0%, idle% dropped from 13% to 0%, and user% jumped from 37% to 97%.

In other words, this one simple command changed my workload from IO-bound to CPU-bound. I am using RHEL5, Linux 2.6.18.

blockdev --setra 65536 /dev/md0

Sounds great!

Why not make that the default setting for everything?

So here’s why not.

Without going into customer-specific details, I can tell you that right here in 2017 some workloads are very random, and truly random reads benefit very little from read ahead cache. In fact, what can happen is that the storage just gets jammed up feeding data to the read ahead cache. If every 128 KiB random read gets translated into a 32 MiB read ahead (a 256x read amplification), and you start hitting high I/O rates, then you can expect latency to go through the roof, and no amount of tuning at the storage end is going to help you.

So, if you’re diagnosing latency problems on a heavy random read workload, remember to ask your server admins about their read ahead settings.
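If you want to check this for yourself, here is a minimal sketch in Python (an illustration, assuming a Linux box with the usual sysfs layout) that reports the current read-ahead per block device and the amplification relative to a 128 KiB random read. Note that blockdev --getra reports in 512-byte sectors, while sysfs read_ahead_kb is in KiB, so the 65536 sectors above corresponds to read_ahead_kb of 32768.

# Minimal sketch: report read-ahead per block device and the amplification
# factor relative to a 128 KiB application read. Assumes Linux sysfs.
import glob

APP_READ_KIB = 128  # assumed application read size, for illustration

for queue_file in glob.glob("/sys/block/*/queue/read_ahead_kb"):
    dev = queue_file.split("/")[3]
    with open(queue_file) as f:
        ra_kib = int(f.read())
    print(f"{dev}: read_ahead_kb={ra_kib} (~{ra_kib / APP_READ_KIB:.0f}x a {APP_READ_KIB} KiB random read)")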

[Figure 6: MPEG no-read version with varying read-ahead]

Comprestimator Guesstimator

Hey folks, just a quick post for you, based on recent experience with IBM’s NAS Comprestimator utility for Storwize V7000 Unified, where it completely failed to predict an outcome that I had personally predicted with 100% accuracy, based on common sense. The lesson here is that you should read the NAS Comprestimator documentation very carefully before you trust it (and once you read and understand it, you’ll realize that there are some situations in which you simply cannot trust it).

We all know that Comprestimator is a sampling tool right? It looks at your actual data and works out the compression ratio you’re likely to get… well, kind of…

Let’s look first at the latest IBM spiel at https://www-304.ibm.com/webapp/set2/sas/f/comprestimator/home.html

“The Comprestimator utility uses advanced mathematical and statistical algorithms to perform the sampling and analysis process in a very short and efficient way.”

Cool, advanced mathematical and statistical algorithms – sounds great!

But there’s a slightly different story, somewhat more revealing, told on an older page: http://m.ibm.com/http/www14.software.ibm.com/webapp/set2/sas/f/comprestimator/NAS_Compression_estimation_utility.html

“The NAS Compression Estimation Utility performs a very efficient and quick listing of file directories. The utility analyzes file-type distribution information in the scanned directories, and uses a pre-defined list of expected compression rates per filename extension. After completing the directory listing step the utility generates a spreadsheet report showing estimated compression savings per each file-type scanned and the total savings expected in the environment.

It is important to understand that this utility provides a rough estimation based on typical compression rates achieved for the file-types scanned in other customer and lab environments. Since data contained in files is diverse and is different between users and applications storing the data, actual compression achieved will vary between environments. This utility provides a rough estimation of expected compression savings rather than an accurate prediction.”

The difference here is that one is for NAS and one is for block, but I’m assuming that the underlying tool is the same. So, what if you have a whole lot of files with no extension? Apparently Comprestimator then just assumes 50% compression.
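To make the mechanism concrete, that kind of extension-lookup estimator boils down to a directory walk plus a ratio table, something like the minimal Python sketch below. The ratio table and the 50% fallback for unrecognized extensions are illustrative assumptions based on the description above, not IBM’s actual values.

# Illustrative sketch of an extension-based "estimator": no file data is
# read, only names and sizes. Ratios and the 0.5 fallback are assumptions.
import os

ASSUMED_RATIOS = {".txt": 0.70, ".log": 0.80, ".jpg": 0.02, ".zip": 0.00}
FALLBACK_RATIO = 0.50  # applied to unknown or missing extensions

def estimate_savings(root):
    total_bytes, saved_bytes = 0, 0.0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that vanish or cannot be read
            ext = os.path.splitext(name)[1].lower()
            total_bytes += size
            saved_bytes += size * ASSUMED_RATIOS.get(ext, FALLBACK_RATIO)
    return total_bytes, saved_bytes

total_bytes, saved_bytes = estimate_savings("/data")  # example path
if total_bytes:
    print(f"Estimated savings: {saved_bytes / total_bytes:.0%} of {total_bytes:,} bytes")

If your file tree is dominated by files with no extension, the whole “estimate” collapses to the fallback ratio, which brings me to my next point.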

Below I reveal the reverse-engineered source code for the NAS Comprestimator when it comes to assessing files with no extension, and I release this under an Apache licence. Live Free or Die, people.

#include <stdio.h>

/* "Reverse-engineered" logic for files with no extension: always 50%. */
int main(void)
{
    printf("IBM advanced mathematical and statistical algorithms predict the following compression ratio: 50%%\n");
    return 0;
}

enjoy : )

 

 
