Disks are slow. Sure, they’ve gotten faster with time (like almost every other technology), but their speed still pales in comparison to system memory. Solid-state drives (SSDs) are changing that, but compared to traditional rotational disks, enterprise-grade SSDs are still far more expensive.
If you start to read about disk performance, you’ll quickly realize that the performance of traditional hard disks varies greatly depending on whether you access data sequentially or not. In other words, reading 1GB of data that’s all together on the disk is going to be much faster than reading 1GB of data that’s scattered about on the disk. That’s because the platters are constantly spinning and the read head has to physically move to wherever the data lives. If the data isn’t read in order, the drive has to reposition the head and then wait for the right spot on the platter to spin around again before it can read. That’s a simplification, but you probably get the idea.
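To put rough numbers on that idea, here is a toy sketch in Python. It is not a model of any real drive: it assumes the only cost is how far the head has to travel between blocks, and the `head_travel` function and the block numbers are made up purely for illustration.

```python
import random

def head_travel(block_order):
    """Total distance a pretend disk head moves when visiting blocks in order."""
    position = 0
    travelled = 0
    for block in block_order:
        travelled += abs(block - position)  # cost = how far the head has to move
        position = block
    return travelled

blocks = list(range(1000))

# Reading the blocks in the order they sit on the disk.
sequential = head_travel(blocks)

# Reading the same blocks in a scattered order (fixed seed so runs are repeatable).
scattered_order = blocks[:]
random.Random(42).shuffle(scattered_order)
scattered = head_travel(scattered_order)

print("sequential travel:", sequential)
print("scattered travel: ", scattered)
```

Both runs read exactly the same 1,000 blocks. The only thing that changes is the order, and in this toy model the scattered order makes the head cover far more ground.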
This probably doesn’t seem so bad. After all, a lot of the data should be in order, right? That really depends on the use case—every workload is going to be a mix of sequential and random reads and writes. There’s a caveat though—the more independent things you have accessing the disk, the more likely it is that a request that would be sequential might get interrupted in order to handle a different request. Think of it this way—if I’m picking up a bunch of papers laid out in a line, I can go pretty fast. If I’m on page 20 and someone comes over and hands me a replacement for page 1050, that’s going to slow me down a bit. And I can’t just tell him to wait, because it might be a while until I get there otherwise. It’s kind of the same way with disks, especially when virtualization is involved since many VMs are likely sharing those disks.
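Here is a small Python sketch of that interruption effect, under the same simplifying assumption that cost is just head travel distance. The two "VMs" and their block ranges are hypothetical, chosen only to show what happens when two perfectly sequential readers share one disk.

```python
def head_travel(block_order):
    """Total distance a pretend disk head moves when visiting blocks in order."""
    position = 0
    travelled = 0
    for block in block_order:
        travelled += abs(block - position)
        position = block
    return travelled

# Two hypothetical VMs, each reading its own data strictly in order.
vm_a = list(range(0, 100))        # blocks 0..99
vm_b = list(range(5000, 5100))    # blocks 5000..5099

# The disk finishes one VM's request before starting the other.
one_after_another = vm_a + vm_b

# The disk alternates between the two VMs, one block at a time.
interleaved = [block for pair in zip(vm_a, vm_b) for block in pair]

print("one after another:", head_travel(one_after_another))
print("interleaved:      ", head_travel(interleaved))
```

Each VM on its own is doing nothing but sequential reads, yet once their requests are interleaved the disk sees what looks like a random workload.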
So how can you improve performance in this kind of situation? One answer is to use SSDs to store all of your data since they have very good random performance, but as mentioned earlier, they’re still pretty expensive. Another answer is to use a cache. Cache is high-speed storage space (traditionally RAM, but newer SANs can use SSDs as well) that can be used to hold some data temporarily. Here’s how our earlier example would have worked with cache: I’m on page 20 and someone comes over and hands me a replacement for page 1050. I put it in my magic backpack and just keep doing what I’m doing. If someone asks me for page 1050 before I get there, I know I have a new one and I pull it out of my backpack almost instantly. When I have a spare moment, I’ll go ahead and actually put that page where it belongs to free some room in my backpack (cache). As long as I have room in my backpack, I never have to stop what I’m doing to handle that new page.
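The backpack can be sketched in a few lines of Python. This is a minimal illustration, not how any particular SAN implements its cache: the `WriteBackCache` class, its method names, and the dictionary standing in for the disk are all assumptions made for the example.

```python
class WriteBackCache:
    """Toy 'magic backpack': holds new pages until there is time to file them."""

    def __init__(self, disk, capacity):
        self.disk = disk          # a dict standing in for the slow disk
        self.capacity = capacity  # how many pages fit in the backpack
        self.pending = {}         # page number -> data not yet written to disk

    def write(self, page, data):
        # If the backpack is full, we have to stop and put pages away first.
        if page not in self.pending and len(self.pending) >= self.capacity:
            self.flush()
        self.pending[page] = data  # fast path: just drop it in the backpack

    def read(self, page):
        # The newest copy wins, so check the backpack before the disk.
        if page in self.pending:
            return self.pending[page]
        return self.disk.get(page)

    def flush(self):
        # In a spare moment, file everything where it belongs, in page order.
        for page in sorted(self.pending):
            self.disk[page] = self.pending[page]
        self.pending.clear()

disk = {1050: "old page 1050"}
cache = WriteBackCache(disk, capacity=4)

cache.write(1050, "new page 1050")   # handed a replacement while busy elsewhere
print(cache.read(1050))              # served from the backpack
print(disk[1050])                    # the disk still has the old copy
cache.flush()                        # spare moment: put the page where it belongs
print(disk[1050])
```

Note that the flush writes pages in sorted order, so even the deferred work ends up closer to one sequential pass than a series of scattered interruptions.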
What I just described is how write cache speeds up disk performance. It makes write requests very fast, and it helps read speeds by making it less likely that reads will get interrupted. There’s also read cache, which takes frequently requested data and copies it to the cache where it can be retrieved quickly. If the same data is being requested over and over, this can make a huge difference.
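A read cache can be sketched just as briefly. The example below assumes a least-recently-used eviction policy, which is one common choice and not necessarily what any given storage system uses. The `ReadCache` class and its hit and miss counters are hypothetical names for illustration.

```python
from collections import OrderedDict

class ReadCache:
    """Toy read cache that keeps the most recently used pages close at hand."""

    def __init__(self, disk, capacity):
        self.disk = disk             # a dict standing in for the slow disk
        self.capacity = capacity     # how many pages the cache can hold
        self.cached = OrderedDict()  # oldest entry first, newest entry last
        self.hits = 0
        self.misses = 0

    def read(self, page):
        if page in self.cached:
            self.hits += 1
            self.cached.move_to_end(page)    # mark as recently used
            return self.cached[page]
        self.misses += 1
        data = self.disk[page]               # slow path: go to the disk
        self.cached[page] = data
        if len(self.cached) > self.capacity:
            self.cached.popitem(last=False)  # evict the least recently used page
        return data

disk = {n: f"page {n}" for n in range(100)}
cache = ReadCache(disk, capacity=2)

# Page 7 is requested over and over, with a few other pages mixed in.
for page in [7, 7, 7, 8, 7, 9, 8]:
    cache.read(page)

print("hits:", cache.hits, "misses:", cache.misses)
```

Every hit is a request the disk never has to see, which is why a workload that keeps asking for the same data benefits so much.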
We recently added a layer of SSD cache to our Gated Community Cloud and the performance increase was dramatic: latency (the amount of time it takes to handle a read or write request) is now a fraction of what it used to be. If you’re looking for ways to increase storage performance, don’t forget to take a look at the difference that a properly sized cache can make!


