There are several key things to consider when building a storage system. We decided to cover the most important ones and shed some light on the inner workings of a good storage system. The first topic we’ll cover is storage sizing and IOPS density per GB.
Many companies go out to buy a storage solution without understanding their needs or their use case well enough. Here is a recent example:
A company asks for 70 TB of usable storage for a virtualized environment. They are looking to get a solution which can do 10,000 IOPS.
IOPS density and keeping your users’ sanity
10,000 IOPS on a 70 TB storage system works out to just about 0.14 IOPS per GB. Thus a typical VM with a 20-40 GB disk will get just 3 to 6 IOPS. Dismal. 50-100 IOPS per VM is a good target for VMs that are usable and not lagging. This will keep your users happy enough, instead of pulling their hair out.
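The arithmetic above can be reproduced with a minimal sketch. It assumes decimal units (1 TB = 1,000 GB) and that IOPS are shared in proportion to provisioned disk size; the function names are illustrative and not part of any storage product.

```python
GB_PER_TB = 1000  # assumption: decimal units, 1 TB = 1,000 GB


def iops_density(total_iops: float, usable_tb: float) -> float:
    """IOPS available per GB of usable capacity."""
    return total_iops / (usable_tb * GB_PER_TB)


def iops_per_vm(density: float, disk_gb: float) -> float:
    """IOPS a VM gets if IOPS are shared in proportion to disk size."""
    return density * disk_gb


density = iops_density(total_iops=10_000, usable_tb=70)
print(f"Density: {density:.3f} IOPS per GB")

for disk_gb in (20, 40):
    print(f"{disk_gb} GB VM disk: about {iops_per_vm(density, disk_gb):.1f} IOPS")
```

Plugging in a 50-100 IOPS per VM target instead shows how far the requested 10,000 IOPS falls short for this capacity.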
For reference, Google’s SSD persistent disks come with 30 IOPS per GB, i.e. about 200 (!) times more than the stated requirement: https://cloud.google.com/compute/docs/disks/performance#ssd-pd-performance
So a Google VM with 40 GB disk and 30 IOPS/GB will be able to peak at (maybe not sustain, though) 1,200 IOPS.
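As a rough check of the comparison, the sketch below uses only the figures quoted in this post (30 IOPS per GB, a 40 GB disk, and the 10,000 IOPS on 70 TB request). It assumes decimal units and ignores any per-volume or per-instance caps a provider may apply; `volume_iops` is a hypothetical helper name.

```python
def volume_iops(disk_gb: float, iops_per_gb: float) -> float:
    """Peak IOPS of a volume whose performance scales linearly with size."""
    return disk_gb * iops_per_gb


# Figures quoted in the post; 70 TB is taken as 70,000 GB (decimal units).
requested_density = 10_000 / 70_000  # IOPS per GB in the customer request
cloud_density = 30.0                 # IOPS per GB quoted for the cloud SSD volume

peak = volume_iops(disk_gb=40, iops_per_gb=cloud_density)
ratio = cloud_density / requested_density

print(f"40 GB volume peak: {peak:.0f} IOPS")
print(f"Density ratio: about {ratio:.0f}x the requested density")
```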
Storage system sizing in virtualized environments
A system with 70 TB usable can easily store the data of 1,000-2,000 VMs. Sometimes much more, depending on the average disk size of a VM and the gains from space-saving features.
For the sake of exploring the boundaries, let’s assume that the average VM disk size is 40 GB and the storage solution has deep integration with the cloud management system which provisions the VMs, such as OpenStack, CloudStack, OpenNebula or similar. With a good integration we have measured a 2 to 5 times gain in terms of logical-to-usable space. In other words, on 70 TB usable one can store from 140 TB to 350 TB of logical VM disks. This is 3,500 to 8,750 VMs on the original 70 TB usable!
If we take the 50 IOPS per VM mark, then we should have a system which can deliver between 175,000 and 437,500 IOPS! Further, we should be looking at latency metrics, since a system which delivers this many IOPS but has a latency of over 0.5 milliseconds will deliver a bad user experience too. Actually, latency is the more important metric for many block-level storage workloads.
We’ll cover the latency aspect in a separate post.
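The sizing chain above (usable space, space gain, VM count, required IOPS) can be written down as a small sketch. It assumes decimal units (1 TB = 1,000 GB) and a uniform average VM disk size; `size_system` and `SizingResult` are illustrative names, not part of any product.

```python
from dataclasses import dataclass


@dataclass
class SizingResult:
    logical_tb: float   # logical VM disk space that fits on the system
    vm_count: int       # number of VMs at the average disk size
    required_iops: int  # total IOPS needed to hit the per-VM target


def size_system(usable_tb, space_gain, avg_vm_disk_gb, iops_per_vm, gb_per_tb=1000):
    """Estimate VM count and total IOPS for a given usable capacity."""
    logical_tb = usable_tb * space_gain
    vm_count = int(logical_tb * gb_per_tb // avg_vm_disk_gb)
    return SizingResult(logical_tb, vm_count, vm_count * iops_per_vm)


# The 2x and 5x logical-to-usable gains mentioned in the post.
for gain in (2, 5):
    r = size_system(usable_tb=70, space_gain=gain, avg_vm_disk_gb=40, iops_per_vm=50)
    print(f"gain {gain}x: {r.logical_tb} TB logical, "
          f"{r.vm_count:,} VMs, {r.required_iops:,} IOPS required")
```

Changing `iops_per_vm` to 100 doubles the required IOPS, which is why it helps to fix the per-VM target before asking vendors for a quote.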