StorPool is happy to present an enhanced version of its storage Monitoring and Analytics Tool. If you use StorPool storage and benefit from its high throughput, low latency, and reliability, this tool gives your system administrators statistics and other valuable information about the system. The new version is more detailed and offers a better user experience.
StorPool has implemented graphs visualized with Grafana – the open platform for beautiful analytics and monitoring. Using it, we are able to provide our customers with insights, important statistics, and metrics for the overall performance of their storage system. In addition, all StorPool users can track the health and performance of their servers, volumes, disks and individual nodes.
Grafana is a leading open source software for time series analytics with over 150,000 active installations. It helps us visualize important metrics, which in turn help you fine-tune the storage system and improve your situational awareness.
What can you track with Grafana and StorPool’s storage monitoring?
By using our metrics system you can see important statistics for the clients, servers, disks, volumes, templates, CPUs and general performance metrics of your storage cluster.
- Clients monitoring – provides statistics for the nodes connected to StorPool as clients. You can see three types of stats – overall for the whole system, parameters for all hosts and parameters per host. These parameters are mostly self-explanatory, like read or write operations per second.
- Servers monitoring – tracks the servers in your storage system. These again have a split for the whole system, parameters for all nodes and parameters per node. Here the displayed metrics are similar to the ones displayed for the clients (above), but they describe the back-end of the system, i.e. StorPool’s interaction with the hardware.
- Disks monitoring – gives you basic information for the usage per type (HDD or SSD).
- Volumes monitoring – these metrics describe the interaction of the user (for example, a VM) with a StorPool volume. Additionally, there is a dashboard to see the volumes that generate the most load on the system, for reads or writes.
- Templates monitoring – shows the used and free space per template and its placement groups. This can help you monitor the available space and the actually provisioned space in the system.
- CPU monitoring – tracks the CPU usage on all nodes and the actual scheduling delay for tasks (the run-queue). Using these, you can monitor not only the overall CPU usage for user applications/VMs and StorPool services but also see the scheduling delay and the latencies introduced by oversubscription. The available dashboards allow you to monitor the overall CPU usage, the usage in a node, or the usage of a specific CPU in a node.
Inside the storage Monitoring and Analytics
StorPool’s analytics are organized in an interface that’s user-friendly and easy to navigate.
On the first row of your storage monitoring system, you can see the parameters for the overall front-end and back-end operations of the cluster. The front-end operations are those between the customers’ side and StorPool. The back-end operations are those from StorPool to the hardware of your storage system.
The second row presents the basic parameters of the system, where you can see information about the number of used volumes, physical drives, and active clients.
All the performance counters below present the data in two different time resolutions – per minute and per second:
– one minute resolution – this data is kept for a year and shows the statistics for a single moment within each minute. It helps you make a general analysis of the system.
– one second resolution – this data is kept for two days and provides all statistics per second. It helps you make an in-depth analysis of every single process within the storage cluster.
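To get a feel for what the two resolutions above mean in practice, here is a minimal Python sketch that counts how many data points a single series holds under each retention period. It also shows one way a per-minute series could be derived from per-second samples. The function names are hypothetical, and keeping the first sample of each minute is only an assumption made for illustration; the article does not say how StorPool actually produces the one-minute data.

```python
from datetime import timedelta


def points_retained(resolution: timedelta, retention: timedelta) -> int:
    """Number of data points one series holds at the given resolution."""
    return retention // resolution


def sample_per_minute(per_second):
    """Keep the first (timestamp, value) pair seen in each minute.

    per_second: iterable of (unix_seconds, value) pairs sorted by time.
    Assumption for illustration only: one sample per minute is kept as-is,
    rather than an average over the minute.
    """
    seen_minutes = set()
    result = []
    for timestamp, value in per_second:
        minute = timestamp // 60
        if minute not in seen_minutes:
            seen_minutes.add(minute)
            result.append((minute * 60, value))
    return result


# One-minute data kept for a year vs. one-second data kept for two days.
MINUTE_SERIES_POINTS = points_retained(timedelta(minutes=1), timedelta(days=365))
SECOND_SERIES_POINTS = points_retained(timedelta(seconds=1), timedelta(days=2))
```

With these numbers, a one-minute series covers a full year in 525,600 points, while a one-second series needs 172,800 points for just two days, which is why the fine-grained data suits short, in-depth investigations and the coarse data suits long-term trends.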
The functionalities you do not want to miss
To give you detailed statistics and enable an in-depth analysis of the whole system, the tool includes several useful features:
- Time zoom – zoom a graph into a specific time period to analyze just the important events;
- Crosshair – by default, a shared crosshair is enabled across the charts on the same screen. Showing the tooltip on all panels at the same time helps you correlate the different graphs;
- Top volumes statistics (somewhat self-explanatory).
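The idea behind the top volumes statistics can be sketched in a few lines of Python: given per-volume counters, rank the volumes by the chosen metric and keep the busiest ones. The data layout, the metric names (`read_iops`, `write_iops`) and the volume names below are hypothetical and exist only for this example; they are not StorPool's actual schema.

```python
import heapq
from typing import Dict, List, Tuple


def top_volumes(
    stats: Dict[str, Dict[str, float]], metric: str, n: int = 5
) -> List[Tuple[str, float]]:
    """Return the n volumes with the highest value of `metric`, busiest first."""
    ranked = heapq.nlargest(n, stats.items(), key=lambda item: item[1][metric])
    return [(name, counters[metric]) for name, counters in ranked]


# Hypothetical sample data: operations per second for three volumes.
sample = {
    "vm-db-01": {"read_iops": 1200.0, "write_iops": 4300.0},
    "vm-web-01": {"read_iops": 5200.0, "write_iops": 300.0},
    "vm-backup": {"read_iops": 150.0, "write_iops": 9800.0},
}
```

Calling `top_volumes(sample, "write_iops", 2)` ranks `vm-backup` ahead of `vm-db-01`, while ranking by `read_iops` puts `vm-web-01` first. The same split between reads and writes is what the dashboard offers.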
Useful shortcuts to work easier with the storage monitoring
Grafana has extensive navigational keyboard shortcuts. These come in handy when dealing with on-wall displays that may not have a mouse available (or when you just want to make your life easier). The shortcuts are the same across operating systems and you can see them here: http://docs.grafana.org/reference/keyboard_shortcuts/
As the next logical step, we plan to introduce network and memory statistics. Check our blog regularly for more news!
Here are the stats for a single volume, which describe what the end-user “feels”:
In the “All server stat” dashboard, “Average disk write latency per server” can help you detect a problem with one or more specific servers. The example below comes from a hybrid system which lost the batteries on the RAID controllers of two servers.
See the trends in storage load while adding new load. The example below comes from our own cluster, the difference being a few more InfluxDB instances:
Another example from the CPU stats shows the result of consolidating the load on fewer hypervisors:
These examples show some basic usage, but the possibilities are almost limitless. By using this tool, you can keep your finger on the pulse of your storage cluster.