We are not fans of maintenance windows here at StorPool. We don’t like them as end users or as service providers. We don’t like them as suppliers of data storage. We believe we can all do better.
The Need for Always-On
As a consumer, you live always on. You want to check your bank account or stream music or set your smart thermostat at any time from anywhere. You shop from anywhere at any time. And if the website you’re ordering from tells you “Please try again later”, you buy somewhere else.
As a service provider, you need to provide highly available services to your customers. And as a consumer of services, you select your suppliers carefully and rely on their SLA promise because you can’t provide higher availability than your essential suppliers. Cloud providers need to experience and deliver always-on operations. Yes, we all know outages occur. But systems designed so that the failure of any individual component won’t take them down succeed, and the companies that use those systems prosper.
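To put a number on that supplier point, here is a minimal sketch of the arithmetic. It assumes your service needs every supplier to be up at the same time and that suppliers fail independently; the SLA figures and supplier names are made up for illustration and do not describe any real provider.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60


def composite_availability(availabilities):
    """Availability of a service that needs every listed dependency to be up.

    Assumes dependencies fail independently, so the individual
    availabilities multiply.
    """
    return prod(availabilities)


def downtime_minutes_per_year(availability):
    """Expected minutes of downtime per year at a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# Hypothetical SLA figures, chosen only to illustrate the calculation.
suppliers = {"storage": 0.9999, "network": 0.9995, "compute": 0.999}

total = composite_availability(suppliers.values())
print(f"composite availability: {total:.4%}")
print(f"expected downtime: {downtime_minutes_per_year(total):.0f} minutes/year")
```

Under these assumptions the composite figure can never be higher than the weakest supplier's, which is why every dependency's SLA matters.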
Modern IT and Modern Data Storage
Failure of components in a complex system isn’t a worst-case scenario, but rather an inevitability. Modern systems need to be designed such that no single component being offline will bring the entire service down. But you can’t just design for the failures you expect. In modern complex systems, a failure in one component feeds back into other components, creating conditions you can’t predict. Chaos Engineering as a practice was developed in response to these complex web systems, creating tools (famously Chaos Monkey at Netflix) that intentionally disable parts of the system to test how the other components react. At StorPool, we have years of experience and massive amounts of real-time data from living through these unpredictable feedback loops and surprise failures, and we have learned how to ensure that StorPool Storage continues to deliver.
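The idea behind this kind of testing can be shown with a toy sketch. This is not Chaos Monkey and not StorPool's test tooling; the `Replica` class, `read_with_failover`, and `chaos_round` are hypothetical names invented for this illustration. Each round disables one randomly chosen node and checks that a read still succeeds.

```python
import random


class Replica:
    """A toy storage node that can be switched off (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.up = True

    def read(self, key):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return f"value-of-{key}"


def read_with_failover(replicas, key):
    """Try each replica in turn; fail only if every replica is down."""
    for replica in replicas:
        try:
            return replica.read(key)
        except ConnectionError:
            continue
    raise RuntimeError("all replicas are down")


def chaos_round(replicas, rng):
    """Disable one random replica, check a read still works, then restore it."""
    victim = rng.choice(replicas)
    victim.up = False
    try:
        return read_with_failover(replicas, "k") == "value-of-k"
    finally:
        victim.up = True


replicas = [Replica(f"node-{i}") for i in range(3)]
rng = random.Random(42)  # fixed seed so the run is repeatable
survived = all(chaos_round(replicas, rng) for _ in range(100))
print("service survived every single-node failure:", survived)
```

A real chaos experiment targets live infrastructure and watches for the feedback effects described above, which a toy like this cannot reproduce. The sketch only shows the basic loop: break something on purpose, then verify the service still answers.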
Old Habits vs New Realities
But for some reason, it’s still not uncommon to see scheduled, late-night, or weekend maintenance windows where critical systems and services are brought down for an hour or more. It could be old habits. Or an IT department hazing ritual to put the newbies in their place? Maybe people lack the skills to design for resilience or don’t trust the resilient designs they have in place?
But it’s a bad idea to have services offline. Your middle of the night is somebody else’s essential business hours. Two in the morning here in Sofia, Bulgaria is 4 pm in Silicon Valley. Sunday is a work day in Israel. Your “local” customers travel to other time zones. They expand geographically. In short, there is no longer a “downtime” when you can safely take a service offline.
Industry Moving to Always On
Now, aren’t there some maintenance operations that require you to be offline? You have to take a server down to do a kernel patch… Oh, wait. Red Hat says you don’t have to. And you definitely need to take a server down to change drives, right? Oh, Dell says otherwise. With modern systems and methods, you can even change a database schema or migrate a database without having to shut it down. The industry continues to move towards always-on operations.
Now this isn’t to say that designing for resilience is easy. It requires careful planning and consideration. It requires that developers pay attention to best practices in deployment and operations. But aren’t those all good things? Shouldn’t we be doing that anyway? Isn’t that what led to the DevOps movement? Developers can get lazy if they know there will be maintenance windows to bail them out. Knowing that there won’t be any maintenance windows and that any problems will result in highly visible downtime can trigger great focus and concentration from the developers, resulting in better, more reliable, more supportable applications.
StorPool is Always On
At StorPool, we looked outside of the traditional data storage routine and thought not only about what IT teams need for their data, but also about what the world needs from data storage. Our team of engineers designed our product from the ground up to be always on. Our unique, shared-nothing architecture ensures continuous operation even if a component fails. We support in-service software upgrades and configuration changes. Performance and capacity can be improved by adding new servers and drives to the cluster. Even a complete server refresh can be accomplished with no downtime, with replacement servers added to the network and retiring servers removed.
Maintenance windows are no longer acceptable. You can do better. We can help.
Get started with StorPool today.