Stormy weather? Who is monitoring the cloud?
Posted by Andy Singleton on Tue, Dec 29, 2009 @ 03:31 PM
Someone needs to monitor the cloud services that we depend on.
At Assembla, we have been building complex systems with cloud servers. In one recent case, we linked together voice servers, a Web application cluster (Web servers, app servers, database servers, message processors), and a secure credit card processing cage, all in different locations. The advantages of building a system on cloud services are huge. Such systems can be developed and scaled with unprecedented speed and limited capital cost. However, they depend on the reliability of the underlying services.
As Web users, we expect that servers will be up 24x7x365. I certainly do. As vendors, we have to swing into action instantly if there is any downtime.
That is why some of us got up last night to fix a problem with a client's system (not assembla.com). The network storage devices at one of our cloud hosts had become unmounted. We have seen regular slowdowns and outages in this particular storage service. The Amazon EC2 system that hosts Assembla.com is more stable, but only a week ago we lost a bunch of virtual servers (a condition that Amazon warns us to expect).
We can jump into action with failover systems or workarounds for these problems. But being notified about problems, or about likely sources of problems, is critical both to putting in a workaround and to planning failover. And in last night's case, the vendor wasn't particularly helpful. They didn't notice that they had a problem with their storage devices until we pointed it out, and they didn't warn us about the outages they would cause while fixing it.
So, someone needs to monitor the cloud services that we depend on.
This monitoring would cover a few needs:
* Customers need to be alerted when there is a problem with the cloud services that they use.
* Customers want historical data to judge the maturity and reliability of services they are considering.
* Vendors need current quality metrics and trends, alerts, and comparative metrics.
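To make these needs concrete, here is a minimal sketch of what one piece of such a monitor might look like: a probe that checks a service endpoint, a history of results that yields an availability figure, and a simple alert rule. Everything here is an assumption for illustration. The names `ProbeResult`, `probe`, `availability`, and `should_alert` are hypothetical, the check is a plain HTTP GET (a real storage or queue service would need its own kind of check), and the "three failures in a row" rule is just one possible alert policy.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProbeResult:
    service: str                 # hypothetical label for the service being checked
    timestamp: float             # when the probe started (seconds since the epoch)
    ok: bool                     # True if the service answered successfully
    latency_s: Optional[float]   # None when the probe failed


def probe(service: str, url: str, timeout_s: float = 5.0) -> ProbeResult:
    """Issue one HTTP GET and record whether it succeeded and how long it took."""
    start = time.time()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            ok = 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections, and timeouts.
        return ProbeResult(service, start, False, None)
    return ProbeResult(service, start, ok, time.time() - start)


def availability(history: List[ProbeResult]) -> float:
    """Percentage of probes in the history that succeeded."""
    if not history:
        raise ValueError("no probe results recorded")
    return 100.0 * sum(r.ok for r in history) / len(history)


def should_alert(history: List[ProbeResult], consecutive_failures: int = 3) -> bool:
    """Alert only when the most recent N probes all failed (assumed policy)."""
    recent = history[-consecutive_failures:]
    return len(recent) == consecutive_failures and not any(r.ok for r in recent)
```

Running `probe` on a schedule and appending each result to a list would cover the first two needs: `should_alert` on the latest results for customer alerts, and `availability` over a longer window for the historical record. Comparing those figures across providers would be a starting point for the vendor metrics.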
As services mature, monitoring becomes less critical. However, hosting companies are constantly adding new services (new types of servers, storage, databases, message queues, content distribution, higher-level apps, etc.), so there are a lot of potential problems to monitor.
Does a service like this already exist? Is anyone interested in going in on a monitoring system for cloud services, a sort of weather report?