
Stormy weather? Who is monitoring the cloud?

Posted by Andy Singleton on Tue, Dec 29, 2009 @ 03:31 PM

Someone needs to monitor the cloud services that we depend on.

At Assembla, we have been building complex systems with cloud servers.  In one recent case, we linked together voice servers, a Web application cluster (Web servers, app servers, database servers, message processors), and a secure credit card processing cage, all in different locations.  The advantages of building a system on cloud services are huge.  Such systems can be developed and scaled with unprecedented speed and limited capital cost.  However, they depend on the reliability of the underlying systems.

As Web users, we expect that servers will be up 24x7x365.  I certainly do.  As vendors, we have to swing into action instantly if there is any downtime.

That is why some of us got up last night to fix a problem with a client's system (not assembla.com).  The network storage devices at one of our cloud hosts had become unmounted.  We have seen regular slowdowns and outages in this particular storage service.  The Amazon EC2 system that hosts Assembla.com is more stable, but only a week ago we lost a bunch of virtual servers (a condition which Amazon warns us to expect).

We can jump into action with failover systems or workarounds for these problems.  But being notified about the problems, or their likely sources, is critical both to putting in a workaround and to planning failover.  And in last night's case, the vendor wasn't particularly helpful.  They didn't notice that they had a problem with their storage devices until we pointed it out, and they didn't warn us about the outages they would cause while fixing it.

So, someone needs to monitor the cloud services that we depend on.

This monitoring would cover a few needs:
* Customers need to be alerted when there is a problem with the cloud services that they use.
* Customers want historical data to judge the maturity and reliability of services they are considering.
* Vendors need current quality metrics and trends, alerts, and comparative metrics.
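
As a rough illustration of the first two needs, here is a minimal Python sketch of a probe that checks a service, keeps a history of results, and decides when to raise an alert. Everything in it is an assumption made for illustration: the names `Sample`, `probe`, `availability` and `should_alert`, the five-second timeout, and the three-failures-in-a-row alert rule are hypothetical and are not taken from any particular monitoring product.

```python
import http.client
import time
import urllib.request
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """One probe result: when it ran, whether it succeeded, how long it took."""
    timestamp: float
    ok: bool
    latency_s: Optional[float]


def probe(url: str, timeout_s: float = 5.0) -> Sample:
    """Make one HTTP GET against url (a hypothetical health-check endpoint)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            ok = 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        ok = False
    latency = time.monotonic() - start
    return Sample(time.time(), ok, latency if ok else None)


def availability(history: List[Sample]) -> float:
    """Fraction of probes in the history that succeeded."""
    if not history:
        raise ValueError("no samples recorded yet")
    return sum(1 for s in history if s.ok) / len(history)


def should_alert(history: List[Sample], consecutive_failures: int = 3) -> bool:
    """Alert only after several failures in a row, to ignore one-off blips."""
    recent = history[-consecutive_failures:]
    return len(recent) == consecutive_failures and not any(s.ok for s in recent)
```

A scheduler would call `probe` every minute or so, append each result to a list or a database table, and notify customers when `should_alert` returns true. The stored history is what would feed the reliability reports and the comparative metrics for vendors.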

As services mature, monitoring is less critical.  However, hosting companies are constantly adding new services - new types of servers, storage, database, message queues, content distribution, higher level apps, etc., so there are a lot of potential problems to monitor.

Is this service available?  Is anyone interested in going in on a monitoring system for cloud services, a sort of weather report?



COMMENTS

There are a few services working on this; check out the CloudStatus demo from Hyperic at http://www.cloudstatus.com. They also provide this app as a product you can install onsite.
 
With that said, there is room in this space for more advanced monitoring; just because EC2 is available from Mountain View, CA doesn't mean it is reachable from New York. Not only that, response times may vary based on geography - I'm not sure this problem will be going away anytime soon.
 
Interestingly enough, the kind of response to a cloud outage may be quite different from the kind you would experience in a NOC. If cloud connectivity is degraded, an office manager or secretary can call the provider to determine the severity of the outage and the estimated time to correct it. This means traditional tools (Nagios, Zenoss, Cacti, etc.) may not be suitable for this kind of application. 
 
I think Hyperic has a good interpretation already, but there is probably room to improve on this.

posted @ Wednesday, December 30, 2009 2:10 PM by Eric Sarjeant


Most experienced operations folks that I know understand that monitoring/alerting systems are an essential part of being successful. Insert "tree falling in the woods" pun here. 
 
Implementing in the cloud doesn't fundamentally change either the problem or the many solutions. Again, most experienced folks that I know with their stuff in the cloud are monitoring it too.
 
There are a gazillion monitoring services and products that allow you to pay attention to your technology - cloud or otherwise. There are simple solutions that simply ping the front door to see if you appear to be alive. There are also sophisticated versions that allow you to go much further into your stack, including some pretty fancy technology-specific versions, like mobile monitoring that gets you all the way through the carrier and to the handset.
 
On the cloud front, many of the providers, Amazon included, already have their own weather-report-type monitoring of high-level trends, usually published on a web page.
 
So I don't think this is some new problem without ample solutions, nor do I think it is a cloud specific issue.  
 
Maybe what you are really pointing out relating to cloud computing is this... 
 
As cloud computing improves, it becomes simpler to magically get stuff done, and the audience broadens to include less experienced users who expect more to be included. Perhaps, without even knowing it, they aren't paying attention to this age-old challenge.

posted @ Thursday, January 07, 2010 11:32 AM by Chip Correra



About This Blog

Author Andy Singleton writes about accelerating software development, distributed agile teams, and Assembla.com services.
