
Dark Side of the Cloud: Problems with Storage

Posted by Andy Singleton on Tue, Oct 18, 2011
  
  

We recently moved from Amazon on-demand “cloud” hosting to our own dedicated servers.  It took about three months to order and set up the new servers, versus a few minutes to get servers on Amazon.  However, the new servers are 2.5X faster and, so far, more reliable.

We love Amazon for fostering development and innovation.  Cloud computing systems are great at getting you new servers.  This helps a lot when you are trying to innovate because you can quickly get new servers for your new services. If you are in a phase of trying new things, cloud hosts will help you.

Cloud hosts also help a lot when you are testing.  It’s amazing how many servers it takes to run an Internet service.  You don’t just need production systems.  You need failover systems.  You need development systems.  You need staging/QA systems.  You will need a lot of servers, and you may need to go to a cloud host.

However, there are problems with cloud hosting that emerge if you need high data throughput.  The problems aren’t with the servers but instead with storage and networking.  To see why, let’s look at how a cloud architecture differs from a local box architecture.  You can’t directly attach each storage location to the box that it serves.  You have to use network attached storage.

DEDICATED ARCHITECTURE:  Server Box -> bus or lan or SAN -> Storage

CLOUD ARCHITECTURE:  Server Box -> Mesh network -> Storage cluster with network replication

1) Underlying problem:  Big data, slow networks

Network attached storage becomes a problem because there is a fundamental mismatch between networking and storage.  Storage capacity almost doubles every year.  Networking speed grows by a factor of ten only about every ten years – roughly 100 times slower growth.  The net result is that storage gets much bigger than network capacity, and it takes a really long time to copy data over a network.  I first heard this trend analyzed by John Landry, who called it “Landry’s law.”  In my experience, this problem has gotten to the point where even sneakernet (putting on sneakers and carrying data on big storage media) cannot save us, because after you lace up your sneakers, you have to copy the data OVER A NETWORK to get it onto the storage media, and then copy it again to get it off.  When we replicated the Assembla data to the new datacenter, we realized that it would be slower to do those two copies than to replicate over the Internet, which is slower than sneakernet for long-distance transport but only requires one local network copy.
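To make the two-copies-versus-one trade-off concrete, here is a minimal back-of-the-envelope sketch in Python.  The data size and link speeds are illustrative assumptions, not our measured numbers.

```python
# Back-of-the-envelope sketch of the "two copies vs. one" trade-off described above.
# All numbers are illustrative assumptions, not Assembla's measured figures.

def hours(tb, mbit_per_s):
    """Hours to move `tb` terabytes at a sustained rate of `mbit_per_s` megabits/second."""
    return tb * 8e6 / mbit_per_s / 3600   # 1 TB = 8e6 megabits

data_tb   = 4      # assumed size of the data set
media_net = 150    # assumed rate of each local copy on/off the portable media (Mbit/s)
wan       = 100    # assumed sustained Internet replication rate (Mbit/s)

sneakernet = 2 * hours(data_tb, media_net)   # copy onto the media, ship it, copy it off
internet   = hours(data_tb, wan)             # one end-to-end transfer, no second copy

print(f"sneakernet: ~{sneakernet:.0f} h plus shipping, internet: ~{internet:.0f} h")
# Ignoring shipping time, sneakernet only wins when the media-copy rate is more than
# twice the sustained Internet rate, which is the point of the paragraph above.
```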

2) Mesh network inconsistency

The Internet was designed as a hub and spoke network, and that part of it works great.  When you send a packet up from your spoke, it travels a predictable route through various hubs to its destination.  When you plug dedicated servers into the Internet, you plug a spoke into the hub, and it works in the traditional way.  The IP network inside a cloud datacenter is more of a “mesh.”  Packets can take a variety of routes between the servers and the storage.  The mesh component is vulnerable to both packet loss and capacity problems.  I can’t present any technical reason why this is true, but in our observation, it is true.  We have seen two different issues:

* Slowdowns and brownouts:  This is a problem at both Amazon and GoGrid, but it is easier to see at Amazon.  Their network, and consequently their storage, has variable performance, with slow periods that I call “brownouts.”

* Packet loss:  This is related to the capacity problems as routers will throw away packets when they are overloaded.  However, the source of the packet loss seems to be much harder to debug in a mesh network.  We see these problems on the GoGrid network, and their attempts to diagnose it are often ineffectual.
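One way to see this kind of brownout for yourself is to time small writes to the network-attached volume.  Below is a minimal sketch of that kind of probe; the mount point, write size, and threshold are assumptions, not a description of our production monitoring.

```python
# A minimal latency probe: time small fsync'd writes to the network-attached volume
# and flag sustained spikes. Mount point, write size, and threshold are assumptions.
import os
import time

MOUNT = "/mnt/network-volume"    # assumed mount point of the network-attached storage
THRESHOLD_S = 0.5                # assumed "brownout" latency for a 1 MB synced write

def timed_write(path, size=1 << 20):
    buf = os.urandom(size)
    start = time.monotonic()
    with open(path, "wb") as f:
        f.write(buf)
        f.flush()
        os.fsync(f.fileno())     # push the write through to the storage backend
    os.remove(path)
    return time.monotonic() - start

while True:
    latency = timed_write(os.path.join(MOUNT, ".latency-probe"))
    if latency > THRESHOLD_S:
        print(f"possible brownout: 1 MB write took {latency:.2f}s")
    time.sleep(60)
```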

3) Replication stoppages

Cloud computing has two goals.  The first goal is to never lose data.  The second goal is to provide high availability.  When there is a failure in the storage cluster, the first goal (don’t lose data) kicks in and stomps on the second goal (high availability).  Systems will stop accepting new data and make sure that old data gets replicated.  Network attached storage will typically start replicating data to a new node.  It may either refuse new data until the old data can be replicated reliably, or it will absorb all network capacity and block normal operation in the mesh.

Note that in a large, complex system, variations in both network speed and storage capacity will follow a power law distribution.  This happens "chaotically."  When the variation reaches a certain low level of performance, the system fails because of the replication problem.

I think that we should be able to predict the rate of major failures by observing the smaller variations and extrapolating them with a power law.  Amazon had a major outage in April 2011. Throughout the previous 18 months, they had performance brownouts, and I think the frequency of one could be predicted from the other.
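For illustration, here is a crude sketch of that extrapolation, with invented brownout durations standing in for real measurements.  It shows the method only, not an analysis of Amazon's actual data.

```python
# Crude sketch of the extrapolation idea: fit a power-law tail to observed brownout
# durations and read off how often an outage-sized event "should" occur.
# The durations below are invented for illustration, not Amazon measurements.
import numpy as np

durations = np.array([2, 3, 3, 4, 5, 6, 8, 10, 12, 15, 20, 30, 45, 60, 90])  # minutes
window_days = 540                      # roughly the 18 months mentioned above

x = np.sort(durations)
survival = 1.0 - np.arange(len(x)) / len(x)          # fraction of events >= x

# Fit log N(>x) = c - alpha * log(x), where N is events per day (no error bars here).
slope, c = np.polyfit(np.log(x), np.log(survival * len(x) / window_days), 1)
alpha = -slope

def daily_rate_longer_than(minutes):
    return np.exp(c) * minutes ** (-alpha)

major = 12 * 60                        # call a "major outage" one lasting 12 hours
print(f"alpha ~ {alpha:.2f}; expected major outages: ~{daily_rate_longer_than(major) * 365:.2f} per year")
```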

CONCLUSION

So, if your application is storage intensive and needs high availability, you must do one of two things:

1) Design it so that lots of replication is running all of the time, and you can afford to lose access to any specific storage node.  This places limits on the speed at which your application can absorb data, because you need to reserve a big percentage of scarce network capacity for replication.  So, you will have only a small percentage of network capacity available for absorbing external data (see the sketch after this list).  However, it is the required architecture for very large systems.  It works well if you have a high ratio of output to input, since output just uses the replicated data rather than adding to it.

If you try this replication strategy, you will need to deal with two engineering issues.  First, you will need to think through replication specifically for your application.  There are many new database architectures that make this tradeoff in various ways.  Each has strengths and weaknesses, so if you design a distributed system, you will probably end up using several of these new architectures.  Second, you will need to distribute across multiple mesh network locations.  It's not enough just to have several places to get your data in the same network neighborhood; if there is a problem, the entire mesh will jam up.  Ask your provider about this.

2) Use local storage
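Going back to option 1, here is the rough capacity math referenced above.  The link speed, replica count, and headroom figure are assumptions for illustration.

```python
# Rough capacity math for option 1: if every byte you absorb must also be written to
# R extra replicas over the same mesh network, only a fraction of the link is left
# for new data. All numbers are illustrative assumptions.

link_mbit = 1000    # assumed usable network capacity per node (Mbit/s)
replicas  = 2       # assumed number of extra copies written for durability
headroom  = 0.2     # assumed fraction reserved for re-replication after a node loss

ingest_mbit = link_mbit * (1 - headroom) / (1 + replicas)
print(f"usable ingest: ~{ingest_mbit:.0f} Mbit/s out of a {link_mbit} Mbit/s link")
# Roughly 267 Mbit/s here: most of the link goes to replication, which is the limit on
# absorbing data described above. Output is cheap because reads hit existing replicas.
```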


6 Power Streams That Accelerate Software Development

Posted by Andy Singleton on Mon, Oct 10, 2011
  
  

We're building software a lot faster now, compared with 10 or 15 years ago.  How are we doing it?  I put together this list to kick off a "Development Summit" event for our local trade organization, the Massachusetts Technology Leadership Council.

1) Staffing.  We are innovating with distributed teams, ecosystems, open source, incubators, outsourcing, crowdsourcing, etc.  This gets first place, because the team you put together has a bigger effect on outcomes than anything else.

2) Methodologies and management.  This includes the agile methodologies, continuous integration, lean startups, and people management techniques for both distributed and co-located groups.

3) Tools.  This includes installed and hosted hardware and software for developers.

4) Platforms and Code.  The base development platforms and available code, much of it open source, are always improving.

5) Cloud and on-demand resources.  This encompasses a lot of the above (staffing, code, tools, platform as a service) but is worth its own category, because you can now go out on the global Internet and get a lot of things that will accelerate development "on demand".

6) Rapid deployment and feedback.  This includes SaaS, appstores, flash memory, phone home, feedback sites, and user reviews.


Faster Dedicated Servers

Posted by Andy Singleton on Tue, Sep 20, 2011
  
  

Assembla.com is now running on shiny new dedicated servers in Atlanta.  We made the change on Saturday night.  The new servers give us advantages in reliability, failover, and speed (2.5 times faster) compared with our previous configuration in a cloud datacenter.

We apologize for some problems that users experienced during the move.  There may be some remaining configuration issues, but today we see that error rates in our logs have dropped to historic lows.

Reliability: We had downtime in the last year caused by problems with cloud storage.  The new system has directly connected disks and it will not be vulnerable to problems with cloud networking or storage.  I will analyze this aspect of cloud computing in a future article.

Failover:  The new servers are “triple redundant”.  (1) They have internally redundant power, disks, and networking.  (2) Data is replicated in real time to a local twin failover server.   (3) Data is also replicated to a disaster recovery system in a remote Amazon datacenter.  We will continue our tradition of never losing data, and we will be able to make your data accessible under (dare I say it) almost any circumstance.

Speed: This turned out to be a big win.  I am amazed at the difference in performance between today and last Friday.  The new servers are 2.5 times faster, with response times about 40% of the old response times.  For users that are not in North America, we will make further improvements in the next month by deploying a CDN.

THERE WERE PROBLEMS.  I apologize for glitches that affected our users.  This move was supposed to be transparent for users, and we were able to do it without shutting down the site at any time.  However, as the configurations and locations changed over the past three weeks, we had problems that affected users in various ways.  At one point, the system was not correctly creating new repositories or importing repositories.  Some repository operations saw network errors.  Some git users could not log in or saw “repository not found”.  Some FTP deploys failed.  We had problems immediately after announcing that the new Subversion servers were "more reliable".  It's true that the servers are more reliable, but the services running on them needed improved configurations.  In most cases, we fixed the problems within 12 hours.

For those of you who are curious, here is what we did:

Data migration:  We copied the data over the network.   This took many weeks.  In general, SaaS vendors have to cope with the trend (Landry’s law) of data storage growing faster than the network’s capacity to move that data.

Data replication: Then, we set our various systems to replicate data to the new servers, so that it would stay current in real time.

Proxying: We were able to move things around without affecting users (mostly) through the magic of HAProxy.  This is a proxy for HTTP and database traffic (and even other types of network requests) that can be switched to send requests to different locations.

Subversion repository move:  We moved Subversion repositories first, because they were the most vulnerable to cloud storage problems.  I announced this move a few weeks ago.  We actually switched them one at a time with a lookup table in our front-end proxies (see the sketch after this list).

Git and Mercurial repository move: Next, we moved the git and mercurial repositories.  The move was smooth, but they went into a new, more scalable configuration with front-end proxies, and this had a few glitches.

Switch the other systems (Db, app, queue, etc.):  With replication running, and all application servers configured, we used the proxy servers to send Web traffic to the new servers.  That part was smooth.
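For the curious, the lookup-table routing works roughly like the Python sketch below.  This is not our actual HAProxy configuration, just an illustration of the idea; the backend addresses and repository names are made up.

```python
# Sketch of the lookup-table idea: a front end checks which datacenter currently owns
# a repository and routes the request there, so repositories can be flipped one at a
# time. This is not our actual HAProxy configuration; names and addresses are made up.

OLD_BACKEND = "svn-old.cloud.internal:80"      # previous cloud datacenter
NEW_BACKEND = "svn-new.atlanta.internal:80"    # new dedicated servers

migrated = {"repoA", "repoB"}                  # repositories already verified on the new servers

def backend_for(repo_name: str) -> str:
    """Return the backend that should serve this repository right now."""
    return NEW_BACKEND if repo_name in migrated else OLD_BACKEND

assert backend_for("repoA") == NEW_BACKEND
assert backend_for("legacy-repo") == OLD_BACKEND
```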


The problem with Amazon EC2 is its storage architecture

Posted by Andy Singleton on Thu, Apr 21, 2011
  
  

Some Assembla Subversion repositories are currently inaccessible.  This is because of problems in the EC2 storage architecture that we use.  The same problem is affecting services like Reddit and Quora.  Those services are completely down.  At Assembla, the majority of our services are working normally.  However, the svn outage is very difficult for the people who can't code [must code, must code].

We made the decision to leave Amazon EC2 a few weeks ago because of this storage problem.  We are currently setting up dedicated servers with hardwired storage.

I discussed cloud storage architecture in a blog post last year. Now we have an update.   The Amazon version doesn't work well enough to deliver reliable service.  I think this is because it is network connected, and it uses a big and complicated network.  This network has failed at least four times in the last year.  They have had at least 10 months to fix it, but the problems have recurred.

We use Amazon EC2, and we recommend it because their truly on-demand server resources make it possible to rapidly try things, fix things, and innovate. Innovation speed is important. We recommend Amazon because they have done the most to deliver "on demand".

Assembla has a further responsibility to deliver at least 99.9% uptime - down no more than 2 hours per quarter. We beat this over the last year for all services except some svn repositories. We use some of the time budget on releases where we take down the database to make significant changes to the system. Then we run into the limits of the Internet and cloud infrastructure. We can easily exceed our reliability budget if the EC2 storage network gets slow.  During the past year, we have had a total of about 24 hours of svn downtime scattered among our various repositories.  About 90 minutes of that was due to scheduled builds.  The rest was because of problems with EBS storage.
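As a quick arithmetic check, the 99.9% figure really does work out to roughly the two-hours-per-quarter budget mentioned above:

```python
# Quick check of the downtime budget implied by 99.9% uptime.
hours_per_quarter = 365.25 / 4 * 24            # about 2191 hours in a quarter
budget = hours_per_quarter * (1 - 0.999)
print(f"99.9% uptime allows ~{budget:.1f} hours of downtime per quarter")
# ~2.2 hours, which is where the "no more than 2 hours per quarter" rule of thumb comes from.
```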

If you use EC2 (or you run any high-availability system), you can usually design your system to withstand the disappearance of at least one server. You use a cluster of servers where one can fail, or you use pairs with replication and failover between a live and a hot swap server.

However, storage failures are more difficult to manage.  You might try failing over to a different server, but if that server is using the same network attached storage system, it will also be slow or stopped.  You can replicate to other storage systems, but that gives you a tradeoff.  The replication uses network IO, so you get the storage bandwidth problems even more frequently.

In fact, this is exactly what happens at the Amazon EC2 datacenters that we use. Sometimes, the network that connects to the storage becomes slow. This crashes any system that uses a lot of storage IO.  Our Subversion systems are particularly sensitive to this.

The problem was particularly bad last summer. Amazon datacenters must have been overloaded. Since then, the disk IO speed has, on average, gotten much faster. However, it is still variable, and during the past two weeks, it crashed twice.

The obvious solution for us is to move into dedicated racks with attached SAN, and solid state disks, which is where we are going.


Fast, Lean, and Global – Components and Practices of the Software Productivity Revolution

Posted by Andy Singleton on Thu, Aug 05, 2010
  
  

The last ten years have seen a massive increase in the productivity of software development.  New techniques that I call "fast, lean, and global" are allowing startups to build software for one fifth of the cost of ten years ago, and get it to market faster.  The same techniques are revolutionizing the delivery of software and online services from bigger companies, and moving into enterprise IT delivery.

At Assembla, we study fast, lean, and global software teams to build better tools for them.  We have identified a few key components that are driving the productivity revolution.  We also describe a set of practices - basic, intermediate, and advanced - for taking full advantage of these components.

Components

1) Open source and sharing

Free. Open source code, community development. Platforms and API's with sharing, versioning, and contribution strategies. Google searches that get you answers within minutes, not days.

2) Agile and Incremental development

Agile, scrum, kanban, daily builds and continuous integration/improvement, rapid release cycles, lean startups, minimum releasable products, automatic updates.

3) Distributed teams

Strategies for working together, collaborating, and managing. IP capture, management, and delivery.  Open-source communities, outsourcers, freelancers, multinationals, on-call admin and support.

4) On-demand Cloud services

Software and servers whenever you want them with no capital investment or setup.


Does it hold together?

The big question that I get is “Why link these components?” Yes, each of them is important.  Why not just study them, or teach them, one at a time, as the need arises?

We asked ourselves a similar question when we were building up our cloud development services.  We found that we weren’t just pitching “development and management of complex cloud applications.”  We were pitching the whole gummy ball of agile, continuous releases, open source adoption, global on-demand teams, etc.  It all got stuck together.  We couldn’t pull it apart.

I started thinking that these components really are stuck together.  For example, you can’t run a cloud datacenter without open source software.  There are too many servers to license, with too much infrastructure tweaking and versioning, to do it without using open source software.  And, you can’t build open source software without distributed teams.  You can’t make anything that complex work reliably without incremental development and releases.  Each is dependent on the other.

Practices

Practices are things that you can learn to do.  They help you take advantage of these large-scale components.  The “Fast Lean and Global” concept will be helpful if it helps people learn useful practices in a straightforward way.

I find myself describing practices as Basic, Intermediate, and Advanced.  Let's look, for example, at how these three apply to open source:

Basic: “Do research to find and test open source technology.”

Intermediate: “Contribute to open source projects.”

Advanced: “Organize and manage community projects.”

We don’t have a comprehensive list of the key practices in each of the three categories yet; that's where we need help from experts.  But, we do have a pretty good idea about some practices that work for us.  Here is a sample list:

[Sample practices table (image not reproduced)]


Terabytes on demand? Cloud storage options

Posted by Andy Singleton on Fri, Mar 26, 2010
  
  

If you use cloud virtual servers, you will sooner or later need a big pile of hosted storage to go with them.  Assembla.com uses a lot of storage for repositories. We also have customers of our cloud development/outsourcing practice that use a lot of storage for photos and other media.  So, I have been doing some research.  If you need storage, read on for a description of the types of options that I found for cloud hosted storage.

In working with hosted storage, you will need to remember that disks are much bigger than network pipes. Disk capacity doubles every two years or so. Network capacity doubles every four years or so - much, much more slowly. As a result, you can get a 1 terabyte drive for a few hundred dollars, but it will take two whole days to transfer the contents of this disk over an expensive Internet connection with an average speed of 50 mbit per second.

If you have many terabytes, as we do, your only practical option for moving data around the Internet is to use the old sneakernet - actually putting on your sneakers and delivering the media. Make sure that your storage provider will handle this type of delivery, both in and out.

Even if you don't intend to move between hosting locations, you will find that the network places a huge constraint on restore times. You can back up your data to remote locations, or even to a second device in the same datacenter, by using incremental processes like Rsync or backup software that moves only the changes. However, if you ever need to restore in a disaster recovery scenario, it will take you a long time, even over gigabit internal connections. If your customers are as demanding as Assembla customers, you might find yourself out of business before the restore completes between two devices. You will need to use storage which has redundancy inside the storage device.
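Here is a minimal sketch of the incremental-backup approach mentioned above, wrapping rsync from Python.  The host and paths are invented, and rsync plus ssh access to the backup target are assumed to be in place already.

```python
# Sketch of the incremental approach: let rsync ship only the changed blocks to a
# remote copy. The host and paths are invented, and rsync plus ssh access to the
# backup target are assumed to be in place already.
import subprocess

SRC = "/var/data/"                              # local data to protect
DEST = "backup@dr-site.example.com:/backups/data/"

subprocess.run(
    [
        "rsync",
        "-az",          # archive mode, compress on the wire
        "--delete",     # mirror deletions so the copy tracks the source
        "--partial",    # keep partial files so an interrupted transfer can resume
        SRC,
        DEST,
    ],
    check=True,
)
# The catch from the paragraph above still applies: incremental backups are cheap,
# but a full restore of many terabytes still has to cross the network in one piece.
```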

S3 bucket-style file storage
Amazon created this category with their S3 service that allows you to add and read files in a type of filesystem that they call "buckets". Other vendors have followed. You can use various APIs and protocols to put and get files, including HTTP GET direct from storage. It is big, expandable storage, it is highly available on the Internet, and it is cheap. This type of storage has a simplification that makes it almost, but not quite, like a filesystem. You can add files, replace files, and read files, but you can't modify the files. The vendor can use home-grown caching, layering, and redundancy without having to worry about locking any single version of the file. It's great for photos, videos, messages, message attachments, document repositories, and backup. On a byte count measure, it probably will dominate Internet storage. It's not useful for databases, repositories, indexes, or other systems that update, append, and modify files.
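Here is a sketch of that add/replace/read model using boto3, the current AWS SDK for Python (which postdates this post); the bucket and key names are placeholders.

```python
# Sketch of the add/replace/read model using boto3, the current AWS SDK for Python
# (which postdates this post). Bucket and key names are placeholders.
import boto3

s3 = boto3.client("s3")

# You can add a new object, or overwrite an existing one wholesale...
s3.put_object(Bucket="example-bucket", Key="photos/cat.jpg", Body=b"...image bytes...")

# ...and read it back...
body = s3.get_object(Bucket="example-bucket", Key="photos/cat.jpg")["Body"].read()

# ...but there is no call that modifies or appends to an object in place; changing one
# byte means uploading a whole new copy. That simplification is what lets the vendor
# cache and replicate freely, and what makes this storage wrong for databases.
```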

Single-mount storage
Amazon offers "Elastic Block Store", and many of the new cloud vendors offer even more integrated storage for your virtual servers. This is mounted like a local disk, but it is stored on a SAN or fileserver somewhere. If you need to restart your virtual server, it gets reattached automatically (in the integrated version) or manually (in the EBS version). This is a nice hosted version of a traditional hard disk, and it will satisfy most storage needs. It is what the cloud market is providing now.

These network-mounted volumes have the advantage of using RAID and/or SAN for underlying storage, so they presumably have redundancy and seldom need any backup or restore operations. However, in this case you do need to ask about the backup and restore plan if the underlying storage device fails. Don't just take it on faith that it will be well managed or rapidly restored, because some of these systems use file servers that can fail. You may find that you need an external backup, and this will introduce the long restore times.

My biggest complaint about this type of storage is size limits. For example, on Amazon you can get a volume up to 1 TB in size, and it can be mounted on one virtual computer. If you have more than 1 TB, you are going to do a lot of work to allocate files between multiple servers.
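For a flavor of that allocation work, here is a toy sketch of the simplest approach: splitting files across several 1 TB volumes by hashing the file key.  The volume paths are made up, and real systems quickly outgrow this.

```python
# Toy sketch of the allocation work: split files across several 1 TB volumes by
# hashing the file key. Volume paths are made up.
import hashlib

VOLUMES = ["/mnt/vol01", "/mnt/vol02", "/mnt/vol03", "/mnt/vol04"]

def volume_for(file_key: str) -> str:
    """Pick the volume (and therefore the server) that stores this file."""
    digest = hashlib.md5(file_key.encode()).hexdigest()
    return VOLUMES[int(digest, 16) % len(VOLUMES)]

print(volume_for("repos/widget-factory/archive.tar"))
# The pain starts when you add a fifth volume: a plain modulo moves most existing
# files, so real systems end up with lookup tables or consistent hashing instead.
```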

Shared cloud storage on a home-made file server
I have worked with vendors that sell "Cloud storage" that you can mount as a shared file system. Typically these can go bigger than single-mount storage, and sharing is a great advantage. It allows you to increase capacity by adding front-end servers, and to provide higher reliability through failover. You can run two servers for everything, both connected to the same storage, and if one goes offline, the second one is still running, and you don't have to re-attach anything to keep going. Unfortunately, I have had bad experiences with the reliability and capacity of these devices, because they often are "home grown" and not the expensive NAS devices that I will describe later.

Dedicated file server or SAN
Many virtual server hosting companies have a "hybrid" option where they can offer you a dedicated fileserver, or a SAN (storage area network), which is a device that is like a fileserver, but shareable, and attached with special high speed fiber. This is, in fact, the best way to get high performance. And, if you are using databases that have a lot of file locking, it might be your only realistic option. It also may be the cheapest option over a two-year device lifetime.

However, this option requires an up-front investment in fixed hardware that might look expensive and archaic if you are used to buying on-demand services. It also causes reliability problems. You are responsible for performance management, capacity management, and failover planning for this device. If the hosting company has trouble maintaining the reliability of "cloud storage" options in their own datacenter, it doesn't make sense that you will do a better job remotely.

Big, shared, modern network attached storage
In the last few months, some hosting companies have started installing modern "Network Attached Storage" devices like Netapp and EMC Atmos. These devices provide all of the advantages of the previous options. They are mounted as real filesystems, so they are easy to use and support any type of application. With a gigabit network you can use them for locking-intensive applications. They can give you hundreds of terabytes. They provide all of the failover and scaling advantages of shared volumes, supporting up to hundreds of clients. The only case where this isn't the easiest storage to work with is when you need the performance of a dedicated, fiber SAN. In any other case, you will probably find that this storage is the simplest option. However, there is one catch: cost.

One magic trick of these devices is the way that they do "snapshots" and backup. You can ask them to take a snapshot of a database or filesystem, and they will copy and save an image, without any interruption, even while taking new changes and locks from dozens of client machines. This is a technical trick that requires a lot of software and redundancy at all hardware levels. It makes them expensive. I am not sure exactly how much this storage will cost because the hosting vendors don't have much experience with it, and I haven't gotten firm quotes.

My guess is that the future of cloud storage will boil down to:
1) A commodity market of S3 bucket-style storage for discrete files.
2) Big shared NAS volumes for systems that require locally modified filesystems. These will get cheaper and eventually go open source.

Until then, you can use this list to find the storage that is right for you.



Cloud - what it means, and why it is important for software operations

Posted by Andy Singleton on Wed, Feb 10, 2010
  
  

You might think that the word "Cloud" is tired because it gets used to promote any kind of Internet application and service.  I've been using diagrams that show the Internet as a cloud with connections sticking out of it, ever since it took its modern form almost 20 years ago.  So, what's fresh about this idea?  Why do I find it so exciting?  Why am I going all-in for cloud computing?

The vague concept has been around for decades, and it is still a doozy. A "Cloud" is a bag of stuff that is out there, in no particular place, but you can reach into it whenever you need it, or as they say "on demand".  You can apply this to servers, to apps - features and capabilities - or as we do at Assembla, to team members that are there when you need them.  A cloud resource should be globally accessible, and hosted, so you don't have to worry about how to get to it or maintain it.  The idea that we can rub our laptop and summon out of the smoke a powerful genie, a huge bag of resources, is emotionally intense.  That's why "Cloud" is such a great marketing term.

To me, "cloud" has a more specific meaning.  A "cloud" is a bag of virtual servers that you can get on demand.  This is a relatively new idea.  After a long and obscure gestation as "grid computing" and "utility computing", Amazon Web Services created the mass market just a few years ago.

In my experience, this way of selling servers (virtual servers, on demand) has some important benefits for those who build and manage online applications.

Benefit 1) Increased speed of development and product delivery

If you can get a server quickly, you can test new software and new configurations quickly, and you can deploy them quickly.

Benefit 2) Increased reliability

When we moved from dedicated servers to cloud servers (currently with Amazon EC2), we found that our reliability and uptime increased.  On the dedicated host, we had runs of bad luck - a disk failing and then another, a cable "accidentally" disconnected, and even an incident where a power room blew up and took out 9000 servers.  That's life. Stuff happens.  We lost two Amazon servers this month for unexplained reasons.  But, in a dedicated environment, it takes a lot longer to recover.  Let's look at the ways that on-demand virtual hosting can increase reliability.

If you can fix and improve your app rapidly, then it will be more reliable than an application which takes a long time to fix.  So, increased reliability is an important side effect of benefit 1 - increased development speed.

Furthermore, if you have access to on-demand servers, you have access to a big pool of failover devices.

Plus, you have access to servers for scaling.  If your app scales to handle load, rather than crashing or clogging, it is more reliable.

Finally, you have a lot more practice configuring replacement servers, so they are more likely to work when you need them.  In the old days, you didn't build your production servers very often, and it was a big deal to build a new one.  Often, it took a lot of tweaking, and some luck.  That's scary if you need one in a hurry.  In the virtual hosting environment, you have two choices that eliminate the risk.  You can save an image, and just restore it when you need it, the same as the last image that you restored.  Or, you can make a script that builds the server.  You have access to servers on demand, so you can practice running the script whenever you want, and get it working correctly.
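Here is a toy sketch of the "script that builds the server" option: a repeatable build you can rehearse on throwaway instances.  The package list and config step are assumptions for illustration, not an actual Assembla build.

```python
# Toy sketch of the "script that builds the server" option: a repeatable build you can
# rehearse on throwaway instances. The package list and config step are assumptions.
import subprocess

PACKAGES = ["nginx", "postgresql", "haproxy"]

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

run(["apt-get", "update"])
run(["apt-get", "install", "-y"] + PACKAGES)   # re-running is a no-op for installed packages
run(["rsync", "-a", "configs/", "/etc/"])      # push version-controlled config files into place

# Because servers are on demand, this can be rerun on a fresh instance at any time to
# prove that it still produces a working box -- the "practice" described above.
```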

NOTES

Are cloud servers cheaper or more expensive?  That really depends on what you are using them for.  If you have a stable production installation, then dedicated servers are often cheaper, especially on a per-server basis. However, you need to buy almost twice as many dedicated servers as on-demand virtual servers, because you need a failover device for everything.  In the cloud environment, you need far fewer failover devices, because you can start new devices when you need them.

Not all vendors sell their servers "on demand", but they know it's important.  They are getting better at it, and I think we will see a lot more choices in this market.



Stormy weather? Who is monitoring the cloud?

Posted by Andy Singleton on Tue, Dec 29, 2009
  
  

Someone needs to monitor the cloud services that we depend on.

At Assembla, we have been building complex systems with cloud servers.  In one recent case, we linked together voice servers, a Web application cluster (Web servers, app servers, database servers, message processors), and a secure credit card processing cage, all in different locations.  The advantages of building a system on cloud services are huge.  Such systems can be developed and scaled with unprecedented speed, and limited capital cost.  However, they are dependent on the reliability of the underlying systems.

As Web users, we expect that servers will be up 24x7x365.  I certainly do.  As vendors, we have to swing into action instantly if there is any downtime.

That is why some of us got up last night to fix a problem with a client's system (not assembla.com).  The network storage devices at one of our cloud hosts had become un-mounted.  We have seen regular slowdowns or outages in this particular storage service.  The Amazon EC2 system that hosts Assembla.com is more stable, but only a week ago we lost a bunch of virtual servers (a condition which Amazon warns us to expect).

We can jump into action with failover systems or workarounds for these problems.  But, being notified about the problems, or likely sources of problems, is critical both to putting in a workaround and to planning failover.  And, in last night's case, the vendor wasn't particularly helpful.  They didn't actually notice that they had a problem with their storage devices until we pointed it out, and they didn't warn us about the outages they would cause while fixing it.

So, someone needs to monitor the cloud services that we depend on.

This monitoring would cover a few needs:
* Customers need to be alerted when there is a problem with the cloud services that they use.
* Customers want historical data to see the maturity and reliability of services they are considering
* Vendors need current quality metrics and trends, alerts, and comparative metrics.
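Here is a minimal sketch of what such a "weather report" monitor could look like; the endpoints, threshold, and alert hook are placeholders, not an existing service.

```python
# Minimal sketch of a "weather report" monitor: poll each cloud endpoint you depend on,
# keep a history line, and alert on failures or slow responses. The URLs, threshold,
# and alert hook are placeholders.
import time
import urllib.request

SERVICES = {
    "app": "https://www.example.com/health",
    "storage-api": "https://storage.example.com/ping",
}
SLOW_S = 2.0

def check(url):
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            ok = resp.status == 200
    except Exception:
        ok = False
    return ok, time.monotonic() - start

while True:
    for name, url in SERVICES.items():
        ok, elapsed = check(url)
        print(f"{time.strftime('%Y-%m-%dT%H:%M:%S')} {name} ok={ok} t={elapsed:.2f}s")  # history
        if not ok or elapsed > SLOW_S:
            print(f"ALERT: {name} looks degraded")   # stand-in for paging someone
    time.sleep(60)
```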

As services mature, monitoring is less critical.  However, hosting companies are constantly adding new services - new types of servers, storage, database, message queues, content distribution, higher level apps, etc., so there are a lot of potential problems to monitor.

Is this service available?  Is anyone interested in going in on a monitoring system for cloud services, a sort of weather report?


Will Microsoft support distributed teams and cloud deployments?

Posted by Andy Singleton on Thu, Aug 13, 2009
  
  

Microsoft offers a nice .net technology with great developer tools.  And, they are working on cloud deployment with Azure.  We do some work with these systems, and I'd like to do more work with these systems with our distributed teams.  We are also looking at offering .net frameworks in the catalog, automated Azure builds, visual studio integration for our tools, etc.  However, I feel deterred by licensing issues, and I'm not the only one.

This article shows some of the issues.  It is appropriately labeled "TFS licensing model demystification or what should I buy for my company in order not to step on the licensing mine?"

Microsoft does a few things to support distributed teams and cloud deployments.  They offer "express" (free) versions of key software tools for developers.  Their Team Foundation Server product includes many of the features of the Assembla system for distributed teams.  They have licensed Windows and SQL Server for monthly rental at a number of hosting companies.

But, the landmines remain.  There are actually two licensing problems.  The first concerns developer licenses.  If you are working with a distributed team, it's hard to be sure that you can comply with the licensing requirements, because licenses are for individual developers, and the developers change.  How do you license your trial developers?  If you are working with someone part time on maintenance, when do they get taken off your list of licensed developers?

The second concerns server licenses.  If you are building online services, or cloud services, you end up with a LOT of servers.  You need server sets for development, staging, production, and production failover.  If you are smart, you design your system to accommodate redundant production servers.  You have local development systems.  Server proliferation has been a fact of life for every online service that I have worked on, and it becomes especially important as you work to increase release frequency and reliability.  AND, it's a requirement for cloud systems that you be able to image the servers and bring back multiple copies.

In this scenario, is it even possible to comply with Microsoft's server-based licenses?  Even if you could afford licenses for all of your copies of servers, you still won't be allowed to use the cloud images in many cases.

I'm going to send this post to my friends at Microsoft and hopefully collect some comments and guidelines.


Cloudcamp Boston, PaaS and IaaS, and Unconferences

Posted by Andy Singleton on Sun, Aug 02, 2009
  
  

On Wednesday I stopped by Cloudcamp Boston and got straightened out on vocabulary by John Treadway and Judith Hurwitz.  They gave a nice introduction to cloud computing where they defined three categories of service providers:

  • Software as a Service (SaaS), which is applications like Assembla.
  • Platform as a Service (PaaS) which I previously described as guys who sell you clusters of services, like Google, Microsoft, and Salesforce.
  • Infrastructure as a Service (the unfortunate IaaS), which I previously described as guys who sell you virtual servers.

This still leaves open the big question about PaaS providers:  Can you trust them?  What if they change their business model and price you out, remove key services, or decide to compete with you? There is a lot of lock-in, and no history of second sourcing.  Maybe we should talk about classic second-sourcing providers for these platforms.

Cloudcamp Boston was billed as an Unconference, but it had some formatting problems.  I like unconferences.  I like the lack of commercial pressure, the chance to learn from peers and have discussions, and the chance to try out my own new material in an informal way with an informed audience.  In other words, I like the breakout sessions.

Cloudcamp Boston started with 45 minutes of "lightning talks", which turned out to be 5 minute advertisements from sponsors like Microsoft, Rackspace, EMC, and Rightscale.  I like those guys, and I'm a potential customer for most of them, but I would prefer not to have them in that slot.  After that came an "unpanel", which involved pulling people out of the audience to answer random questions from the same audience.  This was unfortunate because 300 people had to sit through a generalized discussion that would work better for 20.  Many left.  By the end, I had to head out to give a presentation elsewhere, and I missed the breakout sessions.  But, by that time there was only time for 2 sessions - hardly enough chances to share ideas with 300 informed people.  Next time, they could skip the lightning advertisements and the unpanel, and get 2 more sessions in.


