Improving reliability with Triple-Redundant repository servers
Posted by Andy Singleton on Mon, Aug 22, 2011 @ 03:35 PM
We have just moved all of our Subversion repositories to dedicated servers in Atlanta. This will increase reliability and reduce the risk that data will ever be offline. Now, every Subversion commit is replicated in real time with three levels of redundancy:
1) The new servers are internally redundant, with RAID 10 disks, RAID controllers with battery-backed cache, dual network cards, and dual power supplies.
2) Each server replicates to a local twin that can take over in the event of failure.
3) Each commit replicates to a disaster recovery cluster at a different datacenter. We are using the existing servers at Amazon EC2 in Virginia for this disaster recovery.
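As an illustration only, here is a minimal sketch of how per-commit replication to a local twin and an off-site cluster could be wired up. It assumes the standard `svnsync` tool and mirrors that were already set up with `svnsync initialize`; the post does not say which tooling Assembla actually uses, and the mirror URLs and function names below are hypothetical.

```python
import subprocess
from typing import Callable, List

# Hypothetical mirror URLs; the post does not name hosts or tooling.
MIRRORS = [
    "svn://twin.example.internal/repo",  # local failover twin (assumed)
    "svn://dr.example.internal/repo",    # off-site disaster recovery copy (assumed)
]


def sync_commands(mirrors: List[str]) -> List[List[str]]:
    """Build one `svnsync synchronize` command per mirror URL."""
    return [["svnsync", "synchronize", "--non-interactive", url] for url in mirrors]


def replicate(mirrors: List[str], runner: Callable = subprocess.run) -> List[str]:
    """Push pending revisions to every mirror.

    Returns the mirrors whose sync failed, so the caller can retry or alert.
    """
    failed = []
    for url, cmd in zip(mirrors, sync_commands(mirrors)):
        try:
            result = runner(cmd, capture_output=True, text=True)
        except OSError:
            # svnsync is not installed or could not be started.
            failed.append(url)
            continue
        if result.returncode != 0:
            failed.append(url)
    return failed
```

A post-commit hook on the primary server could call `replicate(MIRRORS)` after each commit and log or retry whatever it returns. Collecting failures per mirror, instead of stopping at the first error, keeps a problem at one site from blocking replication to the other.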
The new servers are faster and have 4 to 10 times the disk I/O capacity of the servers they replace.
In the coming weeks we will complete the move to dedicated servers for Git repositories (which have a different replication architecture) and then the database and application servers. They will all have internal redundancy, failover, and disaster recovery.
We moved the Subversion repositories first because they are very disk-intensive and have been affected by problems with Amazon EC2 storage during the past year, including the incident in April when some of them were offline for three days. The new architecture should eliminate this source of problems and give Assembla the best uptime available from a repository host.
We will continue to use Amazon EC2 for disaster recovery and for development servers. The Amazon on-demand servers will continue to help us rapidly develop new services. Our new Assembla dedicated hardware will provide fast and reliable services for production users.