After a massive weight loss, our database feels much better and our operations team is happier.
Until recently, Assembla stored 2.3 million wiki pages in our main database. They took up about 30GB, which kept the database from fitting in the database server's memory and frequently degraded the performance of all database transactions.
The arrangement was also extremely wasteful, because every version of a page was stored in full, even if it differed only slightly from the previous version. If a user changed a single character on a 1,000-byte page, the revised version also required 1,000 bytes of storage, for a total of 2,000 bytes.
We realized that migrating the wiki pages to Git repositories would free up those 30GB in the database and improve performance, since Git stores similar versions of a file as compact deltas once its objects are packed.
Since we already had an established process for communicating with remote Git repositories from our web app using BERT-RPC (http://bert-rpc.org), we decided to see whether it would be a good way to carry out the migration.
However, initial results were not promising. The process was very slow because every Ruby worker is single-threaded. In addition, every commit to a bare Git repository, and every blob retrieved from one, required about five calls to the Git binary, which slowed things down even more.
Git-ting it right
Our revised plan was to write our own RPC server in the Go programming language (http://golang.org), using git2go (https://github.com/libgit2/git2go), the Go bindings for libgit2 (http://libgit2.github.com). We also used the MessagePack (http://msgpack.org) serialization format and its RPC capabilities.
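The post does not show the daemon's code, so the following is only a minimal sketch of the general shape of such a Go RPC service, under several stated assumptions. The service and method names (`Wiki`, `SavePage`, `GetPage`) and the request types are hypothetical. Go's standard `net/rpc` package (gob encoding) stands in for MessagePack-RPC, and an in-memory map stands in for the bare Git repositories that the real daemon wrote through git2go, so the example runs with the standard library alone.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/rpc"
	"sync"
)

// SaveArgs and GetArgs are hypothetical request types.
type SaveArgs struct {
	Repo, Page, Content string
}

type GetArgs struct {
	Repo, Page string
	Version    int // 0-based version index
}

// Wiki is a hypothetical RPC service. An in-memory map of page versions stands
// in for Git here; the real daemon committed to bare repositories via git2go.
type Wiki struct {
	mu       sync.Mutex
	versions map[string][]string // "repo/page" -> content of each version
}

// SavePage stores a new version of a page and returns its index.
func (w *Wiki) SavePage(args SaveArgs, version *int) error {
	w.mu.Lock()
	defer w.mu.Unlock()
	key := args.Repo + "/" + args.Page
	w.versions[key] = append(w.versions[key], args.Content)
	*version = len(w.versions[key]) - 1
	return nil
}

// GetPage returns the content of one version of a page.
func (w *Wiki) GetPage(args GetArgs, content *string) error {
	w.mu.Lock()
	defer w.mu.Unlock()
	vs := w.versions[args.Repo+"/"+args.Page]
	if args.Version < 0 || args.Version >= len(vs) {
		return errors.New("no such version")
	}
	*content = vs[args.Version]
	return nil
}

// newClient serves a Wiki over an in-process pipe and returns a client for it.
func newClient() (*rpc.Client, error) {
	srv := rpc.NewServer()
	if err := srv.Register(&Wiki{versions: map[string][]string{}}); err != nil {
		return nil, err
	}
	serverConn, clientConn := net.Pipe()
	go srv.ServeConn(serverConn)
	return rpc.NewClient(clientConn), nil
}

func main() {
	client, err := newClient()
	if err != nil {
		panic(err)
	}
	defer client.Close()

	var v int
	if err := client.Call("Wiki.SavePage", SaveArgs{Repo: "space1", Page: "Home", Content: "Hello"}, &v); err != nil {
		panic(err)
	}
	var content string
	if err := client.Call("Wiki.GetPage", GetArgs{Repo: "space1", Page: "Home", Version: v}, &content); err != nil {
		panic(err)
	}
	fmt.Println(v, content) // 0 Hello
}
```

In a real deployment the pipe would be a TCP listener, and each incoming connection is served on its own goroutine, which is what lets a single Go process handle many concurrent requests.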
On a development machine, the Go daemon handled thousands of requests per second using a single OS thread (GOMAXPROCS defaulted to 1 in the Go releases of that time). Increasing GOMAXPROCS raised throughput even further.
The migration process
We started by migrating the biggest wikis, some of which had hundreds of thousands of versions of a single page and totaled gigabytes of data. After compressing it with “git gc”, the biggest wiki was down to 98 MB.
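Part of why Git shrinks this data so well is that it is content-addressed: a blob's object ID is the SHA-1 of a short header ("blob", the content size, a NUL byte) followed by the content, so identical page versions are stored only once, and “git gc” then packs the remaining objects and stores similar ones as deltas. The following Go sketch reproduces only the blob ID calculation, for Git's default SHA-1 object format, using the standard library; `blobID` is our own helper name, and the code does not describe how Assembla's daemon was written.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// blobID computes the object ID Git assigns to a blob in a SHA-1 repository:
// SHA-1 over "blob <size>\x00" followed by the raw content.
func blobID(content []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "blob %d\x00", len(content))
	h.Write(content)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// Same result as: echo 'test content' | git hash-object --stdin
	fmt.Println(blobID([]byte("test content\n")))
	// d670460b4b4aece5915caf5c68d12f560a9fe3e4
}
```

Because the ID depends only on the content, saving an unchanged page again produces the same object and adds nothing to the repository.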
However, InnoDB doesn’t return disk space to the operating system after content is deleted. We therefore did a complete database reload: we dumped the database, removed the InnoDB data files, and restored everything. Our operations team was able to do this without any downtime.
Healthier and happier
After this weight-loss program, our database is healthier and the Assembla operations team is happier. The new Go RPC daemon is a success, the database is no longer a resource hog, CPU and RAM consumption have improved considerably, the team receives far fewer alerts than before, and our customers have fast access to their wiki pages.