Time for Genetic Programming?

Posted by Andy Singleton on June 23, 2008 13:50:00 PM

The announcement of a petaflop computer has resurrected my dream of computers that write their own software and come up with their own ideas.  That would take us a long way toward delivering our promise of "accelerating software development", with a completely different mechanism than the organizational mechanism I am using now. I've written before about ways that computer-generated code could be woven into a rapid application development process.


15 years ago I worked on genetic programming – the evolution of computer programs. However, I ran into a severe shortage of computing power, eventually filling a rented house with racks of single board computers (top of the line, 40 megahertz 80486) in clusters.  Genetic programming requires vast amounts of computing power to generate and simulate millions and millions of functional cases, and the computing power of my homemade cluster was pathetic by the standard of today's desktops. Genetic programming is ideally suited to make use of parallel computing, and it can make good use of almost any topology – clustered, or MIMD, or SIMD/vector. The IBM monster is a useful combination of these architectures - Cell vector processors out of the Playstation combined with independent Opteron processors out of a PC.


If we can't get useful results from these new computers, we're doing something wrong. And, in fact, we are doing something wrong. We fundamentally don't understand evolution. The evidence is in that increases in programming capability of 1000 or more over previous systems have not significantly increased the rate of new discoveries. There seems to be a limit on the complexity of the evolved artifacts, and with the faster computers, we are just hitting those complexity limits faster.


There is an "existence proof" for the effectiveness of evolutionary methods.  DNA-based life forms are amazingly complex.  A human genome has at least 30,000 genes.  The most complex genetic programming artifacts have only about 100 active elements.


What will get us over this hurdle?  I have some ideas, a three point plan.

Cooperative mechanisms 

The high-school depiction of evolution as a survival-of-the-fittest competition is crude and limiting.  In the real world, organisms and genes co-evolve and cooperate in expanding systems.  We still haven't fully absorbed the lessons of Lynn Margulis and others studying symbiosis and cooperation.

Genome structure

If you try genetic programming, you will find that running the evolutionary algorithm has an immediate and powerful effect on the genomes - the code fragments that you are evolving.  The first thing that they do is expand and add redundancy - something like the introns that make up most of an animal genome.  This change in the genotype moderates the effect of mutation operations, without changing the phenotype - the output - and it's crucial for making the process run smoothly.  So, we see evidence that the structure of the genome is a hugely important player in the game of evolution.  It's no accident that our animal genomes contain chromosomes which cooperate and co-evolve, or that they contain lots of "junk" DNA.  These structural features are what make complexity possible.  If we understood why the genomes are set up this way (we don't, mostly don't think about it), we could make much better genetic programs.

Discovery with Meta-Genetic Algorithms

We have questions cooperating systems and appropriate genomic  structures. Now that we have big computers, we can answer these questions with a meta-genetic algorithm machine.  This machine will generate new genomic structures and population parameters (new types of Genetic Programming), and run the lower level genetic programming runs to see the results for each set of parameters.  This is exactly the kind of machine that I built to run on my primitive rack clusters.  In one run, it could answer questions a the level of PhD thesis.  For example, the Meta-GA very quickly identified the programming operations and inputs that would be most effective.  However, it had to run for days because it would run thousands of runs with the lower level GP to find the higher level parameters.  Hence the problem with computing power.


With the more powerful machines, we can resurrect the meta-evolution layer, look at a range of genome structures, and find a way to harness cooperative evolution.  Did anyone follow that?


Get started today with a 14–day FREE trial.

No obligations, no credit card required.

Get Started Now

About the author

Andy SingletonWorking on Continuous Agile and Accelerating Innovation, Assembla CEO and startup founder

Get updates about development, productivity, and teamwork