Current Articles | RSS Feed RSS Feed

Time for Genetic Programming?

Posted by Andy Singleton on Mon, Jun 23, 2008 @ 12:50 PM
Digg digg it | Reddit reddit | del.icio.us del.icio.us | StumbleUpon StumbleUpon 

The announcement of a petaflop computer has resurrected my dream of computers that write their own software and come up with their own ideas.  That would take us a long way toward delivering our promise of "accelerating software development", with a completely different mechanism than the organizational mechanism I am using now. I've written before about ways that computer-generated code could be woven into a rapid application development process.

 

15 years ago I worked on genetic programming – the evolution of computer programs. However, I ran into a severe shortage of computing power, eventually filling a rented house with racks of single board computers (top of the line, 40 megahertz 80486) in clusters.  Genetic programming requires vast amounts of computing power to generate and simulate millions and millions of functional cases, and the computing power of my homemade cluster was pathetic by the standard of today's desktops. Genetic programming is ideally suited to make use of parallel computing, and it can make good use of almost any topology – clustered, or MIMD, or SIMD/vector. The IBM monster is a useful combination of these architectures - Cell vector processors out of the Playstation combined with independent Opteron processors out of a PC.

 

If we can't get useful results from these new computers, we're doing something wrong. And, in fact, we are doing something wrong. We fundamentally don't understand evolution. The evidence is in that increases in programming capability of 1000 or more over previous systems have not significantly increased the rate of new discoveries. There seems to be a limit on the complexity of the evolved artifacts, and with the faster computers, we are just hitting those complexity limits faster.

 

There is an "existence proof" for the effectiveness of evolutionary methods.  DNA-based life forms are amazingly complex.  A human genome has at least 30,000 genes.  The most complex genetic programming artifacts have only about 100 active elements.

 

What will get us over this hurdle?  I have some ideas, a three point plan.

Cooperative mechanisms 

The high-school depiction of evolution as a survival-of-the-fittest competition is crude and limiting.  In the real world, organisms and genes co-evolve and cooperate in expanding systems.  We still haven't fully absorbed the lessons of Lynn Margulis and others studying symbiosis and cooperation.

Genome structure

If you try genetic programming, you will find that running the evolutionary algorithm has an immediate and powerful effect on the genomes - the code fragments that you are evolving.  The first thing that they do is expand and add redundancy - something like the introns that make up most of an animal genome.  This change in the genotype moderates the effect of mutation operations, without changing the phenotype - the output - and it's crucial for making the process run smoothly.  So, we see evidence that the structure of the genome is a hugely important player in the game of evolution.  It's no accident that our animal genomes contain chromosomes which cooperate and co-evolve, or that they contain lots of "junk" DNA.  These structural features are what make complexity possible.  If we understood why the genomes are set up this way (we don't, mostly don't think about it), we could make much better genetic programs.

Discovery with Meta-Genetic Algorithms

We have questions cooperating systems and appropriate genomic  structures. Now that we have big computers, we can answer these questions with a meta-genetic algorithm machine.  This machine will generate new genomic structures and population parameters (new types of Genetic Programming), and run the lower level genetic programming runs to see the results for each set of parameters.  This is exactly the kind of machine that I built to run on my primitive rack clusters.  In one run, it could answer questions a the level of PhD thesis.  For example, the Meta-GA very quickly identified the programming operations and inputs that would be most effective.  However, it had to run for days because it would run thousands of runs with the lower level GP to find the higher level parameters.  Hence the problem with computing power.

 

With the more powerful machines, we can resurrect the meta-evolution layer, look at a range of genome structures, and find a way to harness cooperative evolution.  Did anyone follow that?

 

Tags: 

COMMENTS

What genetic programming program takes a long time to run? I remember back in the day of 486 PCs, my programs used to give good answers in a few minutes, hours, or a couple days max for hard problems. Another example is Tierra, the evolution environment, maybe a slightly different animal. It ran through most of its often-repeated scenarios (parasites, resistance, super-parasites, cannibals, etc.) in a few hours.
What kind of problem are you working on? Would be curious about specifics.

posted @ Monday, June 23, 2008 10:46 PM by Mark


Another post with more specifics would definitely be appreciated. I'm completely ignorant about genetic programming but am curious about the possibilities. Examples ranging from very simple to complex would be helpful for those that are uninitiated.

posted @ Monday, June 23, 2008 11:10 PM by Sean


I third the request for more details. This sounds interesting...

posted @ Tuesday, June 24, 2008 1:57 AM by Bob


We've gone a long way, cramming a house full of servers into a gaming console.
It's fascinating to draw the similarities between biology and self-evolving software, though it seems that life forms come in features not necessary desired in software -- redundancy, "junk DNA", etc. Academic curiosity aside, would it really be practical to petaflop a rand() function into a working software?

posted @ Tuesday, June 24, 2008 2:29 AM by Tony @ compsci.ca


Tony, evolvable apps do not have random behavior. They have evolved behavior that is finely tuned. But, they are "soft" - that is, unpredictable. And, they will require redundancy and junk DNA. I wrote about the differences here:
http://blog.assembla.com/assemblablog/tabid/12618/bid/2470/Soft-Landing-for-Evolvable-Apps.aspx
I can't provide a review of genetic programming here, but I can provide some reference links:
Wikipedia: http://en.wikipedia.org/wiki/Genetic_programming
John Koza's GP site:
http://www.genetic-programming.com/

posted @ Tuesday, June 24, 2008 9:32 AM by Andy Singleton


Mark, I think the basic problem with GP as usually practiced is that you do get most of the behavior that you are going to get relatively quickly. Ideally, as you could run indefinitely and get ever more interesting behavior. That's not happening .. yet.
GP is useful for problems that are hard for human programmers - machine vision, hearing, motion. Getting good results in these applications requires huge computing power because ultimately the processing is complex, and the data sets to be processed are large.
I was using it to evolve financial training rules. I actually had a problem that was opposite to the problem of sophistication. I wanted rules that were very simple, so they would not overlearn, and would be applicable outside the historical training set. I used the Meta-GA to find parameters that would constrain learning, and select inputs with real information content, and give results outside the training set. So, my issue was handling large training data sets, and running the Meta-GA, which consumed days or weeks of compute time even if the individual runs were relatively simple.

posted @ Tuesday, June 24, 2008 9:42 AM by Andy Singleton


I forgot to post some of my own links from way, way back in 1993. Here's an article from Byte magazine that gives you GPQuick- a fast execution GP system written in C++.
http://www.byte.com/art/9402/sec10/art1.htm
Here's the code. Amazingly, someone scooped it up and still serves it on the Internet:
http://ots.fh-brandenburg.de/downloads/gpquick/

posted @ Tuesday, June 24, 2008 10:08 AM by Andy Singleton


I just hope the religion and war parameters are set to zero, or they may end up producing viruses and MCPs (Tron reference for ya) and never have any real progress, since the MCPs will stop it from exploring any "morally objectionable" routes, and take control of the viruses to wipe the entire project out.

posted @ Wednesday, July 23, 2008 7:19 AM by Chaz


Post Comment
Name
 *
Email
 *
Website (optional)
Comment
 *

Allowed tags: <a> link, <b> bold, <i> italics

Receive email when someone replies.

Blog Navigator

Navigate By : 
[Article Index]

Subscribe by Email

Your email:

About This Blog

Accelerating Software Development with Agile, open-source style processes, distributed teams, on-demand teams, new product launches, Web 2.0 strategies, startups.  Author Andy Singleton builds new products fast.

About Us

Assembla offers services for building software with agile, distributed teams.