Current Articles | RSS Feed RSS Feed

Time for Genetic Programming?

Posted by Andy Singleton on Mon, Jun 23, 2008 @ 12:50 PM
 

The announcement of a petaflop computer has resurrected my dream of computers that write their own software and come up with their own ideas.  That would take us a long way toward delivering our promise of "accelerating software development", with a completely different mechanism than the organizational mechanism I am using now. I've written before about ways that computer-generated code could be woven into a rapid application development process.

 

15 years ago I worked on genetic programming – the evolution of computer programs. However, I ran into a severe shortage of computing power, eventually filling a rented house with racks of single board computers (top of the line, 40 megahertz 80486) in clusters.  Genetic programming requires vast amounts of computing power to generate and simulate millions and millions of functional cases, and the computing power of my homemade cluster was pathetic by the standard of today's desktops. Genetic programming is ideally suited to make use of parallel computing, and it can make good use of almost any topology – clustered, or MIMD, or SIMD/vector. The IBM monster is a useful combination of these architectures - Cell vector processors out of the Playstation combined with independent Opteron processors out of a PC.

 

If we can't get useful results from these new computers, we're doing something wrong. And, in fact, we are doing something wrong. We fundamentally don't understand evolution. The evidence is in that increases in programming capability of 1000 or more over previous systems have not significantly increased the rate of new discoveries. There seems to be a limit on the complexity of the evolved artifacts, and with the faster computers, we are just hitting those complexity limits faster.

 

There is an "existence proof" for the effectiveness of evolutionary methods.  DNA-based life forms are amazingly complex.  A human genome has at least 30,000 genes.  The most complex genetic programming artifacts have only about 100 active elements.

 

What will get us over this hurdle?  I have some ideas, a three point plan.

Cooperative mechanisms 

The high-school depiction of evolution as a survival-of-the-fittest competition is crude and limiting.  In the real world, organisms and genes co-evolve and cooperate in expanding systems.  We still haven't fully absorbed the lessons of Lynn Margulis and others studying symbiosis and cooperation.

Genome structure

If you try genetic programming, you will find that running the evolutionary algorithm has an immediate and powerful effect on the genomes - the code fragments that you are evolving.  The first thing that they do is expand and add redundancy - something like the introns that make up most of an animal genome.  This change in the genotype moderates the effect of mutation operations, without changing the phenotype - the output - and it's crucial for making the process run smoothly.  So, we see evidence that the structure of the genome is a hugely important player in the game of evolution.  It's no accident that our animal genomes contain chromosomes which cooperate and co-evolve, or that they contain lots of "junk" DNA.  These structural features are what make complexity possible.  If we understood why the genomes are set up this way (we don't, mostly don't think about it), we could make much better genetic programs.

Discovery with Meta-Genetic Algorithms

We have questions cooperating systems and appropriate genomic  structures. Now that we have big computers, we can answer these questions with a meta-genetic algorithm machine.  This machine will generate new genomic structures and population parameters (new types of Genetic Programming), and run the lower level genetic programming runs to see the results for each set of parameters.  This is exactly the kind of machine that I built to run on my primitive rack clusters.  In one run, it could answer questions a the level of PhD thesis.  For example, the Meta-GA very quickly identified the programming operations and inputs that would be most effective.  However, it had to run for days because it would run thousands of runs with the lower level GP to find the higher level parameters.  Hence the problem with computing power.

 

With the more powerful machines, we can resurrect the meta-evolution layer, look at a range of genome structures, and find a way to harness cooperative evolution.  Did anyone follow that?

 

Tags: 

COMMENTS

Another post with more specifics would definitely be appreciated. I'm completely ignorant about genetic programming but am curious about the possibilities. Examples ranging from very simple to complex would be helpful for those that are uninitiated.

posted @ Monday, June 23, 2008 11:10 PM by Sean


I third the request for more details. This sounds interesting...

posted @ Tuesday, June 24, 2008 1:57 AM by Bob


We've gone a long way, cramming a house full of servers into a gaming console.
It's fascinating to draw the similarities between biology and self-evolving software, though it seems that life forms come in features not necessary desired in software -- redundancy, "junk DNA", etc. Academic curiosity aside, would it really be practical to petaflop a rand() function into a working software?

posted @ Tuesday, June 24, 2008 2:29 AM by Tony @ compsci.ca


I just hope the religion and war parameters are set to zero, or they may end up producing viruses and MCPs (Tron reference for ya) and never have any real progress, since the MCPs will stop it from exploring any "morally objectionable" routes, and take control of the viruses to wipe the entire project out.

posted @ Wednesday, July 23, 2008 7:19 AM by Chaz


Hi Andy, 
 
A very interesting article. A couple of comments.  
 
I am not sure whether the rate of new discoveries via artificial evolution has or hasn't changed. As a factoid, since 2004 we have run a yearly competition for human competitive results produced by evolutionary computation at the ACM's Genetic and Evolution Computation Conference (GECCO), and every year we have judged a number of results to be human-competitive. In may cases this meant the results are effectively new inventions/discoveries automatically produced by evolution in the computer. But of course as we adjust to the fact that machines can do this sort of thing, we will probably higher the bar and expect more (the famous horizon problem: as you walk to wards it, it moves....). 
 
A second point: yes, people have focused more and more in recent years on the idea of using evolutionary algorithms to either adapt the parameters of other algorithms or even invent from scratch new search algorithms. With my collaborators/students I myself I've been involved in using genetic programming to invent new particle swarm optimisers, new bin packers, new TSP solvers and new SAT solvers. With the increased power of computers, these things become easier and easier, and the results produced by these automatically invented algorithms are virtually always superior to human designed algorithms. So, the outlook is good. 
 
Another point: yes, we do have a limited understanding of evolution, but there is theory out there which attempts to explain what goes on in evolutionary systems. I've spent over 10 years studying GP from a theoretical point of view, and I can tell you that we now have exact mathematical models of happens inside a GP system. The study of these models is very difficult because evolutionary algorithms (including GP) have zillions of degrees of freedom these lead to systems of zillions of non-linear equations. Nonetheless, we have made substantial progress in analysing such mathematical models.  
 
One case, where there has been important recent results, is the case you mention: the growth in "junk" code in GP. This is also known as "bloat". The phenomenon has been studied for over a decade, with lots and lots of (qualitative or semi-quantitative) theories having being proposed, but none ever emerging as a winner. One theory, which I think you allude to in your article, is that actually junk code is good. It certainly modulates the effect of the genetic operators (mutation and crossover), with the more the bloat the less disruptive the operators. However, it has now emerged very strongly from the exact theoretical models I mentioned above, that actually bloat is largely due to the interaction between selection and a particular bias of the GP crossover operator. This operator tends to produce a very large proportion of very short programs including only a handful of instructions. In most problems of practical interest, such programs cannot solve the problem to any acceptable degree and are therefore given a low fitness. As a result, during selection they tend not to be picked, which indirectly favours the selection of longer programs. This in turn, generation after generation, leads to the increase in the average program length (i.e., bloat).  
 
So, progress is being made in GP both at the level of solving interesting problems (including producing new inventions) and at the level of understanding what goes on. There are also a huge variety of practical techniques to improve GP or speed it up. 
 
Myself (Riccardo Poli), Bill Langdon and Nic McPhee have produced a recent survey of the field that some people might find useful. This is a book entitled A Field Guide to Genetic Programming. The book is freely available on the internet as a PDF or can be purchased at cost from Lulu.com or Amazon. The book is Creative Commons licenced.  
 
For more information, have a look at the book's web site: 
 
www.gp-field-guide.org.uk 
 
 

posted @ Wednesday, September 03, 2008 5:12 AM by Riccardo Poli


Andy, you talk about human genome and other. I'm like you, also curious in AI and looking ways. You know, AI research stalled since 60s.. And the cause is NOT computer power. The cause is the choosen method itself. Not exa-, not zetta-, not yotta-flops computing is not capable to solve artificial intilligence problems. Why? Because instead of determined (finite) logic there possibility to transcedental "computing" (or more correct - solving, organizing) processes. And about genome. You, as computer specialist should be able to make estimation about amount of information to describe human body construction. Its exa, or zettabytes, of not more. And about evolution and genetic processes. There is NO any live evolution process. Darwin evolution is only scientific fiction. We like it much, only because we don't like conception someone created us. So, we creating genetic algorithms. We creating powerful binary computing machines. But we should know, we should understand - this way IS NOT FOR AI. 
Thats all.

posted @ Wednesday, October 01, 2008 8:49 AM by asm


Andy, all I'm saying - human-like AI is impossible with current computing. Impossible because no one is not understanding what is process of thinking. 
Before to make something - you should understand what you make. Right? 
 
What is intuition? What is dreams? Believe, I'm not retorgrade, some years ago I was in strong belief in theory of evolution, theory of relativity and in belief two masses attracts to each other (until tested it with great accuracy) :) Even monthes ago I working on dreams problem - I thought dreams is a neural network relaxation for sorting knowledge. No, all above is not true All is mistake. Working on dreams revealed so impossible things so I'm abandoned any ideas.  
As to Darwin's Theory of evolution - You may read Forbidden Archeology by Michael A. Cremo and Richard L. Thompson 
Despite some idiots calls it "pseudo-scientific", it not. Its only hard facts collection. Only facts, and so much such facts... 
And standard question for you. How elefant trunk became elongated? When it short - it useless.. Elefant trunk is still simple case, what about skate electricity or eel? Can you imagine such long existing conditions to produce weak and therefore absolutely useless electrical battery? Read about Dicrocoelium dendriticum Life cycle http://en.wikipedia.org/wiki/Dicrocoelium_dendriticum and try to clue how this chain can be produced, if ant, trematoda and cattle is on different evolution ways/stages? Why ants, flies and spiders stopped evolution millions years ago? Why, if brain is so much effective instrument - only human have it advanced? 
There no evolution. Mutations yes. But not evolution.

posted @ Wednesday, October 01, 2008 4:30 PM by Asm


ASM, you are utterly and completely wrong, you keep listening to those kooky religious leaders who know nothing about Science, and I'll go on believing the Scientists who actually work in the fields you mentioned every day. Please keep your book of fears and ignorance to yourself. The real test is how Science is learning new things every day, while your book has been frozen solid for more than 2,000 years. People who believed the Earth was flat and stood still as the Sun circled it (for less than 10,000 years, no less) have no business trying to tell us how the world works. Keep it to yourself, and you can go on believing any crazy thing you want, just don't try to play real Science with it, or make laws based upon it.

posted @ Thursday, October 02, 2008 11:44 PM by Paradox


Hi, Andy -- slightly off topic here -- 
 
You and I talked a few times around 15 years ago when we both shared an interest in GP and the Koza book had just come out. I wrote a small C++ GP engine that I shared with you at the time. 
 
I haven't done a thing with genetic algorithms since then, nor am I very moved to do so at the moment -- so this is really mostly a "hello from the past" message, having found you because one of my projects happens to be using Assembla and I was surprised to see something about GP on the home page! Now that I know what you're up to, I'll be following developments here. 
 
To go very briefly back on topic :) -- I think it's inevitable that hardware evolution will feed a software evolution in studying, well, evolution. I am mostly interested in software problems that exercise my ability to distill human judgments about human activities, but that is probably a small subspace of all the things machines can do. So the future will be (as it always has been) interesting, and I decline to make any philosophical statements about what is or isn't possible. 
 
...j

posted @ Saturday, October 04, 2008 11:54 AM by Joe Berkovitz


> ASM, you are utterly and completely wrong, you keep listening to those kooky religious 
posted @ Thursday, October 02, 2008 11:44 PM by Paradox  
 
Hey, Paradox. I not here to bring you knowledge. To be enlightenmened - its each own task. Simply, you not right. Here not any "religious". Do not intepret. Its only your dumbass interpretation of my words. 
I only told there is NO Darwin evolution and natural selection. You hear me?  
There some thing called "LOGIC". 
If there exists one theory and much hard facts contradicts this theory - a right, logical action - to say this theory is incorrect.  
 
Its easiest to say others is dumb and read outdated books. But turn on your intellect and start to thinking. You can't drop logic. Darwin evolution contradicts logic so much, that if you start to read at last, you may be know - Darwin, at his last life days had great doubts about his own theory... Why? Because it illogical..

posted @ Tuesday, October 07, 2008 11:37 AM by asm


Evolution is a fact. 
But, don't confuse Evolution with the theory of evolution. 
Darwinian evolution is the best interpretation of the facts we have, it has been put to the test and proven right over and over, without it, we wouldn't have many of the life saving medicines we have. evolution is the basis of all biological research today. Darwin never doubted his theory, he was afraid at one time of releasing it because he knew religious nuts would lose their freaking minds over it.

posted @ Tuesday, October 07, 2008 3:54 PM by Paradox


You should take a look at a book titled 'The Plausibility of Life' for some ideas on the evolution of evolvability. 
 
Another way of looking at it is how genomes become resilient (such that mutations and recombinations are still very likely to produce a viable organism, if perhaps one less fit), rather than brittle.

posted @ Wednesday, October 22, 2008 8:28 PM by Michael R. Bernstein


Andy, introns and epigenetics is the next big thing in Genetics. We have cracked the gene code of the computer (binary) and the gene code of organisms (ATCG) - that was the easy part. Now, it's the epigenetics that needs to be investigated - what is the driving force. Crack that and you might create a computer program that evolves. Hey, those entrons aren't just the work-horses of the genome either. Don't count out those introns. We are finding out they they are not just junk - but major contributors to our functions! 
 
 
 
The computer world is so focused on speed and capacity. Note - we started as a single cell until we got better at it. Maybe to evolve the PC - we need to think small again. 
 

posted @ Friday, June 26, 2009 9:29 AM by Yvonne Chekaluk


It is tough even for geneticists to get through those types of sequences. Just like Lynn's theory of symbiosis, the worlds of Computer Science and Genetics are merging. UConn hosted a conference "Next-Generation Sequencing" that spoke about similar issues for programmers dealing with next-gen sequencing programs. 
 
http://www.regonline.com/builder/site/tab1.aspx?EventID=696366 
 
 
 
Dr. David Clements, NESCent 
 
“Seeing the forest and the trees: visualizing next generation sequence data” gave an excellent presentation!  
 
 
 
Without the computer - the Human Genome project would have never been posible. Maybe Assembla will lead the way - the Darwin PC! 
 
 
 

posted @ Friday, June 26, 2009 10:33 AM by Yvonne Chekaluk


Hi Andy, I've had the thought that evolutionary algorithms were a very simplistic imitation of the real thing for a long time. One of the things that I believe should be incorporated for improvements is evo-devo. That "junk" DNA that you were talking about isn't always junk. If you plant a seed that you bought from the store, then collect the seeds from that plant and plant those next year, that plant will be better adapted to the conditions of your soil and your climate. Much of this is the result of interactions between the genes of the individual plant and the environment. I believe that similar things happen with feral pigs (and just about any other critter) as well. I think that we need a mechanism that turns "genes" "on" and "off" as a reaction to "environmental" factors to achieve greater complexity.

posted @ Friday, December 11, 2009 12:36 PM by Jesse Johnson


There have been a few developments in the last 15 years such as Gene Expression Programming which uses a symbolic string as the genotype which is then 'expressed' into the phenotype abstract syntax tree for evaluation. The advantages of this approach are that it is considerably easier to manipulate a string of symbols rather than the AST directly but also that the genotype contains redundancy which allows facets to survive through selections and then re-emerge later. Hardware is also considerably faster. For example I evolve trading rules from 10 years worth of historic data and probably do around a billion evaluations a minute on my quad core i5. This could obviously be scaled up massively relatively cheaply. This sort of throughput allows more complex things to be attempted like automatically defined functions (ADF's) and more complex data types e.g. collections and distributions. It also means that fitness functions can be considerably more complex and simulate adverse environmental factors like market crashes, tobin tax - perhaps parallels for disease or predation. There is no doubt that GP has a great future.

posted @ Wednesday, June 23, 2010 6:14 AM by Andrew


Comments have been closed for this article.

Follow Assembla

twitter facebook youtube linkedin googleplus

Subscribe by Email

Your email: