By Paul Almond, 12 February 2003
Introduction
This article suggests that current methods of implementing Darwinian evolution on computers have severe limitations and proposes improvements in the way in which it is done.
It is suggested that the 'evolvability' associated with data that has some meaning is a characteristic of the particular interpretation which that gives that data its meaning, so that evolvability is mainly a characteristic of the interpretation rather than the data: some interpretations will lead to high evolvability and useful progress by evolution algorithms and some interpretations will lead to low evolvability, with little useful progress.
The idea of a hierarchical system of machines is used, each layer in this system acting as an interpreter for the next layer by controlling its state transitions. It is proposed that each layer of software is evolved successively to produce an interpreter with high 'evolvability' and that the utility of the particular data on a given layer should be evaluated, following mutation in an evolutionary process, by assessing the degree of progress observed in a large number of secondary evolution processes, each of which uses a different evaluation function, performed on the data in a next layer machine.
The proposed method involves evolving each layer to act as an interpreter for the layer above it and it is expected that the degree of sophistication with which each layer does this will increase as more layers are added, as the evolution process that generates each layer 'benefits' from the evolvability that has already been attained in lower layers.
Problems with Existing Genetic Methods
Existing genetic algorithm techniques have not generated software of substantial complexity.
It seems intuitively apparent to me that mutating symbols in a computer program is unlikely to result in a complex system in any reasonable time. However, it seems that mutation of the ACTG symbols in biological genetic codes can lead to evolution of complex systems. Why is there this difference between our computer systems and biological systems?
I suggest that the main issue here is one of interpretation. Any data that is mutated is effectively a sequence of symbols that is subjected to interpretation. In the case of a high level computer program the symbols are the computer program and it is subject to interpretation by the rules of the computer language being used (provided by a language interpreter or compiler). In the case of DNA the sequence of As, Cs, Ts and Gs is interpreted by the underlying biochemistry, which provides a 'machine architecture'.
The underlying architecture is needed to provide the sequence of symbols that is being mutated with its meaning. As an example, the sequence of symbols in the DNA that codes for construction of an organism only encodes for an organism when interpreted by the underlying biochemistry: without this it is just an arbitrary sequence of symbols.
I now make this suggestion:
The likelihood of a mutation on a sequence of symbols having a positive effect is determined by:
- The current complexity of the system described by the sequence of symbols.
- The architecture that is interpreting the sequence of symbols.
I suggest that this makes the interpretative architecture the most important influence on how quickly evolution of a sequence of symbols can progress and that most architectures devised by humans will only give any useful rate of evolution for systems of quite low complexity, the evolution process effectively grinding to a halt when a certain level of complexity is reached. (See figure 1)
We could perhaps imagine an interpretative architecture designed to allow rapid evolution of the sequence of symbols that it interprets, but it is difficult to imagine actually designing such a thing.
Evolution in biological systems, however, seems to produce complex systems. How does evolution manage to produce complex systems? I suggest that the answer lies in the biochemistry that interprets the DNA - it must be a particular interpretative architecture that allows evolvability.
If we need an interpretative architecture that allows evolvability of the data that it interprets then we seem to have a problem here - how does such an interpretative architecture get there in the first place? Clearly, it must have evolved. This may seem to be a paradox - how can we have an interpretative architecture that allows evolvability to evolve when we need it in the first place to allow evolution?
I suggest that the answer is that evolution is not a process that is totally allowed or prohibited by the interpretative architecture, but, instead, that different architectures allow different degrees of evolution and that the architecture effectively imposes an upper limit on the complexity of the systems that can be evolved with it. The interpretative architecture in biochemistry is less complex than some of the systems that use it to evolve, so can actually evolve within the context of a cruder interpretative architecture that does not allow as much evolvability.
Here is what I think has happened in biological evolution:
A multi-layered system of interpretative architectures has evolved.
The bottom level of this system is the basic laws of physics. At the start there is nothing to do the interpreting except the laws of physics. Systems evolve that are interpreted by these laws. The laws of physics have not been designed with evolvability in mind and the process of evolution is extremely slow, with an effective upper limit on the sophistication of the systems that it can produce in any reasonable time.
This is inadequate for producing anything like complex animals, or even bacteria, but this is not what evolves. Although complex systems are not available, what can evolve is a second layer of interpretative architecture that is slightly better at allowing evolution to occur with the data that it interprets than the laws of physics. Evolution of data that is interpreted by this second layer now proceeds more rapidly and has a higher upper limit on complexity than is allowed by evolution within the context of the basic laws of physics. It still does not have a very rapid rate of evolution, or a very high upper limit on complexity, because limits were imposed by the context of the laws of physics in which it evolved, but it does not have to be very good. All it needs to be is better for evolution than the laws of physics. It can now be used to evolve a third layer of interpretation that is too complex to have evolved within the context of the laws of physics. This third layer of interpretation can now be used to evolve a fourth layer of interpretation that is too complex to have evolved within the context of the laws of physics or the second interpretative layer.
This process continues, with successive layers of interpretative architecture evolving. Each time a new interpretative layer is added the rate of evolution and the upper limit on complexity both increase. The evolution process itself is evolving and highly complex organisms can ultimately be produced.
We need to use a process like this to evolve sophisticated computer software.
The Darwinian Evolution Method
The first requirement is for a viable Darwinian evolution process that can create sophisticated computer software adapted to an environment. This makes use of the concept of architecture machines.
Definition of an Architecture Machine
An architecture machine is a computer that acts on an array of symbols.
The behaviour of an architecture machine is determined by two things:
- Its own internal logic.
- The particular array of symbols presented to it.
An architecture machine has a 'read/write cursor' that it can move to any position in the array. It can read the symbol at the position of the read/write cursor and it can also write a symbol into the array at the position of the read/write cursor (overwriting any symbol that is already there). (See figure 2)
There are similarities with the Turing machine, but a Turing machine is actually a special case of an architecture machine. In particular, the internal logic of a Turing machine works in a very specific way, using an extremely limited number of states, and the read/write cursor of a Turing machine is limited to moving only one element left or right in the array in any one step. Any architecture is valid for an architecture machine and the read/write cursor can be moved from any position in the array to any other position, provided that the architecture machine's internal logic and inputs/outputs can do this.
Only a limited number of symbols are valid in the array that an architecture machine processes, and I define the number of such symbols as the base of the architecture machine. As an example, an architecture machine which acted on an array of binary digits would have a base of 2. An architecture machine which acted on an array containing the digits 0,1, 2 and 3 would have a base of 4.
Any computer can be used as an architecture machine, provided that it has an adequate number of inputs and outputs that can be interpreted as the required inputs and outputs of an architecture machine (i.e. to move the read/write cursor symbols).
An architecture machine is a computer, but an architecture machine and its symbol array together are also a computer. Particular elements in the symbol array can be interpreted as inputs and outputs. (see figure 3)
The idea that an architecture machine and the symbol array on which it operates is a computer is really important, regarding what we want to do, because it means that any architecture machine, together with a particular symbol array, can be used together to provide a second architecture machine. This means that if we have an architecture machine and its symbol array then we can interpret specific elements in the symbol array as being the inputs and outputs needed to use it as an architecture machine. (See figure 4)
This second architecture machine could act on a second symbol array and the resulting system could be used as a third architecture machine, which acts on a third symbol array, and so on.
A computer can perform a complex task by using such a sequence of architecture machine. The 'top level' of the computer would be an array of symbols. This array would be acted upon by an architecture machine, which actually consists of a particular symbol array acted on by an interpretation machine, and so on. (See figure 5)
This does not improve the performance of a computer - in fact, use of a large number of interpretative layers is likely to make it run more slowly, so why would we want to do this? The idea is that this can be used a basis for constructing computer software by allowing us to evolve for evolvability and thereby be used to create a sequence of interpretative layers that can ultimately be used to evolve very sophisticated software.
How the Evolution Method will Work
1. Obtaining the Base Architecture
The biological system was based on the laws of physics as its first layer of interpretative architecture. The computer system also needs to have a first layer of interpretative architecture. We will call this the base architecture.
There is no way of generating the entire base architecture by means of any automatic process. This is because any automatic process would need to work by generating a symbolic description of the base architecture. This symbolic description would need to be interpreted by a lower level architecture to have any meaning and this lower level would then actually be the real base architecture. Any automatic method of generating this would in turn have to specify it as a symbolic description and this, in turn, would require a further layer of interpretation. Any attempt to automatically generate the base architecture would therefore involve us in infinite regression.
The base architecture is designed by a human. The base architecture is a set of rules for controlling the read/write cursor as the architecture processes the array of symbols. To keep things simple we will call the array of symbols the 'program'.
The base architecture's rules read the symbol in the program at the cursor position, tell the cursor how to move in the program and control writing of the symbol to the program at the current cursor position.
We cannot generate the base architecture automatically because of the infinite regression problem mentioned earlier, but we can partially automate generation of base architecture. This is done by selecting a particular class of machine for the base architecture and then adjusting parameters that control the behaviour of that machine.
We now need a method of adjusting the parameters of the architecture machine. The method that we will use is Darwinian evolution. We will create a population of architecture machines and make copies with mutations to their parameters. The performance of each architecture machine will be tested and the result of this test will be used to determine whether or not that architecture machine survives in the population to the next generation.
We now need a way of testing an architecture machine to assess its viability. The criteria used is as follows:
The viability of an architecture machine is determined by the rate at which programs (arrays of symbols) can be evolved when that architecture machine's read/write cursor is used to operate on those programs (i.e. when the architecture machine is used to 'run' those programs).
This means that we are running an evolution process in which the test of a successful mutation is to run another evolution process; we are running nested evolution processes. This will make considerable demands on processing time, but I am now convinced that there is no way of avoiding this.
We need to test the evolvability of programs operated on by the architecture machine and, as a matter of fact, we will not do this by attempting just one evolution process, but many evolution processes. Each evolution process attempts to evolve a 'program' to complete a different task. This means that we need a large number of tasks, together with methods of assessing success. These tasks should initially be very simple, as a crude architecture is being used to interpret the programs that are being evolved. In fact, to start with, they should preferably be simple enough that some success could be achieved merely by putting symbols together randomly to make the programs.
At the end of the process we will have the parameters needed for the first layer interpretative architecture. In fact, because of the way evolution processes tend to be run, with large population sizes, we may have a number of contenders for the first layer interpretative architecture.
The process for obtaining the first layer is different to the process to be used for subsequent layers, because no lower level architecture is available. When we obtain the second interpretative layer the standard process emerges. (See figure 6)
2. Obtaining the Second Interpretative Layer
We now have a first layer interpretative architecture. This operates on a 'program', which is simply an array of symbols.
We now regard this first layer interpretative architecture and the 'program' on which it operates as together forming a computer and we adapt this computer to allow it to be used as an architecture machine.
Adapting the system to allow its use as an architecture machine simply involves giving it the capability to control an architecture machine's read/write cursor and to perform read/write operations at the cursor position. This means that certain positions in the symbol array will be selected as inputs and outputs to control the read/write cursor and perform read/write operations.
We have now produced the basis of our second layer interpretative architecture, but we do not have the programming needed to make it actually do the interpreting. This programming is generated by a Darwinian evolution process.
We need to examine what this programming in the second layer will do; it will interpret the programming on the third layer. We therefore use a nested evolution process similar to the one that we used when setting the parameters for the first layer.
We set up a population of machines with various arrays of symbols for the second interpretative layer programming. We produce copies of the machines and, in each of these copies, we make a mutation to the array of symbols in the second interpretative layer. After each mutation we assess the viability of the particular second interpretative layer that is being tested. The viability is assessed according to how well evolution of the symbol array in the third interpretative layer progresses when this particular second interpretative layer is used to interpret it. To assess the mutation, therefore, a number of evolution processes are performed to see how quickly the third interpretative layer can be evolved to solve problems and what level of sophistication can be obtained.
3. Obtaining the Third Interpretative Layer
We now have an array of symbols for the second stage interpretative layer programming. The method for obtaining the third interpretative layer is basically the same.
We make an architecture machine that operates on the array of symbols in the fourth interpretative layer. This architecture machine consists of the third interpretative layer, operated on by the architecture machine formed by the second interpretative layer (which in turn works by being operated on by the first layer interpretative architecture machine).
We make this architecture machine by selecting elements in the third layer array to be used as inputs and outputs to control the cursor that allows it to operate on the fourth layer.
We need to evolve the sequence of symbols for the third interpretative layer. We make mutations to copies of third level arrays and we test the viability of a mutated copy by testing how quickly secondary processes can occur on the fourth level when the architecture machine formed by the third layer (and the layers beneath it) is operating on it and how much sophistication they can produce.
The similarity between this process and the process used to obtain the second interpretative layer should be apparent.
4. Obtaining Subsequent Interpretative Layers
A similar process is used to obtain each successive layer.
To generalise this:
Before we obtain interpretative layer n we first obtain interpretative layers n-1,n-2,...1.
To obtain interpretative layer n we first set up the system to interpret layer n+1. This means that we make an architecture machine by selecting elements in the layer n symbol array as inputs/outputs, to act as the control interface for the read/write cursor that operates on layer n+1.
The sequence of symbols in layer n is generated by a Darwinian evolution method. A random population of layer n symbol arrays is produced. Populations of mutated copies of the sequence of symbols in layer n are then repeatedly produced from the previous populations. Most of these are discarded, but those considered viable are retained.
The viability of a particular array of layer n symbols is tested by means of a large number of secondary evolution processes. Each secondary evolution process involves attempting to evolve the array of symbols in layer n+1 so that it performs a given task when interpreted by layer n, layer n being interpreted by the layers underneath it. The viability is greatest when the secondary evolution processes proceed quickly and generate substantially sophisticated behaviour.
A Possible Improvement to the Method
This document is intended to introduce the basic idea only and a further improvement is possible. The current method involves a multi-layer system of interpretative architectures, but this has the disadvantage that as each successive layer is added efficiency is lost and the rate at which the top level can run will be very much reduced if there are a large number of interpretative layers beneath it. The evolution process involves 'running' systems in the multi-layered architecture many times, so any inefficiency in the way that systems expressed within the context of a multi-layered architecture will run will also have an effect on the rate of the evolution process.
An improvement would involve using a compilation based system, rather than an interpretative system. The paradigm would be essentially the same, though what is going on in the machines may appear to be superficially different. With a compilation system the equivalents of the interpretative layers are spaced out over time, instead of within the computer and each layer is used as a 'compiler' to process a sequence of symbols that builds the next 'layer'. All layers would be interpreted directly by the base architecture, giving an increased processing rate and a more rapid evolution process and evaluation of a particular 'compiler' would involve investigating how well evolution proceeds for a large number of different evaluation functions, where the software being evolved is compiled by that compiler.
The details of such a system may be the subject of a later article.
An improvement of this sort would not be particularly difficult. However, I describe an interpretative system, rather than one of this type, in this article, because I think it is easier to use to illustrate and justify the sort of process that I have in mind.
This sort of compilation based system would involve some abstraction in the way that the description builds the actual system and may be better received by some readers who may think that the multi-layered architecture view that I propose is not a strictly correct representation of what happens in nature.
When we look at how organisms are built from their genotypes, we certainly see a lot of complexity and abstraction. It may be tempting to think that all we have to do, in fact, is to introduce some abstraction into the way that our equivalent of the genotype constructs the phenotype, but I think the way in which we perform our evaluation must still 'score' systems according to how quickly next layer systems tend to evolve, in order to create a selection pressure for evolvability.
Is this an attack on Charles Darwin?
No, I do not propose that Darwin's theory of evolution is wrong, nor does this article claim to fill in any 'gaps' in the theory.
What is being presented here is a particular interpretation of what is going at a higher level of abstraction, with proposals of how to model evolutionary processes at that level of abstraction.
A good criticism could be made of the proposal based on this. It could be suggested that, since I think that Darwinian evolution processes can be viewed at this higher level of abstraction then we do not need to explicitly use this idea when attempting Darwinian evolution on a computer. Such criticism would suggest that, if the ideas on which this article is based are correct, all we have to do is set up a standard Darwinian evolution process and then wait for the required hierarchical systems to emerge within the data that is being created.
I would not say that this is wrong in principle, but the most fundamental view of a process is not necessarily the best one to use when implementing a computer model of that process: weather can be described quite adequately by thinking of the atmosphere in terms of individual molecules bouncing around with various processes intervening, but this view is hardly the best one to use to make a computer model for weather forecasts: a higher level view of meteorology, involving more abstract concepts (in this case statistically derived from our lower level view) can be generated and found useful for computer modelling without contradicting the lower level description in any way.
While I do not claim that Darwinian evolution without explicitly incorporating some method like the one proposed here would fail, I do suggest that it would take an impractically long period of time to generate enough sophistication of interpretation to allow evolution of the sorts of highly sophisticated systems that we would find economically useful. Darwinian evolution in nature does not have this problem - early life would have involved construction of very large numbers of copies of organisms, so that it has, effectively, a massive amount of parallel processing capacity. The natural process also has very long periods of time available in which to work. My suggestion is that we depart from this natural process slightly, by explicitly incorporating into our evolution processes a mechanism to drive evolution of the sophistication that will appear in nature to facilitate the evolution process.
Note
This article was originally published on the AI Depot website at http://ai-depot.com/Articles/54/Evolution.html on 12 February 2003. This copy of the article was published here on 26 October 2004.
Update - 18 November 2004
In the above article I mentioned the possibility of a 'compilative' approach and said that a further article dealing with this may be
produced later. A continuation of this article Getting Darwinian Evolution to Work - Part 2 which discusses the possibility of
implementing this method with a 'compilative', rather than interpretive, architecture is now available at
http://www.paul-almond.com/GettingDarwinianEvolutionToWorkPart2.htm.