One of the things I’ve done for the AI project is make a recurrent artificial neural network system that allows all or part of the network to be reconfigured or replaced “on the fly”: subparts of it can be swapped out for potentially differently structured subsystems, after those subsystems have been trained in parallel with the running system. So subsystems get replaced, added, or removed without shutting down the network or interrupting its operation.
But something has to generate the reconfigured subsystems. Something has to come up with new network topology. It has to generate a string that gets fed into a neural network builder. The string is a linear code that specifies a network: the topology, the transfer functions and local state requirements of nodes, the connection weights for the connections between nodes, and even how and where the inputs and outputs are connected. And the something that’s generating this code is a Blind Uncomprehending Idiot that can’t see the code it’s manipulating and anyway has absolutely no idea how code works.
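Concretely, the kind of record such a linear code might decode into looks something like this sketch (names and fields invented for illustration; the real builder format isn’t shown here):

```python
from dataclasses import dataclass, field

@dataclass
class NodeSpec:
    node_id: int
    transfer: str          # transfer function name, e.g. "tanh" (illustrative)
    state_slots: int = 0   # local state the node keeps between time steps

@dataclass
class NetworkSpec:
    nodes: list[NodeSpec] = field(default_factory=list)
    # (src, dst, weight) triples; recurrent connections are just edges here
    edges: list[tuple[int, int, float]] = field(default_factory=list)
    inputs: list[int] = field(default_factory=list)   # node ids fed by inputs
    outputs: list[int] = field(default_factory=list)  # node ids read as outputs
```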
The Blind Uncomprehending Idiot in question is an evolutionary algorithm. So the question my design needs to answer here is best seen as: what’s an appropriate representation for code in such a system, one that allows recurrent networks of radically different topologies and complexities to be generated and to meaningfully compete with one another, and which is also evolvable using the primitives an evolutionary algorithm can work with?
An evolutionary algorithm is a subtype of genetic algorithm which seeks a fit population rather than a fit individual. Sets of interacting or cooperating individuals are selected at random, and then individuals are favored according to how well the sets that they are members of do. This means that individuals that specialize in particular subproblems can be favored inside a general population that otherwise lacks that ability, and their genomes will become more valuable to the population the rarer or more valuable that ability is. EAs can also adapt fairly rapidly to online conditions where the fitness functions change, by altering the ratios of different specialist types within their populations. No particular genotype will dominate unless one emerges that solves the problem well and generally.
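A minimal sketch of that set-based scoring, with all the names invented for illustration, might look like:

```python
import random

def evaluate_population(population, team_fitness, team_size=4, rounds=200):
    """Score individuals by the performance of the random sets they join.

    team_fitness scores a whole set of individuals at once; each
    individual's credit is the mean score of the sets it served in.
    """
    scores = {id(ind): [] for ind in population}
    for _ in range(rounds):
        team = random.sample(population, team_size)
        f = team_fitness(team)
        for member in team:
            scores[id(member)].append(f)
    # individuals that were never sampled get a neutral score of 0.0
    return {k: (sum(v) / len(v) if v else 0.0) for k, v in scores.items()}
```

A specialist that lifts the score of every set it lands in accumulates credit even if it would score poorly evaluated alone, which is exactly what lets rare abilities persist in the population.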
EAs are more capable of finding solutions to complex problems than other types of GAs, but they are slow, and not every problem permits the approach.
In order to be evolvable, something has to meet several criteria. First, the code it produces has to have valid syntax – in this case the syntax required by the network builder. That much is relatively easy. Second, there has to be a combination operator that mixes two genomes meaningfully. Meaningfully in this case means that the offspring network will at least sometimes behave similarly to both parents, and that the closer the parents are to each other, the more smoothly the offspring will blend their behavior. This is something that most systems which attempt to evolve code fail to do, which is why genetic programming is usually so difficult. Third, it has to allow code that was evolved for one purpose to be lost (made into an intron) and then rediscovered in different contexts, to see whether it is found useful there. And fourth, it has to allow for the existence of introns in the first place.
Introns are ‘non-coding’ sections of a genome. In any particular individual they may not do anything, but they’re a pool of potentials, most of which have been valuable to some ancestor or other, and many of which could be activated in an offspring by a happenstance of genome recombination or by a point mutation. They may not do the individual carrying them any good, but they mean that individual’s offspring after a recombination or mutation are more likely to be viable. We have a lot of DNA left over from previous generations all the way back to bacteria, much of which our genome contains instructions to just ‘skip over’. Likewise there’s a lot of DNA that’s been injected by viruses along the way, some of which our genome has found a use for and some of which it hasn’t, and some of which is not good for us but gets expressed anyway because evolution just hasn’t got around to completely cutting it out yet. It’s still being ‘tested’ by evolution, and maybe some bits of it will turn out to be useful after all in some novel way or in some combination with something else. It’s messy down there.
And in evolution, messy is good. Messy means there’s a lot of stuff for evolution to work with, and a lot more stuff than just what is presently expressed. It means a point mutation or atavism can activate or deactivate a whole subsystem of things; the subsystem is likely to be one (or part of one) that something somewhere in our ancestry found useful, and the system without it is at least somewhat likely to be capable of functioning anyway. And maybe it will be useful again someday, or for some different purpose. Anyway, introns are essential to evolvable systems.
A fair number of people never learn that evolution requires introns, and the kind of reshuffling where subpart sequences can change so that things developed for one purpose can get used in a different way. They attempt to use genetic algorithms on bit-string genomes where every bit location is tied to a specific purpose and can never be used for anything else, and where there is no information in the genome other than what is expressed. They get things they think are genetic algorithms, but which function essentially the same way as iterated hill climbing or simulated annealing.
Of course neither of those is bad. Genetic algorithms, although capable of finding entirely unexpected solutions, are usually pretty inefficient and can handle only problems of very modest complexity. EAs scale to greater complexity than other kinds of GA, are better at preserving genetic diversity, and, where applicable, can usually find fitness gradients in problems where other algorithms can’t – but they are even more inefficient on problems which can be handled any other way.
The only reason an EA may be the best solution in this case is that all the other means of striving for a good solution are also very slow.
On essentially all problems where you already know the structure of the solution, simulated annealing and iterated hill climbing require much smaller resources, in both CPU and memory, than full-on evolution. Evolution is vastly more general, but it spends a long time building its big pool of introns – most of which don’t wind up as part of the eventual solution anyway – and figuring out which bits to use in what order. Iterated hill climbing and simulated annealing concentrate on optimizing bits whose sequence and meaning are already known. Evolutionary algorithms are slower on little problems that can be solved without all that reshuffling and testing of different sequences and configurations. But I’m working on a big problem, and an EA, while slower on small problems, scales to big problems better. Progress will be very slow no matter what, but with iterated hill climbing and/or simulated annealing and/or a GA with a naively coded genome, the problem would likely be too big for any measurable progress to be made at all.
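For contrast, the iterated-hill-climbing baseline amounts to something like this (a generic sketch, not any particular implementation):

```python
def iterated_hill_climb(random_genome, mutate, fitness, restarts=20, steps=1000):
    """Repeated greedy search from random starting points.  Every bit of
    the genome is 'expressed'; there are no introns and no recombination."""
    best, best_f = None, float("-inf")
    for _ in range(restarts):
        g = random_genome()
        f = fitness(g)
        for _ in range(steps):
            candidate = mutate(g)
            cf = fitness(candidate)
            if cf >= f:       # accept improvements (and ties, to allow drift)
                g, f = candidate, cf
        if f > best_f:
            best, best_f = g, f
    return best, best_f
```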
And that brings us back to the question: how does a blind uncomprehending idiot with no idea how code works handle representing potentially complex code that it can’t even see? Better still, how does it represent evolvable code which the blind uncomprehending idiot’s feeble efforts can manipulate in ways that expose at least a few fitness gradients that usually aren’t all that misleading?
I think that I have a reasonably good answer. I’m letting the genome represent a jumping funge.
A jumping funge is a programming language in which every instruction is followed by a ‘jump’ to a different instruction (or to more than one, if it’s a control-flow instruction). Most funges are non-jumping; they’re laid out in a 2-dimensional grid or similar, and jumps can only be to a neighboring cell. Control flow in such a funge follows a contiguous path, like a flow chart. A jumping funge is a collection of distinct points, not a grid. Its cells don’t have dimensions as such, only addresses.
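A jumping-funge cell needs very little state – something like this sketch (my own notation, not a standard):

```python
from dataclasses import dataclass

@dataclass
class FungeCell:
    op: str                      # instruction, e.g. "emit", "branch", "end"
    output: str = ""             # bit string appended to the builder code
    next: tuple[int, ...] = ()   # address(es) of the cell(s) jumped to

# A genome is just a mapping from address to cell -- addresses, not
# coordinates, so there is no grid and no notion of dimension.
Genome = dict[int, FungeCell]
```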
Anyway, the encoding I’m using is that paths follow the next-instruction pointers through the funge, and at each point there may (or may not) be a bit string which gets output. The output is concatenated onto the code that will be fed into the neural network builder. When the path reaches a cell it’s already visited, or a cell with an ‘end’ instruction, that’s the end of the path.
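Here’s a sketch of that decoding rule, using the FungeCell/Genome sketch above (branching control flow elided for brevity):

```python
def express(genome: Genome, start: int) -> str:
    """Follow next-pointers from `start`, concatenating each visited
    cell's output bits; stop on a revisit or an 'end' cell."""
    bits, visited, addr = [], set(), start
    while addr in genome and addr not in visited:
        visited.add(addr)
        cell = genome[addr]
        if cell.output:
            bits.append(cell.output)
        if cell.op == "end" or not cell.next:
            break
        addr = cell.next[0]   # single successor; branching elided
    return "".join(bits)
```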
The bit strings will contain module specifications and weight vectors for the connections between modules, and taken together will specify the network topology. Because neural networks are parallel, it’s easy to have multiple paths in the same funge, with different starting points. A combination operator that preserves, for each parent, a path starting from a different point preserves most of a structure that contributed to each parent’s behavior, in a way that gives a GA some traction.
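One plausible way to realize that combination operator – the details here are illustrative guesses, not a finished design – is to fill the child’s addresses from either parent at random, then copy one whole expressed path from each parent verbatim:

```python
import random

def path_addresses(genome, start):
    """Addresses visited when expressing the path that begins at `start`."""
    visited, addr = set(), start
    while addr in genome and addr not in visited:
        visited.add(addr)
        cell = genome[addr]
        if cell.op == "end" or not cell.next:
            break
        addr = cell.next[0]
    return visited

def combine(mom, dad, mom_starts, dad_starts):
    """Child gets every address from one parent or the other at random,
    then one whole expressed path from each parent copied verbatim.
    Cells on neither kept path ride along as introns.  (If the two kept
    paths collide at an address, the second copy wins -- crude, but
    evolution tolerates crude.)"""
    child = {}
    for addr in set(mom) | set(dad):
        donor = random.choice([p for p in (mom, dad) if addr in p])
        child[addr] = donor[addr]
    for parent, starts in ((mom, mom_starts), (dad, dad_starts)):
        for addr in path_addresses(parent, random.choice(starts)):
            child[addr] = parent[addr]
    return child
```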
Points in the funge that no path goes through are the ‘introns’ of the system. As the system matures, most of them will have been part of a path at some point or other, and the weights etc. in their bit strings will have been trained for whatever use that path had. Most of them will still point at the next step in that old path, which may or may not be a step onto a path that’s in current use. When a recombination or mutation points back onto the old path, those already-trained weights and that part of the old network topology become ‘live’ again in the offspring, and may be found in an entirely different place, or be useful for some entirely different purpose, than they served previously.
This means that evolution both trains the weights in the connections and rearranges the modules and connections into different topologies, allowing introns to be rediscovered and reused in different contexts, and currently expressed pathways to become introns.
Instead of having the mutation rate be a global parameter as in most implementations, I’m putting it directly into the genomes themselves; a high rate is advantageous at some times and not at others, and letting it evolve along with everything else is the right thing to do. It’s expected to start high and then decrease as the system matures, which is exactly what it ought to do. Better than that, whenever a new fitness gradient is discovered, a high mutation rate becomes advantageous again, and it can rise.
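A sketch of how that can look, with the representation details invented for illustration:

```python
import random
from dataclasses import dataclass

@dataclass
class Individual:
    cells: dict            # the funge genome from the sketches above
    mutation_rate: float   # heritable; mutates along with everything else

def mutate(ind, point_mutate):
    """Per-genome mutation rate: the rate is itself a gene, so selection
    can tune it.  `point_mutate` is a hypothetical cell-level operator."""
    cells = {addr: point_mutate(cell) if random.random() < ind.mutation_rate
                   else cell
             for addr, cell in ind.cells.items()}
    rate = ind.mutation_rate
    if random.random() < rate:
        # the rate gene drifts multiplicatively, clamped to sane bounds
        rate = min(0.5, max(1e-4, rate * random.choice((0.5, 2.0))))
    return Individual(cells=cells, mutation_rate=rate)
```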
Anyway, it’s a nice general mechanism for evolving both the weights and the topologies of neural networks, and a good evolvable representation for them. It has a reasonably simple combination operator which works, more or less, the way our blind uncomprehending idiot needs it to work, so that the idiot’s barely-directed actions on a representation it has no understanding of will have ‘sensible’ results in terms of preserving and combining the behaviors and traits whose valuations are all the idiot knows about them.