The “Delphic Casino” is a very powerful tool for exploring and evaluating a bunch of diverse agents that approach solutions to a problem in different ways, in large part because of the way the agents compete. By creating an economy, it implements a degree of novelty search, and it turns the competition to be more right than each other into a cooperative effort that makes the system more right than any of its agents.
Each agent gets paid the most for being right when the other agents are wrong. Something that’s right (or “rightest”), if it’s the ONLY thing that’s right on a particular trial, gets all the pokerchips ALL the other agents bet on that trial. On the other hand, if an agent is the only thing that’s wrong (or “wrongest”), it loses all the pokerchips it bet, but that’s the only money the other agents win; they have to split it up, so each gets back only a very tiny percentage more than what it bet. Most cases are nowhere near so extreme, and everybody gets back some fairly large percentage of the money they bet or some fairly small multiple of it, but that’s the general idea: money goes into the pot as a fraction of the agent’s wealth, and comes out of the pot multiplied or divided by some factor representing that agent’s share of rightness or wrongness. So if something is right when most other things are right, it just about breaks even, or makes a tiny percentage because a few other agents were wrong. But if it’s right when a lot of other agents are wrong, it makes a huge win on those trials. Even if it loses most of its bets, being the only thing that produces the right answer on some fraction of trials will ensure that it doesn’t go broke.
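The payout rule above can be sketched in a few lines. This is my own minimal formulation, not code from any actual implementation; the function and variable names are mine, and the "rightness" scores are assumed to be pre-computed per trial.

```python
# A minimal sketch of the settle-up rule: each agent stakes some chips,
# and the whole pot is paid back out in proportion to each bettor's
# bet-weighted share of "rightness" on the trial. Chips are conserved.

def settle_trial(bets, scores):
    """bets: agent -> chips wagered; scores: agent -> rightness in [0, 1].
    Returns agent -> payout."""
    pot = sum(bets.values())
    # Weight each agent's claim on the pot by bet * rightness, so being
    # right when everyone else is wrong captures nearly the whole pot.
    claims = {a: bets[a] * scores[a] for a in bets}
    total_claim = sum(claims.values())
    if total_claim == 0:  # everybody wrong: just return the bets
        return dict(bets)
    return {a: pot * claims[a] / total_claim for a in bets}
```

In the extreme case described above, a sole correct bettor's claim is the only nonzero one, so it collects every chip the others staked; when everyone is equally right, each agent simply gets its bet back.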
This becomes important because, from time to time, three agents are picked at random. If between them they have enough money to make a “normal” balance for three agents, and one has a tenth or less of that balance, the poor one gets its genome rewritten as the offspring of the other two and gets a third of their pokerchips to go along with half of each of their genomes. I start by arbitrarily giving each agent 500 pokerchips, and the wealth of most agents hovers between 40 and 2000 pokerchips. Once it’s over 1000 they’re likely to “teach their betting system to some poor schlub who’s going broke” (and stake him part of their wealth) fairly soon, and once it’s under 50 they’re liable to “change betting systems” (and have their wealth bumped up to 600 or so) fairly soon. Neither the number of pokerchips nor the number of agents ever changes, so the “average” wealth never changes.
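A sketch of that replacement rule, under my own reading of it (in particular, I'm assuming "gets a third of their pokerchips" means each parent stakes the child a third of its wealth; the names and the `crossover` callback are hypothetical):

```python
import random

NORMAL_BALANCE = 500  # the arbitrary per-agent starting stake

def maybe_replace(agents, crossover, rng=random):
    """Pick three agents at random; if their combined wealth covers a
    "normal" three-agent balance and one holds a tenth or less of it,
    rewrite the poor one as the offspring of the other two and stake it
    a third of the parents' chips. Agents are dicts with 'wealth' and
    'genome' keys. Chip and agent counts are both conserved."""
    trio = rng.sample(agents, 3)
    total = sum(a["wealth"] for a in trio)
    if total < 3 * NORMAL_BALANCE:
        return False
    poorest = min(trio, key=lambda a: a["wealth"])
    if poorest["wealth"] > total / 10:
        return False
    parents = [a for a in trio if a is not poorest]
    poorest["genome"] = crossover(parents[0]["genome"], parents[1]["genome"])
    stake = sum(a["wealth"] for a in parents) / 3
    for a in parents:
        a["wealth"] *= 2 / 3  # each parent gives up a third of its chips
    poorest["wealth"] += stake
    return True
```

Because chips only move between the three agents and none are created or destroyed, the population-wide average wealth stays fixed, exactly as the text requires.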
Instead of the simple situation where a few agents control a fraction of wealth proportional to how often they’re right, the fraction of wealth controlled by any single agent is only loosely correlated with its performance, but the number of agents in the population that are near-copies of (or “playing the same betting system as”) it is proportional to how often the system is right. And each of those other bettors is a recombination (or, rarely, a mutation) that attempts to refine the system.
This means niches are protected. A bettor can succeed with a system that has any valuable behavior that other systems don’t have – at least until too many other bettors are playing the same system. At the same time the search for refinements is most concentrated on the systems which are most successful because more bettors are following those systems (with minor variations).
This works. It’s a very powerful, elegant system that automatically does the “niche protection” that other systems like NEAT had to include special code for, and because the trials make the betting pool proportions available, a “total output” is easily accessible. Further, if conditions change online, the wealth distribution and population proportions of diverse betting systems in a Delphic Casino can change much more quickly than actual gene-based evolution of the agent genomes can happen. This changes the “combined output” in real time, making a Delphic Casino capable of much more rapid adaptation (within the limits of the variety of genetic behaviors its agents pursue) than most neuro-evolution systems. It also means learned skills are retained much better (it addresses the Catastrophic Forgetting problem), because agents which have learned some system and retain it persist for a long time in the population. They get rarer as the behavior is less rewarded, but rarer at an exponential (long-tailed) rate, so a few of them will hang on for a long time. If the behavior they’ve learned becomes important again, their betting system will rake huge winnings out of the casino, and a population of new players of that system will rapidly re-establish itself, again at an exponential rate, meaning once the comeback gets started it will rapidly complete.
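The “total output” mentioned above falls straight out of the betting pool proportions. A sketch, in my own formulation:

```python
import numpy as np

def combined_output(agent_outputs, bets):
    """agent_outputs: (n_agents, n_out) array of per-agent answers;
    bets: (n_agents,) chips staked. The system's answer is the
    bet-weighted average, so shifts in wealth reweight the combined
    output immediately, with no genetic change needed."""
    b = np.asarray(bets, dtype=float)
    out = np.asarray(agent_outputs, dtype=float)
    return (b[:, None] * out).sum(axis=0) / b.sum()
```

This is why adaptation is so fast: when conditions change, chips flow toward the agents that are suddenly right, and the weighted average tracks that flow trial by trial.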
This is probably a valuable component of what I’m after. The continuity (and potentially something that might be considered “consciousness”) of this kind of system gets established when there is inter-agent communication.
There is some inter-agent communication implicit in the market dynamics I’ve been describing of course, but it’s very vague and low-bandwidth. It would take an astronomical number of agents and a richly complex market working at a ludicrously fast speed in parallel, for market dynamics alone to model conscious thought. That’s just not a realistic hope.
But inter-agent communication of a richer and more immediate type is as simple as making the output of sets of recurrent connections visible and shared. The agents are recurrent, meaning that on each round of a trial they read some of their previous round’s outputs. These feedback links can be averaged together with the “system” (weighted-average) outputs on those links, making each agent’s view of its recurrent information influenced to some extent by what all the other agents in the trial are giving themselves to view as recurrent information.
Well, okay, it’s not that simple; you then have to prevent the agents from sandbagging each other by leaving misleading information in the recurrent links for the others to read. That means you need to insulate each agent enough from the others’ recurrent output that it can’t be misled about its own, by making sure each can “hear” its own output most strongly. This is a bit meta, but it allows a useful degree of communication while preventing it from being in an agent’s interest to leave misleading information and then bet against the actions it will lead other agents to produce. This is the kind of thing you ALWAYS have to think about when setting up these systems; you have to be very careful not to leave opportunities to “win” in ways that don’t produce positive value, or else the solvers will produce exploits instead of solutions.
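The shared-recurrence-with-insulation idea can be sketched as one mixing step. This is my own formulation (the self-weight value is an assumption, not anything from the actual system):

```python
import numpy as np

def mix_recurrent(outputs, weights, self_weight=0.7):
    """outputs: (n_agents, n_links) previous-round recurrent outputs.
    weights: (n_agents,) influence of each agent on the shared view,
    e.g. its bet or wealth share. Returns the (n_agents, n_links)
    recurrent inputs for the next round: each agent hears mostly its
    own output (self_weight > 0.5), blended with the weighted-average
    "system" view, so sandbagging the shared links can't drown out an
    agent's own recurrent state."""
    w = np.asarray(weights, dtype=float)
    shared = (w[:, None] * outputs).sum(axis=0) / w.sum()
    return self_weight * outputs + (1.0 - self_weight) * shared
```

With `self_weight` well above one half, an agent's own signal always dominates its next-round input, which is exactly the insulation described above.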
Anyway, moving on. With a shared continuity of recurrent I/O, the individual agents become cooperative elements in a single shared process. They can get swapped in and out or changed or evolve, or some of them can even malfunction and be driven to extinction, while the process continues uninterrupted. And while the individual agents are relatively simple, that shared cooperative process can be very complex. The analogy I had in mind setting this all up was the way cortical columns interact in biological brains – largely independent in function but interdependent by virtue of a shared communications substrate.
Which is all terribly useful and gives a really awesome way to produce responsive, rapidly adapting systems that can retain learned skills. So, yay. But it does nothing to tell me what the “win” conditions I should be striving for to produce consciousness are. So… useful but a big glaring problem is still there.
However, there’s another big glaring problem that I couldn’t see until I got this far. I’ve concluded that the evolution methods I’ve been using so far to evolve those “simple” agents, the ones that are supposed to play the part of cortical columns, probably aren’t scalable enough to produce something of the complexity an individual cortical column would need in order to be a useful contributor to any consciousness-like process.
I’ve been dealing with neural connections explicitly. That is, for each agent schema I was keeping track of every individual connection in the connectome maps. Brains are among the most complex things nature has ever produced, and nature does it using a genome that stores drastically less information than it would require to specify a total connectome. Connectomes are huge, and evolving them gets slower and slower as they get bigger. Hence my scalability problem.
So it’s time for me to climb another learning curve and start thinking of the agent connectomes as statistical density equations in higher-dimensional spaces rather than as explicit point-to-point maps.
This is the HyperNEAT approach: if you consider a map of inputs on the X axis and outputs on the Y axis, then drawing a 2-dimensional shape in that space can represent connections from inputs to outputs at the points where the shape is drawn. And if the shape is grayscale, it can represent a different connection weight at each of those points.
HyperNEAT gets happy about this and cheerfully goes on to four-dimensional spaces to connect nodes that have locations on both X and Y axes, six-dimensional spaces to connect nodes that have locations in 3D, and so on. Instead of evolving explicit connectomes, as in NEAT, HyperNEAT evolves the equations of these higher-dimensional curves and densities, and seeks to exploit information from our 2- and 3-dimensional geometry to create 4- and 6-dimensional geometrically structured solutions to problems. When an actual connectome is needed, the evolved geometric density equations can be sampled at your choice of resolution or scale to produce one, and you can choose a resolution that’s quite fine or a scale that’s quite large if you think it’ll help.
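A toy illustration of that sampling step, assuming a 2D-to-2D substrate (so the density function takes four coordinates). In real HyperNEAT the function would be an evolved CPPN; here it's just a hand-written placeholder with the kind of geometric regularity such functions tend to express:

```python
import numpy as np

def toy_cppn(x1, y1, x2, y2):
    # Stand-in for an evolved density function: strong local
    # connections that fade with distance, plus a symmetric wave term.
    d = np.hypot(x2 - x1, y2 - y1)
    return np.exp(-d * d) * np.cos(3.0 * (x1 + x2))

def build_connectome(cppn, resolution, threshold=0.2):
    """Sample the substrate at the given resolution. Weights whose
    magnitude falls below the threshold are pruned (no connection).
    The same function yields a coarse or fine connectome on demand."""
    coords = np.linspace(-1.0, 1.0, resolution)
    nodes = [(x, y) for x in coords for y in coords]
    weights = {}
    for i, (x1, y1) in enumerate(nodes):
        for j, (x2, y2) in enumerate(nodes):
            w = cppn(x1, y1, x2, y2)
            if abs(w) >= threshold:
                weights[(i, j)] = w
    return weights

coarse = build_connectome(toy_cppn, resolution=4)   # 16-node substrate
fine = build_connectome(toy_cppn, resolution=16)    # 256-node substrate
```

The point of the exercise: `coarse` and `fine` are two connectomes expressing the same geometric pattern, and only the handful of numbers defining `toy_cppn` would ever need to be evolved, regardless of how large the sampled connectome gets.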
Which is a very good idea; our visual, auditory, olfactory, and tactile inputs, as well as all our motor outputs, are connected to the brain at particular locations, and strongly influence the structure and functioning of the cortical columns in those areas. Our visual cortex is mapped spatially in our brains in a way that has a contiguous mapping to the geometry of how the light actually hits our retinas. And so on. Geometry is a very important structuring principle influencing what kind of cortical columns go where, and where connections connect. My single set of shared recurrencies, accordingly, needs to be replaced with a distributed, geometrically interpolated function on a distributed set of shared recurrencies, and my “agents” need to have attributes that give a distribution of locations that are valid locations within the geometry for them to function.
In the short term, this means ripping it all apart (again!) and building it back different. It doesn’t solve the other glaring problem, but I suppose it may get me to a point where I can see a new glaring problem. And I can’t solve any problems until I can see them. Hopefully solving enough of these problems will eventually give me the tools and/or insight I need to solve the last one. Whatever the last one turns out to be.