Roguelike AI, article 5: Evolving state machine AI’s.
Last time, I introduced genetic algorithms and gave a brief guide to troubleshooting them, then presented some of the issues involved in using a genetic algorithm for evolving stateless AI’s of the type covered in article 1. This article is about evolving state machine AI’s of the type introduced in article 2, and a little bit about the limitations of genetic algorithms.
A state machine AI is, fundamentally, a state transition table. You can think of it as a 2-dimensional grid where inputs are mapped against states and each mapping gives a transition to another state.
The “intelligence” of state machine AI’s is bound up in three things: The transition table itself, the intelligence of the stateless AI that’s used in each state, and the decisions about when to give what input.
In my earlier article on state machine AI’s, I focused on the state transition table: It’s central to the idea itself. I limited the input signals that could happen in a given state by giving the monster different sensory capabilities, or a different “observe” function to call in each state. However, within the limits of what could be observed, I was hand-coding when the system decided to issue a state transition signal.
Now, when you’re doing things by hand, you have a clear idea of what you want the monster to do and the state machine, including the stateless AI’s, the transition table, and the decisions about input, follows from that. But if you’re evolving monster behavior instead of hand-coding it, you’ll have to take another look at how to decide when to issue a state-machine input.
You have to look at the inputs available from the dungeon and from the monster’s perceptive power at the moment, and you have to pick which of several signals to issue…. wait a minute. Isn’t that basically the same thing you were doing with a stateless AI? It is, isn’t it? With a stateless AI, you built a tree of decision points and input parameters, with the leaf nodes leading to actions. Deciding which state machine inputs to turn on is exactly the same except that the leaf nodes are signals instead of actions. So you can use the same structure and method now, building a tree of decision points where the action is picking one or more a state machine input signals to turn on.
We’ve already got a good method for combining “genetic” material from decision trees; we just use branch swapping, the same technique we use for the stateless-AI of each state.
But whate about the structure they’ll be controlling? That pretty much has to co-evolve with the controls. what remains is the transition table itself.
But tables are easy to evolve in a genetic algorithm; this is just an array, so we set up a fixed-size block in the genome for it. Each state invokes a stateless-AI with a given set of parameters: probabilities, thresholds, and default inputs.
In the state transition table we have the following: For each state, there is a designated stateless-AI to use, and a set of parameters for it designating its behaviors and tests from the set available to the monster. Each state also contains a set of tests (again from those available to the monster) to use as parameters for the input-selector. For each state, for each input, there is a datum naming a state and a function that gives a probability of transition. I’d recommend that these functions be limited; you could use a constant, or use the output of one of the monster’s available tests, or a simple function of a test and a constant or a simple function of two tests.
And, based on these analyses and assumptions, we’re ready to make a gene map, a combination function, and a mutator for our state machine AI.
To make a gene map, we’re going to have to make some decisions in advance about the morphology of the state machine; For example we can say a priori that it has sixteen inputs and sixteen states. That means our genome needs one input-picker, 16 stateless-AI-with-parameter sets, 16 sets of parameters for the input-picker, and 256 transitions with transition probability functions (see why I advocated keeping them small and limited?).
So, assuming we develop the stateless-AI’s in a separate step, and allow a thousand nodes (each about two words) for the input decision tree, a hundred words per state for stateless-AI parameters, and four coded words per transition function, we get a “genome” of about 8 thousand words, or 32 kilobytes per individual. The combination function will use branch swapping for the tree, crossover within each stateless-AI parameter set, and crossover within the table. On the principle of storing things whose influence on fitness depends on each other close together, the table will be in row-major order with the stateless-AI parameter set in the middle of each row (so that a state’s state transitions remain as close to its behavior definition as possible).
You have to have a reasonable sized population (at least 50 to 100 individuals) or it just won’t work. In order to work with genomes this size and a small population, you need to use very low mutation rates and a very low degree of elitism. And one of two other things. Either you need a whole lot of patience (reconciliation to runtimes of a few weeks or longer) or you need a cluster.
Remember the fitness test we used with the stateless AI’s? Basically, it amounted to this; considering the moves that each of the candidate AI’s would make in a given situation, which virtual monster would have wound up better off? With state-machine AI’s there is one additional complication; we have to know what state the monster is initially in. But given that additional information as part of the “situation”, it becomes the same problem as evolving stateless AI’s.
The “goodness” function for the situation can be the same one you wrote when you were evaluating stateless AI’s; just take every possible thing you can think of, decide how “good” or “bad” it is according to this monster’s goals in life, and combine them to get a total score. The difficulty, as usual, is projecting ahead.
If the “goodness” function is very good – if it takes everything your monster cares about into account and measures its situation realistically, then you don’t have to project it more than one turn into the future, and you can do that without necessarily simulating the player. The “best” state-machine AI, in that case, is the one that most often produces the highest “goodness” score available from any move.
This brings up an important point; what we’ve been trying to achieve so far is stateless and state-machine AI’s, because they are fast and unambiguous ways to decide what to do. There may be scores, or hundreds of actions available to a given monster at a given moment, and we’ve been trying to avoid the need to “try and test” each and every one and see what kind of goodness score we get from doing it. That kind of test is acceptable in _evolving_ a state machine, in order to “grade” state machine quality by how closely or how frequently the state machine matches it in fitness; but so far, we’ve been assuming that it’s more CPU horsepower than we want to do for each and every monster at runtime. The whole point of the GA is to use the expensive function (projection plus measuring goodness of the situation) to evolve a cheap-to-compute function that gives us results as good.
If on the other hand you decide that it is acceptable to spend the compute time at runtime, and you don’t want to sit around for weeks while a genetic algorithm converges, then by all means throw out the state machine and substitute the function that projects the results of actions and decides which resulting situation is best. This will make the AI in your game a lot “heavier” in terms of CPU, but for most “reasonable” goodness functions it can still be managed. And in fact it’s the beginning of a reasonable approach for modeling the player.
The limitations of Genetic algorithms are several; first, they take a whole lot of computing power to arrive at a conclusion (remember, “patience or a cluster”). They’re finicky and often mysterious and usually multiple runs are needed as the earlier runs point out problems with the way you set up the problem or with the parameters you’re running the GA itself with (this is a motif you see a lot of with AI code). They require you to set up the gene map and say how it will be interpreted (which forces you to make decisions about, for example, number of inputs and number of states, whether or not you have a very clear idea what the results of those decisions are.
GA’s produce systems that are usually incomprehensible and always at least nine or ten times more complicated than they need to be; google for “Emergence of Introns” in genetic algorithms if you want the long version of the story, but the short version of the story is there’s got to be a bunch of apparently-useless complication in there before the evolutionary process really starts to work. We can expect introns (complex structures that serve no useful purpose in the end-result but which serve to make evolution via our mutation and combination functions easier and less risky) in our decision trees and in our stateless-ai’s, and if we don’t hand-optimize them, which is substantial work, we can expect that the result will spend about 3x as long as necessary to run. We can expect introns in our state transition table; look for sets of redundant states that clearly serve the same purpose/produce the same behavior and often transition from each to the other wildly. It will still be ten times as fast as the try-all-possibilities and score the results of each method, but if you hand-tweak it to get rid of introns, you can make it both faster and more comprehensible.
And anyway, that, finally, is getting close to all we can do with state machines and stateless AI’s.
Trying to project a “goodness” function more than one turn into the future, of course, brings us up against the same hard rock that we ran into when we were developing stateless-AI’s; in order to do this well, we are going to need something that simulates the player. And that is hard.
** A note about clusters and small-scale distributed computing **.
When I say “cluster” I don’t really mean one of those amazing special-built clusters and distributed computing infrastructures, although one of them could certainly be used to advantage. Instead, I would advocate building a tiny console program to run the Genetic Algorithm as fast as possible, and, while running it, once a minute or so, submit an HTTP request containing a few genomes, and get an HTTP response containing a few other genomes. The server runs a simple script that responds to each request with a random set of a few genomes it’s recieved in the last hundred requests it handled. Exchanging a few genomes now and then is really all that the system needs to keep the populations connected and allow examples of new genes and complexes developed elsewhere to drift between them; it’s analogous to an “island chain” where a few individuals drift between islands occasionally.
Once you’ve got your GA program and your server script, find a LAN where you can set it up and run it. You can do it at work if you get permission first; you can either ask your coworkers to run your little console program and plug in a laptop to act as the server, or you can take over the office LAN while the office workers are away for Christmas vacation, or whatever. Don’t do this without permission at a job you want to keep. A good relationship with coworkers and boss is needed. Or, you can just hook up all the computers in your house and wait for it to work.