Petri Nets for AI?

A Petri Net is one of the less well-known types of value propagation networks.  Like Artificial Neural Networks, Petri Nets are capable of producing extremely complex behavior.  The good news is that they are more stable in recurrent operations than Artificial Neural Networks.  The bad news is that unlike ANNs, there is no widespread and well-founded work on how to most effectively train them.  The training methods that have been used tend to be either extremely general and slow, like evolutionary methods, or extremely adhoc, in the sense of “somebody tried this and found that sometimes and for some problems it seems to sort of work.”

However,  in a quest for general AI, classic Artificial Neural Network training techniques are inappropriate anyway.  In the absence of any constant set of connections or topology, classic neural net training algorithms do not operate. If we assume that we’re trying to create something that has the same kind of emergent fine detail and self-organizing architecture as a brain, then we have to consider the neuroplasticity of biological brains. Neurons, once thought to never be replaced in humans, do in fact sometimes wear out and get replaced.  They get replaced with neurons that don’t connect in exactly the same way, but which continue to develop, change, and refine the emergent structure. Existing neurons develop new dendrites or even axons that change their connectivity and the way the brain uses them, over time.

We have strong hints that this kind of neuroplasticity is not only useful in recovering from brain lesions and injuries, but also a fundamental mechanism in the development of our brains’ fine structure.  Among other things, humans – the only species to develop the kind of flexible intelligence we’re talking about – have neuroplasticity to a degree astonishing among all species on Earth;  our brains are subject to many orders of magnitude more structural change over time than the brains of, eg, horses or elephants or turtles.

When we talk about neural network training algorithms that can utilize plasticity – neural rerouting and network topology changes, in our terms – we are mostly talking about some form of Evolutionary Algorithms anyway.  And if we’re talking about an Evolutionary Algorithm, we can train Petri Nets with it about as easily as we can train Neural Networks.

For those unfamiliar with them, a Petri Net, like a Neural Network, is a directed graph of vertices and edges in which values (in this case integer values considered as a count of ‘tokens’) are transmitted from node to node via the edges according to rules associated with each edge.   But there is an important difference.  When tokens are transmitted along an edge in a Petri net, they are subtracted from the token count at the source vertex and added to the token count at the destination vertex.  You can think of this as a conservation-of-energy rule, and this conservation of energy is why Petri Nets are easier to keep stable than recurrent Artificial Neural Networks.  Over time, the total number of tokens in a Petri net does not change.  Petri nets are naturally recurrent, in that the tokens circulate among the nodes.

In a ‘classical’ Petri Net, the edges are unordered and non-deterministic.  At each step in time one of them is selected at random, its rule is checked, and if the rule’s conditions  are met, the edge ‘fires’ – tokens are moved from source to destination.  In practice, if you want your Petri net to work efficiently, you can check first the edges attached to the vertices whose token counts changed with the firing of the last edge. Otherwise you can check all edges in any sequence, ordered or random.   Some Petri Nets are capable of ‘deadlock’ – reaching a state in which no edge can fire – and some are not.  A ‘deadlock’ in a Petri Net is analogous to ‘quiescence’ in a recurrent Artificial Neural Network – the moment in time when no node fires and the whole system comes to a permanent halt.

A related condition is called ‘saturation’, and classical Hebbian training of a neural network (which always increases and never decreases the connection weights) always leads to saturation unless an additional step downregulating the network weights is taken.  Saturation, in an Artificial Neural Network, means every node fires, every time, creating positive feedback.  This is a repetitive state every bit as meaningless as quiescence.  In a Petri Net, the analogous state would be one in which every edge can fire, and no sequence of firing could reach a state in which some edge cannot.

But both of these conditions are extremely simple to avoid in a Petri Net.  These two invariants result in a Petri Net that provably avoids both deadlock and saturation:

1.  All nodes and some subset of edges form a fully connected graph, all edges of which have the rule that one token may be transferred from the source to the destination if the source has more than zero tokens and the destination has zero or fewer tokens.

2.  The total number of tokens in the network is greater than zero and less than the number of nodes.

A positive total number of tokens means that there is always at least one node with a positive number of tokens.  A total number of tokens less than the number of nodes means that there is always at least one node with zero or fewer tokens.  Therefore at all times at least one edge of that subgraph must connect a node with a positive number of tokens to a node with zero or fewer tokens.  Therefore at all times, there exists at least one edge that can fire. But transferring a token on firing means that no edge can be fired repeatedly without causing a change in the eligibility to fire of itself and other edges connecting to or from its source or destination vertexes.  So no matter what else happens in the network, every member of this set of edges will sometimes have its conditions met for firing (no deadlock) and sometimes not (no saturation).

To these edges you can add any number of other edges, with whatever rules you like.  Any edges whose rules can be met by the same conditions as the rules of the subgraph will never saturate and will never deadlock.

The number of tokens transferred is the strength of the edge, similar in concept to the weight of a connection in an artificial neural network. Adjusting strengths, like adjusting weights in ANN’s, is one of the ways a Petri Net can be trained.  This can be done via evolutionary methods with the same ease as evolutionary methods applied to training an ANN, or can be done with ‘swarm’ methods or similar adhoc hill-climbing techniques if the topology of the Petri Net is such that directional gradients for hill-climbing can be detected.  But the broad experience and proven, elegant, reliable training algorithms that exist for ANN’s don’t really have any closely comparable methods in Petri nets.

Petri Nets can have extremely complex rules, or ‘edges’ that connect more than one source and/or more than one destination, or several different ‘colors’ of tokens, with rules that determine when an edge can fire based on combinations of colors, and transfer particular colored tokens when they fire.  Individual nodes can have maximum numbers of tokens they can hold, or minimum numbers.  Token counts can be real numbers instead of integers.  In Petri Nets designed by hand, such complications are often very useful because they allow us to model the problem using abstractions, in the ways our brains are trained to model the problem.  For example in a network where the edges when fired move blue tokens, but the rules for firing an edge are influenced by the presence or absence of both red and blue tokens, the distribution of red tokens can be used to represent long-term memory and the movement of blue tokens can represent a process controlled or influenced by that memory.

But evolutionary algorithms or processes don’t care much about what we find handy for modeling and building abstractions, and as far as I know these complications can always be modeled by the movement of tokens of a single color along edges whose conditions are no more complicated than checking for minimum and maximum numbers of tokens at source and destination and whose effect is no more complicated than moving some specific number of tokens.

Evolutionary processes can optimize that kind of locally-simpler model in similar time to that taken for the more locally-complex models, even though the simpler  models usually require more vertexes.  The operations for evolutionary algorithms  – mutation, recombination, etc – are also simpler, and the code therefore easier to write, for Petri Nets with greater local simplicity.   Finally, as a Petri Net grows more locally complex, so do the sets of invariants necessary to maintain a proof that the network can neither deadlock nor saturate.  On the whole, so far there is little evidence that such local complexity is worth the effort (or has a positive value at all) in training evolved behavior.   Evidence may be discovered if and as we develop a proper body of knowledge about training and operating Petri Nets, but it is at least as likely that it will not.

I am very much aware that the nodes and connections in our ANN’s  are extremely simplified models that only vaguely remind neurobiologists of the way the neurons and synapses of a biological brain operate.  But there is absolutely nothing in a Petri Net that resembles anything about the way biological brains work.  As such, Petri Nets are of no interest to neurologists or neurobiologists, and the “existence proof” of biological brains as showing the achievability of AI with neural models, as loosely as it ever applied to ANNs, does not apply at all to Petri Nets.

I have been thinking about Petri Nets lately, and about Petri Net topologies that allow tractable hill-climbing methods for training them.  This is because recurrent Artificial Neural Networks trained by Evolutionary Algorithms frequently become unstable or fall into self-reinforcing repetitive patterns uninfluenced by later input.  The conservation of energy in a Petri Net, which gives us the ability to prove the properties of having no deadlocks or saturation states, is appealing.  The difficulty of training Petri Nets given our current state of knowledge and technique is not appealing, but if we are forced to use Evolutionary Algorithms anyway, the difficulty may be no greater than that of training ANNs by similar methods.