How Smart is a Smart AI?

“The Turing test cuts both ways. You can’t tell if a machine has gotten smarter or if you’ve just lowered your own standards of intelligence to such a degree that the machine seems smart.” — Jaron Lanier

Although many AI projects produce things whose behavior people interpret as intelligent, most of them don’t have the level of intelligence you would probably expect given that behavior.

The classic ELIZA program was, maybe, about as smart as grass. If you rise to the level of one of the more recent chatterbots such as those competing in the Loebner Prize competitions, incorporating thousands of scripted answers and the code to distinguish between thousands of different classes of question, or more ambitious efforts that nevertheless fall into that range of competence, you’re talking about something smarter than a clam, possibly as smart as an insect. Programs that can infer grammars from large bodies of text, or learn how to do narrowly-defined tasks by training feedback, are smarter than insects, but probably not as smart as goldfish.

One of Google’s self-driving cars may be as smart as a goldfish but is probably not as smart as a gecko. The servers back at Google that collate all the traffic information they need to plan routes, and update all the maps based on the movements of those self-driving cars, and simulate interactions between proposed software modifications and situations that all of the self-driving cars taken collectively have encountered, and continually, automatically, update the software in the self-driving cars with improved versions based on things learned from those simulations? The software running on those servers is smarter than a gecko, and that’s a good thing because that software, far more than the limited subset running in an individual car, is what’s keeping people safe in those self-driving cars. In fact the software running on those servers may even be as smart as a snake.

So, yes, real progress in AI is being made. Each of these systems is more than an order of magnitude smarter than the ones that preceded it, and if progress continues at this astonishing pace we will probably reach human-level intelligence within a few more decades.

But, what the heck am I using for a yardstick? Why am I being so hard on all of these ‘smart’ systems? Well, they seem smarter to us than they actually are, because we’re humans and we’re smart and we anthropomorphize things. When we see something handling language, we assume that it handles language in the same way we do, with all our thought and emotion and our awareness of what’s being said. When we see something doing a competent job of driving a car, we assume that it thinks about the road ahead and the vehicle and the other drivers the same way we do.

This happens in exactly the same way that we see human faces in the shapes of rocks and we see mouths and eyes when we look at the grilles and headlights of cars. Most of us, in fact, can’t avoid seeing human faces in things; we have a significant chunk of our brain that is specifically dedicated to recognizing and reading human faces, and no matter what we see, that chunk of our brain instantly evaluates it and tells us how much, or whether, it happens to look like a face and, if so, whether it’s the face of someone we know and what emotion its expression conveys. Similarly, we have a significant chunk of our brains that’s devoted to our symbolic and linguistic interactions with one another, and when we encounter anything that resembles that type of interaction, we cannot help seeing in it something like ourselves. The Turing test is not an accurate yardstick because we are not accurate judges.

And this tendency to anthropomorphize things is a pathetic fallacy. When we attempt to evaluate the intelligence of a system objectively, the way an AI researcher ought to, we shouldn’t rely solely on that tendency. It is normally advantageous, because it allows us empathy with other humans as well as other species, and empathy is a highly adaptive trait. But in the context of a Turing test, it is an exploitable bug in our biological operating system. And, sad to say, most of the Loebner Prize entrants are closer to being scripts intended to exploit that bug than they are attempts to create real intelligence.

Rather than relying on the buggy system call in our biological operating system, we have to look at a program objectively in terms of how complex a task it is to produce its interactions with the world, and then look for something to compare it to. This would be some organism capable of producing about the same number of interactions, about as well suited to its own survival as the system’s responses are to its task at hand, given about the same amount of information input.

And that’s why I said that ELIZA is about as smart as grass. I wasn’t being insulting or flip or snide; that’s actually about as accurate an estimate as I can make. Grass has a few dozen different ways to interact with the world, just as ELIZA had a few dozen different templates for the utterances it could make. Grass selects its interactions in response to its environment (light, temperature, moisture, soil, atmospheric conditions, etc.) in ways at least as complex, and as well suited to its own survival, as the ways ELIZA selects its utterances in response to conversation. It’s just an estimate; grass might actually be smarter, or less smart, than I’m giving it credit for.
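
To make “a few dozen templates” concrete, here is a minimal sketch in Python of an ELIZA-style responder. It is not the original ELIZA script: the three rules, the fallback reply, and the names RULES, DEFAULT_REPLY and respond are all made up for illustration, and the real program did more than this (pronoun swapping, for one). The point is only how small the repertoire is: match a pattern, fill in a template, and fall back to a stock phrase when nothing matches.

```python
import re

# Hypothetical rule table: (pattern, response template) pairs.
# These three rules are invented for this sketch; a fuller script
# would have a few dozen of them, but the mechanism stays the same.
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]

# Stock reply used when no rule matches (also an assumption of this sketch).
DEFAULT_REPLY = "Please go on."


def respond(utterance: str) -> str:
    """Return the first matching template, filled in with the captured text."""
    # Drop surrounding whitespace and trailing sentence punctuation
    # so it does not leak into the reply.
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return DEFAULT_REPLY


if __name__ == "__main__":
    for line in ["I need a vacation.", "I am tired", "Nice weather today"]:
        print(line, "->", respond(line))
```

Nothing in there models the conversation, the speaker, or the meaning of a single word. The whole “mind” is the rule table, which is why counting the entries in it is a fair first estimate of how many distinct interactions the program has.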

So, look back over the list and consider how much smarter a clam is than grass. Clams, for all that they are quiet and immobile, have a fairly large repertoire of interactions and select them in ways that are responsive to more information about their own environment than grass uses. Insects are hundreds of times smarter than clams, and they accomplish that much without even having brains as such.

If I produced something whose real intelligence was, say, as good as that of a hamster, and which handled language as well as a hamster handles, say, seeds, it would seem like a pretty good conversationalist and it would even understand what it’s talking about, for some value of understanding. A hamster-level intelligence, given my modest hardware budget and the programming effort that I as a single person can make, would be more than I can realistically hope for.

Because, you see, hamsters are not actually all that stupid. We don’t think of them as very smart because all the things their intelligence can do are things that we do as well, without thinking much about it. For starters, they can navigate on their own at least as well as one of Google’s self-driving cars, and they do it without continual map updates and GPS input. They can recognize and differentiate between individuals, which geckos and snakes and Google’s self-driving cars can’t do. And moving on to ground we haven’t gotten a firm grasp on yet in AI research, they behave in ways consistent with having intentionality and motivation and desires and emotions. They interact socially, seek mating opportunities intelligently, compete with other hamsters in ways that indicate that they make plans and anticipate the actions of other hamsters, respond in mostly appropriate ways to previously unknown situations or threats, seek shelter, establish food caches, and do a million other things that aren’t at all easy to fully understand or program.

So something as smart as a hamster would be absolutely amazing. I’ll be thinking of this AI project as an attempt to create a well-adapted software creature, not as an attempt to model human intelligence in software. But I want to try to create a creature that’s highly adaptable, handles language with at least some level of understanding, and is otherwise capable in ways that I as a human would value in an assistant and companion. After all, we value the assistance and companionship of other non-human creatures who are part of our households – why shouldn’t a creature made of bits have the same status?

Many system design flaws can be traced to unwarrantedly anthropomorphizing the user. — Steven Maker