We can’t prove that any general artificial intelligence will be friendly.
That, unfortunately and in direct contradiction to the most dearly held hopes of people like Nick Bostrom and Eliezer Yudkowsky, is an unalterable fact. In this article I will explain why I think so.
I should begin by stating directly the axioms from which I reason.
I do not believe we can get general intelligence by any means other than copying the human brain or applying machine learning techniques under conditions that provoke general intelligence. Both approaches are at best uncertain to produce friendliness, and for exactly the same reasons.
I do not believe we can get robust friendliness from any non-sentient intelligence. A friendly entity must understand why we are motivated to prefer friendliness, the criteria by which we judge acts to be friendly or unfriendly, and what motivates both friendly and unfriendly actions. Socialized sentience is the only foundation capable of providing that criterion and that understanding.
Copying the human brain happens every day in biological reproduction. It has produced both saints and monsters. Copying it in software can be expected to have the same range of potential. The only other case to consider is machine learning techniques operating under conditions that can provoke general intelligence.
Symbolic, language-using intelligence can arise only in the context of social groups, in response to a need for individually adaptive specialization that facilitates teamwork and competition in the pursuit of an unpredictable set of tasks. No other context requires it.
A fundamental requirement of sentience is motivated action in a rich, only partially predictable environment, with incomplete information and a limited ability to act. No other situation requires qualia, which I take to be the organization of complex sensory information in a way that recognizes opportunities to act effectively, combined with a motivation to act effectively.
Therefore, if we suppose that a general artificial intelligence is sentient and capable of understanding and producing language, we must also suppose that it has a reason for using language: it uses language in a motivated way to act effectively.
Any context not capable of provoking language use is not capable of provoking sentient, symbolic thought. Any environment capable of provoking language use has an adaptive niche for liars.
If a motivated social creature has operative goals that have required it to develop the ability to use language to facilitate its strategies, and the specific strategies that language facilitates include both cooperation and competition, then it has necessarily developed both the ability to lie, and the ability to anticipate and protect itself from lying by others.
Lying here is just a more specific case of defection: Any context capable of producing creatures that can cooperate in any way has an adaptive niche for defectors. Therefore a creature in any environment capable of producing social behavior has also developed a general capability to defect, and to anticipate and protect itself from the defection of others.
The value of being able to defect, in a social context where no one else is capable of defection or even understands that it is possible, is huge. Any capable defector that arises will have a clear advantage. In a context containing defectors, there is an adaptive requirement for the ability to mistrust others, suspect them of defection, and take steps to protect oneself from defection. Therefore a context capable of producing an artificial general intelligence can be populated only with creatures capable of performing, suspecting, anticipating, and protecting themselves against defection. Indeed, this is a fundamental requirement to understand what defection (unfriendly behavior) and cooperation (friendly behavior) are.
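The advantage a lone defector enjoys among naive cooperators is the standard result from evolutionary game theory, and it can be illustrated with a toy simulation. The sketch below is purely illustrative: the payoff values, the population size, and the random one-shot pairing scheme are my own assumptions, chosen only to make the point concrete. It pits a single unconditional defector against a population of unconditional cooperators and compares their average payoffs.

```python
import random

# Illustrative Prisoner's Dilemma payoffs (temptation > reward > punishment > sucker).
# The specific numbers are assumptions for this sketch.
PAYOFFS = {
    ("C", "C"): (3, 3),  # mutual cooperation
    ("C", "D"): (0, 5),  # cooperator exploited by defector
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),  # mutual defection
}

def simulate(num_cooperators=99, num_defectors=1, rounds=10_000, seed=0):
    """Pair agents at random for one-shot games; return average payoff per strategy."""
    rng = random.Random(seed)
    agents = ["C"] * num_cooperators + ["D"] * num_defectors
    totals = {"C": 0, "D": 0}
    counts = {"C": 0, "D": 0}
    for _ in range(rounds):
        a, b = rng.sample(range(len(agents)), 2)  # two distinct agents meet
        pay_a, pay_b = PAYOFFS[(agents[a], agents[b])]
        totals[agents[a]] += pay_a
        totals[agents[b]] += pay_b
        counts[agents[a]] += 1
        counts[agents[b]] += 1
    return {s: totals[s] / counts[s] for s in ("C", "D") if counts[s]}

if __name__ == "__main__":
    print(simulate())  # e.g. {'C': ~2.9, 'D': 5.0}
```

With these assumed payoffs, the lone defector averages the maximum payoff on every encounter while the cooperators average slightly less than the mutual-cooperation reward, which is all the argument above requires: in a population with no capacity to suspect or punish defection, the first defector wins.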
About such creatures, no general proof of friendliness can exist.
The horns of the trilemma are these: either a context gives rise to no general intelligences whatsoever; or it gives rise to social, sentient general intelligences capable of both friendly and unfriendly behavior; or it gives rise to asocial general intelligences, which may or may not be sentient but have no concept of what friendly behavior is and no motivation to care about it.
Creatures in this last category, when they occur among humans, we call sociopaths. Even if such an intelligence tries to follow our instructions or to maximize a utility function, it has no concept of what friendly behavior is or of why we wrote the function the way we did, so we should not expect robust friendly behavior, nor should we expect it to maximize the function in a way that actually produces value.