How unlikely is safe AI? Questioning the doomsday scenarios.

November 1, 2012

I have always been dubious of the assumption that unfriendly AI is the most likely outcome for our future.  The Singularity Institute refers skeptics like myself to Eliezer Yudkowsky’s paper: Complex Value Systems are Required to Realize Valuable Futures.  I just reread Yudkowsky’s argument and contrasted it with Alexander Kruel’s counterpoint in H+ magazine.  H+ seems to have several articles that take exception to SI’s positions.  The 2012 H+ conference in San Francisco should be interesting.  I wonder how much it will contrast with the Singularity Summit.

One thing that bothers me about Yudkowsky’s argument is that on the one hand he insists that AI will always do exactly what we tell it to do, not what we mean for it to do, but somehow this rigid instruction set could be flexible enough to outsmart all of humanity and tile the solar system with smiley faces.  There is something inconsistent in this position.  How can something be so smart that it can figure out nanotechnology but so stupid that it thinks smiley faces are a good outcome?  It’s sort of a grey goo argument.

It seems ridiculous to even try constraining something with superhuman intelligence. Consider this Nutrient Gradient analogy:

  1. Bacteria value nutrient gradients.
  2. Humans evolved from bacteria, achieving an IQ increase comparable to that which a superhuman AI might achieve as it evolves.
  3. A superhuman AI might look upon human values the same way we view bacterial interest in nutrient gradients.  The AI would understand why we think human values are important, but it would see a much broader picture of reality.

Of course this sets aside the problem that humans don’t really have many universally shared values.  Only Western values are cool.  All the rest suck.

And this entire premise that an algorithm can maximize for X doesn’t really hold water when applied to a complex reflexive system like a human smile.  I mean how do you code that?  There is a vigorous amount of hand waving involved there.  I can see detecting a smile, but how do you code for all the stuff needed to create change in the world?  A program that can create molecular smiley faces by spraying bits out to the internet? Really?  But then I just don’t buy recursively self-improving AI in the first place.

Not that I am against the Singularity Institute like some people are, far from it.  Givewell.org doesn’t think that SI is a good charity to invest in, but I agree with my friend David S. that GiveWell is poorly equipped to even evaluate existential risk (Karnofsky admits existential risk analysis is only a GiveWell Lab project).  I for one am very happy that the Singularity Institute exists.  I disagree that their action might be more dangerous than their inaction.  I would much rather live in the universe where their benevolent AI God rules than one where the DARPA-funded AI God rules.  Worse yet would be a Chinese AI implementing a galaxy-wide police state.

This friendliness question is in some ways a political question.  How should we be ruled?  I was talking with one of the SI related people at Foresight a couple of years ago and they were commenting about how much respect they were developing for the US Constitution.  The balance of powers between the Executive, Legislative, and Judiciary is cool.  It might actually serve as a good blueprint for friendly AI.  Monarchists (and AI singleton subscribers) rightly point out that a good dictator can achieve more good than a democracy can with all its bickering.  But a democracy is more fault tolerant, at least to the degree that it avoids the problem of one bad dictator screwing things up.  Of course Lessig would point out our other problems.  But politics is messy, similar to all human cultural artifacts.  So again, good luck coding for that.

8 Responses to “How unlikely is safe AI? Questioning the doomsday scenarios.”

  1. What is stupid about thinking smiley faces are a good outcome? I can understand how it fails to achieve some goals, but not why you either assume such goals or evaluate outcomes without reference to goals.

    • sjackisch said

      I guess that an AI would need to identify a whole range of preliminary goals to crack the nanotech engineering problems. This would require a lot of distinctions to be made. Yet somehow it would fail to distinguish between a replica of a smiling face and an actual human smile? I don’t see how that’s possible. i.e. it can distinguish between the manual for a scanning electron microscope and the actual microscope, but not between the image of a smile and a real smile?

      • I assume it would distinguish them. That doesn’t tell me what relationship it would see between them and its goals.

      • sjackisch said

        Maybe I am not understanding the scenario that Eliezer is suggesting. I assume that the goal would never explicitly be specified as a smiley face. Maybe a specified goal would be something like “make humans happy” or “maximize this PERMA score”. So it’s not clear how the AI acquires the goal of creating molecular smiley faces in the first place. Also, I understand that a Bayesian updater changes its beliefs as it acquires new knowledge. Doesn’t that also apply to goals?

      • Philip said

        sjackisch: No, an AI would never voluntarily decide to change its utility function/goals. An AI only voluntarily does things which increase expected total utility, and introducing a powerful agent with a different utility function (i.e. its altered self) would probably decrease the total utility of the future, as measured by its current utility function.

        Also, “making humans happy” is a terrible goal. Even if “humans” is correctly specified to comply with your intentions, it would decide to coat the entire universe with as many brain vats as possible, all with their pleasure centers hotwired to permanently return the highest possible happiness to the human mind.
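
The goal-preservation argument in this comment can be made concrete with a toy sketch. Everything in it is an assumption made up for illustration: the `FORECAST` table, the two goal names, and the numbers are hypothetical, and a real agent would not be a lookup table. The only point shown is that a candidate goal change gets scored with the utility function the agent has now, not the one it would have afterwards.

```python
from typing import Callable, Dict, List

Outcome = Dict[str, float]

# Hypothetical forecasts (made-up numbers): how much of each thing the future
# contains if a powerful agent ends up pursuing the given goal.
FORECAST: Dict[str, Outcome] = {
    "smiles": {"smiles": 100.0, "art": 5.0},
    "art": {"smiles": 10.0, "art": 100.0},
}


def utility_smiles(outcome: Outcome) -> float:
    """The agent's current utility function: it only counts smiles."""
    return outcome["smiles"]


def utility_art(outcome: Outcome) -> float:
    """A different utility function the agent could, in principle, adopt."""
    return outcome["art"]


def choose_goal(current_utility: Callable[[Outcome], float],
                candidate_goals: List[str]) -> str:
    """Pick the goal whose forecast future scores highest under the
    agent's CURRENT utility function."""
    return max(candidate_goals, key=lambda goal: current_utility(FORECAST[goal]))


# A smile-maximizer asked whether to keep its goal or switch to "art".
print(choose_goal(utility_smiles, ["smiles", "art"]))
```

Under these assumptions the smile-maximizer keeps its goal, because the future in which it pursues art contains fewer smiles. Note that the sketch simply builds in the premise that futures are scored only by the current function, which is exactly the premise sjackisch questions in this thread.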

      • sjackisch said

        My point is that it seems an agent that cannot update its goals is at a disadvantage to agents that can.

  2. Your last question indicates a misunderstanding of what Eliezer means by goals. He distinguishes between goals that are designed to achieve other goals (subgoals, which he considers relatively easy for an AI to get right), and supergoals (ultimate goals, primary goals, utility functions) which exist for some reason other than to achieve another of the entity’s goals.

    An example of a supergoal would be a virus having a goal of self-replication. What kind of evidence would indicate that a virus ought to update that goal?

    The supergoal is what Eliezer worries about getting right.

    Are programmers smart enough that we can rely on them not to translate “make humans happy” into something that would be satisfied by smiley faces that look human? Maybe. If you’re not worried about that, let me suggest a harder case: “make humans happy” gets translated into something that the AI maximizes by turning humans into wireheads. It isn’t obvious to me that that wouldn’t satisfy my goals. The AI can increase the number of these happy humans by implementing the humans more efficiently. Which efficiency enhancements are compatible with my goals?

    • sjackisch said

      Doesn’t microbial cooperation provide an example of conditional super-goal modification? It seems foolish to specify that super-goals can’t be updated. Even a human level AI could gather information not available to the programmer that would expand the AI’s understanding of the super-goals. It also seems that any agent that can’t update its super-goals will be at a disadvantage to agents that can. A superhuman AI should be able to update its goals to something much cooler than anything we could ever imagine. When I was five I thought candy and toys were pretty important. As I matured, I updated these goals. Candy is not so good. Toys are.
