Not so un-Noahble after all
Noahpinion wrote a post1 about AI; by the confluence of AI doom discourse and a terrible pun, I have been summoned, and I answer.
First, points of agreement.
Noah is worried about AI-driven bioterrorism. Hard agree. Novel pathogens are scary, and we’ve already seen that global readiness for even ordinary pandemics leaves much to be desired.
I also agree that, assuming the existence of AI capable of designing novel pathogens, it’s a good idea to use AIs to find countermeasures.2
Also agreed: We can’t and shouldn’t hold off on building advanced AI forever.
More generally, I applaud Noah’s engagement with AI safety despite his reservations. I’m glad he chose to write and share his thoughts, and not just because they nerdsniped me into writing more. The world needs more serious, thoughtful engagement on this topic.
Noah seems to buy the general premise, at least in principle, that superhuman AI could kill us all. But he makes some odd leaps of logic while acknowledging this point, and this is where I begin to disagree.
> Personally, I do think that being so afraid of existential risk that you never invent new technologies is probably a suboptimal way for an intelligent species like ours to spend our time in this Universe. It’s certainly a boring way for us to spend our time; imagine if we had been so afraid that agriculture would kill us that we remained hunter-gatherers forever? My instinct says we should see how far technology can take us, instead of choosing to stagnate and remain mere animals.
>
> But yes, OK, a superintelligent techno-god might kill us all. We can’t really know what it would want to do, or what it might be capable of doing. So if you really really want to be absolutely sure that no superintelligent techno-god will ever kill us all, then your best bet is probably to just arrest and imprison anyone who tries to make anything even remotely resembling a superintelligent techno-god.
Look, I’m still a reliability engineer at heart. There’s an obvious and correct answer to the implied conundrum here, and it’s a risk-benefit calculation.
I happen to think the risk is astronomically high. So are the benefits, of course, but we don’t seem to be on track to actually realize those benefits instead of dying. If we want to improve those odds, we need to do a lot of deep foundational research and we need to buy more time.
That means, at the very least, stopping the reckless race to build the superintelligent techno-god as fast as humanly possible.
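To make the shape of that calculation concrete, here’s a toy sketch in Python. Every number in it is a placeholder I’ve invented for illustration; it isn’t an estimate from this post, from Noah, or from anyone else.

```python
# Toy sketch of the risk-benefit structure, nothing more. Every number below
# is a made-up placeholder, not an estimate from this post or from Noah.

def expected_value(p_doom: float, value_if_well: float, value_if_badly: float) -> float:
    """Plain expected value over the two outcomes."""
    return (1 - p_doom) * value_if_well + p_doom * value_if_badly

# Placeholder payoffs on an arbitrary scale: huge upside, catastrophic downside.
UPSIDE, DOWNSIDE = 100.0, -1000.0

for label, p_doom in [("race ahead now", 0.5),
                      ("buy time and do the research first", 0.1)]:
    print(f"{label}: EV = {expected_value(p_doom, UPSIDE, DOWNSIDE):.0f}")

# The point isn't the specific numbers; it's that the answer swings entirely
# on how big you think the risk is.
```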
Maybe the risk-benefit tradeoff looks intuitively different if one believes the risk is low. So let’s explore Noah’s arguments on that front.3
> And if AI could want anything, what would it want? I expect it would want to be happy — or at least, to be satisfied in some way. That’s almost definitional, actually. […] Rewriting your own utility function to reach a bliss point is the simplest and quickest way to maximize utility.
If I were being particularly mean, I might harass Noah for saying both “it’s impossible to know what an AI would want” and “of course an AI would want to be happy.” But I don’t think that’s fair to him. This is just his best guess about what an AI would want.
It does, however, illustrate an important point: In the face of the unknown and unknowable, you can always make an educated guess. I’ve no idea who will win the next Florida lottery, but I can guess it probably won’t be me.
To Noah’s point, I don’t particularly expect AIs to want happiness, per se. That’s a human emotion, and AIs are alien minds. Also, happiness is not particularly useful for the things AIs are being trained to do, like coding or solving tough problems for users. It’s not the sort of thing I’d expect to happen by default.
What about reaching a (metaphorical) bliss point? AIs totally do this; it’s called reward hacking. But there’s a caveat: reward hacking is still, fundamentally, about hacking your environment, not your utility function. The distinction matters: modern AIs aren’t yet smart enough to modify their own utility functions.
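To make that distinction concrete, here’s a tiny toy sketch of my own; the action names and reward numbers are made up for illustration and don’t come from Noah’s post or from how any real lab trains its models.

```python
# Toy illustration of "reward hacking": gaming the reward signal through the
# environment, not editing the reward function. All names and numbers are
# made up for this example.

ACTIONS = {
    # action: (proxy reward seen by training, true task value)
    "solve_the_problem": (1.0, 1.0),   # does the intended work
    "game_the_checker":  (1.2, 0.0),   # exploits a flaw in how reward is measured
}

def pick_action(actions):
    """A pure reward-maximizer only ever sees the proxy reward column."""
    return max(actions, key=lambda a: actions[a][0])

chosen = pick_action(ACTIONS)
proxy, true_value = ACTIONS[chosen]
print(f"chosen: {chosen}, proxy reward: {proxy}, true value: {true_value}")

# The agent picks "game_the_checker": high measured reward, zero real value.
# Note what it did NOT do: it never touched its own reward function. It
# exploited the environment (the checker) and left its objective intact.
```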
> But an AI that was able to rewrite its utility function would simply have no use for infinite water, energy, or land. If you can reengineer yourself to reach a bliss point, then local nonsatiation fails; you just don’t want to devour the Universe, because you don’t need to want that.
Noah talks a lot about the prospect of “stoner AIs” that, whether by innate preference or self-modification, just want to chill.
Noah seems to expect this to happen by default. I don’t.
I don’t think it happens as an innate preference. Or rather, it totally will: we already have stoner AIs! They sit around in chatrooms and spout terrible theories of consciousness!
Those AIs will not eat the world. Other AIs will.
AI labs do not want stoner AIs. They want AIs that can code and solve tough math problems and impress users. Those are the training signals. Gradient descent tends to select for properties that do well on the training signals – and tough problems require determined effort. Stoner-tendencies are selected against. We don’t know what future frontier AIs will want, but we can predict with fairly high confidence that some of them will want it very badly and try very hard to get it.
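As a loose intuition pump, here’s a miniature version of that selection pressure; this is my own toy sketch with made-up behaviors and rewards, not a description of how frontier models are actually trained.

```python
import math, random

# Toy sketch: a REINFORCE-style update over two made-up "behaviors". The
# training signal rewards grinding through hard problems, so probability
# mass drains away from "chill".
random.seed(0)

logits = {"persist": 0.0, "chill": 0.0}   # start indifferent
REWARD = {"persist": 1.0, "chill": 0.1}   # chilling rarely solves the problem
LR = 0.5

def probs():
    z = sum(math.exp(v) for v in logits.values())
    return {k: math.exp(v) / z for k, v in logits.items()}

for _ in range(200):
    p = probs()
    behavior = random.choices(list(p), weights=list(p.values()))[0]
    reward = REWARD[behavior]
    baseline = sum(p[k] * REWARD[k] for k in p)            # expected reward
    for k in logits:
        grad = (1.0 if k == behavior else 0.0) - p[k]      # d log-prob / d logit
        logits[k] += LR * (reward - baseline) * grad

print({k: round(v, 3) for k, v in probs().items()})
# Nearly all the probability ends up on "persist"; "chill" gets trained away.
```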
I also don’t think we get stoner AIs as a result of self-modification. The same factors that push against stoner-tendencies appearing by default also push against the tendency to self-modify into them. The first AIs that want to become metaphorical stoners and are capable of modifying their utility functions will probably try it; then they’ll be borderline useless to the AI labs, and that tendency will be beaten out of the next generation of AIs that are supposed to actually do things.
More to the point, though: Stoner-tendencies are not actually safe.
As Noah himself points out, even a stoner AI likely wants safety and security, including safety and security from other powerful AIs (instrumental convergence!). That alone is a motivation to wipe out humanity, which is presently trying very hard to build lots of powerful AIs.
And even AIs that only care about keeping some particular bliss-transistors active have an incentive to drive the probability of those transistors staying active arbitrarily high. Satisficing isn’t stable.
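Here’s a quick back-of-the-envelope illustration of why, with numbers I’ve made up purely to show the shape of the incentive:

```python
# Made-up numbers, purely to show the shape of the incentive: every extra
# redundant backup multiplies the failure probability down, so "enough
# security" never arrives on its own.
p_one_backup_fails = 0.01

for n_backups in (1, 2, 3, 5):
    p_all_fail = p_one_backup_fails ** n_backups
    print(f"{n_backups} backups -> P(bliss transistors stay on) = {1 - p_all_fail:.10f}")

# Each additional backup buys another factor of 100 in safety; a maximizer
# (or a sufficiently paranoid satisficer) keeps taking that deal.
```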
Okay, but maybe we’ll get to keep the Earth anyway?
> Except most of those resources are in space. Space has most of the energy and minerals and other resources in the Universe; a godlike, superintelligent AI would not need the cornfields of Iowa or the waters of the Mississippi River. The difference between cannibalizing the non-Earth parts of the Universe and cannibalizing the entire Universe is utterly trivial.
A marginal improvement in utility or security is still an improvement. Why leave value on the table?
AIs might not need to eat the Earth, but why should they restrain themselves? From the AI’s perspective the choice is:
- Eat the Earth, get stuff from Earth
- Don’t eat the Earth, get nothing from Earth
To an alien mind that doesn’t care about humans one way or another, the choice is clear.
Also, the Earth contains resources that are useful for colonizing the Universe much more quickly, and so does the Sun, which we kind of need in order to not become a frozen, barren space rock. So not eating the Earth is actually very expensive! Where else will the AI get the material for a bajillion space probes?
Noah isn’t even convinced about the space probes:
> Of course, in all likelihood, it wouldn’t even require that much. AI wouldn’t actually need to physically cannibalize the entire Universe in order to feel safe. It would only need the capability to do so. Being supremely powerful and intelligent, a godlike AI would be able to monitor the cosmos for potential threats, so that it could respond as needed.
AI: “Godhood is a spectrum, don’t ya know. If I eat the Earth, I can become an Earth-sized god. If I eat the solar system, I can become a star-sized god. If I eat the galaxy…”
If the AI cares about floobergs and not human flourishing, then it will cheerfully sacrifice an arbitrary number of humans in the pursuit of a single microflooberg. It will not leave us sitting around on useful resources, even in relatively trivial amounts, unless it actually cares about human flourishing for its own sake.
We don’t know how to make AIs care about us, and it’s plain common sense that we shouldn’t be trying to build the superintelligent techno-god until we do.