Review of "Rationality: AI to Zombies": Rationality vs. Antifragility
I’ve mentioned here and there over the past few months that I’ve been working my way through Rationality: AI to Zombies by Eliezer Yudkowsky, and last week I finally finished that mammoth tome. Okay it wasn’t an actual tome, it was a kindle ebook, but using the page estimate on Amazon, had it been a book, it would have been 2393 pages. Which may make it the longest book I’ve ever read, surpassing stuff like Les Miserables and War and Peace. (In case you’re wondering, both of those were more enjoyable.) And, given that length, there’s a lot that could be said about it. Consequently, it may take more than one post for me to cover everything I want to. We’ll have to see, but to start off with I’d like to focus on the difference between Talebian Antifragility, my preferred framework, and Bayesian Rationality, the framework espoused by Yudkowsky in this book. And why one is better than the other.
As anyone who’s followed me for any length of time could guess, I think antifragility is better than rationality. (See this post, if you need to brush up on what antifragility is.) This is not a conclusion I came to just recently, and in fact I think I covered it pretty well in my prediction post from the beginning of the year. But back then I was reluctant to paint with too broad a brush, particularly since, at the time, I didn’t feel that I had read enough to be confident of accurately representing the rationalists. Over 2000 pages later I no longer have that concern.
To be fair, I don’t think they’re unaware of the ideas of Taleb and antifragility. I’ve seen both mentioned here and there, and late in the book Yudkowsky says:
Truly it is said that “how not to lose” is more broadly applicable information than “how to win.”
This is not a bad summation of the principle of antifragility, but unfortunately insights like these are few and far between, and rather than focusing on how not to lose, or more accurately on how to survive, his focus is on winning, to the point where that is how he defines rationality.
Instrumental rationality, on the other hand, is about steering reality--sending the future where you want it to go. It’s the art of choosing actions that lead to outcomes ranked higher in your preferences. I sometimes call this “winning.”
So rationality is about forming true beliefs and making winning decisions.
There are a couple of big things wrong with this definition, starting with his focus on winning. And before I do anything else, I should clarify why I have such a problem with it. I mean isn’t winning good? Doesn’t winning encompass not losing? Yes and yes. But not all “wins” are equal. At a minimum there’s not just the how of winning, but the when. It’s pretty easy to win right this second. If you’re a government, you give the masses exactly what they’re clamoring for, as Mugabe did in Zimbabwe when he took most of the land from the white farmers and gave it to his supporters. If you’re a bank, you can win by giving everyone a mortgage, regardless of their credit, just like the now bankrupt Washington Mutual did. And if you’re a heroin addict you “win” by injecting more heroin. Importantly, all of these decisions fit Yudkowsky’s description of choosing actions “that lead to outcomes ranked higher in [their] preferences.” All of them were winning. And one of the easiest things about winning right this second is that the path is clear. You don’t have to predict the future at all. (This will be important later.)
To be clear, the examples above are not meant to be representative of what I think Yudkowsky means when he says that rationality equals winning, but I fear it’s pretty close to the mark of what most people mean by it, and because of this the subtleties that Yudkowsky brings to the debate are lost. Meaning to the extent that people in a position of power listen to him at all, it’s just one more thing that gets interpreted into “Do whatever it was you were going to do already.”
Given that he contributed a couple of chapters to Global Catastrophic Risks, I don’t think Yudkowsky is unaware of the time frame over which winning has to happen, but I also don’t think he pays nearly enough attention to the trade offs which may be required. One thing that Taleb points out is that often, to win at the end, we have to do a lot of losing at the beginning. The point of antifragility is to accept small, manageable losses in order to realize large, dramatic wins. And that conversely taking easy wins in the short term can lead to large, dramatic losses. Meaning that however well intentioned and careful Yudkowsky himself is, that rationality, as he lays it out, could end up generating lots of meaningless short term victories which farther down the road lead to long term catastrophe.
To be clear, I am fine with conceding that rationalists are not so fixated on short term wins, that they are likely to emulate Mugabe, or Washington Mutual, or to shoot up heroin. But in just the last post I covered other, far more subtle ways, in which “winning” turned out to have significant amounts of “losing” attached, but in ways that were difficult to detect, and took a long time to manifest. Where “steering the future” ended up being a lot harder than people thought. All of this is to say that while Yudkowsky and the rationalists want to make everything about winning, I want to make everything about surviving. Because, as long as you’re surviving, you’re still in the game. And being in the game is important because there’s only two ways out of it, by losing or by winning permanently and forever. And guess which is more likely to happen? Thus, as Yudkowsky said, in the game the rationalists are playing it’s more important to not lose than it is to win. But that’s not what the book says.
As an aside, winning the game permanently and forever might be possible, and transhumanists (another group Yudkowsky belongs to) think that just such a win is within their grasp, either through brain uploading, or a superintelligent, friendly AI, or interstellar colonization, or something equally futuristic. And perhaps this is exactly the win we should all be working towards, but I would also argue that, if history is any guide, it’s more likely that the promise of this ultimate victory will lead us to overextend, with potentially disastrous consequences. As examples of the kind of thing I’m talking about, I would offer up all invasions of Russia, most revolutions (but particularly the communist ones) and every villainous plan from every movie as examples of exactly the sort of overreach that happens when you’re in search of a permanent victory.
I said initially that there were two problems. The first is the overreliance on the idea of winning, and the second is the difficulty of knowing what actions actually lead to a “win”, especially the farther you get from the present. As I mentioned, “steering reality” is fairly straightforward when applied to the immediate future, less straightforward but still mostly worth attempting at the time horizon of a few years, and mostly impossible when you push much beyond that. Meaning that your choices of which actions to take in order to get your high-preference outcomes are less and less consequential the farther out you go, and may eventually end up being no better than acting randomly, in terms of bringing about the future you imagine.
And actually, this is giving the “future predicting business” too much credit. It’s easy to say, that since it might at least work initially, even if it eventually ends up being about the same as acting randomly, it’s better than nothing. But in practice, once people are given the power of “steering reality”, and choosing the actions they think will have better long term outcomes, this centralization often leads to far worse outcomes. The list of times this has happened is both extensive and tragic: North Korea, the Irish Potato Famine, the 2007 Financial Crisis, China’s Great Leap Forward, etc.
However, I’m sure the rationalists don’t see it that way, and as a defense against my first criticism, that they are too focused on winning and not focused enough on survival, I am sure that they would point out that Yudkowsky does mention the importance of not losing and further that he is very aware of existential risks, being one of the primary advocates of AI safety. They might also argue that they are more aware of the tradeoffs between short-term winning and long term winning than I am giving them credit for. That Yudkowsky is only one guy, and however voluminous his book, it is only a small part of the canon. (Please feel free to point me to where it is being discussed.) I’m also sure that they could also point out the many mediocre outcomes which might derive from the more minimal, “just don’t lose” standard. All that said, Yudkowsky did have nearly 2400 pages in which to make that case, and to the best of my recollection he didn’t mention this point. Also my perception of the community is that there is far more, “The future is going to be awesome!” Than, “We need to be super careful…”
As to my second criticism, the idea that predicting the future is impossible and that their attempts to steer reality are just as likely to have bad outcomes as good ones, they might answer by pointing to the central role identifying and eliminating biases has in rationality, and the idea that it was exactly these sorts of biases that led to all the tragic examples I offered above. And, given that they have identified and corrected for these biases, they are less likely to make the same errors. This is almost certainly true, and I feel confident in saying that if Yudkowsky were made dictator for life we would not have a repeat of the Great Leap Forward, nor would the country turn into North Korea. Though when it comes to the 2007 financial crisis, I am less confident. I think even with Yudkowsky in charge, something very similar still would have happened. In spite of that, they might very reasonably argue that some steering of reality is better than no steering, particularly if you eliminate biases, which they claim to have done.
This is a reasonable argument, but I am still of the opinion that, in certain key respects, it provides only the illusion of understanding and control. And this is because, as far as I can tell, Bayesian Rationality still suffers from one weakness which is greater than all the others when compared to Talebian Antifragility: it does not take into account that not all errors are equal. There are some things where being wrong matters not at all, and other things where being wrong is the difference between surviving and losing forever.
With everything we’ve covered thus far, it could be argued that Yudkowsky and I are on the same page, he just didn’t get around to saying so specifically, in his 2000 page book. And, yes, you should be picking up some sarcasm here, but I feel entitled to it, because I had to read that same 2000 page book. However, with respect to this latest criticism even that excuse is unavailable, given that, within the book, there are several examples of him being more concerned with how wrong something is, while paying very little attention to how much it matters.
I want to focus on two particular examples. In the first he devotes an entire section (out of 26 total) to refuting David Chalmers’s philosophical zombie theory. In the second example, he begins another section promising to explain quantum mechanics and then spends most of it railing against the Copenhagen Interpretation. The actual substance of both the original idea and Yudkowsky’s objections is not that important. What’s important is that in both cases the difference between the two positions has no effect on how things actually work, no discernible, experimental differentiation, and except for the tiniest effect on certain, very niche, ideologies, the future envisioned by one side is identical to the future envisioned by Yudkowsky. Despite this, in both cases he spent, frankly, a tedious amount of time mounting a thorough refutation. None of this is to say that I disagreed with Yudkowsky’s arguments. In both cases I was certainly convinced, but even if I wasn’t, even if nobody was, what would it have mattered? These may be large errors philosophically, but practically, they’re inconsequential.
In both cases I got the impression that it was far more important to be correct than it was to explore the consequences of being correct. But allow me to give you a more concrete example, one that I used already in my prediction post.
For a complete overview of the argument I would urge you to read that post, but to briefly recap my point: the rational way of predicting the future is to make a clear, easy to check prediction and assign it a confidence level (as in I’m 90% confident this will happen or I’m 70% confident). Then once the time specified by the prediction has passed you check to see if it actually happened. If done correctly, then 90% of your 90% confidence predictions should come to pass, 80% of your 80% confidence and so on. It’s generally better to be underconfident than overconfident, but the ideal with this system is still to match your confidence with reality. It should be said that in general the problem with past methods was not with people being too cautious, but with being too certain.
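For anyone who wants to see the mechanics, here is a minimal sketch of that calibration check in Python. The function name and the sample record are invented for illustration and are not anyone's actual predictions. The idea is just to group predictions by stated confidence and compare each group's hit rate to the confidence claimed.

```python
from collections import defaultdict


def calibration_table(predictions):
    """Group (confidence, came_true) pairs by stated confidence.

    Returns {confidence: (hits, total, hit_rate)}.
    """
    buckets = defaultdict(list)
    for confidence, came_true in predictions:
        buckets[confidence].append(came_true)

    table = {}
    for confidence, outcomes in sorted(buckets.items()):
        hits = sum(outcomes)  # True counts as 1, False as 0
        table[confidence] = (hits, len(outcomes), hits / len(outcomes))
    return table


# Hypothetical record: ten 90% predictions (nine came true) and
# five 70% predictions (three came true).
record = (
    [(0.9, True)] * 9 + [(0.9, False)]
    + [(0.7, True)] * 3 + [(0.7, False)] * 2
)

for confidence, (hits, total, rate) in calibration_table(record).items():
    print(f"{confidence:.0%} bucket: {hits}/{total} came true ({rate:.0%})")
```

In this made-up record the 90% bucket is well calibrated, while the 70% bucket came true only 60% of the time, which would count as overconfidence. Note that nothing in the table records how much each prediction mattered.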
In any event, I’m reasonably certain this is the sort of prediction Yudkowsky espouses, though it’s yet another thing he doesn’t really get around to in 2000+ pages. (In fact it’s actually surprising how few practical examples there are in the book.) Part of the reason I’m certain, is that the system is very Bayesian in character. The confidence level is your initial/prior probability and as things change you should use the probability implied by the changes to update your prior probabilities and establish new probabilities.
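As a rough illustration of what updating a prior looks like, here is a small sketch of Bayes' rule in Python. The function name and all the numbers are assumptions chosen for the example; they do not come from the book.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return the posterior probability of a claim after seeing evidence.

    prior: probability assigned to the claim before the evidence.
    p_evidence_if_true: chance of seeing the evidence if the claim is true.
    p_evidence_if_false: chance of seeing the evidence if the claim is false.
    """
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1 - prior) * p_evidence_if_false
    return numerator / denominator


# Hypothetical numbers: start 70% confident, then observe something that
# is twice as likely if the prediction is true (0.8) as if it is false (0.4).
prior = 0.70
posterior = bayes_update(prior, p_evidence_if_true=0.8, p_evidence_if_false=0.4)
print(f"prior {prior:.0%} -> posterior {posterior:.0%}")
```

With these invented inputs the confidence rises from 70% to roughly 82%. Evidence that is equally likely either way would leave the prior unchanged.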
This is a good system. It’s definitely WAY better than how things worked in the past, which was for experts to make an outrageous prediction, state it with absolute certainty and be right about as often as dart-throwing chimps. That is a horrible system, and while most of the credit for exposing it and changing it belongs to Philip Tetlock and the Good Judgment Project, to the extent that the rationalists are pushing people to this methodology that’s a good thing. But there’s a problem, and the problem is, as I pointed out, not all errors are the same. Or to put it another way, outcomes are asymmetrical.
In my previous post, I didn’t have access to Eliezer Yudkowsky’s predictions, but I did have access to the final results of the 2016 predictions of Scott Alexander from SlateStarCodex, which if you know anything about this space, is basically the next best thing. Now, before I get into it, I should mention that I have an enormous amount of respect for Alexander, and this exercise is only possible because he was so rigorous in making his predictions using the methodology I described.
Returning to the predictions, as you can imagine, since it was 2016 he made some predictions about the presidential election. He gave Trump an 80% chance of losing, conditional on his winning the Republican nomination, and he gave him a 60% chance of getting that. Which means, if you do the math, that he gave Trump an 88% chance of losing. As we all recall, Trump did not lose, but that’s okay, because even if we round up and count this as one of his 90% predictions (which he did not, he treated it as two separate predictions) Alexander got about 90% of his 90% predictions correct, so the system works, and everything is fine, right?
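The 88% figure comes from multiplying the two predictions together. The sketch below spells out the arithmetic, under the assumption (implicit in the math above) that Trump could only win the presidency by first winning the nomination. The variable names are mine.

```python
# The two stated predictions.
p_nomination = 0.60             # chance Trump wins the nomination
p_lose_given_nomination = 0.80  # chance he loses the general if nominated

# Assumption: no path to the presidency without the nomination.
p_win = p_nomination * (1 - p_lose_given_nomination)
p_lose = 1 - p_win

print(f"chance of winning the presidency: {p_win:.0%}")
print(f"chance of not winning:            {p_lose:.0%}")
```

That works out to a 12% chance of winning and an 88% chance of not winning.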
Not exactly, because as I pointed out in the original post, the stuff he was wrong about (Trump and Brexit) was far more consequential than the stuff he was right about. Which is to say that being 90% accurate about your 90% predictions doesn’t make the world 90% the way you expected and 10% different, because generally the stuff you’re wrong about has far more impact than the stuff you’re right about. At the political level (which is where Alexander was predicting) our world isn’t 10% Trump, it’s nearly 100% Trump.
Now, I don’t want to give you the impression that Alexander was egregiously wrong. In fact, given that he made his prediction at the beginning of 2016, he actually did really well. His prediction matched 538’s for January, and he was much better than Sam Wang of the Princeton Election Consortium, who gave Hillary a 99% chance of winning, and Wang was a professional forecaster. No, the point I’m trying to get at is that while Bayesian Rationality, as championed by Yudkowsky, is more aware of its mistakes, and while it offers several small, but nevertheless significant improvements to the scientific method, it still falls victim to the hubris of understanding and predictability.
All of which is to say that, as far as I can tell, while I’m sure there is some difference in Yudkowsky’s framework between an event with a 1% probability which has very little impact (say that the Copenhagen Interpretation turns out to be correct) and an event with a 1% probability which has an enormous impact (say 50+ nukes going off in a war), given the time he spent in his book on the first vs. the second, whatever that difference might be is not nearly great enough. And this is the critical weakness of Yudkowsky’s Bayesian Rationality when compared to Talebian Antifragility: within the 2393 pages of his book there is no system for, or even mention of, dealing with the asymmetry between those two examples.
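One crude way to make that asymmetry visible is to weight each possible error by what it would cost. The sketch below is only an illustration: the `Risk` class is hypothetical and the impact numbers are invented, in arbitrary units. Probability times impact is also not a substitute for Taleb's framework, since an outcome you cannot survive is worse than any finite number suggests.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float
    impact: float  # arbitrary units, invented purely for illustration

    @property
    def expected_loss(self) -> float:
        # Probability-weighted cost of being wrong about this event.
        return self.probability * self.impact


risks = [
    Risk("Copenhagen Interpretation turns out to be correct", 0.01, 1),
    Risk("50+ nukes going off in a war", 0.01, 1_000_000),
]

# Both events sit in the same 1% bucket of a calibration table,
# but the cost of being wrong about them is wildly different.
for risk in sorted(risks, key=lambda r: r.expected_loss, reverse=True):
    print(f"{risk.name}: p={risk.probability:.0%}, "
          f"expected loss={risk.expected_loss:,.2f}")
```

A calibration score treats the two rows identically. Only the weighting by impact separates them, and the size of that gap depends entirely on the made-up impact figures.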
In closing, you may feel that I have been too critical of the book. Well, if that’s so, then you may want to skip the next week or two, because I’m not done. But also, on this subject, I’m critical because this book is already pretty useful, and it comes close enough to being right that criticism actually has a chance of closing the gap, particularly on the subject of asymmetric outcomes and risk. I suspect (perhaps incorrectly) that if Yudkowsky and I sat down it would be pretty easy to reach a common ground in this area. However, next week I have no such hopes because we’re going to be talking about religion, and the strong anti-religious bias of both the book and the larger rationality movement. Though I think you’ll see that the two biases are more closely related than you might imagine.
I have no aspirations to steer reality, but if you’d like to help me steer this blog consider donating.