If you prefer to listen rather than read, this blog is available as a podcast here. Or if you want to listen to just this post:
I.
One of my recent posts, Pandemic Uncovers the Limitations of Superforecasting, generated quite a bit of pushback. Given that in-depth debate is always valuable, and that this subject is, at least for me, a particularly important one, I thought I’d revisit it and attempt to further answer some of the objections that were raised the first time around, while also clarifying some points that people misinterpreted or gave insufficient weight to.
To begin with, you might wonder how anybody could be opposed to superforecasting, and what that opposition would be based on. Isn’t any effort to improve forecasting obviously a good thing? Well, for me it’s an issue of survival and existential risk. And while questions of survival are muddier in the modern world than they were historically, I would hope that everyone would at least agree that it’s an area that requires extreme care and significant vigilance, and that even if you are inclined to disagree with me, questions of survival call for maximum scrutiny. Given that we’ve already survived the past, most of our potential difficulties lie in the future, and it would be easy to assume that being able to predict that future would go a long way towards helping us survive it. That is where the superforecasters and I part company, and it is the crux of the argument.
Fortunately or unfortunately as the case may be, we are at this very moment undergoing a catastrophe, a catastrophe which at one point lay in the future, but not any more. A catastrophe we now wish our past selves and governments had done a better job preparing for. And here we come to the first issue: preparedness is different from prediction. An eventual pandemic was predicted about as well as anything could have been; prediction was not the problem. A point Alex Tabarrok made recently on Marginal Revolution:
The Coronavirus Pandemic may be the most warned about event in human history. Surprisingly, we even did something about it. President George W. Bush started a pandemic preparation plan and so did Governor Arnold Schwarzenegger in CA but in both cases when a pandemic didn’t happen in the next several years those plans withered away. We ignored the important in favor of the urgent.
It is evident that the US government finds it difficult to invest in long-term projects, perhaps especially in preparing for small probability events with very large costs. Pandemic preparation is exactly one such project. How can we improve the chances that we are better prepared next time?
My argument is that we need to be looking for the methodology that best addresses this question, and not merely how we can be better prepared for pandemics, but better prepared for all rare, high impact events.
Another term for such events is “black swans”, after the book by Nassim Nicholas Taleb, which is the term I’ll be using going forward. (Though Taleb himself would say that, at best, this is a grey swan, given how inevitable it was.) Tabarrok’s point, and mine, is that we need a methodology that best prepares us for black swans, and I would submit that superforecasting, despite its many successes, is not that method. In fact it may play directly into some of the weaknesses of modernity that encourage black swans, and rather than helping to prepare for such events, superforecasting may in fact discourage such preparedness.
What are these weaknesses I’m talking about? Tabarrok touched on them when he noted that, “It is evident that the US government finds it difficult to invest in long-term projects, perhaps especially in preparing for small probability events with very large costs.” Why is this? Why were the US and California plans abandoned after only a few years? Because the modern world is built around the idea of continually increasing efficiency. And the problem is that there is a significant correlation between efficiency and fragility. A fragility which is manifested by this very lack of preparedness.
One of the posts leading up to the one where I criticized superforecasting was built around exactly this point, and related the story of how 3M considered maintaining a surge capacity for masks in the wake of SARS, but it was quickly apparent that such a move would be less efficient, and consequently worse for them and their stock price. The drive for efficiency led to them being less prepared, and I would submit that it’s this same drive that led to the “withering away” of the US and California pandemic plans.
So how does superforecasting play into this? Well, how does anyone decide where gains in efficiency can be realized or conversely where they need to be more cautious? By forecasting. And if a company or a state hires the Good Judgment Project to tell them what the chances are of a pandemic in the next five years, and GJP comes back with the number 5% (i.e. an essentially accurate prediction), are those states and companies going to use that small percentage to justify continuing their pandemic preparedness or are they going to use it to justify cutting it? I would assume the answer to that question is obvious, but if you disagree then I would ask you to recall that companies almost always have a significantly greater focus on maximizing efficiency/profit than on preparing for “small probability events with very large costs”.
Accordingly, the first issue I have with superforecasting is that it can be (and almost certainly is) used as a tool for increasing efficiency, which is basically the same as increasing fragility. That rather than being used as a tool for determining which things we should prepare for, it’s used as an excuse to avoid preparing for black swans, including the one we’re in the middle of. It is by no means the only tool being used to avoid such preparedness, but that doesn’t let it off the hook.
Now I understand that the link between fragility and efficiency is not going to be as obvious to everyone as it is to me, and if you’re having trouble making the connection I would urge you to read Antifragile by Taleb, or at least the post I already mentioned. Also, even if you find the link tenuous I would hope that you would keep reading because not only are there more issues but some of them may serve to make the connection clearer.
II.
If my previous objection represented my only problem with superforecasting then I would probably agree with people who say that, as a discipline, it is still, on net, beneficial. But beyond providing a tool that states and companies can use to justify ignoring potential black swans, superforecasting is also less likely to consider the probability of such events in the first place.
When I mentioned this point in my previous post, the people who disagreed with me had two responses. First they pointed out that the people making the forecasts had no input on the questions they were being asked to make forecasts on and consequently no ability to be selective about the predictions they were making. Second, and more broadly they claimed that I needed to do more research and that my assertions were not founded in a true understanding of how superforecasting worked.
In an effort to kill two birds with one stone, since that last post I have read Superforecasting: The Art and Science of Prediction by Philip Tetlock and Dan Gardner, which I have to assume comes as close to being the bible of superforecasting as anything. Obviously, like anyone, I’m going to suffer from confirmation bias, and I would urge you to take that into account when I offer my opinion on the book. With that caveat in place, here, from the book, is the first commandment of superforecasting:
1) Triage
Focus on questions where your hard work is likely to pay off. Don’t waste time either on easy “clocklike” questions (where simple rules of thumb can get you close to the right answer) or on impenetrable “cloud-like” questions (where even fancy statistical models can’t beat the dart-throwing chimp). Concentrate on questions in the Goldilocks zone of difficulty, where effort pays off the most.
For instance, “Who will win the presidential election twelve years out, in 2028?” is impossible to forecast now. Don’t even try. Could you have predicted in 1940 the winner of the election, twelve years out, in 1952? If you think you could have known it would be a then-unknown colonel in the United States Army, Dwight Eisenhower, you may be afflicted by one of the worst cases of hindsight bias ever documented by psychologists.
The question which should immediately occur to everyone: are black swans more likely to be in or out of the Goldilocks zone? Almost by definition, they’re going to be outside it. And based on the book’s description of the zone, and all the questions I’ve seen both in the book and elsewhere, that is clearly where they fall. Which is to say that even if such predictions are not misused, they’re unlikely to be made in the first place.
All of this would appear to heavily incline superforecasting towards the streetlight effect, where the old drunk looks for his keys under the streetlight, not because that’s where he lost them, but because that’s where the light is the best. Now to be fair, it’s not a perfect analogy. With respect to superforecasting there are actually lots of useful keys under the streetlight, and the superforecasters are very good at finding them. But based on everything I have already said, it would appear that all of the really important keys are out there in the dark, and as long as superforecasters are finding keys under the streetlight what inducement do they have to venture out into the shadows looking for keys? No one is arguing that the superforecasters aren’t good, but this is one of those cases where the good is the enemy of the best. Or more precisely it makes the uncommon the enemy of the rare.
It would be appropriate to ask at this point: if superforecasting is good, then what is “best”? I intend to dedicate a whole section to that topic before this post is over, but for the moment I’d like to direct your attention to Toby Ord and his book The Precipice: Existential Risk and the Future of Humanity, which I recently finished. (I’ll have a review of it in my month-end round up.) Ord is primarily concerned with existential risks, risks which could wipe out all of humanity. Or to put it another way, the biggest and blackest swans. A comparison of his methodology with the methodology of superforecasting might be instructive.
Ord spends a significant portion of the book talking about pandemics. On his list of eight anthropogenic risks, pandemics take up 25% of the spots (natural pandemics get one spot and artificial pandemics get the other). On the other hand, if one were to compile all of the forecasts made by the Good Judgment Project since the beginning, what percentage of them would be related to potential pandemics? I’d be very much surprised if it wasn’t significantly less than 1%. While such measures are crude, one method pays a lot more attention than the other, and in any accounting of why we weren’t prepared for the pandemic, a lack of attention would certainly have to be high on the list.
Then there are Ord’s numbers. He provides odds that various existential risks will wipe us all out in the next 100 years. The odds he gives for that happening via a naturally arising pandemic are 1 in 10,000; the odds for an engineered pandemic are 1 in 30. The foundation of superforecasting is the idea that we should grade people’s predictions. How does one grade predictions of existential risk? Clearly compiling a track record would be impossible, they’re essentially unfalsifiable, and beyond all that they’re well outside the Goldilocks zone. Personally I’d almost rather that Ord didn’t give odds and just spent his time screaming, “BE VERY, VERY AFRAID!” But he doesn’t; he provides odds and hopes that by providing numbers people will take him more seriously than if he just yells.
From all this you might still be unclear why Ord is better than the superforecasters. It’s because our world is defined by black swan events, and we are currently living out an example of that: our current world is overwhelmingly defined by the pandemic. If you were to selectively remove knowledge of just the pandemic from someone trying to understand the world, absolutely nothing would make sense. Everyone understands this when we’re talking about the present, but it also applies to all the past forecasting we engaged in. 99% of all superforecasting predictions contributed nothing to our understanding of this moment, but 25% of Ord’s did. Which is more important: getting our 80% predictions about uncommon events to 95%, or gaining any awareness, no matter how small, of a rare event which will end up dominating the entire world?
III.
At their core all of the foregoing complaints boil down to the idea that the methodology of superforecasting fails to take into account impact. The impact of not having extra mask capacity if a pandemic arrives. The impact of keeping to the Goldilocks zone and overlooking black swans. The impact of being wrong vs. the impact of being right.
When I made this claim in the previous post, once again several people accused me of not doing my research. As I mentioned, I have since read the canonical book on the subject, and I still didn’t come across anything that really spoke to this complaint. To be clear, Tetlock does mention Taleb’s objections, and I’ll get to that momentarily, but I’m actually starting to get the feeling that neither the people who had issues with the last point nor Tetlock himself really grasps it, though there’s a decent chance I’m the one who’s missing something. Which is another point I’ll get to before the end. But first, an example I recently encountered that I think might be useful.
The movie Molly’s Game is about a series of illegal poker games run by Molly Bloom. The first set of games she runs is dominated by Player X (Tobey Maguire), who encourages Molly to bring in fishes, bad players with lots of money. Accordingly, Molly is confused when Player X brings in Harlan Eustice, who turns out to be a very skillful player. That is, until one night when Eustice loses a hand to the worst player at the table. This sets him off, changing him from a calm and skillful player into a compulsive and horrible player, and by the end of the night he’s down $1.2 million.
Let’s put some numbers on things and say that 99% of the time Eustice is conservative and successful, and that on average a conservative Eustice ends the night up by $10k. But 1% of the time, Eustice is compulsive and horrible, and during those nights he loses $1.2 million. And so our question is: should he play poker at all? (And should Player X want him at the same table he’s at?) The math is straightforward: his expected return over 100 average games is -$210k. It would seem clear that the answer is “No, he shouldn’t play poker.”
But superforecasting doesn’t deal with the question of whether someone should “play poker”; it works by considering a single question, answering that question, and assigning a confidence level to the answer. So in this case a superforecaster would be asked, “Will Harlan Eustice win money at poker tonight?” To which they would say, “Yes, he will, and my confidence level in that prediction is 99%.” That prediction is in fact accurate, and would result in a fantastic Brier score (the grading system for superforecasters), but by repeatedly following that advice Eustice eventually ends up destitute.
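To make the gap between accuracy and impact concrete, here is a minimal sketch using the numbers from the example above. It uses the simple binary form of the Brier score, (forecast − outcome)², which differs from the two-category version described in Tetlock’s book only by a constant factor, so the comparison comes out the same.

```python
# Assumed numbers taken straight from the example above: 99 winning nights of +$10k
# and one blow-up night of -$1.2M, forecast at 99% confidence every night.
nights = [1] * 99 + [0]      # 1 = Eustice wins money that night, 0 = the blow-up night
forecast = 0.99              # "Yes, he will win tonight", at 99% confidence

# Simple binary Brier score: mean squared difference between forecast and outcome.
brier = sum((forecast - outcome) ** 2 for outcome in nights) / len(nights)

# Net result of actually following the advice over the same 100 nights.
net_result = 99 * 10_000 + 1 * (-1_200_000)

print(f"Brier score over 100 nights: {brier:.4f}")       # ~0.0099, an excellent score
print(f"Net result over 100 nights: {net_result:,} dollars")  # -210,000
```

The forecast earns a near-perfect score while acting on it loses $210k, which is the sense in which the grading system tracks how often you are right rather than what being wrong costs.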
This is what I mean by impact, and why I’m concerned about the potential black swan blindness of superforecasting. When things depart from the status quo, when Eustice loses money, it’s often so dramatic that it overwhelms all of the times when things went according to expectations. That the smartest behavior for Eustice, the recommended behavior, should be to never play poker regardless of the fact that 99% of the time he makes thousands of dollars an hour. Furthermore this example illustrates some subtleties of forecasting which often get overlooked:
- If it’s a weekly poker game you might expect the 1% outcome to pop up every two years or so, but it could easily take five years, even if the probability stays exactly the same. And if the probability is off by even a little bit (small probabilities are notoriously hard to assess) it could take even longer to see; a rough sketch of this follows the list. Which is to say that forecasting during that time would result in continually increasing confidence, and greater and greater black swan blindness.
- The benefits of the wins are straightforward and easy to quantify. But the damage associated with the one big loss is a lot more complicated and may carry all manner of second order effects. Harlan may go bankrupt, get divorced, or even have his legs broken by the mafia. All of which is to say that the -$210k expected result is the best case. Bad things are generally worse than expected. (For example, it’s been noted that even though people foresaw a potential pandemic, plans almost never touched on the economic disruption which would attend it, which ended up being the biggest factor of all.)
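Here is the rough sketch promised above of how long the quiet stretch can last. The weekly game and the 1%-per-week blow-up chance are the assumptions from the example; the 0.5% line shows what a slightly misjudged probability does to the wait.

```python
# Probability that the blow-up has still not happened after a given number of weekly games.
def prob_no_blowup_yet(p_per_week: float, weeks: int) -> float:
    return (1 - p_per_week) ** weeks

for p_per_week in (0.01, 0.005):          # the assumed 1%, and a misjudged 0.5%
    for years in (2, 5):
        quiet = prob_no_blowup_yet(p_per_week, weeks=52 * years)
        print(f"p = {p_per_week:.1%} per week, {years} quiet years: {quiet:.0%} chance")
# With 1% per week there is still a ~35% chance of seeing nothing after two years and
# ~7% after five; at 0.5% per week, five quiet years happen about 27% of the time,
# all while the weekly "he will win tonight" forecasts keep verifying.
```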
Unless you’re Eustice, you may not care about the above example, or you may think that it’s contrived, but in the realm of politics this sort of bet is fairly common. As an example, cast your mind back to the Cuban Missile Crisis. Imagine that in addition to his advisors, Kennedy could also draw on the Good Judgment Project and superforecasting. Further imagine that the GJP comes back with the prediction that if we blockade Cuba the Russians will back down, a prediction they’re 95% confident of. Let’s further imagine that they called the odds perfectly. In that case, should the US have proceeded with the blockade? Or should we have backed down and let the USSR base missiles in Cuba? When you just look at that 95% the answer seems obvious. But shouldn’t some allowance be made for the fact that the remaining 5% contains the possibility of all-out nuclear war?
As near as I can tell, that part isn’t explored very well by superforecasting. Generally they get a question, they provide the answer and assign a confidence level to that answer. There’s no methodology for saying that, despite the 95% probability, such gambles are bad ideas because if we make enough of them eventually we’ll “go bust”. None of this is to say that we should have given up and submitted to Soviet domination because it’s better than a full-on nuclear exchange. (Though there were certainly people who felt that way.) More that it was a complicated question with no great answer (though it might have been a good idea for the US not to have put missiles in Turkey). But by providing a simple answer with a confidence level of 95%, superforecasting gives decision makers every incentive to substitute the true, and very difficult, questions of nuclear diplomacy with the easy question of whether to blockade. That rather than considering the difficult and long term question of whether Eustice should gamble at all, we’re substituting the easier question of just whether he should play poker tonight.
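A quick sketch of the “go bust” arithmetic: the 5% figure is the hypothetical tail from the blockade example, and the number of comparable standoffs is purely an assumption for illustration.

```python
# Chance of never once landing in the catastrophic 5% tail across repeated gambles.
p_tail = 0.05                      # the 5% branch containing all-out nuclear war

for n_standoffs in (1, 5, 10, 20):
    p_never = (1 - p_tail) ** n_standoffs
    print(f"{n_standoffs:>2} such gambles: {p_never:.0%} chance of never hitting the tail")
# One 95% call looks safe; twenty of them leave only about a 36% chance of never
# rolling the catastrophe, and a per-question confidence level never surfaces that.
```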
In the end I don’t see any bright line between a superforecaster saying there’s a 95% chance the Cuban Missile Crisis will end peacefully if we blockade, or a 99% chance Eustice will win money if he plays poker tonight, and those statements being turned into a recommendation for taking those actions, when in reality both may turn out to be very bad ideas.
IV.
All of the foregoing is an essentially Talebian critique of superforecasting, and as I mentioned earlier, Tetlock is aware of this critique. In fact he calls it, “the strongest challenge to the notion of superforecasting.” And in the final analysis it may be that we differ merely in whether that challenge can be overcome or not. Tetlock thinks it can, I have serious doubts, particularly if the people using the forecasts are unaware of the issues I’ve raised.
Frequently, people confronted with Taleb’s ideas of extreme events and black swans end up countering that we can’t possibly prepare for all potential catastrophes. Tetlock is one of those people, and he goes on to say that even if we can’t prepare for everything we should still prepare for a lot of things, but that means we need to establish priorities, which takes us back to making forecasts in order to inform those priorities. I have a couple of responses to this.
- It is not at all clear that the forecasts one would make about which black swans to be most worried about follow naturally from superforecasting. It’s likely that superforecasting, with its emphasis on accuracy and on making predictions in the Goldilocks zone, systematically draws attention away from rare, impactful events. Ord makes forecasts, but his emphasis is on identifying these events rather than on making sure the odds he provides are accurate.
- I think that people overestimate the cost of preparedness and underestimate how much preparing for one thing prepares you for lots of other things. One of my favorite quotes from Taleb illustrates the point:
If you have extra cash in the bank (in addition to stockpiles of tradable goods such as cans of Spam and hummus and gold bars in the basement), you don’t need to know with precision which event will cause potential difficulties. It could be a war, a revolution, an earthquake, a recession, an epidemic, a terrorist attack, the secession of the state of New Jersey, anything—you do not need to predict much, unlike those who are in the opposite situation, namely, in debt. Those, because of their fragility, need to predict with more, a lot more, accuracy.
As Taleb points out, stockpiling reserves of necessities blunts the impact of most crises. Not only that, but even preparation for rare events ends up being pretty cheap compared to what we’re willing to spend once the crisis hits. As I pointed out in a previous post, we seem to be willing to spend trillions of dollars once a crisis arrives, but we won’t spend a few million to prepare for crises in advance.
Of course, as I pointed out at the beginning, having reserves is not something the modern world is great at, because reserves are not efficient. Which is why the modern world is generally on the other side of Taleb’s statement: in debt and trying to ensure/increase the accuracy of its predictions. Does this last part not exactly describe the goal of superforecasting? I’m not saying it can’t be used in service of identifying what things to hold in reserve or what rare events to prepare for; I’m saying that it will be used far more often in the opposite way, in a quest for additional efficiencies and, as a consequence, greater fragility.
Another criticism people had about the last episode was that it lacked recommendations for what to do instead. I’m not sure that lack was as great as some people said, but still, I could have done better. And the foregoing illustrates what I would do differently. As Tabarrok said in the quote at the beginning, “The Coronavirus Pandemic may be the most warned about event in human history.” And yet, if we just consider masks, our preparedness in terms of supplies and even knowledge was abysmal. We need more reserves, we need to select areas to be more robust and less efficient in, we need to identify black swans, and once we have, we should have credible long-term plans for dealing with them which aren’t scrapped every couple of years. Perhaps there is some place for superforecasting in there, but that certainly doesn’t seem like where you would start.
Beyond that, there are always proposals for market-based solutions. In fact the top comment on the reddit discussion of the previous article was, “Most of these criticisms are valid, but are solved by having markets.” I’m definitely in favor of this solution as well, but there are a lot of things to consider in order for it to actually work. A few examples off the top of my head:
- What’s the market-based solution to the Cuban Missile Crisis? How would we have used markets to navigate the Cold War with less risk? Perhaps a system where we offer prizes for people predicting crises in advance. So maybe if someone took the time to extensively research the “Russia puts missiles in Cuba” scenario, when that actually happens they get a big reward?
- Of course there are prediction markets, which seem to be exactly what this situation calls for, but personally I’m not clear how they capture the impact problem mentioned above, and they’re still missing more big calls than they should. Obviously part of the problem is that overregulation has rendered them far less useful than they could be, and I would certainly be in favor of getting rid of most if not all of those regulations.
- If you want the markets to reward someone for predicting a rare event, the easiest way to do that is to let them realize extreme profits when the event happens. Unfortunately we call that price gouging and most people are against it.
The final solution I’ll offer is the solution we already had, the solution superforecasting starts off by criticizing: loud pundits making improbable and extreme predictions. This solution was included in the last post, but people may not have thought I was serious. I am. There were a lot of individuals who freaked out every time there was a new disease outbreak, whether it was Ebola, SARS or Swine Flu. And not only were they some of the best people to listen to when the current crisis started, we should have been listening to them even before that about the kinds of things to prepare for. And yes, we get back to the idea that you can’t act on the recommendations of every pundit making extreme predictions, but they nevertheless provide a valuable signal about the kinds of things we should prepare for, a signal which superforecasting, rather than boosting, actively works to suppress.
None of the above directly replaces superforecasting, but all of them end up in tension with it, and that’s the problem.
V.
It is my hope that I did a better job of pointing out the issues with superforecasting on this second go around. Which is not to say the first post was terrible, but I could have done some things better. And if you’ll indulge me a bit longer (and I realize if you’ve made it this far you have already indulged me a lot) a behind the scenes discussion might be interesting.
It’s difficult to produce content for any length of time without wanting someone to see it, and so while ideally I would focus on writing things that pleased me, with no regard for any other audience, one can’t help but try the occasional experiment in increasing eyeballs. The previous superforecasting post was just such an experiment, in fact it was two experiments.
The first experiment was one of title selection. Should you bother to do any research into internet marketing, they will tell you that choosing your title is key. Accordingly, while it has since been changed to “limitations”, the original title of the post was “Pandemic Uncovers the Ridiculousness of Superforecasting”. I was not entirely comfortable with the word “ridiculousness” but I decided to experiment with a more provocative word to see if it made any difference. And I’d have to say that it did. In their criticism of it, a lot of people mentioned that word, or the attitude implied in the title in general. But it also seemed that more people read it in the first place because of the title. Leading to the perpetual conundrum: saying superforecasting is ridiculous was obviously going too far, but would the post have attracted fewer readers without that word? If we assume that the body of the post was worthwhile (which I do, or I wouldn’t have written it) is it acceptable to use a provocative title to get people to read something? Obviously the answer for the vast majority of the internet is a resounding yes, but I’m still not sure, and in any case I ended up changing it later.
The second experiment was less dramatic, and one that I conduct with most of my posts. While writing them I imagine an intended audience. In this case the intended audience was fans of Nassim Nicholas Taleb, in particular people I had met while at his Real World Risk Institute back in February. (By the way, they loved it.) It was only afterwards, when I posted it as a link in a comment on the Slate Star Codex reddit that it got significant attention from other people, who came to the post without some of the background values and assumptions of the audience I’d intended for. This meant that some of the things I could gloss over when talking to Taleb fans were major points of contention with SSC readers. This issue is less binary than the last one, and other than writing really long posts it’s not clear what to do about it, but it is an area that I hope I’ve improved on in this post, and which I’ll definitely focus on in the future.
In any event the back and forth was useful, and I hope that I’ve made some impact on people’s opinions on this topic. Certainly my own position has become more nuanced. That said if you still think there’s something I’m missing, some post I should read or video I should watch please leave it in the comments. I promise I will read/listen/watch it and report back.
Things like this remind me of the importance of debate, of the grand conversation we’re all involved in. Thanks for letting me be part of it. If you would go so far as to say that I’m an important part of it consider donating. Even $1/month is surprisingly inspirational.
Part 1:
I think your objection here is not against forecasting per se, but rather about a gross misunderstanding of statistics. I’m reminded of a recent book review on SSC (which you talk about later in this post) where existential risks were considered and the probability in the next year was estimated to be somewhat low. Of course, any statistically-minded individual (and Scott Alexander’s audience) viewed that low number as alarming – as did the author himself – because we understand compound risk. Your complaint appears to be that people are not good at compound risk. And while that isn’t always true, in general it is – leading lots of people to come to the wrong conclusions when they consider questions of compound risk.
For example, if there’s a 12% probability of a 6.0 or greater Earthquake this year in the city where I live, most people massively misunderstand what this means. They go five years without an Earthquake and think the unrealized probability should all be added to this year, saying there’s a 72% probability of an Earthquake this year; or in shorthand, “we’re due for a major Earthquake soon”. That’s not how it works, though. This year there’s a 12% probability, regardless of how many years it has been since the last ‘big one’. So past observations don’t change future probabilities. It doesn’t “build up”.
You try to teach that principle, and people assume, “if it doesn’t build up, that means over the next ten years there’s only a 12% probability of this big event.” Again no. Over the next ten years the probability of an Earthquake is pretty high, but it doesn’t change next year if last year the event didn’t happen. Likewise if the event DID happen. It’s a random throw of the dice every time. (For Earthquakes this appears to be true. If there’s a 2% probability in year 1 and then there’s an Earthquake, the evidence suggests that in year 2 there’s still a 2% chance of another quake of that magnitude. I.e. big Earthquakes can happen in consecutive years.)
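A minimal sketch of the arithmetic in this comment, using the commenter’s illustrative 12% annual figure: the risk compounds over a decade, but the per-year probability doesn’t “build up” after quiet years.

```python
import random

P_YEAR = 0.12   # illustrative 12% annual chance of a 6.0+ quake, as in the comment

# Correct compounding: chance of at least one big quake in the next ten years.
print(f"P(at least one quake in 10 years) = {1 - (1 - P_YEAR) ** 10:.2f}")   # ~0.72

# Memorylessness: after five quiet years the chance in year six is still ~12%.
random.seed(0)
quiet_streaks = hits_after_streak = 0
for _ in range(200_000):
    years = [random.random() < P_YEAR for _ in range(6)]
    if not any(years[:5]):
        quiet_streaks += 1
        hits_after_streak += years[5]
print(f"P(quake in year 6 | five quiet years) = {hits_after_streak / quiet_streaks:.2f}")  # ~0.12
```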
Thus, the problem isn’t with the method itself, but in the interpretation of statistics and cumulative risk.
Part 2:
I think it’s fair to claim that superforecasting has been applied too broadly – outside the fields where it is useful. I think it is inaccurate to say that superforecasting is actively harmful in itself. The place it is harmful is where we use it to try and achieve false insights because we applied it in the wrong fields. That is right in line with the quote you gave from the superforecasting book. Maybe this is the point you’re trying to make, but it comes across as “all superforecasting is actively harmful”, which I don’t feel like you demonstrated.
As to the existential risk of a naturally-occurring pandemic (or even an engineered one), I think the 1:10,000 number is both entirely made up and probably way too high. It’s really hard to kill off an entire species as diverse as humanity with a pandemic. It may feel really bad right now, but this is honestly one of the mildest pandemics to become a big deal in the last millennium. As to its impact in human existential risk, it has a low mortality rate among children and child-bearing populations. Therefore, I’d say the book about X-risk is not a prescient lens with which to view the present moment. The present moment should be asking how normal society can cope with a pandemic that is highly lethal to a small sub-population. Isolation of sub-populations is very difficult in the modern environment, such that the risk of devastation among that population remains high despite best efforts.
An analysis of existential risk has nothing useful to say about that problem. Indeed, it would be more correct to say that the pandemic helps us better understand the importance of preparing for more severe pandemics that could still happen at any time (or if you prefer, it points to the idea that a pandemic could become an existential risk; but I maintain that would be a misleading lesson to take from the current crisis as pandemics aren’t good at that kind of thing).
Part 3:
I feel like this is the same lesson as in part 1, but with specific examples. It’s a question of when the superforecasting is useful versus when it isn’t. Say Eustice is playing tonight. Do I want to be at his table? No! There’s a 99% chance I’ll lose money to him. Maybe I’ll get lucky and be there when he loses big, but the chances of that are slim, and when he starts losing he’ll be bleeding out to the other players at the table as much as to me. Maybe the best I can hope for is a 20% cut 1% of the time. But maybe I don’t have a couple hundred thousand dollars to lose before I begin winning. Also, the probability isn’t cumulative. Meaning that if he went a month without blowing up, that doesn’t mean he has a higher chance of it tonight (as you pointed out in the post). He still has that same 1% chance. So if the superforecasting is about advising me, maybe it’s useful. Meanwhile, if the superforecasting is advising Eustice it’s a bad model.
The question is whether the model is appropriate to the situation. The way to decide is to ask: “Are there fat tail events that could throw this prediction off in a catastrophic way?” Answer that question, and you answer whether the prediction algorithm will be useful.
Also, there’s a real-world analogue of the Cuban Missile Crisis example, and it shows how forecasting helps us make BETTER decisions, not worse ones. People often ask why we don’t eliminate nuclear waste by shooting it off into space, perhaps to burn up in the sun. Our star has a strong gravity well, so shooting waste off there should be pretty fool-proof, right?
NASA has considered this plan and rejected it. Why? Because they could only guarantee with something like a 99% probability that the rocket used to launch the waste would NOT explode during launch and spread fissile material throughout a large area (potentially catastrophic if it occurred in the stratosphere). They said that the remaining risk was too high even for one launch, let alone for the multiple launches that would be required to dispose of the nuclear waste (representing compound risk), and as such the idea was never pursued further. That seems to be exactly the kind of model you criticize as a bad fit when you talk about superforecasting, but this panel of experts used the forecasting evidence they had and made exactly the right kind of decision from it. Maybe NASA is just better at making statistics-based decisions. The problem isn’t that superforecasting is inherently useless; the problem is that it is being used in situations it was never designed to work in, which seems to be your main concern based on the arguments you present (even if the tone suggests the concern is wider than that). So perhaps the question should be, “Who is going to be making decisions based on the forecasting model being used?” If they are capable of understanding statistical nuance and applying models appropriately, then we shouldn’t be concerned about the use of superforecasting any more than about other models. It’s a tool in our belt, and if we’re experienced statistical craftsmen we know when to use it and when it’ll destroy the thing we’re trying to build instead of helping.
Part 4:
I think we are in agreement that superforecasting should be used judiciously as a tool in the larger context of decision-making. You correctly emphasize that too much engineering for efficiency is going on. I think you’re overplaying the problem as arising from use of the tool itself, when the real problem isn’t the tool we use but the underlying goal we’re trying to achieve. Efficiency at the razor’s edge is our enemy. It makes even a mild crisis worse, and potentially exposes us to existential risks we would otherwise live through. Understanding that fact would naturally lead to a better application of judgement as priorities shift, without having to demonize superforecasting to the degree you’re approaching in this post.
On solutions:
1. The CMC was a government-based decision, so I think it’s inappropriate to ask what the market-based solution should be. Unless we’re all die-hard disciples of Ayn Rand, we should be able to agree that certain decisions shouldn’t be market-driven.
2. Prediction markets are inherently ill equipped to answer fat tail questions, and should not be relied upon to tell you anything about the probability of a tail event.
3. Isn’t ‘realiz[ing] extreme profits when the event happens’ the story of The Big Short, as well as Taleb’s entire strategy for investing (how he made his millions in the first place)? The problem with this approach is that it’s not a good signal. The people who made a killing off of the housing crisis did so by trying to keep it a secret for as long as possible. You make money on fat tail events by being discreet about it.
I think the pundit proposal is less useful, as it is at least as prone to bias from the ill-informed as the superforecasting model. At best it produces hypotheses to be explored, but at worst it delves us back into the vicious cycle of over-engineering life. I think a better solution will be found as we back away from the razor’s edge of efficiency, knowing it will cut us if we try to ride it. Robust policies that increase our ability to absorb risk along multiple dimensions (economic, medical capabilities, communications, food sourcing, supply chain management, etc.) are really the only way to prepare for risks in the Unknown Unknowns category. The only thing we know about that category is that it’s out there. No amount of punditry or prediction will tell us more than that. And if that’s our focus in preparing for the next crisis, superforecasting isn’t going to get in the way. Because it will be obvious it isn’t the right tool for the job.
Part 1- Sure, we agree on the blindness to compounding risk, but the question I have for anyone advocating superforecasting is: does the methodology serve to illuminate this blindness or compound it? And my argument is that it’s the latter.
Part 2- Yes, I probably should clarify that superforecasting isn’t bad in all forms and in all use cases. It’s a tool just like most technologies, but I think it’s a tool that emphasizes the uncommon over the rare, which we might say puts it in the “upper quartile” of abusability.
Part 3- As far as being at the table goes: I guess it wasn’t clear that Player X was always at the table, and also the table was generally 6-8 people (from what I could see), so the difference between having 5-7 fishes and 4-6 fishes is not huge, particularly if the one “non-fish” (Eustice) was going to blow up eventually and you were guaranteed to be there because you were always there.
Taleb would say, and I mostly agree, that in the modern world there are almost always “fat tail events that could throw this prediction off in a catastrophic way” which is why I am convinced that superforecasting is far more likely to be misused than used correctly.
As far as the nuclear waste analogy, for better or worse (I would actually say worse, since it’s fatally hamstrung nuclear power) we have no problem identifying black swans around nuclear waste, and my sense, though I don’t immediately have the data to back it up, is that this bias was more responsible for the correct decision than a sober assessment of risk (though that helped).
As far as being able to use statistical methodologies and models correctly, you yourself have pointed out how rare that is, that people are constantly misusing them, and part of my argument is that misusing superforecasting could end up being particularly catastrophic.
Part 4- “Efficiency at the razor’s edge is our enemy” is a great line, I’m stealing it. The phrase I was thinking of was “a fetishization of accuracy” which I think probably leads to the razor’s edge you mention.
As far as realizing profits when the event happens: The Big Short and options bets are examples of people taking advantage of the financial system. I’m talking more about what incentives we might give 3M to maintain surge capacity in N95 masks. Extreme profit is definitely one such incentive, and not only is there little incentive to keep it secret (except maybe the extreme-profit part of it), but they’d be lessening the crisis by providing something which mitigates it.
“As far as the nuclear waste analogy, for better or worse (I would actually say worse, since it’s fatally hamstrung nuclear power) we have no problem identifying black swans around nuclear waste, ”
Nuclear power is not hamstrung by either waste or environmentalists. It’s hamstrung by utility companies.
Utility companies set their rates by their capital stock. More capital, more rates.
Hence generation plants are huge, multibillion-dollar monsters. It takes a lot of money and a lot of permits, and markets have to hope electric rates stay high for decades afterward in order to make the investment pay off. Electric rates have gone down, so being tied up in such a monster is not appealing to investors.
Huge bespoke plants that are unique take a long time to approve. What hasn’t happened is a real effort to build a standardized reactor whose design could be approved and then deployed to customers that buy it.
Sorry, I read that profit part in a different way than you intended. Not sure which incentives you’d use, but price gouging looks like it’s not a very palatable one. The problem is that while it works really well from the standpoint of pure economic theory, it tends to fail in the real world due to general sensibilities of fairness. I suspect this is partly due to the fact that in a crisis people don’t start off on an equal economic footing, so the potential for abuse, hoarding, and general frustration remains.
*****
I think it’s a red herring to expand the discussion more broadly to the building of new nuclear power plants. The narrow point was that there is a viable proposed solution to the problem of nuclear waste (shoot it into the sun). A statistical prediction is presented that gives overwhelming odds of safe execution. And there is a fat tail that only becomes apparent when you compound the risk. This is a solid counter-example to the claim that this kind of analysis inexorably leads people to make the wrong decision, because in this case the only way to come to the right conclusion was to do the analysis and make the assessment based on compounding risk.
I’m not saying people always make the right decision. Actually I agree that people often fool themselves with this kind of analysis. But I do think it matters who is making that analysis.
*****
I think we have different approaches to the same problem here, and I don’t think either of our approaches is inherently wrong or (to be honest) particularly effective. In my venue I frequently point out, “This is a statistical tool everyone is using wrong. This is why they’re using it wrong. Don’t be fooled into using it wrong. Use it right instead.” If I’m reading it right, you’re skeptical that the mass of humanity will learn to use many of these statistical tools correctly, including superforecasting.
I’m inclined to agree with you on that point. It is unlikely that people will shape up and stop abusing statistics in general and tools like superforecasting in particular. Many scientists who should know better still rely heavily on subgroup analyses they did not pre-specify. I’ve personally observed them presenting their data at major academic conferences, and then when called on it they look confused as to why their analysis isn’t valid. It’s a constant battle teaching the next generation of SCIENTISTS this message, so of course teaching the general public is significantly more difficult. It’s reasonable to assume we’ll never get a majority of the public to understand how to avoid most statistical traps.
I think it’s also reasonable to assume we’ll never get people to stop using most statistical tricks and traps. The more of them we discover, the more people manipulate data (on purpose or through ignorance) and then use that data to sway the public. I’m not sure there’s a good way to combat that tactic writ large. On a smaller scale, educating the public works to help some people avoid this kind of trap. To the extent they then call out the practice it helps combat trickery for those who don’t understand how they’re being statistricked. This includes those who are statistricking themselves. But education alone is not a general solution to the trickery. At best it helps at the margins.
I don’t think we’ll ever get people to stop abusing statistical methods, and that extends to superforecasting. Perhaps we can promote healthy norms about these methods; ideas people absorb without having to understand why. For example we might say, “Sure superforecasting is useful for some questions, like ‘who will win the NBA Playoffs this year?’ But some questions can’t be superforecast by mere mortals; like, ‘Will a pandemic cancel the playoffs this year?’ It’s important to know whether you’ve picked the right superforecaster. Sometimes the only good superforecaster for an event is God.”
I’m not seeing efficiency as the issue here. A mask stockpile is trivially cheap to maintain. In terms of efficiency, there’s not much in the way of savings to be had by eliminating it, even if some oracle told us that we could trust another pandemic won’t happen for a few hundred years. And to be honest, in three months we went from almost no masks to everyone having a mask. Capacity to make masks does not require redoing global supply chains or reshaping the economy. The reason we didn’t have a stockpile of masks is momentum. It requires someone every year to go into the mask stockpile, throw out the expired masks, and order new masks to replace them.
One solution is the rolling stockpile. You say all hospitals have to place their orders through the stockpile. The stockpile sends out the oldest masks first and orders new ones every time a new order comes in. Then there’s always a year’s worth of masks on hand. The stockpile is maintained, hospitals want good equipment so they will raise a ruckus if the stockpile sends them dusty, moldy masks, and mask makers want orders, so they want the stockpile system to continue.
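A toy sketch of the rolling-stockpile idea (all quantities are made-up illustration values): hospital orders flow through the stockpile, the oldest lot ships first, and every shipment triggers an equal replacement order, so a fixed reserve of reasonably fresh stock is always on hand.

```python
from collections import deque

def simulate_rolling_stockpile(reserve_lots: int = 52, weeks: int = 260) -> deque:
    shelf = deque([0] * reserve_lots)            # one entry per weekly lot, value = age in weeks
    for _ in range(weeks):
        shelf = deque(age + 1 for age in shelf)  # everything on the shelf gets a week older
        shelf.popleft()                          # ship the oldest lot out to a hospital
        shelf.append(0)                          # the manufacturer replaces it with a fresh lot
    return shelf

shelf = simulate_rolling_stockpile()
print(len(shelf), max(shelf))   # reserve stays at 52 lots and nothing is older than a year
```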
Another solution is to switch into a mask wearing culture, which I’m a bit surprised we seem to have done. If almost everyone is going to keep a box of N95’s at home, there’s never going to be a shortage again.
In the examples you gave, it seems standard statistical analysis is fine to use. The concept of expected value says you can win 99% of the time, but if 1% of the time your losses are big enough, it makes sense to treat it as a bad investment. On the flip side, it becomes rational to buy a lottery ticket if the pot gets big enough. In the case of the Cuban Missile Crisis the loss could be infinity, which often causes the gears of math to blow up.
In the Cuban case, game theory comes into play and you have to consider just how complex the odds are. They are probably more like:
If we stand our ground: 95% chance the USSR backs down; 4% chance of conflict; 1% chance of full-on nuclear war, game over.
If we back down: 30% chance the USSR also backs down; 69.9% chance the USSR backs down but leaves missiles in Cuba; 0.1% chance of nuclear war anyway.
Not facing off with Russia could then be evaluated as having a large chance of working but a small chance of totally losing. However, backing down solves the problem in the immediate term but increases the chance of future confrontations with the USSR, some of which include nuclear war even if the US always backed off, because sometimes crap just happens in a standoff that makes no sense.
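A hedged sketch using the commenter’s illustrative branch probabilities above. The costs are pure assumptions chosen for illustration; the point is that the choice hinges almost entirely on the (essentially unknowable) cost assigned to nuclear war, which is the sense in which the gears of the math blow up.

```python
def expected_cost(branches, war_cost):
    # branches: (probability, cost) pairs; the placeholder "WAR" takes the assumed war cost
    return sum(p * (war_cost if cost == "WAR" else cost) for p, cost in branches)

stand_firm = [(0.95, 0), (0.04, -50), (0.01, "WAR")]      # backs down / conflict / nuclear war
back_down  = [(0.30, 0), (0.699, -20), (0.001, "WAR")]    # -20 = missiles stay in Cuba

for war_cost in (-1_000, -10_000, -1_000_000):
    print(f"war cost {war_cost:>10,}: stand firm {expected_cost(stand_firm, war_cost):>10.1f}, "
          f"back down {expected_cost(back_down, war_cost):>10.1f}")
# At a modest war cost standing firm scores better; make the war cost large enough and the
# 1% branch swamps everything, so the ranking flips even though the probabilities never change.
```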
Prediction Markets – I’m fine with them but they are unlikely to work because what you really need to understand is the dynamics of multiple rounds of play with the USSR. Is allowing them a victory in Cuba the same as Vietnam? How does round 1’s outcome impact round 2? A prediction market probably just gets you to the first probability matrix, which hints you should back down, but the full model might say something like back down in Turkey and Vietnam but not in Cuba. Plus, as 2016 demonstrated, what happens when Russian assets start pouring money into the prediction markets?
Well, we already know. During 2016 there was a chorus of people claiming a Hillary victory would mean WWIII with Russia because she opposed the Assad regime in Syria. We know now that most of that voice was amplified by Russian bots and trolls. Why wouldn’t a prediction market also turn into a way for those impacted by the decisions to shape the probabilities given by the market?
Here is what I think is really missing from our pandemic response: we simply did not bite the bullet and act fast enough or strongly enough. A shutdown 2-3 weeks sooner could have spared us 90% of our deaths and contained the disease quickly. We would have opened up sooner, and while there would have been some economic impact it wouldn’t have been much. We simply lacked leadership while other countries didn’t. This cannot be fixed by better spreadsheets. Better spreadsheets can only be a partial substitute at best.
“If you want the markets to reward someone for predicting a rare event, the easiest way to do that is to let them realize extreme profits when the event happens. Unfortunately we call that price gouging and most people are against it. ”
Ok, pet peeve here. There is no blanket law against price gouging. There are a few laws in a few states, but they mostly apply only to shops that already have items in stock. There are plenty of states that have no such laws, and even the ones that do make exceptions for costs passed on from suppliers.
Long story short, if XYZ corp. had a stockpile of millions of gallons of hand sanitizer it could have easily sold it to millions of retailers for 10x the normal cost raking in huge profits and never run afoul of price gouging law.
I understand that there are very few laws against price gouging, but there’s a pretty significant cultural bias against people doing it. Perhaps just as we need to have a culture of mask wearing, we need to have a culture that accepts price gouging.
Mixed feelings here. In the short run I think price gouging laws have a wisdom to them. You go to a store and see that toilet paper costs $5/roll. Thought: “Holy shit, world’s ending, I better get everything I can”. Alternatively you see a sign “due to demand only 2 rolls per customer”. Thought now is “well bad stuff is happening but they are handling it, better not panic”. Yes the price gouging law may frustrate the guy who just happened to inherit a storage unit full of toilet paper and would love to cash in on it but to the degree it undercuts panic I suspect it’s probably a positive thing.
My understanding is that during crises ad hoc laws against price gouging often combine with social pressure to reduce how much price gouging goes on.
I think the economics of price gouging aren’t as simple as many economists make them out to be. The story they often tell (https://www.econtalk.org/munger-on-price-gouging/) is that of ice after a hurricane. In an ice shortage people will buy up as much as they can get. They want an extra stockpile in case the supply of ice disappears. Purchase limits do little good, as determined individuals can still go from store to store buying up ice they don’t really need ‘just in case’. Meanwhile, the poor diabetic can’t find ice to keep their insulin chilled so they die.
The story goes that price gouging allows people to ration based on need. A high price of ice makes people who don’t really need it find a way to do without. Meanwhile, the diabetic will pay any price. The end result is that the diabetic will die in the scenario without price gouging, but live in the scenario with gouging.
But people don’t start off with the same amount of money. So if there are ten rich people and a thousand poor diabetics, the rich people can buy up lots of ice because the price increase isn’t that meaningful to them. It’s certainly less meaningful than the possibility of being inconvenienced because they ran out of ice. This drives up the price still more, to the point where poor people suddenly can’t afford ice no matter how desperate they are to get it. The libertarian retort is that the increased price of ice is a signal to new ice suppliers who will want to rush to get more ice on the market. This will drive prices down, to the point where they will eventually approach a normal price again. But by then how many diabetics will have died?
Price gouging is an interesting idea from a libertarian/market perspective, but I am not intending to put it forth as my primary solution. The section on markets was a specific response to a reddit comment, and mostly what I was attempting to show is that markets aren’t a foolproof solution either.
But as long as we’re on the subject, would there be any value in distinguishing between foresight and gouging? The former being people who acted to stockpile things in advance of a crisis, who create more supply rather than trying to restrict supply after the crisis has already happened? When you talk about market based solutions those are the kinds of things that interest me. How do we incentivize preparation for crises when things are good?
Do you really think superforecasting techniques are actually used widely enough in the real world to cause significant harm? If there were a lot of policymakers that relied exclusively on superforecasters to the exclusion of other decision-making strategies, then it might make sense to call superforecasting harmful. But in practice, “use superforecasting techniques more than you are currently doing, but be aware that they can mislead you about certain issues” seems like the best advice to be giving to most people.
The IARPA contest that ushered in the first attempt at “superforecasting” was in 2010. The book was published in 2015. It hasn’t been around very long. And with Tetlock and his associates trumpeting how accurate they are, I suspect that we’re a long way from peak superforecasting yet.
I’m a superforecaster, saw your link in the astral codex classifieds thread. I did a little skimming so forgive me if I missed some of your points.
I feel like you’re arguing that people are idiots and bad at using information, so aggregating information better is bad.
You also argue that superforecasting is a bad tool for certain problems, in a way that feels like a strawman to me. Of course it’s not the right tool for every job. Who is suggesting handing innumerate people probabilities with no information about payoffs as if that were a way to make a decision?
Overall it just feels like most of your gripes are with how people make decisions in general and only tangentially related to superforecasting.
I appreciate you taking the time to respond, and for the time you took to read what I wrote even if you just skimmed it, that’s still a lot of skimming to do.
After writing thousands of words on the subject I’m not sure what I can put in a comment to change your mind. But I’ll see what I can do in a short space.
You say that handing people probabilities with no information on payoffs (what I call impact) is a bad way to make a decision. I agree. Is there some feature of the Good Judgment Project which says: we’re 90% certain X will happen and we’re 90% certain Y will happen; if we’re wrong about X it’s not a big deal, but if we’re wrong about Y it will be a catastrophe? If so, that would answer many of my objections, but I haven’t come across it in all my searches.
To try and put my argument simply:
1- Black swans end up dominating what the world looks like.
2- Superforecasting does not deal with these sorts of events (they’re not in the Goldilocks zone—see the Tetlock quote above).
3- It’s possible that SF is still useful anyway, unless it distracts from being prepared for black swans, which I argue that it does (see the comparison in the last point between Ord and the superforecasters)
I agree that #3 is carrying a lot of weight, which is why I spent thousands of words elaborating on it.
As far as assessing the impact of unexpected events, there’s been a small amount of work done in this area with conditional forecasting questions (“Conditional on Y what is the probability of Z? Conditional on not-Y what is the probability of Z?”; only one branch of the conditional is scored – of course usually it’s the one with the high probability of occurring). Tetlock is also involved with some more recent work on counterfactual histories, but I’m not familiar with the details. Mostly though this is something that people just don’t ask superforecasters (so far).
It’s fair to say that most of the questions GJP has forecast are not all that interesting. I think this has as much to do with the nature of grading a competition as anything else. Like other commenters mentioned, GJP did not write their own questions (though I think the commercial incarnation occasionally does). The current incarnation has dabbled with some non-scored questions about the less-short-term future (e.g. economic questions 10 years out), but they’re still not especially interesting questions.
Re 2: I assert, with little evidence, that the tools and habits that allow you to better forecast in the goldilocks zone also help you assign more accurate probabilities to gray swans (and I would include the pandemic as a gray swan – I don’t think many people with a passing understanding of history could have reasonably expected modern pandemics to be implausible, at least if they consciously considered the question). The translation is far from perfect, but probabilistic-numeracy (and I do think that’s the dominant ingredient of superforecasting) is still value-adding. There are a lot of potential catastrophes you could worry about, and it takes forecasting to prioritize preparation in a sensible way.
Re 3: I guess you would also say that Ord assigning probabilities to catastrophes distracts a bit from preparing for said catastrophes? If so, this reinforces my impression that your issue is less with superforecasting and more with improbabilistic decision makers abusing probabilities. If you find it less objectionable from Ord, is the problem just that you think superforecasters are bad at estimating probabilities of black swans? Or is the problem that nobody asks us interesting questions?
As an aside, I strongly dislike calling a 90% forecast wrong just because the 10% hits. A) 90% may have been the true probability; one data point doesn’t tell you a ton, B) if everyone else thought it was 100%, the 90% forecaster may have been prescient.