Last time we talked about mistakenly finding patterns in randomness, patterns that are then erroneously extrapolated into predictions. This time we’re going to talk about yet another mistake people make when dealing with randomness: confusing the extreme with the normal.
When I use the term “normal” you may think I’m using it in a general sense, but in the realm of randomness “normal” has a very specific meaning: a normal distribution. This is the classic bell curve, with a large hump in the center and thin tails to either side. Many quantities in the natural world roughly follow this curve. The classic example is height: people cluster around the average (5’9” for men and 5’4” for women, at least in the US), and as you get farther from the average (say, men who are either 6’7” or 4’11”) you find far fewer examples.
Up until relatively recently, most of the things humans encountered followed this distribution. If your herd of cows normally produced 20 calves in a year, then in a good year the herd might produce 30 and in a bad year it might produce 10. The same might be said of the bushels of grain that were harvested or the amount of rain that fell.
These limits were particularly relevant when talking about the upper end of the distribution. Disaster might leave you with no calves, no harvest, or not enough rain. But there was no scenario where you would go from 20 calves one year to 2000 the next. And on an annualized basis even rainfall is unlikely to change very much. Phoenix is not going to suddenly become Portland even if it does get the occasional flash flood.
Throughout our history normal distributions have been so common that we often fall into the trap of assuming that everything follows them, but randomness can definitely appear in other forms. The most common of these is the power law, and the most common example of a power law is the Pareto distribution, one instance of which is the 80/20 rule. This originally took the form of the observation that 20% of the people have 80% of the wealth. But you can also see it in things like software, where 20% of the features often account for 80% of the usage.
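For the numerically inclined, here is a rough sketch of where an 80/20 split comes from. It assumes wealth follows an idealized Pareto distribution, in which case the share held by the top fraction p works out to p raised to the power (1 - 1/alpha), where alpha is the distribution's shape parameter. The function name and the choice of alpha are mine, picked so that the top 20% hold exactly 80%. Real wealth data is messier than this.

```python
import math

def top_share(p, alpha):
    """Share of the total held by the top fraction p, assuming an
    idealized Pareto distribution with shape parameter alpha > 1."""
    return p ** (1.0 - 1.0 / alpha)

# Assumption: the shape value that reproduces the 80/20 rule exactly.
alpha_80_20 = math.log(5) / math.log(4)  # roughly 1.16

print(f"top 20% share: {top_share(0.20, alpha_80_20):.3f}")
print(f"top 1% share:  {top_share(0.01, alpha_80_20):.3f}")
```

Under the same assumption the top 1% end up holding a little over half of the total, which is the kind of concentration a bell curve never produces.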
I’ve been drawing on the work of Nassim Taleb a lot in these newsletters. To visualize the difference between these two distributions, he came up with the terms mediocristan and extremistan. He points out that while most people think they live in mediocristan, because that’s where humanity has spent most of its time, the modern world has gradually been turning more and more into extremistan. This has numerous consequences, and one of the biggest concerns prediction.
In mediocristan one data point is never going to destroy the curve. If you end up at a party with a hundred people and you toss out the estimate that the average height of all the men is 5’9”, you’re unlikely to be wrong by more than a couple of inches in either direction. And even if an NBA player walks through the door, it’s only going to throw things off by less than half an inch. But if you’re estimating the average wealth, things get a lot more complicated. Even if you were to collect all the data necessary to have the exact number, the appearance of a fashionably late Bill Gates will completely blow that up. For instance, the average wealth could go from $1 million before Gates arrives to $2.7 billion after he shows up.
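If you want to check the party arithmetic yourself, here is a minimal sketch. Every number in it is an assumption made to keep the math simple: 100 guests who each sit exactly at the average, a hypothetical 6’7” late arrival, and a hypothetical newcomer worth a round $100 billion (an illustrative figure, not a claim about anyone’s actual net worth). The exact result depends on the guest count and the newcomer’s wealth you plug in.

```python
from statistics import mean, median

def shift_from_newcomer(values, newcomer):
    """Return (mean_before, mean_after, median_after) when one newcomer joins."""
    joined = values + [newcomer]
    return mean(values), mean(joined), median(joined)

# Assumption: every guest sits exactly at the average.
heights_in = [69.0] * 100            # 5'9" expressed in inches
tall_guest_in = 79.0                 # hypothetical 6'7" arrival

wealth_usd = [1_000_000.0] * 100     # $1 million each
rich_guest_usd = 100_000_000_000.0   # hypothetical round $100 billion

h_before, h_after, _ = shift_from_newcomer(heights_in, tall_guest_in)
w_before, w_after, w_median = shift_from_newcomer(wealth_usd, rich_guest_usd)

print(f"height mean: {h_before:.2f} in -> {h_after:.2f} in")
print(f"wealth mean: ${w_before:,.0f} -> ${w_after:,.0f}")
print(f"wealth median after: ${w_median:,.0f}")
```

The height average moves by about a tenth of an inch, while the wealth average jumps by a factor of nearly a thousand. The median wealth does not move at all, which is one reason the median is often a safer summary than the mean in extremistan.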
Extreme outliers like this can either be very good or very bad. If Gates shows up and you’re trying to collect money to pay the caterers it’s good. If Gates shows up and it’s an auction where you’re both bidding on the same thing it’s bad. But where such outliers really screw things up is when you’re trying to prepare for future risk, particularly if you’re using the tools of mediocristan to prepare for the disasters of extremistan. Disasters which we’ll get to next time…
As it turns out blogging is definitely in extremistan. Only in this case you’re probably looking at 5% of the bloggers who get 95% of the traffic. As someone who’s in the 95% of the bloggers that gets 5% of the traffic I really appreciate each and every reader. If you want to help me get into that 5%, consider donating.