Back in May, before I knew that academic statisticians were eavesdropping on my thoughts, I mused on Sport Is a TV Show that “judging the footballing abilities of two football teams is so difficult that football itself is often a bad way to do it.” I was writing about matches that end in penalty shootouts, but the thought applies just as well to matches in which one team plays better than the other team but loses anyway. As we all know, that’s not an uncommon occurrence—it happens much more in soccer than in any other sport I follow—which is one reason the soccer punditry is so zealous in proclaiming after every match whether the winning team really deserved to win. Scoring in soccer is so much more difficult than scoring in other sports, and matches are so much more likely to come down to two or three individual moments, that there will always be an interesting gap between results and merit.
But how much of a gap? That’s the question that’s asked, if not exactly answered, by an interesting paper in this month’s Journal of Applied Statistics [pdf], written by the astrophysicist G.K. Skinner and the statistician G.H. Freeman. (There’s an abstract here and a helpful summary here, for anyone who doesn’t want to read the whole thing.) Starting from the premise that a football match is an “experiment” designed to determine which team is superior, they analyze goal distribution in order to ascertain the likelihood that the experiment yields the correct result. In other words, they’re trying to figure out how confident we can be based on a given scoreline that the winning team is better—that if the game could be replayed over and over again under the exact same circumstances, the same team would win most of the time.
If Arsenal beats West Ham 20,000-0 in a couple of weeks (not an unrealistic scoreline, by the way), we can be more than 90% sure that they’d go on to win almost every time if the match were replayed 10 or 100 or 500 times. If they win 1-0, however, our confidence—based on established statistical models, I mean, not on Gary Lineker—goes way down, to the point that from the perspective of statistics a soccer match is a pretty terrible experiment. (“For typical scores,” Skinner and Freeman write, “the probability of a misleading result is significant.”) According to Skinner and Freeman, there’s only about a 28% chance that the best team will win the World Cup under the 2006 format, a notion that will surely astonish anyone who witnessed the sheer poetry and power of Marcello Lippi’s Italy. (And yes, I’m talking about both the goals they scored over the seven games of the tournament.)
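(If you want a rough feel for where numbers like that come from, here’s a back-of-the-envelope simulation of my own, not the paper’s. The scoring rates are invented, and the only real modeling assumption is the standard one that goals arrive more or less like Poisson events, which is the kind of model Skinner and Freeman work with. Play a lot of one-off matches between a stronger team and a weaker one and count how often the stronger team actually wins.)

import numpy as np

rng = np.random.default_rng(0)

def better_team_win_rate(goals_better=1.5, goals_worse=1.0, trials=100_000):
    # Hypothetical scoring rates (goals per match), chosen purely for illustration.
    # Each simulated match is a pair of independent Poisson draws; return the
    # share of matches the nominally stronger team wins outright.
    better = rng.poisson(goals_better, trials)
    worse = rng.poisson(goals_worse, trials)
    return (better > worse).mean()

print(better_team_win_rate())  # comes out near 0.49

Even with a 50% edge in scoring rate, in other words, the stronger side wins outright less than half the time; the rest is draws and upsets, which is exactly the gap the paper is trying to put a number on.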
In all seriousness, this is fascinating work, and it’s authoritatively strewn with giant splats of math, but I’m not convinced by some of its implications. First, “superiority” is defined in a fairly ambiguous way. Early in the paper, the authors acknowledge that game-day conditions affect a team’s level of play, and say that what they’re looking at is superiority on the given day. Later, however, they look at intransitive results in World Cup groups—i.e., when Team A beats Team B, Team B beats Team C, and Team C beats Team A, which they suggest would never happen if a soccer match were a perfect experiment. However, since all three teams can’t play each other under the same conditions at the same time, finding significance in these numbers requires the authors to throw out the circumstances of the match and instead assume that a team’s quality level relative to other teams is invariant. This not only seems obviously false, it also requires a conceptual leap that reverses the assumption of the first half of the paper. The authors acknowledge this, but continue to treat both sets of conclusions as valid without further addressing the contradiction.
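(A similar unofficial sketch, not anything from the paper, makes the point about intransitive results: even if you grant the fixed-strength assumption, three teams of unchanging quality playing one Poisson-scored match per pairing will produce a cycle a surprising amount of the time. The strengths below are invented.)

import numpy as np

rng = np.random.default_rng(1)

def intransitive_rate(rates=(1.6, 1.3, 1.0), trials=100_000):
    # rates are hypothetical goals-per-match strengths for teams A, B, and C.
    # Each pairing is played once as independent Poisson draws; count how often
    # the three results form a cycle (A beats B, B beats C, C beats A, in
    # either direction).
    a_vs_b = np.sign(rng.poisson(rates[0], trials) - rng.poisson(rates[1], trials))
    b_vs_c = np.sign(rng.poisson(rates[1], trials) - rng.poisson(rates[2], trials))
    c_vs_a = np.sign(rng.poisson(rates[2], trials) - rng.poisson(rates[0], trials))
    cycles = (a_vs_b == b_vs_c) & (b_vs_c == c_vs_a) & (a_vs_b != 0)
    return cycles.mean()

print(intransitive_rate())  # roughly one group in ten with these made-up strengths

A perfect experiment would never produce such cycles; a noisy one produces them routinely.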
Much more importantly, the basic premise that a soccer match is “an experiment to determine which of the two teams is in some sense superior” is flawed, at least in my opinion. A soccer match is a contest in which two teams try to satisfy a victory condition by scoring the most goals over a certain period of time. Redescribing that as an experiment to determine superiority isn’t simply a swap of synonyms but a changed definition, one that strips out many of the meanings that normally reside in the game. For instance: for anyone not named Sam Allardyce, one of the purposes of the game is to be exciting to watch and to play, and excitement depends on contingency—on not knowing the outcome in advance, on the sense that anything could happen. But if you start thinking of the game as a scientific effort to identify the best team, contingency becomes a design flaw. An upset, after all, is a botched experiment. The goal then becomes—and Skinner and Freeman end with some speculations in this vein—to recalibrate the format of the match to ensure the “correct” outcome every time, an effort that seems hostile to much of what makes soccer, or any sport, entertaining, thrilling, alive.
That is to say: sport is as much an aesthetic construct as it is an objective test, and some of the glory of it lies precisely in its unreliability. The element of chaos on the pitch in soccer is what, at least to my mind, gives the game a great deal of its meaning, because the players’ struggle to create coherence out of randomness adds a new kind of import to the game’s moments of beauty (I’ve written more about this here, here, and here, among other places). Further, the fact that matches only happen once and the fact that they don’t last forever give hope to weaker teams, and enable stories to emerge that would never be possible under laboratory circumstances. I doubt anyone who saw Scotland beat France two years ago would believe that Scotland could win more than a handful out of 500 replays of that game—but they only had to play once, and that’s a feature, not a bug. Saint George might not have beaten the dragon more than five times out of a thousand, either, but what all those paintings commemorate isn’t exactly the result of a failed experiment.
I realize that what Skinner and Freeman are writing about is largely limited to the implications of certain models within the realm of statistics, and that they probably don’t actually want to redesign the game to eliminate chance. (Though they do take the trouble to point out that their idea for playing successive periods of extra time until a satisfactory goal difference had been achieved “would not be popular with those planning television coverage.”) But for those of us who are reading their research primarily as fans of the game, it’s an opportunity to remember that there’s more to soccer than its ability to select the right winner, and that the aesthetic aspects of the sport can’t simply be ignored in any real-world discussion of its accuracy or fairness. A football game isn’t always a good way to judge the footballing abilities of two football teams. That’s what’s so great about it, don’t you see?
Read More: Football as Philosophy, Statistics, Why Do We Follow Sports?
by Brian Phillips · October 8, 2009