Back in May, before I knew that academic statisticians were eavesdropping on my thoughts, I mused on Sport Is a TV Show that “judging the footballing abilities of two football teams is so difficult that football itself is often a bad way to do it.” I was writing about matches that end in penalty shootouts, but the thought applies just as well to matches in which one team plays better than the other team but loses anyway. As we all know, that’s not an uncommon occurrence—it happens much more in soccer than in any other sport I follow—which is one reason the soccer punditry is so zealous in proclaiming after every match whether the winning team really deserved to win. Scoring in soccer is so much more difficult than scoring in other sports, and matches are so much more likely to come down to two or three individual moments, that there will always be an interesting gap between results and merit.

But how much of a gap? That’s the question that’s asked, if not exactly answered, by an interesting paper in this month’s Journal of Applied Statistics [pdf], written by the astrophysicist G.K. Skinner and the statistician G.H. Freeman. (There’s an abstract here and a helpful summary here, for anyone who doesn’t want to read the whole thing.) Starting from the premise that a football match is an “experiment” designed to determine which team is superior, they analyze goal distribution in order to ascertain the likelihood that the experiment yields the correct result. In other words, they’re trying to figure out how confident we can be based on a given scoreline that the winning team is better—that if the game could be replayed over and over again under the exact same circumstances, the same team would win most of the time.

If Arsenal beats West Ham 20,000-0 in a couple of weeks (not an unrealistic scoreline, by the way) we can be more than 90% sure that they’d go on to win almost every time if the match were replayed 10 or 100 or 500 times. If they win 1-0, however, our confidence—based on established statistical models, I mean, not on Gary Lineker—goes way down, to the point that from the perspective of statistics a soccer match is a pretty terrible experiment. (“For typical scores,” Skinner and Freeman write, “the probability of a misleading result is significant.”) According to Skinner and Freeman, there’s only about a 28% chance that the best team will win the World Cup under the 2006 format, a notion that will surely astonish anyone who witnessed the sheer poetry and power of Marcelo Lippi’s Italy. (And yes, I’m talking about both the goals they scored over the seven games of the tournament.)
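To make the 1-0 point a little more concrete, here is a minimal sketch. It is not Skinner and Freeman's actual method: it simply assumes that each team's goals follow independent Poisson distributions, a common simplification, and the scoring rates (1.6 and 1.0 goals per match) and function names are invented for illustration.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the average is lam goals per match."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def match_outcome_probs(rate_strong, rate_weak, max_goals=15):
    """Return (win, draw, loss) probabilities for the stronger team.

    Assumes the two teams' goal counts are independent Poisson variables.
    Scorelines above max_goals per team are ignored; for realistic rates
    their combined probability is negligible.
    """
    win = draw = loss = 0.0
    for a in range(max_goals + 1):
        p_a = poisson_pmf(a, rate_strong)
        for b in range(max_goals + 1):
            p = p_a * poisson_pmf(b, rate_weak)
            if a > b:
                win += p
            elif a == b:
                draw += p
            else:
                loss += p
    return win, draw, loss


# Hypothetical rates: the "better" team averages 1.6 goals, the other 1.0.
win, draw, loss = match_outcome_probs(1.6, 1.0)
print(f"better team wins: {win:.2f}, draw: {draw:.2f}, upset: {loss:.2f}")
```

Under these made-up rates, the better team fails to win a large share of the time, which is the sense in which a single low-scoring match is a noisy experiment.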
In all seriousness, this is fascinating work, and it’s authoritatively strewn with giant splats of math like
[equation image not reproduced]
but I’m not convinced by some of its implications. First, “superiority” is defined in a fairly ambiguous way. Early in the paper, the authors acknowledge that game-day conditions affect a team’s level of play, and say that what they’re looking at is superiority on the given day. Later, however, they look at intransitive results in World Cup groups—i.e., when Team A beats Team B, Team B beats Team C, and Team C beats Team A, which they suggest would never happen if a soccer match were a perfect experiment. However, since all three teams can’t play each other under the same conditions at the same time, finding significance in these numbers requires the authors to throw out the circumstances of the match and instead assume that a team’s quality level relative to other teams is invariant. This not only seems obviously false, it requires a conceptual leap that reverses the assumption of the first half of the paper. The authors acknowledge this, but continue to treat both sets of conclusions as valid without further addressing the contradiction.

Much more importantly, the basic premise that a soccer match is “an experiment to determine which of the two teams is in some sense superior” is flawed, at least in my opinion. A soccer match is a contest in which two teams try to satisfy a victory condition by scoring the most goals over a certain period of time. A redescription of that as an experiment to determine superiority isn’t simply a transition between synonyms but a changed definition that eliminates many of the meanings that normally reside in the game. For instance: for anyone not named Sam Allardyce, one of the purposes of the game is to be exciting to watch and to play, and excitement depends on contingency—on not knowing the outcome in advance, on the sense that anything could happen. But if you start thinking of the game as a scientific effort to identify the best team, contingency becomes a design flaw. An upset, after all, is a botched experiment. The goal then becomes—and Skinner and Freeman end with some speculations in this vein—to recalibrate the format of the match to ensure the “correct” outcome every time, an effort that seems hostile to much of what makes soccer, or any sport, entertaining, thrilling, alive.
That is to say: sport is as much an aesthetic construct as it is an objective test, and some of the glory of it lies precisely in its unreliability. The element of chaos on the pitch in soccer is what, at least to my mind, gives the game a great deal of its meaning, because the players’ struggle to create coherence out of randomness adds a new kind of import to the game’s moments of beauty (I’ve written more about this here, here, and here, among other places). Further, the fact that matches only happen once and the fact that they don’t last forever give hope to weaker teams, and enable stories to emerge that would never be possible under laboratory circumstances. I doubt anyone who saw Scotland beat France two years ago would believe that Scotland could win more than a handful of 500 replays of that game—but they only had to play once, and that’s a feature, not a bug. Saint George might not have beaten the dragon more than five times out of a thousand, either, but what all those paintings commemorate isn’t exactly the result of a failed experiment.

I realize that what Skinner and Freeman are writing about is largely limited to the implications of certain models within the realm of statistics, and that they probably don’t actually want to redesign the game to eliminate chance. (Though they do take the trouble to point out that their idea for playing successive periods of extra time until a satisfactory goal difference had been achieved “would not be popular with those planning television coverage.”) But for those of us who are reading their research primarily as fans of the game, it’s an opportunity to remember that there’s more to soccer than its ability to select the right winner, and that the aesthetic aspects of the sport can’t simply be ignored in any real-world discussion of its accuracy or fairness. A football game isn’t always a good way to judge the footballing abilities of two football teams. That’s what’s so great about it, don’t you see?
by Brian Phillips · October 8, 2009
Talismanic anecdote relevant to post: the first European soccer match, and probably the first soccer match outside of a World Cup, I ever saw was an opening-round Champions League match between Manchester United and a Hungarian team named something ridiculous like “Zalagerzeg” that when I looked it up on google turned out to be even more improbable than even that: Zalaegerszeg.*
Manchester United dominated the match from stem to stern but managed to fluff a dozen easy chances to score. Obviously. And then as the match ticked over into stoppage time a hopeless long ball evaded a defender’s head, got a guy in to cross, and ended up on a Zalaegerszegian boot two feet from the net. It went in, and that felt good. In no way was it possible that they deserved to win, which removed all that moral and ethical pondering about justice. There was no justice. There was only goal.
I was delighted to google my way towards “zalaegerszeg manchester united” — google gives you Man U after the M, as if that is the most important M-related thing in the history of this Hungarian town — and find the goal, and more delighted to hear the Hungarian announcer’s voice hit a glass-breaking soprano in his excitement.
http://www.youtube.com/watch?v=cazzSHKkkfI
*(Why? I spent a summer in Ireland between undergrad and grad school for the hell of it — read “because my girlfriend broke up with me” — and the people I landed with had basic cable and that was the sort of Champions’ League game that made it to free TV. I also stared uncomprehendingly at some rugby.)
I just watched that 15 straight times, so let’s just embed it for the record:
Interesting article and reaction. I think Skinner and Freeman’s approach is better suited to sports that have more inherent structure than soccer, say tennis or baseball, sports that lend themselves to easier and more obvious statistical analysis. One of soccer’s greatest merits is its open format; copious amounts of space and time are given. Each game develops its own flow and pace dictated or, dare I say, created by the two teams. But, as you say, scoring is more difficult in soccer than other sports, and thus one team may carry the play but fail to find the back of the net. Conversely, one mistake, sometimes even a minor one, and the overwhelmed team may cash in on a rare opportunity to score against a superior team. Upsets ensue, flares and bonfires for everyone. Ice hockey is the one other sport comparable to soccer in this regard, although upsets in that sport are frequently attributable to that most pesky of variables, the proverbial “hot goalie”, he who has the power to steal victory from the grasp of a more deserving opponent.
Could not agree more with this post. This study reminds me of some of the Football Outsiders material that claims that teams like the ’08 Packers (who finished 6-10, mind you) were actually one of the best teams in the NFL. Okay, fine, but it didn’t happen that way, and that’s interesting in itself.
I think the statistician’s response is what every soccer fan already knows – finishing is the most valued and important skill in determining a game’s outcome. That’s why a decent forward costs more than a world class defender.
I definitely think Brian subtly hit on another point – how teams approach games. If both take a positive approach, it can drastically change the probability of a victory. However, if one team takes a negative approach, cough cough Inter v Barca….
The game isn’t played to see which team is better. It’s played to see which team wins.
Very well put. Exactly.
I think the message here is how misleading single matches are. Which is something everybody knows, and everybody pays lip service to, but everybody forgets the second a team that they care about loses.
After an important match, everybody pulls their hair and says, “what conclusions can we make from this loss?” and “does the last minute goal signify some sort of character deficiency in our squad’s spine?”
Then we all try to retrofit rational explanations that make sense of the “flow” of the game. The manager must have given a good/bad team talk. Such and such a player looked uncertain, such and such a player showed a lot of heart.
The reality is of course, that a few tiny more or less random moments will completely change the result and therefore the post match storyline.
So instead of all the soul searching, the most logical thing to do after an important result is to say, “Well. That happened.” And go home. Which we never do.
I think the problems Brian outlines here are connected to an issue that I’ve brought up before: the remarkable range of disagreement among even highly informed and experienced commentators about which players have had the greatest impact on any given match. This is especially true on the defensive end of the pitch, where the players who do the most to impede the other team’s offense often do so in a way that’s invisible to viewers. Sometimes even smart and experienced viewers.
That and we don’t see the entire game. Especially on TV, but even when one is there, it’s hard to see what everyone is actually doing on the field. We’re following the bouncing (or rolling) ball.
The real error is in extrapolating the result of the match or the success of the club to validate self worth.
There is an important element in soccer over other (American) sports that you don’t take into account. Most American playoffs (at least baseball, hockey and basketball) are best-of-7 series. This diminishes the chances of an upset. Soccer, on the other hand, in the big tournaments (World Cup, Euro Cup, Copa Libertadores), comes down in the end (quarterfinals, etc.) to one or at most two games. If a team only has to beat another team once, upsets are much more likely to happen.
That is why over the course of a league season, almost invariably the “best” teams do come out on top of the league. That is the beauty of NCAA basketball championships or the F.A. Cup. A “lesser” team can catch lightning in a bottle on a given day.
Timotewo makes a good point, but that is more important if you happen to be a fan of a “lesser” team. Yes, the FA Cup gives a glimmer of hope for glory to these fans (myself included as a Wolves fan), but an on-form team can cause an upset in a 7-game series too. Most pundits in the US gave the Angels little chance against the Red Sox based on statistics (in 9 meetings in the run up to the World Series they had only beaten the Sox once) yet they just managed to sweep them pretty easily.
melanie,
Actually it was not the World Series, it was the opening playoff series between the Angels and the Red Sox. It was not a 7-game series but the abbreviated 5-game opening series. And the Angels beat the Sox in league play this year 5-4 and most betting lines and predictions had the Angels winning (which they did). Besides that, I guess your post was spot on. My point is this: in a 7-game series with Spain, how many times do you think the U.S. would come out on top? But in a one-game series anything can happen (like it did in the Confederations Cup).
The trouble is, if a single game is as random as some are suggesting, then the sum of several such random events (games) would only establish the random nature of the event (league) overall. Which it doesn’t. So you’re back to square one I’m afraid. Besides which, our brains don’t think in complex algorithms (or maybe they do, but mine doesn’t!) but leap to assumptions supported by the smallest evidence, such as Emile Heskey being an international class centre-forward, for example. Nice article though. Wonder what they’d have made of Bolivia?
Timiteo, I said run up to the World Series. offhand I couldn’t remember what bullshit phase it was !
Rubin Kazan 2, Barcelona 1
I was going to leave it at that but let’s flog the horse a bit more… if this match was replayed until Puyol’s hair reached the turf, Kazan would only win a handful more. Are they a good, even great team? Yes, that was proven to the rest of us tonight. But still, given the gulf in class, the fantastic goals and the blind stabs in (a very well organized) defence only combine for a win once in a hundred tries.