Tim Vickery has a terrific post on his BBC Sport blog about the mania for rankings in modern football, from obligatory “world’s 50 top derbies”-style cop-outs to the inane IFFHS league rankings that currently place the Argentine first division as the third-strongest league in the world. Vickery is particularly annoyed at the way ranking systems like the IFFHS’s short-circuit meaningful analysis and cause obviously false assertions, like the idea that Argentina’s football league is better than Brazil’s or Spain’s, to be taken seriously.
He draws a further parallel between the crudity of what he calls “ranking-ism” and the difficulty of compiling useful soccer statistics:
This can also extend to the action on the field. There have been many times in the Maracana stadium when I have been sitting next to the team collecting match statistics. “Accurate pass by the number 5,” the team leader would call out, though the ball had been blasted calf-height on the recipient’s wrong foot, keeping the play so tight that loss of possession was inevitable, or “inaccurate pass by the number 8,” after he had played an inspired ball inside the opposing full-back which might have set up a chance if his team-mate had been bright enough to read it.
Witnessing the match stats being compiled has made me acutely aware of their limitations. Football is too fluid for the rigidity of the statistical mind. Has the ball been used well? This depends, surely, on the situation of the game, the zone of the pitch – on considerations that cannot be reduced to a statistic.
This is as good a description of the problem with existing statistical measures in soccer as any I’ve read: a visibly poor pass that loses possession is rated as “accurate” because it hits its man, a brilliant pass goes down as “inaccurate” for the passer even though it was the intended recipient’s lack of imagination that caused it to fail.
I think it would be useful, however, to differentiate between two kinds of bogus numerology in football which Vickery, who describes them both brilliantly, seems to conflate.
The first is ranking-ism, which has its origin, I think, in the delusional idea shared by innumerable football writers and editors that lists automatically make good copy. What lists actually make, of course, is easy copy, and in a world in which the football media machine must be fed with endless shovelfuls of words, a ranking of the top 96 worst haircuts in the history of the Bavarian amateur leagues will tend to be conceived with a joyful sigh of relief.
Even better for editors and writers is a list that’s already been compiled by someone else, and that can simply be reported on. Bloggers are the worst offenders in this category, which is why hard little pieces of nonsense like the IFFHS list thrive so wickedly on the internet: it sounds authoritative, it will generate pageviews, and it requires absolutely no work from the bloggers who link to it, most of whom will remain utterly innocent of any notion of the method used to compile the ranking or its strengths and weaknesses as an analytic instrument. So it spreads everywhere, and because readers are conditioned to read rankings as instant sources of controversy, it generates volumes of largely pointless discussion and takes on an aura of importance that has no relation to its own intrinsic worth.
The second kind of bogus numerology is the one found in actual match statistics, and is bogus only to the extent that the officially recognized numbers are not sufficient to describe or clarify what really takes place on the pitch. The problem here isn’t laziness or cynicism, but simply lack of refinement. The accepted statistical measures aren’t good enough. It’s possible, however, without going too far into the problem of statistics in fluidly complex sports, to imagine that advances in methodology would lead to new measures that would further our understanding of the sport and help us see it in a fresh way. There’s a feeling in some corners, of course, that a more complex statistical approach to soccer would somehow diminish the poetry or the humanity of it. But surely knowledge that would open up new angles of insight into the game would only enhance the poetry and humanity of it; at any rate, poetry and humanity aren’t going to be saved by ignorance, and the idea that they might be seems like an intellectual surrender somewhere on the order of ranking-ism.
Two styles of pseudo-science, then, one a kind of white noise in the media, the other a potentially useful form of study in a premature state of development. They can bleed into each other, of course, the way the IFFHS rankings, a set of the second type of bad numbers, wound up propagating through ranking-ism and transforming into the first. But they aren’t meant for the same things, and it’s important to distinguish between forms of inquiry that are intended to develop our understanding and forms of inquiry that are intended to numb it.
by Brian Phillips · January 12, 2009
But the Argentine league IS better than the Brazilian league . . .
The rest of the post is spot on (though I probably would not have been able to resist the standard dismissal of the IFFHS as the “vanity project of an anorak from Wiesbaden”).
Given that I was thinking “no, it’s easy copy” a few seconds before I read just those words, it’s your second point that I find more intriguing.
The question, of course, is not whether more meaningful statistics could exist (they surely could), but what they might be. An analogy that I’ve used before is that whereas the linear and event-driven nature of sports like baseball lends itself well to Sabermetric measures grounded in simple arithmetic, the much more fluid, dynamic and, yes, uncertain nature of football requires measures grounded in a type of quantum theory.
And given that I never took Math 21 (or whatever it is called now), that’s as far as I’ve ever gotten.
Great post. I wonder how much progress has been made in the statistical analysis of football by companies who retain a proprietary stake in such information, and who sell it on to managers. Meanwhile, the rest of us are left with corner counts, journos’ arbitrary marks out of ten and Actim’s magic numbers.
If this is so, I think that even if someone does come up with some decent ways to measure the game and allows them to come into the public domain, they might be given short shrift by a soccer public who have been brought up on the idea that stats are practically useless in football — an idea reinforced by the inadequacy of the hitherto available stats.
Going a bit off track here, but game statistics are certainly interesting to consider in a cross-sport sort of way. At one extreme, you have baseball, a game so readily quantifiable that generations of analysts have been comfortable assigning an “error” when the expected doesn’t happen, and one that can almost literally be recreated mathematically from its list of statistics, as though it were the sort of plain-text computer code that produces a 3D image of Woody the Cowboy when run through a computer.
At the other, you have soccer, which has defied any attempt to make a decent set of objective statistics from which different people would draw similar conclusions. I love the fact that one of the most “cherished” of soccer statistics is a gymnastics-like numerical representation of subjective judging.
In between, you have a game like basketball, which combines soccer-like fluidity with a fundamental task (scoring) so relatively easy and common that it is fairly easily discussed statistically, though not with baseball’s numerical certainty. You also have American football, which thrives on a steady diet of statistics that rivals baseball’s for quantity, but so many of which are fundamentally flawed in that they fail to consider the complex interplay of the units (i.e., players) in the ritualized land-war that is American football.
I realize I’ve gone absolutely nowhere with this comment, but oh well; I’ve enjoyed thinking about the subject.
So many interesting questions around the subject of statistics in soccer. I’ve been meaning for a long time to do a Stats Week here to explore the topic and finally conduct the interview I’ve been planning with my brilliant brother-in-law who writes for Baseball Prospectus (basically the Bible of Sabermetric analysis in baseball) on the question of applying statistical models to fluid sports. They seem to have had a lot of success with their Basketball Prospectus spinoff, and I’ve been curious whether the approach would have any application in football.
Two questions: Does anyone know what happened to that proprietary soccer-stats system Billy Beane was supposed to be working on with the guy from Leeds University? Billy Beane is the general manager of the Oakland A’s in Major League Baseball (also a hardcore Tottenham fan) who pioneered a revolution in bringing statistical analysis to the task of building a team. The ownership group of the A’s also owns the San Jose Earthquakes, and a few months ago Beane was reportedly collaborating with Leeds University Business School Professor Bill Gerrard to devise a series of new statistical categories for soccer. But I haven’t heard if it’s gone anywhere.
Second: Has anyone read Gerrard’s paper, “Is the Moneyball Approach Transferable to Complex Invasion Team Sports”? (Moneyball was the name of Michael Lewis’s influential book on Billy Beane.) The paper was published in the November 2007 issue of the International Journal of Sport Finance and doesn’t seem to be online, although the abstract is. I’d be curious to read it. Anyway, I love the phrase “complex invasion team sport” as a generic description of football.
From the abstract:
Fascinating stuff.
Awesome article on football statistics! After reading your article, I will be more careful when making football predictions with the statistics that I have.
Your brother-in-law writes for BP? Highly impressive.
My understanding is that Beane and Gerrard are still working together and that their “product” is intended to be proprietary, at least in the initial stages. I haven’t read the article (I don’t have access to a university library at the moment), but the abstract indicates that Gerrard is well aware of some of the most fundamental challenges to developing something of real value.
It’s interesting to see that serious research into football/soccer metrics appears to be proceeding on a largely proprietary basis (see, e.g., Actim and the Beane/Gerrard effort), whereas Sabermetrics was for many years a pretty pure “open source” movement. It would be unfortunate if Beane’s success at demonstrating that one can “monetize” the results of Sabermetric analysis in baseball leads to development in other sports being done behind closed doors. I consider that to be particularly true for football/soccer; given the relative lack of sophistication of the current metrics and the worldwide reach of the sport, the most rewarding path is almost certainly one that takes advantage of contributions from the greatest number of interested individuals from the widest geographic range.
I have also noticed this proprietary movement in football metrics, which is frustrating to American sports journalists. Last year I wrote a short piece about Cristiano Ronaldo for the “Gift” section of the now-defunct Play magazine. The piece was supposed to break down Ronaldo’s skill metrically, or in any pseudo-scientific way possible (it was an assignment; don’t blame me). After a great deal of searching I found a Nike website that made some claims about Ronaldo’s pace with and without the ball, so I tried to contact Opta, the company that supposedly provided the data to Nike, to get more information about the vague stats.
They responded:
“Hi Austin,
“I have been forwarded your enquiry with regards to Cristiano Ronaldo speed data as this was my deal.
“For this particular project we used a partner company (who wish to remain un-named) who track player speeds. The data was not actually collected by Opta as player fitness data (speed and distance) is not something that we analyse. Unfortunately this therefore means that this is not something that we can provide going forward as the company that we used previously do not wish for their data to be used within the media sector.”
When I pressed further, the Opta man wrote:
“The reason for my ‘secrecy’ is because 99% of the clients that our partners work with are professional football clubs who want to protect the data and don’t want it in the mass media. The companies therefore don’t want to associate their name with data in the media as they want to promote exclusivity to their clients.
Hope this makes sense.”
Does it make sense? I tried Ronny’s agent, I talked to Carlos Queiroz, to Manchester United (don’t get me started on the great wall of Man U), the Portuguese FA, and other stat companies, and found very little. I believe there are some serious metrical studies being produced in the Premiership (Arsenal supposedly rely heavily on them), but there is a very different notion of their provenance.
A belated follow-up to your discussion of applying statistics to football. Unlike baseball and other striking-and-fielding games, in which human observation and the paper-and-pencil method can go a long way toward recording player activity, football and other invasion team sports need a certain level of technology to gather even a minimum amount of player action data. And if you want data on activity off the ball, you need a tracking system. This is expensive. Combine the cost with teams’ obvious concerns about maintaining confidentiality, and it is not surprising that little player performance data is in the public domain, at least not in a form that allows much in the way of systematic analysis. So most of my own work over the last five years has not been published, since I’ve had to sign confidentiality clauses with teams to get access to their data. (My 2007 article uses some historic Opta data that was published in a series of four yearbooks.)
My collaboration with Billy Beane continues but the research remains highly developmental and is not being used in the day-to-day operations of the A’s soccer team, the San Jose Earthquakes.