ADRIFT IN AN OCEAN OF IGNORANCE
With the 2016 editions of the Opta Pro Analytics Forum and the MIT Sloan Sports Analytics Conference receiving more attendees and attention than ever, I take a look at the use of analytics in sport, and compare it to analytics used in sports betting.
Does analytics actually work? It is unquestionably cool and fashionable. But is it over-hyped? Is Moneyball a myth? And can sports teams learn anything from professional sports bettors?
“We have an ocean of ignorance and a small island of knowledge”.
[Bill James, of the Boston Red Sox, and father of baseball sabermetrics.]
Bill James is the original Mr Moneyball. He’s the father of sabermetrics – the practice of using numbers to represent the baseball universe. He founded Baseball Abstract and works as a senior advisor for the Boston Red Sox. If anyone can claim to have gained an understanding of his sport using data, then it is James.
But his quote above makes it clear that he acknowledges that the sum of what he doesn’t understand through baseball data is far greater than what he does – or can ever hope to. And that’s in baseball, which structurally is a far easier game to break down than a sport like football.
I don’t know anything about baseball, but I’ve spent a big chunk of my working life figuring out the probabilities in football (soccer) matches. I’ve been a bookmaker’s odds-compiler and trader, a professional gambler and latterly a quant for a football betting syndicate.
Betting professionally is a noble pursuit. It is a fair though unforgiving platform on which to test your ability. You place bets, and you either win or you lose. If you win (consistently, over the long term) you must be good. It's man-on-man combat, you against the market. It sorts the weak from the strong. A cerebral form of boxing. Almost everyone who bets on sport loses over the long term. Only the strongest survive and win. So I'm quite proud of being a professional gambler. My mother not so much – she wanted me to be a vet.
I use analytics to help me win. It’s an invaluable tool in my armoury. The quantitative research and analysis that I do, and the models that I build are the difference between me winning and losing. I know this because I used to be an amateur gambler who relied on gut instinct, intuition and my sporting knowledge to make bets. Like everyone else who bets this way, I used to lose.
The seeds of my interest in using analytics to win at football betting were not sown by a love of numbers – in fact to be honest I hated Maths at school. Nor really by a love of football or betting – though I was always fairly interested in both. The inspiration actually came from Jurassic Park.
Back when I was at University (being the worst student in the history of British higher education) my flat-mate came back from a holiday in America with a copy of Michael Crichton's novel. It was top of the best-seller list in the US at the time, but hadn't yet been published in the UK. So I was one of the first people in the country to read it. With the subsequent success of the film franchise it's easy to forget just how good and successful the original Jurassic Park book was. And despite not having high initial expectations for a science-fictiony book about dinosaurs, I loved it.
In the book, the character of Ian Malcolm (played by Jeff Goldblum in the movie) was more central to the story than in the film. He was the mathematician – or as he preferred to be known, the chaotician, whose specialism was the branch of Maths known as chaos theory. I'd never heard of it before – this idea that a butterfly flapping its wings here could cause a typhoon halfway around the world. Chaos theory was effectively a character in the book, a brilliant narrative device that helped bring it to life. For the first time in my life I was genuinely intrigued and engaged by something to do with Maths.
After I finished the book the first thing I did was to say to my flat-mate “that would make a great Steven Spielberg film” (honest, I did). The second thing I did was to start day-dreaming about the possible applications of chaos theory to things I cared about. And so I started making notes on an idea, the basis of which was that if I could get hold of the right data then I could build a model using chaos theory that would let me predict the results of football matches.
If I could predict the results of matches then I could bet on them, and win every time. I would win a fortune, becoming so rich that I could then buy an island and pay some scientists to clone me some dinosaurs. Or something. Whatever the ultimate goals in my addled teenage mind, the seeds of the desire to be a professional gambler using computer models had been sown.
Those seeds needed to survive a nasty frost of harsh reality though. In my initial estimation of how to model a football match I had figured on needing to know who all the players were, how good they all were, and how they played. Perhaps their height and weight too. My chaos theory model would then predict how they would move around the pitch, and when they would score goals. Simple (if not easy).
But then I started to delve into the actual complexity of a game of football. There were twenty-two players who could, by the terms of the laws, be in any position on the field at any given time. They could all potentially interact with each other at any time in a host of different ways, and do a wide range of possible things with the ball. The position and actions of each player influenced the actions and movement of all the others.
At a detailed level there was the influence of the trajectory and spin of the ball. Players' body position, and their verbal and non-verbal communication. The referee's subjective decisions. The ball and players' interaction with an imperfect playing surface. The influence of the weather. Decisions by coaches. A shout from the crowd which distracts a player for a split second, making his position slightly different from what it would otherwise have been, which in turn makes some of the opposition players take up slightly different positions, which leads to the ball being played to a different player in another area of the pitch... A fan throws a beach-ball onto the pitch.
Each factor that can possibly affect the pattern of play in a football match, no matter how infinitesimally small, adds to the number of possible ways in which it could play out. Or rather, it doesn't just add to that number, it multiplies it. Each moment in time during a game is related to each subsequent moment, and each movement and action matters – no matter how small. So the number of possible paths that 22 players and a ball can go down is in the trillions within a few seconds of the opening whistle.
By the end of 90 minutes the number of different paths that the game could have followed is greater than the number of particles in the known universe. A game of football is unpredictable. As in, you cannot predict it. Not with the world’s most powerful super-computer, not with chaos theory, not with anything. We may be in the era of big data, but no amount of data is enough to properly understand something as nuanced, dynamic, chaotic and random as a game of football. Despite its relatively simple laws, football is gigantically, enormously, impossibly complex. At least as far as trying to predict exactly what will happen. A game is a vast ocean of randomness and complexity.
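To put rough numbers on that multiplication, here is a deliberately crude toy count in Python. It assumes (hypothetically) that each of the 22 players chooses between just two options every second, independently of everyone else, which massively understates the real range of possibilities. The 10^80 figure is the commonly quoted rough estimate for the number of particles in the observable universe.

```python
import math

# Hypothetical toy assumptions, for illustration only: 22 players, each choosing
# between just 2 options every second, with every combination counted as a path.
PLAYERS = 22
CHOICES_PER_PLAYER = 2
SECONDS = 90 * 60

# Commonly quoted rough estimate of particles in the observable universe: ~10^80.
LOG10_PARTICLES = 80


def log10_paths(seconds: int, players: int = PLAYERS, choices: int = CHOICES_PER_PLAYER) -> float:
    """Return log10 of choices ** (players * seconds), the toy number of paths.

    Working in log10 avoids building astronomically large integers.
    """
    return players * seconds * math.log10(choices)


for t in (1, 2, 5, 60, SECONDS):
    print(f"after {t:>5} s: about 10^{log10_paths(t):,.0f} paths")

print("more paths than particles:", log10_paths(SECONDS) > LOG10_PARTICLES)
```

Even under this absurdly simplified assumption, two seconds of play already gives 2^44, roughly 1.8 × 10^13 paths, and by full time the exponent runs into the tens of thousands.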
However, in this vast unknowable ocean some solid things do emerge, things to which we can cling – small islands made of rock. It is possible to study, define and largely understand these islands. What you can't do is predict exactly how the whole fathomless waters of the ocean will react as they swirl around them at any given point in the future.
But we can have a decent stab at knowing roughly how the waters will react around some of these islands, especially the biggest ones. This is how we all know that Barcelona are a really good football team, and that Lionel Messi is a great player. We can't ever know exactly what Messi will do the next time he gets the ball. But we can be pretty sure that on average, what he will do with the ball will be better (more efficient) within the framework of the rules of football than what another player would do in the same situation. And through the cumulative effect of the greater efficiency of its players (including Messi) and its combined team play, Barcelona will on average play the game more efficiently than its opponents. So it will end up winning games more often than the norm.
You can't truly predict that Messi will do something brilliant, or truly predict that Barcelona will win. But you can be pretty sure that these things are more likely to happen than with other players and teams. And the essence of being a professional gambler is being able to say how likely these things are. If you are good at estimating these true probabilities then you can compare your estimates with the odds offered by bookmakers and betting markets, and place bets only when those odds are longer than your estimated probabilities justify. Do that consistently and you make a long-term profit.
In other words, professional gambling has nothing to do with making predictions. Predictions are impossible. Trying to make them is futile – amateurish. Making a profit from gambling comes from being on the right side of statistical probability – from betting only when the odds are in your favour. This is otherwise known as value investing, and it’s the only way to make a sustainable long-term profit from betting on sport, despite what many people would like you to believe.
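The arithmetic behind betting only when the odds are in your favour fits in a few lines of Python. This is a minimal sketch assuming decimal odds and level stakes; the 50% estimate and odds of 2.20 in the usage lines are made-up numbers, not real prices, and the function names are purely illustrative.

```python
def break_even_probability(decimal_odds: float) -> float:
    """Probability at which a bet at these decimal odds neither wins nor loses on average."""
    return 1.0 / decimal_odds


def expected_value(p_estimate: float, decimal_odds: float, stake: float = 1.0) -> float:
    """Expected profit: win (odds - 1) * stake with probability p, else lose the stake."""
    return p_estimate * (decimal_odds - 1.0) * stake - (1.0 - p_estimate) * stake


def is_value_bet(p_estimate: float, decimal_odds: float) -> bool:
    """Bet only when our estimated probability beats the break-even probability."""
    return p_estimate > break_even_probability(decimal_odds)


# Hypothetical example: we rate an outcome at 50%, the market offers 2.20.
print(is_value_bet(0.50, 2.20), round(expected_value(0.50, 2.20), 2))  # True 0.1
# Same odds, but our estimate is only 40%: no bet.
print(is_value_bet(0.40, 2.20))  # False
```

Note that nothing here predicts the result. The bet at 2.20 still loses half the time on our own estimate; it is worth making only because the price is better than the probability.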
So, my wonderful dream of using chaos theory to predict football games foundered on the impregnable rocks of the immutable laws of physics. What a bummer.
But as with so much scientific experimentation, the fact that my initial theory was wrong didn't necessarily mean it was a wasted exercise. On the contrary, it helped me to understand the true nature of randomness, the futility of making predictions and the value of thinking probabilistically. Of being a fox rather than a hedgehog. In other words, it put in place the foundations of knowledge I needed to be a professional gambler.
The same complexity that makes a football match unpredictable (in the sense that you can't fully predict it) also means that a football match is completely un-understandable with numbers (in the sense that you can't fully understand it). Even a billion data points colossally under-estimates the actual complexity of any game, because of the inter-relatedness of the movements and actions of all the players, the ball and other influencing elements.
So just as my naïve idea of using chaos theory to predict games was doomed to failure, the idea that you can use data/stats/analytics to fully understand a game of football is similarly ridiculous. All you can hope to do as an analyst is to get familiar with more of the islands, and so strive to be a little bit smarter. This insight can help you make slightly more efficient decisions. These islands are the signals in the noise.
Analytics unquestionably works in sports betting. Intelligent use of data analysis delivers better insight into underlying intrinsic value than looking at bare scorelines alone, or trusting the judgement of your eye. Distribution models define price probabilities better than any human brain can hope to compute.
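As an illustration of what a distribution model does, a common textbook starting point treats each team's goals as independent Poisson variables. The sketch below assumes exactly that, with made-up expected-goals figures of 1.6 and 1.1; real syndicate models are considerably more sophisticated, so read this as a demonstration of turning a goal distribution into match-result probabilities and fair prices, not as a working betting model.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(exactly k goals) for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def match_probabilities(home_mean: float, away_mean: float, max_goals: int = 10):
    """Home/draw/away probabilities, assuming independent Poisson goal counts.

    Scorelines above max_goals are ignored; their probability is negligible
    for realistic means.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_mean) * poisson_pmf(a, away_mean)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


def fair_odds(probability: float) -> float:
    """Decimal odds with no bookmaker margin."""
    return 1.0 / probability


# Hypothetical expected goals: home team 1.6, away team 1.1.
home, draw, away = match_probabilities(1.6, 1.1)
for label, p in (("home", home), ("draw", draw), ("away", away)):
    print(f"{label}: p = {p:.3f}, fair odds = {fair_odds(p):.2f}")
```

Fair odds from a model like this are what you would compare against market prices in the value check described earlier.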
But it also works because the people who use analytics in organisations like sports betting syndicates have learned how to use it. From initial over-hyped expectations like my chaos theory folly, the use of analytics as a tool to help smarter decision making has evolved through trial and error. We learned that analytics wasn't a magic wand, or a foolproof predictor. It just helped us be a bit smarter. We only needed to be a bit more efficient, so that we won 55% of bets instead of 50%. This evolutionary process has been fairly rapid in our case, because betting provides brilliant laboratory conditions for such scientific experimentation.
Theories are proposed, and tested in the form of a model that generates bets. These can be real live bets, or run against historic price data. If the model makes money and/or does better than your previous model, then you have some scientific progress. If it does worse it goes in the bin, or gets sent back for refinement. At the very least it adds to the bank of knowledge, where knowing what doesn't work is as valuable as finding what does.
Betting is terrific for providing this feedback loop. If you win it works, if you lose it doesn't (just make sure you're judging over the long term, across a large sample). But such a feedback loop does not exist within a professional sports team. A club can implement a theory based on analytics (player recruitment, tactics, opposition analysis…) and it can then look at the results the team achieves. But there is no way to know if the results the team achieves were improved because of the implementation of the theory.
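Why the sample has to be large can be shown with a little binomial arithmetic. Assuming level stakes on independent even-money bets (decimal odds of 2.0) and a true win rate of 55%, the edge mentioned above, the function below computes the exact probability of still not being in profit after a given number of bets. The function name is illustrative, and log-gamma terms are used so that large sample sizes don't overflow.

```python
import math


def prob_not_in_profit(n_bets: int, win_rate: float) -> float:
    """P(total profit <= 0) after n independent, level-stake, even-money bets.

    At even money, profit = wins - losses, so we are not in profit when
    wins <= n_bets / 2. Sums the binomial probabilities in log space.
    Requires 0 < win_rate < 1.
    """
    log_p, log_q = math.log(win_rate), math.log(1.0 - win_rate)
    total = 0.0
    for wins in range(n_bets // 2 + 1):
        log_term = (math.lgamma(n_bets + 1) - math.lgamma(wins + 1)
                    - math.lgamma(n_bets - wins + 1)
                    + wins * log_p + (n_bets - wins) * log_q)
        total += math.exp(log_term)
    return total


for n in (10, 100, 1000, 10000):
    print(f"{n:>6} bets: P(not in profit) = {prob_not_in_profit(n, 0.55):.4f}")
```

The figure drops sharply as the number of bets grows, which is why a short run of results says very little about whether a model genuinely has an edge.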
A team might win 20 games in a row after using analytics, but it is possible that they would have won 30 in a row if they hadn’t used analytics. Three out of four new signings identified using analytics might work out great, but it’s always possible that four out of four signings made using traditional scouting methods would have been even better. There is literally no way to know for sure, no quick loop of feedback in order to make scientific progress.
It is for this reason that the use of analytics in professional sports clubs is a good few years behind the use of analytics in professional betting organisations like syndicates. It isn’t that the betting analysts are fundamentally any smarter. It’s just that the feedback loop in betting applies Darwinian forces that mean professional analysts and bettors have to get better – or they die (go bust).
Sports analytics is currently in its equivalent of my chaos theory reverie – still basking in the glow of the brilliance of the writing in Michael Lewis's Moneyball book and the hype that it (and Brad Pitt) generated. Largely oblivious to the fact that its subjects, Billy Beane and the Oakland A's, have only a single solitary playoff series win to show for Beane's almost two decades there, and have just had a woeful .420 MLB season. And that there is no tangible, credible, irrefutable evidence from anywhere in the sporting world that supports the idea that sports analytics can be used to make teams better.
Gartner’s Hype Cycle of Innovation neatly describes what happens with cool, interesting and clever innovations like analytics. The technology trigger in my case was reading about chaos theory in Jurassic Park, and for sports analytics it was the publication of Moneyball.
The internet is an excellent example of an innovation that fits this cycle, with its dot-com investment bubble, the subsequent deep trough, and then its recovery to its current place on the plateau. Sports analytics currently shares a similar spot on the graph with consumer 3D printing, autonomous vehicles, smart robots, big data, wearables and many more.
The hype that currently envelops sports analytics is a bubble that will burst, or rather deflate slowly as negative press about failed transfer committees, 'playing the game on laptops' and over-reliance on stats increases in volume and vehemence. There will always be aesthetic value in analytics, because plenty of people like reading data-based stories. So analytics journalism, blogs, podcasts and conferences (and the fancy stats they show now during football games on TV) will continue to thrive. Early and devoted adopters of any technology will remain believers. But there's a big difference between a thing having aesthetic value, and it actually being useful – which is what sports teams are interested in.
Analytics is not 'the art of winning an unfair game' as the subtitle of Moneyball claimed, or a 'game-changer' as others would have you believe. It is in no position to tell anyone that everything they know is wrong. It is not the key to a padlock that will unlock a revolutionary new and better way of playing a sport. That is fanciful pie in the sky – the sort of unrealistic expectations foretold by the hype cycle. Analytics is simply a tool that can be put at the disposal of those who make decisions. The insight gleaned can be used to make themselves and their organisations a bit more efficient. That's all.
In sports it is no substitute for the innate talent of the players on a team, which will always be the most important factor in success. The modern market in that talent (best represented by the salaries that the players get paid) for a wealthy sport like football is pretty efficient, so the correlation between wealth and success is unbreakable over the medium to long term. Analytics cannot be used to break that link – it's not that powerful, unfortunately.
Analytics is a scientific endeavour. It is the search for a better understanding of the subject. It is not the search for the truth. The paradox of analysis is that the best analysts are those, like Bill James, who recognise and acknowledge that they know virtually nothing. In an ocean of ignorance they look to find new islands of knowledge, and to better understand the ones they’ve already discovered. The worst analysts believe they can understand the whole ocean. Good analytics is mostly about drifting along on a tiny raft of knowledge, hoping to bump into an island. It is most definitely not ‘knowing lots of stats’. It’s about insight.
If a football team were to use analytics in a smart way, it could find a succession of small temporary edges – aka marginal gains. It could also hope to use analytics to establish a set of sound long-term strategic and operational best practices. The basis of this is the fundamental common sense that decision making should be evidence-based, as it is in professional betting, rather than based on superstition, tradition and cognitive biases, as in so much amateur betting.
“Analytics don’t work at all. It’s just some crap some people who are really smart made up to try to get in the game because they had no talent.”
[Charles Barkley. NBA basketball Hall of Famer and analyst.]
But the reality is that to date, the use of analytics has on balance probably done more harm than good to sports teams. They have done the equivalent of trying out my chaos theory football model, and done it with the same lack of comprehension of the complexity of the task they are undertaking, and the limits of the usefulness of the tool they are wielding. English Premier League teams have been involved in an analytics arms race for the last five years or so, hiring 'fleets of analysts'. This period has coincided with their performance and results in the Champions League falling off a cliff. It's just a theory, but the two things may well be related.
What all this means for sports analytics is that as the bubble deflates, the expectations of what it is possible to do with analytics will become more realistic in the trough of disillusionment phase (it will survive the trough – it won’t become obsolete, like some innovations do). This will be painful for some, but it’s a necessary process. I loved my chaos theory model. It was mighty and powerful and brilliant – it’s just such a shame it didn’t work at all. I was gutted, but I got over it.
What it also means for organisations like football teams is that the very last person in the whole world they should be hiring to do analysis for them is somebody who claims to be able to understand football with data. Because by definition that person is an idiot or a liar, or both. The market has a way of sorting these things out – the genuinely talented and open-minded foxes will survive and thrive, while the hedgehogs with their deeply held convictions will eventually go bust and depart the scene. “Only when the tide goes out do you discover who’s been swimming naked” says the great Warren Buffett. In sports betting the tide comes in and out every day. In sports it can take months or years. Einstein said “Imagination is more important than knowledge” and “I have no special talent, I am only passionately curious”. That’s spot on.
Professional gamblers are efficient analysts because they make no naïve pretence that they can fully understand or predict anything. All they're trying to do is recognise the way the waters swirl around enough of the islands that they can be a bit more efficient than their competitors. That might sound like a mundane aim, but being just a bit more efficient than his competition is what has made Warren Buffett a $70b fortune. So don't ever underestimate its long-term power.
In football there is a well-worn phrase – 'show us your medals'. In sports analysis you could equally say 'show us your profits'. It is an excellent test of an analyst's skill to see if they can make a model which shows a profit versus Asian Handicap betting lines. Getting them to do this would also be a great way to prick the bubble of these individuals' inflated expectations. Even if they make the greatest model in the world they will still lose about 40% of the time.
“Some of the smartest guys in football don’t work in football itself, they work in the betting industry”.
[Rasmus Ankersen, co-director of football at Brentford and Chairman of Danish champions Midtjylland. His fellow co-DOF is Phil Giles, formerly head of the quant team at Smartodds.]
Sports analytics will reach its plateau of productivity eventually. Sports betting analytics is probably somewhere along the slope of enlightenment at the moment. Tapping into the experiences of betting analysts who have benefitted from the hot-house effect of learning analytics in the laboratory conditions of sports betting is certainly one way sports teams could get ahead in their race.
Otherwise teams are mostly in for an unavoidable process of learning through trial and error. Personally I would also recommend that they take some time out to have a look on the bookshelves for an old copy of Jurassic Park.
“My view on the world is we have an ocean of ignorance and a small island of knowledge. You can convert areas of ignorance into areas of knowledge forever and it doesn’t have that much impact because you still don’t understand the world. None of us do. The world is a million more times complicated than any of us understand. We haven’t done anything yet to compare with potentially what we could do”. Bill James.
“A football match is twenty-two sweaty young men running around a field kicking a ball and each other. It is ever thus, no matter how many billions of data points you collect as they do it.” Johnny Phillips.