Year after year, sports franchises aggressively seek the strongest, fastest, most athletic players who can help them win. In fact, the NFL has built an entire industry on accurate player scouting and analysis as athletes transition from college football to the pros. The measures taken by football clubs range from sending scouts across the nation to hiring economists to run their personnel departments (most recently done by the Cleveland Browns, who hired Paul DePodesta of Moneyball fame).
However, many of the practices in sports scouting, especially in the NFL, are archaic. While sports like baseball have great heuristics (like Sabermetrics) that have grown popular over time, the NFL has never developed a consistent analytical methodology for player evaluation. In an era where an algorithm can successfully fly a plane or pilot a car safely through a crowded street, NFL teams still draft players based on gut feelings or how fast they can run 40 yards in a straight line. It’s actually very common for teams to place wide receivers who can run a 40-yard dash in under 4.5 seconds higher on their draft boards with little consideration of other tests. To those of us in the profession of data science, this is not only a lost opportunity but a devastating waste of resources. Millions of dollars are gambled on contracts that ride on only a few data points. It’s time for an analytics revolution in the NFL, and it all starts with a February showcase often dubbed the “Underwear Olympics,” or more formally, the NFL Combine.
The NFL Combine is an event where NFL prospects perform athletic tests like the 40-yard dash and broad jump so that teams can better understand the athletic potential of a player. Teams use this data to try to assess how well a player’s athletic traits will translate into NFL production. However, this information is used in a piecemeal fashion, meaning teams sometimes bet on players based on a single aspect of their performance metrics. This would be akin to a mechanic determining the health of your entire vehicle solely by checking the oil: it fails to tell the whole story.
NFL Combine scores have been shown to carry real predictive value, as evidenced by the numerous research publications listed below. However, no major sports publications, journals, or teams have ever announced the adoption of any advanced machine learning for player evaluation. Because some of us at SparkCognition are avid football fans who are always looking for a leg up with our fantasy sports teams, we decided to see if it was possible to use NFL Combine data to create an all-encompassing prediction for success at the next level.
The results were fantastic: for the majority of our wide receiver test data, the models accurately predicted the likelihood of success (barring injury). Below is a scatterplot that displays predicted versus actual wide receiver success scores. Predictions were determined by feeding each player’s combine results into machine learning algorithms. The actual scores were calculated by weighting yards/game, yards/target, and total touchdowns over the player’s first three seasons (using data taken from pro-football-reference.com). It’s apparent that even a simple machine learning approach is capable of predicting success closely, if not perfectly. Notice the linear trend between the predictions and actual prospect performance.
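To make the scoring step concrete, here is a minimal sketch of how a composite success score of this kind might be computed, along with a simple check of how linear the predicted-versus-actual relationship is. The weights, the z-score normalization, the function names, and the sample numbers are all illustrative assumptions; the only detail taken from the description above is that yards/game, yards/target, and total touchdowns are weighted together.

```python
from statistics import mean, pstdev

# Hypothetical weights: the actual weighting is not stated in this post.
WEIGHTS = {"yards_per_game": 0.4, "yards_per_target": 0.3, "touchdowns": 0.3}


def z_scores(values):
    """Standardize values so stats on different scales are comparable."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def success_scores(players):
    """Return a weighted sum of standardized stats for each player."""
    scores = [0.0] * len(players)
    for stat, weight in WEIGHTS.items():
        for i, z in enumerate(z_scores([p[stat] for p in players])):
            scores[i] += weight * z
    return scores


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Made-up three-season totals for illustration only (not real players).
receivers = [
    {"yards_per_game": 72.0, "yards_per_target": 9.1, "touchdowns": 21},
    {"yards_per_game": 45.5, "yards_per_target": 7.8, "touchdowns": 9},
    {"yards_per_game": 18.2, "yards_per_target": 6.0, "touchdowns": 2},
]
actual = success_scores(receivers)
```

Given a list of model predictions for the same players, `pearson_r(predictions, actual)` returns a value near 1.0 when the scatterplot follows a tight upward line. Standardizing each stat first keeps touchdowns (counted in the tens) from being swamped by or swamping yards per target (counted in single digits).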
What’s amazing about the potential of this analysis is that it is based solely on eight metrics (height, weight, 40-yard dash, vertical jump, broad jump, bench press, short shuttle, and 3-cone drill) recorded for players during the NFL Combine. It doesn’t factor in other body measurements taken during the event, player interviews, or historical performance data. If it were possible to add the wealth of tangible data available to NFL teams to this analysis, a highly powerful predictive capability could be uncovered and used to supplement scouting departments, making them capable of analyzing every movement of draftable prospects across the nation.
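As a rough illustration of how eight combine metrics could feed a prediction, the sketch below uses a k-nearest-neighbors regressor: it finds the past prospects whose combine numbers look most like a new prospect's and averages their success scores. This is a stand-in technique chosen for brevity, not necessarily one of the algorithms behind the results described here, and the feature names, the value of `k`, and every number in the sample data are made up for demonstration.

```python
from math import sqrt
from statistics import mean, pstdev

# The eight combine metrics, in a fixed order (names are hypothetical).
FEATURES = ["height_in", "weight_lb", "forty_s", "vertical_in",
            "broad_in", "bench_reps", "shuttle_s", "three_cone_s"]


def fit_scaler(rows):
    """Compute each feature's mean and standard deviation."""
    columns = list(zip(*rows))
    # Fall back to 1.0 so a constant column does not divide by zero.
    return [(mean(col), pstdev(col) or 1.0) for col in columns]


def scale(row, scaler):
    """Standardize one row so seconds, inches, and pounds are comparable."""
    return [(v - mu) / sd for v, (mu, sd) in zip(row, scaler)]


def knn_predict(train_rows, train_scores, query, k=2):
    """Average the success scores of the k most similar past prospects."""
    scaler = fit_scaler(train_rows)
    q = scale(query, scaler)
    dists = []
    for row, score in zip(train_rows, train_scores):
        r = scale(row, scaler)
        distance = sqrt(sum((a - b) ** 2 for a, b in zip(r, q)))
        dists.append((distance, score))
    dists.sort(key=lambda pair: pair[0])
    return mean(score for _, score in dists[:k])


# Made-up combine results and success scores (not real players).
train_rows = [
    [73, 210, 4.42, 38.5, 126, 17, 4.10, 6.80],
    [72, 205, 4.45, 37.0, 124, 15, 4.15, 6.85],
    [70, 190, 4.60, 33.0, 115, 10, 4.35, 7.10],
    [69, 185, 4.65, 32.0, 112, 9, 4.40, 7.20],
]
train_scores = [0.9, 0.7, -0.6, -0.8]

prospect = [73, 208, 4.43, 38.0, 125, 16, 4.12, 6.82]
predicted = knn_predict(train_rows, train_scores, prospect, k=2)
```

The point of the sketch is that all eight measurements contribute to the distance at once, so no single test such as the 40-yard dash decides the outcome on its own. A real model would need far more than four training rows and a held-out test set to be evaluated honestly.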
In conclusion, NFL personnel departments could run much more efficiently by using machine learning to predict the future success of a prospect. Because machine learning can consider multiple facets of data and how they correlate, artificial intelligence can provide a more complete perspective on the future capabilities of any prospect. With more data for the prediction models to train on, there’s significant potential for machine learning to supplement or replace components of the existing scouting process. If any team needs help on where to start, call us at SparkCognition. We’re fans, too.