Advanced Sports Data Analysis

Using machine learning to make predictions in sports

Author

Matt Waite

Published

January 4, 2023

1 Introduction

The 2020 college football season, for most fans, will be one to forget. The season started unevenly for most teams, schedules were shortened, non-conference games were rare, and few fans saw their team play in person, all because of the COVID-19 pandemic.

For the Nebraska Cornhuskers, it was doubly forgettable. Year three of Scott Frost turned out to be another dud, with the team going 3-5. A common refrain from the coaching staff throughout the season, often after disappointing losses, was this: The team is close to turning a corner.

How close?

This is where modeling comes into sports. Using modeling, we can determine what we should expect given certain inputs. To look at Nebraska’s season, let’s build a model of the season using three inputs based on narratives around the season: The offense struggled to score, the offense really struggled with turnovers, and the defense improved.

The specifics of how to do this will be the subject of this whole book, so we’re going to focus on a simple explanation here.

First, we’re going to create a measure of offensive efficiency – points per yard of offense. So if you roll up 500 yards of offense but only score 21 points, you’ll score .042 points per yard. A team that gains 250 yards and scores 21 points is more efficient: they score .084 points per yard. So in this model, efficient teams are good.
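To make the arithmetic concrete, here is a minimal sketch in base R using the two example teams above. The function name `points_per_yard` is made up for illustration and is not something the rest of the book depends on.

```r
# Hypothetical helper: offensive efficiency as points scored per yard gained
points_per_yard <- function(points, yards) {
  points / yards
}

# The two example teams from the text: same score, different yardage
big_yardage_team <- points_per_yard(points = 21, yards = 500)
efficient_team   <- points_per_yard(points = 21, yards = 250)

print(big_yardage_team)
print(efficient_team)

# The team that needed fewer yards to score the same points rates higher
efficient_team > big_yardage_team
```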

Second, we’ll do the same for the defense, using yards allowed and the opponent’s score. Here, it’s inverted: Defenses that keep points off the board are good.

Third, we’ll use turnover margin. Teams that give the ball away are bad, teams that take the ball away are good, and you want to take it away more than you give it away.
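Put together, the three inputs might be computed like this. This is only a sketch in base R: the box scores are invented purely for illustration, and the column names (`off_ppy`, `def_ppy`, `turnover_margin`) are assumptions, not the names used later in the book.

```r
# Made-up box scores for three games (NOT real data)
games <- data.frame(
  opponent   = c("Team A", "Team B", "Team C"),
  points     = c(21, 28, 17),
  yards      = c(500, 400, 340),
  opp_points = c(24, 21, 31),
  opp_yards  = c(400, 350, 450),
  giveaways  = c(2, 3, 0),
  takeaways  = c(1, 1, 2)
)

# Input 1: offensive efficiency, higher is better
games$off_ppy <- games$points / games$yards

# Input 2: defensive efficiency, lower is better (points allowed per yard allowed)
games$def_ppy <- games$opp_points / games$opp_yards

# Input 3: turnover margin, positive means more takeaways than giveaways
games$turnover_margin <- games$takeaways - games$giveaways

print(games[, c("opponent", "off_ppy", "def_ppy", "turnover_margin")])
```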

Using logistic regression and these statistics, our model predicts that Nebraska was actually worse than its record: the Huskers should have been 2-6. Giving the ball away three times and scoring only 28 points against Rutgers should have doomed the team to a bad loss at the end of the season. But it didn’t.
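For a rough idea of what that looks like in code, here is a sketch using base R’s `glm()`. Everything about the data is an assumption: the games are simulated, the "true" relationship used to generate wins is invented, and the first eight rows simply stand in for one team’s season. The model used in this book is built on real game data and may be specified differently.

```r
set.seed(2020)  # make the simulated data reproducible

# Simulated games (NOT real data)
n <- 400
games <- data.frame(
  off_ppy         = runif(n, 0.03, 0.10),
  def_ppy         = runif(n, 0.03, 0.10),
  turnover_margin = sample(-3:3, n, replace = TRUE)
)

# Invented relationship, used only to generate fake win/loss outcomes
log_odds  <- 40 * (games$off_ppy - games$def_ppy) + 0.5 * games$turnover_margin
games$win <- rbinom(n, size = 1, prob = plogis(log_odds))

# Logistic regression: win (1) or loss (0) as a function of the three inputs
model <- glm(win ~ off_ppy + def_ppy + turnover_margin,
             data = games, family = binomial)

# Predicted win probability for each game, then a predicted result
games$win_prob      <- predict(model, type = "response")
games$predicted_win <- games$win_prob > 0.5

# Treat the first eight games as one hypothetical team's season
season        <- games[1:8, ]
actual_wins   <- sum(season$win)
expected_wins <- sum(season$predicted_win)

cat("Actual record:  ", actual_wins, "-", 8 - actual_wins, "\n")
cat("Expected record:", expected_wins, "-", 8 - expected_wins, "\n")
```

Comparing the actual record to the expected one is the same comparison made above: a team whose expected record is worse than its actual record won games the inputs say it should have lost.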

So how much of a corner would the team need to turn?

With modeling, we can figure this out.

What would Nebraska’s record be if they had a +1 turnover margin and improved offensive production by 10 percent?

As played, our model gave Nebraska a 32 percent chance of beating Minnesota. If Nebraska were to have a +1 turnover margin, instead of the -2 that really happened, that jumps to a 40 percent chance. If Nebraska were to improve their offense just 10 percent – score a touchdown every 100 yards of offense – Nebraska wins the game. If Nebraska wins that game, they’re 4-4 on the season (and they still don’t beat Iowa).
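A what-if like this amounts to changing the inputs for one game and asking the model for a new probability. The sketch below shows the mechanics in base R. The training data is simulated, and the game’s efficiency numbers are hypothetical (only the -2 turnover margin comes from the text), so the probabilities it prints will not match the 32 and 40 percent figures above. It also reads "improve the offense 10 percent" as multiplying points per yard by 1.10, which is an assumption.

```r
set.seed(2020)  # make the simulated data reproducible

# Simulated training games (NOT real data), generated from an invented relationship
n <- 400
games <- data.frame(
  off_ppy         = runif(n, 0.03, 0.10),
  def_ppy         = runif(n, 0.03, 0.10),
  turnover_margin = sample(-3:3, n, replace = TRUE)
)
log_odds  <- 40 * (games$off_ppy - games$def_ppy) + 0.5 * games$turnover_margin
games$win <- rbinom(n, size = 1, prob = plogis(log_odds))

model <- glm(win ~ off_ppy + def_ppy + turnover_margin,
             data = games, family = binomial)

# One game as played: hypothetical efficiencies, -2 turnover margin from the text
as_played <- data.frame(off_ppy = 0.05, def_ppy = 0.06, turnover_margin = -2)

# Scenario 1: same game, but with a +1 turnover margin
better_turnovers <- transform(as_played, turnover_margin = 1)

# Scenario 2: +1 turnover margin and a 10 percent more efficient offense
better_both <- transform(better_turnovers, off_ppy = off_ppy * 1.10)

# Ask the model for a win probability under each scenario
scenarios <- rbind(as_played, better_turnovers, better_both)
scenarios$win_prob <- predict(model, newdata = scenarios, type = "response")
rownames(scenarios) <- c("as played", "+1 turnover margin", "+1 and 10% offense")

print(round(scenarios, 3))
```

The model itself never changes here. Only the inputs do, which is what makes this kind of scenario testing cheap once a model exists.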

So how close are they to turning the corner? That close.

1.1 Requirements and Conventions

This book is all in the R statistical language. To follow along, you’ll need to do the following:

  1. Install the R language on your computer. Go to the R Project website, click “download R” and select the mirror closest to your location. Then download the version for your computer.

  2. Install RStudio Desktop. The free version is great.

Going forward, you’ll see passages like this:

install.packages("tidyverse")

Don’t do it now, but that is code that you’ll need to run in RStudio. When you see a passage like that, you’ll know what to do.

1.2 About this book

This book is the collection of class materials for the author’s Advanced Sports Data Analysis class at the University of Nebraska-Lincoln’s College of Journalism and Mass Communications. There are some things you should know about it:

  • It is free for students.
  • The topics will remain the same, but the text will be constantly tinkered with.
  • Original work by the author is copyright Matt Waite 2023.
  • The text is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license. That means you can share it and change it, but only if you share your changes under the same license, and it cannot be used for commercial purposes. I’m not making money on this, so you can’t either.