Exploring College Football Data using Shiny
Introduction (aka “O autumn thunder, why do you hurt me so?”)
One of the most unhealthy relationships I have is my seasonal affair with college football. I was never much of a sports fan as a kid, but when I went to college at Rutgers, I got a real taste of what it’s like to be part of a tortured but dogged fanbase. I’ll never forget beating Michigan on a blocked field goal in 2014 (Yeah! We beat Michigan once!) or getting absolutely poured on during a blowout Homecoming loss against Wisconsin in an empty student section. The annual fall torment of Rutgers football was then compounded when I moved to OSU and got to watch the Beavers lose game after game (although I did get to rush the field when we beat the Sun Devils in 2018). While nothing fills me with self-inflicted misery like turning on ESPN on October weekends, the quarantine has made me really pine for that autumn thunder.
Unlike professional sports, college football is the one major American sport that relies predominantly on recruiting young high school athletes. No matter how much of a star a college athlete is, they have a contract with a hard end date. At most, you can stay in college for six or so years and play for your team (lookin’ at ya, JT Barrett). Therefore, recruiters are constantly trying to find promising high school talent to fill the ranks of departing upperclassmen.
Recruiting is a really complex art and the success of a recruiting department depends on many factors. The interpersonal skills of the coaching staff are but one small piece of the puzzle! Booster money, university administration support, the local high school football scene, the football culture of the university, and recent success all have some influence on how recruits perceive each school.
I thought it might be interesting to see how the number of wins (often used as a proxy for the general quality of the on-field product) relates to the recruiting efforts for different teams. An exploratory tool would be ideal for this, since different teams would likely be impacted differently. After all, blue-blood schools like Ohio State and Alabama have a powerful brand that gives them an edge over many other universities, even in their down years. My tool of choice for something like this would be Shiny, an R tool for creating interactive web apps and dashboards.
Building the App
We’ll need to install a couple of packages to create this app. I used the following packages:
- tidyverse to clean the data, which I pulled from a great API (check it out here).
- jsonlite (not shown here), which is what I used to actually get the data from both the
- plotly to add additional interactivity to the app.
A brief primer on Shiny (it’s pretty straightforward!):
- A Shiny app is contained in a single R script, usually called app.R. This script has to contain two functions:
  - a server function which performs computation, data manipulation, etc.
  - a ui function which defines how the app looks, all the available widgets for the user to interact with, and when certain panels are visible.
- At the end of the app.R file, these functions are called inside of a shinyApp function, usually looking something like shinyApp(ui = ui, server = server).
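To make that structure concrete, here is a minimal sketch of an app.R. The widget names (n, doubled) are made up for illustration and have nothing to do with the football app itself.

```r
library(shiny)

# ui: describes the layout and the widgets the user can interact with
ui <- fluidPage(
  sliderInput("n", "Pick a number", min = 1, max = 10, value = 5),
  textOutput("doubled")
)

# server: reads inputs, does the computation, and fills in the outputs
server <- function(input, output, session) {
  output$doubled <- renderText({
    paste("Twice that is", input$n * 2)
  })
}

# In a real app.R the bare shinyApp(...) call is usually the last line.
# It is assigned here so the script does not launch the app when sourced;
# runApp(app) would start it.
app <- shinyApp(ui = ui, server = server)
```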
I basically want to see how the number of wins influences different teams’ ability to recruit. The strength of recruitment is measured by the website 247sports using a statistic called Talent Composite.
The API I utilized had the talent composite for teams in the previous 5 years, so I could plot how a team did in recruiting over time. I also wanted users to be able to compare different teams side by side, so there should be functionality to select multiple teams from different conferences.
For the purposes of this app, I just wanted to focus on Football Bowl Subdivision (FBS) teams, which represent the highest division of DI college football.
The User Interface
I wanted my UI to have several conditional panels which had checkboxes where the user could display their teams of interest.
Since FBS has roughly 130 teams, it probably wasn’t a good idea to display a checkbox for every team at once. It makes sense to instead have a dropdown menu of conferences (there are 10 conferences in FBS, plus independents) to filter the selections and then just display teams in those conferences. Most users will probably be interested in comparing teams from the same conference, so this UI design choice makes sense from their perspective.
In the end, my UI looked something like this.
Here is the first chunk of code in my
There are a few things here:
- The interactive widgets include a sliderInput to select different years and a selectInput that creates a dropdown menu to pick different conferences.
- The conference selection changes the conditional panel below it, which shows different teams: there is a panel for Big Ten teams, a panel for SEC teams, etc. I left the 9 other groups off since they would take up too much space.
- Each conditional conference selection panel has checkboxes for teams as well as a single checkbox to select and deselect all teams in that conference.
- An additional button to clear all selections is also built in underneath all the selection panels.
We also define the
mainPanel inside of the UI function which should display the output plot using the function
This is where the plotly component is important, since plotly allows us to create helpful tooltips over parts of our plot and zoom in/out when the user needs.
The call to useShinyjs() is important here since otherwise we cannot use the helpful reset function, which we use to clear user selections.
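Pulling the UI pieces described above together, a rough sketch might look like the following. This is not the app’s actual code: the input IDs, the team lists, the year range, and the use of plotlyOutput for the main panel are all assumptions made for illustration, and only two of the eleven conference panels are shown.

```r
library(shiny)
library(shinyjs)
library(plotly)

# Hypothetical, shortened team lists
big_ten <- c("Ohio State", "Rutgers", "Wisconsin")
sec     <- c("Alabama", "Auburn", "LSU")

ui <- fluidPage(
  useShinyjs(),  # required before any shinyjs helper (e.g. reset) will work
  sidebarLayout(
    sidebarPanel(
      # Year range is an assumed placeholder
      sliderInput("years", "Seasons", min = 2015, max = 2019,
                  value = c(2015, 2019), step = 1, sep = ""),
      selectInput("conference", "Conference", choices = c("Big Ten", "SEC")),
      # Each panel is only visible when its conference is selected
      conditionalPanel(
        condition = "input.conference == 'Big Ten'",
        checkboxInput("all_big_ten", "Select/deselect all", value = FALSE),
        checkboxGroupInput("teams_big_ten", "Teams", choices = big_ten)
      ),
      conditionalPanel(
        condition = "input.conference == 'SEC'",
        checkboxInput("all_sec", "Select/deselect all", value = FALSE),
        checkboxGroupInput("teams_sec", "Teams", choices = sec)
      ),
      actionButton("clear", "Clear all selections")
    ),
    # Assumed output function for a plotly plot
    mainPanel(plotlyOutput("talent_plot"))
  )
)
```

Note that the condition string is JavaScript, not R, which is why it reads input.conference rather than input$conference.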
The server function
The server function is basically where we make all the interactive magic happen.
The server function takes user selections as input$[INPUT] arguments, so that syntax will appear often in the server function.
We build the app behavior and plots inside of this function.
The first part of the server needs to handle the event where users want to select all teams belonging to a particular conference. When a user ticks this checkbox, the team checkbox inputs need to be updated so that all the teams are automatically checked.
This can be done using the
observe function to wrap the
There needs to be an
observe call for every conditional panel, so since we have 11 panels, we need 11 calls to
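As a sketch of the select-all behavior for a single conference, assuming the update is done with updateCheckboxGroupInput (the input IDs and team list are hypothetical):

```r
library(shiny)

# Hypothetical, shortened team list
big_ten <- c("Ohio State", "Rutgers", "Wisconsin")

server <- function(input, output, session) {
  # Re-runs whenever the "select all" checkbox changes
  observe({
    updateCheckboxGroupInput(
      session, "teams_big_ten",
      # character(0) clears the selection when the box is unticked
      selected = if (isTRUE(input$all_big_ten)) big_ten else character(0)
    )
  })
}
```

Every other conference panel would need its own copy of this block with the IDs and team list swapped out.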
The next part of the server function captures all the selected teams amongst all the conferences (since teams from different conferences can be checked at the same time).
This was done using a call to the
The difference between reactive and observe is that reactive can return objects (like dataframes) that can be used elsewhere in the code.
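A minimal sketch of that idea, assuming each conference’s checkbox group has an input ID like teams_big_ten (hypothetical names). The helper collect_teams is mine, added so the merging logic can be tried outside of a running app.

```r
library(shiny)

# Hypothetical IDs of the per-conference checkbox groups
team_inputs <- c("teams_big_ten", "teams_sec")

# Gather the checked teams from every group into one character vector;
# groups with nothing checked are NULL and simply drop out
collect_teams <- function(input) {
  unlist(lapply(team_inputs, function(id) input[[id]]), use.names = FALSE)
}

server <- function(input, output, session) {
  # reactive() returns a value that other server code can call like a function
  selected_teams <- reactive(collect_teams(input))

  observe({
    message("Currently selected: ", paste(selected_teams(), collapse = ", "))
  })
}

# The helper also works on a plain list standing in for input
collect_teams(list(teams_big_ten = "Rutgers", teams_sec = c("LSU", "Auburn")))
```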
There is another function called observeEvent, which the documentation says monitors for specific events/changes in a single variable.
It probably could have been used above to monitor the checkbox input, which might improve performance.
The Clear All Selections button was implemented using the shinyjs::reset function on every reactive element.
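One way this could be wired up is with observeEvent on the button, sketched below. The button ID and the list of input IDs are hypothetical, and I am assuming the button triggers the reset through observeEvent; the UI also has to call useShinyjs() for reset to do anything.

```r
library(shiny)
library(shinyjs)

# Hypothetical IDs of every input that should return to its initial state
resettable <- c("years", "conference", "all_big_ten", "teams_big_ten",
                "all_sec", "teams_sec")

server <- function(input, output, session) {
  # Fires only when the button is clicked, not when other inputs change
  observeEvent(input$clear, {
    for (id in resettable) {
      shinyjs::reset(id)
    }
  })
}
```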
Finally, the actual plot generation is done using good old
We need to have a default empty plot to display if there are no teams selected, otherwise the app’s main panel will be completely empty.
The first if-statement captures that scenario.
If there are teams selected, then we create a plot using geom_path (which creates connected line segments for each team to show change over time).
Labels for each team’s line are inserted into the plot using geom_text, but only displayed for the point noting the most recent season selected.
This was a design choice to keep the plot from being too busy.
All calls from the
plotly::ggplotly are needed to display the interactive hover tooltip as well as the zoom-in/zoom-out feature of the plot.
The tooltip can be set to show whatever the text aesthetic mapping is (by default, ggplotly shows every mapped aesthetic), so we make use of the str_c function to concatenate relevant information in a single string: coach(es), season, talent score, etc.
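The plotting logic above can be sketched roughly as follows. The data frame and its column names are invented for illustration (these are not real talent scores), the geom_point layer is my addition to give the tooltip something to attach to, and plotly_empty stands in for the default empty plot.

```r
library(ggplot2)
library(plotly)
library(stringr)

# Made-up values purely for illustration
talent_df <- data.frame(
  school = rep(c("School A", "School B"), each = 3),
  season = rep(2017:2019, times = 2),
  talent = c(600, 610, 620, 700, 710, 720),
  coach  = rep(c("Coach A", "Coach B"), each = 3),
  stringsAsFactors = FALSE
)

make_plot <- function(df) {
  # No teams selected: return a blank plot so the main panel is not empty
  if (nrow(df) == 0) {
    return(plotly_empty(type = "scatter", mode = "markers"))
  }

  # Only label each line at the most recent season shown
  latest <- df[df$season == max(df$season), ]

  p <- ggplot(df, aes(x = season, y = talent, color = school, group = school)) +
    geom_path() +
    # ggplot2 warns that "text" is an unknown aesthetic; ggplotly still uses it
    geom_point(aes(text = str_c(school, "\nCoach: ", coach,
                                "\nSeason: ", season,
                                "\nTalent: ", talent))) +
    geom_text(data = latest, aes(label = school),
              nudge_y = 10, show.legend = FALSE)

  # Restrict the hover tooltip to the text aesthetic built above
  ggplotly(p, tooltip = "text")
}

fig <- make_plot(talent_df)
```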
The only somewhat clever part of this plot in my opinion was getting the colors to match up to their respective teams.
By default, all scale color functions like scale_color_manual insert the colors in whatever order they are provided.
However, the geometries in ggplot are plotted in alphabetical order (e.g. Alabama comes before Auburn comes before LSU comes before Texas A&M, etc.).
We can enforce alphabetical order of the colors by sorting the colors by school first, removing duplicate rows, then feeding that into
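Here is a small sketch of that sorting step using dplyr. The column names and hex codes are placeholders rather than the app’s real data or official team colors.

```r
library(dplyr)
library(ggplot2)

# One row per school-season, so each school's color repeats.
# Hex codes are placeholders, not official team colors.
plot_data <- data.frame(
  school = c("LSU", "Alabama", "LSU", "Auburn", "Alabama"),
  season = c(2018, 2018, 2019, 2019, 2019),
  color  = c("#333333", "#111111", "#333333", "#222222", "#111111"),
  stringsAsFactors = FALSE
)

# Sort by school, drop the duplicate rows, and keep just the colors
team_palette <- plot_data %>%
  arrange(school) %>%
  distinct(school, color) %>%
  pull(color)

color_scale <- scale_color_manual(values = team_palette)

# Alternative: a named vector is matched to schools by name, not position
lookup <- distinct(plot_data, school, color)
named_palette <- setNames(lookup$color, lookup$school)
named_scale <- scale_color_manual(values = named_palette)
```

The named-vector version does not rely on the sort order at all, which may be a safer choice if school names contain characters that sort differently across locales.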
Deploying the App
There are a couple of ways to deploy a Shiny application, some of them being harder than others. The easiest way to host a Shiny app is to use RStudio’s shinyapps.io PaaS, which handles the hosting for you. The documentation is very easy to follow once you get the app working locally, so it’s not really worth talking about.
This application is hosted here on shinyapps.
One thing that I want to figure out is hosting a Shiny app on a VPS using Shiny Server. The main benefit of this as far as I’m concerned is getting a custom domain name instead of a shinyapps one. After a weekend of wrestling with it on an Ubuntu 19.04 VPS, though, I’m actually stuck on even installing dependencies for base R on the most recent version of Ubuntu (other users have encountered this problem, as seen here).
I’ll have to try again on either Ubuntu 18.04 or 16.04. That’ll be good content for a future post!