Students take on March Machine Learning Mania

March Madness: it’s a magical time of year when anything can happen. It’s a phenomenon unlike any other. Sixty-eight hopeful college basketball teams enter and, after one wild month, one is crowned champion of them all. With unthinkable upsets happening every year (think Lehigh or Mercer over Duke) and dark horse teams running the table (UConn in 2011 and 2014), modeling the NCAA tournament seems like a lost cause. Try telling that to the stubborn students at the Institute who think that we can model anything…

For the past few years, data scientists everywhere have come together on Kaggle to pit their models of the tournament against each other. We’ve decided to enter under the name TeamIAA in hopes of bringing home the glory to the Institute.

We started by collecting data from Kaggle (both the provided data and some extras from the forums), KenPom.com, sports-reference.com, and ESPN. We’ve tested all sorts of variables including rating systems (KenPom, SRS, seeds, and some self-made Colley Ratings), strength of schedule, and geographic distance from home.
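
For readers who haven’t met Colley Ratings before, here is a minimal sketch of the standard Colley method, which rates teams using only wins and losses. It is not our actual code: the function names and the toy three-team schedule are made up for illustration, and the small linear solver is only there to keep the example free of dependencies.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [A | b] so row operations touch both sides.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def colley_ratings(teams, games):
    """Colley ratings from a list of (winner, loser) pairs.

    Solves C r = b, where C[i][i] = 2 + games played by team i,
    C[i][j] = -(games between i and j), and b[i] = 1 + (wins - losses) / 2.
    """
    index = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    C = [[2.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [1.0] * n
    for winner, loser in games:
        w, l = index[winner], index[loser]
        C[w][w] += 1
        C[l][l] += 1
        C[w][l] -= 1
        C[l][w] -= 1
        b[w] += 0.5
        b[l] -= 0.5
    return dict(zip(teams, solve_linear_system(C, b)))


# Hypothetical round robin: A beats B and C, B beats C.
toy_games = [("A", "B"), ("A", "C"), ("B", "C")]
ratings = colley_ratings(["A", "B", "C"], toy_games)
```

On this toy schedule the ratings come out to roughly 0.7, 0.5, and 0.3 for A, B, and C. Colley ratings always average 0.5, so a team above that mark has done better than its schedule would suggest.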

Once we began modeling, we had a few ideas:

  • Logistic regression: calculating basic win/loss probabilities
  • Multi-stage logistic regression: with the thought that each round of the tournament has its own unique set of factors that influence who will win
  • Random forest: lots and lots of decision trees voting
  • Ensemble: averaging all of these together in hopes of getting the best of each
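
To make the ensemble idea concrete, here is a rough sketch of averaging win probabilities from two models. Everything in it is an assumption for illustration: the team names, the rating differences, the logistic slope, and the stand-in "forest" numbers are invented, and our real models used far more inputs.

```python
import math


def logistic_win_prob(rating_diff, slope=0.15):
    """P(team 1 beats team 2) from a rating difference.

    The slope is an illustrative value, not a fitted coefficient.
    """
    return 1.0 / (1.0 + math.exp(-slope * rating_diff))


def ensemble_average(*model_predictions):
    """Average P(team 1 wins) for each matchup across any number of models."""
    matchups = model_predictions[0].keys()
    return {
        m: sum(preds[m] for preds in model_predictions) / len(model_predictions)
        for m in matchups
    }


# Hypothetical rating differences (team 1 minus team 2) for two matchups.
rating_diffs = {("Team A", "Team B"): 12.0, ("Team C", "Team D"): -1.5}
logit_preds = {m: logistic_win_prob(d) for m, d in rating_diffs.items()}

# Stand-in numbers for a second model, e.g. the share of trees voting for team 1.
forest_preds = {("Team A", "Team B"): 0.90, ("Team C", "Team D"): 0.40}

ensemble_preds = ensemble_average(logit_preds, forest_preds)
```

A plain average like this always lands between the individual models, so one overconfident model gets pulled back toward the others.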

After testing the models on the last 4 years of the tournament in the first stage of the competition, we’ve decided on the ensemble as our final model.
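
Here is a small sketch of how candidate models might be compared on past games. It assumes log loss as the yardstick, which heavily penalizes confident wrong picks. The outcomes and probabilities are toy numbers, not our results.

```python
import math


def log_loss(outcomes, probs, eps=1e-15):
    """Mean negative log-likelihood of 0/1 outcomes; lower is better."""
    total = 0.0
    for y, p in zip(outcomes, probs):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(outcomes)


# Toy history: 1 means team 1 won the game, 0 means team 2 won.
outcomes = [1, 1, 0, 1, 0]

# Hypothetical predicted P(team 1 wins) from each candidate model.
candidates = {
    "logistic": [0.80, 0.60, 0.30, 0.55, 0.45],
    "forest": [0.90, 0.50, 0.20, 0.70, 0.60],
}
candidates["ensemble"] = [
    (a + b) / 2 for a, b in zip(candidates["logistic"], candidates["forest"])
]

scores = {name: log_loss(outcomes, probs) for name, probs in candidates.items()}
best_model = min(scores, key=scores.get)
```

Whichever name ends up in `best_model` is the candidate with the lowest loss on the held-out games, and the same comparison works for any number of models.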

If you’d like a peek at our predictions, they are visualized below (credit to ThePowerRank.com for the inspiration and source code):

Visualization of Kansas’ probabilities of advancing to each round, according to our model.

Below we have provided a standard bracket visual with our predicted win probabilities listed. There aren’t many surprise picks, but that’s because the favorites are the favorites for a reason, right?

Let the madness begin!

Columnists: Michael Dickey and Vahid Sanii
