NP: John Cage, 4'33"
If you know me at all, you know that I'm a nerd of the highest order. And I've got the University of Chicago undergrad degree to prove it.
You also know that I've been relying on math to make my NCAA tournament picks for the last two years, winning my office pool in the first and coming a free throw or two away from doing the same last year.
Basically, I used to cherry-pick some stats, put them in a spreadsheet, and make up some sort of algorithm that would make picks based on which variables I felt were important. I'd apply that to the field, see if it made sense -- in terms of not picking a ton of upsets, for example -- and then tweak it a bit.
Last year, though, I decided to plan ahead: I loaded up all the stats I could off of ESPN.com for each game and then coded in the outcome of that game. So, you'd have team one's stats, then team two's stats, then a 1 or 0 to indicate whether team one won the game or not. All stats were pre-tournament, because I'd never be able to update them midstream.
The idea was to use multinomial logistic regression to figure out which stats were important. What that does is create a likelihood curve for a binary output, so you essentially get a bunch of factors that predict how likely team one is to win the game. If it's over 50%, team one wins. Under 50%, and team two wins.
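For anyone who wants to see the shape of that idea, here is a minimal sketch of the prediction half of a logistic model: a weighted sum of stats pushed through the logistic curve, then cut at 50%. The weights, the two-stat feature list, and the function names are all made up for illustration; a real model would estimate the weights from the game data.

```python
import math


def win_probability(features, weights, intercept=0.0):
    """Estimated chance that team one wins, from a weighted sum of stats.

    `features` and `weights` are hypothetical; in practice the weights
    would come from fitting the regression to past games.
    """
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def pick(features, weights, intercept=0.0):
    """Over 50% means team one; anything else goes to team two."""
    if win_probability(features, weights, intercept) > 0.5:
        return "team one"
    return "team two"


# Illustrative inputs only: two stat differences and two invented weights.
print(win_probability([5.0, -2.0], [0.3, 0.1]))
print(pick([5.0, -2.0], [0.3, 0.1]))
```

Note that an exact 50% lands on team two in this sketch, which is an arbitrary tie-break rather than anything the stats say.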
Except that I don't have any tools for multinomial logistic regression. Back to the drawing board. This is when I think I found the clever bit that (hopefully) makes this work. If I was going to use linear regression, my dependent variable needed to be continuous. So I ended up trying to predict the margin of victory (or defeat) for team one -- team one's score minus team two's score. It gets clever because while I'm modeling the point difference, the output I really care about is simply whether that point difference is positive (team one wins) or negative (team two wins). Except when the model spits out a number that's close to zero, this lets me absorb some error and still get what I want.
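A stripped-down sketch of that trick, assuming a single predictor instead of the full set of stats: fit an ordinary least-squares line to the margin, then only look at the sign of the prediction. The numbers below are invented, and `fit_line` and `predict_winner` are illustrative names, not the spreadsheet's actual workings.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_winner(x, slope, intercept):
    """Predict team one's margin, then reduce it to a pick by its sign."""
    margin = intercept + slope * x
    winner = "team one" if margin > 0 else "team two"
    return winner, margin


# Made-up data: x is team one's scoring average minus team two's,
# y is team one's score minus team two's score in the actual game.
stat_diffs = [-8.0, -3.0, 1.0, 4.0, 10.0]
margins = [-12.0, -1.0, 3.0, 2.0, 15.0]

slope, intercept = fit_line(stat_diffs, margins)
print(predict_winner(6.0, slope, intercept))
print(predict_winner(-6.0, slope, intercept))
```

The predicted margin can be off by several points and the pick is still right, as long as it lands on the correct side of zero.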
Now we're cooking with gas. Early on, I noticed that in doing this, whether you were team one or team two seemed to affect the outcome, so I doubled the rows of data and swapped the positions. This seemed to do the trick, as it weighted the same variables for each team the same way. Ultimately, after reducing the model scores to who wins, the model picked correctly in 80% of last year's games, and got everything right from the Sweet Sixteen on forward.
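The row-doubling step can be sketched like this, assuming each game is stored as team one's stats, team two's stats, and team one's margin. The tuple layout and the `mirror_games` name are assumptions for illustration; the key detail is that swapping the teams also flips the sign of the margin.

```python
def mirror_games(games):
    """Return every game twice: once as recorded, once with the teams swapped.

    Each game is (team_one_stats, team_two_stats, margin), where margin is
    team one's score minus team two's score. Swapping positions negates it.
    """
    doubled = []
    for team_one_stats, team_two_stats, margin in games:
        doubled.append((team_one_stats, team_two_stats, margin))
        doubled.append((team_two_stats, team_one_stats, -margin))
    return doubled


# Invented stats: (points for, points against) per team, plus the margin.
games = [
    ((78.2, 65.1), (70.4, 68.9), 9),
    ((66.0, 61.3), (74.8, 70.2), -4),
]

for row in mirror_games(games):
    print(row)
```

With every game present in both orientations, a model fit to the doubled rows has no reason to favor whichever team happened to be listed first.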
If I had more data, I would have kept a "holdout sample" to test the model after I built it on the rest of the data, but I don't. Yet. I'm going to try to keep doing this, which will hopefully make the model better over time.
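For reference, a holdout check might look something like the sketch below: set aside a random slice of games before fitting, then score the picks on only that slice. None of this was actually done for this bracket; the 25% split, the seed, and the function names are arbitrary choices for illustration.

```python
import random


def split_holdout(games, holdout_fraction=0.25, seed=0):
    """Shuffle a copy of the games and return (training, holdout) lists."""
    shuffled = list(games)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def pick_accuracy(predicted_margins, actual_margins):
    """Share of games where the predicted margin has the right sign."""
    correct = sum(
        1
        for predicted, actual in zip(predicted_margins, actual_margins)
        if (predicted > 0) == (actual > 0)
    )
    return correct / len(actual_margins)


# Stand-in game IDs and invented margins, just to show the mechanics.
training, holdout = split_holdout(list(range(20)))
print(len(training), len(holdout))
print(pick_accuracy([3.0, -2.0, 1.0, -4.0, 5.0], [10, -1, -6, -8, 2]))
```

The accuracy that matters is the one measured on the holdout games, since the model never saw them while it was being built.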
I'm not sure it's going to work. I feel like I haven't quite gotten the predictor variables right, because very few of them seem to matter. Ultimately, it has come down to average points for and against for each team, with a little bit of the difference in seeding thrown in for good measure. There are a couple of counter-intuitive results, but I'm going with them on the notion that, if I'm right, I'm going to be really right. For comparison, I'm running a bracket with last year's math as well, to see which one wins.
You can bet that I'll be giving updates on how I'm doing as the tournament gets underway, and I've already gotten in arguments with co-workers over the importance of "intangibles."
As always, I'm quite impressed by your use of logic and math to figure this out. But I have to say this -- even if you have figured this out scientifically, emotion will win the day -- and UConn will win.
Well yeah, but the point is that I don't know anything about college basketball, so in the absence of enough emotion to make picks, I try substituting math. Only 11 for 16 so far, although I only expect to get 75% or 80% right in the first round.
Trust me Chris, I'm tall but I don't know much either -- but I come from the Big East school that makes the Big East human (they lose a lot) and I know who they have always aspired to ... UConn. We should just crown them now. :-)
All content on this website (including text, photographs, audio files, and any other original works), unless otherwise noted, is licensed under a Creative Commons License.