In Defense of Statistics: Go Florida State!

The Only Time You're Allowed to Root for Bobby Bowden

Right now, Penn State and Alabama are close enough in the human polls that the statistical polls - which have Alabama a universal #2 and Penn State a universal #3 - are the deciding factor between the two. Switch the statistical ranking portion for the two teams, and Penn State comes out ahead.

People around the country are saying "there's no way that would happen." Don't believe them - they're wrong. The Big Ten is not nearly as weak as people tend to claim, but probably more importantly, the SEC is not nearly as strong as people tend to think. The Big 12 is pretty clearly the premier conference this year, and so I won't even suggest that Penn State could end up ranked over Texas due to weird flukes. Won't happen.

It is possible for Penn State to end up ranked over Alabama, but it will require a few things to happen - the combination of which is unlikely, but possible.

Remember one very important thing: statistical polls have no inertia (Billingsley omitted for insanity reasons). Alabama does not have to lose to move down - their opponents just have to lose. I mentioned in the first installment of IDS that good teams are pinned most by their best wins. So how can we imagine Alabama's best wins eroding? Mainly by eroding the SEC as a whole, since Alabama's strength (much like any BCS team's) is in the conference's strength. But there are also a few teams that Alabama isn't connected to.

This is just one possibility. Hopefully you can see the pattern:

  1. Georgia losing to Florida.
  2. Florida State beating Florida.
  3. Vanderbilt beating Kentucky.
  4. Georgia Tech beating Georgia.

Florida State's actually very important, since Florida's the most likely opponent for Alabama. One other thing you might note here: all of these wins are plausible. I'm not suggesting the Citadel beat Florida or something stupid. Georgia Tech is a borderline Top 25 team, FSU is a Top 25 team, Vanderbilt and Kentucky are about the same, and Georgia/Florida are both Top 25 teams.

But in addition to that, we can also boost Penn State's wins and possible future wins:

  1. Oregon State beating Oregon and Cal: this would put them in the Top 25, and suddenly our out-of-conference schedule is much stronger than Alabama's.
  2. Ohio State winning out: Northwestern is still a decent team, and winning out would likely put Ohio State in the top 10.
  3. Michigan State winning out (minus the obvious): they won't really move up, but losing to Wisconsin or Purdue would just pull them down to about that level, and that wouldn't help.

The pattern's straightforward: you want Alabama's opponents to lose to teams that Alabama doesn't play, and you want Penn State's opponents to beat teams that Penn State hasn't played.

Colley's rankings have this awesome little script which shows movement of the rankings based on hypotheticals that I wish every ranking system had. Keep in mind that Colley's system will probably tend to show the least movement of any of the rankings for an unbeaten, so moving us above Alabama in Colley's rankings probably moves us above them in all of them.
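To see why a hypothetical can move an unbeaten team without that team playing at all, here is a minimal sketch of the Colley matrix method (solve C r = b, where each diagonal entry is 2 plus games played, each off-diagonal entry is minus the number of games between the two teams, and b is 1 plus half of wins minus losses). The four teams A through D and their results are made up for illustration, and the function name `colley_ratings` is my own, not anything from Colley's site.

```python
from fractions import Fraction

def colley_ratings(teams, games):
    """Colley ratings for a list of (winner, loser) results."""
    n = len(teams)
    idx = {t: i for i, t in enumerate(teams)}
    # Start from C = 2 * identity and b = 1 for every team.
    C = [[Fraction(2) if i == j else Fraction(0) for j in range(n)] for i in range(n)]
    b = [Fraction(1)] * n
    for winner, loser in games:
        w, l = idx[winner], idx[loser]
        C[w][w] += 1          # one more game played by each side
        C[l][l] += 1
        C[w][l] -= 1          # one more game between this pair
        C[l][w] -= 1
        b[w] += Fraction(1, 2)
        b[l] -= Fraction(1, 2)
    # Solve C r = b by Gauss-Jordan elimination on the augmented matrix.
    M = [row[:] + [b[k]] for k, row in enumerate(C)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return {t: float(M[idx[t]][n]) for t in teams}

teams = ["A", "B", "C", "D"]
played = [("A", "B"), ("A", "C")]        # hypothetical: A is 2-0
base = colley_ratings(teams, played)
# Insert one hypothetical result: D, which A never plays, beats B.
hypo = colley_ratings(teams, played + [("D", "B")])
print(base["A"], hypo["A"])
```

In this toy season A slips from 0.700 to about 0.679 without playing a down, purely because one of its wins got less impressive. That is the whole mechanism behind the wish lists above.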

It's not really possible to get Penn State over Alabama (with an equal number of games won for Alabama/Penn State) with just the script's limit of 5 inserted games, but the combination of all of the above should probably do it - perhaps sprinkled with LSU losing somewhere else again.

If a few completely bizarre results happened in the SEC - Vandy ending up in the SEC Championship Game or something - that's definitely enough.

But all these results are plausible, and in the end, this isn't gerrymandering or some weird magic formula. Alabama's ranking is built on its opponents. If they go down, so should Alabama.

The Myth of a Weak Schedule

Probably the most common thing that you've heard this week is that Penn State's played a weak schedule compared to those above them. In Texas's case, that's true. Texas is having an insane year, and if they keep winning, you can't make any argument.

But Penn State vs. Alabama? According to most of the statistical ranking systems, Penn State's strength of schedule is just behind Alabama's:

  • Colley: PSU #81, Alabama #70
  • Sagarin (*): PSU ~70, Alabama ~60
  • Massey: PSU #70, Alabama #44
  • Anderson-Hester: PSU #46, Alabama #33

(*: estimated from ELO-CHESS ratings only)

People simply haven't noticed the fact that the SEC is much weaker this year than last (more a statement of how good they were previously), and therefore also haven't noticed Alabama's rapidly weakening schedule. Auburn started off as a preseason #10 - Colley has them currently at #72. Also, Alabama's early big OOC win - Clemson - now looks like garbage, as Clemson, a preseason #9, is ranked #80 by Colley.

And for all the people complaining that Penn State played Coastal Carolina:

Penn State out of conference opponents:

  • Oregon State, #30 AP, #43/17/27/25 in Colley/Sagarin/Wolfe/Massey
  • Syracuse, NR AP, #104/112/121/125
  • Temple, NR AP, #83/87/80/98
  • Coastal Carolina, NR AP, NA/178/196/217

Alabama out of conference opponents:

  • Clemson, NR AP, #80/74/88/84
  • Western Kentucky, NR AP, #102/123/108/124
  • Tulane, NR AP, #103/129/110/136
  • Arkansas State, NR AP, #90/106/95/126

Yes, we played Coastal Carolina. So what? Everyone plays cupcakes. Some of them are in Division I-AA. Some are not. All of them are a functional "bye" week for the teams in question, unless you're Michigan (zing! hey guys, two in a row, you gotta expect some ribbing).

Please join me in telling the national media to go stuff themselves when it comes to scheduling cupcakes. We. Don't. Care. No one thinks that Texas is better than Penn State because we played Coastal Carolina and they get to play Baylor. We ignore those games. Statistical rankings ignore those games. So should the national media.

Home-field advantage

In the first IDS, I explained the basic way that rating systems work:

  1. Come up with a game-output function, which takes the rating of two teams A and B, and outputs the probability that A will beat B
  2. Find the ratings which maximize the combined probability (given by the game-output function) of all of the results of the season.
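The two steps above can be sketched in a few lines. This is only an illustration under my own assumptions: the logistic form of `win_prob`, the weak pull toward zero (`prior`, which keeps unbeaten teams from running off to infinity), the plain gradient-ascent loop, and the three-team results are all hypothetical choices, not the method of any particular ranking system.

```python
import math

def win_prob(r_a, r_b):
    """Game-output function: probability that A beats B (logistic form assumed)."""
    return 1.0 / (1.0 + math.exp(-(r_a - r_b)))

def log_likelihood(ratings, games):
    """Log of the combined probability of every (winner, loser) result."""
    return sum(math.log(win_prob(ratings[w], ratings[l])) for w, l in games)

def fit_ratings(teams, games, prior=0.1, steps=2000, lr=0.1):
    """Find ratings that maximize the likelihood, by simple gradient ascent."""
    ratings = {t: 0.0 for t in teams}
    for _ in range(steps):
        # Assumed weak prior: gently pulls every rating back toward 0.
        grad = {t: -prior * ratings[t] for t in teams}
        for w, l in games:
            p = win_prob(ratings[w], ratings[l])
            grad[w] += 1.0 - p   # a win is explained better by a higher winner rating
            grad[l] -= 1.0 - p   # and by a lower loser rating
        for t in teams:
            ratings[t] += lr * grad[t]
    return ratings

teams = ["A", "B", "C"]
games = [("A", "B"), ("B", "C"), ("A", "C")]   # made-up results
flat = {t: 0.0 for t in teams}
fitted = fit_ratings(teams, games)
print(fitted, log_likelihood(fitted, games), log_likelihood(flat, games))
```

The fitted ratings order the teams A, B, C and explain the three results better than treating everyone as equal, which is all "maximize the combined probability" means.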

If you read Colley's mathematical description, it'll sound a lot different, but it's the same. (For the mathematically inclined, it's the difference between a maximum likelihood and a chi-squared approach. The two end up the same so long as the distributions are roughly normal, which is why Colley's approach differs the most when there's a bunch of undefeated teams.) Colley doesn't include HFA, because it's not easy to do in his approach.

So how do we include home-field advantage in this? Easy. You modify the game-output function and give a boost in probability to the home team, or reduce the probability of the visiting team. All of the rating systems save Colley include HFA, but you may notice a problem: how do you decide how much of a boost to give? Well, the answer is that you do it the same way that you find the ratings - find the value that explains the year's results the best.
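As a rough sketch of "find the value that explains the results best": the version below adds a single home bonus `hfa` to a logistic game-output function and picks the bonus with the highest likelihood from a grid. The logistic form, the grid search, and the four made-up games are assumptions of mine, and for simplicity the team ratings are held fixed and equal, whereas a real system would fit ratings and HFA together.

```python
import math

def win_prob(r_home, r_away, hfa):
    """Probability the home team wins, with the home side given a rating boost."""
    return 1.0 / (1.0 + math.exp(-(r_home + hfa - r_away)))

def log_likelihood(ratings, games, hfa):
    """Log-probability of all (home, away, home_won) results for a given HFA."""
    total = 0.0
    for home, away, home_won in games:
        p = win_prob(ratings[home], ratings[away], hfa)
        total += math.log(p if home_won else 1.0 - p)
    return total

def best_hfa(ratings, games, step=0.01, max_hfa=3.0):
    """Grid-search the single HFA value that best explains the results."""
    candidates = [k * step for k in range(int(round(max_hfa / step)) + 1)]
    return max(candidates, key=lambda h: log_likelihood(ratings, games, h))

# Hypothetical data: two equally rated teams, home side wins 3 of 4.
ratings = {"A": 0.0, "B": 0.0}
games = [("A", "B", True), ("B", "A", True), ("A", "B", True), ("B", "A", False)]
hfa = best_hfa(ratings, games)
print(hfa, win_prob(0.0, 0.0, hfa))
```

With equal teams and the home side winning three of four, the best-fit bonus is the one that makes the home team about a 75% favorite. Note that it is one number applied to every stadium.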

The only problem with this is relatively obvious: as far as I know, all of the statistical ranking systems use the same HFA for all teams. That means that going to Penn State, with 110,000 screaming fans, lowers your chances just as much as going to Indiana, with a guy with a hand-puppet. There's no real way to fix this without biasing the results against a team based on historical evidence (which isn't really in the spirit of rating a season), so it's just a limitation.

Which means, again, just like with margin of victory, that it's up to the human voters to recognize certain environments as harder to win in than others.
