This is the first in (hopefully) a series of posts to try to explain the way that the statistical rankings work in the BCS.
The first thing I'm going to say is this: Statistical rankings are easy to understand. Very easy. There's no magic about them, and with a few crackpot exceptions (Billingsley) they're all basically doing the same thing.
The basic idea is this: first, assume all teams have an intrinsic "true strength" - one number that determines how likely team A is to beat team B. That "true strength" is how the teams are ranked. Note that this means we're assuming that matchups can't play a significant role - that is, you can't have a situation where Penn State is more likely to beat Wisconsin, who is more likely to beat Ohio State, who is more likely to beat Penn State. This may seem like a big assumption, but first off, it's necessary if you want a single ordered ranking to mean anything, and second, you can do some tests to determine whether loops like that are real, and they don't seem to exist in football. At least, not significantly.
OK. Now, with that rating, you need a "game output function." You need some function that gives you the likelihood that team A will beat team B. Whatever function is chosen, it's going to be an S-shaped curve whose input is something like "A - B", the difference in true strengths: near 0 when A is much weaker, exactly 1/2 when the teams are even, and approaching 1 as A gets much stronger.
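One common concrete choice - and it's just an assumption here, since each BCS computer picks its own curve - is the logistic function of the rating difference:

```python
import math

def game_output_function(rating_a, rating_b):
    """Probability that team A beats team B, modeled as a logistic
    (S-shaped) function of the difference in true strengths.
    This particular curve is an illustrative choice, not any
    specific BCS computer's actual function."""
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b)))

# Evenly matched teams are a coin flip; a big edge is near-certain.
print(game_output_function(1.0, 1.0))                # 0.5
print(round(game_output_function(3.0, 0.0), 3))      # 0.953
```

Any curve with that shape works; the logistic just makes the math convenient.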
Now, armed with a game-output function and the results of each game, you can form a rating. You do this by finding the set of ratings that makes the total likelihood of the entire season - the product of the game-output-function value for every game played - as high as possible.
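That "make the product highest" step can be sketched directly. This is a toy gradient-ascent fit on the log-likelihood, with made-up teams A, B, and C - the real BCS computers use their own optimizers, so treat this as a sketch of the idea, not anyone's actual method:

```python
import math

def fit_ratings(games, teams, steps=2000, lr=0.1):
    """Crude maximum-likelihood fit: nudge every rating uphill on
    the log-likelihood of the observed results, under a logistic
    game output function (an assumption, not a BCS computer's
    actual choice)."""
    ratings = {t: 0.0 for t in teams}
    for _ in range(steps):
        grads = {t: 0.0 for t in teams}
        for winner, loser in games:
            # P(winner beats loser) under the logistic model
            p = 1.0 / (1.0 + math.exp(-(ratings[winner] - ratings[loser])))
            grads[winner] += 1.0 - p   # winning more surprisingly helps more
            grads[loser] -= 1.0 - p
        for t in teams:
            ratings[t] += lr * grads[t]
    return ratings

# Hypothetical results: A went 2-1, B went 1-1, C went 1-2.
games = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
ratings = fit_ratings(games, ["A", "B", "C"])
assert ratings["A"] > ratings["B"] > ratings["C"]
```

Note that every team in the toy schedule has at least one loss - that's deliberate, because (as we'll see below) undefeated teams break plain maximum likelihood.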
Since the BCS is out for the first time this week, there are some interesting questions we can ask by looking at the computer rankings:
- Why is Ohio State so high? They're ranked 5th, and every computer has them above USC, who beat them. Quite soundly.
Because Ohio State's only lost one game, and that was to a good team. USC lost one game, too, but that was to a much weaker team. Losses to weak teams pin a team back much more than losses to good teams. This is where human polls and statistical polls differ: a human sees the pounding USC gave Ohio State and does not see it as a fluke, but sees the USC-Oregon State loss as a fluke. The statistical polls see *both* as upsets, but the USC-Ohio State result as a far more likely one.
- Why is Penn State so low? In some cases (Wolfe's rankings) they're below Ball State!
Undefeated teams are a kind of "gotcha" for ranking systems. If a team's true strength rating is "infinite", then all of its wins will have a GOF of "1", and that maximizes the total probability. So really, the biggest difference between all of the ranking systems is how they treat undefeated teams. Most of the rankings use something called "maximum likelihood," which is basically what I described above. To handle undefeated teams, they correct the ratings using something called "Bayesian inference," which starts by assuming a prior likelihood that a team that good actually exists. You don't get an infinite ranking, because no team in college football is that good. So it's a combination of "how likely is it that they're this good?" with "how likely is it that they would've won all their games if they're this good?"
This ends up being a "floor" rating, typically - which means that frequently, undefeated teams will get ranked lower than defeated teams, simply because the prior distribution (the likelihood of a great team existing) is low.
An important opinion note here: I hate this method. Frankly, it's stupid: because you can't bias yourself and say "Penn State's more likely to have a great team than Ball State," you have to assume the prior distribution is the same for all teams. This is violently, and provably, wrong. Bayesian inference with a one-size-fits-all prior is just not a good model for separating undefeated teams. It probably looks good on average, but it underrates "major" programs and overrates "minor" programs - which is exactly what's happening with Ball State.
I can't really justify Ball State's ranking at all, though. By Wolfe's own rankings (which have Ball State above Penn State), our wins, straight down the line, are better than Ball State's. The only advantage Ball State might have is that they've got one more away game (thus converting a "godawful team" into a "might not be so bad" team). But even that doesn't make a lot of sense, considering the opponent advantage we have is very large.
- OK, so why is Penn State so much higher in Colley's rankings?
Because Colley doesn't maximize a likelihood at all - he finds each team's rating a different way entirely. Basically, in the end, Colley's rankings put unbeaten teams midway between the "worst" and "best" ratings their results could support. I like this method a lot more, since you don't have a preassumed prior distribution. Also, every additional win pulls you higher and higher. With a Bayesian approach, winning doesn't help much unless you win a better game than your "best" game, because the prior distribution is likely the thing that's pinning you back.
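Colley's system comes down to solving one system of linear equations, where every rating starts at 1/2 and each result nudges it. This sketch follows his published formulation (the team names and schedule are hypothetical):

```python
def colley_ratings(games, teams):
    """Colley's method: solve C r = b, where C[i][i] = 2 + games
    played by team i, C[i][j] = -(games between i and j), and
    b[i] = 1 + (wins - losses)/2. No prior distribution required."""
    n = len(teams)
    idx = {t: i for i, t in enumerate(teams)}
    C = [[0.0] * n for _ in range(n)]
    b = [1.0] * n
    for i in range(n):
        C[i][i] = 2.0
    for winner, loser in games:
        w, l = idx[winner], idx[loser]
        C[w][w] += 1.0; C[l][l] += 1.0
        C[w][l] -= 1.0; C[l][w] -= 1.0
        b[w] += 0.5; b[l] -= 0.5
    # Gaussian elimination with partial pivoting (fine at this size)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(C[r][col]))
        C[col], C[piv] = C[piv], C[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, n):
            f = C[row][col] / C[col][col]
            for k in range(col, n):
                C[row][k] -= f * C[col][k]
            b[row] -= f * b[col]
    r = [0.0] * n
    for row in range(n - 1, -1, -1):
        r[row] = (b[row] - sum(C[row][k] * r[k]
                               for k in range(row + 1, n))) / C[row][row]
    return {t: r[idx[t]] for t in teams}

# A is 2-0, B is 1-1, C is 0-2: ratings come out 0.7, 0.5, 0.3.
ratings = colley_ratings([("A", "B"), ("B", "C"), ("A", "C")], ["A", "B", "C"])
assert ratings["A"] > ratings["B"] > ratings["C"]
```

Notice that the undefeated team A gets a perfectly finite 0.7, with no prior needed - the "2 +" on the diagonal is what anchors everything toward 1/2.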
- Shouldn't the statistical rankings be allowed to use scoring margin?
No. Very much so, no. Penn State's 46-17 victory over Michigan is a resounding victory in terms of scoring margin, but I'm sure a lot of people in the human polls looked at that game and said "hmm, Penn State might have some weaknesses" - and if not, they should've; I wouldn't blame them. The point is that you play to win the game - period. That's the only fair way to measure how well a team played. A statistical model cannot include all the variables that go into a game, so rather than use a flawed model to get imperfect information, it's better to use a simple model to get perfect information.
Does this suck for Penn State? Yes, and no. Yes, in the sense that the statistical rankings don't help us. But if the human polls worked the way they're supposed to, Penn State should be fine - because the human polls shouldn't get worked up about the fact that PSU has a weak schedule. We beat the crap out of our opponents. That's all they need to know - the statistical rankings deal with the weakness of the opponent.