Tuesday, October 27, 2009

Week 9: Predictions

After correctly predicting an impressive 49 of the 56 games last week, it's time for the numbers to return to earth. I still expect about 42 of these predictions to be correct this week, but back-to-back 80+% accuracy would be unlikely. With that in mind, we go to ...
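An "expected number correct" like this is presumably the sum of the favorite's win probability across every game. By linearity of expectation, that sum is the expected count whenever the listed odds are taken at face value. A minimal Python sketch follows; the function name is illustrative, and the inputs are the Week 8 odds for UTEP-Tulsa (77.9%) and North Carolina-Florida St. (52.8%):

```python
def expected_correct(favorite_probs):
    """Expected number of correct picks when the favorite is always picked.

    Each entry is the favorite's win probability (between 0.5 and 1.0).
    By linearity of expectation the answer is just the sum; the games do
    not need to be independent for this to hold.
    """
    return sum(favorite_probs)


# Week 8 odds for UTEP-Tulsa and North Carolina-Florida St., as fractions.
print(round(expected_correct([0.779, 0.528]), 2))  # 1.31
```

That comes to about 1.3 expected correct picks from those two games, which is consistent with going one-for-two.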

Games To Watch

West Virginia at South Florida. This game will come down to how well USF's offense plays. Both WVU and USF have defensive efficiency ratings around 15.5 and both have played a similar strength of schedule. WVU, however, has an offensive efficiency of 23.1 compared to 16.6 for USF. If USF can either find an extra gear in their offense or bring WVU's offense down to their level, they have a shot.

USC at Oregon. Oregon State took USC to the wire last weekend, and Oregon is a much better opponent. USC holds roughly a 2-point edge in both offensive and defensive efficiency, but if Oregon can do a better job of exploiting holes in the previously invincible USC defense, this will be a close game. Oregon likes to play at a faster pace than USC, so there's the potential to get USC out of its element and rack up some quick points.

Kansas at Texas Tech. Both teams are coming off decisive defeats -- Kansas to an underrated Oklahoma team, TTU to an out-of-nowhere A&M squad -- and are looking to rebound. TTU needs to find answers on the defensive end, since Kansas has a much more efficient offense than A&M (23.4 versus 19.9). Look for this to be another high-scoring game: good offenses, so-so defenses, and a quick pace.

Coin Toss Game of the Week

Middle Tennessee State at Florida-Atlantic. Once again Florida-Atlantic finds itself in the projected closest game of the week, with a 52.1% chance of pulling out the victory. If this is anything like last week, this does not bode well for MTSU.

Full predictions after the break.

Monday, October 26, 2009

Week 9: Full Rankings

Full rankings going into Week 9 after the jump.

Biggest jumps: UCF (+17, 81 to 64); Texas A&M (+16, 75 to 59); Fresno St. (+11, 54 to 43).

Biggest drops: Northern Illinois (-14, 57 to 71); Colorado (-13, 60 to 73); Missouri (-12, 28 to 40).

Week 9: Top 25

After a full weekend of college ball, here is the Top 25 for this week, including the changes from last week.

Rank  +/-  Team           WinPct  SoS     Offense  Defense  Pace
001   --   Florida        0.9716  0.6852  32.8     10.1     77.1
002   +4   Texas          0.9616  0.5703  30.3     10.4     85.6
003   +2   Boise St.      0.9592  0.4131  28.2     9.8      82.7
004   -1   Alabama        0.9581  0.5984  25.8     9.1      80.2
005   -1   Oklahoma       0.9572  0.7377  25.0     8.9      88.8
006   +1   TCU            0.9531  0.5396  23.9     8.8      81.7
007   -5   USC            0.9527  0.5681  25.3     9.3      82.9
008   --   Penn State     0.9414  0.4759  23.3     9.2      79.9
009   --   Ohio St.       0.9255  0.5364  22.4     9.7      80.4
010   --   Virginia Tech  0.9044  0.6912  26.8     12.7     80.4
011   +1   Oregon         0.8860  0.5878  23.2     11.7     89.2
012   +1   Iowa           0.8793  0.6085  22.1     11.4     79.9
013   -2   Nebraska       0.8553  0.5218  21.7     12.0     80.8
014   +4   Cincinnati     0.8423  0.4786  21.7     12.4     85.2
015   --   Clemson        0.8407  0.6319  20.4     11.7     84.1
016   +4   LSU            0.8319  0.6022  21.2     12.4     80.0
017   -3   Texas Tech     0.8235  0.5509  28.3     17.0     87.8
018   -1   Utah           0.8131  0.4101  21.4     13.1     83.4
019   -3   Mississippi    0.8096  0.5572  21.5     13.3     81.5
020   +3   Georgia Tech   0.8090  0.7278  28.2     17.4     77.5
021   -2   Tennessee      0.7793  0.6344  20.9     13.7     81.1
022   -1   West Virginia  0.7775  0.4684  23.1     15.2     83.0
023   NA   Pittsburgh     0.7770  0.5189  24.1     15.9     80.5
024   NA   Oklahoma St.   0.7742  0.5076  26.1     17.3     82.5
025   NA   Oregon St.     0.7701  0.6664  23.0     15.4     86.5

Dropped out: BYU, Notre Dame, Virginia.

The big winners this week were Texas, Cincinnati, and LSU. Alabama, Oklahoma, and USC all won but still slipped in the rankings. Less than one one-hundredth of a point in WinPct separates spots two through seven, so there's a lot of potential for movement over the next few weeks. Florida had big shifts in both its offensive and defensive efficiency, neither in the right direction. Oklahoma continues to win with excellent defense, clocking in with an adjusted defensive efficiency of 8.9.

At this point I don't think it's too early to start asking what to do about Boise State. I'll go into more depth later this week, but right now they've got a downhill road to an undefeated season, with only a 1-in-12 chance of losing at least one game. Their adjusted efficiencies place them in a dead heat with Alabama, but their strength of schedule is the second-weakest in the Top 25; only #18 Utah's is weaker. Their one quality win is a 19-8 home victory over #11 Oregon; after that, their toughest opponent has been #43 Fresno State. They've played all of two teams with adjusted defensive efficiencies below 20, which may mean their stellar offense is more a product of poor competition than of actual firepower.
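Odds of running the table come from chaining per-game win probabilities together. Here is a minimal sketch, assuming the remaining games are independent; the probabilities below are placeholders for illustration, not Boise State's actual remaining-schedule odds:

```python
from math import prod


def prob_undefeated(win_probs):
    """Chance of winning every remaining game, assuming independent games."""
    return prod(win_probs)


# Hypothetical per-game win probabilities for a five-game stretch.
remaining = [0.99, 0.98, 0.99, 0.97, 0.99]
p = prob_undefeated(remaining)
print(f"run the table: {p:.3f}; lose at least once: 1 in {1 / (1 - p):.0f}")
```

Even when every individual game looks like a near-lock, the small chances of an upset compound across the schedule.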

In an interesting footnote, it was a mere three years ago that we saw a Fiesta Bowl in which the presumed favorite, Oklahoma, was taken down by an underappreciated Boise State. Were that same game played today, we might see the exact opposite scenario.

Week 8: Summary

The first tempo-free week is in the books, and I'm actually quite surprised at how well the predictions played out. Overall the system correctly predicted 48 of the 56 games, for an accuracy rate of 86%. First let's take a look at the four games I singled out in my previous post, and then get into where the system got it wrong.

Games To Watch

Georgia Tech at Virginia: This ended up being much less interesting than the numbers indicated. UVa hung around until the start of the third quarter, when GT embarked on a monster 82-yard, 11-minute drive that produced a touchdown and a 20-6 lead. UVa never really recovered and only picked up another FG, while GT added two more touchdowns to put this one away for good.

Clemson at Miami (FL): There were two general trains of thought about this game going in: either it was going to be a blowout of an unranked opponent by a Miami-FL squad that had just gone 4-1 against four Top 25 opponents, OR it was going to be an overtime thriller like the last two games between these teams (in 2004 and 2005).  It turned out to be the latter, with two ties and 12 lead changes culminating in a Clemson win in overtime.

Oklahoma at Kansas: Oklahoma did what they needed to do if they want to get back into serious bowl contention. On paper they're an efficient squad that gets it done with defense, and Saturday was no different. Going into this game, Oklahoma was tied with Florida for the second-most-efficient defense -- 9.0 points per 100 possessions (PPH) -- and it held Kansas to 13 points on 176 possessions (7.4 PPH). Kansas got no help from its own defense, which has a defensive efficiency of 19.9. Up next for Oklahoma is what should be a cakewalk against Kansas St. Meanwhile, Kansas faces another team searching for answers in the form of Texas Tech.

Coin Toss Game of the Week

Florida-Atlantic at Louisiana-Lafayette: This game ended up being anything but close, as Florida-Atlantic pulled away for a 51-29 win.  This is a great example of Just Plain Getting It Wrong.  Which brings us to ...


Saturday, October 24, 2009

Week 8: Updated Predictions

I've gone through and removed most of the expectation maximization code that was adjusting the final score predictions. They were an artifact of last year's ESPN contest and no longer needed. The new predictions also are a bit more realistic since they're not trying to game a specific system. I also tried to make the highlighting a bit less painful to the eyes.

Games To Watch

Georgia Tech at Virginia: GT is the conventional favorite here, fresh off a win over Virginia Tech.  The numbers say it'll be a close one, though, with a slight nod to UVa.

Clemson at Miami (FL): Another potential ACC nailbiter.  Clemson gets the slight nod here, but just barely.

Oklahoma at Kansas: Kansas is in the Top 25 in the first BCS rankings.  Oklahoma is a .500 team.  Look for Oklahoma to start its run for a respectable bowl game here.  My numbers say pick Oklahoma, 31-17.

Coin Toss Game of the Week

Florida-Atlantic at Louisiana-Lafayette: Florida Atlantic has a 50.3% chance of winning this one.  Both teams might want to steer clear of black cats and ladders before stepping onto the field.

Home              Pts  Visitors          Pts  Odds (%)
UTEP              17   Tulsa             24   77.9
North Carolina    17   Florida St.       24   52.8
Army               7   Rutgers           24   88.4
Alabama           31   Tennessee         17   91.5
Arizona           31   UCLA              17   79.2
Arkansas St.      31   FIU               17   72.1
Baylor            17   Oklahoma St.      24   73.7
Bowling Green     17   Central Michigan  24   53.5
BYU               17   TCU               24   77.0
California        35   Washington St.    14   98.2
Cincinnati        24   Louisville        17   93.1
Colorado St.      24   SDSU              17   83.7
Duke              31   Maryland          17   75.9
Eastern Michigan  17   Ball St.          24   77.8
Hawaii            24   Boise St.         35   98.2
Houston           38   SMU               24   87.7
Kansas            17   Oklahoma          31   83.6
Kansas St.        31   Colorado          24   56.4
Kentucky          24   LA-Monroe         17   90.7
LA-Lafayette      24   FL-Atlantic       31   50.3
LSU               24   Auburn            17   72.1
Marshall          31   UAB               17   73.9
Miami-FL          14   Clemson           17   54.8
Miami-OH          24   Northern Ill.     31   94.9
Michigan          17   Penn State        24   85.1
Michigan St.      17   Iowa              24   73.4
Middle Tenn.      24   Western Ky.       17   88.7
Mississippi       31   Arkansas          24   72.0
Mississippi St.   24   Florida           35   97.2
Missouri          17   Texas             24   82.6
Navy              24   Wake Forest       17   81.1
Nebraska          24   Iowa St.           7   95.0
Nevada            35   Idaho             24   82.0
New Mexico        24   UNLV              31   57.1
New Mexico St.    17   Fresno St.        35   94.6
Northwestern      24   Indiana           17   70.3
Notre Dame        24   Boston College    17   75.6
Ohio              31   Kent St.          17   83.6
Ohio St.          31   Minnesota         24   94.9
Pittsburgh        24   South Florida     17   71.8
Purdue            31   Illinois          17   87.3
Rice              17   UCF               31   65.6
South Carolina    17   Vanderbilt        14   90.9
USC               31   Oregon St.        24   94.7
Southern Miss.    35   Tulane            24   97.0
Stanford          24   Arizona St.       17   62.9
Syracuse          31   Akron             17   71.4
Texas Tech        47   Texas A&M         28   95.4
Toledo            17   Temple            24   59.5
Troy              35   North Texas       17   93.7
Utah              17   Air Force          7   77.2
Utah St.          17   LA Tech           24   53.2
Virginia          24   Georgia Tech      17   60.0
Washington        17   Oregon            31   86.2
Western Michigan  31   Buffalo           24   59.9
West Virginia     31   Connecticut       17   79.3

Friday, October 23, 2009

Week 8: Predictions

Below are my predictions for Week 8. You'll notice that two of the games have already happened: UTEP-Tulsa and UNC-FSU. I got one right and one wrong, which is about how the numbers said I would do. The favorites for each game are highlighted in green, and the final column is the probability, in percent, that the favored team will win.

Home              Pts  Visitors          Pts  Odds (%)
UTEP              17   Tulsa             24   78
North Carolina    17   Florida St.       24   53
Army              17   Rutgers           24   88
Alabama           24   Tennessee         17   92
Arizona           24   UCLA              17   79
Arkansas St.      24   FL-International  17   72
Baylor            17   Oklahoma St.      24   74
Bowling Green     17   Central Michigan  24   54
BYU               17   TCU               24   77
California        17   Washington St.    14   98
Cincinnati        24   Louisville        17   93
Colorado St.      24   SDSU              17   84
Duke              24   Maryland          17   76
Eastern Michigan  17   Ball St.          24   78
Hawaii            24   Boise St.         31   98
Houston           38   SMU               31   88
Kansas            17   Oklahoma          24   84
Kansas St.        31   Colorado          24   56
Kentucky          24   LA-Monroe         17   91
LA-Lafayette      24   FL-Atlantic       31   50
LSU               24   Auburn            17   72
Marshall          24   UAB               17   74
Miami-FL          17   Clemson           24   55
Miami-OH          17   Northern Ill.     24   95
Michigan          17   Penn State        24   85
Michigan St.      17   Iowa              24   73
Middle Tennessee  24   Western Kentucky  17   89
Mississippi       31   Arkansas          24   72
Mississippi St.   24   Florida           31   97
Missouri          17   Texas             24   83
Navy              24   Wake Forest       17   81
Nebraska          24   Iowa St.          17   95
Nevada            31   Idaho             24   82
New Mexico        24   UNLV              31   56
New Mexico St.    24   Fresno St.        31   95
Northwestern      24   Indiana           17   70
Notre Dame        24   Boston College    17   76
Ohio              24   Kent St.          17   84
Ohio St.          24   Minnesota         17   95
Pittsburgh        24   South Florida     17   72
Purdue            24   Illinois          17   87
Rice              17   UCF               24   66
South Carolina    17   Vanderbilt        14   91
USC               24   Oregon St.        17   95
Southern Miss.    31   Tulane            24   97
Stanford          24   Arizona St.       17   63
Syracuse          24   Akron             17   71
Texas Tech        47   Texas A&M         28   95
Toledo            17   Temple            24   59
Troy              31   North Texas       24   94
Utah              17   Air Force         14   77
Utah St.          17   LA Tech           24   53
Virginia          24   Georgia Tech      17   60
Washington        17   Oregon            24   86
Western Michigan  31   Buffalo           24   60
West Virginia     24   Connecticut       17   79

You'll notice that the majority of the games have a projected final score of 24-17. This is due to some expectation maximization I perform in order to make the predicted outcomes more realistic. For example, the raw formula may project a final score of 25-18, but that score is uncommon in practice. The algorithm looks for "nearby" scores that are more likely and settles on those. I'll probably revisit this as the season goes on, since it masks some of the differences between teams.

Week 8: Top 25

As of October 19th, 2009, here are your Top 25 in the Tempo-Free Gridiron. The columns are
  • WinPct: the Pythagorean winning percentage against a slate of average opponents
  • SoS: Strength-of-schedule
  • Offense: points scored per 100 possessions
  • Defense: points allowed per 100 possessions
  • Pace: number of possessions per half

Rank  Team           WinPct  SoS     Offense  Defense  Pace
001   Florida        0.9807  0.7091  33.5     9.0      76.9
002   USC            0.9664  0.5464  24.3     7.9      83.1
003   Alabama        0.9631  0.5882  27.5     9.3      80.0
004   Oklahoma       0.9570  0.7534  25.4     9.0      89.1
005   Boise St.      0.9561  0.4397  28.1     10.1     81.6
006   Texas          0.9543  0.5769  29.1     10.6     86.6
007   TCU            0.9484  0.5270  24.6     9.3      81.6
008   Penn State     0.9358  0.4610  23.6     9.7      79.1
009   Ohio St.       0.9118  0.5604  22.4     10.3     79.7
010   Virginia Tech  0.9035  0.6834  26.1     12.4     80.4
011   Nebraska       0.8890  0.5521  24.1     12.0     81.1
012   Oregon         0.8852  0.6161  23.2     11.8     89.1
013   Iowa           0.8803  0.5903  22.4     11.5     79.9
014   Texas Tech     0.8803  0.5705  29.9     15.4     86.4
015   Clemson        0.8488  0.5993  18.8     10.6     83.4
016   Mississippi    0.8126  0.5426  21.0     12.9     82.1
017   Utah           0.8088  0.3841  21.3     13.2     82.9
018   Cincinnati     0.7980  0.4727  21.3     13.5     85.5
019   Tennessee      0.7959  0.6071  22.3     14.2     81.2
020   LSU            0.7935  0.6084  20.2     12.9     80.6
021   West Virginia  0.7846  0.4518  23.1     15.0     81.7
022   BYU            0.7724  0.3474  24.3     16.2     82.5
023   Georgia Tech   0.7580  0.7209  27.6     18.9     78.1
024   Virginia       0.7444  0.5618  17.6     12.3     83.5
025   Notre Dame     0.7427  0.6222  22.5     15.8     83.9

I know the first words that are going to come to mind for most people who see these rankings: "How is Oklahoma -- a 3-3 team that's fourth in its half of the Big 12 -- ranked two spots ahead of Texas -- an undefeated team that leads the Big 12 outright -- after Texas beat Oklahoma on a neutral field??" The short answer: Oklahoma has played one of the toughest schedules in the nation and lost its three games by a combined 5 points, whereas Texas has played a mediocre schedule and played well but not exceptionally. The long answer is after the jump.


Thursday, October 22, 2009

Under the Hood

There are three key concepts behind the statistics on this site: possession-based efficiency, Pythagorean expectation, and the log5 formula.
The first is efficiency. A team's offensive and defensive efficiencies simply represent the number of points it scores and allows per 100 possessions. In basketball the concept of a possession is simple: one team has a chance to score and keeps the ball until it either scores or somehow gives the ball up (turnover, foul, defensive rebound).
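As a quick illustration, raw (unadjusted) efficiency is just points divided by possessions, scaled to 100. This sketch checks the Oklahoma-Kansas figure from the Week 8 summary (13 points on 176 possessions). The function name is illustrative, and the opponent adjustment behind the published rankings is not included:

```python
def efficiency(points, possessions):
    """Raw points per 100 possessions (no opponent adjustment)."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    return 100.0 * points / possessions


# Oklahoma at Kansas, Week 8: Kansas scored 13 points on 176 possessions.
print(round(efficiency(13, 176), 1))  # 7.4
```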

In football the concept of a possession is slightly less clear. Is it a single snap from scrimmage? An entire drive? Where does a kickoff fall in this definition? If we were to use a single play from scrimmage as our definition of a possession, we would find that the symmetric relationship between the two teams' possessions doesn't hold. One team could easily have 100 possessions in a game, whereas their opponent may only muster 50. In that scenario we would have to account for the amount of offense and defense in each game in addition to the efficiency of each squad. This seems needlessly complicated.

On the flip side, if we were to use a drive as the definition of a possession, then the game becomes significantly more symmetric and familiar. One team takes the ball, attempts to score, and then returns the ball to its opponent. Unfortunately, this further reduces our already-small sample size. In college basketball there are 340 teams playing a 30-game slate with roughly 70 possessions per game. In football there are 120 teams playing a 10-game schedule with roughly 20 possessions per game. (All numbers are approximate.) Furthermore, how do we account for a pass that is intercepted and returned for a touchdown? Or a kickoff returned for a touchdown (or a fumbled return that the kicking team takes back for a touchdown)?
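To put the sample-size problem in rough numbers, here is the per-team arithmetic for one season, using the approximate figures from this paragraph with drives counted as football possessions. The helper is purely illustrative:

```python
def season_possessions(games, possessions_per_game):
    """Possessions one team sees in a season: a crude sample-size proxy."""
    return games * possessions_per_game


basketball = season_possessions(30, 70)
football = season_possessions(10, 20)
print(basketball, football, basketball / football)  # 2100 200 10.5
```

Under the drive definition, a football team's season offers roughly a tenth as many possessions as a basketball team's.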

I finally settled on the philosophy that every play is a possession. Every snap, kickoff, punt ... it's all a single possession, because each team has the potential to score on every play. When a team has the ball, it is in essence playing both offense and defense on each snap: it is trying to score while also keeping the other team from scoring. The same goes for the "defense", which is trying to stop the offense and, if possible, score itself.

I may revisit this decision later, but for now it seems to serve me well and allows me to "fit" everything into the other two portions of the model.

This brings us to part two: Pythagorean expectation, a formula developed by Bill James that uses the average number of points scored by and against a team to determine their expected winning percentage.  If S is the number of points scored by a team and A is the number of points allowed, then the expected winning percentage of that team is

$$\mathrm{WinPct} = \frac{1}{1 + (A/S)^{E}}$$

where E is an exponent that varies from sport to sport. In baseball the most accurate value appears to be 1.81. In college basketball, estimates of the exponent have ranged from 14.0 (Dean Oliver) to 16.5 (John Hollinger) to 11.5 (Ken Pomeroy). After analyzing data from several years of college football, I settled on a value of 3.0 for E, although many values in the range [2.5, 3.2] produced reasonable results.
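Here is the Pythagorean formula as a small Python function, with E = 3.0 as chosen above. As a sanity check, feeding in Florida's Week 9 adjusted efficiencies (32.8 offense, 10.1 defense) reproduces the 0.9716 WinPct listed in the rankings. Other teams may differ slightly in the last digit because the published efficiencies are rounded:

```python
def pythagorean(scored, allowed, exponent=3.0):
    """Expected winning percentage: 1 / (1 + (allowed / scored) ** exponent).

    `scored` and `allowed` can be raw point totals or, as on this site,
    points per 100 possessions; only their ratio matters.
    """
    if scored <= 0:
        raise ValueError("scored must be positive")
    return 1.0 / (1.0 + (allowed / scored) ** exponent)


# Florida going into Week 9: offense 32.8, defense 10.1 per 100 possessions.
print(f"{pythagorean(32.8, 10.1):.4f}")  # 0.9716
```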

Now that we have a definition of a possession and a value of E for our Pythagorean winning percentage, that brings us to the log5 formula.  This formula states that if two teams were to play each other, and one team has a winning percentage of A and the other a winning percentage of B, the odds that the first team would win are

$$\mathrm{WPct} = \frac{A - A \cdot B}{A + B - 2 \cdot A \cdot B}$$
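The log5 formula translates directly into code. This sketch pairs two Week 9 WinPct values on a neutral field, with no home-field adjustment. The formula has two useful properties: evenly matched teams come out at 50%, and any team facing a .500 opponent wins at its own rate:

```python
def log5(a, b):
    """Probability that a team with win pct `a` beats one with win pct `b`."""
    denom = a + b - 2.0 * a * b
    if denom == 0:
        # Only happens when both teams are 0.0 or both are 1.0.
        raise ValueError("log5 is undefined for two perfect (or winless) teams")
    return (a - a * b) / denom


# Florida (0.9716) vs. USC (0.9527) on a neutral field.
print(f"{log5(0.9716, 0.9527):.3f}")  # 0.629
```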

There are two other small factors my system takes into account: home-field advantage and the primacy of more recent games. In college football the home team wins approximately 62% of all games. This effect can be seen to varying degrees in other sports as well. Interestingly, home-field advantage is larger early in the season and diminishes as the season progresses. From 2006 to 2009, the home team won 68% of all games played in the first half of September, compared with 57% of all games played in the first half of November. Even taking into account the tendency of Powerhouse U. to schedule early-season games against Sister Mary's School for the Blind, the effect deteriorates as the year progresses. This may be caused by teams becoming more comfortable playing on the road as they mature during the year, but for now it's simply an observation.
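The post doesn't spell out how home field enters the calculation. What follows is one plausible approach, not necessarily the one this system uses: scale the home team's neutral-site odds by the odds implied by the league-wide home win rate. The 62% and 68% figures come from the text; the function name and the odds-ratio method are assumptions:

```python
def with_home_field(p_neutral, home_win_rate=0.62):
    """Adjust a neutral-site win probability for the home team.

    Assumed method: multiply the home team's odds, p / (1 - p), by the odds
    implied by the league-wide home win rate. Two evenly matched teams then
    come out at exactly `home_win_rate`.
    """
    if p_neutral <= 0.0 or p_neutral >= 1.0:
        return p_neutral  # certain outcomes stay certain
    home_odds = home_win_rate / (1.0 - home_win_rate)
    odds = p_neutral / (1.0 - p_neutral) * home_odds
    return odds / (1.0 + odds)


print(f"{with_home_field(0.50):.2f}")        # 0.62, a typical game
print(f"{with_home_field(0.50, 0.68):.2f}")  # 0.68, early September
print(f"{with_home_field(0.70):.3f}")        # 0.792
```

Passing a lower rate late in the season, such as the 57% observed in early November, would model the fading effect described above.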

There is also the matter of weighting more recent games more heavily. Unlike many other systems, I do not completely discard games from previous years. I use an exponentially decaying factor that causes games from the start of the year to count as roughly 3/5 of a full game, and games from a full year ago as roughly 1/6 of a game. This allows early-season predictions to be "in the ballpark" for most teams. Let's be honest: USC is going to remain USC from year to year, and Duke is going to ... well, let's just say Durham becomes a much cheerier place starting in mid-November.
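Here is a minimal sketch of exponentially decaying game weights. The post gives two reference points (about 3/5 for the season's first game, about 1/6 for a game a year old) but not the formula or the time unit. This hypothetical version measures age in days and fixes the decay rate from the one-year figure alone; the constant and function names are placeholders:

```python
import math

# Assumed calibration: a game played 365 days ago counts as 1/6 of a game.
DECAY_PER_DAY = math.log(6) / 365


def game_weight(age_days):
    """Exponentially decaying weight for a game played `age_days` ago."""
    return math.exp(-DECAY_PER_DAY * age_days)


def weighted_efficiency(games):
    """Points per 100 possessions with each game's totals down-weighted by age.

    `games` is a list of (age_days, points, possessions) tuples.
    """
    points = sum(game_weight(age) * pts for age, pts, _ in games)
    possessions = sum(game_weight(age) * poss for age, _, poss in games)
    return 100.0 * points / possessions


print(f"{game_weight(365):.3f}")  # 0.167
```

How close a season opener comes to 3/5 under this calibration depends on how many days ago it was played, so treat the rate as a placeholder rather than the system's actual setting.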

Now that you've had the whirlwind tour of the statistics behind the site, let's get on with the good stuff: rankings.

What's all this, then?

Welcome to my humble attempt at blatant imitation of real sports statisticians.

I'm a long-time follower of college basketball, and over the last five to ten years there have been some pretty interesting strides in how we analyze college ball. One of the more visible people in this arena is Ken Pomeroy, whom I've followed since around 2004. His blog and his rating system really sparked my interest in the concept of tempo-free statistics. The basic idea behind tempo-free statistics is that what we currently measure in a sport like basketball -- points per game, number of turnovers, etc. -- is in many ways broken. These numbers depend in large part on how quickly a team plays the game: the faster the pace, the more possessions a team and its opponents have, and the more opportunities both teams have to score and give up points, rack up assists, and cough up turnovers.

They don't, however, answer the fundamental question "How good is team 'X'?"

For that we turn to tempo-free statistics. Remove the disparity in possessions, normalize all statistics to a common metric such as "points per 100 possessions", and adjust for the quality of the opponent. This allows us to see the fundamental efficiency of a given team. Ken Pomeroy has an excellent write-up of how this applies to college basketball that I will not even attempt to duplicate, but simply encourage you to read.
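Opponent adjustment can be done in many ways, and Pomeroy's write-up describes his. As a deliberately simplified illustration (not his method, and not necessarily this site's), a raw offensive efficiency can be scaled by how stingy the opponent's defense is relative to the league average. The league average and the opponent's defensive rating are hypothetical inputs here. A full system would typically repeat this across all games until the ratings settle:

```python
def adjust_offense(raw_off, opp_def, league_avg):
    """Scale raw offensive efficiency by opponent defensive strength.

    All arguments are points per 100 possessions. Scoring well against a
    defense that usually allows less than the league average counts for
    more; against a leaky defense it counts for less.
    """
    if opp_def <= 0:
        raise ValueError("opponent defensive rating must be positive")
    return raw_off * league_avg / opp_def


# Same raw output (20 per 100) against an average and a stingy defense.
print(adjust_offense(20.0, 18.0, 18.0))  # 20.0
print(adjust_offense(20.0, 12.0, 18.0))  # 30.0
```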

The question this blog examines is "do these concepts apply to football as well as they do to basketball?" To answer that question I plan to lay out how I create my rankings, where I obtain my data, post predictions on upcoming games, and analyze where the system was right and where it went wrong. I also hope to examine some areas in which my system produces vastly different results from conventional wisdom or from other computerized rankings, such as those used in the BCS.

This is not my first attempt at applying this approach to college football. Last year I participated in the ESPN Winning Formula Challenge, a 12-week contest with significant prize money for anyone who could write a computer program to predict college football results. Unfortunately there was an issue with my code during the second 4-week competition -- remember, kids: bounds-check your array accesses, because some team somewhere will hang 80 points on its opponent -- but had my code worked throughout the season, I would have finished 6th out of approximately 120 competitors. That's a 73% accuracy rate using nothing but the final score and the number of possessions in each game.

In my next post I'll go into some of the nuts and bolts of my approach, but for now I encourage you to read Pomeroy's write-up on how to use and understand tempo-free statistics.