Computer Ratings for the Upcoming Football Seasons
The algorithm for rating football teams and predictions for the opening weekend of the college football season
In previous seasons, I've used a collection of similarly designed rating systems to rank football and baseball teams, and to predict the outcomes of upcoming games. However, because I used different data sources depending on the sport and league, the actual code for each league was slightly different and built around the data source. That's inefficient, and I'd prefer to have a single unified rating system that works with many different types of data. With that in mind, let's build a better rating system. As usual, the code will be released on GitHub under an open source license. I've been tweaking and improving the code right up until this morning, and I'll need to do a bit of work commenting and organizing it better before it goes on GitHub.
As a reminder, there's quite a bit of time and effort needed to develop and test this software. Sometimes there are subtle issues that require many rounds of testing to diagnose and resolve. I don't use AI assistance to write my articles, either. If you'd like to support continued open development of software like this, please consider subscribing, sharing my articles on social media, forwarding my emails to people who might be interested, and making a financial contribution.
Forward- and Backward-Looking Ratings
Before building a rating system, I need to decide what the purpose of the ratings is. Do I want to rank the quality of teams and predict future performance, or do I want to rank which teams have accomplished the most in the games they've played to date? In many cases, the ratings using the two approaches will be similar, but there can be big differences.
One approach is a backward-looking rating system, assessing what a team has accomplished during the games it's played. We typically use this to determine things like qualifying for postseason play. The most basic form of this is the standings, the wins and losses, and that works pretty well in professional leagues where teams generally play schedules of similar difficulty. There are often larger differences in schedule strength in college sports, so we use more complex approaches to weigh a team's wins and losses while accounting for the difficulty of its schedule.
My ratings here are different, a forward-looking approach to measure the quality of teams, with the goal of predicting their future performance. This may sound similar to a backward-looking rating in that we still have to look at what teams have done in prior games. The difference is that forward-looking ratings tend to rely heavily on the score of a game instead of just wins and losses. There's a lot of luck in sports, and it doesn't necessarily even out over the course of a season. A team that loses a lot of close games might actually be a good team that's just been a bit unlucky in its prior games. But a team that's won a lot of close games might not be able to replicate that performance in the future.
Aside from the simple objectivity of looking at which teams have the most wins and fewest losses, there's a lot of subjectivity in deciding which teams have accomplished the most. For example, do we want to give greater weight to wins over opponents with good records or to winning on the road? Which teams are most deserving is ultimately a subjective question with room for differing opinions, and there's no inherently best answer. In contrast, there is an inherently best answer to which teams will perform best in future games. The answer is to play the games and observe which teams actually play best, and test which rating systems work best.
I'll revisit the topic of backward-looking ratings a few weeks into the college football season, when we have enough data to debate which teams are most deserving of being in the college football playoff. Starting in October, after the college football games conclude on Saturday evenings, I'll post a column late in the evening called "The Linked Letters After Dark" with a first look at how that week's games have affected the ratings and the playoff picture. But for now, let's look at the forward-looking ratings.
How the Ratings Work
The basic concept for my ratings is to begin by setting each team's rating and the overall home advantage to zero. My system iterates through all the past games that I want to consider and predicts their outcomes. The predicted margin is the home team's rating, plus the home advantage, minus the away team's rating. The predictions are compared to the actual outcomes of the games. If a team consistently outperforms its predictions, it's a sign that the team is probably underrated, and its rating should be nudged upward. If the team underperforms its predictions, the team is probably overrated, and its rating should be lowered a bit. And if home teams consistently outperform their predictions, it's a sign that the home field advantage should be raised.
The system keeps track of the overall error, which measures how well or how poorly the ratings predict the games for all teams. The goal is to find the combination of team ratings and home field advantage that minimizes this overall error. Once the ratings are adjusted, new predictions are made and evaluated. This process keeps repeating until it becomes very difficult to reduce the overall error. At that point, the ratings should be a good predictor of the quality and future performance of the teams.
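Here's a simplified sketch of that loop in Python. It's not my actual code (which I'll post on GitHub once it's cleaned up), and it uses a plain squared error to keep the sketch short; the error measure I actually use is described below.

```python
import random

def fit_ratings(games, n_rounds=20000, step=0.05):
    """Simplified sketch of the rating loop (not my actual code).

    games: list of (home, away, actual_margin, neutral_site) tuples.
    Returns a dict of team ratings and the overall home advantage.
    """
    teams = {g[0] for g in games} | {g[1] for g in games}
    ratings = {t: 0.0 for t in teams}
    home_adv = 0.0

    def total_error():
        err = 0.0
        for home, away, margin, neutral in games:
            hfa = 0.0 if neutral else home_adv
            predicted = ratings[home] + hfa - ratings[away]
            err += (predicted - margin) ** 2  # squared error, for simplicity
        return err

    best = total_error()
    candidates = list(teams) + ["home_adv"]
    for _ in range(n_rounds):
        # Nudge one rating (or the home advantage) by a small random
        # amount and keep the change only if the overall error drops.
        pick = random.choice(candidates)
        delta = random.uniform(-step, step)
        if pick == "home_adv":
            home_adv += delta
        else:
            ratings[pick] += delta
        err = total_error()
        if err < best:
            best = err
        elif pick == "home_adv":
            home_adv -= delta
        else:
            ratings[pick] -= delta

    return ratings, home_adv
```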
My rating system is really an optimization problem to minimize the overall error. When my system adjusts the ratings up or down a bit, there's a small random component because I've found this is helpful in optimizing the ratings. But it also means that if I choose different random numbers, I'll get slightly different ratings for the teams. Because of this, I actually run the rating system many times, each with different random numbers, and I get different ratings each time. The final home advantage and team ratings are the median of all the rating attempts.
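Continuing the sketch above, the ensemble simply reruns that fit with different random seeds and takes the median of each team's rating and of the home advantage (again illustrative, not my production code):

```python
import statistics

def ensemble_ratings(games, n_runs=50):
    """Run fit_ratings many times and take the median of each team's rating."""
    runs, home_advs = [], []
    for seed in range(n_runs):
        random.seed(seed)  # different random numbers for each attempt
        ratings, home_adv = fit_ratings(games)
        runs.append(ratings)
        home_advs.append(home_adv)
    final = {team: statistics.median(r[team] for r in runs) for team in runs[0]}
    return final, statistics.median(home_advs)
```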
Typically, rating systems try to avoid rewarding teams for running up the score too much. Winning a game by three touchdowns is definitely more impressive than winning by one touchdown. And a five touchdown victory is more decisive than just winning by three touchdowns. But does it really make a big difference if a team wins by seven touchdowns instead of five? At some point, a blowout is a blowout, and there are diminishing returns to winning by larger margins. I assume the margin of victory for all games is normally distributed with a mean of zero, and I calculate the standard deviation. Using that, I calculate the cumulative probability for both the predicted and actual margins of victory. A larger difference in cumulative probabilities means the prediction for the game was worse. I prefer this approach because it doesn't excessively reward or punish teams for blowouts, but I also don't need to pick arbitrary thresholds like weighting victories less if the margin is above three touchdowns or five touchdowns.
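Concretely, the comparison looks something like this, sketched with scipy's normal CDF (simplified from what I actually run). Because the CDF flattens out in the tails, the difference between winning by five touchdowns and seven touchdowns barely moves the error.

```python
import numpy as np
from scipy.stats import norm

def margin_error(predicted_margin, actual_margin, all_game_margins):
    """Compare predicted and actual margins on a cumulative-probability scale."""
    # Margins of victory are assumed normally distributed with mean zero.
    sigma = np.std(all_game_margins)
    predicted_p = norm.cdf(predicted_margin, loc=0.0, scale=sigma)
    actual_p = norm.cdf(actual_margin, loc=0.0, scale=sigma)
    # A larger gap in cumulative probability means a worse prediction.
    return abs(predicted_p - actual_p)
```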
Offense and Defense Ratings
Instead of using a single number to represent a team's quality, my system actually uses separate ratings for offense and defense. These are not efficiency ratings like some other systems. A positive offense rating means that a team tends to score more points than normal. And a positive defense rating means that a team allows fewer points than normal. These are used to predict the score of a game instead of the margin of victory or defeat.
Offense and defense ratings for each team start at zero. For each game, predictions are made for the home and away teams' scores. The home team's score is predicted as the home team's offense rating, plus half of the home advantage (or zero for a neutral site), minus the away team's defense rating, plus the average score for all teams and games. The away team's score is the away team's offense rating, minus half of the home advantage (or zero for a neutral site), minus the home team's defense rating, plus the average score for all teams and games. A negative score isn't realistic, so if the predicted scores are negative, they're just set to zero. The predicted margin of victory is then home team's predicted score minus the away team's predicted score. This is actually what my system uses to predict the outcome of games.
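Written out as a simplified sketch, the score prediction looks like this (the dictionary layout for the offense and defense ratings is just for illustration):

```python
def predict_scores(home, away, ratings, home_adv, mean_score, neutral=False):
    """Predict a game's score from offense/defense ratings (simplified sketch).

    ratings[team] is assumed to be a dict with "off" and "def" entries.
    mean_score is the average score across all teams and games.
    """
    hfa = 0.0 if neutral else home_adv / 2.0  # half the home advantage
    home_pts = ratings[home]["off"] + hfa - ratings[away]["def"] + mean_score
    away_pts = ratings[away]["off"] - hfa - ratings[home]["def"] + mean_score
    # Negative scores aren't realistic, so clamp them at zero.
    home_pts = max(home_pts, 0.0)
    away_pts = max(away_pts, 0.0)
    predicted_margin = home_pts - away_pts
    return home_pts, away_pts, predicted_margin
```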
Weighting Games
College teams tend to play a lot of games within their conference and relatively few games against non-conference opponents. A typical schedule for an FBS team is eight or nine games inside their conference, two or three against non-conference FBS teams, and maybe a single game against an FCS team. There are relatively few games between teams in different divisions, like FBS and FCS teams, or FCS teams against lower divisions. These games are often blowouts, so it might seem like there's not a lot that can be learned from the outcome of these games. I disagree.
In my testing, when I give these games too little weight, it causes the top teams from lower divisions to be ranked alongside the best FBS teams. These teams are often dominant in their own division, and there definitely are a few dominant Division III teams, but there's also a large disparity in the skill of players across divisions. In reality, a game between one of the top FBS teams and one of the best Division III teams would almost certainly not be competitive. These blowouts are actually useful for distinguishing the quality of teams in higher divisions from those in lower divisions. However, because there are relatively few of these games, I've found they actually need to be weighted more heavily to make the ratings more accurate. I use graph theory techniques to determine how much extra weight to give these games.
In graph theory, each team can be treated as a node. Each game is a connection between two nodes, and that connection is referred to as an edge. There are lots of edges between conference teams, but relatively few edges between teams in different conferences, and even fewer edges across divisions. I use a metric called edge betweenness centrality to measure how dense or sparse the connections are between nodes. Non-conference games and especially games between teams in different divisions have a larger betweenness centrality and are weighted more. This does seem to be helpful in preventing the top teams in lower divisions from being ranked alongside the top FBS teams.
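Here's the rough idea using networkx. Exactly how I map the centrality values onto game weights is still something I'm tuning, so the scaling below (relative to an average game) is just illustrative:

```python
import networkx as nx

def game_weights(games):
    """Weight each game by the betweenness centrality of its edge (sketch).

    games: list of (home, away, ...) tuples.  Games that bridge sparsely
    connected parts of the schedule graph (non-conference and especially
    cross-division matchups) end up with larger weights.
    """
    G = nx.Graph()
    for game in games:
        G.add_edge(game[0], game[1])
    centrality = nx.edge_betweenness_centrality(G)
    mean_c = sum(centrality.values()) / len(centrality)
    weights = []
    for game in games:
        c = centrality.get((game[0], game[1]),
                           centrality.get((game[1], game[0]), mean_c))
        weights.append(c / mean_c)  # weight relative to an average game
    return weights
```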
Probability of a Win, Loss, or Tie
When the overall ratings have been calculated, my system uses these ratings to predict the scoring margin for every game. The prediction error is the predicted margin minus the actual margin. These errors are computed for every game, along with their standard deviation. Assuming that the mean of the errors is zero and using the standard deviation of the errors, my system fits a t-distribution to the data. If the optimal fit has more than 50 degrees of freedom, my system assumes the error is very close to a normal distribution. If not, a t-distribution is probably a better fit.
When predicting future games, the margin is estimated as the home team's rating, plus the home advantage, minus the away team's rating. That predicted margin is also the center of the distribution, and my system uses the same standard deviation and distribution type that was just calculated. If there are no ties, the probability the away team wins is the cumulative probability of the part of the distribution that is less than zero, and the home team's probability of winning is one minus that.
If ties are a possibility, that probability is a small sliver of the distribution on either side of zero. Using current NFL rules, the probability of a tie is around 0.38%. My system goes through all the past games and estimates how wide that sliver of the distribution would need to be to match the overall probability of a tie. For NFL games, this ends up being roughly plus or minus 0.07 points. If teams are more evenly matched, the probability of a tie is going to be a bit larger than 0.38% because it's more likely that the game will be close. If it's a lopsided matchup, the probability of a tie will be less than 0.38% because the game is more likely to be a blowout. There are no ties in college football, so this is just ignored.
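Putting the last few paragraphs together, the probability calculation looks roughly like this in scipy (a simplified sketch; the 0.07-point tie window in the default argument is the NFL value mentioned above, and it should be set to zero for college football):

```python
import numpy as np
from scipy import stats

def fit_error_distribution(errors):
    """Fit a t-distribution to the prediction errors, with the mean fixed at zero.

    Returns (df, scale); df of None means "treat the errors as normal."
    """
    df, _, scale = stats.t.fit(errors, floc=0.0)
    if df > 50:
        return None, float(np.std(errors))
    return df, scale

def win_loss_tie(predicted_margin, df, scale, tie_window=0.07):
    """Win/loss/tie probabilities for one game (simplified sketch)."""
    # Center the fitted error distribution on the predicted margin.
    if df is None:
        dist = stats.norm(loc=predicted_margin, scale=scale)
    else:
        dist = stats.t(df, loc=predicted_margin, scale=scale)
    p_away = dist.cdf(-tie_window)          # margin below the tie band
    p_tie = dist.cdf(tie_window) - p_away   # the sliver around zero
    p_home = 1.0 - p_away - p_tie
    return p_home, p_away, p_tie
```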
Early Season Predictions
Unlike some other systems, I don't try to adjust the ratings based on how much production each team has returning. Instead, I just use ratings from the prior season as a starting point. For NFL games, preseason games during the current season do have some predictive power, so I include them. For college football games, it's just the previous season's ratings at the start of the season. As games start to be played in the new season, I start lowering the weight of games from the preseason and the prior season. For college football, my plan is to completely phase out games from the prior season around the sixth week of the season. For NFL games, I'll decrease the prior season's weight more slowly, phasing preseason and prior season games out completely after the tenth week of the season. This is a somewhat arbitrary decision, and I might use a different approach for other sports or in future seasons.
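To illustrate the phase-out, here's a simple linear version. The exact shape of the curve I use is still a somewhat arbitrary choice and may change; this just shows the idea.

```python
def prior_season_weight(current_week, phase_out_week):
    """Weight applied to preseason and prior-season games (illustrative only).

    Starts at 1.0 before the new season begins and falls to 0.0 by
    phase_out_week (around week 6 for college football, week 10 for the
    NFL).  The linear shape here is just one possible choice.
    """
    if current_week <= 0:
        return 1.0
    if current_week >= phase_out_week:
        return 0.0
    return 1.0 - current_week / phase_out_week
```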
An Ensemble of Ratings
As I wrote earlier, my system actually calculates many sets of ratings, each being unique and slightly different. This could be called an ensemble of ratings. How large does the ensemble need to be to get accurate ratings? I'm not entirely sure.
Even running the ratings a single time does a very good job of distinguishing good and bad teams. For the NFL, the best teams are rated about 20 points higher than the worst. For an ensemble of 100 ratings, a quick look at one team shows the highest rating is about a point above the lowest. If there are a few teams with similar ratings, the difference between the highest and lowest ratings might be enough to move a team up or down a couple of spots. However, because I use the median rating, increasing the ensemble size from 10 to 30, or 30 to 50, or 50 to 100, doesn't really shift the ratings up or down more than a small fraction of a point.
One thing I have noticed is that if I calculate the standard deviation of the ensemble of ratings for each team, the highest standard deviations are about twice as large as the lowest. This is the case even if I have an ensemble of 100 ratings, so I'm not convinced this is pure chance. I'm not certain, but if there's a larger standard deviation for a team's rating, it probably means there's more uncertainty about how good that team really is. This may be useful information, but I'm not sure yet.
Back to the question: how large of an ensemble of ratings do I really need? Larger is better, but I also need to get results in a timely manner. For the NFL ratings, there are a few hundred games, so they don't take especially long. I'll probably use an ensemble of 50 or 100 rating attempts, and I suspect that's more than enough. For college football, there are roughly 10 times as many games as in the NFL data, so it requires a lot more computing time. I'll probably limit this to 10 or 15 attempts, sacrificing some accuracy to make sure I get the results in a timely manner.
Preseason Ratings
As a reminder, these ratings are also the final 2024 ratings, because I don’t adjust them for the upcoming season based on returning players. Delaware and Missouri State are moving up to the FBS for 2025, so they’re included in these ratings.
Overall Ratings
Home advantage: 3.12 points
Mean score: 26.36 points
Rank Rating Team Offense Defense
1 78.10 Ohio State 37.33 40.69
2 75.04 Notre Dame 37.76 37.20
3 73.42 Ole Miss 36.71 36.56
4 73.02 Texas 34.95 38.33
5 71.72 Indiana 38.45 33.15
6 71.14 Alabama 35.82 35.35
7 70.04 Tennessee 34.45 35.53
8 69.66 Penn State 32.91 36.70
9 69.29 Georgia 34.61 34.70
10 67.98 Oregon 36.10 31.86
11 66.34 South Carolina 31.61 34.73
12 65.04 Miami 40.49 24.55
13 64.18 Louisville 35.32 28.82
14 63.44 Arizona State 32.56 30.85
15 63.26 SMU 34.80 28.71
16 62.48 LSU 32.55 29.91
17 62.43 BYU 30.26 32.23
18 62.27 Florida 30.11 32.15
19 62.25 USC 32.65 29.51
20 62.04 Colorado 32.07 30.01
Rank Rating Team Offense Defense
21 61.92 Clemson 33.46 28.19
22 61.34 Texas A&M 30.76 30.46
23 60.73 Minnesota 26.14 34.51
24 60.65 Iowa 29.17 31.48
25 60.02 Iowa State 29.44 30.56
26 59.66 Kansas State 30.14 29.30
27 58.98 Baylor 33.68 25.35
28 58.94 Michigan 24.86 34.25
29 58.83 Kansas 30.44 28.47
30 58.22 Missouri 26.68 31.47
31 57.92 Virginia Tech 27.26 30.75
32 57.65 Oklahoma 26.21 31.49
33 57.50 Boise State 31.42 26.07
34 57.24 Tulane 29.87 27.48
35 56.75 Vanderbilt 27.57 28.84
36 56.57 TCU 30.13 26.36
37 55.99 Arkansas 28.73 27.23
38 55.77 Illinois 27.77 28.01
39 55.68 Auburn 26.05 29.88
40 55.09 UNLV 29.23 25.84
Rank Rating Team Offense Defense
41 54.92 UCF 30.85 23.97
42 54.85 Nebraska 22.29 32.69
43 54.30 Georgia Tech 26.97 27.33
44 53.62 Utah 21.16 32.42
45 53.50 Washington 23.48 29.96
46 52.99 Texas Tech 34.13 18.79
47 52.89 Wisconsin 23.24 29.71
48 52.59 Boston College 26.27 26.41
49 52.57 Syracuse 30.69 21.78
50 52.51 Kentucky 21.23 31.20
51 52.42 Army 22.59 29.82
52 52.37 Cincinnati 24.03 28.17
53 51.81 Rutgers 28.64 23.13
54 51.08 Navy 25.62 25.17
55 50.70 Pittsburgh 28.16 22.76
56 50.40 UCLA 21.47 28.88
57 50.11 Marshall 25.46 24.46
58 49.08 California 20.52 28.54
59 48.88 Texas State 27.98 20.87
60 48.76 Memphis 26.78 22.05
Rank Rating Team Offense Defense
61 48.74 Duke 22.71 25.89
62 48.34 West Virginia 26.91 21.34
63 47.92 James Madison 23.13 24.79
64 47.67 North Carolina 27.21 20.23
65 47.58 Mississippi State 27.64 20.04
66 46.73 Ohio 20.25 26.44
67 46.64 Maryland 25.16 21.45
68 45.96 Washington State 28.06 17.87
69 45.67 Virginia 20.81 24.88
70 45.32 Houston 15.16 30.22
71 45.19 Oklahoma State 26.23 18.87
72 44.55 Old Dominion 23.20 21.47
73 44.49 South Alabama 24.50 20.21
74 44.21 UConn 22.08 22.09
75 43.99 Miami (OH) 17.69 26.30
76 43.99 NC State 24.99 19.24
77 43.92 Northwestern 18.81 25.10
78 43.76 Georgia Southern 22.50 21.08
79 43.71 Louisiana 22.30 21.49
80 43.62 Michigan State 17.86 25.76
Rank Rating Team Offense Defense
81 43.09 Jacksonville State 26.99 16.10
82 42.54 Northern Illinois 14.73 28.16
83 41.88 Toledo 20.78 21.20
84 41.85 Bowling Green 18.97 22.86
85 41.47 Arizona 20.43 20.96
86 40.53 South Florida 23.15 17.35
87 40.48 Stanford 20.92 19.49
88 40.25 Fresno State 19.38 20.68
89 39.60 UTSA 23.68 15.57
90 39.60 Wake Forest 21.50 18.10
91 39.55 Florida State 14.56 25.20
92 39.37 East Carolina 21.62 17.81
93 39.20 North Texas 25.53 13.52
94 39.01 San José State 20.86 18.06
95 38.63 Sam Houston 15.17 23.47
96 37.10 App State 21.03 16.08
97 36.92 Nevada 17.79 19.24
98 36.75 Coastal Carolina 21.14 15.54
99 36.59 Western Kentucky 16.86 19.67
100 36.53 Rice 15.77 20.85
Rank Rating Team Offense Defense
101 35.99 Buffalo 19.02 16.89
102 35.91 Colorado State 16.46 19.32
103 35.83 Troy 18.00 17.89
104 35.44 Oregon State 17.46 17.86
105 34.53 UL Monroe 15.08 19.27
106 34.26 Florida International 15.73 18.36
107 34.10 Western Michigan 20.77 13.18
108 33.95 Liberty 16.07 17.95
109 33.83 Utah State 25.60 8.11
110 33.78 Air Force 11.64 22.18
111 33.49 Georgia State 18.50 15.05
112 33.46 Wyoming 12.10 21.05
113 33.45 New Mexico 26.30 7.16
114 32.47 Arkansas State 18.31 14.42
115 32.05 Louisiana Tech 10.46 21.61
116 31.73 Charlotte 17.56 14.20
117 31.50 Eastern Michigan 16.18 15.14
118 31.49 Hawai'i 11.94 19.68
119 30.99 UAB 18.84 12.17
120 30.44 Purdue 17.92 12.55
Rank Rating Team Offense Defense
121 30.30 Missouri State 19.89 10.29
122 29.57 Florida Atlantic 15.86 13.83
123 29.53 Central Michigan 15.29 14.16
124 29.18 Akron 14.70 14.50
125 28.88 San Diego State 12.66 16.22
126 27.97 Delaware 16.57 11.43
127 26.23 Ball State 18.95 7.15
128 25.97 Massachusetts 16.77 9.04
129 25.35 Temple 11.67 13.68
130 22.77 UTEP 12.43 10.37
131 20.32 New Mexico State 12.47 7.85
132 20.14 Middle Tennessee 10.33 9.73
133 19.90 Kennesaw State 7.41 12.53
134 18.82 Southern Miss 9.12 9.71
135 15.19 Tulsa 13.47 1.61
136 10.06 Kent State 7.80 2.25
Weekend Game Predictions
There are five games this weekend, so here are predictions for those five games using the computer ratings.
After the team name, there are two numbers. The first is the team’s expected margin of victory (positive) or defeat (negative). The second number is the team’s probability of winning the game.
The estimated score is based on the offense and defense ratings and may not exactly match up with the predicted margin of victory, though it’s very close. The main idea is to give a rough approximation of what score might be expected based on the quality of the two teams, and whether it will be a high or low scoring game.
I try to estimate the quality of the game based on the ratings of the two teams and the likelihood of a competitive game. This is maximized when two highly-rated teams play and are evenly matched. Finally, I show the probability of a blowout, a close game, a high scoring game, and a low scoring game, with the thresholds based on previous games used to train the rating system.
The games are listed in order of their expected quality, not the start time.
1. Iowa State (0.36, 51.17%) vs. Kansas State (-0.36, 48.83%)
Estimated score: 26.50 - 25.94, Total: 52.44
Quality: 97.48%, Team quality: 96.25%, Competitiveness: 99.98%
Blowout probability (margin >= 34.0 pts): 0.54%
Close game probability (margin <= 8.5 pts): 51.29%
High scoring probability (total >= 79.1 pts): 23.53%
Low scoring probability (total <= 26.4 pts): 24.00%
2. Sam Houston (-1.08, 46.48%) at Western Kentucky (1.08, 53.52%)
Estimated score: 20.30 - 21.31, Total: 41.61
Quality: 90.97%, Team quality: 86.86%, Competitiveness: 99.80%
Blowout probability (margin >= 34.0 pts): 0.56%
Close game probability (margin <= 8.5 pts): 51.14%
High scoring probability (total >= 79.1 pts): 15.51%
Low scoring probability (total <= 26.4 pts): 33.98%
3. Stanford (5.86, 68.42%) at Hawai'i (-5.86, 31.58%)
Estimated score: 26.03 - 20.38, Total: 46.41
Quality: 88.47%, Team quality: 85.69%, Competitiveness: 94.31%
Blowout probability (margin >= 34.0 pts): 1.12%
Close game probability (margin <= 8.5 pts): 46.54%
High scoring probability (total >= 79.1 pts): 18.82%
Low scoring probability (total <= 26.4 pts): 29.36%
4. Fresno State (-21.71, 3.79%) at Kansas (21.71, 96.21%)
Estimated score: 15.70 - 37.69, Total: 53.40
Quality: 70.65%, Team quality: 91.85%, Competitiveness: 41.81%
Blowout probability (margin >= 34.0 pts): 15.73%
Close game probability (margin <= 8.5 pts): 13.33%
High scoring probability (total >= 79.1 pts): 24.34%
Low scoring probability (total <= 26.4 pts): 23.21%
5. Idaho State (-38.98, 0.07%) at UNLV (38.98, 99.93%)
Estimated score: 16.11 - 55.13, Total: 71.25
Quality: 30.68%, Team quality: 81.77%, Competitiveness: 4.32%
Blowout probability (margin >= 34.0 pts): 65.81%
Close game probability (margin <= 8.5 pts): 0.63%
High scoring probability (total >= 79.1 pts): 41.60%
Low scoring probability (total <= 26.4 pts): 11.21%
After this weekend’s games, I’ll update the ratings and predict the remaining week 1 games. I probably won’t keep posting new articles with ratings every weekend during the season, and I’ll instead add a static page that will always have updated ratings. But I will post an article with predictions for the remainder of week 1, and I’ll do the same for the first week of NFL action.
This system is still a work in progress, and I intend to verify its accuracy later in the season, when the ratings aren't based heavily on last year's games. As I mentioned earlier, I'll post my code for both college football and NFL data on GitHub once I get it cleaned up and properly commented.
Bring on some football!
These ratings and predictions are based on data from collegefootballdata.com.