Wednesday, Feb. 21, 2024
The Observer

Addressing ZeLO’s flaws

Earlier this week, when I wrote the “ZeLO Picks: Week 2” column, I mentioned that despite it only being Week 2 (or 3, if you count the ever-annoying Week 0), ZeLO had some problems and shortcomings.

And that is entirely true. As good as a 77.3% pick percentage might sound, even when compared to ESPN’s FPI at 84%, there is an enormous chasm between those numbers that goes beyond the 6.7-point difference. I hope to address some of those problems and then offer a few different solutions.

First things first, I track both ZeLO’s and FPI’s performance in more ways than just their win-loss records. While the record is a strong indicator of correctness, there is a difference between me giving someone a 51% chance of winning and me giving someone a 90% chance of winning.

If I am repeatedly more confident (and correct) about a result, I am making a better forecast about the given outcome. 

Well, the same is true of FPI and ZeLO. While ZeLO has agreed with FPI on the vast majority of the games the two models have picked, FPI tends to be more confident in the teams it picks to win.

Take, for example, the Week 2 matchup of Marshall vs. Norfolk State. FPI gave Marshall a 99.3% chance of victory, while ZeLO only gave Marshall an 85.5% chance.

Both models got the pick right, but on the sliding scale of correctness, FPI was “more right.” And there are a lot of instances where that is the case; both models get it right, but ZeLO is a lot less confident in its pick. 

There is a mathematical way to measure this, called a Brier Score or Net Brier Points. Essentially, Brier Points are scored by taking the prediction you made and running it through the following formula: 

Brier Points = 25 - ((Predicted Outcome - 100)^2 / 100), where Predicted Outcome is the percent chance you gave to the team that actually won.

So, for example, using our Marshall outcome, since the Thundering Herd won the game, FPI would have scored 24.9951 Brier Points, while ZeLO would have scored just 22.8975. That difference might not seem like a lot, given that it is not even a full three-point difference, but through the first week of college football, the difference in Brier Points is almost 285 points.
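For readers who want to check that arithmetic, here is a minimal Python sketch of the formula. The function name `brier_points` and its arguments are illustrative labels, not anything from ZeLO’s or FPI’s actual code, and treating a missed pick as an outcome of 0 is an assumption about how the formula extends to losses.

```python
def brier_points(win_prob_pct: float, pick_won: bool) -> float:
    """Score one pick with the formula quoted in the column.

    win_prob_pct is the percent chance (0-100) given to the picked team.
    The outcome is treated as 100 if that team won and 0 if it lost;
    the 0 case is an assumption, not something the column spells out.
    """
    outcome = 100.0 if pick_won else 0.0
    return 25.0 - (win_prob_pct - outcome) ** 2 / 100.0


# Marshall vs. Norfolk State, using the probabilities quoted in the column.
fpi_points = brier_points(99.3, pick_won=True)
zelo_points = brier_points(85.5, pick_won=True)

print(f"FPI:  {fpi_points:.4f}")   # 24.9951
print(f"ZeLO: {zelo_points:.4f}")  # 22.8975
```

Under this formula, a 50% pick scores zero whether it wins or loses, so a model only gains or loses points by actually taking a side.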

FPI is simply so much more confident in its picks that it racks up a large total. I expect FPI’s confidence to lessen as teams like Alabama stop playing Utah State and start playing games where the outcome is more questionable. Still, I think the Brier Point gap is not going anywhere soon.

ZeLO’s caution does help it a little bit: when both models have been wrong, ZeLO takes less of a blow (the Brier Score formula I am using is 538’s, which tends to punish overconfidence).

Had Marshall been upset, the same formula would have cost FPI about 73.6 points while ZeLO would have lost only about 48.1.

There are two fixes that I currently see. The first is also a separate minor issue: FCS teams. I think I rated FCS teams a little too highly, and as a result, ZeLO has much less confidence in picking against them than FPI does. That is an easy enough fix and something I plan to incorporate next off-season.

The second is that I think I weighted a lot of the raw statistics too lightly, so there is not enough separation between Alabama and Utah State. That fix is a bit more complicated, but I plan to adjust that too.

ZeLO’s lack of confidence rolls right into the second glaring issue I see in the Week 0 and Week 1 data. The second method I use to track ZeLO’s stats is sorting picks into four categories and seeing how ZeLO’s win-loss record is doing in each of them.

The categories are Toss-Up (50%-59%), Lean (60%-74%), Likely (75%-94%), and Solid (95%+).

Currently, ZeLO is .523 in the Toss-Up category, .686 in the Lean bucket, .950 in the Likely grouping, and 100% for the Solid category.
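The bucketing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `BUCKETS`, `bucket_for`, and `record_by_bucket` are hypothetical, the sample picks are made up rather than taken from ZeLO’s results, and treating each range as "at or above the lower bound" (so 59.5% still counts as a Toss-Up) is a guess at how the edges are handled.

```python
from collections import defaultdict

# Lower bounds (percent) for each bucket, checked from most to least confident.
BUCKETS = [(95.0, "Solid"), (75.0, "Likely"), (60.0, "Lean"), (50.0, "Toss-Up")]


def bucket_for(win_prob_pct: float) -> str:
    """Return the category name for the win probability given to the picked team."""
    for lower_bound, name in BUCKETS:
        if win_prob_pct >= lower_bound:
            return name
    raise ValueError("a pick should carry at least a 50% win probability")


def record_by_bucket(picks):
    """picks: iterable of (win_prob_pct, pick_won) pairs -> {bucket: [wins, losses]}."""
    records = defaultdict(lambda: [0, 0])
    for win_prob_pct, pick_won in picks:
        records[bucket_for(win_prob_pct)][0 if pick_won else 1] += 1
    return dict(records)


# Made-up picks for illustration only; these are not ZeLO's actual results.
sample_picks = [(52.0, True), (57.5, False), (68.0, True), (85.5, True), (99.3, True)]

for name, (wins, losses) in record_by_bucket(sample_picks).items():
    print(f"{name}: {wins}-{losses} ({wins / (wins + losses):.3f})")
```

If a model is well calibrated, the winning percentage in each bucket should land inside that bucket’s probability range, which is the yardstick the categories are meant to provide.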

At first glance, the issue might seem to be the Toss-Up category, which is underperforming at 12-12. While I would like to see that rise to around 55%, a 50% performance is okay.

No, the main problem is that the Likely category is at 97% when, realistically, it should be a lot closer to 85%. This suggests that ZeLO has a classification problem, which is just a symptom of its confidence problem. Instead of classifying 44 games as Likely, ZeLO should have rated some of those matchups as Solid. I think the fixes for this problem are the same as before: adjust the FCS ratings and the statistical weighting.

The third problem, which I have already talked about in other columns, is the fondness ZeLO seems to have for Group of 5 teams when they match up against Power 5 teams.

Because ZeLO is working off of stats, and because Group of 5 teams perform against each other much as Power 5 teams do against each other, the difference between the two groups is somewhat hidden. When they face off, ZeLO acts like it is two peers playing one another. One solution is to tweak the recruiting numbers to recognize either the quality or the quantity of players Power 5 teams are getting.

Another solution is to be a bit more aggressive with the strength-of-schedule component and add a distinction between playing Power 5 teams and Group of 5 teams.

The fourth and final issue is how ZeLO plays against the spread. ZeLO is 33.6% against the spread, which is an atrocious mark.

In part, I think ZeLO’s performance against the spread will improve when the lines are not Minnesota -36. ZeLO is not built to work against those kinds of spreads; a significant percentage of ZeLO’s losses against the spread come from games with spreads of 25+ points.

In fact, I am already seeing some improvement. In Week 0, ZeLO went 5-6 against the spread (.455), then 28-54 in Week 1 (.341), and this weekend it went 31-50 (.383). The difference could just be noise in a small sample, but I do think ZeLO’s performance against the spread will improve.

And, of course, by addressing the other issues I have mentioned, I think ZeLO should also improve against the spread. 

Contact Tom Zwiller at