Imagine you’re a pitcher in the majors and you’ve done everything you can to prepare for a game so that you’re in the best possible condition. You’ve gone through the game plan with the catcher and coaches, so you know which hitters to attack and which to avoid. You’ve done all that prep, and then you find out who the home plate umpire is for the day and your plan needs to change drastically. Why? I’ll tell you why.
The difference in the strike zones of some umpires is quite significant, with the extremes adding or removing about half a run per game from the average, just from differences on called balls and strikes.
How do we know this?
This builds on work I did previously, where I used Statcast data to build a model that estimates the probability of a strike call based on the location of the pitch. Using that model, I could compare actual strike calls to predicted strike probabilities, but instead of looking at how well catchers performed, I looked at umpires, to see whether any of them had a strike zone different from the average. I built these comparisons for the 2017, 2018 and 2019 seasons.
With that, I totalled the number of called strikes and balls each umpire had given compared to the expected league-average strike zone. I then converted that total into a run value based on the count, and divided by the number of games umpired, to give a run value per game for each umpire.
So an umpire with a larger-than-average strike zone will have negative runs per game, as more strikes are called than average, whilst an umpire with a smaller-than-average zone will have positive runs per game due to the extra balls.
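The calculation described above can be sketched in a few lines of Python. Everything here is illustrative: the run-value table contains placeholder numbers (a real implementation would use the run-expectancy difference between a ball and a called strike for every count), and the input format is an assumption, not the actual data pipeline.

```python
# Approximate run value (to the batting team) of a called strike,
# keyed by (balls, strikes) count -- placeholder numbers only.
# A full implementation would have one empirical entry per count.
RUN_VALUE_OF_STRIKE = {
    (0, 0): -0.08,
    (3, 0): -0.20,
    (0, 2): -0.04,
}

def umpire_runs_per_game(calls, games_umpired):
    """calls: list of dicts with keys
         'count'         -> (balls, strikes) before the pitch,
         'called_strike' -> bool (the umpire's actual call),
         'strike_prob'   -> float (model's expected strike probability).
    Returns run value per game relative to the league-average zone:
    negative = bigger-than-average zone (extra strikes),
    positive = smaller-than-average zone (extra balls)."""
    total_runs = 0.0
    for pitch in calls:
        # Strikes above expectation: 1 if called a strike, 0 if a ball,
        # minus the model's probability of a strike at that location.
        extra_strikes = (1.0 if pitch["called_strike"] else 0.0) - pitch["strike_prob"]
        total_runs += extra_strikes * RUN_VALUE_OF_STRIKE.get(pitch["count"], -0.08)
    return total_runs / games_umpired
```

Note the sign convention matches the text: an umpire who calls strikes on pitches the model rates as coin flips accumulates negative runs per game.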

Above is the average runs-per-game difference for all umpires who called at least 15 games from home plate in a season, which came to 83 umpires in each of the three seasons. You can see that the extremes in all three seasons sit at or beyond +/- 0.5 runs, meaning the difference one could expect to see between the umpires at the two extremes is about a run per game.
But is this an issue if we can predict it?
Consistency is the key
When I first found out that umpires haven’t historically done a good job of calling the strike zone, I was initially disappointed. But I came to the realisation that the historical purpose of the umpire is to make sure the ball gets hit, or at least swung at, as much as possible. I came to this conclusion partly because MLB’s own definition of the strike zone isn’t a precise one.
The Major League Official Rules define the top of the strike zone as the midpoint between the top of the batter’s shoulders and the top of the uniform pants, and the bottom as the hollow beneath the kneecap, both determined from the batter’s stance as the batter prepares to swing at the pitched ball.
I concluded that I care more about consistency than about what the actual zone is. Then I found out that the average umpire’s zone changes from count to count, and I was disappointed again.
This compassionate bias towards whichever player is struggling in the count means that identical pitches thrown to the edges of the zone in 3-0 and 0-2 counts often get completely different calls. In the modern baseball analytics world, every team knows that there is a different zone for each count, so they should be able to prepare for that.
Are umpires’ zones consistent?
For umpires with more than 15 games in successive seasons, the year-to-year Pearson correlation coefficient is 0.61, showing that one season’s runs per game is a pretty good predictor of the next season’s.
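The year-to-year check is just a Pearson correlation over paired seasons of umpire runs per game. Here is a self-contained version using only the standard library; the pairing of each umpire’s two seasons into `xs` and `ys` is assumed to have been done already.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between paired samples, e.g. each umpire's
    runs per game in season N (xs) against season N+1 (ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance term and the two standard-deviation terms
    # (the 1/n factors cancel, so they are omitted throughout).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```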

So it is reasonably consistent season to season, but how about within a season? What is the game-to-game variance for each umpire? The average standard deviation of umpire runs per game in 2019 was 0.57 (the mean is 0 runs by design), and even the umpires with the largest differences have standard deviations in that ballpark.
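The within-season figure is simply the standard deviation of each umpire’s per-game run impact, averaged across umpires. A minimal sketch, with a hypothetical input shape:

```python
from statistics import stdev

def average_game_to_game_sd(runs_by_umpire):
    """runs_by_umpire: dict mapping umpire name -> list of per-game
    run impacts for one season (each list needs at least two games).
    Returns the average, across umpires, of each umpire's
    game-to-game sample standard deviation."""
    sds = [stdev(games) for games in runs_by_umpire.values()]
    return sum(sds) / len(sds)
```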
Who are they then?
For 2019, here are the umpires with the highest and lowest runs per game impact (based on their calling of the strike zone). A lot of these umpires have been in the top or bottom ten for all three of the seasons that I looked at.


Is this useful?
This info isn’t really useful to many people outside of teams. It could be interesting for fans watching, but where I do see some potential impact is in the daily fantasy sports (DFS) world.
If you are unfamiliar with DFS, the contests are daily and typically utilise a salary cap format, in which players are allotted a maximum budget to spend on athletes for their team, represented as either play money or points. Each athlete has their own cost, with elite athletes having the highest costs.
The contests are split into multiple formats with the most popular being the following two categories: cash games and guaranteed prize pool (GPP). In “Double-up” or “50/50” cash game competitions, players win a prize equal to double their entry fee if they finish with a score within the top 50% of all participants. Guaranteed prize pool contests have higher stakes, using tiered payouts based on finishing in different percentiles or positions of the field of contestants.
In this world, the top players look for the tiniest details to gain an edge over their competitors. Daily fantasy contests are often won by a minority of skilled professionals who employ elaborate statistical models and automated tools that can manage hundreds of entries at once and identify the weakest opponents. So these models should include who the umpire is, and if they don’t, their owners may be losing out to the competition.
For the general public, this may be something to consider if you want to see a pitching duel or a slugfest, though you should be aware that teams will already know about these differences.
The most inconsistent
On a side note, one reason I like this metric is that the umpire with the largest game-to-game standard deviation of runs per game is Angel Hernandez. In other words, this metric says that Angel Hernandez is the most inconsistent home plate umpire, and on that most of us can agree.