Umpires are more biased than I thought

In the midst of its postseason, and probably unbeknownst to most of its fans, MLB has taken its next step toward using an automated ball-strike system, or “Robo Umps” as I like to call them.

This was first tested in the Atlantic League, which is an independent organization. MLB has now taken the next step and will try it with some of the top prospects in the game via the Arizona Fall League. The system will be used in all games played at Salt River Fields, which is the home of two fall league teams, the Salt River Rafters and Scottsdale Scorpions.

The feature image is home plate umpire Brian deBrauwere checking an iPhone while wearing an earpiece, prior to the start of the Atlantic League All-Star minor league baseball game. The umpire received information about balls and strikes with the device connected to a TrackMan computer system that uses Doppler radar.

To me this is a very important upcoming change for MLB, and one which might significantly change how baseball is played. The reason I personally believe this should be introduced is not solely the idea of making sure we have a perfect strike zone, but more the fact that Major League umpires are biased in their calls (beyond the known bias around count, which I have blogged about before).

Firstly, let me say that a baseball umpire's job is inherently difficult and that most people would not only struggle but fail miserably to do what they do. But I do believe there is some bias in their decisions. It may be conscious or unconscious, but it does exist. Let’s look at why I have come to this conclusion.

At the start of this season FanGraphs added catcher framing to its defensive metrics and therefore its WAR. In the piece that introduced pitch framing there was a sentence which caught my attention.

“I used a logistic mixed effects model (with batters and pitchers as random effects) to split up credit for extra strikes between pitchers, catchers and dumb luck (or, if you prefer, unattributed variance)”

What this means is that the writer and creator of the model, Jared Cross, wanted to determine how to split the credit for these framed pitches, and that he allowed for the batter and/or the pitcher to influence the call.

The pitch location and catcher framing should be the only two things which influence an umpire's decision on whether a pitch is a ball or a strike. However, if it can be shown that decisions are being influenced by who is hitting or pitching, then there is clearly some other bias in umpires, which for me is a much worse bias than the pitch count bias. I wanted to know if this is true or not, and to do that I had to build my own pitch framing model. Thankfully Jared detailed his method so I could follow it.

As stated before, we know that umpires call a different zone depending on the count. The blue contour lines in the images below show where strike calls are a coin flip for each count. This was based on right handers in 2018.

Knowing this, I built 24 generalized additive models (the 12 counts x 2 for each handedness) to estimate the probability of a strike based on the location of the pitch in the Statcast data. Using these I can compare the actual strike calls to the predicted strike probabilities. I built these models for 2017, 2018 and 2019. (For those interested, these GAMs are built using the base model in the mgcv package in R: Called_Strike ~ s(plate_x, plate_z).)
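For anyone who wants to see the bookkeeping, here is a minimal Python sketch of the idea. It is an illustration only, not the R code behind this post: it lists the 24 count-and-handedness combinations and shows how "extra strikes" fall out once a model has supplied a strike probability for each taken pitch. The function names and the sample pitches are invented for the example.

```python
from itertools import product

# The 12 possible counts: balls 0-3 crossed with strikes 0-2.
COUNTS = [(balls, strikes) for balls in range(4) for strikes in range(3)]
# Two handedness groups (the text does not say more than "each handedness").
HANDS = ["L", "R"]

def model_keys():
    """One key per model: (balls, strikes, handedness)."""
    return [(b, s, hand) for (b, s), hand in product(COUNTS, HANDS)]

def extra_strikes(pitches):
    """Sum of (actual call - predicted strike probability) over taken pitches.

    `pitches` is a list of (called_strike, predicted_prob) pairs, where
    called_strike is 1 for a called strike and 0 for a called ball.
    """
    return sum(called - prob for called, prob in pitches)

# Hypothetical pitches, invented purely for illustration.
sample = [(1, 0.40), (0, 0.10), (1, 0.95), (0, 0.60)]
print(len(model_keys()))
print(round(extra_strikes(sample), 2))
```

A positive total means more strikes were called than the location-only model expected, and a negative total means fewer.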

I also translated these extra strikes into run value based on the count, which can be roughly translated into saved runs at a rate of 0.135 saved runs per additional called strike.
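As a quick sketch of that rough conversion (assuming the flat 0.135 rate only, and ignoring the count-specific run values), the arithmetic looks like this in Python. The function name and the example figure of 100 extra strikes are hypothetical.

```python
# Rough flat rate quoted in the text: saved runs per additional called strike.
RUNS_PER_EXTRA_STRIKE = 0.135

def framing_runs(extra_strikes):
    """Convert a count of extra called strikes into approximate saved runs."""
    return extra_strikes * RUNS_PER_EXTRA_STRIKE

# Hypothetical catcher who gained 100 extra called strikes over a season.
print(round(framing_runs(100), 1))
```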

My framing model is going to start with the prior that only catchers, and no one else, are responsible for the difference we see. If that were the case, here are the called strikes gained and the framing runs for the top and bottom 10 catchers in each of the last 3 seasons.

Now, these framing runs don’t match up perfectly with the ones on FanGraphs, Sports Info Solutions or Baseball Prospectus, but the names at the top and the bottom do match up. So we know this model is roughly in line, and we can now start to investigate whether our prior is correct or not.

I want to see if who is pitching or batting makes a difference, but if we concentrate on individual pitchers or batters we may run into small sample size issues. Thankfully there is a group of hitters that we can combine, whose performance is significantly different to the league as a whole: pitchers.

Pitchers bat significantly worse than league average, but combined they see over 5000 plate appearances, which should give us a reasonable sample size to compare how they are treated. They account for around 2.8% of the called balls and strikes over the 3 seasons.

To see if there was any difference in their treatment I separated pitchers and non-pitchers then looked at their expected called strike rate and their actual called strike rate.

This suggests that pitchers are getting a slightly rougher deal from the umpires. But about 75% of pitches are virtual gimmes (called strike probability less than 10% or greater than 90%), and if we concentrate on just the 10%-90% pitches, where the umpire is making a real decision, the difference stands out even further.

For pitches with a called strike probability in the 10%-90% range, pitchers have been getting over 5% more strikes than the expected rate. For me, this is an alarmingly high difference.

If we bucket the pitches based on called strike probability (table above, for 2019), you can see that this difference occurs across the whole model but is most pronounced in the 30%-70% range, which is where umpires are making the difficult decisions.
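To make the comparison concrete, here is a small Python sketch of how the expected and actual called strike rates can be computed for one group of batters, with an optional filter for the 10%-90% pitches. It is an illustration under assumed inputs, not the code behind the numbers in this post, and the sample pitches are made up.

```python
def strike_rates(pitches, lo=0.0, hi=1.0):
    """Return (expected_rate, actual_rate) for pitches whose predicted
    strike probability lies in [lo, hi], or None if no pitch qualifies.

    `pitches` is a list of (called_strike, predicted_prob) pairs, where
    called_strike is 1 for a called strike and 0 for a called ball.
    """
    kept = [(called, prob) for called, prob in pitches if lo <= prob <= hi]
    if not kept:
        return None
    n = len(kept)
    expected = sum(prob for _, prob in kept) / n
    actual = sum(called for called, _ in kept) / n
    return expected, actual

# Hypothetical pitches taken by pitchers at the plate (invented data).
pitchers_batting = [(1, 0.98), (1, 0.55), (1, 0.45), (0, 0.30), (0, 0.02)]

print(strike_rates(pitchers_batting))              # all taken pitches
print(strike_rates(pitchers_batting, 0.10, 0.90))  # real decisions only
```

Running the same function on the non-pitchers and comparing the gap between actual and expected rates gives the like-for-like comparison described above.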
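The bucketing itself is simple to sketch. The Python below groups pitches into ten equal-width probability buckets and reports, for each bucket, the actual strike rate minus the expected one. The bucket width, the function name and the sample pitches are my assumptions for the example rather than the exact setup behind the table.

```python
from collections import defaultdict

def bucket_differences(pitches):
    """Map each 10-point probability bucket, keyed as (low %, high %),
    to actual strike rate minus expected strike rate in that bucket.

    `pitches` is a list of (called_strike, predicted_prob) pairs.
    """
    totals = defaultdict(lambda: [0, 0.0, 0])  # strikes, prob sum, pitches
    for called, prob in pitches:
        idx = min(int(prob * 10), 9)  # a probability of 1.0 joins the top bucket
        bucket = totals[(idx * 10, idx * 10 + 10)]
        bucket[0] += called
        bucket[1] += prob
        bucket[2] += 1
    return {
        key: strikes / n - prob_sum / n
        for key, (strikes, prob_sum, n) in sorted(totals.items())
    }

# Hypothetical pitches, invented purely for illustration.
sample = [(1, 0.35), (1, 0.38), (0, 0.32), (0, 0.85)]
for bucket, diff in bucket_differences(sample).items():
    print(bucket, round(diff, 3))
```

A positive value in a bucket means more strikes were called than the model expected for pitches of that difficulty.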

This would seem to imply that umpires are calling strikes more harshly on pitchers, who are players whose batting performance is significantly worse than average. This is different to the bias we see on count, as the umpires are generally compassionate there, shrinking the zone on pitchers' counts and expanding it on hitters' counts.

This is a disturbing finding for me, as it shows that umpires are influenced by who is in front of them. Since writing my piece last year I had softened to the idea of the “Robo Ump”, as I had come to appreciate the skill of catcher framing (being an Indians fan and watching a full season of Roberto Perez can do that to you), but these findings put me right back in the “we need an automated ball-strike system as soon as possible” group.

These are just my initial findings from looking into the called strike and framing models. I am going to continue to look into this data to see whether the trend extends across all players and whether there is any bias based on the pitcher. Also, I am happy for someone to come and tell me I have got this completely wrong and that this bias doesn’t exist or the difference isn’t significant enough, as it would restore some of my faith in umpires.

One comment

  1. You are right on and I think it has much more to do with who the umpires are betting on than anything else. If you think they’re not betting on the game then what excuse do you have for such incompetence!!!!
