Welcome back, Christopher Gauthier
Have you ever been at a game and a player smokes a ball right at someone? Like, had that ball been hit a little more to the left or the right of the fielder, it’s a no-doubt hit? Well, of course you have, it’s a routine occurrence in the sport of baseball. Chances are, after said ball was hit, either you or someone you know said something to the effect of “Man, anywhere else on the diamond and that ball is surely a hit!”
Now, have you ever been at a game and a batter hits a weak grounder that finds a hole, or a lazy blooper that falls just out of an outfielder’s reach? I’m talking weak contact, to the point where you don’t even want to reward the hitter with a base. Again, of course you have, it’s common in the game, a part of its fiber. Chances are, after said ball was hit, the words “Looks like a line drive in the books” were said by you or someone you were with.
When you see batted balls like these, you start to wonder if there’s a better way to evaluate hitting performance, to reward those who were unlucky and correctly knock down those who experienced greater fortune.
This is where expected stats come in.
Thanks to the implementation of MLB’s Statcast system in 2015, fans can now know what was previously unknown. Yes, you knew a ball was hit hard; now you know how hard, at what angle, and, thanks to some very smart people, the probability that a ball hit like that will be a hit in the future. These are the expected stats I refer to. From these hit probabilities based on batted-ball metrics, we can better estimate what the hitter’s batting average (as well as other rate metrics) “should have been” had he experienced neither good fortune nor misfortune.
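To make the idea concrete, here is a rough Python sketch of how hit probabilities could roll up into an expected batting average: add up each batted ball's hit probability and divide by at-bats, with strikeouts contributing zero. The probabilities below are made up for illustration (they are not Statcast values), the function name is hypothetical, and counting at-bats as batted balls plus strikeouts is a simplification.

```python
def expected_batting_average(hit_probabilities, strikeouts=0):
    """Sum of per-ball hit probabilities divided by at-bats (simplified)."""
    at_bats = len(hit_probabilities) + strikeouts
    if at_bats == 0:
        raise ValueError("need at least one at-bat")
    return sum(hit_probabilities) / at_bats


# Hypothetical at-bats for one hitter (illustrative values, not real data):
# (hit probability, did the ball actually fall for a hit?)
batted_balls = [
    (0.85, False),  # smoked liner hit right at a fielder
    (0.10, True),   # weak blooper that dropped in
    (0.45, False),  # hard grounder that was fielded
    (0.05, False),  # routine pop-up
]
strikeouts = 1

at_bats = len(batted_balls) + strikeouts
ba = sum(1 for _, was_hit in batted_balls if was_hit) / at_bats
xba = expected_batting_average([p for p, _ in batted_balls], strikeouts)

print(f"BA  = {ba:.3f}")
print(f"xBA = {xba:.3f}")
```

In this toy example the expected average comes out higher than the actual one, which is the “unlucky” case: the hard-hit balls found gloves and only the blooper dropped.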
Baseball Savant could be considered the “Statcast” version of the MLB stats page. Along with your standard baseball statistics, it provides much more: innovative visualizations, new metrics that better evaluate performance, and a search function that can show you the information for every batted ball from the 2015 season onward. Falling under the category of new metrics are expected stats.
The main analysis that one can conduct with these stats is measuring the difference between a player’s actual rate stat and its expected counterpart, e.g. a hitter’s batting average (BA) and expected batting average (xBA).
Note: I find batting average to be the most useful stat to analyze using its expected stat because it is the most flawed of the most common hitting rate stats; additionally, the scenarios I described at the beginning would best be investigated with batting average.
First, we take a look at the “luckiest” hitters from the 2019 season (qualified hitters), in terms of the difference between their BA and xBA (the “luckiest” players will have a BA higher than their xBA, meaning they may have achieved more hits for their batted balls than they can expect in the future):
Player | BA | xBA | BA – xBA |
Nolan Arenado | .315 | .272 | .043 |
Tim Anderson | .335 | .294 | .041 |
Kris Bryant | .282 | .246 | .036 |
Leury Garcia | .279 | .247 | .032 |
Ketel Marte | .329 | .299 | .030 |
Trevor Story | .294 | .264 | .030 |
Daniel Murphy | .279 | .250 | .029 |
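For anyone who wants to reproduce the last column, the calculation is just a subtraction and a sort. Here is a minimal Python sketch using the BA and xBA values from the table above; the names `hitters` and `ba_minus_xba` are my own, and pulling the full leaderboard from Baseball Savant is left out.

```python
# (player, BA, xBA) copied from the 2019 table above
hitters = [
    ("Nolan Arenado", 0.315, 0.272),
    ("Tim Anderson", 0.335, 0.294),
    ("Kris Bryant", 0.282, 0.246),
    ("Leury Garcia", 0.279, 0.247),
    ("Ketel Marte", 0.329, 0.299),
    ("Trevor Story", 0.294, 0.264),
    ("Daniel Murphy", 0.279, 0.250),
]


def ba_minus_xba(ba, xba):
    """Positive means more hits than the batted balls 'deserved'."""
    # Round to three decimals to match how averages are reported.
    return round(ba - xba, 3)


# Sort from largest gap ("luckiest") to smallest.
ranked = sorted(
    ((name, ba_minus_xba(ba, xba)) for name, ba, xba in hitters),
    key=lambda row: row[1],
    reverse=True,
)

for name, gap in ranked:
    print(f"{name:<15} {gap:+.3f}")
```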
To be honest, when I first explored this dataset, I expected to see relatively speedy hitters towards the top of this “luckiest” list, and there are some present here. Of the 156 qualified base runners from 2019, here are the sprint speeds and rankings for these players:
Player | Sprint Speed (ft/s) | Ranking |
Nolan Arenado | 25.9 | 127th |
Tim Anderson | 28.7 | 20th |
Kris Bryant | 28.2 | 34th |
Leury Garcia | 28.5 | 25th |
Ketel Marte | 27.9 | 49th |
Trevor Story | 29.2 | 8th |
Daniel Murphy | 25.7 | 134th |
However, what I found most interesting was the presence of some of the Colorado Rockies’ best hitters on this list: three of the top seven, in fact (Arenado, Story, and Murphy).
On the flip side, here are the “unluckiest” hitters from the 2019 season in terms of BA-xBA:
Player | BA | xBA | BA – xBA |
Marcell Ozuna | .241 | .288 | -.047 |
Justin Smoak | .208 | .250 | -.042 |
Jurickson Profar | .218 | .251 | -.033 |
Lorenzo Cain | .260 | .290 | -.030 |
Robbie Grossman | .240 | .265 | -.025 |
C.J. Cron | .253 | .277 | -.024 |
Rougned Odor | .205 | .229 | -.024 |
On this list we see some slower players, so here are the sprint speeds and their respective rankings for each of these players:
Player | Sprint Speed (ft/s) | Ranking |
Marcell Ozuna | 27.4 | 72nd |
Justin Smoak | 23.5 | 154th |
Jurickson Profar | 26.7 | 100th |
Lorenzo Cain | 27.8 | 50th |
Robbie Grossman | 27.5 | 68th |
C.J. Cron | 26.1 | 123rd |
Rougned Odor | 28.0 | 44th |
Notice how, throughout this article so far, I have put the words “luckiest” and “unluckiest” in quotation marks, because the differences in these stats may not be entirely attributable to fortune. The sprint speeds suggest that more aspects of the game than pure luck may be affecting these statistics.
For this reason, I decided to run a regression analysis to model sprint speed’s effect on the gap between a player’s batting average and expected batting average (BA-xBA). For this part of the article, however, I opened it up to all qualified hitting seasons from 2015 onward, to give us a greater sample size and hopefully some better insights. Here are the results, in graph form with some accompanying statistics:
At a glance, the graph shows no clear relationship between sprint speed and BA-xBA, and the summary statistics confirm that there isn’t an obvious one. Intuitively, though, the theory makes sense: a weakly hit ball has a better chance of becoming a hit if the batter has the legs to make it happen. Now, I’m afraid I’ve already plunged too far down the rabbit hole, but maybe the effects of other statistics on BA-xBA deserve a closer look in another article (the distinction between a “hitter’s ballpark” and a “pitcher’s ballpark” as one’s home stadium, as an input into this model, is of particular interest to me).
For a simpler application, I would like to direct your attention to a particularly hefty contract that was awarded to a player who may not have deserved it.
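For readers who want to try this kind of fit themselves, here is a minimal Python sketch of a simple least-squares regression of BA-xBA on sprint speed. It is not the regression behind the graph above: as a stand-in for the full 2015-onward dataset, it uses only the 14 player-seasons from the 2019 tables in this article, a tiny sample chosen precisely for its extreme BA-xBA values, so its output should not be read as evidence either way. The names `sample` and `fit_line` are my own.

```python
# (player, sprint speed in ft/s, BA - xBA) taken from the 2019 tables in this article.
# Assumption: this hand-picked sample only illustrates the mechanics of the fit.
sample = [
    ("Nolan Arenado", 25.9, 0.043),
    ("Tim Anderson", 28.7, 0.041),
    ("Kris Bryant", 28.2, 0.036),
    ("Leury Garcia", 28.5, 0.032),
    ("Ketel Marte", 27.9, 0.030),
    ("Trevor Story", 29.2, 0.030),
    ("Daniel Murphy", 25.7, 0.029),
    ("Marcell Ozuna", 27.4, -0.047),
    ("Justin Smoak", 23.5, -0.042),
    ("Jurickson Profar", 26.7, -0.033),
    ("Lorenzo Cain", 27.8, -0.030),
    ("Robbie Grossman", 27.5, -0.025),
    ("C.J. Cron", 26.1, -0.024),
    ("Rougned Odor", 28.0, -0.024),
]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


speeds = [speed for _, speed, _ in sample]
gaps = [gap for _, _, gap in sample]
slope, intercept, r_squared = fit_line(speeds, gaps)

print(f"slope (change in BA-xBA per ft/s): {slope:.4f}")
print(f"intercept: {intercept:.4f}")
print(f"R^2: {r_squared:.3f}")
```

With the full dataset, the same function would take one sprint speed and one BA-xBA value per qualified hitting season; extra inputs such as home ballpark would call for a multiple regression instead.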
After a career year in 2017, Eric Hosmer signed an eight-year, $144 million contract with the San Diego Padres. In this career year, Eric Hosmer set career-highs in batting average, on-base percentage, slugging percentage, and weighted on-base average. Displayed below are those stats, with the accompanying rankings among qualified hitters:
Statistic | Value | Ranking |
AVG | .318 | 9th |
OBP | .385 | 15th |
SLG | .498 | 43rd |
wOBA | .376 | 26th |
I would argue that these statistics alone don’t warrant the contract Hosmer received, but in a free-agency period with few options at first base, overpaying for a potential franchise pillar could seem reasonable. However, San Diego’s front office seemed to be looking at all the wrong evaluation metrics. Here is the same table for Eric Hosmer, replacing his common rate statistics with their expected counterparts:
Statistic | Value | Ranking |
xBA | .298 | 14th |
xOBP | .368 | 31st |
xSLG | .467 | 60th |
xwOBA | .364 | 48th |
Using these metrics, it is more evident that Eric Hosmer was an overpay, and San Diego has certainly paid the price: in the two years Hosmer has played in a Padres uniform, he has gone from a World Series-winning Royals icon to a below-average player, accumulating -0.5 WAR in 2018 and 2019 combined (is accumulating the right word in this case?).
This is probably one of many cases where expected stats could have kept front offices from making poor financial decisions, but it is one of the most glaringly obvious misjudgments, and the first that came to mind. The constant evolution of analytics in baseball is helping teams reduce risk in player signings; all that is required is that teams use the stats readily available to them.