In modern baseball, there are many ways to analyse a pitcher’s performance. They all have their uses, and they are all helpful when used in the right context. So, why not add some more to the mix?
Today, we will look at the Expected Run Value of pitches and the concept of a pitcher’s “Stuff & Control” as a way to see which pitchers and which pitches are the best.
The first is the idea that every pitch thrown should have an expected outcome. For example, you would most likely expect a different outcome from a 100mph fastball just above the top of the zone than from an 80mph changeup right in the middle. Personally, I’d expect a swing and miss for the first, and a home run for the second.
The second concept is one you may have heard of before, where a pitcher’s skill can be broken down into two main categories: “Stuff” and “Control”. This is based on how good a pitch they can throw (a 100mph fastball is better than an 80mph one) and how well they can locate the pitch in the right place (a curveball that breaks out of the zone is better than one which hangs over the middle).
Building some models
**Talking about Maths, Coding and Machine Learning Warning** Skip to the results section if you don’t want the gory details. All that I describe from here on out was done with R. There are links at the end of the article with further details on the approach I use.
We are going to build some machine-learning models that will take inputs like pitch velocity, spin, and location over the plate to predict a run value for each pitch. I am going to use a Random Forest model to generate this. To do that, we need pitch-level data.
Firstly, we grab the Statcast pitch-level data from Baseball Savant. I do this using R and the ‘baseballr’ package built by Bill Petti. Thanks to both.
Next, for our models, we need an outcome which we are trying to calculate. We will use the linear weights of all events, e.g. singles, doubles, walks, balls and strikes. These linear weights are effectively the average run value of these events during the 2021 season. We do this because the actual outcome (i.e. runs scored) of each pitch is dependent on many other aspects, some beyond the control of the pitcher, and we want to treat them equally. To do this, we need to build some run expectancy matrices.
Run expectancy matrices, such as RE24 or RE288, give the average number of runs scored in the remainder of an inning given the number of outs and placement of baserunners (and, for RE288, the current count). In other words, they tell us how many runs are expected from that point on.
The number “24” refers to the number of possible base-out states: three out states (zero outs, one out, two outs) multiplied by the eight different baserunner arrangements. “288” is the same but multiplied by the 12 possible states of the count. Here is the RE288, which we need for 2021.
This is colour coded so that anything blue is worse than the run expectancy at the start of an inning (0.51 runs) and anything red better. Personally, I find it very interesting to compare states with similar run expectancies, e.g. the start of an inning has a similar run expectancy to a bases-loaded situation with two outs and 1-2 down in the count. Which would you prefer (assuming your team hadn’t scored other runs)?
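To make the idea of a run expectancy table concrete, here is a minimal sketch of how one can be built: group every observed state and average the runs scored from that point to the end of the inning. It is written in Python for illustration, not the R used for this article, and the observations are made up. The function name `run_expectancy` and the layout of each state are assumptions, not part of any package.

```python
from collections import defaultdict
from itertools import product

# The state space described above.
OUTS = (0, 1, 2)
BASES = tuple(product((0, 1), repeat=3))  # occupied flags for (1B, 2B, 3B)
COUNTS = tuple((balls, strikes) for balls in range(4) for strikes in range(3))

n_base_out_states = len(OUTS) * len(BASES)       # the "24" in RE24
n_full_states = n_base_out_states * len(COUNTS)  # the "288" in RE288


def run_expectancy(observations):
    """Average runs scored from each state to the end of the inning.

    `observations` is an iterable of (state, runs_rest_of_inning) pairs.
    """
    totals = defaultdict(float)
    seen = defaultdict(int)
    for state, runs in observations:
        totals[state] += runs
        seen[state] += 1
    return {state: totals[state] / seen[state] for state in totals}


# Made-up observations; state = (outs, bases, (balls, strikes)).
EMPTY = (0, 0, 0)
LOADED = (1, 1, 1)
toy = [
    ((0, EMPTY, (0, 0)), 0),
    ((0, EMPTY, (0, 0)), 1),
    ((0, EMPTY, (0, 0)), 0),
    ((0, EMPTY, (0, 0)), 1),
    ((2, LOADED, (1, 2)), 0),
    ((2, LOADED, (1, 2)), 2),
]
re_table = run_expectancy(toy)
```

With a full season of pitches, the same grouping would fill all 288 cells; dropping the count from the state gives the 24-cell version.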
With this, and the simpler RE24, we can calculate the linear weight for all events. Some samples are below.
Now we have our linear weights; these can become our outcomes. Every pitch thrown gets a value based on the linear weight of the event that pitch produced.
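As a rough illustration of where a linear weight comes from, the sketch below uses the standard run-expectancy approach: the run value of a single play is the run expectancy after it, minus the run expectancy before it, plus any runs that scored, and the linear weight of an event is the average of those values over every occurrence. It is Python rather than the R behind this article, the function names are hypothetical, and every number in `toy_plays` is invented.

```python
from collections import defaultdict


def run_value(re_before, re_after, runs_scored):
    """Change in run expectancy caused by one play, plus runs that scored."""
    return re_after - re_before + runs_scored


def linear_weights(plays):
    """Average run value per event type.

    `plays` is an iterable of (event, re_before, re_after, runs_scored).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for event, re_before, re_after, runs_scored in plays:
        sums[event] += run_value(re_before, re_after, runs_scored)
        counts[event] += 1
    return {event: sums[event] / counts[event] for event in sums}


# Invented run expectancies, purely for illustration.
toy_plays = [
    ("ball", 0.50, 0.54, 0),
    ("ball", 0.25, 0.29, 0),
    ("called_strike", 0.50, 0.45, 0),
    ("single", 0.50, 0.90, 0),
    ("single", 1.10, 1.00, 1),
]
weights = linear_weights(toy_plays)
```

Each pitch in the training data would then be labelled with `weights[event]` for whatever event it produced, which is what makes two identical singles count the same regardless of the game situation.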
With this, we can start to think about building our machine-learning models. Next, we need to determine what input variables we want to put into it. Normally, this would take a good bit of time to do the exploratory analysis to work out which parts of the Statcast data (it has over 90 columns) to use as features for the model, but I’ve used this data a lot, so we are just going to select a few outright. We are going to test later to see if they are all features worth keeping.
“release_speed”, “release_pos_x_adj”, “release_extension”, “release_pos_z”, “pfx_x_adj”, “pfx_z”, “plate_x”, “plate_z”, “release_spin_rate”, “spin_axis”
These are all pieces of raw data that I believe could impact the outcome of the pitch, but there are also other factors that I could include which are not part of the raw data.
When talking about how good certain pitchers are, we often say it’s not only the pitch that is being thrown but also what other pitches they can throw. The ability of pitchers to change their speeds and movement between pitch types, and the ability to tunnel (throw the pitches down the same path), has an impact on the performance of the pitches.
Based on this, we can build some manufactured features to add to our model and see if they also have an impact. Due to limitations of computing power and time, I’ve just focused on the velocity and movement differences, and not the tunnelling, although this is something I might return to at a later point.
With that, I added three additional features to our models.
“release_speed_diff”, “xmov_diff”, “zmov_diff”
These are for non-fastballs, comparing the speed and movement of the off-speed pitch to the fastball’s average speed and movement. This could potentially have been done more intricately by comparing against the fastball average from that game rather than the season average, but I felt it was a lot more effort for maybe a minor gain.
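The difference features can be sketched as follows, in Python for illustration rather than the R used for the article. Several details here are assumptions: which pitch types count as the fastball (`FF` and `SI` below), that `xmov_diff` and `zmov_diff` are derived from `pfx_x_adj` and `pfx_z`, that the difference is taken as pitch minus fastball average, and that fastballs themselves get a difference of zero. The function name `add_diff_features` is hypothetical.

```python
from statistics import mean

FASTBALLS = {"FF", "SI"}  # assumption: four-seamers and sinkers are the baseline
SOURCES = {               # assumption: new feature -> Statcast column it is built from
    "release_speed_diff": "release_speed",
    "xmov_diff": "pfx_x_adj",
    "zmov_diff": "pfx_z",
}


def add_diff_features(pitches):
    """Return copies of the pitch rows with the three difference features added."""
    fastballs = {}
    for pitch in pitches:
        if pitch["pitch_type"] in FASTBALLS:
            fastballs.setdefault(pitch["pitcher"], []).append(pitch)

    # Season-average fastball speed and movement for each pitcher.
    baselines = {
        pitcher: {col: mean(row[col] for row in rows) for col in SOURCES.values()}
        for pitcher, rows in fastballs.items()
    }

    enriched = []
    for pitch in pitches:
        baseline = baselines.get(pitch["pitcher"])
        is_offspeed = pitch["pitch_type"] not in FASTBALLS
        row = dict(pitch)
        for feature, col in SOURCES.items():
            if is_offspeed and baseline is not None:
                row[feature] = pitch[col] - baseline[col]
            else:
                row[feature] = 0.0  # assumption: no difference for fastballs
        enriched.append(row)
    return enriched


# Made-up pitches for one pitcher.
toy = [
    {"pitcher": "A", "pitch_type": "FF", "release_speed": 96.0, "pfx_x_adj": -0.5, "pfx_z": 1.5},
    {"pitcher": "A", "pitch_type": "FF", "release_speed": 94.0, "pfx_x_adj": -0.75, "pfx_z": 1.25},
    {"pitcher": "A", "pitch_type": "SL", "release_speed": 85.0, "pfx_x_adj": 0.375, "pfx_z": 0.125},
]
enriched = add_diff_features(toy)
```

In this toy data the slider comes out 10mph slower than the pitcher's average fastball, which is exactly the kind of separation the features are meant to capture.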
I ran these features through a feature selection tool that, based on the data, removes features it doesn’t deem important. None were removed, so we can be happy that these additional features have an impact.
We are now armed to make our run-expectancy models. And I pluralise models because we must build one for each pitch type (four-seamers, sinkers, sliders, etc.) because the traits of each pitch are quite different.
There are many types of machine-learning models that one could choose, but I used a random forest model for this. Random Forests are a way of averaging multiple deep decision trees which could overfit, trained on different parts of the same training set, with the goal of reducing the variance. This comes at the expense of a small increase in the bias and some loss of interpretability, but generally greatly boosts the performance in the final model.
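The averaging idea can be shown with a very small sketch. This is not the model used in the article and not a full random forest: it uses one input, one-split trees, and no random feature selection, so it only demonstrates the "train each tree on a different bootstrap sample, then average" step. It is plain Python for illustration, the function names are hypothetical, and the data are invented.

```python
import random
from statistics import mean


def fit_stump(rows):
    """Fit a one-split regression tree on (x, y) pairs by minimising squared error."""
    xs = sorted({x for x, _ in rows})
    overall = mean(y for _, y in rows)
    best = (float("inf"), None, overall, overall)
    for low, high in zip(xs, xs[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    if threshold is None:  # the sample contained a single distinct x
        return lambda x: overall
    return lambda x: left_mean if x <= threshold else right_mean


def fit_bagged(rows, n_trees=50, seed=0):
    """Train each tree on a bootstrap resample and average their predictions."""
    rng = random.Random(seed)
    trees = [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]
    return lambda x: mean(tree(x) for tree in trees)


# Invented (release speed in mph, run value) pairs: faster is better for the pitcher.
toy = [(80, 0.05), (84, 0.03), (88, 0.01), (92, -0.01), (96, -0.03), (100, -0.05)]
model = fit_bagged(toy)
pred_slow = model(80)
pred_fast = model(100)
```

Any single tree here makes a blunt two-level prediction, but the average over 50 resampled trees is smoother, which is the variance reduction described above.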
This generated models for each pitch type, and we then applied these models to all the pitches thrown in 2021. This gave us an Expected Run Value (xRV) for each pitch, so we can now determine which pitchers and which specific pitches were the best in 2021. (Note: I had to train some of my models on smaller, randomly sampled subsets, as using all pitches took too much compute power and time for my PC.)
But before we jump to any results, let’s quickly go through how we used this and some more models to generate a “Stuff” metric and a “Control” metric.
My analysis of “Stuff & Control” is based on the simple idea that these two elements make up the whole.
Expected Run Value = Stuff Run Value + Control Run Value
We just built a model which determines the Expected Run Value of a pitch. If we build a model which looks at just “Stuff” or just “Control”, then we can get the other by subtraction. I determined we could make a Stuff Run Value model if we removed the location data from the previous models.
Some of you might be saying that pitch location isn’t entirely within the pitcher’s control and that the catcher and/or the team may suggest pitching the ball in a location that isn’t best. I’m working on the idea that the pitcher and the team want to throw the pitch in a good location, so if it doesn’t happen, the pitcher is at fault.
So, I re-ran the previous models removing the location data and got some new Expected Run Values based just on “Stuff”, and then used that to generate “Control” values as well.
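The decomposition can be sketched in a few lines of Python (for illustration only; the article's models were built in R). The feature names are the ones listed earlier. Treating `plate_x` and `plate_z` as "the location data" is an assumption, as are the function name and the example numbers.

```python
RAW_FEATURES = [
    "release_speed", "release_pos_x_adj", "release_extension", "release_pos_z",
    "pfx_x_adj", "pfx_z", "plate_x", "plate_z", "release_spin_rate", "spin_axis",
]
DIFF_FEATURES = ["release_speed_diff", "xmov_diff", "zmov_diff"]
LOCATION_FEATURES = {"plate_x", "plate_z"}  # assumption: what "location data" means

ALL_FEATURES = RAW_FEATURES + DIFF_FEATURES
# Inputs for the "Stuff" models: everything except where the pitch ended up.
STUFF_FEATURES = [name for name in ALL_FEATURES if name not in LOCATION_FEATURES]


def control_run_value(expected_rv, stuff_rv):
    """Rearranged from: Expected Run Value = Stuff Run Value + Control Run Value."""
    return expected_rv - stuff_rv


# Invented example: a nasty pitch (very negative Stuff value) thrown to a poor spot.
example = {"xrv": -0.02, "stuff_rv": -0.05}
example_control = control_run_value(example["xrv"], example["stuff_rv"])
```

Because negative values are good for the pitcher, the positive `example_control` here means location cost this pitch some of the value its "Stuff" earned.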
On to some results.
Expected Run Value
Using the data we have generated above, we have the metric xRV. This is the Expected Run Value over 100 pitches. Negative is good for pitchers, i.e. the pitcher decreases the expected runs from the average, and a positive value is bad. We also have the metric xRV+, which is xRV scaled to 100 as league average, so anything above 100 is good.
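As a minimal sketch of the rate metric, the Python below (illustrative, not the article's R code) sums each pitcher's per-pitch xRV, converts it to a per-100-pitch rate, and applies a pitch minimum like the one used for the leaderboards. The function name and the numbers are made up, and the xRV+ scaling is deliberately left out because its exact formula isn't given here.

```python
from collections import defaultdict


def xrv_leaderboard(pitch_xrv, min_pitches=1000):
    """Summarise (pitcher, xrv) pairs, best (most negative) rate first."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pitcher, xrv in pitch_xrv:
        totals[pitcher] += xrv
        counts[pitcher] += 1
    rows = [
        {
            "pitcher": pitcher,
            "pitches": counts[pitcher],
            "total_xrv": totals[pitcher],
            "xrv_per_100": 100 * totals[pitcher] / counts[pitcher],
        }
        for pitcher in totals
        if counts[pitcher] >= min_pitches
    ]
    return sorted(rows, key=lambda row: row["xrv_per_100"])


# Invented per-pitch values; the threshold is lowered so the toy data qualifies.
toy = [
    ("A", -0.02), ("A", -0.04), ("A", 0.0), ("A", -0.02),
    ("B", 0.01), ("B", 0.03),
    ("C", -0.10),
]
board = xrv_leaderboard(toy, min_pitches=2)
```

Sorting by `total_xrv` instead of `xrv_per_100` gives the season-long view, which naturally favours pitchers who threw more pitches.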
So, here are the best pitchers by xRV in 2021 (minimum 1000 pitches).
In a shock to probably no one, the best was Liam Hendriks, and maybe a little more of a shock, the best starter was Julio Urías. The top of the list is littered with relievers, which is only to be expected given that they generally perform at a higher level but over a shortened time.
If we look at total xRV, we can see who was the most valuable across the whole season according to xRV.
None of this is shocking; it is a who’s who of the top performers from 2021, though the order may be different to other pitcher metrics. It is very interesting to me to see Hendriks still in the top 10. His performance as a reliever is better than all but the best starters, even though he pitched less than 40% of the innings.
A shoutout needs to go to Casey Sadler. He didn’t qualify for the 1000-pitch limit, but if you reduce that to 500, he sits atop the list with an xRV of -1.9. And when we break down the type of pitches thrown, you may see why.
So, we know the best pitchers, but what about the best individual pitches? The table below is based on pitches thrown at least 100 times.
Now, this is a bit more shocking. I did not expect Richard Lovelady’s slider to be the best pitch in baseball. All of the top five is relatively shocking, but given the smaller sample sizes, we were likely going to have some surprises. Both Glasnow’s and McClanahan’s curves being up there is a good sign for Rays fans in 2022.
Let’s look at the top pitches for each major pitch type.
Stuff & Control
Now let us look at the Stuff & Control metrics from the additional models. The metrics Stuff+ and Control+ are the Expected Run Values for “Stuff” and for “Control”, each scaled so that 100 is league average. Anything above 100 is good.
Who has the best “Stuff”?
Hendriks drops down from the top spot after being leapfrogged by Emmanuel Clase. What this tells us is that, by my metrics, Hendriks locates his stuff better than Clase, which is why he is atop the main leaderboard. Once again, this list is full of relievers, but the presence of deGrom is not surprising at all. McClanahan in the top 10 is another good sign for Rays fans.
Who has the best “Control”?
On the “Control” front, we see a completely different list of players: a wide array of reliever talent and some very interesting starters. Eduardo Rodríguez was someone I was hoping would do well here, given his profile of below-average speed and spin but above-average expected performance. It’s worth talking about Robbie Ray here, as he is the only one of these pitchers who is above average in both “Stuff” and “Control”. This combination may very well be what led him to the Cy Young this year.
Let’s look at the top pitches by “Stuff”. There are some similarities to what we saw in the overall list, but there are also some with great “Stuff” but bad “Control”.
Oh boy, does my model love Tyler Glasnow’s curveball. It just also really doesn’t like where he throws it. It’s the same with Ben Rowen’s slider and Aroldis Chapman’s splitter. Those three were all great pitches but were not well located.
For top pitches by “Control”, you see a mixture of pitches that are below-average when it comes to “Stuff” but are made average by the “Control”. And there are some pitches which are average in “Stuff” but made well above-average by the “Control”.
Casey Sadler is on another list here! I will have to look into what he did differently in 2021 compared to previous seasons because the underlying data suggests that his performance was legit. Even if his 0.67 ERA was an outlier, his xERA and FIP of 2.45 and 2.48 respectively might very well get repeated next year.
Here are the top pitches by stuff by pitch type.
There is at least one pitch in all of these lists which has great “Stuff” but bad “Control”, suggesting that there is some hope for these pitches.
There are a few things that stand out which highlight some of the limitations of these models. For starters, my model doesn’t like deGrom’s control. It really doesn’t like where he throws his pitches, which, to be fair to the model, are in more hittable areas but don’t get hit due to his immense velocity and spin. This is probably a case of an outlier individual who isn’t being given full credit by my model as it is designed not to overfit.
There is plenty more to look at, so I have attached a spreadsheet of the top pitchers and pitches for you to play around with.
Photo by Thearon W. Henderson
Russell is Bat Flips and Nerds’ resident analytical genius, and arguably Europe’s finest sabermetrician. If you’re not following Russell on Twitter @REassom then you’re doing baseball wrong.