Analytics and its effects on the MLB – The Stolen Base

This is the second piece in a series on the effects of analytics on baseball; the first, on the bunt, can be found here. This time I will be looking at the decline of the stolen base, the involvement of analytics in its decline and why there is a chance for a revival.

So what is a stolen base? A stolen base occurs when a runner advances to a base to which he is not entitled and the official scorer rules that the advance should be credited to the action of the runner. If the defending team makes a mistake (wild pitch, passed ball or error) the awarding of a stolen base can be slightly arbitrary, as it is down to the scorer, but in the modern game the stolen base is generally given to the runner if they made an attempt to steal before the defending team made the mistake.

Why do people like stolen bases? Most people split baseball into four categories: hitting, pitching, fielding and baserunning. Stolen bases are the most visible aspect of baserunning; it is easier to see the impact of a stolen base than that of a runner who manages to get to 3rd base off a soft single when they started on first. The potential for base stealing also adds to the dynamic of pitcher versus batter, causing pitchers to shorten their wind-up so baserunners can’t easily steal bases.

Stealing a base is one of the most clearly visible aspects of baseball and has played its part in major games throughout the history of the sport: Jackie Robinson’s steal of home in the 1955 World Series (video), Dave Roberts’ steal in game 4 of the 2004 ALCS (video), and Rickey Henderson in 1991 stealing 3rd base, then lifting it aloft as he broke the all-time steals record (video). These moments live on clearly in the memory of fans, so when people think that they might not see moments like these in the future they are going to be disappointed.

What is the purpose of a stolen base? The simple idea behind a stolen base is that by advancing a runner by a base the batting team increases their chance of scoring runs, and that is true in all circumstances. You are more likely to score a run, and score more runs, with a runner on second rather than first. But how successful at stealing do you have to be for it to be an overall benefit for the team?

As with bunts, the piece of analysis which showed the true value of stealing was the run expectancy table. With it, teams were able to work out how much they gained by stealing a base and how much they lost when they were caught stealing. Using the same run expectancy tables (2010-2015) as we did for bunts, we get the following (failure for a double steal is categorised as runner out at third and safe at second).

From this you can see that to be classed as a successful base stealer you needed to have a success rate better than 70%. If you are just looking to steal so that a single run can be scored, e.g. in the bottom of the ninth in a tied game, the required success rate is down to 60%. What surprised me in this analysis was the lower success rate required for the double steal to be effective for the team, but that is because the downside is less severe under my definition of failure (runner out at third and safe at second).
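The break-even idea can be sketched in a few lines of Python. The run expectancy values below are illustrative assumptions chosen for the example, not the figures from the 2010-2015 table used in this piece, and the function name is hypothetical. The required success rate is the probability at which the expected runs after an attempt equal the run expectancy of staying put.

```python
def break_even_rate(re_start: float, re_success: float, re_failure: float) -> float:
    """Success rate p where p * re_success + (1 - p) * re_failure == re_start."""
    return (re_start - re_failure) / (re_success - re_failure)


# Assumed, illustrative run expectancies (NOT the article's table):
RE_FIRST_0_OUT = 0.86   # runner on first, nobody out (no attempt)
RE_SECOND_0_OUT = 1.10  # runner on second, nobody out (steal succeeds)
RE_EMPTY_1_OUT = 0.26   # bases empty, one out (runner caught)

rate = break_even_rate(RE_FIRST_0_OUT, RE_SECOND_0_OUT, RE_EMPTY_1_OUT)
print(f"Break-even success rate: {rate:.1%}")
```

With these assumed inputs the break-even rate lands a little above 70%, in line with the threshold quoted above. Swapping in the values for any other base-out state gives the required rate for that situation.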

If we perform this analysis for each season going back to the 1950s and look at the 0 out stealing of second base scenario we find that the required success rate has fluctuated between 70% and 75% over the last 60 seasons. There is no overall trend up or down but it does follow a factor which I will discuss later.

But until the last decade the overall success rate for stealing hadn’t been above or around the required success rate. That meant that as an overall endeavour stealing bases was actually costing teams runs and not actually increasing them. But we had people stealing 100+ bases in a single season during the 1980s, surely this wasn’t a negative endeavour for them. To check that I compared the success rate of the runners in the past split by the number of attempted steals.

As you can see, the average success rate for most categories is still below the required rate, with only the players who steal the most being above it. So the top base stealers were a positive for their teams, but the less frequent stealers were a negative for the team’s overall performance. This is one of the main reasons that there was a decrease in the number of stolen base attempts. The gap between the top stealers and the bottom stealers has been narrowing over time, but it wasn’t until 2007 that more than 50% of the players stealing bases were in the positive.

From the late 1990s onward we have seen a steady decline in the number of attempted stolen bases but it is still more than was seen in the 1950s and 60s. With analytics the teams have managed to bring the success rate of the stolen base up to the required rate overall so why aren’t we starting to see more? This is due to how many runs are being scored in baseball right now and how runs are being scored.

Let’s take Whit Merrifield, the top base stealer for 2018: he had 45 stolen bases and was caught stealing 10 times. If we look at the added run expectancy for what he did, it was worth 3.4 runs across the season to the Royals, and even with the Royals’ anaemic total of 638 runs scored his base stealing effectively increased it by only 0.5%.

He would have had to double his stolen base output to get to one percent of the team’s output, and that is for one of the lowest scoring teams in 2018. In 1986 Vince Coleman led the way with 107 stolen bases from 121 attempts, which was worth 15.6 runs in added run expectancy. The Cardinals that year scored 601 runs in total, which means that Coleman’s efforts on base added 2.6% to the Cards’ run total.

If we compare Vince to the Indians, who were the highest stealing team in 2018: they stole 135 bases and were caught 36 times. That gave them a net run expectancy gain of 6.5, which, given that they scored 818 runs in 2018, means that all that effort accounted for 0.8% of their runs.
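For anyone who wants to check the percentages, here is a minimal sketch that divides the added run expectancy by the team's total runs. The figures are the ones quoted above; the function and variable names are only illustrative.

```python
def share_of_team_runs(runs_added: float, team_runs: int) -> float:
    """Fraction of a team's season run total attributable to base stealing."""
    return runs_added / team_runs


# (added run expectancy, team runs scored), as quoted in the text
cases = {
    "Merrifield, 2018 Royals": (3.4, 638),
    "Coleman, 1986 Cardinals": (15.6, 601),
    "2018 Indians (whole team)": (6.5, 818),
}

shares = {name: share_of_team_runs(added, total) for name, (added, total) in cases.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```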

If teams are scoring more runs then the success rate required for stealing to pay off goes up, because you gain less from being further round the diamond when teams score more runs. So stealing at an 80% rate isn’t worth as much in the current game, and the added run expectancy from stealing goes down.

The expected gain for stealing 40 bases at an 80% rate (40 in 50) in 2018 was 2.3 runs; compare that to the average of the 70s and 80s combined, which is 2.9 runs. The run expectancy gain of those stolen bases has fallen by 20%, but their overall value has decreased further as teams are scoring more. In 2018 the average runs scored per team was 721; the average for the 70s and 80s combined was 670. Therefore the value added by those steals was 0.32% and 0.43% of a team’s runs respectively, meaning in this scenario the value of those steals has dropped 26%.
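The same comparison can be written out as a short calculation. This is only a sketch using the four figures quoted above, with variable names made up for the example.

```python
# Figures quoted in the text: run expectancy gain from 40 steals in 50 attempts,
# and average runs scored per team, for 2018 and for the 1970s-80s combined.
gain_2018, gain_70s_80s = 2.3, 2.9
team_runs_2018, team_runs_70s_80s = 721, 670

# Fall in the raw run expectancy gain
raw_drop = 1 - gain_2018 / gain_70s_80s

# Gain expressed as a share of an average team's runs
share_2018 = gain_2018 / team_runs_2018
share_70s_80s = gain_70s_80s / team_runs_70s_80s

# Fall in value once the higher scoring environment is taken into account
relative_drop = 1 - share_2018 / share_70s_80s

print(f"Raw gain drop: {raw_drop:.0%}")
print(f"Share of runs: {share_2018:.2%} (2018) vs {share_70s_80s:.2%} (70s and 80s)")
print(f"Relative value drop: {relative_drop:.0%}")
```

The second drop is larger than the first because the gain shrank while the denominator, total runs scored, grew.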

As the graph of runs per game shows, games in the 1970s and 80s were averaging around 8.5 runs a game, but when steals started to decline at the start of the 1990s we were seeing 9.5-10 runs a game, which reduced the impact of the steal. The number of runs per game has been declining since the peak, but just when it was getting to a comparable level (2014) we have seen another uptick in runs scored.

So, too many runs being scored is the second reason that base stealing isn’t as popular as it was in the 1970s and 80s. The final reason is how teams are scoring their runs now compared to historically: via the home run. The percentage of total runs driven in by home runs has been dubbed the ‘Guillen Number’ by Joe Sheehan of Baseball Prospectus, after Ozzie Guillen, whose White Sox teams scored large percentages of their runs by the long ball.

Above is a box and whisker chart for the percentage of runs scored by home run for all the teams since 1951. For each year the ends of the ‘whiskers’ are the highest and lowest rates (unless there is an outlier, which is shown as a dot), with the ‘box’ showing the middle 50%, from the 25th percentile to the 75th percentile, and the line in the box showing the median for the year. This chart allows us to see the spread across teams for each year and the overall change over time.

The chart shows the sharp increase in the share of runs driven in by home runs in recent seasons, and the gradual increase that occurred in the 1990s, which drove up the number of runs scored per game. The teams with the lowest rate in the last couple of seasons are higher than the median values in the 1970s and 80s, and the top teams are driving in runs by the homer at historically high rates. For 2018 the league median was 40.2%; there were only 2 teams in the 1970s and 8 in the 1980s which bettered that figure.
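The Guillen Number itself is a simple ratio, sketched below. The function name and the example team are hypothetical; only the 40.2% league median comes from the figures above.

```python
def guillen_number(runs_on_home_runs: int, total_runs: int) -> float:
    """Share of a team's runs that scored on home runs."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    return runs_on_home_runs / total_runs


LEAGUE_MEDIAN_2018 = 0.402  # league median quoted in the text

# Hypothetical team: 320 of its 800 runs scored on home runs
example = guillen_number(320, 800)
above_median = example > LEAGUE_MEDIAN_2018

print(f"Guillen Number: {example:.1%} (above 2018 median: {above_median})")
```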

The rise of the home run affects stealing because it changes the risk profile. The point of stealing a base is so that you can score on one hit instead of two, or on a single instead of a double, but if your team is set up to hit home runs, why would you take the risk of stealing when a home run would drive you in whether you were on 1st, 2nd or 3rd? You wouldn’t, and they don’t. Look at 2018 teams like the Yankees and Athletics. The Athletics stole just 35 bases, the fewest this year, but hit 227 home runs, 3rd most, and the Yankees stole 63 bases, 5th lowest, but hit 267 home runs, the most any team has hit in one season ever.

These two teams are the extreme, but when you have teams that are taking this approach to the game you are going to see a decline in the number of stolen bases as a whole. So those are the main reasons that we don’t see as much stealing as we did in the 1970s and 80s, and most of them don’t look like they will change soon.

Is this the slow and inevitable death of the stolen base? No, the stolen base won’t ever die unless they change the rules to ban it. It has been and always will be a risk and reward play, and there will be players and teams that are willing to take those risks. There is also the added psychological impact on the pitcher: teams will always want the pitcher to be worrying about it.

We are not going to see the number of steals we saw historically, but the decline in the attempted steal rate has been shallower over the last few years, and there is one thing that could, maybe, improve and push stealing numbers back up. Players with above average speed should be able to get better at stealing.

There were 28 players that stole 20 or more bases last season and every one had above average sprint speed (average is defined as 27 ft/s by MLB Statcast). Statcast’s sprint speed is defined as “feet per second in a player’s fastest one-second window” on individual plays. For a player’s seasonal average, the best of these runs, approximately two-thirds, are averaged. The following two types of plays currently qualify for inclusion in Sprint Speed:

  • Runs of two bases or more on non-homers, excluding being a runner on second base when an extra base hit happens
  • Home to first on “topped” or “weakly hit” balls.
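As a rough sketch of that seasonal average: sort a player's qualifying runs, keep the fastest two-thirds or so, and take the mean. The run speeds below are made up, the function name is hypothetical, and the rounding used to pick "approximately two-thirds" is an assumption; Statcast's exact cut-off may differ.

```python
from statistics import mean


def seasonal_sprint_speed(run_speeds_ft_s: list[float]) -> float:
    """Mean of the fastest ~two-thirds of qualifying runs, in ft/s."""
    if not run_speeds_ft_s:
        raise ValueError("need at least one qualifying run")
    # Assumption: round to the nearest whole number of runs, keeping at least one
    keep = max(1, round(len(run_speeds_ft_s) * 2 / 3))
    fastest = sorted(run_speeds_ft_s, reverse=True)[:keep]
    return mean(fastest)


# Hypothetical qualifying runs for one player (ft/s)
runs = [28.1, 27.9, 27.6, 27.2, 26.8, 25.9]
speed = seasonal_sprint_speed(runs)

print(f"Seasonal sprint speed: {speed:.1f} ft/s")
```

Here the two slowest of the six runs are dropped, so a couple of jogs to first do not drag the average down.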

There is a great piece here on how this calculation is done by Baseball Savant, the data visualisation arm of MLB Statcast.

There are two players very high up on the stolen base list whose sprint speed is only just above average: Jose Ramirez (4th) and Jonathan Villar (3rd). This pair both attempted to steal 40 times and were successful 85% and 87.5% of the time respectively; no player who attempted more than 30 steals had a higher success rate. Their sprint speeds are 27.5 ft/s and 27.7 ft/s, and of the 401 players who had more than 50 competitive sprint plays this season they rank 195th and 170th overall.

How are these two stealing at a better rate than players who are faster than them? That I sadly cannot answer. Only their average sprint speed is publicly available, so I cannot see whether these two have faster acceleration than normal, which would help them over the shorter distance required for stealing a base compared to the runs used to calculate sprint speed.

What I can say is that Jose Ramirez, according to Fangraphs, leads all of MLB in runs added via baserunning in 2018 with 12. This is a full 33% higher than any other player and nearly double the next player with an above average offensive capability (Mookie Betts 6.9 (ed – Decimal nice) ). He is among the top players for all three of the metrics which make up the baserunning stat; these look at stolen bases, grounding into double plays and other base running. All of this with an only slightly above average sprint speed.

Ramirez is clearly doing something that other players are not, but the data I have available cannot tell me what that is. The data and metrics that teams have, however, are of a higher quantity and quality than what is publicly available. That is something the teams should be looking at, and it is a ray of hope that we will see an increase, probably small, in the number of stolen bases.
