  #1149  
Old 11-22-2021, 02:13 AM
Snowman
Travis
 
Join Date: Jul 2021
Posts: 1,979

Quote:
Originally Posted by BobC View Post
So now back to BABIP. The fact that you have some pitchers that appear to consistently be above or below the league average BABIP, all the time, leads me to believe there is something other than simple "luck" involved with them being able to do that. At what point (i.e., sample size) will a statistician be comfortable in finally admitting there may be some other factor(s) or variable(s) that they haven't been able to effectively measure, quantify, and account for, and as a result just refer to it as "luck"? For wouldn't it be true that if they had been able to somehow measure and include all the pertinent factors and variables in their formulas, such as a pitcher like Maddux's ability to have batters consistently not hit the ball hard or cleanly, that those formulas would in fact show where all things do eventually regress to a mean? Just like they do in the case of flipping coins where it will eventually always come back to show a 50/50 heads or tails probability. In other words, in the case of BABIP, if the statisticians could effectively factor in ALL variables and factors, there would be no outliers, like a Maddux maybe, sitting significantly outside the mean, unless explainable by some other variable or factor, like a lack of a sufficient sample size. But to just simply explain these outliers by attributing those differences to such an amorphous concept or idea as luck, leads me to believe there is an inability, or unwillingness, on the part of those performing the statistical analysis to effectively be able to find and include all the pertinent variables and factors in their formulas. Thus making BABIP maybe the best statistical tool for its intended purpose they can do for now, but ultimately not the best and closest statistical measure or tool currently out there for use that it could be.
While it may be true that certain pitchers consistently beat the league-average BABIP, that doesn't mean the pitchers themselves are responsible. As I mentioned earlier, pitchers' BABIP values regress to the mean, but if you want to get more specific, they actually regress to their current team's unknowable true mean BABIP against, which we can estimate with sample data. I say current team because that team's value depends on its defensive capabilities and/or motivations (the "dog days of summer" are a very real thing, and teams out of contention behave in strange ways, but that's for another thread) at any given point in time. The ballpark is also a factor, as is playing in the NL or AL, as mentioned previously.
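
To make the "regress to the team mean" idea concrete, here is a minimal sketch in Python of shrinking a pitcher's observed BABIP toward his team's BABIP against. The function name, the example numbers, and especially the stabilization constant of 2,000 balls in play are illustrative assumptions for this sketch, not values estimated from any data in this thread.

```python
def shrink_to_team_mean(observed_babip, balls_in_play, team_babip,
                        stabilization_bip=2000):
    """Blend a pitcher's observed BABIP with his team's BABIP against.

    The weight on the pitcher's own number grows with his sample size.
    stabilization_bip is a hypothetical constant: the number of balls in
    play at which the pitcher's own number and the team mean get equal weight.
    """
    weight = balls_in_play / (balls_in_play + stabilization_bip)
    return weight * observed_babip + (1 - weight) * team_babip


# Hypothetical pitcher: .270 BABIP over 500 balls in play, on a team
# whose pitchers allow .285 overall.
estimate = shrink_to_team_mean(0.270, 500, 0.285)
print(round(estimate, 3))
```

With only 500 balls in play, most of the weight stays on the team number, so the estimate lands much closer to .285 than to .270. A larger sample, or a smaller assumed constant, would move it toward the pitcher's own figure.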

Doing some "back of the napkin" math, here's a quick breakdown of how those numbers might look. I say "might" because I'm just using quick maximum likelihood estimates (simple means from the past 3 seasons) to break these values apart, but it's probably at least directionally accurate, with at least a handful of outliers. We could get a much more precise breakdown of how these BABIP values vary by team with more data, by setting everything up as a system of linear equations in matrix form and solving it, but I'm too lazy to do that. Well, that and it takes more time. But these values were easy to find for each team over the past 3 seasons for both home and away stats, so I did some quick math to break the numbers out into various attributions.
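
For anyone curious what the "system of equations" version could look like, here is a rough sketch in Python with NumPy. It is not the calculation behind the numbers in this post. It assumes a simple additive model (league baseline + home-field effect + team defense effect + park effect), and all of the numbers in it are synthetic and noise-free, invented purely to show that the fit recovers the effects it was fed. The function and variable names are hypothetical.

```python
import numpy as np


def fit_babip_effects(team_idx, park_idx, babip, n_teams):
    """Least-squares fit of the assumed additive model:
    babip = baseline + home * [team is at home] + defense[team] + park[park]
    """
    team_idx = np.asarray(team_idx)
    park_idx = np.asarray(park_idx)
    babip = np.asarray(babip, dtype=float)
    rows = np.arange(len(babip))

    # Columns: intercept, home indicator, one dummy per team, one per park.
    X = np.zeros((len(babip), 2 + 2 * n_teams))
    X[:, 0] = 1.0
    X[:, 1] = (team_idx == park_idx).astype(float)
    X[rows, 2 + team_idx] = 1.0
    X[rows, 2 + n_teams + park_idx] = 1.0

    # The dummy columns are collinear with the intercept, so X'X cannot be
    # inverted directly; lstsq returns the minimum-norm solution instead.
    coef, *_ = np.linalg.lstsq(X, babip, rcond=None)
    defense = coef[2:2 + n_teams]
    park = coef[2 + n_teams:]

    # Re-center so defense and park effects each average to zero and the
    # baseline absorbs the difference.
    baseline = coef[0] + defense.mean() + park.mean()
    return baseline, coef[1], defense - defense.mean(), park - park.mean()


# Synthetic, noise-free inputs (made up for illustration only).
true_baseline, true_home = 0.295, -0.006
true_defense = np.array([-0.010, 0.000, 0.010])
true_park = np.array([0.015, -0.005, -0.010])

# Every team's pitchers observed in every park (team i is at home in park i).
teams, parks = np.meshgrid(np.arange(3), np.arange(3), indexing="ij")
teams, parks = teams.ravel(), parks.ravel()
observed = (true_baseline + true_home * (teams == parks)
            + true_defense[teams] + true_park[parks])

baseline, home, defense, park = fit_babip_effects(teams, parks, observed, n_teams=3)
print(baseline, home, defense, park)
```

With real data the observations are noisy and the schedule is unbalanced, so the fitted effects would carry real uncertainty. The point of the sketch is only that park and defense can be separated once each team's pitchers are observed in several different parks.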

Notes/caveats - The standard deviation in this sample data is ~0.018, or 18 BABIP points, so these are loose estimates, and the true values will vary. But it's still a useful exercise for understanding how some pitchers can seem to "beat the system" when, in actuality, they are just benefiting from being in the right park on the right team. We could test this theory by looking at how pitchers perform before vs. after being traded (you'd have to look at the population of all traded pitchers as a whole though, not just a few of them). The column "3Y_Delta" represents the delta between a team's 3-year average BABIP and the MLB average BABIP.

Anyhow, some interesting takeaway approximations from these loose estimates are:

Pitching in the NL appears to be worth a mere -0.002 BABIP, or 2 points (not sure I buy this, I'd like more data)
Home field advantage is worth around -0.006 BABIP, or 6 points
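
As a rough check on why the league number deserves skepticism, the sketch below compares the -0.002 estimate with the noise you would expect in a difference of two group means. It assumes the ~0.018 standard deviation quoted above applies to the values being averaged, that those values are independent, and that there are 15 teams per league (so either 15 three-year averages or 45 team-seasons per league). Those are assumptions about how the sample was built, not something stated in the post.

```python
import math


def diff_of_means_se(sd, n_a, n_b):
    """Standard error of the difference between two independent group means,
    assuming both groups share the same standard deviation."""
    return sd * math.sqrt(1.0 / n_a + 1.0 / n_b)


sd = 0.018         # spread of the sample data quoted in the post
nl_effect = -0.002

# Case 1: one 3-year average per team, 15 teams per league (assumed).
se_15 = diff_of_means_se(sd, 15, 15)
z_15 = nl_effect / se_15

# Case 2: one value per team-season, 45 per league (assumed).
se_45 = diff_of_means_se(sd, 45, 45)
z_45 = nl_effect / se_45

print(round(se_15, 4), round(z_15, 2))
print(round(se_45, 4), round(z_45, 2))
```

Under either assumption the estimate is well under one standard error from zero, so this sample can't distinguish a 2-point NL effect from no effect at all.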

Note, the data sufficiently explains the "Koufax effect". And while I didn't run the numbers for the full league back when Maddux was pitching, I did spot check several of the other pitchers on his team during that era, and it appears to sufficiently explain the "Maddux effect" as well.

Per my rough calculations, it appears as though the Dodgers' advantage is more attributable to their defensive abilities than it is to the ballpark. While not all of the values pass the "smell test" (sample sizes, confidence intervals, blah blah blah), you'll notice that many, if not most, of them do (e.g., Colorado and Boston have terrible park BABIP effects, while Seattle, St. Louis, and San Diego all show up as clear pitcher's parks). Also worth noting is that the teams that do not pass the smell test are more likely to be the teams whose actual BABIP values varied wildly over the past 3 seasons (like the Chicago White Sox, whose ballpark factor of -0.030 does not pass the smell test, but whose BABIP values were all over the place the past 3 seasons with 0.292, 0.268, and 0.303).

I'm not well attuned to the current defensive abilities of each team though. Perhaps someone paying closer attention can look at these numbers and see if they look directionally accurate as a group. It's also worth pointing out that these defensive BABIP values are made up of both the players' abilities and team strategies like when to play the shift and where to place your players. Teams that are heavily invested in analytics certainly outperform teams that are not, with respect to these defensive BABIP values.

See my "back of the napkin math" table below, which rank-orders teams by their overall 3-year average BABIP performance. Note that we would expect pitchers on these teams to have BABIPs that regress to their team averages more so than to the league average.
Attached Images
File Type: jpg babip_dat.jpg (84.9 KB, 96 views)

Last edited by Snowman; 11-22-2021 at 03:10 AM.