#1
Great stuff guys. This is a fun thread.
A few things: Snowman, if that model could be created it would be pretty cool. Obviously it would take a lot of work. I have a practical question. Sorry, I don’t know all the terminology, and I really have no idea how a model like that works. If that model were to be created, how would information get processed through it? For, say, 1953 or whatever year, would every stat for that year have to be manually input into the model?

The idea of how athletes evolve is interesting. Of course humans have slowly gotten bigger, faster, stronger, etc. over the past 130 years. However, for quality of play in baseball, I’m not sure it is as simple as the quality of play getting a little better every year we go forward. Obviously, there have been social changes that impact this greatly. Quality of play clearly went down during the war years of the early 1940s, and clearly went up in the late 1940s with integration.

This is only a guess, just brainstorming, but it seems to me that quality of play was especially strong in the 1950s and early 1960s, and also from the late 1980s to around 2000. A high number of very elite players entered MLB in the 1950s: Mantle, Mays, Aaron, Clemente, Jackie Robinson, Frank Robinson, Snider, Berra, Campanella, Banks, Mathews, Koufax, Gibson, etc. The upper-tier HOFers who started in the 1950s and finished their careers in the 1960s seem endless. But there seem to have been far fewer upper-tier HOFers starting out in the 1960s. Brock, Rose, and Morgan types are not nearly as impressive as the 1950s list. Similarly, the upper-tier HOFers starting out around 1970 and in the early to mid 1980s are not nearly as impressive. In the 1970s you have Reggie, Schmidt, and Brett; in the early to mid 1980s you have Rickey Henderson, Ripken, etc., but nowhere near the top-end talent that started out in the 1950s.

But then in the mid to late 1980s you add Bonds, Clemens, Griffey, Randy Johnson, Maddux, Pedro, A-Rod, Jeter, Frank Thomas, etc., just a lot of top-tier HOFers, and it would seem like a very high level of play. I guess my question is: how much impact do high-end HOFers have on the level of play in a given period? The flip side of the argument would be that “average” players increased in skill greatly over time, and that improvement among the “average” players in the league could matter more than the amount of top-end talent at any one time. Anyway, fun stuff to think about.

Finally, my understanding is that a high or low BABIP is generally a lucky/unlucky stat. An unusually high (and out of line) BABIP for a pitcher would mean bad luck, where a bunch of line drives and grounders happen to fall for hits. An unusually low BABIP for a pitcher would be good luck, where line drives seem to be hit right at fielders. How much of BABIP is “good situational pitching” or “good situational defense”? Who knows. But that being said, Maddux is a fascinating pitcher. His control is obviously elite, close to the best of all time. And it isn’t just throwing strikes, but the ability to nibble at the edges of the strike zone. That makes it very hard to make solid contact and should equate to a lower BABIP. That’s just the eye test from watching him. Strikes on the corners are difficult to hit hard. If you rarely throw a meatball and get lots of strikes on the corners, you’d think the stats should follow the eye test, just because Maddux was so good with his control.
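For reference, BABIP (batting average on balls in play) is usually computed as (H - HR) / (AB - K - HR + SF): hits that weren't home runs, divided by at-bats that ended with the ball in play, plus sacrifice flies. A minimal Python sketch; the stat line below is made up purely for illustration and is not any real pitcher's numbers:

```python
def babip(hits: int, home_runs: int, at_bats: int, strikeouts: int, sac_flies: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical line allowed by a pitcher (invented numbers).
print(round(babip(hits=200, home_runs=20, at_bats=800, strikeouts=150, sac_flies=6), 3))  # prints 0.283
```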
#2
Some very insightful points, in particular about the measure of "luck" with regard to BABIP. It's kind of like predicting whether a flipped coin lands heads or tails. That outcome is always a 50/50 probability. So over time, with all other things constant and equal and assuming a sufficient sample size, anyone flipping coins would eventually expect to end up with exactly half heads and half tails. I've always thought of this as kind of what is meant by "regressing to the mean", in this case ending up 50/50 on heads or tails.

But here is what's interesting. Say you start flipping coins to test this, with everything constant and nothing abnormal about the coin, and the first 9 flips all come up tails. The absolute probability of a head or a tail on the next, 10th flip is still just 50/50, or is it? Since over a large enough sample we expect the numbers of heads and tails to regress to that expected 50/50 mean, after starting out with 9 tails in a row you know you eventually have to start flipping heads, yet the probability of each and every single flip is still always going to be just 50/50. So you have somewhat of a paradox about what the actual probability of flipping a head or tail on all future attempts should be, at least it seems like one to me.

So now back to BABIP. The fact that some pitchers appear to be consistently above or below the league-average BABIP, all the time, leads me to believe there is something other than simple "luck" involved. At what point (i.e., what sample size) will a statistician be comfortable finally admitting there may be some other factor(s) or variable(s) that they haven't been able to effectively measure, quantify, and account for, and so just refer to as "luck"?

For wouldn't it be true that if they had been able to somehow measure and include all the pertinent factors and variables in their formulas, such as the ability of a pitcher like Maddux to consistently keep batters from hitting the ball hard or cleanly, those formulas would in fact show all things eventually regressing to a mean, just like flipping coins eventually comes back to a 50/50 heads or tails split? In other words, in the case of BABIP, if statisticians could effectively factor in ALL variables and factors, there would be no outliers, like maybe a Maddux, sitting significantly outside the mean, unless explainable by some other variable or factor, like a lack of a sufficient sample size. But simply explaining these outliers by attributing the differences to such an amorphous concept as luck leads me to believe there is an inability, or unwillingness, on the part of those performing the statistical analysis to find and include all the pertinent variables and factors in their formulas. That makes BABIP maybe the best statistical tool for its intended purpose that they can manage for now, but ultimately not the best and closest statistical measure it could be.
Last edited by BobC; 11-21-2021 at 06:56 PM.
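One way to put a number on the "at what sample size" question: if every ball in play were an independent trial with the same hit probability (a deliberate simplification that ignores park, defense, and batter quality), pure luck alone would spread a pitcher's BABIP with a binomial standard error of sqrt(p(1 - p) / N). The sketch below computes that noise level, a z-score for an observed BABIP, and how many balls in play it takes before a given gap from the league mean is about two standard errors wide. The league BABIP of .290 and the .015 gap are placeholder values, not figures from this thread.

```python
import math


def babip_standard_error(league_babip: float, balls_in_play: int) -> float:
    """Binomial standard error of BABIP if outcomes were pure coin-flip luck."""
    return math.sqrt(league_babip * (1 - league_babip) / balls_in_play)


def z_score(observed: float, league_babip: float, balls_in_play: int) -> float:
    """How many standard errors an observed BABIP sits from the league mean."""
    return (observed - league_babip) / babip_standard_error(league_babip, balls_in_play)


def balls_in_play_needed(gap: float, league_babip: float, z: float = 2.0) -> int:
    """Smallest N at which a true gap of `gap` is `z` standard errors from the mean.

    Solves gap = z * sqrt(p * (1 - p) / N) for N.
    """
    return math.ceil(league_babip * (1 - league_babip) * (z / gap) ** 2)


# Placeholder inputs: league BABIP .290, a pitcher who is .015 better.
print(round(babip_standard_error(0.290, 500), 3))  # luck-only noise for ~one season of balls in play
print(balls_in_play_needed(0.015, 0.290))          # balls in play before .015 is ~2 standard errors
print(round(z_score(0.275, 0.290, 3000), 2))       # a .275 BABIP over 3,000 balls in play
```

With these placeholder inputs, a season of about 500 balls in play carries roughly 20 points of luck-only noise, and a true .015 edge needs about 3,660 balls in play, several seasons' worth, before it stands two standard errors clear of the mean. Anything persistently beyond that points to real factors (which is where park and defense come in), not to luck.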
#3
From Wikipedia: The gambler's fallacy, also known as the Monte Carlo fallacy or the fallacy of the maturity of chances, is the incorrect belief that, if a particular event occurs more frequently than normal during the past, it is less likely to happen in the future (or vice versa), when it has otherwise been established that the probability of such events does not depend on what has happened in the past. Such events, having the quality of historical independence, are referred to as statistically independent. The fallacy is commonly associated with gambling, where it may be believed, for example, that the next dice roll is more than usually likely to be six because there have recently been fewer than the usual number of sixes.
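A quick simulation makes this concrete for the nine-tails example. Among simulated runs that open with nine straight tails, the tenth flip still comes up heads about half the time. And "regressing to the mean" happens by dilution rather than payback: after nine tails, the expected surplus of tails stays at nine no matter how many more flips follow, while the expected heads share creeps toward 50% only because the early streak becomes a smaller fraction of the total. A minimal Python sketch; the trial count and seed are arbitrary choices.

```python
import random


def heads_rate_after_nine_tails(trials: int = 1_000_000, seed: int = 54) -> float:
    """Share of heads on flip 10 among runs whose first nine flips were all tails."""
    rng = random.Random(seed)
    streaks = 0
    heads_on_tenth = 0
    for _ in range(trials):
        # all() stops at the first heads, so most runs end after a flip or two.
        if all(rng.random() >= 0.5 for _ in range(9)):  # >= 0.5 counts as tails
            streaks += 1
            if rng.random() < 0.5:  # < 0.5 counts as heads
                heads_on_tenth += 1
    if streaks == 0:
        raise ValueError("no nine-tails streaks observed; increase trials")
    return heads_on_tenth / streaks


def expected_heads_share(initial_tails: int, extra_flips: int) -> float:
    """Expected fraction of heads after an opening tails streak plus more fair flips."""
    expected_heads = 0.5 * extra_flips  # future flips don't "owe" any heads
    return expected_heads / (initial_tails + extra_flips)


print(heads_rate_after_nine_tails())    # close to 0.5, not above it
print(expected_heads_share(9, 991))     # 0.4955: the 9-flip deficit is diluted, not repaid
print(expected_heads_share(9, 99_991))  # closer still to 0.5
```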
#4
Doing some "back of the napkin" math, here's a quick breakdown of how those numbers might look. I say "might" because I'm just using quick maximum likelihood estimates based on means from the past 3 seasons to break these values apart, but it's probably at least directionally accurate, with a handful of outliers. We could get a much more precise breakdown of how these BABIP values vary by team with more data, by setting them all up as a system of equations in a matrix and solving it with linear algebra (inverting the matrix), but I'm too lazy to do that. Well, that and it takes more time. But these values were easy to find for each team over the past 3 seasons for both home and away stats, so I did some quick math to break the numbers out into various attributions.

Notes/caveats: The standard deviation in this sample data is ~0.018, or 18 BABIP points, so these are loose estimates, and the true values will vary. But it's still a useful exercise for understanding how some pitchers can seemingly "beat the system", when in actuality they are just benefiting from being in the right park on the right teams. We could test this theory by looking at how pitchers perform before vs. after being traded (you'd have to look at the population of all traded pitchers as a whole, though, not just a few of them). The column "3Y_Delta" represents the difference between a team's 3-year average BABIP and the MLB average BABIP.

Anyhow, some interesting takeaways from these loose estimates:
- Pitching in the NL appears to be worth a mere -0.002 in BABIP, i.e., 2 points (not sure I buy this; I'd like more data).
- Home field advantage is worth around -0.006 in BABIP, i.e., 6 points.
- The data sufficiently explains the "Koufax effect". And while I didn't run the numbers for the full league back when Maddux was pitching, I did spot-check several of the other pitchers on his team during that era, and it appears to sufficiently explain the "Maddux effect" as well.
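For anyone curious how home/away splits can be pulled apart, here is one possible napkin decomposition in Python. It is an assumption-laden sketch, not necessarily the exact method used above: it assumes a team's road BABIP allowed reflects its defense plus an average mix of other parks (treated as zero), that its home BABIP allowed adds its own park effect plus one league-wide home-field effect, and that park effects average to zero across the league. It also weights home and road equally when computing the delta vs. the league. The team names and BABIP values are invented for illustration.

```python
from statistics import mean


def decompose_babip(
    home: dict[str, float], road: dict[str, float], league: float
) -> tuple[dict[str, dict[str, float]], float]:
    """Split each team's BABIP allowed into defense and park pieces.

    Assumed (simplified) model, per team T:
        road[T] = league + defense[T]                      # road parks average out to ~0
        home[T] = league + defense[T] + park[T] + home_edge
    with park effects summing to zero across teams.
    """
    splits = {team: home[team] - road[team] for team in home}  # park[T] + home_edge
    home_edge = mean(list(splits.values()))  # parks cancel in the league-wide mean
    parts = {
        team: {
            "defense": road[team] - league,
            "park": splits[team] - home_edge,
            # Equal-weight home/road average vs. league, like a 3-year delta column.
            "delta": (home[team] + road[team]) / 2 - league,
        }
        for team in home
    }
    return parts, home_edge


# Invented numbers for three hypothetical teams.
home_allowed = {"Team A": 0.278, "Team B": 0.300, "Team C": 0.284}
road_allowed = {"Team A": 0.290, "Team B": 0.296, "Team C": 0.288}
parts, edge = decompose_babip(home_allowed, road_allowed, league=0.290)
print(f"home edge: {edge:+.3f}")
for team, p in parts.items():
    print(f"{team}: defense {p['defense']:+.3f}, park {p['park']:+.3f}, delta {p['delta']:+.3f}")
```

A full least-squares fit over game-level data (every defense, park, and home/away flag as a column) would relax the "road parks average to zero" shortcut, which is the more precise matrix approach described above.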
Per my rough calculations, the Dodgers' advantage appears to be more attributable to their defensive abilities than to the ballpark. While not all of the values pass the "smell test" (sample sizes, confidence intervals, blah blah blah), you'll notice that many, perhaps most, of them do (e.g., Colorado and Boston have terrible park BABIP effects, while Seattle, St. Louis, and San Diego all show up as clear pitchers' parks). Also worth noting: the teams that do not pass the smell test are more likely to be teams whose actual BABIP values varied wildly over the past 3 seasons (like the Chicago White Sox, whose ballpark factor of -0.030 does not pass the smell test, but whose BABIP values were all over the place the past 3 seasons at 0.292, 0.268, and 0.303).

I'm not well-tuned to the current defensive abilities of each team, though. Perhaps someone paying closer attention can look at these numbers and see whether they look directionally accurate as a group. Also worth pointing out is that these defensive BABIP values reflect both the players' abilities and team strategies, like when to play the shift and where to position players. Teams that are heavily invested in analytics certainly outperform teams that are not with respect to these defensive BABIP values.

See my "back of the napkin math" table below, which rank-orders teams by their overall 3-year average BABIP performance. Note that we would expect pitchers on these teams to have BABIPs that regress to their team averages more than to the league average.
Last edited by Snowman; 11-22-2021 at 03:10 AM.
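The closing point, that a pitcher's BABIP should regress toward his team's average rather than the league's, can be sketched with a simple weighted-average shrinkage estimate, a common way to regress a small-sample rate toward a prior mean. The constant k below is a hypothetical placeholder (the number of balls in play at which the pitcher's own number and the prior get equal weight), not a fitted value, and the BABIP figures are invented.

```python
def regressed_babip(observed: float, balls_in_play: int, prior: float, k: float = 800.0) -> float:
    """Shrink an observed BABIP toward a prior mean (team or league).

    weight = N / (N + k): small samples lean on the prior, large samples on the pitcher.
    k = 800 is a hypothetical placeholder, not an estimated stabilization point.
    """
    weight = balls_in_play / (balls_in_play + k)
    return weight * observed + (1 - weight) * prior


# Invented example: a .250 BABIP over 400 balls in play, on a team that allows .280,
# in a league that averages .290.
toward_team = regressed_babip(0.250, 400, prior=0.280)
toward_league = regressed_babip(0.250, 400, prior=0.290)
print(round(toward_team, 3), round(toward_league, 3))
```

With these invented inputs, regressing toward the team lands at about .270 rather than about .277 toward the league, because the team prior already reflects a better-than-league defense and park.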