  #104  
04-16-2019, 01:03 PM
darwinbulldog
Glenn
Glen.n Sch.ey-d
 
Join Date: Mar 2012
Location: South Florida
Posts: 3,255

Quote:
Originally Posted by nat
It really depends how this counterfactual is spelled out. Let me make it a bit more specific and then offer an answer.

I'm going to assume:

(1) We've got the player starting their rookie year.
(2) We don't know what their career is going to be like in our imaginary world, but:
(3) we do know what their career was like in the real world.

Without (3) you're really asking about which player had the best tools - you're looking for scouting reports on these guys as 20 year olds. But given that we do know what their careers were like in the real world, I think my first cuts to the list will be Griffey and Mantle. Both were injury prone, and Mantle had problems with alcohol. In real life, both were great players. But the probability that, if Mantle played his career out again, his knees would give out before they did, or his alcoholism would get severe enough that he couldn't play at a top level, is too high for me to be comfortable picking him. (Ditto for Griffey with respect to the injuries. He also just wasn't as great a player as the others.)

That leaves Aaron, Mays, and Trout. Now, let's assume (as seems reasonable) that a player's possible performances form a normal distribution, with the mean determined by their talent level. That is: if they each replayed their career a zillion times, of the outcomes they generate, 66% of them will fall within one standard deviation of the average outcome, a further 33% will fall within an additional standard deviation of the average, and then there are a few outliers. We are, in effect, being asked to take a chance on one of those zillion possible careers, it's just that we don't know which one.

Now, we do know that in the actual world Aaron and Mays put together superlative careers. That is, we've already picked one possible outcome out of the bag, and it turned out to be a good one. Given that these outcomes form a normal distribution, it is extremely likely that their actual career was relatively close to the expected outcome. (99% probable that it's within two standard deviations, 66% within one.) It's possible, but not terrifically likely, that their actual career was one of the extreme outliers. So we can be reasonably confident that if we picked Aaron or Mays, we'll again get something reasonably close to the career that they actually produced. Now, this still involves quite a bit of uncertainty - that 99% confidence interval covers four standard deviations after all - but it's pretty good.*

Trout, despite being both my favorite Angel and my favorite fish, doesn't allow this kind of confidence because we haven't seen the rest of his career yet. He certainly could end up beating Mays or Aaron, but he hasn't done it yet. Which means that, given our information, the range of possible outcomes on Trout's career is greater than it is for the other two. One way to think about this is that the bell curve of possible careers for Trout is flatter than it is for Mays or Aaron. So, given the additional risk involved in picking him, my second cut would be to eliminate Trout.

It then comes down to which player you think had the better career: Mays or Aaron. I'll pick Mays, but if you want to go with Aaron I'm not going to argue too much.



* Can we be 99% confident that their actual careers are within two standard deviations of their mean career, given that we know that they had great careers? Maybe not. If not, let me give an additional argument. Given that they actually had great careers, their mean performance, whatever it is, has got to be pretty high. And so even if their actual careers were unlikely outliers, their expected career is still going to be good. And, more to the point for this exercise, if we have grounds to think that Aaron's or Mays' career was actually an outlier, we have the same grounds for thinking that Trout's career (so far) is as well. And, given that we know more about Aaron's career than about Trout's, we can still infer that the distribution of possible careers for Trout is flatter than it is for Aaron and Mays.
Good stuff, but in a normal distribution over 4% (roughly 4.6%, not just 1%) of outcomes deviate from the mean by more than two standard deviations. What I would focus on, though, are the standard errors of the means, which become tiny with all of the data in a 20-25 year career.
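
For anyone who wants to check those numbers, here is a minimal Python sketch. The function names are my own, and the standard-error part assumes independent at-bats and a made-up hit probability of .300, so treat it as an illustration rather than a model of any real player.

```python
import math


def normal_tail_outside(k: float) -> float:
    """Probability that a normal draw lands more than k standard
    deviations away from the mean (both tails combined)."""
    return 1.0 - math.erf(k / math.sqrt(2.0))


def standard_error(sd: float, n: int) -> float:
    """Standard error of a mean of n independent observations,
    each with standard deviation sd."""
    return sd / math.sqrt(n)


# Share of outcomes within one SD and beyond two SDs of the mean.
within_one_sd = 1.0 - normal_tail_outside(1.0)
beyond_two_sd = normal_tail_outside(2.0)
print(f"within 1 SD: {within_one_sd:.4f}")
print(f"beyond 2 SD: {beyond_two_sd:.4f}")

# Hypothetical hitter: each at-bat is a hit with probability 0.300
# (an assumed figure, not anyone's real average).
p_hit = 0.300
per_at_bat_sd = math.sqrt(p_hit * (1.0 - p_hit))
for at_bats in (500, 10_000):
    se = standard_error(per_at_bat_sd, at_bats)
    print(f"{at_bats} at-bats: standard error of average = {se:.4f}")
```

The point of the last loop is only that the standard error shrinks with the square root of the sample size, so a full career pins down the mean far more tightly than a single season does.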

Naturally there's going to be some regression toward the mean, as you allude to in your footnote, but that doesn't have any impact on the rank ordering of where you expect the players' careers to end up if you replayed them under slightly different circumstances. Sure, it's possible that Don Mattingly would end up having the best career in MLB history, but it's more likely that Griffey would, more likely still that it would be Mantle, and even more likely that it's Mays.
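
A rough way to see the rank-ordering point is to replay a set of careers many times and count how often each player comes out on top. The sketch below is only that: the players are labeled A through D, and the talent levels and noise are invented numbers on an arbitrary scale, not estimates for anyone named above.

```python
import random
from collections import Counter


def simulate_best_career(talent, noise_sd, n_replays, seed=0):
    """Replay every career n_replays times with normally distributed
    luck around each player's true talent, and return the share of
    replays in which each player has the best career."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins = Counter()
    for _ in range(n_replays):
        outcomes = {
            name: rng.gauss(mean, noise_sd) for name, mean in talent.items()
        }
        wins[max(outcomes, key=outcomes.get)] += 1
    return {name: wins[name] / n_replays for name in talent}


# Hypothetical true-talent levels (assumed values, arbitrary units).
talent = {"A": 100.0, "B": 90.0, "C": 80.0, "D": 60.0}
shares = simulate_best_career(talent, noise_sd=15.0, n_replays=20_000)

for name, share in shares.items():
    print(f"player {name}: best career in {share:.1%} of replays")
```

Under these assumptions any of the four can end up on top in a given replay, but the chance of doing so still follows the order of underlying talent, which is all the argument needs.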