Net54baseball.com Forums - View Single Post - Slightly OT

nat · #3 11-08-2019, 11:18 AM

Let me give a little primer on how WAR is calculated. This will give you the general idea, not the details (and might elide some complications), and I promise no complicated formulas in this post.

The fundamental component of a WAR calculation is a set of measures known as linear weights. We'll do offensive statistics first. Here's how they work:

At any given time during a baseball game, for any given base, either a runner is on it or not, and there are some number of outs. Call any such situation a 'game state'. Thus: runner on first, nobody out, is a game state. So is: bases loaded, two outs. And so on.

For any game state, it's possible to determine, historically, how many runs have scored, on average, from that state to the end of the half inning in which it occurs. For example, from 2010-2015, with no one out and no one on base, a team scored 0.48 runs on average. That's the "run expectancy" for that particular state (during that particular time frame).

Call an 'event' anything that changes the game state. So hitting a single is an event, stealing a base is an event (so is getting caught stealing), and so on. The "run expectancy" of a given event is the change from the run expectancy of the state before the event to the run expectancy of the state after the event, plus any runs that actually scored on the play. So if there's no one out and no one on, and you hit a triple, the run expectancy of the state you left is 0.48 (that's how many runs you can expect to score with no one on and no one out) and the run expectancy of the state that you entered is 1.35 (that's how many runs a team scores, on average, with a runner on third and no one out). Since you didn't drive anyone in you don't need to worry about any runs scoring on this play, so the run expectancy of your triple was (1.35-0.48) or 0.87.
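For anyone who prefers to see that arithmetic spelled out, here is a minimal sketch in Python. The table layout and the function name are my own illustration, and the only numbers in it are the two run expectancies quoted above; a real table would have an entry for every base/out combination.

```python
# Run expectancy by game state, keyed by (runners on base, outs).
# Only the two values quoted in the post are included here.
RUN_EXPECTANCY = {
    ("---", 0): 0.48,  # bases empty, nobody out
    ("--3", 0): 1.35,  # runner on third, nobody out
}


def event_run_value(state_before, state_after, runs_scored=0):
    """Change in run expectancy caused by an event, plus runs scored on it."""
    before = RUN_EXPECTANCY[state_before]
    after = RUN_EXPECTANCY[state_after]
    return after - before + runs_scored


# The leadoff triple from the example: nobody scores, runner ends up on third.
triple = event_run_value(("---", 0), ("--3", 0))
print(round(triple, 2))  # 0.87
```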

Now, we're interested not in run expectancies of particular triples, but of triples in general. To find that, you add up the run expectancy of all the triples that have been hit (in the time frame that interests you), and take the average. That gives you the "expected run value" of a triple. (Of course you can do the same thing with every event type.)
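The averaging step is just as simple. In this sketch the first value is the leadoff triple from the example above; the other three are made-up numbers standing in for triples hit in other game states, purely to show the mechanics.

```python
from statistics import mean


def expected_run_value(event_values):
    """Average the run values of every occurrence of one event type."""
    return mean(event_values)


# 0.87 is the leadoff triple from the example; the rest are placeholders,
# not real data.
triples = [0.87, 1.10, 1.25, 0.95]
print(expected_run_value(triples))
```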

So for any player, you can figure out the expected run value of everything that they did on offense by adding up the expected run value of each of the things that they did. How many singles did they hit? Multiply that by the expected run value of a single. How many doubles did they hit? Multiply that by the expected run value of a double. Add those two numbers. Then do the same for triples, strike outs, home runs, ground into double plays (obviously the expected run value will be negative for the bad things), and so on.
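As a rough sketch, the whole thing is a weighted sum. The run values and the stat line below are placeholders I made up to show the shape of the calculation; they are not actual linear weights for any season, and the event names are just labels I picked.

```python
# Hypothetical expected run values per event (placeholders, not real weights).
RUN_VALUES = {
    "single": 0.45,
    "double": 0.75,
    "triple": 1.05,
    "home_run": 1.40,
    "strikeout": -0.27,
    "gidp": -0.80,  # grounded into double play
}


def offensive_run_value(counts, run_values=RUN_VALUES):
    """Sum of (number of times an event happened) x (its expected run value)."""
    return sum(n * run_values[event] for event, n in counts.items())


# A made-up season for Jim.
jim = {"single": 100, "double": 30, "triple": 5,
       "home_run": 20, "strikeout": 110, "gidp": 10}
print(round(offensive_run_value(jim), 2))
```

Notice that the bad events simply carry negative weights, so they pull the total down without needing any special handling.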

WAR modifies this figure to account for differences in, e.g., the park a player plays in. It's easier, for example, to hit home runs in Wrigley than in Oakland. You can figure out how much easier by looking at the frequency with which home runs are hit at one park versus another. (And ditto for any other event.) These are called 'park effects', and you use them to adjust an individual player's expected run values. So what you get is that if you have one guy who plays his home games in Wrigley and another who plays his home games in Oakland, if their stats are otherwise identical, the guy playing in Oakland will have a greater number of expected runs. (Because relative to his context - the ballpark that depresses offense - what he did was more valuable.)

Now, that gives you a player's expected run value. (Let's call him Jim - I need a name for this player.) You need then to compare it to replacement level. Replacement level players are the guys who bounce back and forth between AAA and the end of the MLB bench. For most MLB teams, the worst player on the roster is going to be roughly replacement level. (It can be a bit higher or a bit lower, but it's usually going to be somewhat close.) IIRC WAR calculations actually use a percentage of MLB average as replacement level, but if you wanted to figure it out empirically it wouldn't be too hard. Find the guys who occupy the last spot on the roster for each team, and calculate their expected run values. Then subtract the replacement level player's expected run value from Jim's and you have his expected offensive runs above replacement.

Converting that into wins above replacement is pretty easy. In recent years, a combination of scoring/preventing 10 runs will, on average, win a game for a team. So you take Jim's offensive runs above replacement and divide by 10. That gives you his offensive wins above replacement (or oWAR).
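In code form this step is a subtraction and a division. The 10 runs per win is the rough figure mentioned above; the run totals for Jim and the replacement player are invented for illustration.

```python
RUNS_PER_WIN = 10  # rough recent-years figure from the post


def offensive_war(player_runs, replacement_runs, runs_per_win=RUNS_PER_WIN):
    """Offensive runs above replacement, converted to wins."""
    runs_above_replacement = player_runs - replacement_runs
    return runs_above_replacement / runs_per_win


# Made-up numbers: Jim is worth 30 runs, a replacement player -5.
print(offensive_war(30.0, -5.0))  # 3.5
```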

Now for defense.

For current players we have play-by-play data available, showing precisely where each player made each play. Let's talk about current players. (Older players introduce complications that aren't super relevant when comparing them to each other, but do muck things up when comparing them to current players. If anyone is interested I can do a follow up post explaining the difference, but for now I'm content to talk about current players.)

You split the field up into a grid. For each fielder, for each spot on the grid there is some probability that he will make a play at that spot on the grid, and a run expectancy for failing to make a play at that spot. Both of those values you can figure historically: what percentage of the time have shortstops, for example, actually managed to make a play in one particular part of the grid? When they failed to make a play there, what happened? On average, how many expected runs did the team on offense pick up? (That would be calculated as above.)

So say that Jim is a shortstop. The grid squares right around him will be ones at which almost every shortstop makes a play almost every time. The value of Jim making a play there will be very low - because almost anyone they stuck at shortstop would have made that play. But the further you get away from where a shortstop stands (and yes, shifting makes calculating this stuff a nightmare), the lower the probability is that a shortstop will make a play there. So the run value of Jim making a play, at any spot on the grid, is the probability that an average shortstop would NOT make a play at that spot, times the expected run value (for the offensive team) of that play not being made. So if an average shortstop has only a 50/50 chance of getting to a ball and making a play on it, and balls hit to that spot on the grid have a run expectancy of .4 (basically: they're usually singles), then Jim gets credit for saving .2 runs if he makes a play there. Add up all the plays that Jim makes and you get his expected runs saved above replacement.* Notice that no additional adjustment for replacement level is necessary, we've already accounted for the difference between Jim and another guy who might play shortstop for his team.
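Here's a small sketch of that credit calculation. It only covers plays that Jim actually makes, as described above, and the function names are mine. The first play is the 50/50 example from this post; the second is a made-up routine play to show how little credit an easy one earns.

```python
def play_credit(p_average_makes_play, run_value_if_missed):
    """Runs saved by making a play: P(average fielder misses) x runs if missed."""
    return (1.0 - p_average_makes_play) * run_value_if_missed


def runs_saved(plays_made):
    """Add up the credit for every play a fielder made."""
    return sum(play_credit(p, rv) for p, rv in plays_made)


# The example from the post: a 50/50 ball worth 0.4 runs if it gets through.
print(play_credit(0.5, 0.4))  # 0.2

# Add a routine play (hypothetical: 95% of shortstops make it).
print(round(runs_saved([(0.5, 0.4), (0.95, 0.4)]), 2))  # 0.22
```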

*(Replacement level for defense is regarded as average MLB play. There are lots of AAA players who would be average MLB players defensively. It turns out that hitting the ball is harder than fielding it.)

You then take Jim's defensive runs saved (on baseball-reference this is listed as 'Rfield' under "Player Value -- Batting", I don't know why they don't include it under the "fielding" heading, but they don't), and divide by 10. (To convert runs to wins.)

The last thing that you need to do is to apply a positional adjustment. It's harder to find a shortstop who hits .300/.400/.500 (for example) than it is to find a leftfielder who can accomplish that. And so players who play difficult defensive positions get a bonus and those who play easy ones (or DH) get a penalty. The bonus/penalty is pro-rated based on number of games played at each position. Baseball-reference calls this figure 'Rpos'. Again, divide by 10 to turn runs into wins.

You basically add that all up to get a player's WAR. (That's not 100% true, there's a little double counting that you need to take care of - this is one of those complications that I'm going to skip over since this is just meant to be an introduction.) Fangraphs has a more complete discussion, if you want the details. But notice that there's nothing "theoretical" about this, a player's WAR is literally a function of what he did on the baseball field, and what other players did on the baseball field.
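Putting the pieces together looks roughly like this. It's a simplified sketch: it ignores the double-counting correction mentioned above, the function name is mine, and the inputs are made-up numbers rather than any real player's line.

```python
def war(offensive_runs_above_replacement, rfield, rpos, runs_per_win=10):
    """Simplified WAR: add the run components, then convert runs to wins.

    Skips the double-counting correction that real implementations apply.
    """
    total_runs = offensive_runs_above_replacement + rfield + rpos
    return total_runs / runs_per_win


# Hypothetical shortstop: 35 offensive runs above replacement,
# 8 fielding runs saved, 7 runs of positional bonus.
print(war(35, 8, 7))  # 5.0
```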

This is long enough already, so I'm not going to discuss pitcher's WAR. The calculations are different, but the general idea is the same.

Now, it is important to remember what WAR measures and what it doesn't. Statistics are tools, and if you try to drive a nail with a belt sander, you're probably going to get into trouble. WAR measures what it says that it does: wins above replacement player. A little elaboration on that might be helpful though. Because replacement level is replacement level in the league, it doesn't tell you how many more wins Jim got for his team than the guy playing short at AAA would have. It says, basically: imagine that we drop Jim into an arbitrarily selected team in the league; how many more games (or fewer, if Jim is really bad) would that team likely win? This is often a useful thing to know, but as with any tool, keep in mind what it's meant for and what it's not.