Friday, February 20, 2009

Analysis of the Significance of Statistical Categories in Fantasy Baseball: How to choose the right scoring categories

Sawyer Campbell

Commissioner, Toddy Hollandsworths Fantasy Baseball League

2009 Season

Abstract

This paper explains why we will be using a particular set of statistics for scoring in our league. A simple combination of mathematical analysis and the concept of fairness is coupled with the American concept of freedom to determine which categories give us a fair and balanced league, while leaving enough room for individual owners to customize their teams to their liking.

Importance of Proper Statistics Categories

First, I want to establish that we cannot simply base our scoring on the trendiest stats, because they have limited intrinsic value. We also cannot use esoteric stats like complete games or hit batsmen because of their statistical insignificance in a head-to-head league. The choice of categories must be balanced to prevent owners from too easily abusing the system. Recall that last year we had owners who won several categories because they tallied the lowest numbers in those categories, but only as a result of not updating their team. We will no longer reward negligence. An owner could also, in theory, have a complete imbalance between the pitching and hitting performance of their team, and if there were an uneven number of pitching and hitting categories, that owner could win a majority of categories with a one-dimensional team. With a proper choice of stat categories, one can strongly encourage an owner to have a balanced lineup, but one could still win with an unbalanced lineup; the owner is left with the freedom to choose how to construct their team. This will depend on how the roster is established and will be discussed later.

In a head-to-head matchup, it is important to have statistically significant categories with which to judge one team against another. One also wishes to have the same number of categories under pitching as under hitting to promote a balanced team. To do this, one must understand that there are two kinds of stats. From here on out, we will call offensive stats like Runs and Hits and pitching stats like Wins and Ks “Sum” stats. A Sum stat is a non-negative whole number that cannot decrease through a player’s poor play; it is simply a running total of the player’s contribution to the team. We also call these progressive statistics. If a player does not contribute to a team, then there is no negative impact on Sum stats; the impact can only be neutral or positive. The problem with Sum stats is that they inherently promote quantity over quality. An owner could abuse Sum stats by playing more batters off the bench and thus increasing their chance of winning the Sum stat categories. To balance these progressive Sum stats, one could introduce regressive Sum stats such as Errors or Losses, but these could ultimately lead to an owner winning by neglecting to play a full team, which has happened in the past. The correct way to balance these progressive Sum stats is to include stats that from here on out we will call “Percent” stats.

Percent stats include ERA, OBP, AVG, and WHIP, and all are extremely valuable measures of a player’s ability. Percent stats differ from the monotonic Sum stats (where a contribution can only be neutral or positive) in that a player’s contribution can be negative, neutral, or positive. The possibility of a negative contribution forces the owner to choose quality over quantity. But again, one could play only one amazing player and possibly win the Percent stats, so they must be equaled in number by Sum stats to force the owner to field a full team. The end result of combining these stats is that an owner is almost forced to field a full team of quality players that is balanced across the board.
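To make the quantity-versus-quality trade-off concrete, here is a minimal Python sketch. The player lines are made-up numbers, and the names `BattingLine`, `team_hits`, and `team_avg` are hypothetical helpers used only for illustration; this is not how any fantasy site computes its scoring.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """One player's batting line for a scoring period (made-up numbers)."""
    hits: int
    at_bats: int


def team_hits(lines):
    """Sum stat: total hits can only stay level or grow as players are added."""
    return sum(line.hits for line in lines)


def team_avg(lines):
    """Percent stat: team batting average, which a weak hitter can drag down."""
    at_bats = sum(line.at_bats for line in lines)
    return team_hits(lines) / at_bats if at_bats else 0.0


starters = [BattingLine(hits=8, at_bats=25), BattingLine(hits=9, at_bats=24)]
weak_bench_bat = BattingLine(hits=2, at_bats=20)
with_bench = starters + [weak_bench_bat]

print(team_hits(starters), round(team_avg(starters), 3))      # 17 0.347
print(team_hits(with_bench), round(team_avg(with_bench), 3))  # 19 0.275
```

Adding the weak bench bat raises the Hits total but lowers the team average, which is the tension the equal split between Sum and Percent categories is meant to create.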

Our Statistics Categories

The categories chosen for our league break down as follows:

Batting / Defense

Sum Stats             | Percent Stats
Hits (H)              | Batting Average (AVG)
Runs Batted In (RBI)  | On Base Percentage (OBP)
Runs (R)              | Slugging Percentage (SLG)
Stolen Bases (SB)     | Fielding Percentage (FPCT)

Pitching

Sum Stats               | Percent Stats
Wins (W)                | Earned Run Average (ERA)
Quality Starts (QS)     | Walks Plus Hits Per Inning Pitched (WHIP)
Strikeouts (K)          | Win Percentage (WPCT)
Saves Plus Holds (SVHD) | Save Percentage (SVPCT)

Hitting Categories

One might ask why these particular categories were chosen. Let us break this down first by Batting/Defense categories and then by Pitching categories. The typical stats of H, R, RBI, and SB are all present to measure an individual player’s contribution to the offense. Other statistics such as Runs Created were considered, but because their definitions vary they have been passed over for now. Last year, we had no way of valuing one type of hit over another, and we sought to prevent this from happening this season. One way to do this would be to include the ever-popular Home Runs (HR) category, but this does not value a double or a triple over a single, and it overemphasizes the contribution of a HR.

Home Run Stat Analysis

Hitting a Home Run = 1H + 1RBI + 1R + 1HR + AVG (all increase)

Hitting any other hit = 1H + AVG (RBIs are possible but not guaranteed)

HR = 5 categories guaranteed

Any other hit = 2 categories guaranteed, 3 possible

One can see that Home Runs are simply overweighted against other types of hits. To prevent this, we have included Slugging Percentage (SLG), which is defined as

SLG = (1B + 2 x 2B + 3 x 3B + 4 x HR) / AB

where 1B, 2B, 3B, and HR are singles, doubles, triples, and home runs, and AB is at-bats. SLG takes into account other types of hits by weighting them by the number of bases a batter earns with each hit. What this stat tells us is the average number of bases a player will achieve per at-bat. A SLG of 1.000 equates to a player earning, on average, one base for every at-bat. Another important category is On Base Percentage (OBP), which is defined as

OBP = (H + BB + HBP) / (AB + BB + HBP + SF)

where BB is walks, HBP is times hit by a pitch, and SF is sacrifice flies. OBP indicates how often a player reaches base per plate appearance. An OBP of 1.000 would mean a player reaches base every time they step up to the plate. Fielding Percentage (FPCT) has been included and Errors (E) have been left out because Errors are a regressive stat category.
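As a quick check on these two Percent stats, the following Python sketch computes SLG and OBP using the standard definitions (total bases per at-bat, and times on base over at-bats plus walks, hit by pitch, and sacrifice flies). The function names and the sample season line are hypothetical and are here only for illustration.

```python
def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """Total bases divided by at-bats; each hit is weighted by bases earned."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats if at_bats else 0.0


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """Times on base divided by the plate appearances that count toward OBP."""
    chances = at_bats + walks + hit_by_pitch + sacrifice_flies
    return (hits + walks + hit_by_pitch) / chances if chances else 0.0


# Hypothetical season line: 100 singles, 30 doubles, 5 triples, 20 home runs
# (155 hits) in 500 at-bats, with 60 walks, 5 hit by pitch, 5 sacrifice flies.
slg = slugging_percentage(100, 30, 5, 20, at_bats=500)
obp = on_base_percentage(hits=155, walks=60, hit_by_pitch=5,
                         at_bats=500, sacrifice_flies=5)

print(round(slg, 3))  # 0.51
print(round(obp, 3))  # 0.386
```

Note how a home run counts four times as much as a single in SLG, while a double and a triple still earn more credit than a single, which a bare HR category would not capture.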

Pitching Categories

Pitching categories are a little harder to deal with on a statistical significance level when they are broken down into the individual contributions of pitchers in specific roles. We have chosen eight scoring categories for pitchers: W, QS, K, SVHD, WPCT, SVPCT, WHIP, and ERA. These categories were chosen by the same methodology as the hitting categories, which eliminates regressive stat categories and creates an equal number of Sum categories and Percent categories that enforce quantity and quality, respectively. Whereas a batter can contribute to all eight offensive and defensive categories, no single pitcher can contribute to all eight pitching categories during one appearance. This breaks down as follows:

Starting Pitcher

Possible Scoring Categories        | Not Possible Scoring Categories
W, K, QS, WPCT, ERA, WHIP          | SVHD, SVPCT

Relief Pitcher

Possible Scoring Categories        | Not Possible Scoring Categories
W, K, SVHD, SVPCT, WPCT, ERA, WHIP | QS
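The starting pitcher and relief pitcher breakdowns above can be tallied with a few lines of Python. This is only a sketch; the names `PITCHING_CATEGORIES`, `NOT_POSSIBLE`, and `coverage` are hypothetical, and the category sets are copied from the breakdowns.

```python
# The eight pitching categories chosen for the league.
PITCHING_CATEGORIES = {"W", "QS", "K", "SVHD", "WPCT", "SVPCT", "WHIP", "ERA"}

# Categories each role cannot affect in a single appearance.
NOT_POSSIBLE = {
    "starter": {"SVHD", "SVPCT"},
    "reliever": {"QS"},
}


def coverage(role):
    """Fraction of the eight pitching categories a role can affect."""
    reachable = PITCHING_CATEGORIES - NOT_POSSIBLE[role]
    return len(reachable) / len(PITCHING_CATEGORIES)


for role in ("starter", "reliever"):
    print(f"{role}: {coverage(role):.1%}")
# starter: 75.0%
# reliever: 87.5%
```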

One can see that a relief pitcher can impact 7/8, or 87.5%, of possible stat categories, whereas a starting pitcher can impact 6/8, or 75%, of possible stat categories. This imbalance creates a situation in which an owner would very likely need a team consisting of both starting pitchers and relievers rather than only one kind of pitcher. Regressive categories such as Runs Allowed, Earned Runs, and Losses are gone for the reasons stated above. The pitching categories are much better balanced than last year’s. Take, for example, the possible impact of a start on the stats:

Last Year

A Win = 1W + WPCT

No Decision = no impact

A Loss = 1L - WPCT

This Year

A Win = 1W + WPCT, QS is possible

No Decision = 0W + 0WPCT, QS is possible

A Loss = 0W - WPCT, QS is possible

First, let’s set aside stats such as ERA, WHIP, and K, which both years’ scoring categories share. If we look at the possible contributions by number of categories, then last year a win had an impact of two categories, a no decision had an impact of zero categories, and a loss had an impact of negative two categories. This year, a win can have an impact of two or three categories, a no decision can have an impact of zero or one category, and a loss can have an impact of negative one or zero categories. The result of the scoring this year is that how well the pitcher pitched is just as important as the decision itself. It is even possible for a loss to have no net impact on a team’s scoring.

Summary

We have evaluated the impact of possible scoring categories and have settled on sixteen total categories to use for scoring. We have also established a balance between pitching and hitting categories, as well as a balance within the pitching and hitting categories, by removing regressive Sum stats and having an equal number of progressive Sum stats and Percent stats. Further analysis of the chosen scoring categories shows that the overall scoring should be better balanced than the scoring system in last year’s league.

Appendix A: Further analysis of impact of extra players

There are several places on the bench for extra players and every owner can do as they please with these players. If they choose to play these players when a regular starter has no game on that day, then there is a possible measurable impact of playing the bench player in that open spot. The following is an analysis based on categories only. The degree of impact will change with a player’s actual play during a game, but during a week’s time fluctuations will begin to average out.

Extra Batter

Possible Impact                     | Not Possible Impact
H, R, RBI, SB, AVG, SLG, OBP, FPCT  | N/A

An extra batter can impact 100% of possible categories, but should one play an extra batter? Further analysis shows that Sum stats cannot have a negative impact and that Percentage stats can have either a positive or negative impact. Since there are four Sum categories and four Percentage categories, then an extra batter’s contribution can be expressed as

CXB = 4 +/- 4

And therefore, the range of impact of an extra batter can be expressed as

0 <= IXB <= 8

Starting Pitcher

Possible Scoring Categories | Not Possible Scoring Categories
W, K, QS, WPCT, ERA, WHIP   | SVHD, SVPCT

An extra starting pitcher can only impact 75% of possible categories. Using similar analysis we can see that an extra starting pitcher’s contribution can be expressed as

CXSP = 3 +/- 3


And therefore, the range of impact of an extra starting pitcher can be expressed as

0 <= IXSP <= 6


Relief Pitcher

Possible Scoring Categories        | Not Possible Scoring Categories
W, K, SVHD, SVPCT, WPCT, ERA, WHIP | QS

An extra relief pitcher can impact 87.5% of possible categories, which is more than an extra starting pitcher can impact. Using similar analysis, we can see that an extra relief pitcher’s contribution can be expressed as

CXRP = 3 +/- 4


And therefore, the range of impact of an extra relief pitcher can be expressed as

-1 <= IXRP <= 7


What is interesting to note is that it could be quite dangerous to play an extra reliever over an extra starting pitcher, but one has to take into account the limited quantity of quality starting pitchers available and then couple that with differences inherent in pitching situations between relief pitchers and starting pitchers.
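The three ranges in this appendix follow the same arithmetic, which can be sketched in Python as below. The class name `ExtraPlayer` and its fields are hypothetical. The sketch follows the convention used above, where every reachable Sum category counts as +1 and every reachable Percent category can swing by +1 or -1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtraPlayer:
    """How many Sum and Percent categories an extra player can touch."""
    sum_categories: int
    percent_categories: int

    def impact_range(self):
        """Return (worst, best) net categories, i.e. Sum - Percent to Sum + Percent."""
        worst = self.sum_categories - self.percent_categories
        best = self.sum_categories + self.percent_categories
        return worst, best


# Category counts taken from the tables in this appendix.
ROLES = {
    "extra batter": ExtraPlayer(sum_categories=4, percent_categories=4),
    "extra starting pitcher": ExtraPlayer(sum_categories=3, percent_categories=3),
    "extra relief pitcher": ExtraPlayer(sum_categories=3, percent_categories=4),
}

for name, player in ROLES.items():
    print(name, player.impact_range())
# extra batter (0, 8)
# extra starting pitcher (0, 6)
# extra relief pitcher (-1, 7)
```

Only the relief pitcher has a worst case below zero, because that role reaches one more Percent category than Sum categories.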

Conclusion

The impact of an extra player can be measured in a categorical sense by analysis of the possible impact that the extra player could have on the scoring categories available to them. Ultimately, the decision to have extra batters, starting pitchers, or relief pitchers will depend on the available talent and the owner’s individual needs and their willingness to take risks.

2 comments:

  1. holy shit dude, that is a lot of analysis. I'm glad you cared enough about this.

  2. That's why they call him Dr. Dwayne.

    Bravo.

    I didn't like WPCT last year, what was your reasoning for including WPCT over BAA.

    If I had the skrills that you have I would write a formal essay on the inherent weakness in Fantasy Baseball placing such little weight on a player's fielding statistics, to be included in The Fantasy Baseballer, Journal of Fantasy Baseballing, which I presume your essay will launch.
