Sunday, March 8, 2026

Baseball: best formulas to evaluate a professional baseball player in the US

 First, Batting:

Below is a summary of the most widely recognized and empirically supported formulas for evaluating the offensive ability of Major League Baseball (MLB) players in the US. It includes relevant studies, historical context, and an assessment of how well each formula works.


Introduction to Offensive Metrics in Baseball

Evaluating offensive ability in baseball means quantifying a player's contribution to scoring runs, which is the primary objective of a team's offense. Traditional statistics like batting average (BA), home runs (HR), and runs batted in (RBI) have long been used. However, they often fail to account for context such as ballpark effects, and none of them gives a complete picture of a player's value. Over time, sabermetrics (the empirical, statistical analysis of baseball) has produced advanced metrics that address these shortcomings, using empirical data to build more accurate and predictive formulas.

Below, I will outline the most prominent formulas and metrics for evaluating offensive ability, supported by empirical research and studies where applicable.


Key Formulas and Metrics for Offensive Evaluation

1. Batting Average (BA)

  • Formula: BA = Hits / At-Bats (H/AB)
  • Purpose: Measures the frequency with which a player gets a hit per at-bat.
  • Strengths: Simple and intuitive; historically significant as one of the oldest metrics.
  • Limitations: Ignores walks and hit-by-pitches. It also treats every hit the same, so a home run counts no more than a single. It ignores context as well, such as ballpark effects or the quality of opposing pitching.
  • Empirical Support: BA is widely reported, but sabermetric research has shown that it is inadequate as a standalone measure of offensive value. Michael Lewis's Moneyball (2003) popularized this finding. BA correlates weakly with run production compared to more advanced metrics.

2. On-Base Percentage (OBP)

  • Formula: OBP = (Hits + Walks + Hit by Pitch) / (At-Bats + Walks + Hit by Pitch + Sacrifice Flies)
  • Purpose: Measures how often a player reaches base per plate appearance, accounting for walks and hit-by-pitches, which BA ignores.
  • Strengths: Stronger correlation with run scoring than BA, as getting on base is critical to offensive production.
  • Limitations: Does not account for the value of extra-base hits (e.g., a home run is weighted the same as a single).
  • Empirical Support: Research by Bill James (1980s, Baseball Abstract) and later studies (e.g., Tango et al., 2007, The Book: Playing the Percentages in Baseball) demonstrate that OBP is a key driver of team success, with a higher correlation to runs scored than BA. OBP was famously prioritized by the Oakland Athletics under Billy Beane, as documented in Moneyball.

3. Slugging Percentage (SLG)

  • Formula: SLG = (Singles + 2*Doubles + 3*Triples + 4*Home Runs) / At-Bats
  • Purpose: Measures a player’s power by weighting hits based on the number of bases achieved.
  • Strengths: Captures the value of extra-base hits, which are more likely to lead to runs.
  • Limitations: Ignores walks and other ways of reaching base; focuses solely on power.
  • Empirical Support: SLG has been shown to correlate strongly with run production in studies by sabermetricians like Pete Palmer (1984, The Hidden Game of Baseball), though it is less comprehensive than combined metrics.

4. On-Base Plus Slugging (OPS)

  • Formula: OPS = OBP + SLG
  • Purpose: Combines a player’s ability to get on base (OBP) and hit for power (SLG) into a single metric.
  • Strengths: Easy to calculate and provides a more complete picture of offensive ability than BA, OBP, or SLG alone.
  • Limitations: Adds OBP and SLG with equal weight, even though a point of OBP is generally worth more to run scoring than a point of SLG. It is also not adjusted for context such as ballpark or league averages.
  • Empirical Support: OPS has been widely adopted in baseball analysis due to its simplicity and effectiveness. Studies (e.g., Hakes & Sauer, 2006, in Journal of Economic Perspectives) show OPS correlates strongly with run production, though it is outperformed by more advanced metrics like wOBA (see below).
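The four formulas above all draw on the same counting stats, so they can be computed together. The following is a minimal Python sketch that assumes a season's totals are already available. The `BattingLine` field names are hypothetical. Singles are derived as total hits minus extra-base hits, because box scores usually report hits as a single total.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Season counting stats for one hitter (hypothetical field names)."""
    ab: int       # at-bats
    h: int        # hits of all types
    doubles: int
    triples: int
    hr: int       # home runs
    bb: int       # walks
    hbp: int      # hit by pitch
    sf: int       # sacrifice flies


def slash_line(s: BattingLine) -> dict:
    """Return BA, OBP, SLG, and OPS as defined in the formulas above."""
    singles = s.h - s.doubles - s.triples - s.hr  # box scores report total hits
    total_bases = singles + 2 * s.doubles + 3 * s.triples + 4 * s.hr
    ba = s.h / s.ab
    obp = (s.h + s.bb + s.hbp) / (s.ab + s.bb + s.hbp + s.sf)
    slg = total_bases / s.ab
    return {"BA": ba, "OBP": obp, "SLG": slg, "OPS": obp + slg}


# Hypothetical season line, for illustration only.
line = slash_line(BattingLine(ab=500, h=150, doubles=30, triples=2,
                              hr=25, bb=60, hbp=5, sf=5))
print({k: round(v, 3) for k, v in line.items()})
```

For this made-up line, the result is a .300 BA, .377 OBP, .518 SLG, and .895 OPS. OPS simply adds the two rate stats, even though they have different denominators.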

5. Weighted On-Base Average (wOBA)

  • Formula (FanGraphs version): wOBA = (0.69*Unintentional Walks + 0.72*Hit by Pitch + 0.89*Singles + 1.27*Doubles + 1.62*Triples + 2.10*Home Runs) / (At-Bats + Walks - Intentional Walks + Sacrifice Flies + Hit by Pitch)
    • Note: Weights are adjusted annually based on run values derived from empirical data.
  • Purpose: Assigns run values to each offensive event (walk, single, double, etc.) based on their actual contribution to scoring, providing a more accurate measure of offensive production.
  • Strengths: Weights each event by its empirically measured run value and is highly correlated with run production; superior to OPS in predictive power.
  • Limitations: More complex to calculate and less intuitive for casual fans; requires annual updates to weights.
  • Empirical Support: Developed by Tom Tango (introduced in The Book, 2007), wOBA is grounded in linear weights derived from play-by-play data. Studies, such as those by FanGraphs and Baseball Prospectus, show wOBA outperforms OPS in predicting team runs scored (e.g., correlation coefficients of ~0.95 for wOBA vs. ~0.90 for OPS).
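As a worked illustration of the weighting, here is a minimal Python sketch that uses the illustrative weights listed above. Real weights are recalculated every season. The sketch follows FanGraphs' published form of the formula, which counts only unintentional walks and removes intentional walks from the denominator. Passing `ibb=0` reduces it to the simpler form with all walks.

```python
# Illustrative weights from the formula above; real weights change each season.
WEIGHTS = {"ubb": 0.69, "hbp": 0.72, "single": 0.89,
           "double": 1.27, "triple": 1.62, "hr": 2.10}


def woba(ab, bb, ibb, hbp, sf, singles, doubles, triples, hr, w=WEIGHTS):
    """Weighted On-Base Average: run-value-weighted events per opportunity."""
    ubb = bb - ibb  # intentional walks are excluded from numerator and denominator
    numerator = (w["ubb"] * ubb + w["hbp"] * hbp + w["single"] * singles
                 + w["double"] * doubles + w["triple"] * triples + w["hr"] * hr)
    denominator = ab + bb - ibb + sf + hbp
    return numerator / denominator


# Hypothetical season line, for illustration only.
print(round(woba(ab=500, bb=60, ibb=5, hbp=5, sf=5,
                 singles=93, doubles=30, triples=2, hr=25), 3))
```

Because each weight is the run value of the event, a home run contributes roughly 2.4 times as much as a single here. OBP and OPS do not express that ratio directly.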

6. Runs Created (RC)

  • Formula (Basic Version by Bill James): RC = (Hits + Walks) * (Total Bases) / (At-Bats + Walks)
    • More advanced versions adjust for stolen bases, caught stealing, and other factors.
  • Purpose: Estimates the number of runs a player contributes to their team based on their offensive statistics.
  • Strengths: Directly tied to run production; accounts for both on-base ability and power.
  • Limitations: Early versions were less precise and not context-adjusted; can be complex in advanced forms.
  • Empirical Support: Introduced by Bill James in the 1980s, RC has been refined over time. Empirical analysis by James and others shows it closely approximates actual run production at the team level, though it is less commonly used today compared to wOBA or WAR.
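The basic formula reads as (times on base) × (bases gained) / (opportunities). Below is a minimal Python sketch of that basic version only, applied to a hypothetical stat line. The advanced versions with stolen bases and other terms are not shown.

```python
def runs_created_basic(h, bb, total_bases, ab):
    """Bill James' basic Runs Created: (H + BB) * TB / (AB + BB)."""
    times_on_base = h + bb        # hits plus walks
    bases_gained = total_bases    # bases advanced on hits
    opportunities = ab + bb
    return times_on_base * bases_gained / opportunities


# Hypothetical hitter: 150 H, 60 BB, 259 TB in 500 AB.
print(runs_created_basic(h=150, bb=60, total_bases=259, ab=500))  # 97.125
```

To run the kind of team-level check the text describes, you would sum this estimate over a team's hitters and compare it with the team's actual runs scored.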

7. Wins Above Replacement (WAR) - Offensive Component

  • Formula: WAR is a comprehensive metric that includes offensive, defensive, and baserunning contributions. The offensive component is often based on wOBA or similar run-value metrics, adjusted for position, league, and ballpark.
  • Purpose: Estimates the total value of a player in terms of wins contributed above a replacement-level player (a hypothetical minor-league call-up).
  • Strengths: Context-adjusted (league, ballpark, position) and provides a single number to compare players across roles; offensive WAR isolates batting contributions.
  • Limitations: Complex and dependent on underlying assumptions (e.g., replacement level); not purely an offensive metric unless isolated.
  • Empirical Support: WAR, as calculated by FanGraphs (fWAR) or Baseball-Reference (bWAR), is supported by extensive play-by-play data and regression analysis. Studies (e.g., Baumer & Zimbalist, 2014, The Sabermetric Revolution) validate WAR’s utility in player valuation, with offensive WAR correlating strongly with team success.
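The description above gives WAR's structure only in outline. For position players, FanGraphs documents WAR as the sum of several run components divided by a runs-per-win factor. The components are batting, baserunning, fielding, positional adjustment, league adjustment, and replacement-level runs. The following Python sketch assumes that each component has already been computed elsewhere, since each is its own model. The fixed 10 runs per win is an illustrative assumption, not a published constant.

```python
def position_player_war(batting_runs, baserunning_runs, fielding_runs,
                        positional_adj, league_adj, replacement_runs,
                        runs_per_win=10.0):
    """Sum run components above replacement, then convert runs to wins.

    runs_per_win=10.0 is a rule-of-thumb assumption; published systems
    recompute this factor each season from the league run environment.
    """
    runs_above_replacement = (batting_runs + baserunning_runs + fielding_runs
                              + positional_adj + league_adj + replacement_runs)
    return runs_above_replacement / runs_per_win


# Hypothetical component values, for illustration only.
print(round(position_player_war(batting_runs=40, baserunning_runs=2,
                                fielding_runs=0, positional_adj=-5,
                                league_adj=1, replacement_runs=20), 1))
```

Only the batting and baserunning terms describe the offensive contribution. The positional and replacement terms are what allow comparisons across positions.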

8. Expected Weighted On-Base Average (xwOBA)

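  • Formula: Built on the wOBA framework. For each batted ball, the actual outcome is replaced by the expected wOBA value of batted balls with similar Statcast characteristics: chiefly exit velocity and launch angle, plus sprint speed on some weakly hit balls. Walks, hit-by-pitches, and strikeouts are counted as they actually occurred.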
  • Purpose: Evaluates offensive ability by focusing on the quality of contact rather than results, which can be influenced by luck or defense.
  • Strengths: Removes noise from outcomes (e.g., a well-hit ball caught by a fielder); predictive of future performance.
  • Limitations: Requires advanced tracking data (only available since Statcast’s introduction in 2015); less accessible for historical comparisons.
  • Empirical Support: MLB’s Statcast data and studies by analysts at Baseball Savant show xwOBA is a better predictor of future offensive output than traditional stats, as it accounts for "true skill" (e.g., Drellich, 2017, in The Athletic).
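The key idea is to keep the wOBA structure but swap each batted ball's actual result for an expected value. Below is a minimal Python sketch of that aggregation step. It assumes the per-ball expected values are supplied as inputs. In Statcast, those values come from a contact-quality model that is not reproduced here, and the example values are made up. The walk and HBP weights reuse the illustrative wOBA weights.

```python
def xwoba(batted_ball_xvalues, ubb, hbp, strikeouts, w_ubb=0.69, w_hbp=0.72):
    """Expected wOBA: expected values for batted balls, actual values otherwise.

    batted_ball_xvalues: one expected wOBA value per batted ball that counts
    in the wOBA denominator (at-bats plus sacrifice flies). Here they are
    plain inputs (assumption); Statcast derives them from contact quality.
    """
    numerator = sum(batted_ball_xvalues) + w_ubb * ubb + w_hbp * hbp
    # Strikeouts add nothing to the numerator but still count as opportunities.
    denominator = len(batted_ball_xvalues) + strikeouts + ubb + hbp
    return numerator / denominator


# Hypothetical: four batted balls, one unintentional walk, two strikeouts.
print(round(xwoba([0.9, 0.0, 1.5, 0.2], ubb=1, hbp=0, strikeouts=2), 3))
```

A hard-hit line drive caught by a fielder still contributes its high expected value here. That is the noise reduction the text describes.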

Comparative Analysis: Which Formula is the Best?

The "best" formula depends on the specific goal of evaluation (e.g., simplicity vs. accuracy, historical vs. predictive analysis). Below is a summary based on empirical evidence and expert consensus:

  • For Simplicity and Broad Understanding: OPS is widely used and accessible, with a strong correlation to run production. It is a good starting point for casual analysis.
  • For Accuracy and Run Production: wOBA is considered the gold standard for evaluating offensive ability in modern baseball. Its linear weights are empirically derived and consistently outperform OPS and traditional stats in predictive models (Tango et al., 2007; FanGraphs studies).
  • For Comprehensive Value: WAR (specifically its offensive component) is ideal for comparing players across positions and eras, as it adjusts for context. It is widely used by teams and analysts for player valuation (Baumer & Zimbalist, 2014).
  • For Predictive Power: xwOBA, leveraging Statcast data, is the cutting-edge metric for forecasting future performance by focusing on quality of contact rather than outcomes.

Empirical Consensus: Studies and practical applications (e.g., MLB front office strategies, FanGraphs, Baseball Prospectus) overwhelmingly favor wOBA and WAR for their precision and grounding in data. For instance, research by Tango et al. (2007) and ongoing validations by Statcast demonstrate that wOBA and xwOBA have the highest correlations with actual and expected run production (R² values often exceeding 0.9).
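R² values like these come from fitting a metric against team runs scored across many team-seasons. With a single predictor, R² equals the square of the Pearson correlation. The Python sketch below computes that quantity directly from two equal-length lists, for example team wOBA and team runs per season. It contains no real data; the toy values in the usage line are made up.

```python
def r_squared(x, y):
    """R² of a one-variable least-squares fit (squared Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Toy values only: a perfectly linear relationship gives R² = 1.
print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))
```

An R² of 0.9 means the metric's linear fit accounts for about 90% of the variance in team runs across the sample.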


Relevant Studies and Resources

  1. Bill James’ Baseball Abstract (1980s): Introduced concepts like Runs Created and emphasized OBP over BA, laying the foundation for modern sabermetrics. Empirical analysis was based on historical box scores and team-level run production.
  2. Pete Palmer’s The Hidden Game of Baseball (1984): Developed linear weights for offensive events, a precursor to wOBA, using regression analysis on historical data.
  3. Michael Lewis’ Moneyball (2003): Popularized the use of OBP and sabermetrics in MLB front offices, with empirical evidence from the Oakland Athletics’ success in the early 2000s.
  4. Tom Tango et al., The Book: Playing the Percentages in Baseball (2007): Introduced wOBA and provided detailed empirical analysis using play-by-play data to derive run values for each offensive event.
  5. Hakes & Sauer, Journal of Economic Perspectives (2006): Academic study confirming the undervaluation of OBP in player salaries pre-Moneyball and its strong link to team wins.
  6. Baumer & Zimbalist, The Sabermetric Revolution (2014): Evaluates the impact of WAR and other metrics on player evaluation, supported by statistical analysis of MLB data.
  7. Statcast Research (MLB.com, Baseball Savant, 2015-present): Provides empirical support for xwOBA and other quality-of-contact metrics, showing higher predictive validity than traditional stats.

Practical Application in MLB

Modern MLB teams rely heavily on advanced metrics like wOBA, WAR, and xwOBA for player scouting, contract negotiations, and in-game decisions. For example:

  • The Houston Astros and Los Angeles Dodgers, known for their analytical approaches, use wOBA and Statcast data to identify undervalued players with high expected outputs (Drellich, 2017).
  • FanGraphs and Baseball-Reference publish leaderboards for these metrics, which are used by analysts and fans alike to rank players. For instance, in 2023, Shohei Ohtani led MLB with a wOBA of approximately 0.412 and an offensive WAR of 6.1 (FanGraphs data), reflecting his elite offensive ability.

Conclusion

The best formulas for evaluating the offensive ability of MLB players are wOBA for pure offensive production, WAR for contextual value, and xwOBA for predictive analysis. These metrics are grounded in extensive empirical data, including play-by-play records, Statcast tracking, and regression models, as validated by decades of sabermetric research. While traditional stats like BA and OPS remain useful for quick assessments, they are empirically inferior to advanced metrics in terms of accuracy and predictive power.


Next, Pitching:

This section summarizes the traditional and advanced metrics used to evaluate and rate pitchers in Major League Baseball (MLB). It includes supporting empirical data, relevant studies, and an analysis of the best approaches for rating pitchers. It covers both effectiveness at preventing runs and predictive measures of skill, giving a full picture of the current landscape of pitcher evaluation.


Introduction to Pitcher Evaluation in Baseball

Pitchers play a critical role in baseball by preventing the opposing team from scoring runs. Evaluating pitchers involves assessing their ability to limit hits, walks, and runs, as well as their overall contribution to team success. Traditional statistics like wins, losses, and earned run average (ERA) have historically dominated pitcher evaluation, but they often fail to account for factors outside a pitcher’s control, such as defensive support or ballpark effects. Modern sabermetrics has introduced advanced metrics to address these issues, using empirical data to isolate a pitcher’s true skill and value.

Below, I will outline the most prominent formulas and metrics for rating pitchers, supported by empirical research and studies where applicable, and provide a comparative analysis of their effectiveness.


Key Formulas and Metrics for Pitcher Evaluation

1. Earned Run Average (ERA)

  • Formula: ERA = (Earned Runs Allowed * 9) / Innings Pitched
  • Purpose: Measures the average number of earned runs (runs not resulting from errors) a pitcher allows per nine innings.
  • Strengths: Simple and widely understood; historically significant as a primary measure of pitcher effectiveness.
  • Limitations: Heavily influenced by factors outside a pitcher's control, such as defense, ballpark dimensions, and luck on balls in play. For example, a weakly hit ball might find a hole, or a below-average fielder might fail to reach a ball that an average fielder would catch, with no error charged. ERA also ignores unearned runs entirely and does not account for situational context.
  • Empirical Support: While ERA remains a staple in baseball analysis, studies (e.g., Tango et al., 2007, The Book: Playing the Percentages in Baseball) show it is less predictive of future performance compared to skill-based metrics. ERA correlates moderately with team success but often over- or undervalues pitchers due to external factors.
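One practical detail the formula hides is that box scores record innings in thirds. For example, "180.1" means 180 and 1/3 innings, not 180.1. The following is a minimal Python sketch that converts that notation before applying the ERA formula. The function names are hypothetical.

```python
def innings_from_box_score(ip: str) -> float:
    """Convert box-score notation to true innings: '180.1' means 180 1/3."""
    whole, _, outs = ip.partition(".")
    extra_outs = int(outs) if outs else 0
    if extra_outs not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip!r}")
    return int(whole) + extra_outs / 3


def era(earned_runs: int, ip: str) -> float:
    """Earned runs allowed per nine innings."""
    return earned_runs * 9 / innings_from_box_score(ip)


# Hypothetical pitcher: 70 earned runs in 180.1 innings.
print(round(era(70, "180.1"), 2))
```

Treating "180.1" as a decimal would slightly overstate innings. The error is larger for short relief stints such as "0.2", which is two outs (2/3 of an inning).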

2. Wins and Losses (W-L Record)

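  • Formula: Not a formula but a scoring rule. The win goes to the pitcher of record when his team takes a lead it never gives up; a starting pitcher must complete at least five innings to qualify. The loss goes to the pitcher charged with the go-ahead run that his team never overcomes.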
  • Purpose: Traditionally used to gauge a pitcher’s success in contributing to team victories.
  • Strengths: Easy to track and historically significant (e.g., Cy Young Award voters long weighted wins heavily).
  • Limitations: Highly dependent on team performance, run support, and bullpen effectiveness. A great pitcher on a poor team may have a losing record, while a mediocre pitcher on a strong team may rack up wins.
  • Empirical Support: Sabermetric research (e.g., Bill James, 1980s, Baseball Abstract) and later studies (e.g., Baumer & Zimbalist, 2014, The Sabermetric Revolution) demonstrate that W-L records are poor indicators of individual pitching skill, with low correlation to true value.

3. Strikeouts per Nine Innings (K/9)

  • Formula: K/9 = (Strikeouts * 9) / Innings Pitched
  • Purpose: Measures how many batters a pitcher strikes out per nine innings, reflecting dominance and the ability to miss bats rather than control.
  • Strengths: Strikeouts are a direct result of pitcher skill, largely independent of defense; high K/9 often indicates elite stuff.
  • Limitations: Ignores other outcomes (e.g., walks, hits); doesn’t measure run prevention directly.
  • Empirical Support: Research by FanGraphs and Baseball Prospectus shows K/9 correlates with pitcher effectiveness, especially for modern pitchers who prioritize strikeouts. Studies (e.g., Tango et al., 2007) note strikeouts as a key component of “true talent” metrics.

4. Walks per Nine Innings (BB/9)

  • Formula: BB/9 = (Walks * 9) / Innings Pitched
  • Purpose: Measures a pitcher’s control by calculating walks allowed per nine innings.
  • Strengths: Walks are under a pitcher’s control and directly impact run prevention (via on-base percentage allowed).
  • Limitations: Doesn’t account for hits or other outcomes; less informative on its own.
  • Empirical Support: Walk rates are a critical factor in run prevention, as shown in linear weights analysis (Palmer, 1984, The Hidden Game of Baseball), and are often paired with K/9 to assess control and dominance.

5. Strikeout-to-Walk Ratio (K/BB)

  • Formula: K/BB = Strikeouts / Walks
  • Purpose: Balances a pitcher’s ability to strike out batters with their tendency to issue walks, reflecting overall command.
  • Strengths: Combines two skill-based metrics; higher ratios often indicate better pitchers.
  • Limitations: Ignores hits and other outcomes; not a complete measure of effectiveness.
  • Empirical Support: K/BB is widely used in sabermetrics as a quick gauge of pitcher skill. Studies (e.g., Tango et al., 2007) show it correlates with run prevention better than ERA in many cases.

6. Walks and Hits per Inning Pitched (WHIP)

  • Formula: WHIP = (Walks + Hits) / Innings Pitched
  • Purpose: Measures how many baserunners a pitcher allows per inning, a key indicator of run prevention.
  • Strengths: Simple and effective; accounts for both hits and walks, which directly lead to runs.
  • Limitations: Doesn’t differentiate between types of hits (e.g., singles vs. home runs); influenced by defense and luck on balls in play.
  • Empirical Support: WHIP correlates strongly with ERA and run prevention (FanGraphs studies), though it is less precise than advanced metrics due to defensive noise.
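K/9, BB/9, K/BB, and WHIP share the same inputs, so they can be computed together. The Python sketch below assumes innings have already been converted from box-score notation to true innings. It also adds strikeout rate per batter faced (K%), which is not listed above. K% is often preferred to K/9 because K/9 rises whenever a pitcher faces more batters per inning.

```python
def pitcher_rates(strikeouts, walks, hits, innings, batters_faced):
    """Rate stats from the formulas above; innings must be true innings."""
    return {
        "K/9": strikeouts * 9 / innings,
        "BB/9": walks * 9 / innings,
        "K/BB": strikeouts / walks if walks else float("inf"),
        "WHIP": (walks + hits) / innings,
        "K%": strikeouts / batters_faced,  # strikeouts per batter faced
    }


# Hypothetical season: 200 K, 50 BB, 150 H in 180 innings, 740 batters faced.
rates = pitcher_rates(strikeouts=200, walks=50, hits=150, innings=180,
                      batters_faced=740)
print({k: round(v, 3) for k, v in rates.items()})
```

A pitcher with zero walks gets an infinite K/BB here instead of a division error. That is one reason K/BB is unstable in small samples.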

7. Fielding Independent Pitching (FIP)

  • Formula: FIP = ((13*Home Runs + 3*(Walks + Hit by Pitch) - 2*Strikeouts) / Innings Pitched) + Constant
    • The constant (typically around 3.10) adjusts FIP to match the league-average ERA scale.
  • Purpose: Estimates a pitcher’s ERA based solely on outcomes they control (strikeouts, walks, hit-by-pitches, home runs), ignoring defense and luck on balls in play.
  • Strengths: Isolates pitcher skill; more predictive of future performance than ERA.
  • Limitations: Overemphasizes home runs (assumes all are pitcher’s fault, ignoring ballpark effects); ignores sequencing of events.
  • Empirical Support: FIP was developed by Tom Tango. It builds on Voros McCracken's Defense Independent Pitching Statistics (DIPS) research (McCracken, 2001, Baseball Prospectus), which identified strikeouts, walks, and home runs as the primary pitcher-controlled outcomes. Subsequent studies show that FIP predicts future ERA better than past ERA does (R² often ~0.6 for FIP vs. ~0.3 for ERA).
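The constant is what puts FIP on the ERA scale. It is chosen each season so that league-wide FIP equals league-wide ERA, so it can be computed from league totals. Below is a minimal Python sketch. The league totals are hypothetical placeholders, not real season data.

```python
def fip_constant(lg_era, lg_hr, lg_bb, lg_hbp, lg_k, lg_ip):
    """Constant chosen so that league-wide FIP equals league-wide ERA."""
    return lg_era - (13 * lg_hr + 3 * (lg_bb + lg_hbp) - 2 * lg_k) / lg_ip


def fip(hr, bb, hbp, k, ip, constant):
    """Fielding Independent Pitching on the ERA scale (ip in true innings)."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical league totals, for illustration only.
c = fip_constant(lg_era=4.20, lg_hr=5800, lg_bb=15500, lg_hbp=2000,
                 lg_k=41500, lg_ip=43000)
print(round(c, 2), round(fip(hr=20, bb=50, hbp=5, k=200, ip=180, constant=c), 2))
```

Plugging the league totals back into `fip` returns the league ERA, which is the defining property of the constant.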

8. Expected Fielding Independent Pitching (xFIP)

  • Formula: Similar to FIP, but replaces actual home runs with expected home runs based on fly ball rate (assuming league-average HR/FB rate, typically ~10-15%).
  • Purpose: Adjusts FIP for variability in home run rates, which can be influenced by luck or ballpark.
  • Strengths: More stable than FIP; better accounts for random variation in home run outcomes.
  • Limitations: Assumes league-average HR/FB rate, which may not apply to pitchers with unique skills or home parks.
  • Empirical Support: xFIP was introduced by Dave Studeman of The Hardball Times. It is supported by regression analysis showing that HR/FB rates regress heavily toward the mean over time (FanGraphs studies). It is often preferred over FIP for predictive analysis.
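The only change from FIP is the home-run term, which becomes fly balls allowed times a league-average HR/FB rate. The Python sketch below assumes an 11% league HR/FB rate and the ~3.10 constant mentioned above. Both are illustrative values, not figures for any particular season.

```python
def xfip(fly_balls, lg_hr_per_fb, bb, hbp, k, ip, constant):
    """FIP with actual home runs replaced by fly balls * league HR/FB rate."""
    expected_hr = fly_balls * lg_hr_per_fb
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical pitcher with 180 fly balls allowed in 180 innings.
print(round(xfip(fly_balls=180, lg_hr_per_fb=0.11, bb=50, hbp=5, k=200,
                 ip=180, constant=3.10), 2))
```

A pitcher whose actual HR/FB rate was well above the league rate will have an xFIP below his FIP. xFIP treats that excess as likely to regress.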

9. Skill-Interactive ERA (SIERA)

  • Formula: SIERA is a regression model built on strikeout rate, walk rate, and net ground-ball rate (ground balls minus fly balls and pop-ups), including squared and interaction terms between these factors. The full formula is published in the FanGraphs glossary.
  • Purpose: Estimates ERA based on pitcher-controlled skills, accounting for interactions (e.g., high strikeout pitchers benefit more from ground balls).
  • Strengths: More accurate than FIP or xFIP in predicting future ERA by capturing nuanced skill interactions.
  • Limitations: More complex and less intuitive than FIP or xFIP; still not fully context-adjusted.
  • Empirical Support: Developed by Matt Swartz and Eric Seidman (2010, Baseball Prospectus), SIERA outperforms FIP and xFIP in predictive studies (e.g., FanGraphs analysis shows higher R² for future ERA, ~0.65).

10. Wins Above Replacement (WAR) - Pitching Component

  • Formula: WAR for pitchers combines run prevention (often based on FIP or RA9, runs allowed per 9 innings) with innings pitched, adjusted for league, ballpark, and replacement level.
  • Purpose: Estimates a pitcher’s total value in wins contributed above a replacement-level pitcher.
  • Strengths: Context-adjusted and comprehensive; allows comparison across eras and roles (starters vs. relievers).
  • Limitations: Dependent on underlying metrics (e.g., FIP-based WAR vs. ERA-based WAR can differ); not purely skill-based if using RA9.
  • Empirical Support: WAR, as calculated by FanGraphs (fWAR, FIP-based) or Baseball-Reference (bWAR, RA9-based), is validated by extensive data analysis (Baumer & Zimbalist, 2014). Pitching WAR correlates strongly with team success and player valuation.

11. Expected ERA (xERA) via Statcast

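  • Formula: Baseball Savant's xERA converts a pitcher's xwOBA allowed onto the ERA scale. xwOBA is based on exit velocity, launch angle, and related Statcast data, so xERA predicts ERA from the quality of contact allowed rather than from actual outcomes.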
  • Purpose: Evaluates pitcher skill by focusing on contact quality, removing luck and defensive effects.
  • Strengths: Predictive of future performance; isolates true talent better than ERA.
  • Limitations: Requires Statcast data (only since 2015); less useful for historical analysis.
  • Empirical Support: MLB’s Statcast research (Baseball Savant) shows xERA outperforms traditional ERA in forecasting future results, as it accounts for “true skill” (e.g., studies by Drellich, 2017, The Athletic).

Comparative Analysis: Which Formula is the Best?

The "best" metric for rating pitchers depends on the evaluation’s purpose (e.g., historical analysis, predictive power, or simplicity). Below is a summary based on empirical evidence and expert consensus:

  • For Simplicity and Broad Understanding: ERA and WHIP are accessible and widely reported, providing a quick snapshot of run prevention and baserunner allowance. However, they are influenced by external factors.
  • For Skill Isolation: FIP and xFIP are the gold standards for isolating pitcher-controlled outcomes. FIP is ideal for current performance, while xFIP adjusts for home run variability and is better for prediction.
  • For Predictive Power: SIERA and xERA (Statcast-based) are cutting-edge metrics for forecasting future performance. SIERA captures skill interactions, while xERA leverages contact quality data for superior accuracy.
  • For Comprehensive Value: WAR (pitching component) is ideal for overall valuation, adjusting for context (league, ballpark) and comparing pitchers across roles and eras.

Empirical Consensus: Studies and practical applications (e.g., MLB front office strategies, FanGraphs, Baseball Prospectus) favor FIP, xFIP, SIERA, and WAR for their precision and grounding in data. Research by Tango et al. (2007), McCracken (2001), and Statcast validations show these metrics have higher predictive correlations (R² often 0.6-0.7 for future ERA) compared to traditional stats like ERA (R² ~0.3).


Relevant Studies and Resources

  1. Bill James’ Baseball Abstract (1980s): Critiqued traditional metrics like W-L records and introduced run-based valuation, laying groundwork for modern pitching metrics.
  2. Pete Palmer’s The Hidden Game of Baseball (1984): Used regression analysis to weight pitcher outcomes (strikeouts, walks, etc.), influencing later metrics like FIP.
  3. Voros McCracken, Baseball Prospectus (2001): Introduced Defense Independent Pitching Statistics (DIPS), showing pitchers have little control over balls in play. This seminal work underpins FIP and related metrics, supported by empirical play-by-play data.
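  4. Tom Tango et al., The Book: Playing the Percentages in Baseball (2007): Applies run-expectancy and linear-weights analysis of play-by-play data to player evaluation and strategy. Tango separately developed FIP from McCracken's DIPS findings, which treat strikeouts, walks, and home runs as the primary indicators of pitching skill.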
  5. Matt Swartz & Eric Seidman, Baseball Prospectus (2010): Developed SIERA, showing improved predictive power over FIP via skill interaction models.
  6. Baumer & Zimbalist, The Sabermetric Revolution (2014): Evaluates WAR’s impact on pitcher valuation, supported by statistical analysis of MLB data.
  7. Statcast Research (MLB.com, Baseball Savant, 2015-present): Validates xERA and contact-based metrics, showing higher predictive accuracy for future performance.

Practical Application in MLB

Modern MLB teams rely on advanced metrics like FIP, SIERA, WAR, and xERA for pitcher scouting, development, and in-game strategy. For example:

  • The Tampa Bay Rays and Cleveland Guardians, known for pitching development, use FIP and Statcast data to identify undervalued pitchers with high strikeout and ground ball rates (e.g., reports by Drellich, 2017).
  • FanGraphs and Baseball-Reference leaderboards rank pitchers by these metrics. In 2023, Spencer Strider led MLB with a FIP of approximately 2.85 and a pitching WAR of 5.5 (FanGraphs data), reflecting elite performance.

Additional Considerations: Starting Pitchers vs. Relievers

  • Starters: Metrics like WAR and innings pitched are crucial, as durability and volume matter. ERA and FIP are often used over full seasons to assess consistency.
  • Relievers: Metrics like K/9, WHIP, and specialized stats (e.g., Leverage Index for high-pressure situations) are prioritized due to smaller sample sizes and situational roles. WAR is less effective for relievers due to lower inning totals.
  • Empirical Note: Studies (e.g., Tango et al., 2007) show reliever performance is more volatile, so skill-based metrics like FIP and xERA are preferred over ERA for small-sample analysis.

Conclusion

The best formulas for rating MLB pitchers are FIP and xFIP for isolating skill, SIERA and xERA for predictive analysis, and WAR for comprehensive value. These metrics are grounded in extensive empirical data, including play-by-play records, Statcast tracking, and regression models, as validated by decades of sabermetric research. Traditional stats like ERA and W-L records remain useful for historical context but are empirically inferior due to external noise.
