Riddler: Baseball Hall of Fame

Challenge

Here’s the description from the 538 Riddler column.

Derek Jeter and Larry Walker were just elected to the Baseball Hall of Fame! That got Stephanie thinking. Suppose there are 20 players on the ballot and 400 voters in a given year. Each voter can select up to 10 players for induction without voting for any given player more than once. To gain entry, a player must have been selected on at least 75 percent of the ballots.

Under these circumstances, what is the maximum number of players that can be inducted into the Hall of Fame?

Approach

Let’s define:

  • The number of ballots \(b = 400\).
  • The number of available votes per ballot \(a = 10\).
  • The threshold for the fraction of ballots to induct, \(t = 3/4 = 0.75\).

Then, we have two simple expressions for the total number of votes \(v\) and the minimum number of votes required to induct \(i\):

\[ \begin{aligned} v & = ba \\ i & = bt \\ \end{aligned} \]

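Every inductee needs at least \(i\) votes, and only \(v\) votes exist in total, so the maximum number of inductees \(c\) is:

\[ \begin{aligned} c & = \left\lfloor \frac{v}{i} \right\rfloor = \left\lfloor \frac{ba}{bt} \right\rfloor = \left\lfloor \frac{a}{t} \right\rfloor \\ & = \left\lfloor \frac{10}{0.75} \right\rfloor = 13 \end{aligned} \]

The bound is attainable: 13 players at 300 votes each use 3,900 of the 4,000 available votes, and because 300 is no more than 400, those votes can be spread so that no voter names the same player twice.

Note that the expression doesn’t depend on the number of ballots cast, just the number of available votes per ballot and the threshold. (And the screening for the ballot doesn’t matter either, as long as it provides more candidates than \(c\).)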
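As a quick check on the bound, here is a minimal sketch in base R that computes \(c\) and then builds one explicit set of ballots reaching it. The round-robin dealing scheme and the variable names (`thr`, `slots`, `ballots`) are assumptions made for illustration, not part of the original puzzle write-up.

```r
# Parameters from the puzzle statement
b   <- 400    # ballots
a   <- 10     # votes available per ballot
thr <- 0.75   # threshold t (named thr to avoid masking base::t)

v     <- b * a             # total votes available
i     <- ceiling(b * thr)  # votes needed per inductee
c_max <- floor(v / i)      # upper bound on inductees

# Illustrative construction: deal out the c_max * i required votes one at a
# time, cycling through the voters in order. Each player receives i
# consecutive slots, which fall on i distinct voters because i <= b.
slots   <- seq_len(c_max * i) - 1
ballots <- data.frame(
  voter  = slots %% b + 1,
  player = slots %/% i + 1
)

votes_per_player <- table(ballots$player)
votes_per_voter  <- table(ballots$voter)

c_max
range(votes_per_player)
max(votes_per_voter)
```

Every player ends up with exactly `i` votes and no voter casts more than `a`, so the bound from the formula is reached. One more inductee would require `(c_max + 1) * i` votes, which exceeds `v`.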

Answer

The maximum number of inductees in a given voting year is 13.


Ramifications

So, what could have happened if every year the writers got together and colluded on their votes? Through 2018, we’ve seen 123 actual inductees (roughly 2 per year, or 0.6% of the players who’ve played the game).

Instead, if the writers maxed out the total number of inductees, we would have had 13 new inductees each year of the voting, for 949 members of the Baseball Hall of Fame! (That would work out to 4.8% of all players.)
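The figures in this section can be cross-checked with a few lines of R. This is only a back-of-the-envelope sketch using the numbers quoted in the text. The count of voting years is inferred from those numbers rather than taken from the data.

```r
per_year_max <- 13    # maximum inductees per year, from the Answer section
total_max    <- 949   # total inductees if every year were maxed out
total_actual <- 123   # actual inductees through 2018

# Implied number of voting years (an inference from the two totals)
voting_years_implied <- total_max / per_year_max

# Actual average inductees per voting year
actual_rate <- total_actual / voting_years_implied

voting_years_implied
round(actual_rate, 1)
```

The actual rate comes out a little under 2 per year, which matches the rough figure quoted in the text.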

That seems a bit overly generous, but maxing out the inductees would avoid the multi-year campaigning, and might ensure the induction of some players who were close to the 75% threshold without quite getting there.

Long-suffering candidates

Let’s take a closer look at some of the players who missed the cutoff. Here, we’re looking at those who used up the 15-year maximum on the ballot and came close to the 75% threshold without reaching it.

When would they have gotten in?

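# Get player and Hall of Fame voting data from the Lahman package
library("Lahman", quietly = TRUE, warn.conflicts = FALSE)
library("dplyr", quietly = TRUE, warn.conflicts = FALSE)      # %>%, mutate_at, filter, joins
library("lubridate", quietly = TRUE, warn.conflicts = FALSE)  # ymd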

data(Master)
players <- Master %>%
  mutate_at(vars(debut, finalGame), ymd) %>%
  select(playerID, debut, finalGame, nameGiven, nameFirst, nameLast, deathDate)

data(HallOfFame)
df <- HallOfFame %>%
  filter(votedBy == "BBWAA") %>%
  left_join(players, by = "playerID") 

voting_years <- df %>%
  distinct(yearID) %>%
  arrange(yearID) %>%
  pull(yearID)

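# Set up an empty inductees data frame with the same columns as df
# (the always-false filter keeps the columns and drops every row)
inductees <- df %>% 
  filter(0 == 1)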

# Starting in 1936, get the top ranked 13 players and induct them.
# Then, for subsequent years, get the top-ranked 13 players that 
# haven't already been inducted.

for(idx in voting_years) {
  current_inductees <- df %>% 
    filter(yearID == idx) %>%
    anti_join(inductees, by = "playerID") %>%
    mutate(year_rank = rank(desc(votes), ties.method = "max")) %>%
    filter(year_rank <= 13)
  inductees <- bind_rows(inductees, current_inductees)
}
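To see how the loop above behaves, here is a small self-contained sketch on made-up data. The players, years, and vote counts in `toy` are hypothetical, `k` stands in for the 13 used in the post, and the sketch uses base R only so that it runs without the Lahman or dplyr packages. Unlike the `rank(..., ties.method = "max")` call above, `head()` breaks ties by row order, which makes no difference here because the toy data has no ties.

```r
# Hypothetical ballot results, for illustration only
toy <- data.frame(
  yearID   = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
  playerID = c("a", "b", "c", "d", "c", "d", "e", "d", "f"),
  votes    = c(90, 80, 40, 10, 60, 30, 70, 50, 20),
  stringsAsFactors = FALSE
)
k <- 2  # inductees per year (13 in the post)

# Start with an empty data frame that has the same columns as toy
inducted <- toy[0, ]

for (yr in sort(unique(toy$yearID))) {
  # Candidates on this year's ballot who have not been inducted yet
  pool <- toy[toy$yearID == yr & !(toy$playerID %in% inducted$playerID), ]
  # Take the top k by votes and add them to the inductees
  pool <- pool[order(-pool$votes), ]
  inducted <- rbind(inducted, head(pool, k))
}

inducted
```

In this toy run, players "c" and "d" miss the cut in year 1 but get in later once the stronger candidates have cleared the ballot, which is the same effect the full simulation measures for real candidates.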

Not surprisingly, all of these players would have gotten in during their first year on the ballot, in some cases with paltry vote totals.

##        full_name votes ballots year
## 1    Red Ruffing    22     153 1949
## 2     Nellie Fox    39     360 1971
## 3    Jim Bunning   146     383 1977
## 4 Orlando Cepeda    48     385 1980
## 5    Jack Morris   111     499 2000

References

  1. https://baseballhall.org/hall-of-famers/rules/voting-rules-history
  2. Baseball Hall of Fame data through 2018 from the “Lahman” R package. Michael Friendly, Chris Dalzell, Martin Monkman and Dennis Murphy (2019). Lahman: Sean ‘Lahman’ Baseball Database. R package version 7.0-1. https://CRAN.R-project.org/package=Lahman