Wooldridge Source: Data on NCAA men’s basketball teams, collected by Weizhao Sun for a senior seminar project in sports economics at Michigan State University, Spring 2017. He used various sources, including www.espn.com and www.teamrankings.com/ncaa-basketball/rpi-ranking/rpi-rating-by-team. Data loads lazily.

data('ncaa_rpi')

Format

A data.frame with 336 observations on 14 variables:

  • team: Name

  • year: Year

  • conference: Conference

  • postrpi: Post-season RPI rank

  • prerpi: Pre-season RPI rank

  • postrpi_1: Post-season RPI rank 1 yr ago

  • postrpi_2: Post-season RPI rank 2 yrs ago

  • recruitrank: Recruits Rank

  • wins: Number of games won

  • losses: Number of games lost

  • winperc: Winning percentage (0 to 100)

  • tourney: Tournament dummy (1 if the team made the NCAA tournament)

  • coachexper: Coach Experience

  • power5: Power Five conference dummy

Notes

This is a nice example of how multiple regression analysis can be used to determine whether rankings compiled by experts (the so-called pre-season RPI in this case) provide additional information beyond what we can obtain from widely available databases. A simple and interesting question is whether the pre-season RPI, which is supposed to add information on recruiting and player development, helps to predict performance (such as win percentage or making it to the NCAA men’s basketball tournament) once the previous year’s post-season RPI is controlled for. For the binary outcome that indicates making it to the NCAA tournament, a probit or logit model can be used in courses that introduce more advanced methods. There are some other interesting variables, such as coaching experience, that can be included, too.

Used in Text: not used

Examples

 str(ncaa_rpi)
#> 'data.frame':	336 obs. of  14 variables:
#>  $ team       : chr  "Boston College" "Boston College" "Boston College" "Boston College" ...
#>  $ year       : chr  "2003-2004" "2009-2010" "2012-2013" "2015-2016" ...
#>  $ conference : chr  "ACC" "ACC" "ACC" "ACC" ...
#>  $ postrpi    : int  19 115 114 249 104 41 187 126 1 2 ...
#>  $ prerpi     : int  37 53 223 102 90 34 119 76 11 12 ...
#>  $ postrpi_1  : int  44 67 244 161 125 37 115 107 10 3 ...
#>  $ postrpi_2  : int  55 131 65 204 176 23 55 63 4 7 ...
#>  $ recruitrank: int  97 300 69 69 88 10 46 79 55 24 ...
#>  $ wins       : int  24 15 16 7 10 21 13 17 31 35 ...
#>  $ losses     : int  10 16 17 25 18 11 18 14 6 5 ...
#>  $ winperc    : num  70.6 48.4 48.5 21.9 35.7 ...
#>  $ tourney    : int  1 0 0 0 0 1 0 0 1 1 ...
#>  $ coachexper : int  23 27 28 25 28 34 21 24 29 35 ...
#>  $ power5     : int  1 1 1 1 1 1 1 1 1 1 ...
#>  - attr(*, "time.stamp")= chr "21 Dec 2018 17:07"
#>  - attr(*, "label.table")= list()
#>  - attr(*, "expansion.fields")= list()
#>  - attr(*, "byteorder")= chr "LSF"
#>  - attr(*, "orig.dim")= int [1:2] 336 14
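
The question raised in the Notes can be sketched as two short regressions. This is a minimal illustration, not a specification taken from the source: it assumes the wooldridge package is installed, and the choice of regressors (prerpi and postrpi_1 only) and the object names fit_lm and fit_probit are assumptions made for the example.

```r
# Assumption: the wooldridge package, which ships ncaa_rpi, is installed.
library(wooldridge)
data('ncaa_rpi')

# Linear model: does the pre-season RPI rank help predict win percentage
# once last season's post-season RPI rank is controlled for?
# Rows with missing values, if any, are dropped by lm()'s default na.action.
fit_lm <- lm(winperc ~ prerpi + postrpi_1, data = ncaa_rpi)
summary(fit_lm)

# Probit model for the binary outcome of making the NCAA tournament,
# using the same (illustrative) set of regressors.
fit_probit <- glm(tourney ~ prerpi + postrpi_1,
                  family = binomial(link = "probit"),
                  data = ncaa_rpi)
summary(fit_probit)
```

In both summaries, the coefficient on prerpi and its t or z statistic indicate whether the pre-season ranking adds predictive information beyond postrpi_1. Other variables, such as coachexper or recruitrank, can be added to the formulas in the same way.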