Wooldridge Source: Data on NCAA men’s basketball teams, collected by Weizhao Sun for a senior seminar project in sports economics at Michigan State University, Spring 2017. He used various sources, including www.espn.com and www.teamrankings.com/ncaa-basketball/rpi-ranking/rpi-rating-by-team. Data loads lazily.
data('ncaa_rpi')
A data.frame with 336 observations on 14 variables:
team: Name
year: Year
conference: Conference
postrpi: Post-season RPI rank
prerpi: Pre-season RPI rank
postrpi_1: Post-season RPI rank, 1 year earlier
postrpi_2: Post-season RPI rank, 2 years earlier
recruitrank: Recruits Rank
wins: Number of games won
losses: Number of games lost
winperc: Winning percentage (0 to 100 scale)
tourney: Tournament dummy (1 if the team made the NCAA tournament)
coachexper: Coach Experience
power5: Power Five conference dummy
This is a nice example of how multiple regression analysis can be used to determine whether rankings compiled by experts (the so-called pre-season RPI in this case) provide additional information beyond what we can obtain from widely available databases. A simple and interesting question is whether, once the previous year’s post-season RPI is controlled for, the pre-season RPI, which is supposed to add information on recruiting and player development, helps to predict performance (such as win percentage or making it to the NCAA men’s basketball tournament). For the binary outcome that indicates making it to the NCAA tournament, a probit or logit model can be used in courses that introduce more advanced methods. There are some other interesting variables, such as coaching experience, that can be included, too.
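The regressions described in the notes could be set up roughly as follows. This is a minimal sketch, not a specification taken from the textbook: it assumes the wooldridge package is installed, and the choice of regressors (postrpi_1, prerpi, and coachexper) is only one plausible reading of the question posed above.

```r
# Assumption: the wooldridge package is installed and provides ncaa_rpi.
data("ncaa_rpi", package = "wooldridge")

# Linear model: does the pre-season rank help predict winning percentage
# once last season's post-season rank is held fixed?
ols <- lm(winperc ~ postrpi_1 + prerpi, data = ncaa_rpi)
summary(ols)

# Binary outcome: logit and probit models for making the NCAA tournament.
logit <- glm(tourney ~ postrpi_1 + prerpi,
             family = binomial(link = "logit"), data = ncaa_rpi)
probit <- glm(tourney ~ postrpi_1 + prerpi,
              family = binomial(link = "probit"), data = ncaa_rpi)

# Illustrative extension: add coaching experience as a further control.
ols_coach <- update(ols, . ~ . + coachexper)
summary(ols_coach)
```

The coefficient and t statistic on prerpi in each fit address whether the expert ranking carries information beyond the previous season's result. Rows with missing values, if any, are dropped by the default na.action.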
Used in Text: not used
str(ncaa_rpi)
#> 'data.frame': 336 obs. of 14 variables:
#> $ team : chr "Boston College" "Boston College" "Boston College" "Boston College" ...
#> $ year : chr "2003-2004" "2009-2010" "2012-2013" "2015-2016" ...
#> $ conference : chr "ACC" "ACC" "ACC" "ACC" ...
#> $ postrpi : int 19 115 114 249 104 41 187 126 1 2 ...
#> $ prerpi : int 37 53 223 102 90 34 119 76 11 12 ...
#> $ postrpi_1 : int 44 67 244 161 125 37 115 107 10 3 ...
#> $ postrpi_2 : int 55 131 65 204 176 23 55 63 4 7 ...
#> $ recruitrank: int 97 300 69 69 88 10 46 79 55 24 ...
#> $ wins : int 24 15 16 7 10 21 13 17 31 35 ...
#> $ losses : int 10 16 17 25 18 11 18 14 6 5 ...
#> $ winperc : num 70.6 48.4 48.5 21.9 35.7 ...
#> $ tourney : int 1 0 0 0 0 1 0 0 1 1 ...
#> $ coachexper : int 23 27 28 25 28 34 21 24 29 35 ...
#> $ power5 : int 1 1 1 1 1 1 1 1 1 1 ...
#> - attr(*, "time.stamp")= chr "21 Dec 2018 17:07"
#> - attr(*, "label.table")= list()
#> - attr(*, "expansion.fields")= list()
#> - attr(*, "byteorder")= chr "LSF"
#> - attr(*, "orig.dim")= int [1:2] 336 14
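The values printed above suggest that winperc is the share of games won, expressed on a 0 to 100 scale. That reading is an inference from the first rows of the str() output rather than something the documentation states, so the short check below only confirms it for the five rows that are fully visible. The first_rows data frame is a hypothetical helper built by hand from that output.

```r
# First five rows, copied by hand from the str() output above.
first_rows <- data.frame(
  wins    = c(24, 15, 16, 7, 10),
  losses  = c(10, 16, 17, 25, 18),
  winperc = c(70.6, 48.4, 48.5, 21.9, 35.7)
)

# Assumed formula: percentage of games won.
implied <- 100 * first_rows$wins / (first_rows$wins + first_rows$losses)

# str() prints winperc to three significant digits, so compare with a
# tolerance of half a unit in the first decimal place.
matches <- abs(implied - first_rows$winperc) <= 0.05
all(matches)
```

With the package loaded, the same comparison can be run on the full columns of ncaa_rpi to see whether the relationship holds for every observation.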