Wooldridge Source: O. Baser and E. Pema (2003), “The Return of Publications for Economics Faculty,” Economics Bulletin 1, 1-13. Professors Baser and Pema kindly provided the data. Data loads lazily.
data('big9salary')
A data.frame with 786 observations on 30 variables:
id: person identifier
year: 92, 95, or 99
salary: annual salary, $
pubindx: publication index
totpge: standardized total article pages
assist: =1 if assistant professor
assoc: =1 if associate professor
prof: =1 if full professor
chair: =1 if department chair
top20phd: =1 if Ph.D. from top 20 dept.
yearphd: year Ph.D. obtained
female: =1 if female
osu: =1 if Ohio State U.
iowa: =1 if U. Iowa
indiana: =1 if Indiana U.
purdue: =1 if Purdue U.
msu: =1 if Michigan State U.
minn: =1 if U. Minnesota
mich: =1 if U. Michigan
wisc: =1 if U. Wisconsin
illinois: =1 if U. Illinois
y92: =1 if year == 92
y95: =1 if year == 95
y99: =1 if year == 99
lsalary: log(salary)
exper: years since first teaching job
expersq: exper^2
pubindxsq: pubindx^2
pubindx0: =1 if pubindx == 0
lpubindx: log(pubindx) if pubindx > 0
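Several of the columns above are described as transformations of other columns. A minimal sketch for confirming those definitions against the data follows. It assumes the `wooldridge` package is installed. The object name `checks` and the tolerance are illustrative choices, and whether every check passes is not asserted here.

```r
library(wooldridge)  # assumed installed; provides big9salary
data("big9salary")

# Rows where the publication index is strictly positive (NA rows are dropped).
pos <- which(big9salary$pubindx > 0)

# Compare each derived column with the formula given in its description.
# A loose tolerance allows for values stored in single precision.
checks <- with(big9salary, c(
  lsalary   = isTRUE(all.equal(lsalary, log(salary), tolerance = 1e-6)),
  expersq   = isTRUE(all.equal(as.numeric(expersq), as.numeric(exper)^2)),
  pubindxsq = isTRUE(all.equal(pubindxsq, pubindx^2, tolerance = 1e-6)),
  pubindx0  = all((pubindx == 0) == (pubindx0 == 1), na.rm = TRUE),
  lpubindx  = isTRUE(all.equal(lpubindx[pos], log(pubindx[pos]), tolerance = 1e-6))
))

print(checks)  # one TRUE/FALSE entry per derived column
```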
https://www.cengage.com/cgi-wadsworth/course_products_wp.pl?fid=M20b&product_isbn_issn=9781111531041
This is an unbalanced panel data set: up to three years of data are available for each faculty member, but some have fewer than three. It is not clear that a fixed effects (FE) or first differencing (FD) analysis makes sense here. Both approaches remove unobserved heterogeneity, which in this case includes faculty intelligence, talent, and motivation, and so they arguably control for too much. Presumably these factors enter into the publication index, and it is hard to argue that we want to hold the main factors driving productivity fixed when trying to measure the effect of productivity on salary. Pooled OLS regression with cluster-robust standard errors seems more natural. On the other hand, if we want to measure the return to having a degree from a top 20 Ph.D. program, then we would want to control for factors that cause selection into a top 20 program. Unfortunately, top20phd does not change over time, and so FD and FE are not applicable.
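The pooled OLS approach with cluster-robust standard errors mentioned above could be sketched as follows. This is an illustration under stated assumptions, not a specification from the source: the choice of regressors is hypothetical, and the `sandwich` and `lmtest` packages are assumed to be installed alongside `wooldridge`. The names `obs_per_id`, `vars`, `dat`, `pooled`, and `ct` are introduced here for the example only.

```r
library(wooldridge)  # assumed installed; provides big9salary
library(sandwich)    # vcovCL(): clustered covariance matrix estimator
library(lmtest)      # coeftest(): coefficient tests with a supplied covariance

data("big9salary")

# How many years of non-missing salary does each faculty member have?
# Counts below three reflect the unbalanced nature of the panel.
obs_per_id <- table(tapply(!is.na(big9salary$salary), big9salary$id, sum))
print(obs_per_id)

# Keep only complete rows for the variables used, so that the cluster
# vector lines up with the estimation sample.
vars <- c("id", "lsalary", "pubindx", "exper", "expersq",
          "assoc", "prof", "female", "y95", "y99")
dat <- na.omit(big9salary[, vars])

# Pooled OLS; the regressor list is an illustrative assumption.
# assist and y92 are the omitted base categories.
pooled <- lm(lsalary ~ pubindx + exper + expersq + assoc + prof +
               female + y95 + y99, data = dat)

# Cluster by faculty member, since the same person appears in several years.
ct <- coeftest(pooled, vcov. = vcovCL(pooled, cluster = dat$id))
print(ct)
```

Clustering on `id` allows the errors for the same faculty member to be correlated across years, which the default OLS standard errors ignore.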
Used in Text: not used
str(big9salary)
#> 'data.frame': 786 obs. of 30 variables:
#> $ id : int 101 101 101 102 102 102 103 103 103 104 ...
#> $ year : int 92 95 99 92 95 99 92 95 99 92 ...
#> $ salary : int NA NA 107100 79420 88239 100450 87450 96831 108290 NA ...
#> $ pubindx : num 30.5 31 40.5 33.5 33.9 ...
#> $ totpge : num 92.7 107.2 186.5 127.5 133 ...
#> $ assist : int 0 0 0 0 0 0 0 0 0 1 ...
#> $ assoc : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ prof : int 1 1 1 1 1 1 1 1 1 0 ...
#> $ chair : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ top20phd : int 0 0 0 0 0 0 1 1 1 1 ...
#> $ yearphd : int 73 73 73 76 76 76 61 61 61 91 ...
#> $ female : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ osu : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ iowa : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ indiana : int 1 1 1 1 1 1 1 1 1 1 ...
#> $ purdue : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ msu : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ minn : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ mich : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ wisc : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ illinois : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ y92 : int 1 0 0 1 0 0 1 0 0 1 ...
#> $ y95 : int 0 1 0 0 1 0 0 1 0 0 ...
#> $ y99 : int 0 0 1 0 0 1 0 0 1 0 ...
#> $ lsalary : num NA NA 11.6 11.3 11.4 ...
#> $ exper : int 19 22 26 16 19 23 31 34 38 1 ...
#> $ expersq : int 361 484 676 256 361 529 961 1156 1444 1 ...
#> $ pubindxsq: num 933 959 1636 1125 1149 ...
#> $ pubindx0 : num 0 0 0 0 0 0 0 0 0 0 ...
#> $ lpubindx : num 3.42 3.43 3.7 3.51 3.52 ...
#> - attr(*, "time.stamp")= chr "22 Jan 2013 14:09"