Wooldridge Source: O. Baser and E. Pema (2003), “The Return of Publications for Economics Faculty,” Economics Bulletin 1, 1-13. Professors Baser and Pema kindly provided the data. Data loads lazily.

data('big9salary')

Format

A data.frame with 786 observations on 30 variables:

  • id: person identifier

  • year: 92, 95, or 99

  • salary: annual salary, $

  • pubindx: publication index

  • totpge: standardized total article pages

  • assist: =1 if assistant professor

  • assoc: =1 if associate professor

  • prof: =1 if full professor

  • chair: =1 if department chair

  • top20phd: =1 if Ph.D. from top 20 dept.

  • yearphd: year Ph.D. obtained

  • female: =1 if female

  • osu: =1 if Ohio State U.

  • iowa: =1 if U. Iowa

  • indiana: =1 if Indiana U.

  • purdue: =1 if Purdue U.

  • msu: =1 if Michigan State U.

  • minn: =1 if U. Minnesota

  • mich: =1 if U. Michigan

  • wisc: =1 if U. Wisconsin

  • illinois: =1 if U. Illinois

  • y92: =1 if year == 92

  • y95: =1 if year == 95

  • y99: =1 if year == 99

  • lsalary: log(salary)

  • exper: years since first teaching job

  • expersq: exper^2

  • pubindxsq: pubindx^2

  • pubindx0: =1 if pubindx == 0

  • lpubindx: log(pubindx) if pubindx > 0
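Several columns in the list above are stated as transformations of others (lsalary, expersq, pubindxsq, pubindx0, lpubindx). The following is a minimal sketch, assuming the wooldridge package is installed, of how one might compare each stored column with its stated definition. The helper name `max_gap` is hypothetical and introduced only for illustration. Small nonzero gaps would not be surprising if the original file stored values in single precision.

```r
# Assumes the wooldridge package is installed; everything else is base R.
library(wooldridge)
data("big9salary")

# Hypothetical helper: largest absolute gap between a stored column and
# its recomputed value. na.rm = TRUE skips rows with missing values
# (for example, rows where salary, and hence lsalary, is NA).
max_gap <- function(stored, recomputed) {
  max(abs(stored - recomputed), na.rm = TRUE)
}

gaps <- with(big9salary, c(
  lsalary   = max_gap(lsalary, log(salary)),
  expersq   = max_gap(expersq, exper^2),
  pubindxsq = max_gap(pubindxsq, pubindx^2),
  pubindx0  = max_gap(pubindx0, as.numeric(pubindx == 0))
))
print(gaps)

# lpubindx is defined only where pubindx > 0, so restrict to those rows.
pos <- !is.na(big9salary$pubindx) & big9salary$pubindx > 0
lpub_gap <- max_gap(big9salary$lpubindx[pos], log(big9salary$pubindx[pos]))
print(lpub_gap)
```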

Notes

This is an unbalanced panel data set: up to three years of data are available for each faculty member, but some have fewer than three. It is not clear that a fixed effects (FE) or first-differencing (FD) analysis makes sense here. Such approaches remove unobserved heterogeneity, which in this case includes faculty intelligence, talent, and motivation, and in doing so they control for too much. Presumably these factors enter into the publication index, and it is hard to argue that we want to hold the main factors driving productivity fixed when trying to measure the effect of productivity on salary. Pooled OLS regression with “cluster robust” standard errors seems more natural. On the other hand, if we want to measure the return to having a degree from a top 20 Ph.D. program, then we would want to control for factors that cause selection into a top 20 program. Unfortunately, this variable does not change over time, and so FD and FE are not applicable.
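The points in this note can be sketched in R. The example below assumes the wooldridge, sandwich, and lmtest packages are installed. It counts the years with a recorded salary for each faculty member, checks whether top20phd varies within a person, and fits a pooled OLS regression with standard errors clustered on id. The choice of regressors is an illustrative assumption, not a specification taken from the source.

```r
# Assumes the wooldridge, sandwich, and lmtest packages are installed.
library(wooldridge)
library(sandwich)
library(lmtest)

data("big9salary")

# Number of years with a non-missing salary for each faculty member,
# then the distribution of that count across faculty.
years_observed <- tapply(!is.na(big9salary$salary), big9salary$id, sum)
print(table(years_observed))

# Number of distinct top20phd values per person; a value of 1 for every
# id means the variable is time-invariant and would drop out under FE/FD.
distinct_top20 <- tapply(big9salary$top20phd, big9salary$id,
                         function(x) length(unique(x)))
print(table(distinct_top20))

# Pooled OLS. The regressors here are an illustrative assumption.
# Rows with missing values are dropped by lm()'s default na.action.
pooled <- lm(lsalary ~ pubindx + exper + expersq + top20phd + female +
               y95 + y99,
             data = big9salary)

# Covariance matrix clustered by faculty member (id).
vc <- vcovCL(pooled, cluster = ~ id)

# Coefficient table using the cluster-robust covariance matrix.
print(coeftest(pooled, vcov. = vc))
```

Clustering on id allows the errors for the same faculty member to be correlated across years, which is the usual concern when pooling panel observations.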

Used in Text: not used

Examples

 str(big9salary)
#> 'data.frame':	786 obs. of  30 variables:
#>  $ id       : int  101 101 101 102 102 102 103 103 103 104 ...
#>  $ year     : int  92 95 99 92 95 99 92 95 99 92 ...
#>  $ salary   : int  NA NA 107100 79420 88239 100450 87450 96831 108290 NA ...
#>  $ pubindx  : num  30.5 31 40.5 33.5 33.9 ...
#>  $ totpge   : num  92.7 107.2 186.5 127.5 133 ...
#>  $ assist   : int  0 0 0 0 0 0 0 0 0 1 ...
#>  $ assoc    : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ prof     : int  1 1 1 1 1 1 1 1 1 0 ...
#>  $ chair    : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ top20phd : int  0 0 0 0 0 0 1 1 1 1 ...
#>  $ yearphd  : int  73 73 73 76 76 76 61 61 61 91 ...
#>  $ female   : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ osu      : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ iowa     : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ indiana  : int  1 1 1 1 1 1 1 1 1 1 ...
#>  $ purdue   : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ msu      : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ minn     : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ mich     : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ wisc     : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ illinois : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ y92      : int  1 0 0 1 0 0 1 0 0 1 ...
#>  $ y95      : int  0 1 0 0 1 0 0 1 0 0 ...
#>  $ y99      : int  0 0 1 0 0 1 0 0 1 0 ...
#>  $ lsalary  : num  NA NA 11.6 11.3 11.4 ...
#>  $ exper    : int  19 22 26 16 19 23 31 34 38 1 ...
#>  $ expersq  : int  361 484 676 256 361 529 961 1156 1444 1 ...
#>  $ pubindxsq: num  933 959 1636 1125 1149 ...
#>  $ pubindx0 : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ lpubindx : num  3.42 3.43 3.7 3.51 3.52 ...
#>  - attr(*, "time.stamp")= chr "22 Jan 2013 14:09"