Wooldridge Source: A. Abadie (2003), “Semiparametric Instrumental Variable Estimation of Treatment Response Models,” Journal of Econometrics 113, 231-263. Professor Abadie kindly provided these data. He obtained them from the 1991 Survey of Income and Program Participation (SIPP). Data loads lazily.

data('k401ksubs')

Format

A data.frame with 9275 observations on 11 variables:

  • e401k: =1 if eligible for 401(k)

  • inc: annual income, $1000s

  • marr: =1 if married

  • male: =1 if male respondent

  • age: in years

  • fsize: family size

  • nettfa: net total financial assets, $1000s

  • p401k: =1 if participates in 401(k)

  • pira: =1 if has an IRA

  • incsq: inc^2

  • agesq: age^2
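
The two squared terms are documented as plain transformations of inc and age, so they can be recomputed and compared with the stored columns. The sketch below uses a hypothetical helper, check_squares(), which is not part of the package. The tolerance is an assumption: the stored values may have been rounded when the data were originally saved, so an exact match is not guaranteed.

```r
# Hypothetical helper (not part of the wooldridge package): compares each
# stored squared column with the value recomputed from its base variable.
# `tol` is an assumed tolerance, since stored values may be rounded.
check_squares <- function(df, tol = 1e-4) {
  c(
    incsq = isTRUE(all.equal(df$incsq, df$inc^2, tolerance = tol)),
    agesq = isTRUE(all.equal(as.numeric(df$agesq),
                             as.numeric(df$age)^2, tolerance = tol))
  )
}

# Run against the packaged data only if the package is installed.
if (requireNamespace("wooldridge", quietly = TRUE)) {
  data("k401ksubs", package = "wooldridge")
  print(check_squares(k401ksubs))
}
```

A FALSE entry would point to rounding beyond the chosen tolerance rather than necessarily to an error in the data.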

Notes

This data set can also be used to illustrate the binary response models, probit and logit, in Chapter 17, where, say, pira (an indicator for having an individual retirement account) is the dependent variable, and e401k [the 401(k) eligibility indicator] is the key explanatory variable.
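
A minimal sketch of the binary response models mentioned above, using base R's glm() with pira as the dependent variable and e401k as the key explanatory variable. The helper name fit_pira_models() is hypothetical, and the control variables (inc, incsq, age, agesq) are an illustrative assumption, not a specification taken from the textbook.

```r
# Hypothetical helper: fits a logit and a probit for IRA ownership (pira)
# on 401(k) eligibility (e401k). The controls are an assumed, illustrative
# choice and not the textbook's specification.
fit_pira_models <- function(df) {
  f <- pira ~ e401k + inc + incsq + age + agesq
  list(
    logit  = glm(f, data = df, family = binomial(link = "logit")),
    probit = glm(f, data = df, family = binomial(link = "probit"))
  )
}

# Run against the packaged data only if the package is installed.
if (requireNamespace("wooldridge", quietly = TRUE)) {
  data("k401ksubs", package = "wooldridge")
  fits <- fit_pira_models(k401ksubs)
  # Coefficient on e401k under each link function
  print(sapply(fits, function(m) coef(m)[["e401k"]]))
}
```

The logit and probit coefficients are on different scales, so they are not directly comparable; comparing average partial effects would be the more informative next step.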

Used in Text: pages 166, 174, 223, 264, 283, 301-302, 340, 549

Examples

 str(k401ksubs)
#> 'data.frame':	9275 obs. of  11 variables:
#>  $ e401k : int  0 1 0 0 0 0 0 0 0 1 ...
#>  $ inc   : num  13.2 61.2 12.9 98.9 22.6 ...
#>  $ marr  : int  0 0 1 1 0 1 1 1 1 0 ...
#>  $ male  : int  0 1 0 1 0 0 0 0 0 1 ...
#>  $ age   : int  40 35 44 44 53 60 49 38 52 45 ...
#>  $ fsize : int  1 1 2 2 1 3 5 5 2 1 ...
#>  $ nettfa: num  4.57 154 0 21.8 18.45 ...
#>  $ p401k : int  0 1 0 0 0 0 0 0 0 0 ...
#>  $ pira  : int  1 0 0 0 0 0 1 0 1 1 ...
#>  $ incsq : num  173 3749 165 9777 511 ...
#>  $ agesq : int  1600 1225 1936 1936 2809 3600 2401 1444 2704 2025 ...
#>  - attr(*, "time.stamp")= chr "25 Jun 2011 23:03"