Wooldridge Source: A. Abadie (2003), “Semiparametric Instrumental Variable Estimation of Treatment Response Models,” Journal of Econometrics 113, 231-263. Professor Abadie kindly provided these data. He obtained them from the 1991 Survey of Income and Program Participation (SIPP). Data loads lazily.

data('k401ksubs')

Format

A data.frame with 9275 observations on 11 variables:

  • e401k: =1 if eligible for 401(k)

  • inc: annual income, $1000s

  • marr: =1 if married

  • male: =1 if male respondent

  • age: in years

  • fsize: family size

  • nettfa: net total financial assets, $1000s

  • p401k: =1 if participate in 401(k)

  • pira: =1 if have IRA

  • incsq: inc^2

  • agesq: age^2

Source

https://www.cengage.com/cgi-wadsworth/course_products_wp.pl?fid=M20b&product_isbn_issn=9781111531041

Notes

This data set can also be used to illustrate the binary response models, probit and logit, in Chapter 17, where, say, pira (an indicator for having an individual retirement account) is the dependent variable, and e401k [the 401(k) eligibility indicator] is the key explanatory variable.

Used in Text: pages 166, 174, 223, 264, 283, 301-302, 340, 549

Examples

str(k401ksubs)
#> 'data.frame': 9275 obs. of 11 variables:
#>  $ e401k : int 0 1 0 0 0 0 0 0 0 1 ...
#>  $ inc   : num 13.2 61.2 12.9 98.9 22.6 ...
#>  $ marr  : int 0 0 1 1 0 1 1 1 1 0 ...
#>  $ male  : int 0 1 0 1 0 0 0 0 0 1 ...
#>  $ age   : int 40 35 44 44 53 60 49 38 52 45 ...
#>  $ fsize : int 1 1 2 2 1 3 5 5 2 1 ...
#>  $ nettfa: num 4.57 154 0 21.8 18.45 ...
#>  $ p401k : int 0 1 0 0 0 0 0 0 0 0 ...
#>  $ pira  : int 1 0 0 0 0 0 1 0 1 1 ...
#>  $ incsq : num 173 3749 165 9777 511 ...
#>  $ agesq : int 1600 1225 1936 1936 2809 3600 2401 1444 2704 2025 ...
#>  - attr(*, "time.stamp")= chr "25 Jun 2011 23:03"
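
The binary response use mentioned in the Notes can be sketched as follows, with pira as the dependent variable and e401k as the key explanatory variable. This is a minimal illustration, not the textbook's specification: the control variables (inc, incsq, age, agesq, fsize) are an assumption chosen for the example, and the code assumes the wooldridge package is installed.

```r
# Load the data explicitly from the wooldridge package (assumed installed).
data('k401ksubs', package = 'wooldridge')

# Illustrative specification (an assumption, not taken from the text):
# IRA ownership on 401(k) eligibility plus income, age, and family size.
spec <- pira ~ e401k + inc + incsq + age + agesq + fsize

# Logit and probit fits via base R's glm().
logit_fit  <- glm(spec, family = binomial(link = 'logit'),  data = k401ksubs)
probit_fit <- glm(spec, family = binomial(link = 'probit'), data = k401ksubs)

# Coefficient, standard error, z value, and p-value for e401k in each model.
summary(logit_fit)$coefficients['e401k', ]
summary(probit_fit)$coefficients['e401k', ]
```

The logit and probit coefficients are on different scales, so they are not directly comparable; comparing the sign and significance of e401k, or the average partial effects, is more informative.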