# Heterogeneity in Marginal Non-Monetary Returns to Higher Education

## Abstract

In this paper we estimate the effects of college education on cognitive abilities, health, and wages, exploiting exogenous variation in college availability. By means of semiparametric local instrumental variables techniques we estimate marginal treatment effects in an environment of essential heterogeneity. The results suggest positive average effects on cognitive abilities, wages, and physical health. Yet, there is heterogeneity in the effects, which points toward selection into gains. Although the majority of individuals benefits from more education, the average causal effect for individuals with the lowest unobserved desire to study is zero for all outcomes. Mental health effects, however, are absent for the entire population. (JEL: C31, H52, I10, I21)

## 1. Introduction

“The whole world is going to university—Is it worth it?” The Economist’s headline read in March 2015.¹ Although convincing causal evidence on positive labor market returns to higher education is still rare and nearly exclusively available for the United States, even less is known about the non-monetary returns to college education (see Oreopoulos and Petronijevic 2013; Barrow and Malamud 2015). Although non-monetary factors are acknowledged to be important outcomes of education (Oreopoulos and Salvanes 2011), evidence on the effect of college education is so far limited to health behaviors (see in what follows). We estimate the long-lasting marginal returns to college education in Germany decades after leaving college. As a benchmark, we start by looking at wage returns to higher education but the paper’s focus is on the non-monetary returns that might also be seen as mediators of the more often studied effect of education on wages. These non-monetary returns are cognitive abilities and health.

Cognitive abilities and health belong to the most important non-monetary determinants of individual well-being.
Moreover, the stock of both factors also influences the economy as a whole (see, among many others, Heckman et al. 1999 and Cawley et al. 2001 for cognitive abilities and Acemoglu and Johnson 2007, Cervellati and Sunde 2005, and Costa 2015 for health). Yet, non-monetary returns to college education are not fully understood (Oreopoulos and Salvanes 2011). Psychological research broadly distinguishes between effects of education on the long-term cognitive ability differential that are either due to a change in the cognitive reserve (i.e., the cognitive capacity) or due to an altered age-related decline (see, e.g., Stern 2012). Still, even the compound manifestation of the overall effect has rarely been studied for college education over a short-term horizon² and, as far as we are aware, it has never been assessed for the long run. Few studies analyze the returns to college education on health behaviors (Currie and Moretti 2003; Grimard and Parent 2007; de Walque 2007).

We use a slightly modified version of the marginal treatment effect (MTE) approach introduced and advanced by Björklund and Moffitt (1987) and Heckman and Vytlacil (2005). The main feature of this approach is that it explicitly models the choice for education, thus turning back from a purely statistical view of exploiting exogenous variation in education to identify causal effects toward a description of the behavior of economic agents. Translated into our research question, the MTE is the effect of education on different outcomes for individuals at the margin of taking higher education. The MTE can be used to generate all conventional treatment parameters, such as the average treatment effect (ATE). On top of this, comparing the marginal effects along the probability of taking higher education is also informative in its own right: different marginal effects do not just reveal effect heterogeneity but also some of its underlying structure (for instance, selection into gains).
This is an important property that the local average treatment effect (LATE), as identified by conventional two-stage least squares methods, would miss.

The individuals in our sample made their college decision between 1958 and 1990 and, in the case of college education, graduated between 1963 and 1995. Our outcome variables (wages, standardized measures of cognitive abilities,³ and mental and physical health) are assessed between 2010 and 2012, thus 20–54 years after the college decision. Our instrument is a measure of the relative availability of college spots (operationalized by the number of enrolled students divided by the number of inhabitants) in the area of residence at the time of secondary school graduation. We use detailed information on the arguably exogenous expansions of college capacities in all 326 West German districts (cities or rural areas) during the so-called “educational expansion” between the 1960s and 1980s, which generates variation in the availability of higher education.

By deriving treatment effects over the entire support of the probability of college attendance, this paper contributes to the literature mainly in two important ways. First, this is the first study that analyzes the long-term effect of college education on cognitive abilities and general health measures (instead of specific health behaviors). Long-run effects on skills are crucial in showing the sustainability of human capital investments after the age of 19. Along this line, this outcome can complement existing evidence in identifying the fundamental value of college education since, unlike the monetary returns studied elsewhere, effects on cognitive skills neither directly exhibit signaling (see the debate on the discrepancy between private and social returns as in Clark and Martorell 2014) nor suffer from adverse general equilibrium effects (as skills are not determined by the forces of both demand and supply).
Second, by going beyond the point estimate of the LATE, we provide a more comprehensive picture in an environment of essential heterogeneity.

The results suggest positive average returns to college education for wages, cognitive abilities, and physical health. Yet, the returns are heterogeneous (thus, we find evidence for selection into gains) and even close to zero for the roughly 30% of individuals with the lowest desire to study. Mental health effects are zero throughout the population. Thus, our findings can be interpreted as evidence for remarkable positive average returns for those who took college education in the past. Yet, a further expansion in college education, as sometimes called for, is likely not to pay off, as it would mostly affect individuals in the part of the distribution that is not found to be positively affected by education.

We also try to substantiate our results by looking at potential mechanisms of the average effects. Although we cannot causally differentiate all channels and the data allow us to provide suggestive evidence only, our findings may be interpreted as follows. Mentally more demanding jobs, jobs with less health-deteriorating effects, and better health behaviors probably add to the explanation of skill and health returns to education.

The paper is organized as follows. Section 2 briefly introduces the German educational system and describes the exogenous variation we exploit. Section 3 outlines the empirical approach. Section 4 presents the data. The main results are reported in Section 5, whereas Section 6 addresses some of their potential underlying pathways. Section 7 concludes.

## 2. Institutional Background and Exogenous Variation

### 2.1. The German Higher Educational System

After graduating from secondary school, adolescents in Germany either enroll into higher education or start an apprenticeship. The latter is part-time training-on-the-job and part-time schooling.
This vocational training usually takes three years and individuals often enter the firm (or another firm in the sector) as a full-time employee afterward.

To be eligible for higher education in Germany, individuals need a university entrance degree. In the years under review, only academic secondary schools (Gymnasien) with 13 years of schooling in total awarded this degree (Abitur). Although the tracking from elementary schools to secondary schools takes place rather early at the age of 10, students can switch secondary school tracks in every grade. It is also possible to enroll into academic schools after graduating from basic or intermediate schools in order to receive a university entrance degree.

In Germany, mainly two types of institutions offer higher education: universities/colleges⁴ and universities of applied science (Fachhochschulen). The regular time to receive the formerly common Diplom degree (master’s equivalent) was 4.5 years at both institutions. Colleges are usually large institutions that offer degrees in various subjects. The other type of higher educational institutions, universities of applied science, are usually smaller than colleges and often specialized in one field of study (e.g., business schools). Moreover, universities of applied science have a less theoretical curriculum and a teaching structure that is similar to schools.

Almost no institution of higher education in Germany charges tuition fees. However, students have to cover their own costs of living. Their peers in apprenticeship training, on the other hand, earn a small salary. Possible budget constraints (e.g., transaction costs arising through the need to move to another city in order to go to college) are likely determinants of the decision to enroll into higher education.

### 2.2. Exogenous Variation in College Education over Time

Although the higher educational system as described in Section 2.1 did not change in the years under review, the accessibility (in terms of mere quantity but also distribution within Germany) of tertiary education changed significantly, providing us with a source of exogenous variation. This so-called “educational expansion” falls well into the period of study (1958–1990). Within this period, the shrinking transaction costs of studying may have changed incentives and the mere presence of new or growing colleges could also have nudged individuals toward higher education who otherwise would not have studied.

In this paper, we consider two processes in order to quantify the educational expansion. The first is the opening of new colleges, the second is the extension in capacity of all colleges (we refer to both as college availability).⁵ College availability as an instrument for higher education was introduced to the literature by Card (1995) and has frequently been employed since then (e.g., Currie and Moretti 2003), also to estimate the MTE (e.g., Carneiro et al. 2011; Nybom 2017).

We exploit the rapid increase in the number of new colleges and in the number of available spots to study as exogenous variation in the college decision. Between 1958 (the earliest secondary school graduation year in our sample) and 1990 the number of colleges in Germany doubled from 33 to 66.⁶ In particular, the opening of new colleges introduced discrete discontinuities in choice sets. As an example, students had to travel 50 km, on average, to the closest college before a college was opened in their district (measured from district centroid to centroid), see Figure 1. Figure A.1 in the Appendix gives an impression of the spatial variation in college availability over time.

Figure 1. Average distance to the closest college over time for districts with a college opening. Own illustration. Information on colleges is taken from the German Statistical Yearbooks 1959–1991 (German Federal Statistical Office various issues, 1959–1991). The distances (in km) between the districts are calculated using district centroids. These distances are weighted by the number of individuals observed in the particular district-year cells in our estimation sample of the NEPS-Starting Cohort 6 data. The resulting average distances are depicted by the black circles. Note that prior to time period 0, the average distance changes over time either due to sample composition or a college opening in a neighboring district. Only districts with a college opening are taken into account.

There was an increase in the size of existing colleges and, therefore, in the number of available spots to study as well. The average number of students per college was 5,013 in 1958 and 15,438 in 1990. Of the 33 colleges in 1958, 30 still existed in 1990 and had an average size of 23,099 students. The total number of students increased from 155,000 in 1958 to 1 million in 1990.
Figure 2 shows the trends in college openings and enrolled students (normalized by the number of inhabitants) for the five most-populated German states. Although the actual numbers used in the regressions vary on the much smaller district level, the state level figures simplify the visualization of the pattern.

Figure 2. Number of colleges and students over time in selected states. Own illustration. College opening and size information are taken from the German Statistical Yearbooks 1959–1991 (German Federal Statistical Office various issues, 1959–1991). Yearly information on the district-specific population size is based on personal correspondence with the statistical offices of the federal states. For the sake of clarity, the trends are only plotted for the five most populated states.

Factors that have driven the increase in the number of colleges and their size can briefly be summarized into four groups: (i) The large majority of the population had a low level of education. This did not only result from WWII but also from the “anti-intellectualism” (Picht 1964, p. 66) in the Third Reich, and the notion of education in Imperial Germany before, befitting the social status of certain individuals only. (ii) An increase in the number of academic secondary schools at the same time (as analyzed in Jürges et al. 2011; Kamhöfer and Schmitz 2016 for instance) qualified a larger share of school graduates to enroll into higher education (Bartz 2007). (iii) A change in production technologies led to an increase in firms’ demand for high-skilled workers, especially given the low level of educational participation (Weisser 2005). (iv) Political decision makers were afraid that “without an increase in the number of skilled graduates the West German economy would not be able to compete with communist rivals” (Jürges et al. 2011, p. 846, in reference to Picht 1964).

Although these reasons (perhaps except for firms’ demand for more educated workers) affected the 10 West German federal states (which are in charge of educational policy) in the same way, the measures taken and the timing of actions differed widely between states. Because of local politics (e.g., the balancing of regional interests and avoiding clusters of colleges) there was also a large amount of variation in college openings within the federal states. See Online Appendix B to the paper for a much more detailed description of the political process involved.

A major concern for instrument validity is that, even though the political process did not follow a unified structure and included some randomness in the final choice of locations and timing of openings, regions where colleges were opened differed from those that already had colleges before (or that never established any). Table 1 reports some numbers on the regional level as of the year 1962 (the earliest possible year available to us with representative data).⁷ Regions that already had colleges before did not differ in terms of sociodemographics (except for population densities, as mostly large cities had colleges before) but were somewhat stronger in terms of socioeconomic indices. The differences were not large, however.
Given that we include district fixed effects and a large set of socioeconomic controls (including the socioeconomic environment before the college decision, see Section 4), this should not be a problematic issue.

Table 1. Comparison of regions with and without college openings before college opens using administrative data.

| College opening... | Before 1958: Mean (1) | s.d. (2) | Between 1958–1990: Mean (3) | s.d. (4) | Later than 1990 or never: Mean (5) | s.d. (6) |
|---|---|---|---|---|---|---|
| **Observations** | | | | | | |
| Number of regions | 27 | | 30 | | 190 | |
| **Sociodemographic characteristics** | | | | | | |
| Female (in %) | 53.0 | *(2.0)* | 53.0 | *(1.4)* | 52.9 | *(4.3)* |
| Average age (in years) | 37.2 | *(1.1)* | 37.0 | *(1.1)* | 36.6 | *(1.9)* |
| Singles (in %) | 38.8 | *(2.5)* | 37.7 | *(2.3)* | 38.9 | *(4.6)* |
| Population density per km² in 1962 | 1381.9 | *(1076.7)* | 1170.1 | *(1047.3)* | 327.1 | *(479.7)* |
| Change in population density 1962–1990 | 1.6 | *(186.3)* | −71.0 | *(202.8)* | 31.5 | *(98.5)* |
| Migrational background (in %) | 2.7 | *(3.0)* | 1.6 | *(1.5)* | 2.1 | *(2.3)* |
| **Socioeconomic characteristics** | | | | | | |
| Share of employees to all individuals (in %) | 47.0 | *(3.6)* | 45.3 | *(4.2)* | 46.2 | *(5.2)* |
| Employees with an income > 600 DM (in %) | 27.3 | *(3.8)* | 24.8 | *(5.3)* | 25.9 | *(6.4)* |
| Employees by industry (in %) | | | | | | |
| – Primary | 2.1 | *(5.2)* | 5.2 | *(5.2)* | 2.8 | *(5.5)* |
| – Secondary | 52.9 | *(8.4)* | 54.7 | *(6.2)* | 54.3 | *(8.9)* |
| – Tertiary | 45.0 | *(9.3)* | 40.1 | *(8.3)* | 42.9 | *(9.6)* |
| Employees in blue collar occup. (in %) | 53.6 | *(9.4)* | 59.0 | *(7.9)* | 56.5 | *(9.3)* |
| Employees in academic occup. (in %) | 22.0 | *(4.4)* | 17.5 | *(4.3)* | 20.3 | *(5.9)* |

Notes: Own calculations based on Micro Census 1962, see Lengerer et al. (2008). Regions are defined through administrative Regierungsbezirk entries and the degree of urbanization (Gemeindegrößenklasse) and may cover more than one district. College information is aggregated at regional level and a region is considered to have a college if at least one of its districts has a college. Calculations for population density and change in population density are based on district-level data acquired through personal correspondence with the statistical offices of the federal states. Data are available on request. The variables “employees in blue collar occup.” and “employees in academic occup.” state the shares of employees in the region in an occupation that is usually conducted by a blue collar worker/a college graduate, respectively. Standard deviations (s.d.) are given in italics in parentheses.

Yet, changes in district characteristics that are potentially related to the outcome variables might be a more important problem. There could, for instance, be changes in the population structure that both induce a higher demand for college education and go along with improved cognitive abilities and health. This could be the case if the regions with college openings were more “dynamic” with a younger and potentially increasing population. Table 1 shows a decline in the population density by 6% between 1962 and 1990 in the areas that opened colleges whereas there were no average changes in the areas with preexisting colleges and a 10% increase in the areas that never opened any. This reflects different regional trends in population ageing. As one example, the Ruhr Area in the west, where three colleges were opened, experienced a population decline and comparably stronger population ageing over time.
Again, these differences are not dramatically large, but we might be worried about different trends in health and cognitive abilities that are correlated with college expansion. If this was the case (more expansion in areas that have a more rapidly ageing population with deteriorating health and cognitive abilities), we might underestimate the effect of college education on these outcomes. We include a district-specific time trend to account for this in the analysis.

The expansion in secondary schooling noted previously was unrelated to the college expansion. Although college expansion naturally took place in a small number of districts, expansion in secondary schooling was across all regions. In addition, Kamhöfer and Schmitz (2016) do not find any local average treatment effects of school expansion on cognitive abilities and wages. Thus, it seems unlikely that selective increases in cognitive abilities due to secondary school expansion invalidate the instrument. Nevertheless, again, district-specific time trends should capture large parts of this if it was a problem.

So essentially, what we do is the following: we look within each district and attribute deviations of the college (graduation/enrollment) rate from the general trend (captured by cohort fixed effects) and from the district-specific trend (which might be due to continually increased access to higher secondary education) to either changes in the number of college spots or the opening of a new college nearby. We use discontinuities in college access over time. These cannot be exploited with data on individuals who all make the college decision at the same point in time (for instance cohort studies), which is what some of the previous literature using college availability as an instrument relied on. Details on how we exploit the variation in college availability in the empirical specification are discussed in Section 4.4 after presenting the data.

## 3. Empirical Strategy

Our estimation framework builds largely on Heckman and Vytlacil (2005) and Carneiro et al. (2011).
Derivations and in-depth discussion of most issues can be found there. We start with the potential outcome model, where $Y^1$ and $Y^0$ are the potential outcomes with and without treatment. The observed outcome $Y$ either equals $Y^1$ in case an individual received a treatment (which is college education here) or $Y^0$ in the absence of treatment (the individual identifier $i$ is implied). Obviously, treatment participation is voluntary, rendering a treatment dummy $D$ in a simple linear regression endogenous. In the marginal treatment effect framework, this is explicitly modeled by using a choice equation, that is, we specify the following latent index model:

$$Y^1 = X^{\prime }\beta _1 + U_1, \tag{1}$$

$$Y^0 = X^{\prime }\beta _0 + U_0, \tag{2}$$

$$D^* = Z^{\prime }\delta - V, \quad \mbox{where }\, D = \boldsymbol {1}[ D^* \ge 0] = \boldsymbol {1}[ Z^{\prime }\delta \ge V]. \tag{3}$$

The vector $X$ contains observable, and $U_1$, $U_0$ unobservable factors that affect the potential outcomes.⁸ $D^*$ is the latent desire to take up college education that depends on observed variables $Z$ and unobservables $V$. $Z$ includes all variables in $X$ plus the instruments. Whenever $D^*$ exceeds a threshold (set to zero without loss of generality), the individual opts for college education, otherwise she does not. $U_1$, $U_0$, $V$ are potentially correlated, inducing the endogeneity problem (as well as heterogeneous returns), as we observe $Y = DY^1 + (1 - D)Y^0$, $D$, $X$, $Z$, but not $U_1$, $U_0$, $V$.

Following this model, individuals are indifferent between higher education and directly entering the labor market (e.g., through an apprenticeship) whenever the index of observables $Z^{\prime }\delta$ is equal to the unobservables $V$. Thus, if we knew the switching point (point of indifference) and its corresponding value of the observables, we could place sharp restrictions on the value of the unobservables. This property is exploited in the estimation.
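To make the latent index model in equations (1)–(3) concrete, the following is a minimal simulation sketch. It is not taken from the paper: the coefficients, the normal distributions, the single covariate and the single instrument are all illustrative assumptions, and the function name `simulate_latent_index` is hypothetical. The design builds in selection into gains by letting a low V go along with a high U1 − U0.

```python
import numpy as np


def simulate_latent_index(n=100_000, seed=1):
    """Draw data from an illustrative version of equations (1)-(3).

    Every coefficient and distribution here is a made-up assumption.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)              # observed covariate X
    z = rng.normal(size=n)              # excluded instrument (in Z, not in X)
    v = rng.normal(size=n)              # unobserved resistance to college, V
    u0 = rng.normal(scale=0.5, size=n)  # unobservable in the no-college outcome
    # Selection into gains: a low V goes along with a high U1 - U0.
    u1 = u0 - 0.5 * v + rng.normal(scale=0.5, size=n)

    y1 = 0.4 + 0.5 * x + u1             # potential outcome with college, eq. (1)
    y0 = 0.5 * x + u0                   # potential outcome without college, eq. (2)
    d = (0.3 * x + 1.0 * z - v) >= 0    # D = 1[Z'delta >= V], eq. (3)
    y = np.where(d, y1, y0)             # observed outcome Y = D*Y1 + (1-D)*Y0
    return x, z, d, y, y1, y0


x, z, d, y, y1, y0 = simulate_latent_index()
ate = np.mean(y1 - y0)                  # average treatment effect
att = np.mean((y1 - y0)[d])             # average effect on those choosing college
naive = y[d].mean() - y[~d].mean()      # raw difference in observed means
print(f"ATE = {ate:.3f}   ATT = {att:.3f}   naive difference = {naive:.3f}")
```

In this assumed design the effect on the treated exceeds the population average because individuals with a low V sort into college, and the raw comparison of means is larger still because it also picks up differences in X between the two groups. This is the endogeneity problem the choice equation is meant to address.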
Since for every value of the index $Z^{\prime }\delta$ one needs individuals with and without higher education, it is important to meaningfully aggregate the index by a monotone transformation that, for example, returns the quantiles of $Z^{\prime }\delta$ and $V$. One such rank-preserving transformation is the cumulative distribution function, which returns the propensity score $P(Z)$ (quantiles of $Z^{\prime }\delta$) and $U_D$ (quantiles of $V$).⁹ If we vary the excluded instruments in $Z^{\prime }\delta$ from the lowest to the highest value while holding the covariates $X$ constant, more and more individuals will select into higher education. Those who react to this shift also reveal their rank in the unobservable distribution. Thus, the unobservables are fixed given the propensity score and it is feasible to evaluate any outcome for those who select into treatment at any quantile $U_D$ that is identified by the instrument-induced change of the higher education choice. In general, estimating marginal effects by $U_D$ does not require stronger assumptions than those required by the LATE, since Vytlacil (2002) showed their equivalence.¹⁰ Yet, strong instruments are beneficial for robustly identifying effects over the support of $P(Z)$. This, however, is testable.

The marginal treatment effect (MTE), then, is the marginal (gross) benefit of taking the treatment for those who are just indifferent between taking and not taking it and can be expressed as

\begin{equation*} {\mathit {MTE}}(x,u_D) = \frac{\partial E(Y|x, p)}{\partial p}. \end{equation*}

This is the effect of an incremental increase in the propensity score on the observed outcome. The MTE varies along the line of $U_D$ in case of heterogeneous treatment effects that arise if individuals self-select into the treatment based on their expected idiosyncratic gains. This is a situation Heckman et al. (2006) call “essential heterogeneity”.
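As a short worked illustration of how the MTE connects to the conventional parameters, the standard relations from Heckman and Vytlacil (2005) can be written as follows. This is a sketch under the model in equations (1)–(3), with U_D normalized to be uniform on [0, 1]; the weights that deliver ATT or LATE are not reproduced here.

\begin{equation*} {\mathit {MTE}}(x,u_D) = E(Y^1 - Y^0 \mid X = x, U_D = u_D), \qquad {\mathit {ATE}}(x) = \int _0^1 {\mathit {MTE}}(x,u_D)\, du_D. \end{equation*}

Conditional on x, a flat MTE curve would make these averages coincide, whereas a sloped curve is the signature of essential heterogeneity.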
This is an important structural property that the MTE can recover: If individuals already react at low values of the instrument, where the observed part of the latent desire of selecting into higher education (P(Z)) is still very low, a prerequisite for yet going to college is that V is marginally lower. These individuals could choose college against all (observed) odds because they are more intrinsically talented or motivated as indicated by a low V. If this is translated into higher future gains (U1 − U0), the MTE would exhibit a significant negative slope: As P(Z) rises, marginal individuals need less and less compensation in terms of unobserved and expected returns to yet choose college—this is called selection into gains. As Basu (2011, 2014) notes, essential heterogeneity is not restricted to active sorting into gains but is always an issue if selection is based on factors that are not completely independent of the gains. Thus, in health economic applications, where gains are arguably harder to predict for the individual than, say, monetary returns, essential heterogeneity is also an important phenomenon. In this case the common treatment parameters ATE, ATT, and LATE do not coincide. The MTE can be interpreted as a more fundamental parameter than the usual ones as it unfolds all local switching effects by intrinsic “willingness” to study and not only some weighted average of those.11 The main component for estimating the MTE is the conditional expectation E(Y | X, p). Heckman and Vytlacil (2007) show that if we plug in the counterfactuals in (1) and (2) in the potential outcome equation, rearrange and apply the expectation E(. 
| X, p) to all expressions and impose an exclusion restriction of p on Y (exposed in what follows), we get an expression that can be estimated: \begin{eqnarray} E(Y|X, p) & =& X^{\prime }\beta _0 + X^{\prime }(\beta _1 -\beta _0) \cdot p + E(U_1 - U_0 | D=1, X) \cdot p \nonumber \\ & =& X^{\prime }\beta _0 + X^{\prime }(\beta _1 -\beta _0) \cdot p + K(p), \end{eqnarray} (4) where K(p) is some not further specified function of the propensity score if one wants to avoid distributional assumptions of the error terms. Thus, the estimation of the MTE involves estimating the propensity score in order to estimate equation (4) and, finally, taking its derivative with respect to p. Note that this derivative—and hence the effect of college education—depends on heterogeneity due to observed components X and unobserved components K(p), since this structure was imposed by equations (1) and (2): \begin{eqnarray} \frac{\partial E(Y|X, p)}{\partial p} & =& X^{\prime }(\beta _1 -\beta _0) + \frac{\partial K(p)}{\partial p}. \end{eqnarray} (5) To achieve non-parametric identification of the terms in equation (5), the Conditional Independence Assumption has to be imposed on the instrument \begin{equation*} (U_1,U_0,V)\!\perp \!\!\!\perp Z|X \end{equation*} meaning that the error terms are independent of Z given X. That is, after conditioning on X a shift in the instruments Z (or the single index P(Z)) has no effect on the potential outcome distributions. Non-parametrically estimating separate MTEs for every data cell determined by X is hardly ever feasible due to a lack of observations and powerful instruments within each such cell. Yet, in case of parametric or semiparametric specifications a conditional independence assumption is not sufficient to decompose the effect into observed and unobserved sources of heterogeneity. To separately identify the right hand side of equation (5) unconditional independence is required: (U1, U0, V) ⊥⊥ Z, X (Carneiro et al. 
2011, for more details consult the Online Appendices).12 In a pragmatic approach, one can now follow Brinch et al. (2017) or Cornelissen et al. (forthcoming), who do not aim at causally separating the sources of the effect heterogeneity. In this case a conventional exclusion restriction on the instruments suffices for estimating the overall level and the curvature of the MTE. Our solution for bringing the empirical framework to the data without overly strong assumptions is to estimate marginal effects that vary only over the unobservables while fixing the X-effects at their mean values. This means deviating from (4) by restricting β1 = β0 = β, except for the intercepts α1, α0 in (1) and (2), such that E(Y | X, p) becomes \begin{eqnarray} E(Y|X, p) & =& X^{\prime }\beta + (\alpha _1 -\alpha _0) \cdot p + K(p). \end{eqnarray} (6) Thus, we allow for different levels of potential outcomes, whereas we keep conditioning on X. This might look like a strong restriction at first sight but is no different from the predominant approach in empirical economics of trying to identify average treatment effects, where the treatment indicator is typically not interacted with other observables. Of course, this does not rule out that the MTE varies by observable characteristics. Even if the true population effects vary over X, note that X enters the derivative of equation (4) with respect to the propensity score, that is, equation (5), only additively, and that the term ∂K(p)/∂p does not depend on X. Hence, only the level of the MTE changes for certain subpopulations determined by X, whereas the curvature remains unaffected. Thus, estimation of equation (6) delivers an MTE whose level is averaged over all subpopulations without changing the curvature. In this way all crucial elements of the MTE are preserved, since we are interested in the average effect and its heterogeneity with respect to the unobservables for the whole population. 
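To make the logic of equation (6) concrete, the following simulation is a minimal sketch and not the estimation procedure used in the paper. It assumes a known propensity score, a single hypothetical covariate `x`, and a quadratic approximation of K(p) in place of a semiparametric estimate. All names and parameter values are illustrative assumptions. The data are generated with selection into gains, so the true MTE is 1 − u_D, and the derivative of the fitted E(Y | X, p) with respect to p should recover this downward slope.

```python
import numpy as np


def simulate(n=200_000, seed=0):
    """Simulate outcomes with selection into gains (all values hypothetical)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)               # one observed covariate X
    p = rng.uniform(0.0, 1.0, size=n)    # propensity score, taken as known here
    u_d = rng.uniform(0.0, 1.0, size=n)  # unobserved resistance to treatment U_D
    d = (u_d < p).astype(float)          # treated if P(Z) exceeds U_D
    y0 = 0.5 * x + rng.normal(scale=0.3, size=n)
    gain = 1.0 - u_d                     # true MTE(u) = 1 - u
    y = y0 + d * gain
    return x, p, y


def fit_mte(x, p, y):
    """Fit E(Y|X,p) = c + x*beta + a*p + K(p), with K(p) assumed quadratic.

    Returns a function u -> dE(Y|X,p)/dp evaluated at p = u.
    """
    design = np.column_stack([np.ones_like(p), x, p, p ** 2])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    slope_p, slope_p2 = coef[2], coef[3]
    return lambda u: slope_p + 2.0 * slope_p2 * u


x, p, y = simulate()
mte = fit_mte(x, p, y)
for u in (0.2, 0.5, 0.8):
    print(f"estimated MTE at u_D = {u:.1f}: {mte(u):.2f}")
```

In this design E(Y | X, p) = 0.5x + p − p²/2, so the quadratic specification is exact by construction. With real data the shape of K(p) is unknown, which is why a flexible estimator is preferable.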
How this heterogeneity varies across subpopulations is of less importance, and the literature has also focused on MTEs where the X-part is averaged out. In return, this approach allows us to considerably relax our identifying assumption from unconditional to conditional independence of the instrument. A further advantage of not estimating heterogeneity in the observables can arise if X contains many variables that each take many different values. In this case, problems of weak instruments can inflate the results.13 In estimating (6), we again follow Carneiro et al. (2010, 2011) and use semiparametric techniques as suggested by Robinson (1988).14 Standard errors are clustered at the district level and were generated by bootstrapping the entire procedure using 200 replications. 4. Data 4.1. Sample Selection and College Education Our main data source is individual-level data from the German National Educational Panel Study (NEPS); see Blossfeld et al. (2011). The NEPS data map the educational trajectories of more than 60,000 individuals in total. The data set follows a multi-cohort sequence design and covers six age groups, called “starting cohorts”: newborns and their parents, pre-school children, children in school grades 5 and 9, college freshmen students, and adults. Within each starting cohort the data are organized in a longitudinal manner, that is, individuals are interviewed repeatedly. For each starting cohort, the interviews cover extensive information on competence development, learning environments, educational decisions, migrational background, and socioeconomic outcomes. We aim at analyzing longer-term effects of college education and, therefore, restrict the analysis to the “adults starting cohort”. For this age group six waves are available with interviews conducted between 2007/2008 (wave 1) and 2013 (wave 6); see LIfBi (2015). 
Moreover, the NEPS includes detailed retrospective information on the educational and occupational history as well as the living conditions at the age of 15, about three years before individuals decide on higher education. Of the original 17,000 respondents in the adults starting cohort, born between 1944 and 1989, we exclude observations for four reasons: First, we focus on individuals from West Germany due to the different educational system in the former German Democratic Republic (GDR), thereby dropping 3,500 individuals living in the GDR at the age of the college decision. Second, to allow for long-term effects we restrict the sample to college attendance before 1990 and drop 2,800 individuals who graduated from secondary school in 1990 or later. Third, we drop 1,000 individuals with missing geographic information. An attractive (and for our analysis necessary) feature of the NEPS data is that they include information on the district (German Kreis) of residence during secondary schooling, which is used in assigning the instrument in the selection equation. The fourth reason for losing observations is that the dependent variables are not available for each respondent, see in what follows. Our final sample includes between 2,904 and 4,813 individuals, depending on the outcome variable. The explanatory variable “college degree” takes on the value 1 if an individual has any higher educational degree, and 0 otherwise. Dropouts are treated like all other individuals without college education. More than one fourth of the sample has a college degree, whereas somewhat less than three fourths do not. 4.2. Dependent Variables Wages. The data set covers a wide range of individual employment information such as monthly income and weekly hours worked. We calculate the hourly gross wage for 2013 (wave 6) by dividing the monthly gross labor market income by the actual weekly working hours (including extra hours) times the average number of weeks per month, 4.3. 
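The wage calculation just described can be written out as a short sketch. The function name and the example respondent are hypothetical. Only the factor of 4.3 weeks per month and the use of log wages come from the text.

```python
import math

WEEKS_PER_MONTH = 4.3  # average number of weeks per month, as stated in the text


def hourly_gross_wage(monthly_gross_income, weekly_hours):
    """Monthly gross income divided by (actual weekly hours x 4.3)."""
    if weekly_hours <= 0:
        raise ValueError("weekly_hours must be positive")
    return monthly_gross_income / (weekly_hours * WEEKS_PER_MONTH)


# Hypothetical respondent: 3,440 euros per month at 40 actual hours per week.
wage = hourly_gross_wage(3440.0, 40.0)
log_wage = math.log(wage)  # regressions use the log of the hourly wage
print(f"hourly wage: {wage:.2f} euros, log wage: {log_wage:.3f}")
```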
A similar strategy is, for example, applied by Pischke and von Wachter (2008) to calculate hourly wages using German data. For this outcome variable, we restrict our sample to individuals of working age up to 65 years and drop observations with hourly wages below €5 and above the 99th quantile (€77.52), as such values might result from misreporting. Table 2 reports descriptive statistics and reveals considerably higher hourly wages for individuals with a college degree. The full distribution of wages (and the other outcomes) for both groups is shown in Figure A.2 in the Appendix. In the regression analysis we use log gross hourly wages.

Table 2. Descriptive statistics of the dependent variables. Columns (2) and (3) are health measures, columns (4) to (6) are cognitive ability components.

| | (1) Gross hourly wage | (2) PCS | (3) MCS | (4) Read. speed | (5) Read. comp. | (6) Math liter. |
|---|---|---|---|---|---|---|
| Observations | 3,378 | 4,813 | 4,813 | 3,995 | 4,576 | 2,904 |
| with college degree (in %) | 31.0 | 28.1 | 28.1 | 27.8 | 28.1 | 28.0 |
| Raw values | | | | | | |
| Mean with degree | 27.95 | 53.31 | 51.15 | 39.69 | 29.76 | 13.37 |
| Mean without degree | 19.35 | 50.39 | 50.53 | 35.99 | 22.75 | 9.36 |
| Maximum possible value | – (a) | 100 | 100 | 51 | 39 | 22 |
| Transformed values | | | | | | |
| Mean with degree | 3.25 | 0.23 | 0.04 | 0.32 | 0.63 | 0.61 |
| Mean without degree | 2.88 | −0.09 | −0.02 | −0.12 | −0.25 | −0.24 |

Notes: Own calculations based on NEPS-Starting Cohort 6 data. Gross hourly wage given in euros. Gross hourly wage is transformed to its log value; the other variables are standardized to mean 0 and standard deviation 1. (a) The gross hourly wage is truncated below at €5 and above at the 99th quantile (€77.52).

Health. Two variables from the health domain are used as outcome measures: the Physical Health Component Summary Score (PCS) and the Mental Health Component Summary Score (MCS), both from 2011/2012 (wave 4).15 These summary scores are based on the SF12 questionnaire, which is an internationally standardized set of 12 items regarding eight dimensions of the individual health status. 
The eight dimensions comprise physical functioning, physical role functioning, bodily pain, general health perceptions, vitality, social role functioning, emotional role functioning and mental health. A scale ranging from 0 to 100 is calculated for each of these eight dimensions. The eight dimensions or subscales are then aggregated to the two main dimensions mental and physical health, using explorative factor analysis (Andersen et al. 2007). For our regression analysis, we standardize the aggregated scales (MCS and PCS) to have mean 0 and standard deviation 1, where higher values indicate better health. Columns (2) and (3) of Table 2 report sample means of the health measures across individuals by college graduation. Those with college degree have, on average, a better physical health score. With respect to mental health, both groups differ only marginally. Cognitive Abilities. Cognitive abilities summarize the “ability to understand complex ideas, to adapt effectively to the environment, to learn from experience, to engage in various forms of reasoning, to overcome obstacles by taking thought” (American Psychological Association 1995), where the sum of these abilities is referred to as intelligence. Psychologists distinguish several concepts of intelligence with different cognitive abilities. However, they all include measures of verbal comprehension, memory and recall as well as processing speed. Although comprehensive cognitive intelligence tests take hours, a growing number of socioeconomic surveys includes much shorter proxies that measure specific skill components. The short ability tests are usually designed by psychologists and the results are highly correlated with the results of more comprehensive intelligence tests (cf. Lang et al. 2007 for a comparison of cognitive skill tests in the German Socio-economic Panel with larger psychological test batteries). 
The NEPS includes three kinds of competence tests that cover various domains of cognitive functioning: reading speed, reading competence, and mathematical competence.16 All competence tests were conducted once in 2010/2011 (wave 3) or 2012/2013 (wave 5), respectively, as paper and pencil tests under the supervision of a trained interviewer and the test language was German. The first test measures reading speed.17 The participants receive a booklet consisting of 51 short true-or-false questions and the test duration is 2 min. Each question has between 5 and 18 words. The participants have to answer as many questions as possible in the given window. The test score is the number of correct answers. Since the test aims at the answering speed, the questions only deal with general knowledge and use easy language. One question/statement, for example, reads “There is a bath tub in every garage.” The mean number of correct answers in our estimation sample is 39.69 (out of 51) for college graduates and 35.99 for others, see Table 2. For more information, see Zimmermann et al. (2014). The reading competence test measures understanding of texts. It lasts 28 min and covers 32 items. The test consists of three different tasks. First, participants have to answer multiple choice questions about the content of a text, where only one out of four possible answers is right. In a decision-making task, the participants are asked whether statements are right or wrong according to the text. In a third task, participants need to assign possible titles out of a list to sections of the text. The test includes several types of texts, for example, comments, instructions, and advertising texts (LIfBi 2011). Again, the test score reflects the number of correct answers. Participants with college degree score on average 29.76 and without 22.75 (out of 39).18 The mathematical literacy test evaluates “recognizing and [...] 
applying [of] mathematics in realistic, mainly extra-mathematical situations” (LIfBi 2011, p. 8). The test has 22 items and takes 28 min. It follows the principle of the OECD-PISA tests and consists of the areas quantity, space and shape, change and relations, as well as data and chance, and measures the cognitive competencies in the areas of application of skills, modeling, arguing, communicating, representing, as well as problem solving; see LIfBi (2011). Individuals without a college degree score on average 9.36 (out of 22), and persons who graduated from college score about 4 points more. Due to the rather long test duration given the total interview time, not every respondent had to do all three tests. Similarly to the OECD-PISA tests for high school students, individuals were randomly assigned a booklet with either all three or two out of the three tests. In total, 3,995 individuals took the reading speed test, 4,576 the reading competence test, and 2,904 the math test. Since the tests measure different competencies that refer to distinct cognitive abilities, we do not combine the different test scores into an overall score but report the results separately (see Anderson 2007). 4.3. Control Variables Individuals in our sample made their college decision between 1958 and 1990. The NEPS allows us to consider important socioeconomic characteristics that probably affect both the college education decision and the outcomes today (variables denoted with X in Section 3). These comprise general demographic information such as gender, migrational background, and family structure, as well as parental characteristics such as the parents' educational background. Moreover, we include two blocks of controls that were determined before the educational decision was made. Pre-college living conditions include family structure, parental job situation, and household income at the age of 15, whereas pre-college education includes educational achievements (number of repeated grades and secondary school graduation mark). 
Table A.1 in the Appendix provides more detailed descriptions of all variables and reports the sample means by treatment status. Apart from higher wages, abilities and a better physical health status (as seen in Table 2), individuals with a college degree are more likely to be males from an urban district without a migrational background. Moreover, they are more likely to have healthy parents (in terms of mortality). Other variables seem to differ less between both groups. We also account for cohort effects of mother and father, district fixed effects as well as district-specific time trends (see Mazumder 2008; Stephens and Yang 2014 for the importance of the latter). 4.4. Instrument The processes of college expansion discussed in Section 2.2 probably shifted individuals also with a lower desire to study into college education. Such powerful exogenous variation is beneficial for our approach as we try to identify the MTE along the distribution of the desire to study. We assign each individual the college availability as instrument (that is, a variable in Z but not in X). In doing so, we use the information on the district of the secondary school graduation and the year of the college decision, which is the year of secondary school graduation. The district—there are 326 districts in West Germany—is either a city or a certain rural area. The question is how to exploit the regional variation in openings and spots most efficiently as it is almost infeasible to control for all distances to all colleges simultaneously. Our approach to this question is to create an index that best reflects the educational environment in Germany and combines the distance with the number of college spots, \begin{eqnarray} Z_{it}=\sum _{j}^{326}K( {\mathit {dist}}_{ij}) \times \Bigg (\frac{\# {\mathit {students}}_{jt}}{\# {\mathit {inhabitants}}_{jt}}\Bigg ). 
\end{eqnarray} (7) The college availability instrument Zit basically includes the total number of college spots (measured by the number of students) per inhabitant in district j (out of the 326 districts in total) that individual i faces in year t, weighted by the distance between i's home district and district j. Weighting the number of students by the population of the district takes into account that districts with the same number of inhabitants might have colleges of a different size. This local availability is then weighted by the Gaussian kernel distance $$K( {\mathit {dist}}_{ij})$$ between the centroid of the home district and the centroid of district j. The kernel puts a lot of weight on nearby colleges and a very small weight on distant ones. Since individuals can choose between many districts with colleges, we calculate the sum of all district-specific college availabilities within the kernel bandwidth. Using a bandwidth of 250 km, this basically amounts to $$K( {\mathit {dist}}_{ij}) = \phi ( {\mathit {dist}}_{ij}/250)$$ where ϕ is the standard normal pdf. Although 250 km sounds like a large bandwidth, this implies that colleges in the same district receive a weight of 0.4, whereas the weight for colleges that are 100 km away is 0.37 and is reduced to 0.24 for 250 km. Colleges that are 500 km away only get a very low weight of 0.05. A smaller bandwidth of, say, 100 km would mean that colleges that are 250 km away already receive a weight of only 0.02, which implies the assumption that individuals basically do not take them into account at all. Most likely this does not reflect actual behavior. As a robustness check, however, we carry out all estimations with bandwidths between 100 and 250 km and the results are remarkably stable, see Online Appendix Figure C.1. Table 3 presents the descriptive statistics. We also provide background information on certain descriptive measures on distance and student density.

Table 3. Descriptive statistics of instruments and background information.

| Statistics | (1) Mean | (2) SD | (3) Min | (4) Max |
|---|---|---|---|---|
| Instrument: college availability | 0.459 | 0.262 | 0.046 | 1.131 |
| Background information on college availability (implicitly included in the instrument) | | | | |
| Distance to nearest college | 27.580 | 26.184 | 0 | 172.269 |
| At least one college in district | 0.130 | 0.337 | 0 | 1 |
| Colleges within 100 km | 5.860 | 3.401 | 0 | 16 |
| College spots per inhabitant within 100 km | 0.034 | 0.019 | 0 | 0.166 |

Notes: Own calculations based on NEPS-Starting Cohort 6 data and German Statistical Yearbooks 1959–1991 (German Federal Statistical Office various issues, 1959–1991). Distances are calculated as the Euclidean distance between two respective district centroids.

The instrument jointly uses college openings and increases in size. Size is measured by enrollment, as there is no available information on actual college spots. This might be considered worrisome as enrollment might reflect demand factors that are potentially endogenous. Although we believe that this is not a major problem as most study programs in the colleges were used to capacity, we also, as a robustness check, neglect information on enrollment and merely exploit information on college openings by using $$Z_{it}=\sum _{j}^{326}K( {\mathit {dist}}_{ij}) \times \boldsymbol {1}{[\mbox{college{\,\,}available}_{jt}],}$$ (8) where $$\boldsymbol {1}{[\cdot ]}$$ is the indicator function. The results when using this instrument are comparable, with minor differences, to those from the baseline specification as shown in Figure A.3 in the Appendix. In particular, the overall findings and conclusions are not affected by this choice. 
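The kernel weights quoted for equation (7) and the two versions of the instrument in equations (7) and (8) can be retraced with a small sketch. The function names and the three example districts are hypothetical. The code also assumes that the sum runs over all districts without a hard distance cutoff, since the Gaussian kernel already down-weights distant districts.

```python
import math


def gaussian_kernel_weight(dist_km, bandwidth_km=250.0):
    """Standard normal pdf evaluated at dist / bandwidth."""
    z = dist_km / bandwidth_km
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def college_availability(distances_km, students, inhabitants, bandwidth_km=250.0):
    """Equation (7): kernel-weighted sum of students per inhabitant."""
    return sum(
        gaussian_kernel_weight(d, bandwidth_km) * s / n
        for d, s, n in zip(distances_km, students, inhabitants)
    )


def college_presence(distances_km, has_college, bandwidth_km=250.0):
    """Equation (8): kernel-weighted count of districts with a college."""
    return sum(
        gaussian_kernel_weight(d, bandwidth_km)
        for d, c in zip(distances_km, has_college)
        if c
    )


# Weights at the distances discussed in the text (bandwidth 250 km).
for dist in (0, 100, 250, 500):
    print(f"{dist:>3} km: weight {gaussian_kernel_weight(dist):.2f}")

# Hypothetical example: home district plus two other districts.
distances = [0.0, 100.0, 500.0]       # km between district centroids
students = [11_000, 0, 30_000]        # enrollment in year t
inhabitants = [700_000, 200_000, 1_000_000]
z_density = college_availability(distances, students, inhabitants)
z_presence = college_presence(distances, [s > 0 for s in students])
print(f"instrument (7): {z_density:.4f}, instrument (8): {z_presence:.4f}")
```

Rounded to two decimals, the weights printed for 0, 100, 250, and 500 km match the values given in the text, and a bandwidth of 100 km gives 0.02 at 250 km.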
We prefer the combined instrument in equation (7), as it uses information from both aspects of the educational expansion. 5. Results 5.1. OLS Although we are primarily interested in analyzing the returns to college education for the marginal individuals, we start with ordinary least squares (OLS) estimations as a benchmark. Column (1) in Table 4, panel A, reports results for hourly wages, columns (2) and (3) for the two health measures, and columns (4)–(6) for the three measures of cognitive abilities. Each cell reports the coefficient of college education from a separate regression. After conditioning on observables, individuals with a college degree earn approximately 28% higher wages, on average. Although PCS is higher by around 0.3 of a standard deviation (recall that all outcomes but wages are standardized), there is no significant relation with MCS. Individuals with a college degree read, on average, 0.4 SD faster than those without college education. Moreover, their text understanding and mathematical literacy are better by approximately 0.7 SD. All in all, the results are largely in line with the differences in standardized means shown in Table 2, although slightly attenuated due to the inclusion of control variables.

Table 4. Regression results for OLS and first stage estimations. Columns (2) and (3) are health measures, columns (4) to (6) are cognitive ability components.

| | (1) Gross hourly wage | (2) PCS | (3) MCS | (4) Read. speed | (5) Read. comp. | (6) Math liter. |
|---|---|---|---|---|---|---|
| Panel A: OLS results | | | | | | |
| College degree | 0.277*** | 0.277*** | 0.003 | 0.398*** | 0.729*** | 0.653*** |
| | (0.019) | (0.033) | (0.036) | (0.037) | (0.032) | (0.044) |
| Panel B: 2SLS first-stage results | | | | | | |
| College availability | 2.368*** | 2.576*** | 2.576*** | 2.521*** | 2.327*** | 2.454*** |
| | (0.132) | (0.122) | (0.122) | (0.132) | (0.119) | (0.159) |
| Observations | 3,378 | 4,813 | 4,813 | 3,995 | 4,576 | 2,904 |

Notes: Own calculations based on NEPS-Starting Cohort 6 data. Regressions also include a full set of control variables as well as year-of-birth and district fixed effects, and district-specific linear trends. District clustered standard errors in parentheses. ***p < 0.01.

Panel B of Table 4 reports the first stage results of the 2SLS estimations. 
The coefficients of the instrument point in the expected direction and are highly significant. As expected, they barely change across the outcome variables (as the first-stage specifications only differ in the number of observations across the columns). In order to get a feeling for the effect size of college availability in the first stage, we consider, as an example, the college opening in the city of Essen in 1972. In 1978, about 11,000 students studied there. To illustrate the effect of the opening, we assume a constant population size of 700,000 inhabitants. The kernel weight of new spots in the same district is 0.4 (=K(0)). According to equation (7), the instrument value increases by 0.4 × 11,000/700,000 ≈ 0.006 (rounded). Given the coefficient of college availability of 2.4, an individual who made the college decision in Essen in 1978 had a 1.44 percentage points (2.4 × 0.006) higher probability of going to college due to the opening of the college in Essen (compared to an individual who made the college decision in 1971). This seems to be a plausible effect. The effect of the college opening in Essen on individuals who live in districts other than Essen is smaller, depending on the distance to Essen. 5.2. Marginal Treatment Effects Figure 3(a) shows the distribution of the propensity scores used in estimating the MTE by treatment and control group. They are obtained by logit regressions of the college degree on all Z and X variables. Full regression results of the first and the second stage of the 2SLS estimations are reported in the Online Appendices. For both groups, the propensity score varies from 0 to about 1. Moreover, there is common support of the propensity score on almost the entire unit interval. Variation in the propensity score where the effects of the X variables are integrated out is used to identify local effects.

Figure 3. Distribution of propensity scores. Own illustration based on NEPS-Starting Cohort 6 data. The left panel shows the propensity score (PS) density by treatment status. The right panel illustrates the joint PS density (dashed line). The solid line shows the PS variation solely caused by variation in Z, since the X-effects have been integrated out. Further note that in the right panel the densities were both normalized such that they sum up to one over the 100 points where we evaluate the density.

This variation is presented in Figure 3(b). It shows the conditional support of P when the influence of the linear X-index of observables on the propensity score is integrated out (∫fP(Z, X)dX). Here, the support ranges from nearly 0 to 0.8, caused only by variation in the instrument, which is the identifying variation. This is important in the semiparametric estimation since it shows the regions in which we can credibly identify (conditional on our assumptions) marginal effects without having to rely on inter- or extrapolations to regions where we do not have identifying variation. We calculate the MTE using a local linear regression with a bandwidth that ranges from 0.10 to 0.16 depending on the outcome variable.19 We calculate the marginal effects along the quantiles UD by evaluating the derivative of the treatment effect with respect to the propensity score (see equation (6) in Section 3). Figure 4 shows the MTE for all outcome variables. The upper left panel presents the MTE for wages. 
We find that individuals with low values of UD have the highest monetary returns to college education. Low values of UD identify individuals who are very likely to study, because even small values of P(z) exceed UD (see the transformed choice equation in Section 3). The returns are close to 80% for the smallest values of UD and then approach 0 at UD ≈ 0.7. Thus, we tend to interpret these findings as clear and strong positive returns for the 70% of individuals with the highest desire to study, whereas there is no clear evidence for any returns for the remaining 30%. Hence, there is selection into gains with respect to wages, where individuals with higher (realized) returns self-select into more education. This reflects the notion that individuals make choices based on their expected gains.

Figure 4. Marginal treatment effects for cognitive abilities and health. Own illustration based on NEPS-Starting Cohort 6 data. For gross hourly wage, the log value is taken. Health and cognitive skill outcomes are standardized to mean 0 and standard deviation 1. The MTE (vertical axis) is measured in logs for wage and in units of standard deviations of the health and cognitive skill outcomes. The dashed lines give the 95% confidence intervals based on clustered bootstrapped standard errors with 200 replications. Calculations based on a local linear regression where the influence of the control variables was isolated using a semiparametric Robinson estimator (Robinson 1988) for each outcome variable. The optimal, exact bandwidths for the local linear regressions are: wage 0.10, PCS 0.13, MCS 0.16, reading competence 0.10, reading speed 0.11, math score 0.12.

The curve of marginal treatment effects resembles the one found by Carneiro et al. (2011) for the United States, with the main difference that we do not find negative effects (but just zero) for a part of the distribution. The effect sizes are also comparable, although ours are somewhat smaller. For instance, Carneiro et al. (2011) find highest returns of 28% per year of college, whereas we find 80% for the college degree that, on average, takes 4.5 years to be earned (roughly 18% per year).

What could explain these wage returns? Two potential channels of higher earnings could be better cognitive skills and/or better health due to increased education. The findings on skills and health that we discuss in the following could, thus, be read as investigations into mechanisms for the positive wage returns. However, at least for health, this would only be one potential interpretation, as health might also be directly affected by income.

The right column of Figure 4 plots the results for cognitive skills. The distribution of marginal treatment effects is remarkably similar to the one for wages. We see that, also in terms of cognitive skills, not everybody benefits from more education.
Some individuals, again those with high desire to study, strongly benefit, while the effects approach zero for individuals with UD > 0.6. This holds for reading speed, reading competence, as well as mathematical literacy. The largest returns are as high as 2–3 standard deviations, again, for the small group with highest college readiness only. Thus, we observe the same selection into gains as with wages, and the findings could be interpreted as returns to cognitive abilities from education being a potential pathway for positive earnings returns.

The findings are somewhat different for health, as seen in the lower left part of Figure 4. First of all, the returns are much more homogeneous than those for wages and skills. Although there is still some heterogeneity in returns to physical health (though to a smaller degree than before), returns are completely homogeneous for mental health. Moreover, the returns are zero throughout for mental health. Physical health effects are positive (although not always statistically significant) for around 75% of the individuals, whereas they are close to zero for the 25% with the lowest desire to study.

The main findings of this paper can be summarized as follows:
– Education leads to higher wages and cognitive abilities for the same approximately 60% of individuals. This can also be read as suggestive evidence for cognitive abilities being a channel for the effect of education on wages.
– Education does not pay off for everybody. However, in no case are the effects negative. Thus, education never harms in terms of gross wages, skills, and health. (Obviously, this view only considers potential benefits and disregards costs; net benefits might thus well be negative for some individuals.)
– There are clear signs of selection into gains. Those individuals who realize the highest returns to education are those who are most ready to take it.
With policy initiatives such as the “Higher Education Pact 2020” Germany continuously increases participation in higher education in order to meet OECD standards (see OECD 2015a,b). Our results imply that this might not pay off, at least in terms of productivity (measured by wages), cognitive abilities, and health. Without fully simulating the results of further increased numbers of students in Germany, it is safe to assume that additional students would be those with higher values of UD, as those with a high desire to study are largely already enrolled. But these additional students are the ones that do not seem to benefit from college education. However, this projection needs to be taken with a grain of salt, as our findings are based on education in the 1960s–1980s and current education might yield different effects.

We carry out two kinds of robustness checks with respect to the definition of the instrument (see Section 4.4). Figure A.3 in the Appendix reports the findings when the instrument definition does not consider the increases in college size. The MTE curves do not stay exactly the same as before, but the main conclusions are unchanged. Wage returns are slightly more homogeneous. The results for reading competence and mathematical literacy are virtually the same, whereas for reading speed homogeneously positive effects are found. However, the confidence bands of the curves for both definitions of the instrument widely overlap. This also holds for the health measures. The MTE curve for MCS is slightly shifted upward and the one for PCS is more homogeneous, but the differences in the curves across both kinds of instruments are not significant.
Although the likelihood that two valid instruments deliver exactly the same results is fairly low in any application (and basically zero when so many points are evaluated as is the case here), the broad picture that leads to the conclusions stated previously is invariant to the change in the instrument definition. In Online Appendix C, we report the results of a robustness check where we use different kernel bandwidths to weight the college distance (bandwidths between 100 and 250 km). Here the differences are indeed largely absent. Although the condensation of college availability in equation (7) seems somewhat arbitrary, these robustness checks show that the specification of the instrument does not affect our conclusions.

5.3. Treatment Parameters

Table 5 reports the conventional treatment parameters estimated using the MTE and the respective weights as described previously and more formally derived and explained in, for example, Heckman et al. (2006). In particular, we calculate the average treatment effect (ATE), the average treatment effect on the treated (ATT), the average treatment effect on the untreated (ATU), and the local average treatment effect (LATE). The estimated weights applied to the returns for each UD on the MTE curve are shown in Figure 5.

Figure 5. Treatment parameter weights conditional on the propensity score. Own illustration based on NEPS-Starting Cohort 6 data. Weights were calculated using the entire sample of 8,672 observations for which we have instrument and control variable information, irrespective of the availability of the outcome variable.

Table 5.
Estimated treatment parameters for main results.

| Main outcomes | (1) ATE | (2) ATT | (3) ATU | (4) LATE |
|---|---|---|---|---|
| Log gross wage | 0.43 (0.06) | 0.59 (0.07) | 0.36 (0.07) | 0.49 (0.05) |
| PCS | 0.45 (0.13) | 0.86 (0.13) | 0.29 (0.16) | 0.55 (0.09) |
| MCS | 0.10 (0.10) | 0.09 (0.12) | 0.10 (0.13) | 0.05 (0.08) |
| Reading competence | 1.10 (0.13) | 1.88 (0.15) | 0.78 (0.16) | 1.18 (0.08) |
| Reading speed | 0.72 (0.14) | 1.17 (0.15) | 0.54 (0.18) | 0.70 (0.11) |
| Mathematical literacy | 1.11 (0.17) | 1.56 (0.21) | 0.93 (0.19) | 1.13 (0.14) |

Notes: Own calculations based on NEPS-Starting Cohort 6 data. The MTE is estimated with a semiparametric Robinson estimator. The LATE is estimated using the IV weights depicted in Figure 5. Therefore, the LATE in this table deviates slightly from corresponding 2SLS estimates. Standard errors (in parentheses) are estimated using a clustered bootstrap (at district level) with 200 replications.

Whereas the local average treatment effect is an average effect weighted by the conditional density of the instrument, the ATT, for example, gives more weight to those individuals who already select into higher education at low UD values (indicating low intrinsic reluctance toward higher education); the reverse holds for the ATU. The reason is that their likelihood of being in any “treatment group” is higher compared to individuals with higher values of UD. The ATE places equal weight over the whole support. In all cases but mental health and reading speed, the LATE parameters in column (4) are approximately twice as large as the OLS estimates.
Larger local average treatment effects (compared to OLS) may seem counterintuitive, as one often expects OLS to overestimate the true effects. Yet, this is not an uncommon finding and, in a world with heterogeneous effects, it is often explained by the group of compliers potentially having higher individual treatment effects than the average individual (Card 2001). This is directly visible when comparing the LATE with the ATE in column (1), which is another indication of selection into gains. Regarding the other treatment parameters, the LATE lies within the range of the ATT and the ATU.

Note that these are the “empirical”, conditional-on-the-sample parameters as calculated in Basu et al. (2007), that is, the treatment parameters conditional on the common support of the propensity score. The population ATE, however, would require full support on the unit interval.20 As depicted in Figure 3, we do not have full support in the data at hand. Although we observe individuals with and without college degree for most probabilities to study, we cannot observe an individual with a probability arbitrarily close to 100% without college degree (and arbitrarily close to 0% with a degree). Instead, the parameters in Table 5 were computed using the marginal treatment effects on the common support only. However, as the common support ranges from 0.002 to 0.969, it seems fair to say that the estimates probably come very close to the true parameters.

Table 5 is informative for two reasons in particular. First, it boils down the MTE to single numbers such that the average effect size immediately becomes clear. Second, differences between the parameters again emphasize the role of effect heterogeneity. Together with the bootstrapped standard errors, the table reveals that the ATT and the ATU structurally differ from each other for all outcomes but mental health.
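The weighting logic behind these parameters can be sketched numerically. The example is illustrative only: the MTE curve `1 - u` and the evenly spread propensity scores are hypothetical and are not the estimates of this paper. The weights follow the standard forms in the MTE literature (see, e.g., Heckman et al. 2006), with the ATT weight proportional to Pr(P > u) and the ATU weight proportional to Pr(P ≤ u), and they are normalized to sum to one over 100 evaluation points.

```python
import numpy as np

# 100 evaluation points for U_D (midpoints of a grid on the unit interval).
u = (np.arange(100) + 0.5) / 100
# Hypothetical propensity scores, spread evenly over the unit interval.
p = (np.arange(1000) + 0.5) / 1000
# Hypothetical MTE curve that falls in U_D (selection into gains).
mte = 1.0 - u

# Share of individuals whose propensity score exceeds each evaluation point.
share_above = np.array([(p > ui).mean() for ui in u])


def normalize(w):
    """Scale weights so that they sum to one over the evaluation points."""
    return w / w.sum()


w_ate = normalize(np.ones_like(u))     # equal weight over the whole support
w_att = normalize(share_above)         # more weight at low U_D
w_atu = normalize(1.0 - share_above)   # more weight at high U_D

ate, att, atu = (float(mte @ w) for w in (w_ate, w_att, w_atu))
print(f"ATE = {ate:.3f}, ATT = {att:.3f}, ATU = {atu:.3f}")
```

With a falling MTE curve the ATT exceeds the ATE, which in turn exceeds the ATU (here roughly 2/3, 1/2, and 1/3), the same ordering as in Table 5 for all outcomes but mental health.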
The treatment group of college graduates thus seems to benefit more from higher education in terms of wages, skills, and physical health than the non-graduates would. One reason is that they might choose to study because of their idiosyncratic skill returns. Yet, these are also likely to be windfall gains that go along with the monetary college premiums on which the decision was more likely based. Nonetheless, this is also evidence for selection into gains.

The effect sizes for all (ATE), for the university degree subgroup (ATT), and for those without higher education (ATU) in Table 5 capture the overall returns to college education, not the per-year effects. On average, the per-year effect is approximately the overall effect divided by 4.5 years (the regular time it takes to receive a Diplom degree), if we assume linear additivity of the yearly effects. The per-year effects for mathematical literacy and reading competence are about 25% of a standard deviation for all parameters. For reading speed the effects are around 15% of an SD, whereas the wage effects are around 10%. These effects are of considerable size, yet slightly smaller than those found in the previous literature on different treatments and, importantly, different compliers. For instance, ability returns to an additional year of compulsory schooling were found to be up to 0.5 SD (see, e.g., Banks and Mazzonna 2012).

To get an idea of the total effect of college education on, say, math skills, the following example might help. If you start at the median of the standardized unconditional math score distribution (Φ(0) = 50%), the average effect of 1.11 standard deviations, all other things being equal, will make you end up at the 87% quantile of that distribution (Φ(0 + 1.11) = 87%), in the thought experiment of being the only treated individual in the peer group.
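The per-year conversion and the quantile example can be reproduced with a few lines of arithmetic. The sketch below uses the ATE column of Table 5 and the 4.5-year degree duration stated above; the variable names are chosen for illustration, and the linear additivity of yearly effects is the same simplifying assumption as in the text.

```python
from math import erf, sqrt

YEARS_TO_DEGREE = 4.5  # regular duration of a Diplom degree, as stated above

# ATE column of Table 5 (log points for wages, SD units otherwise).
ate = {
    "log gross wage": 0.43,
    "PCS": 0.45,
    "reading competence": 1.10,
    "reading speed": 0.72,
    "mathematical literacy": 1.11,
}

# Per-year effect under the linear-additivity assumption.
per_year = {name: effect / YEARS_TO_DEGREE for name, effect in ate.items()}


def std_normal_cdf(x):
    """Standard normal distribution function, written with the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


# Starting at the median (z = 0) and adding the ATE for mathematical literacy.
position_after = std_normal_cdf(0.0 + ate["mathematical literacy"])

for name, effect in per_year.items():
    print(f"{name}: {effect:.2f} per year")
print(f"Position in the math score distribution: {position_after:.0%}")
```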
As suggested by the pattern of the marginal treatment effects in Figure 4, the health returns to higher education are smaller than the skill returns; still, they are around 10% of an SD per year (except for the zero effect on mental health). Given the previous literature, the results seem reasonable.

Regarding statistical significance of the effects, note that we use several outcome variables and potentially run into multiple testing problems. Yet, we refrain from taking this into account by a complex algorithm that also accounts for the correlation of the six outcome variables and argue the following way: All ATEs and ATTs are highly statistically significant. Thus, multiple testing with six outcomes should not be a major issue. Even with a most conservative Bonferroni correction, critical values for statistical significance at the 5% level would increase from 1.96 to 2.65 and would not change any conclusions regarding significance.21

6. Potential Mechanisms for Health and Cognitive Abilities

In this section, we investigate the role of potential mechanisms through which college education may work. It is likely to affect the observed level of health and cognitive abilities through the attained stock of health capital and the cognitive reserve, that is, the mind’s ability to tolerate brain damage (Stern 2012; Meng and D’Arcy 2012). There are probably three channels through which education affects long-run health and cognitive abilities:
– in college: a direct effect from education;
– post-college: a diminished age-related decline in health and skills due to the higher health capital/cognitive reserve attained in college (e.g., the “cognitive reserve hypothesis”, Stern et al. 1999);
– post-college: different health behavior or different jobs that are less detrimental to health and more cognitively demanding (Stern 2012).
The post-college mechanisms that compensate for the decline also contain implicit multiplying factors like complementarities and self-productivity, see Cunha et al. (2006) and Cunha and Heckman (2007). The NEPS data include various job characteristics and health behaviors that potentially reduce the age-related skill/health decline. However, the data allow us neither to disentangle these components empirically (i.e., observing changes in one channel that are exogenous from other channels) nor to analyze how the effect on the mechanism causally maps into higher skills or better health (as, e.g., in Heckman et al. 2013). Thus, it should be noted that this sub-analysis is merely suggestive and by no means a comprehensive analysis of the mechanisms behind the effects found in the previous section. Moreover, the following analysis focuses on the potential channel of different jobs and health behaviors. It does the same as before (same controls, same estimation strategy and instrument) but replaces the outcome variables by the indicators of potential mechanisms.

Cognitive Abilities. The main driving force behind skill formation after college might lie in activities on the job. When individuals with college education engage in more cognitively demanding activities, for example, more sophisticated jobs, this might mentally exercise their minds (Rohwedder and Willis 2010). This effect of mental training is sometimes referred to as the use-it-or-lose-it hypothesis, see Rohwedder and Willis (2010) or Salthouse (2006). If such an exercise effect leads to alternating brain networks that “may compensate for the pathological disruption of preexisting networks” (Meng and D’Arcy 2012, p. 2), a higher demand for cognitively demanding tasks (as a result of college education) increases the individual’s cognitive capacity. In order to investigate if a more cognitively demanding job might be a potential mechanism (as, e.g., suggested by Fisher et al.
2014), we use information on the individual’s activities on the job. All four outcome variables considered in this subsection are binary; their definitions, sample means, and the effects of college education are given in Table 6. For the sake of brevity we focus on the most relevant treatment parameters here and do not discuss the MTE curvatures.

Table 6. Potential mechanisms for cognitive skills.

| Parameter | Definition | Sample mean | ATE | ATT | ATU |
|---|---|---|---|---|---|
| Math: percentages | =1 if job requires calculating with percentages and fractions | 0.711 | 0.20 (0.06) | 0.23 (0.07) | 0.19 (0.07) |
| Reading | =1 if respondent often spends more than 2 hours reading | 0.777 | 0.23 (0.03) | 0.30 (0.03) | 0.30 (0.04) |
| Writing | =1 if respondent often writes more than 1 page | 0.704 | 0.39 (0.07) | 0.64 (0.09) | 0.29 (0.07) |
| Learning new things | =1 if respondent reports to learn new things often | 0.671 | 0.22 (0.07) | 0.31 (0.09) | 0.18 (0.07) |

Notes: Own calculations based on NEPS-Starting Cohort 6 data. Definitions are taken from the data manual. Standard errors estimated using a clustered bootstrap (district level) are reported in parentheses.

College education has strong effects on all four outcomes. It increases the likelihood of being in a job that requires calculating with percentages and fractions, that involves reading or writing, and in which individuals often learn new things. The effect sizes are very large, which is not too surprising as many of the jobs that entail these mentally demanding tasks require a college diploma as a quasi-formal condition of employment. Moreover, as observed before, there seems to be effect heterogeneity here as well, and selection into gains, as all average treatment effects on the treated are larger than the treatment effects on the untreated (except for the case of reading more than 2 hours). The differences are particularly strong for writing and for learning new things.
All in all, the findings suggest that cognitively more demanding jobs due to college education might play a role in explaining long-run cognitive returns to education. Note again, however, that these findings are only suggestive evidence for a causal mechanism. It might as well be the other way around, with the cognitive abilities attained in college inducing a selection into these job types.

Health. Concerning the health mechanisms, we study job-related effects and effects on health behavior. The NEPS data cover engagement in several physical activities on the job, for example, working in a standing position, working in an uncomfortable position (like bending often), walking or cycling long distances, or carrying heavy loads. Table 7 reports definitions, sample means, and effects. In the upper panel of the table, the binary indicators are coded as 1 if the respondent reports engaging in the activity (and 0 otherwise).

Table 7. Potential mechanisms for health.

| Parameter | Definition | Sample mean | ATE | ATT | ATU |
|---|---|---|---|---|---|
| Physically demanding activities on the job | | | | | |
| Standing position | =1 if often working in a standing position for 2 or more hours | 0.302 | −0.37 (0.07) | −0.56 (0.09) | −0.30 (0.08) |
| Uncomfortable pos. | =1 if respondent needs to bend, crawl, lie down, kneel or squat | 0.190 | −0.20 (0.05) | −0.37 (0.06) | −0.13 (0.06) |
| Walking | =1 if job often requires walking, running or cycling | 0.242 | −0.39 (0.06) | −0.56 (0.07) | −0.32 (0.07) |
| Carrying | =1 if often carrying a load of at least 10 kg | 0.182 | −0.40 (0.05) | −0.50 (0.05) | −0.37 (0.05) |
| Health behaviors | | | | | |
| Obesity | =1 if body mass index (=weight in kg/(height in m)²) > 30 | 0.155 | −0.08 (0.04) | −0.15 (0.05) | −0.05 (0.05) |
| Smoking | =1 if currently smoking | 0.270 | −0.18 (0.06) | −0.23 (0.06) | −0.16 (0.07) |
| Alcohol amount | =1 if three or more drinks when consuming alcohol | 0.187 | −0.14 (0.05) | −0.13 (0.06) | −0.14 (0.06) |
| Sport | =1 if any sporting exercise in the previous 3 months | 0.717 | 0.16 (0.07) | 0.31 (0.07) | 0.10 (0.09) |

Notes: Own calculations based on NEPS-Starting Cohort 6 data.
Definitions are taken from the data manual. Standard errors estimated using a clustered bootstrap (at district level) are reported in parentheses.

Table A.1. Control variables and means by college degree.

| Variable | Definition | Respondents with college degree | Respondents w/o college degree |
|---|---|---|---|
| General information | | | |
| Female | =1 if respondent is female | 40.38 | 54.18 |
| Year of birth (FE) | Year of birth of the respondent | 1959 | 1959 |
| Migrational background | =1 if respondent was born abroad | 0.89 | 0.64 |
| No native speaker | =1 if mother tongue is not German | 0.30 | 0.43 |
| Rural district | =1 if current district is rural | 16.79 | 24.96 |
| Mother still alive | =1 if mother is still alive in 2009/10 | 65.38 | 63.83 |
| Father still alive | =1 if father is still alive in 2009/10 | 45.27 | 42.3 |
| Pre-college living conditions | | | |
| Married before college | =1 if respondent got married before the year of the college decision or in the same year | 0.20 | 0.44 |
| Parent before college | =1 if respondent became a parent before the year of the college decision or in the same year | 0.30 | 0.17 |
| Siblings | Number of siblings | 1.56 | 1.87 |
| First born | =1 if respondent was the first born in the family | 33.73 | 29.01 |
| Age 15: lived by single parent | =1 if respondent was raised by single parent | 5.33 | 5.32 |
| Age 15: lived in patchwork family | =1 if respondent was raised in a patchwork family | 1.11 | 0.27 |
| Age 15: orphan | =1 if respondent was an orphan at the age of 15 | 0.10 | 0.20 |
| Age 15: mother employed | =1 if mother was employed at the respondent’s age of 15 | 45.93 | 46.87 |
| Age 15: mother never unemployed | =1 if mother was never unemployed until the respondent’s age of 15 | 61.24 | 62.29 |
| Age 15: father employed | =1 if father was employed at the respondent’s age of 15 | 92.46 | 90.73 |
| Age 15: father never unemployed | =1 if father was never unemployed until the respondent’s age of 15 | 98.45 | 97.14 |
| Pre-college education | | | |
| Final school grade: excellence | =1 if the overall grade of the highest school degree was excellent | 4.59 | 1.79 |
| Final school grade: good | =1 if the overall grade of the highest school degree was good | 31.51 | 25.83 |
| Final school grade: satisfactory | =1 if the overall grade of the highest school degree was satisfactory | 17.97 | 28.03 |
| Final school grade: sufficient or worse | =1 if the overall grade of the highest school degree was sufficient or worse | 1.04 | 1.42 |
| Repeated one grade | =1 if student needed to repeat one grade in elementary or secondary school | 19.97 | 20.51 |
| Repeated two or more grades | =1 if student needed to repeat two or more grades in elementary or secondary school | 2.74 | 1.85 |
| Military service | =1 if respondent was drafted for compulsory military service | 28.03 | 23.89 |
| Parental characteristics (M: mother, F: father) | | | |
| M: year of birth (FE) | Year of birth of the respondent’s mother | 1930 | 1932 |
| M: migrational background | =1 if mother was born abroad | 5.47 | 4.85 |
| M: at least inter. edu | =1 if mother has at least an intermediate secondary school degree | 17.97 | 5.95 |
| M: vocational training | =1 if mother’s highest degree is vocational training | 20.86 | 16.18 |
| M: further job qualification | =1 if mother has further job qualification (e.g., Meister degree) | 4.29 | 1.73 |
| F: year of birth (FE) | Year of birth of the respondent’s father | 1927 | 1929 |
| F: migrational background | =1 if father was born abroad | 6.36 | 5.54 |
| F: at least inter. edu | =1 if father has at least an intermediate secondary school degree | 20.86 | 8.09 |
| F: vocational training | =1 if father’s highest degree is vocational training | 19.12 | 21.99 |
| F: further job qualification | =1 if father has further job qualification (e.g., Meister degree) | 11.46 | 6.76 |
| Number of observations (PCS and MCS sample) | | 1,352 | 3,461 |
of the highest school degree was excellent 4.59 1.79  Final school grade: good =1 if the overall grade of the highest school degree was good 31.51 25.83  Final school grade: satisfactory =1 if the overall grade of the highest school degree was satisfactory 17.97 28.03  Final school grade: sufficient or worse =1 if the overall grade of the highest school degree was sufficient or worse 1.04 1.42  Repeated one grade =1 if student needed to repeat one grade in elementary or secondary school 19.97 20.51  Repeated two or more grades =1 if student needed to repeat two or more grades in elementary or secondary school 2.74 1.85  Military service =1 if respondent was drafted for compulsory military service 28.03 23.89 Parental characteristics (M: mother, F: father)  M: year of birth (FE) Year of birth of the respondent’s mother 1930 1932  M: migrational background =1 if mother was born abroad 5.47 4.85  M: at least inter. edu =1 if mother has at least an intermediate secondary school degree 17.97 5.95  M: vocational training =1 if mother’s highest degree is vocational training 20.86 16.18  M: further job qualification =1 if mother has further job qualification (e.g., Meister degree) 4.29 1.73  F: year of birth (FE) Year of birth of the respondent’s father 1927 1929  F: migrational background =1 if father was born abroad 6.36 5.54  F: at least inter. edu =1 if father has at least an intermediate secondary school degree 20.86 8.09  F: vocational training =1 if father’s highest degree is vocational training 19.12 21.99  F: further job qualification =1 if father has further job qualification (e.g., Meister degree) 11.46 6.76 Number of observations (PCS and MCS sample) 1,352 3,461 Notes: Own calculations based on NEPS-Starting Cohort 6 data. Definitions are taken from the data manual. Mean values refer to the MCS and PCS sample. FE = variable values are included as fixed effects in the analysis. View Large Table A.1. Control variables and means by college degree. 
Respondents Variable Definition with college degree w/o college degree General information  Female =1 if respondent is female 40.38 54.18  Year of birth (FE) Year of birth of the respondent 1959 1959  Migrational background =1 if respondent was born abroad 0.89 0.64  No native speaker =1 if mother tongue is not German 0.30 0.43  Rural district =1 if current district is rural 16.79 24.96  Mother still alive =1 if mother is still alive in 2009/10 65.38 63.83  Father still alive =1 if father is still alive in 2009/10 45.27 42.3 Pre-college living conditions  Married before college =1 if respondent got married before the year of the college decision or in the same year 0.20 0.44  Parent before college =1 if respondent became a parent before the year of the college decision or in the same year 0.30 0.17  Siblings Number of siblings 1.56 1.87  First born =1 if respondent was the first born in the family 33.73 29.01  Age 15: lived by single parent =1 if respondent was raised by single parent 5.33 5.32  Age 15: lived in patchwork family =1 if respondent was raised in a patchwork family 1.11 0.27  Age 15: orphan =1 if respondent was an orphan at the age of 15 0.10 0.20  Age 15: mother employed =1 if mother was employed at the respondent’s age of 15 45.93 46.87  Age 15: mother never unemployed =1 if mother was never unemployed until the respondent’s age of 15 61.24 62.29  Age 15: father employed =1 if father was employed at the respondent’s age of 15 92.46 90.73  Age 15: father never unemployed =1 if father was never unemployed until the respondent’s age of 15 98.45 97.14 Pre-college education  Final school grade: excellence =1 if the overall grade of the highest school degree was excellent 4.59 1.79  Final school grade: good =1 if the overall grade of the highest school degree was good 31.51 25.83  Final school grade: satisfactory =1 if the overall grade of the highest school degree was satisfactory 17.97 28.03  Final school grade: sufficient or worse =1 if the overall 
grade of the highest school degree was sufficient or worse 1.04 1.42  Repeated one grade =1 if student needed to repeat one grade in elementary or secondary school 19.97 20.51  Repeated two or more grades =1 if student needed to repeat two or more grades in elementary or secondary school 2.74 1.85  Military service =1 if respondent was drafted for compulsory military service 28.03 23.89 Parental characteristics (M: mother, F: father)  M: year of birth (FE) Year of birth of the respondent’s mother 1930 1932  M: migrational background =1 if mother was born abroad 5.47 4.85  M: at least inter. edu =1 if mother has at least an intermediate secondary school degree 17.97 5.95  M: vocational training =1 if mother’s highest degree is vocational training 20.86 16.18  M: further job qualification =1 if mother has further job qualification (e.g., Meister degree) 4.29 1.73  F: year of birth (FE) Year of birth of the respondent’s father 1927 1929  F: migrational background =1 if father was born abroad 6.36 5.54  F: at least inter. 
edu =1 if father has at least an intermediate secondary school degree 20.86 8.09  F: vocational training =1 if father’s highest degree is vocational training 19.12 21.99  F: further job qualification =1 if father has further job qualification (e.g., Meister degree) 11.46 6.76 Number of observations (PCS and MCS sample) 1,352 3,461 Respondents Variable Definition with college degree w/o college degree General information  Female =1 if respondent is female 40.38 54.18  Year of birth (FE) Year of birth of the respondent 1959 1959  Migrational background =1 if respondent was born abroad 0.89 0.64  No native speaker =1 if mother tongue is not German 0.30 0.43  Rural district =1 if current district is rural 16.79 24.96  Mother still alive =1 if mother is still alive in 2009/10 65.38 63.83  Father still alive =1 if father is still alive in 2009/10 45.27 42.3 Pre-college living conditions  Married before college =1 if respondent got married before the year of the college decision or in the same year 0.20 0.44  Parent before college =1 if respondent became a parent before the year of the college decision or in the same year 0.30 0.17  Siblings Number of siblings 1.56 1.87  First born =1 if respondent was the first born in the family 33.73 29.01  Age 15: lived by single parent =1 if respondent was raised by single parent 5.33 5.32  Age 15: lived in patchwork family =1 if respondent was raised in a patchwork family 1.11 0.27  Age 15: orphan =1 if respondent was an orphan at the age of 15 0.10 0.20  Age 15: mother employed =1 if mother was employed at the respondent’s age of 15 45.93 46.87  Age 15: mother never unemployed =1 if mother was never unemployed until the respondent’s age of 15 61.24 62.29  Age 15: father employed =1 if father was employed at the respondent’s age of 15 92.46 90.73  Age 15: father never unemployed =1 if father was never unemployed until the respondent’s age of 15 98.45 97.14 Pre-college education  Final school grade: excellence =1 if the overall grade 
of the highest school degree was excellent 4.59 1.79  Final school grade: good =1 if the overall grade of the highest school degree was good 31.51 25.83  Final school grade: satisfactory =1 if the overall grade of the highest school degree was satisfactory 17.97 28.03  Final school grade: sufficient or worse =1 if the overall grade of the highest school degree was sufficient or worse 1.04 1.42  Repeated one grade =1 if student needed to repeat one grade in elementary or secondary school 19.97 20.51  Repeated two or more grades =1 if student needed to repeat two or more grades in elementary or secondary school 2.74 1.85  Military service =1 if respondent was drafted for compulsory military service 28.03 23.89 Parental characteristics (M: mother, F: father)  M: year of birth (FE) Year of birth of the respondent’s mother 1930 1932  M: migrational background =1 if mother was born abroad 5.47 4.85  M: at least inter. edu =1 if mother has at least an intermediate secondary school degree 17.97 5.95  M: vocational training =1 if mother’s highest degree is vocational training 20.86 16.18  M: further job qualification =1 if mother has further job qualification (e.g., Meister degree) 4.29 1.73  F: year of birth (FE) Year of birth of the respondent’s father 1927 1929  F: migrational background =1 if father was born abroad 6.36 5.54  F: at least inter. edu =1 if father has at least an intermediate secondary school degree 20.86 8.09  F: vocational training =1 if father’s highest degree is vocational training 19.12 21.99  F: further job qualification =1 if father has further job qualification (e.g., Meister degree) 11.46 6.76 Number of observations (PCS and MCS sample) 1,352 3,461 Notes: Own calculations based on NEPS-Starting Cohort 6 data. Definitions are taken from the data manual. Mean values refer to the MCS and PCS sample. FE = variable values are included as fixed effects in the analysis. 
We find that college education reduces the probability of engaging in all four physically demanding activities. Again, the estimated effects are very large, implying that it is the college diploma that qualifies for a white-collar office job. These effects might explain why we find physical health effects of education, and they are in line with the absence of mental health effects: white-collar jobs are usually less demanding with respect to physical health but not at all less stressful.

Besides physical activities on the job, health behaviors may be considered an important dimension of the general formation of health over the life cycle; see Cutler and Lleras-Muney (2010). To analyze this, we use the following variables in our data set: a binary indicator for obesity (body mass index exceeds 30) as a compound lifestyle measure, and more direct behavioral variables, namely an indicator for smoking, a measure of alcohol consumption (1 if having three or more drinks when consuming alcohol), and physical activity measured by an indicator of having taken any sport exercise in the previous 3 months. The lower panel in Table 7 reports the sample means and treatment effects. College education leads to a decrease in the probability of being obese and reduces the probability of smoking. This is in line with the LATE estimates of the effect of college education in the United States by Grimard and Parent (2007) and de Walque (2007). College education also seems to reduce alcohol consumption and increases the likelihood of engaging in sport exercise. Again, the effect sizes are large, though not as large as for the other potential mechanisms. Moreover, some of them are only marginally statistically significant.

Taken together, college education affects potential health mechanisms in the expected direction. Again, there is effect heterogeneity, observable in different treatment parameters for the same outcome variables.
Since health is a high-dimensional measure, the potential mechanisms at hand cannot, of course, explain the health returns to college education entirely. Nevertheless, the findings support our interpretation of the effects of college education on physical health.

7. Conclusion

This paper uses the Marginal Treatment Effect framework introduced and advanced by Björklund and Moffitt (1987) and Heckman and Vytlacil (2005, 2007) to estimate returns to college education under essential heterogeneity. We use representative data from the German National Educational Panel Study (NEPS). Our outcome measures are wages, cognitive abilities, and health. Cognitive abilities are assessed using state-of-the-art cognitive competence tests on individual reading speed, text understanding, and mathematical literacy. As expected, all outcome variables are positively correlated with having a college degree in our data set. Using an instrument that exploits exogenous variation in the supply of colleges, we estimate marginal returns to college education.

The main findings of this paper are as follows: College education improves average wages, cognitive abilities, and physical health (but not mental health). There is heterogeneity in the effects and clear signs of selection into gains. Those individuals who realize the highest returns to education are those who are most ready to take it. Moreover, education does not pay off for everybody. Although it is never harmful, we find zero causal effects for around 30%–40% of the population. Thus, although college education is beneficial on average, further increasing the number of students, as is sometimes called for, is less likely to pay off, as the current marginal students are mostly in the range of zero causal effects. Potential mechanisms of skill returns are more demanding jobs that slow down the cognitive decline in later ages.
Regarding health, we find positive effects of higher education on BMI, non-smoking, sports participation, and alcohol consumption. All in all, given that the average individual clearly seems to benefit from education, and provided that continuing technological progress makes skills more and more valuable, education should still be an answer to technological change for the average individual. One limitation of this paper is that we are not able to stratify the analysis by study subject. This is left for future work.

Appendix: Additional Figures and Tables

Figure A.1. Spatial variation of colleges across districts and over time. Own illustration based on the German Statistical Yearbooks 1959–1991 (German Federal Statistical Office various issues, 1959–1991). The maps show all 326 West-German districts (Kreise, spatial units of 2009) except Berlin in the years 1958 (first year in the sample), 1970, 1980, and 1990 (last year in the sample). Districts usually cover a bigger city or some administratively connected villages. If a district has at least one college, the district is depicted in black. Only a few districts have more than one college. For those districts the number of students is added up in the calculations but multiple colleges are not depicted separately in the maps.

Figure A.2. Distribution of dependent variables by college graduation. Own illustration based on NEPS-Starting Cohort 6 data.

Figure A.3. Sensitivity in marginal treatment effects when using only the sum of the kernel weighted college distances. Own illustration based on NEPS-Starting Cohort 6 data. For gross hourly wage, the log value is taken. Health and cognitive skill outcomes are standardized to mean 0 and standard deviation 1. The MTE (vertical axis) is measured in logs for wage and in units of standard deviations of the health and cognitive skill outcomes. The dashed lines give the 95% confidence intervals. Calculations based on a local linear regression where the influence of the control variables was isolated using a semiparametric Robinson estimator (Robinson 1988) for each outcome variable.

The editor in charge of this paper was Claudio Michelacci.
Acknowledgments

We thank the editor and two anonymous referees for many helpful suggestions which improved the paper considerably. We are grateful to Pedro Carneiro, Arnaud Chevalier, Damon Clark, Eleonora Fichera, Martin Fischer, Hendrik Jürges and Corinna Kleinert for valuable comments and Claudia Fink for excellent research assistance. Furthermore, we would like to thank the participants of several conferences and seminars for helpful discussions. Access to Micro Census data at the GESIS-German Microdata Lab, Mannheim, is gratefully acknowledged. Financial support from the Deutsche Forschungsgemeinschaft (DFG, Grant number SCHM 3140/1-1) is gratefully acknowledged. Matthias Westphal is affiliated with and was also partly funded by the Ruhr Graduate School in Economics. Hendrik Schmitz and Matthias Westphal are furthermore affiliated with the Leibniz Science Campus Ruhr. This paper uses data from the National Educational Panel Study (NEPS): Starting Cohort Adults, 10.5157/NEPS:SC6:5.1.0. From 2008 to 2013, NEPS data was collected as part of the Framework Program for the Promotion of Empirical Educational Research funded by the German Federal Ministry of Education and Research (BMBF). As of 2014, NEPS is carried out by the Leibniz Institute for Educational Trajectories (LIfBi) at the University of Bamberg in cooperation with a nationwide network.

Footnotes

1 The Economist, edition March 28th to April 3rd 2015.

2 Hansen et al. (2004) use a control function approach to adjust for education in the short-term development of cognitive abilities. Carneiro et al. (2001, 2003) analyze the short-term effects of college education. Glymour et al. (2008), Banks and Mazzonna (2012), Schneeweis et al. (2014), and Kamhöfer and Schmitz (2016) analyze the effects of secondary schooling on long-term cognitive skills.

3 See Section 4 for a detailed definition of cognitive abilities. We use the terms “cognitive abilities”, “cognitive skills”, and “skills” interchangeably.

4 We use the words university and college as synonyms to refer to German Universitäten and closely related institutions like institutes of technology (Technische Universitäten/Technische Hochschulen), an institutional type that combines features of colleges and universities of applied science (Gesamthochschulen), and universities of the armed forces (Bundeswehruniversitäten/Bundeswehrhochschulen).

5 The working paper version Kamhöfer et al. (2015) also uses the introduction of a student loan program as a further source of exogenous variation. Using this instrument does not affect the findings at all but is not considered here for the sake of legibility of the paper.

6 All data are taken from the German Statistical Yearbooks, 1959–1991, see German Federal Statistical Office (various issues, 1959–1991). We only use colleges and no other higher educational institutes described in Section 2.1 (e.g., universities of applied science). Administrative data on openings and the number of students are not available for institutions other than colleges. However, since other higher educational institutions are small in size and highly specialized, they should be less relevant for the higher education decision and, thus, neglecting them should not affect the results.

7 Table 1 uses a different data source than the main analysis and the local level is slightly broader than districts, see the notes to the table.

8 Note that the general derivation does not require linear indices. However, it is standard to assume linearity when it comes to estimation.

9 By applying, for instance, the standard normal distribution function to both sides of the inequality: $$Z^{\prime }\delta \ge V \Leftrightarrow \Phi (Z^{\prime }\delta ) \ge \Phi (V) \Leftrightarrow P(Z) \ge U_D$$, where $$P(Z) \equiv P(D = 1|Z) = \Phi (Z^{\prime }\delta )$$.

10 In this model the exclusion restriction is implicit since Z has an effect on D* but not on Y1, Y0. Monotonicity is implied by the choice equation since D* either monotonically increases or monotonically decreases in the values of Z.

11 To make this explicit, all treatment parameters ($$TE_j(x)$$) can be decomposed into a weight ($$h_j(x,u_D)$$) and the MTE: $$TE_j(x)=\int _{0}^{1} \mathit{MTE}(x,u_D)h_j(x,u_D)du_D$$. See, for example, Heckman and Vytlacil (2007) for the exact expressions of the weights for common parameters.

12 Essentially, this is equivalent to a simple 2SLS case. If one wants to identify observable effect heterogeneity (i.e., interact the treatment indicator with control variables in the regression model) the instrument needs to be unconditionally independent of these controls.

13 On the other hand, estimating with heterogeneity in the observables can lead to an efficiency gain.

14 Semi-parametrically, the MTE can only be identified over the support of P. The greater the variation in Z (conditional on X) and, thus P(Z), the larger the range over which the MTE can be identified. This may be considered a drawback of the MTE approach, in particular, because treatment parameters that have weight unequal to zero outside the support of the propensity score are not identified using semi-parametric techniques. This is sometimes called the “identification at infinity” requirement (see Heckman 1990) of the MTE. However, we argue that the MTE over the support of P is already very informative. We use semi-parametric estimates of the MTE and restrict the results to the empirical ATE or ATT that are identified for those individuals who are in the sample (see Basu et al. 2007). Alternatively one might use a flexible approximation of K(p) based on a polynomial of the propensity score as done by Basu et al. (2007). This amounts to estimating $$E(Y|X, p) = X^{\prime }\beta + (\alpha _1 -\alpha _0) \cdot p + \sum _{j=1}^k \phi _j p^j$$ by OLS and using the estimated coefficients to calculate $$\widehat{\mathit{MTE}}(x,p) = (\widehat{\alpha }_1 - \widehat{\alpha }_0) + \sum _{j=1}^k \widehat{\phi }_j j p^{j-1}$$.
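The polynomial alternative described in footnote 14, together with the weighting representation in footnote 11, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names (`fit_polynomial_mte`, `mte`, `weighted_effect`) are hypothetical, the data are simulated rather than taken from NEPS, and the linear term in p is folded into a single coefficient `c_1` because a separate first-order polynomial term would be collinear with it. The ATE is obtained with the weight h = 1 on the whole unit interval, which relies on the polynomial extrapolating beyond the support of the propensity score.

```python
import numpy as np


def fit_polynomial_mte(y, X, p, degree=2):
    """OLS of y on [1, X, p, p^2, ..., p^degree]; returns (c_1, ..., c_degree)."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    powers = np.column_stack([p ** j for j in range(1, degree + 1)])
    design = np.column_stack([np.ones(len(y)), X, powers])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[-degree:]  # coefficients on p, p^2, ..., p^degree


def mte(c, u):
    """Derivative of the fitted polynomial in p: sum_j j * c_j * u^(j-1)."""
    u = np.asarray(u, dtype=float)
    return sum(j * c_j * u ** (j - 1) for j, c_j in enumerate(c, start=1))


def weighted_effect(c, weight=None, n_grid=1000):
    """Midpoint-rule approximation of the integral of MTE(u) * h(u) over [0, 1]."""
    u = (np.arange(n_grid) + 0.5) / n_grid
    h = np.ones_like(u) if weight is None else weight(u)  # h = 1 gives the ATE
    return float(np.mean(mte(c, u) * h))


# Simulated data (purely illustrative, not NEPS data). The true conditional
# mean is 1 + 0.5*x + 2*p - p^2, so the true MTE is 2 - 2*u_D.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
p = rng.uniform(0.05, 0.95, size=n)
y = 1.0 + 0.5 * x + 2.0 * p - 1.0 * p ** 2 + rng.normal(scale=0.05, size=n)

c_hat = fit_polynomial_mte(y, x, p, degree=2)
print("MTE at u_D = 0.25:", mte(c_hat, 0.25))
print("ATE (weight h = 1):", weighted_effect(c_hat))
```

Other treatment parameters would be obtained by passing a different weight function; the specific weights for ATT or ATU are not reproduced here and would have to be taken from Heckman and Vytlacil (2007).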
15 The working paper version also considers health satisfaction with results very similar to PCS (Kamhöfer et al. 2015). 16 For a general overview over test designs and applications in the NEPS, see Weinert et al. (2011). 17 The test measures the “assessment of automatized reading processes”, where a “low degree of automation in decoding [...] will hinder the comprehension process”, that is, understanding of texts (Zimmermann et al. 2014, p. 1). The test was newly designed for NEPS but based on the well-established Salzburg reading screening test design principles (LIfBi 2011). 18 The total number of possible points exceeds 32 because some items were worth more than one point. 19 We assess the optimal bandwidth in the local linear regression using Stata’s lpoly rule of thumb. Our results are also robust to the inclusion of higher order polynomials in the local (polynomial) regression. The optimal, exact bandwidths are: wage 0.10, PCS 0.13, MCS 0.16, reading competence 0.10, for reading speed 0.11, math score 0.12. 20 The ATT would require for every college graduate in the population a non-graduate with the same propensity score (including 0%). For the ATU one would need the opposite: a graduate for every non-graduate with the same propensity score including 100%. 21 Also taking into account the outcomes from Section 6 and assuming that we test 18 times would increase the critical value to 2.98 in the (overly conservative) Bonferroni correction. References Acemoglu Daron , Johnson Simon ( 2007 ). “Disease and Development: The Effect of Life Expectancy on Economic Growth.” Journal of Political Economy , 115 , 925 – 985 . Google Scholar CrossRef Search ADS American Psychological Association ( 1995 ). “Intelligence: Knowns and Unknowns.” Report of a task force convened by the American Psychological Association. Andersen Hanfried H. , Mühlbacher Axel , Nübling Matthias , Schupp Jürgen , Wagner Gert G. ( 2007 ). 
“Computation of Standard Values for Physical and Mental Health Scale Scores Using the SOEP Version of SF-12v2.” Schmollers Jahrbuch: Journal of Applied Social Science Studies/Zeitschrift für Wirtschafts- und Sozialwissenschaften , 127 , 171 – 182 . Anderson John ( 2007 ). Cognitive Psychology and its Implications , 7 ed. Worth Publishers , New York . Banks James , Mazzonna Fabrizio ( 2012 ). “The Effect of Education on Old Age Cognitive Abilities: Evidence from a Regression Discontinuity Design.” The Economic Journal , 122 , 418 – 448 . Barrow Lisa , Malamud Ofer ( 2015 ). “Is College a Worthwhile Investment?” Annual Review of Economics , 7 , 519 – 555 . Google Scholar CrossRef Search ADS Bartz Olaf ( 2007 ). “Expansion und Umbau–Hochschulreformen in der Bundesrepublik Deutschland zwischen 1964 und 1977.” Die Hochschule , 2007 , 154 – 170 . Basu Anirban ( 2011 ). “Estimating Decision-Relevant Comparative Effects using Instrumental Variables.” Statistics in Biosciences , 3 , 6 – 27 . Google Scholar CrossRef Search ADS PubMed Basu Anirban ( 2014 ). “Person-Centered Treatment (PeT) Effects using Instrumental Variables: An Application to Evaluating Prostate Cancer Treatments.” Journal of Applied Econometrics , 29 , 671 – 691 . Google Scholar CrossRef Search ADS PubMed Basu Anirban , Heckman James J. , Navarro-Lozano Salvador , Urzua Sergio ( 2007 ). “Use of Instrumental Variables in the Presence of Heterogeneity and Self-selection: An Application to Treatments of Breast Cancer Patients.” Health Economics , 16 , 1133 – 1157 . Google Scholar CrossRef Search ADS PubMed Björklund Anders , Moffitt Robert ( 1987 ). “The Estimation of Wage Gains and Welfare Gains in Self-Selection.” The Review of Economics and Statistics , 69 , 42 – 49 . Google Scholar CrossRef Search ADS Blossfeld H.-P. , Roßbach H.-G. , Maurice J. von ( 2011 ). 
“Education as a Lifelong Process—The German National Educational Panel Study (NEPS).” Zeitschrift für Erziehungswissenschaft , 14 [Special Issue 14-2011] . Brinch Christian N. , Mogstad Magne , Wiswall Matthew ( 2017 ). “Beyond LATE with a Discrete Instrument.” Journal of Political Economy , 125 , 985 – 1039 . Google Scholar CrossRef Search ADS Card David ( 1995 ). “Using Geographic Variation in College Proximity to Estimate the Return to Schooling.” In Aspects of Labour Market Behaviour: Essays in Honour of John Vanderkamp , edited by Grant K. , Christofides L. , Swidinsky R. . University of Toronto Press , pp. 201 – 222 . Card David ( 2001 ). “Estimating the Return to Schooling: Progress on Some Persistent Econometric Problems.” Econometrica , 69 , 1127 – 1160 . Google Scholar CrossRef Search ADS Carneiro Pedro , Hansen Karsten T. , Heckman James J. ( 2003 ). “2001 Lawrence R. Klein Lecture: Estimating Distributions of Treatment Effects with an Application to the Returns to Schooling and Measurement of the Effects of Uncertainty on College Choice.” International Economic Review , 44 ( 2 ), 361 – 422 . Google Scholar CrossRef Search ADS Carneiro Pedro , Hansen Karsten T. , Heckman James J. ( 2001 ). “Removing the Veil of Ignorance in Assessing the Distributional Impacts of Social Policies.” Swedish Economic Policy Review , 8 , 273 – 301 . Carneiro Pedro , Heckman James J. , Vytlacil Edward J. ( 2010 ). “Evaluating Marginal Policy Changes and the Average Effect of Treatment for Individuals at the Margin.” Econometrica , 78 , 377 – 394 . Google Scholar CrossRef Search ADS PubMed Carneiro Pedro , Heckman James J. , Vytlacil Edward J. ( 2011 ). “Estimating Marginal Returns to Education.” American Economic Review , 101 ( 6 ), 2754 – 2781 . Google Scholar CrossRef Search ADS PubMed Cawley John , Heckman James J. , Vytlacil Edward J. ( 2001 ). “Three Observations on Wages and Measured Cognitive Ability.” Labour Economics , 8 , 419 – 442 . 
Google Scholar CrossRef Search ADS Cervellati Matteo , Sunde Uwe ( 2005 ). “Human Capital Formation, Life Expectancy, and the Process of Development.” American Economic Review , 95 ( 5 ), 1653 – 1672 . Google Scholar CrossRef Search ADS PubMed Clark Damon , Martorell Paco ( 2014 ). “The Signaling Value of a High School Diploma.” Journal of Political Economy , 122 , 282 – 318 . Google Scholar CrossRef Search ADS Cornelissen Thomas , Dustmann Christian , Raute Anna , Schönberg Uta ( forthcoming ). “Who Benefits from Universal Childcare? Estimating Marginal Returns to Early Childcare Attendance.” Journal of Political Economy . Costa Dora L. ( 2015 ). “Health and the Economy in the United States from 1750 to the Present.” Journal of Economic Literature , 53 , 503 – 570 . Google Scholar CrossRef Search ADS PubMed Cunha F. , Heckman J. J. , Lochner L. J. , Masterov D. V. ( 2006 ). “Interpreting the Evidence on Life Cycle Skill Formation.” In Handbook of the Economics of Education , Vol. 1 , edited by Hanushek E. A. , Welch F. . North-Holland . Cunha Flavio , Heckman James J. ( 2007 ). “The Technology of Skill Formation.” American Economic Review , 97 ( 2 ), 31 – 47 . Google Scholar CrossRef Search ADS Currie Janet , Moretti Enrico ( 2003 ). “Mother’s Education and The Intergenerational Transmission of Human Capital: Evidence From College Openings.” The Quarterly Journal of Economics , 118 , 1495 – 1532 . Google Scholar CrossRef Search ADS Cutler David M. , Lleras-Muney Adriana ( 2010 ). “Understanding Differences in Health Behaviors by Education.” Journal of Health Economics , 29 , 1 – 28 . Google Scholar CrossRef Search ADS PubMed de Walque Damien ( 2007 ). “Does Education Affect Smoking Behaviors?: Evidence using the Vietnam Draft as an Instrument for College Education.” Journal of Health Economics , 26 , 877 – 895 . Google Scholar CrossRef Search ADS PubMed Fisher Gwenith , Stachowski Alicia , Infurna Frank , Faul Jessica , Grosch James , Tetrick Lois ( 2014 ). 
“Mental Work Demands, Retirement, and Longitudinal Trajectories of Cognitive Functioning.” Journal of Occupational Health Psychology, 19, 231–242.

German Federal Statistical Office (various issues, 1959–1991). “Statistisches Jahrbuch für die Bundesrepublik Deutschland.” Tech. rep., German Federal Statistical Office (Statistisches Bundesamt), Wiesbaden.

Glymour M., Kawachi I., Jencks C., Berkman L. (2008). “Does Childhood Schooling Affect Old Age Memory or Mental Status? Using State Schooling Laws as Natural Experiments.” Journal of Epidemiology and Community Health, 62, 532–537.

Grimard Franque, Parent Daniel (2007). “Education and Smoking: Were Vietnam War Draft Avoiders also more Likely to Avoid Smoking?” Journal of Health Economics, 26, 896–926.

Hansen Karsten T., Heckman James J., Mullen Kathleen J. (2004). “The Effect of Schooling and Ability on Achievement Test Scores.” Journal of Econometrics, 121, 39–98.

Heckman J. J., Lochner L. J., Todd P. E. (1999). “Earnings Equations and Rates of Return: The Mincer Equation and Beyond.” In Handbook of the Economics of Education, Vol. 1, edited by Hanushek E., Welch F. Elsevier.

Heckman James J. (1990). “Varieties of Selection Bias.” American Economic Review, 80(2), 313–318.

Heckman James J., Pinto Rodrigo, Savelyev Peter (2013). “Understanding the Mechanisms through Which an Influential Early Childhood Program Boosted Adult Outcomes.” American Economic Review, 103(6), 2052–2086.

Heckman James J., Urzua Sergio, Vytlacil Edward J. (2006). “Understanding Instrumental Variables in Models with Essential Heterogeneity.” The Review of Economics and Statistics, 88, 389–432.

Heckman James J., Vytlacil Edward J.
(2005). “Structural Equations, Treatment Effects, and Econometric Policy Evaluation.” Econometrica, 73, 669–738.

Heckman James J., Vytlacil Edward J. (2007). “Econometric Evaluation of Social Programs, Part II: Using the Marginal Treatment Effect to Organize Alternative Econometric Estimators to Evaluate Social Programs, and to Forecast their Effects in New.” In Handbook of Econometrics, Vol. 6, edited by Heckman J. J., Leamer E. E. Elsevier.

Jürges Hendrik, Reinhold Steffen, Salm Martin (2011). “Does Schooling Affect Health Behavior? Evidence from the Educational Expansion in Western Germany.” Economics of Education Review, 30, 862–872.

Kamhöfer Daniel, Schmitz Hendrik (2016). “Reanalyzing Zero Returns to Education in Germany.” Journal of Applied Econometrics, 31, 912–919.

Kamhöfer Daniel, Schmitz Hendrik, Westphal Matthias (2015). “Heterogeneity in Marginal Non-monetary Returns to Higher Education.” Tech. rep., Ruhr Economic Papers, RWI Essen, No. 591.

Lang Frieder, Weiss David, Stocker Andreas, Rosenbladt Bernhard von (2007). “The Returns to Cognitive Abilities and Personality Traits in Germany.” Schmollers Jahrbuch: Journal of Applied Social Science Studies/Zeitschrift für Wirtschafts- und Sozialwissenschaften, 127, 183–192.

Lengerer Andrea, Schroedter Julia, Boehle Mara, Hubert Tobias, Wolf Christof (2008). “Harmonisierung der Mikrozensen 1962 bis 2005.” GESIS-Methodenbericht 12/2008. GESIS–Leibniz Institute for the Social Sciences, German Microdata Lab, Mannheim.

LIfBi (2011). “Starting Cohort 6 Main Study 2010/11 (B67) Adults Information on the Competence Test.” Tech. rep., Leibniz Institute for Educational Trajectories (LIfBi) – National Educational Panel Study.

LIfBi (2015).
“Startkohorte 6: Erwachsene (SC6) – Studienübersicht Wellen 1 bis 5.” Tech. rep., Leibniz Institute for Educational Trajectories (LIfBi) – National Educational Panel Study.

Mazumder Bhashkar (2008). “Does Education Improve Health? A Reexamination of the Evidence from Compulsory Schooling Laws.” Economic Perspectives, 32, 2–16.

Meng Xiangfei, D’Arcy Carl (2012). “Education and Dementia in the Context of the Cognitive Reserve Hypothesis: A Systematic Review with Meta-Analyses and Qualitative Analyses.” PLoS ONE, 7, e38268.

Nybom Martin (2017). “The Distribution of Lifetime Earnings Returns to College.” Journal of Labor Economics, 35, 903–952.

OECD (2015a). “Education Policy Outlook 2015: Germany.” Report, Organisation for Economic Co-operation and Development (OECD).

OECD (2015b). “Education Policy Outlook 2015: Making Reforms Happen.” Report, Organisation for Economic Co-operation and Development (OECD).

Oreopoulos Philip, Petronijevic Uros (2013). “Making College Worth It: A Review of the Returns to Higher Education.” The Future of Children, 23, 41–65.

Oreopoulos Philip, Salvanes Kjell (2011). “Priceless: The Nonpecuniary Benefits of Schooling.” Journal of Economic Perspectives, 25(3), 159–184.

Picht Georg (1964). Die deutsche Bildungskatastrophe: Analyse und Dokumentation. Walter Verlag.

Pischke Jörn-Steffen, Wachter Till von (2008). “Zero Returns to Compulsory Schooling in Germany: Evidence and Interpretation.” The Review of Economics and Statistics, 90, 592–598.

Robinson Peter M (1988). “Root-N-Consistent Semiparametric Regression.” Econometrica, 56, 931–954.

Rohwedder Susann, Willis Robert J. (2010). “Mental Retirement.” Journal of Economic Perspectives, 24, 119–138.
Salthouse Timothy A. (2006). “Mental Exercise and Mental Aging: Evaluating the Validity of the “Use It or Lose It” Hypothesis.” Perspectives on Psychological Science, 1, 68–87.

Schneeweis Nicole, Skirbekk Vegard, Winter-Ebmer Rudolf (2014). “Does Education Improve Cognitive Performance Four Decades After School Completion?” Demography, 51, 619–643.

Stephens Melvin Jr, Yang Dou-Yan (2014). “Compulsory Education and the Benefits of Schooling.” American Economic Review, 104(6), 1777–1792.

Stern Yaakov (2012). “Cognitive Reserve in Ageing and Alzheimer’s Disease.” The Lancet Neurology, 11, 1006–1012.

Stern Yaakov, Albert Steven, Tang Ming-Xin, Tsai Wei-Yen (1999). “Rate of Memory Decline in AD is Related to Education and Occupation: Cognitive Reserve?” Neurology, 53, 1942–1947.

Vytlacil Edward (2002). “Independence, Monotonicity, and Latent Index Models: An Equivalence Result.” Econometrica, 70, 331–341.

Weinert S., Artelt C., Prenzel M., Senkbeil M., Ehmke T., Carstensen C. (2011). “Development of Competencies across the Life Span.” Zeitschrift für Erziehungswissenschaft, 14, 67–86.

Weisser Ansgar (2005). “18. Juli 1961 – Entscheidung zur Gründung der Ruhr-Universität Bochum.” Tech. rep., Internet-Portal Westfälische Geschichte, http://www.westfaelische-geschichte.de/web495.

Zimmermann Stefan, Artelt Cordula, Weinert Sabine (2014). “The Assessment of Reading Speed in Adults and First-Year Students.” Tech. rep., Leibniz Institute for Educational Trajectories (LIfBi) – National Educational Panel Study.

Supplementary Data

Supplementary data are available at JEEA online.
© The Author(s) 2018. Published by Oxford University Press on behalf of European Economic Association. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com

# Heterogeneity in Marginal Non-Monetary Returns to Higher Education

Journal of the European Economic Association, Advance Article, Feb 2, 2018
40 pages

Publisher: Oxford University Press
ISSN: 1542-4766
eISSN: 1542-4774
DOI: 10.1093/jeea/jvx058
### Abstract

In this paper we estimate the effects of college education on cognitive abilities, health, and wages, exploiting exogenous variation in college availability. By means of semiparametric local instrumental variables techniques we estimate marginal treatment effects in an environment of essential heterogeneity. The results suggest positive average effects on cognitive abilities, wages, and physical health. Yet, there is heterogeneity in the effects, which points toward selection into gains. Although the majority of individuals benefit from more education, the average causal effect for individuals with the lowest unobserved desire to study is zero for all outcomes. Mental health effects, however, are absent for the entire population. (JEL: C31, H52, I10, I21)

### 1. Introduction

“The whole world is going to university—Is it worth it?” The Economist’s headline read in March 2015.¹ Although convincing causal evidence on positive labor market returns to higher education is still rare and nearly exclusively available for the United States, even less is known about the non-monetary returns to college education (see Oreopoulos and Petronijevic 2013; Barrow and Malamud 2015). Although non-monetary factors are acknowledged to be important outcomes of education (Oreopoulos and Salvanes 2011), evidence on the effect of college education is so far limited to health behaviors (see below).

We estimate the long-lasting marginal returns to college education in Germany decades after leaving college. As a benchmark, we start by looking at wage returns to higher education, but the paper’s focus is on the non-monetary returns, which might also be seen as mediators of the more often studied effect of education on wages. These non-monetary returns are cognitive abilities and health. Cognitive abilities and health are among the most important non-monetary determinants of individual well-being.
Moreover, the stock of both factors also influences the economy as a whole (see, among many others, Heckman et al. 1999 and Cawley et al. 2001 for cognitive abilities and Acemoglu and Johnson 2007, Cervellati and Sunde 2005, and Costa 2015 for health). Yet, non-monetary returns to college education are not fully understood (Oreopoulos and Salvanes 2011). Psychological research broadly distinguishes between effects of education on the long-term cognitive ability differential that are either due to a change in the cognitive reserve (i.e., the cognitive capacity) or due to an altered age-related decline (see, e.g., Stern 2012). Still, even the compound manifestation of the overall effect has rarely been studied for college education over a short-term horizon² and, as far as we are aware, it has never been assessed for the long run. Few studies analyze the returns to college education on health behaviors (Currie and Moretti 2003; Grimard and Parent 2007; de Walque 2007).

We use a slightly modified version of the marginal treatment effect (MTE) approach introduced and advanced by Björklund and Moffitt (1987) and Heckman and Vytlacil (2005). The main feature of this approach is to explicitly model the choice for education, thus turning back from a mere statistical view of exploiting exogenous variation in education to identify causal effects toward a description of the behavior of economic agents. Translated into our research question, the MTE is the effect of education on different outcomes for individuals at the margin of taking higher education. The MTE can be used to generate all conventional treatment parameters, such as the average treatment effect (ATE). On top of this, comparing the marginal effects along the probability of taking higher education is also informative in its own right: different marginal effects do not just reveal effect heterogeneity but also some of its underlying structure (for instance, selection into gains).
This is an important property that the local average treatment effect (LATE), as identified by conventional two-stage least squares methods, would miss.

The individuals in our sample made their college decision between 1958 and 1990 and, if they attended college, graduated between 1963 and 1995. Our outcome variables (wages, standardized measures of cognitive abilities,³ and mental and physical health) are assessed between 2010 and 2012, that is, 20–54 years after the college decision. Our instrument is a measure of the relative availability of college spots (operationalized by the number of enrolled students divided by the number of inhabitants) in the area of residence at the time of secondary school graduation. We use detailed information on the arguably exogenous expansions of college capacities in all 326 West German districts (cities or rural areas) during the so-called “educational expansion” between the 1960s and 1980s, which generate variation in the availability of higher education.

By deriving treatment effects over the entire support of the probability of college attendance, this paper contributes to the literature mainly in two important ways. First, this is the first study that analyzes the long-term effect of college education on cognitive abilities and general health measures (instead of specific health behaviors). Long-run effects on skills are crucial in showing the sustainability of human capital investments after the age of 19. Along this line, this outcome can complement existing evidence in identifying the fundamental value of college education since, unlike monetary returns, effects on cognitive skills directly exhibit neither signaling (see the debate on the discrepancy between private and social returns as in Clark and Martorell 2014) nor adverse general equilibrium effects (as skills are not determined by the forces of both demand and supply).
Second, by going beyond the point estimate of the LATE, we provide a more comprehensive picture in an environment of essential heterogeneity.

The results suggest positive average returns to college education for wages, cognitive abilities, and physical health. Yet, the returns are heterogeneous (thus, we find evidence for selection into gains) and even close to zero for the roughly 30% of individuals with the lowest desire to study. Mental health effects are zero throughout the population. Thus, our findings can be interpreted as evidence for remarkable positive average returns for those who took college education in the past. Yet, a further expansion in college education, as sometimes called for, is unlikely to pay off, as this would mostly affect individuals in the part of the distribution that is not found to be positively affected by education.

We also try to substantiate our results by looking at potential mechanisms of the average effects. Although we cannot causally differentiate all channels and the data allow us to provide suggestive evidence only, our findings may be interpreted as follows. Mentally more demanding jobs, jobs with less health-deteriorating effects, and better health behaviors probably add to the explanation of skill and health returns to education.

The paper is organized as follows. Section 2 briefly introduces the German educational system and describes the exogenous variation we exploit. Section 3 outlines the empirical approach. Section 4 presents the data. The main results are reported in Section 5, whereas Section 6 addresses some of their potential underlying pathways. Section 7 concludes.

### 2. Institutional Background and Exogenous Variation

#### 2.1. The German Higher Educational System

After graduating from secondary school, adolescents in Germany either enroll in higher education or start an apprenticeship. The latter is part-time training-on-the-job and part-time schooling.
This vocational training usually takes three years and individuals often enter the firm (or another firm in the sector) as a full-time employee afterward.

To be eligible for higher education in Germany, individuals need a university entrance degree. In the years under review, only academic secondary schools (Gymnasien) with 13 years of schooling in total awarded this degree (Abitur). Although the tracking from elementary schools to secondary schools takes place rather early at the age of 10, students can switch secondary school tracks in every grade. It is also possible to enroll in academic schools after graduating from basic or intermediate schools in order to receive a university entrance degree.

In Germany, mainly two institutions offer higher education: universities/colleges⁴ and universities of applied science (Fachhochschulen). The regular time to receive the formerly common Diplom degree (master’s equivalent) was 4.5 years at both institutions. Colleges are usually large institutions that offer degrees in various subjects. The other type of higher educational institution, universities of applied science, are usually smaller than colleges and often specialized in one field of study (e.g., business schools). Moreover, universities of applied science have a less theoretical curriculum and a teaching structure that is similar to schools. Almost no institution of higher education in Germany charges tuition fees. However, students have to cover their own costs of living. On the other hand, their peers in apprenticeship training earn a small salary. Possible budget constraints (e.g., transaction costs arising through the need to move to another city in order to go to college) are likely determinants of the decision to enroll in higher education.

#### 2.2. Exogenous Variation in College Education over Time

Although the higher educational system as described in Section 2.1 did not change in the years under review, the accessibility (in terms of mere quantity but also distribution within Germany) of tertiary education changed significantly, providing us with a source of exogenous variation. This so-called “educational expansion” falls well into the period of study (1958–1990). Within this period, the shrinking transaction costs of studying may have changed incentives, and the mere presence of new or growing colleges could also have nudged individuals toward higher education who otherwise would not have studied.

In this paper, we consider two processes in order to quantify the educational expansion. The first is the opening of new colleges, the second is the extension in capacity of all colleges (we refer to both as college availability).⁵ College availability as an instrument for higher education was introduced to the literature by Card (1995) and has frequently been employed since then (e.g., Currie and Moretti 2003), also to estimate the MTE (e.g., Carneiro et al. 2011; Nybom 2017). We exploit the rapid increase in the number of new colleges and in the number of available spots to study as exogenous variation in the college decision.

Between 1958 (the earliest secondary school graduation year in our sample) and 1990 the number of colleges in Germany doubled from 33 to 66.⁶ In particular, the opening of new colleges introduced discrete discontinuities in choice sets. As an example, students had to travel 50 km, on average, to the closest college before a college was opened in their district (measured from district centroid to centroid), see Figure 1. Figure A.1 in the Appendix gives an impression of the spatial variation in college availability over time.

Figure 1. Average distance to the closest college over time for districts with a college opening. Own illustration. Information on colleges is taken from the German Statistical Yearbooks 1959–1991 (German Federal Statistical Office various issues, 1959–1991). The distances (in km) between the districts are calculated using district centroids. These distances are weighted by the number of individuals observed in the particular district-year cells in our estimation sample of the NEPS-Starting Cohort 6 data. The resulting average distances are depicted by the black circles. Note that prior to time period 0, the average distance changes over time either due to sample composition or a college opening in a neighboring district. Only districts with a college opening are taken into account.

There was an increase in the size of existing colleges and, therefore, in the number of available spots to study as well. The average number of students per college was 5,013 in 1958 and 15,438 in 1990. Of the 33 colleges in 1958, 30 still existed in 1990 and had an average size of 23,099 students. The total number of students increased from 155,000 in 1958 to 1 million in 1990.
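As a rough illustration of the college availability measure described above (enrolled students divided by inhabitants in a district and year), the following minimal sketch computes the ratio for a few district-year cells. The district labels, years, and counts are hypothetical and are not taken from the paper's data; the function names are likewise chosen only for this example.

```python
from collections import defaultdict

# Hypothetical district-year records: (district, year, enrolled_students, inhabitants).
# All figures are invented for illustration and do not come from the paper.
records = [
    ("A", 1960, 0, 200_000),       # district A has no college yet
    ("A", 1970, 8_000, 210_000),   # a college has opened in district A
    ("B", 1960, 5_000, 500_000),   # district B already has a college
    ("B", 1970, 15_000, 520_000),  # the existing college has grown
]


def college_availability(rows):
    """Return enrolled students per inhabitant for each (district, year) cell."""
    availability = {}
    for district, year, students, inhabitants in rows:
        if inhabitants <= 0:
            raise ValueError("inhabitants must be positive")
        availability[(district, year)] = students / inhabitants
    return availability


def within_district_change(availability):
    """Change in availability between the first and last observed year per district."""
    by_district = defaultdict(dict)
    for (district, year), value in availability.items():
        by_district[district][year] = value
    return {
        district: years[max(years)] - years[min(years)]
        for district, years in by_district.items()
    }


availability = college_availability(records)
changes = within_district_change(availability)

for cell, value in sorted(availability.items()):
    print(cell, round(value, 4))
print(changes)
```

In this toy setup both a new opening (district A) and a capacity extension (district B) raise the ratio within a district over time, which is the kind of variation the text refers to as college availability.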
Figure 2 shows the trends in college openings and enrolled students (normalized by the number of inhabitants) for the five most-populated German states. Although the actual numbers used in the regressions vary on the much smaller district level, the state level figures simplify the visualization of the pattern.

Figure 2. Number of colleges and students over time in selected states. Own illustration. College opening and size information is taken from the German Statistical Yearbooks 1959–1991 (German Federal Statistical Office various issues, 1959–1991). Yearly information on the district-specific population size is based on personal correspondence with the statistical offices of the federal states. For the sake of clarity, the trends are only plotted for the five most populated states.

Factors that have driven the increase in the number of colleges and their size can briefly be summarized into four groups: (i) The large majority of the population had a low level of education. This did not only result from WWII but also from the “anti-intellectualism” (Picht 1964, p. 66) in the Third Reich, and from the earlier notion in Imperial Germany that education befitted the social status of certain individuals only. (ii) An increase in the number of academic secondary schools at the same time (as analyzed in Jürges et al.
2011; Kamhöfer and Schmitz 2016 for instance) qualified a larger share of school graduates to enroll in higher education (Bartz 2007). (iii) A change in production technologies led to an increase in firms’ demand for high-skilled workers, especially given the low level of educational participation (Weisser 2005). (iv) Political decision makers were afraid that “without an increase in the number of skilled graduates the West German economy would not be able to compete with communist rivals” (Jürges et al. 2011, p. 846, in reference to Picht 1964).

Although these reasons (maybe except for the firms’ demand for more educated workers) affected the 10 West German federal states, which are in charge of educational policy, in the same way, the measures taken and the timing of actions differed widely between states. Because of local politics (e.g., the balancing of regional interests and avoiding clusters of colleges) there was also a large amount of variation in college openings within the federal states. See Online Appendix B to the paper for a much more detailed description of the political process involved.

A major concern for instrument validity is that, even though the political process did not follow a unified structure and included some randomness in the final choice of locations and timing of openings, regions where colleges were opened differed from those that already had colleges before (or that never established any). Table 1 reports some numbers on the regional level as of the year 1962 (the earliest possible year available to us with representative data).⁷ Regions that already had colleges before did not differ in terms of sociodemographics (except for population densities, as mostly large cities had colleges before) but were somewhat stronger in terms of socioeconomic indices. The differences were not large, however.
Given that we include district fixed effects and a large set of socioeconomic controls (including the socioeconomic environment before the college decision, see Section 4), this should not be a problematic issue.

Table 1. Comparison of regions with and without college openings before college opens using administrative data.

| College opening... | Before 1958: Mean (1) | s.d. (2) | Between 1958–1990: Mean (3) | s.d. (4) | Later than 1990 or never: Mean (5) | s.d. (6) |
|---|---|---|---|---|---|---|
| Observations | | | | | | |
| Number of regions | 27 | | 30 | | 190 | |
| Sociodemographic characteristics | | | | | | |
| Female (in %) | 53.0 | (2.0) | 53.0 | (1.4) | 52.9 | (4.3) |
| Average age (in years) | 37.2 | (1.1) | 37.0 | (1.1) | 36.6 | (1.9) |
| Singles (in %) | 38.8 | (2.5) | 37.7 | (2.3) | 38.9 | (4.6) |
| Population density per km² in 1962 | 1381.9 | (1076.7) | 1170.1 | (1047.3) | 327.1 | (479.7) |
| Change in population density 1962–1990 | 1.6 | (186.3) | −71.0 | (202.8) | 31.5 | (98.5) |
| Migrational background (in %) | 2.7 | (3.0) | 1.6 | (1.5) | 2.1 | (2.3) |
| Socioeconomic characteristics | | | | | | |
| Share of employees to all individuals (in %) | 47.0 | (3.6) | 45.3 | (4.2) | 46.2 | (5.2) |
| Employees with an income > 600 DM (in %) | 27.3 | (3.8) | 24.8 | (5.3) | 25.9 | (6.4) |
| Employees by industry (in %) | | | | | | |
| – Primary | 2.1 | (5.2) | 5.2 | (5.2) | 2.8 | (5.5) |
| – Secondary | 52.9 | (8.4) | 54.7 | (6.2) | 54.3 | (8.9) |
| – Tertiary | 45.0 | (9.3) | 40.1 | (8.3) | 42.9 | (9.6) |
| Employees in blue collar occup. (in %) | 53.6 | (9.4) | 59.0 | (7.9) | 56.5 | (9.3) |
| Employees in academic occup. (in %) | 22.0 | (4.4) | 17.5 | (4.3) | 20.3 | (5.9) |

Notes: Own calculations based on Micro Census 1962, see Lengerer et al. (2008). Regions are defined through administrative Regierungsbezirk entries and the degree of urbanization (Gemeindegrößenklasse) and may cover more than one district. College information is aggregated at regional level and a region is considered to have a college if at least one of its districts has a college. Calculations for population density and change in population density are based on district-level data acquired through personal correspondence with the statistical offices of the federal states. Data are available on request. The variables “employees in blue collar occup.” and “employees in academic occup.” state the shares of employees in the region in an occupation that is usually conducted by a blue collar worker/a college graduate, respectively. Standard deviations (s.d.) are given in parentheses.

Yet, changes in district characteristics that are potentially related to the outcome variables might be a more important problem. There could, for instance, be changes in the population structure that both induce a higher demand for college education and go along with improved cognitive abilities and health. This could be the case if the regions with college openings were more “dynamic” with a younger and potentially increasing population. Table 1 shows a decline in the population density by 6% between 1962 and 1990 in the areas that opened colleges, whereas there were no average changes in the areas with preexisting colleges and a 10% increase in the areas that never opened any. This reflects different regional trends in population ageing. As one example, the Ruhr Area in the west, where three colleges were opened, experienced a population decline and comparably stronger population ageing over time.
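The percentage changes quoted from Table 1 can be checked with a short calculation. The sketch below assumes that the quoted figures are the ratio of the mean change in population density (1962–1990) to the mean density in 1962 for each group of regions; this is a reading of the table, not a procedure stated in the paper, and the group labels are shorthand chosen for this example.

```python
# (mean density per km2 in 1962, mean change in density 1962-1990), copied from Table 1.
table1_density = {
    "college before 1958": (1381.9, 1.6),
    "college opened 1958-1990": (1170.1, -71.0),
    "college after 1990 or never": (327.1, 31.5),
}

# Assumption: relative change = mean change / mean 1962 level, expressed in percent.
relative_change = {
    group: 100 * change / level
    for group, (level, change) in table1_density.items()
}

for group, pct in relative_change.items():
    print(f"{group}: {pct:.1f}%")
```

Under this assumption the three groups come out at roughly 0%, −6%, and +10%, in line with the figures given in the text.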
Again, these differences are not dramatically large, but we might be worried about different trends in health and cognitive abilities that are correlated with college expansion. If this were the case (more expansion in areas that have a more strongly ageing population with deteriorating health and cognitive abilities), we might underestimate the effect of college education on these outcomes. We include a district-specific time trend to account for this in the analysis.

The expansion in secondary schooling noted previously was unrelated to the college expansion. Although college expansion naturally took place in a small number of districts, expansion in secondary schooling occurred across all regions. In addition, Kamhöfer and Schmitz (2016) do not find any local average treatment effects of school expansion on cognitive abilities and wages. Thus, it seems unlikely that selective increases in cognitive abilities due to secondary school expansion invalidate the instrument. Nevertheless, district-specific time trends should again capture a large part of this if it were a problem.

So essentially, what we do is the following: we look within each district and attribute deviations of the college (graduation/enrollment) rate from the general trend (captured by cohort fixed effects) and from the district-specific trend (which might be due to continually increased access to higher secondary education) to either changes in the number of college spots or the opening of a new college nearby. We use discontinuities in college access over time that cannot be exploited using data on individuals who make the college decision at the same point in time (for instance, cohort studies), as some of the previous literature that used college availability as an instrument did. Details on how we exploit the variation in college availability in the empirical specification are discussed in Section 4.4 after presenting the data.

## 3. Empirical Strategy

Our estimation framework builds largely on Heckman and Vytlacil (2005) and Carneiro et al. (2011).
Derivations and in-depth discussion of most issues can be found there. We start with the potential outcome model, where Y1 and Y0 are the potential outcomes with and without treatment. The observed outcome Y either equals Y1 in case an individual received a treatment (which is college education here) or Y0 in the absence of treatment (the individual identifier i is implied). Obviously, treatment participation is voluntary, rendering a treatment dummy D in a simple linear regression endogenous. In the marginal treatment effect framework, this is explicitly modeled by using a choice equation, that is, we specify the following latent index model:

$$Y^1 = X^{\prime }\beta _1 + U_1,$$ (1)

$$Y^0 = X^{\prime }\beta _0 + U_0,$$ (2)

$$D^* = Z^{\prime }\delta - V, \quad \mbox{where }\, D = \boldsymbol {1}[ D^* \ge 0] = \boldsymbol {1}[ Z^{\prime }\delta \ge V].$$ (3)

The vector X contains observable, and U1, U0 unobservable factors that affect the potential outcomes.8 D* is the latent desire to take up college education that depends on observed variables Z and unobservables V. Z includes all variables in X plus the instruments. Whenever D* exceeds a threshold (set to zero without loss of generality), the individual opts for college education, otherwise she does not. U1, U0, V are potentially correlated, inducing the endogeneity problem (as well as heterogeneous returns), as we observe Y (= DY1 + (1 − D)Y0), D, X, Z, but not U1, U0, V.

Following this model, individuals are indifferent between higher education and directly entering the labor market (e.g., through an apprenticeship) whenever the index of observables Z′δ is equal to the unobservables V. Thus, if we knew the switching point (point of indifference) and its corresponding value of the observables, we could place a sharp restriction on the value of the unobservables. This property is exploited in the estimation.
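To make the selection mechanism in equations (1)–(3) concrete, the following is a minimal simulation sketch, not the estimation code used in the paper. All numbers (sample size, intercepts of 1.0 and 1.5, the assumption U1 − U0 = −V) and all variable names are illustrative assumptions, and a single standard normal variable stands in for the index Z′δ.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical inputs: one standard normal index standing in for Z'delta
# and a standard normal unobservable V (the resistance to study).
index = rng.normal(size=n)
v = rng.normal(size=n)
d = index >= v                      # D = 1[Z'delta >= V], equation (3)

# Potential outcomes with U1 - U0 = -V (assumed): individuals with a low V,
# i.e. a high unobserved desire to study, gain more from college.
u0 = rng.normal(scale=0.5, size=n)
u1 = u0 - v
y0 = 1.0 + u0                       # Y0 with assumed intercept 1.0
y1 = 1.5 + u1                       # Y1 with assumed intercept 1.5
y = np.where(d, y1, y0)             # observed Y = D*Y1 + (1 - D)*Y0

ate = np.mean(y1 - y0)              # average treatment effect
att = np.mean((y1 - y0)[d])         # average effect on the treated
naive = y[d].mean() - y[~d].mean()  # raw difference in observed means

print(f"ATE={ate:.2f}  ATT={att:.2f}  naive difference={naive:.2f}")
```

In this stylized setup the effect on the treated exceeds the average effect, because individuals with a low V both select into college and have larger gains U1 − U0. A raw comparison of means therefore does not recover the average treatment effect.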
Since for every value of the index Z′δ one needs individuals with and without higher education, it is important to meaningfully aggregate the index by a monotone transformation that, for example, returns the quantiles of Z′δ and V. One such rank-preserving transformation is done by the cumulative distribution function that returns the propensity score P(Z) (quantiles of Z′δ) and UD (quantiles of V).9 If we vary the excluded instruments in Z′δ from the lowest to the highest value while holding the covariates X constant, more and more individuals will select into higher education. Those who react to this shift also reveal their rank in the unobservable distribution. Thus, the unobservables are fixed given the propensity score and it is feasible to evaluate any outcome for those who select into treatment at any quantile UD that is identified by the instrument-induced change of the higher education choice. In general, estimating marginal effects by UD does not require stronger assumptions than those required by the LATE since Vytlacil (2002) showed their equivalence.10 Yet, strong instruments are beneficial for robustly identifying effects over the support of P(Z). This, however, is testable.

The marginal treatment effect (MTE), then, is the marginal (gross) benefit of taking the treatment for those who are just indifferent between taking and not taking it and can be expressed as

\begin{equation*} {\mathit {MTE}}(x,u_D) = \frac{\partial E(Y|x, p)}{\partial p}. \end{equation*}

This is the effect of an incremental increase in the propensity score on the observed outcome. The MTE varies along the line of UD in case of heterogeneous treatment effects that arise if individuals self-select into the treatment based on their expected idiosyncratic gains. This is a situation Heckman et al. (2006) call “essential heterogeneity”.
This is an important structural property that the MTE can recover: If individuals already react at low values of the instrument, where the observed part of the latent desire of selecting into higher education (P(Z)) is still very low, a prerequisite for yet going to college is that V is marginally lower. These individuals could choose college against all (observed) odds because they are more intrinsically talented or motivated as indicated by a low V. If this is translated into higher future gains (U1 − U0), the MTE would exhibit a significant negative slope: As P(Z) rises, marginal individuals need less and less compensation in terms of unobserved and expected returns to yet choose college—this is called selection into gains. As Basu (2011, 2014) notes, essential heterogeneity is not restricted to active sorting into gains but is always an issue if selection is based on factors that are not completely independent of the gains. Thus, in health economic applications, where gains are arguably harder to predict for the individual than, say, monetary returns, essential heterogeneity is also an important phenomenon. In this case the common treatment parameters ATE, ATT, and LATE do not coincide. The MTE can be interpreted as a more fundamental parameter than the usual ones as it unfolds all local switching effects by intrinsic “willingness” to study and not only some weighted average of those.11 The main component for estimating the MTE is the conditional expectation E(Y | X, p). Heckman and Vytlacil (2007) show that if we plug in the counterfactuals in (1) and (2) in the potential outcome equation, rearrange and apply the expectation E(. 
| X, p) to all expressions and impose an exclusion restriction of p on Y (exposed in what follows), we get an expression that can be estimated:

\begin{eqnarray} E(Y|X, p) & =& X^{\prime }\beta _0 + X^{\prime }(\beta _1 -\beta _0) \cdot p + E(U_1 - U_0 | D=1, X) \cdot p \nonumber \\ & =& X^{\prime }\beta _0 + X^{\prime }(\beta _1 -\beta _0) \cdot p + K(p), \end{eqnarray} (4)

where K(p) is some not further specified function of the propensity score if one wants to avoid distributional assumptions of the error terms. Thus, the estimation of the MTE involves estimating the propensity score in order to estimate equation (4) and, finally, taking its derivative with respect to p. Note that this derivative, and hence the effect of college education, depends on heterogeneity due to observed components X and unobserved components K(p), since this structure was imposed by equations (1) and (2):

\begin{eqnarray} \frac{\partial E(Y|X, p)}{\partial p} & =& X^{\prime }(\beta _1 -\beta _0) + \frac{\partial K(p)}{\partial p}. \end{eqnarray} (5)

To achieve non-parametric identification of the terms in equation (5), the Conditional Independence Assumption has to be imposed on the instrument

\begin{equation*} (U_1,U_0,V)\!\perp \!\!\!\perp Z|X \end{equation*}

meaning that the error terms are independent of Z given X. That is, after conditioning on X, a shift in the instruments Z (or the single index P(Z)) has no effect on the potential outcome distributions. Non-parametrically estimating separate MTEs for every data cell determined by X is hardly ever feasible due to a lack of observations and powerful instruments within each such cell. Yet, in case of parametric or semiparametric specifications, a conditional independence assumption is not sufficient to decompose the effect into observed and unobserved sources of heterogeneity. To separately identify the right hand side of equation (5), unconditional independence is required: (U1, U0, V) ⊥⊥ Z, X (Carneiro et al.
2011, for more details consult the Online Appendices).12 In a pragmatic approach, one can follow Brinch et al. (2017) or Cornelissen et al. (forthcoming), who do not aim at causally separating the sources of the effect heterogeneity. In this case a conventional exclusion restriction on the instruments suffices for estimating the overall level and the curvature of the MTE. Our solution for bringing the empirical framework to the data without overly strong assumptions is to estimate marginal effects that only vary over the unobservables while fixing the X-effects at their mean value. This means deviating from (4) by restricting β1 = β0 = β, except for the intercepts α1, α0 in (1) and (2), such that E(Y | X, p) becomes

\begin{eqnarray} E(Y|X, p) & =& X^{\prime }\beta + (\alpha _1 -\alpha _0) \cdot p + K(p). \end{eqnarray} (6)

Thus, we allow for different levels of potential outcomes, whereas we keep conditioning on X. This might look like a strong restriction at first sight but is no different from the predominant approach in empirical economics of trying to identify average treatment effects, where the treatment indicator is typically not interacted with other observables. Certainly, this does not rule out that the MTE varies by observable characteristics. Note that even if the true population effects vary over X, the curvature term ∂K(p)/∂p in the derivative of equation (4) with respect to the propensity score does not depend on X. Hence, only the level of the MTE changes for certain subpopulations determined by X; the curvature remains unaffected. Thus, estimation of equation (6) delivers an MTE that has a level that is averaged over all subpopulations without changing the curvature. In this way all crucial elements of the MTE are preserved, since we are interested in the average effect and its heterogeneity with respect to the unobservables for the whole population.
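The logic of equation (6), estimating E(Y | X, p) and then differentiating with respect to p, can be illustrated with a small simulation. This is only a sketch under strong simplifying assumptions: K(p) is approximated by a cubic polynomial instead of the semi-parametric estimation used in the paper, the propensity score is treated as known rather than estimated, and all data, parameter values, and names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
n = 50_000
std_normal = NormalDist()

# Simulated data (all values are illustrative assumptions).
x = rng.normal(size=n)                    # one observed covariate X
index = rng.normal(size=n)                # stands in for Z'delta
v = rng.normal(size=n)                    # unobservable V
d = index >= v                            # college choice, equation (3)
p = np.vectorize(std_normal.cdf)(index)   # propensity score, known here

u0 = rng.normal(scale=0.5, size=n)
y0 = 1.0 + 0.3 * x + u0                   # common slope beta = 0.3
y1 = 1.5 + 0.3 * x + u0 - v               # alpha1 - alpha0 = 0.5, U1 - U0 = -V
y = np.where(d, y1, y0)

# Outcome regression of equation (6) with K(p) replaced by a cubic in p.
# The linear p term absorbs (alpha1 - alpha0) and the linear part of K(p).
design = np.column_stack([np.ones(n), x, p, p**2, p**3])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
_, _, c1, c2, c3 = coef

def mte(u):
    """Derivative of the fitted E(Y | X, p) with respect to p, at p = u."""
    return c1 + 2 * c2 * u + 3 * c3 * u**2

def true_mte(u):
    """MTE implied by the simulation design: 0.5 - Phi^{-1}(u)."""
    return 0.5 - std_normal.inv_cdf(u)

for u in (0.25, 0.5, 0.75):
    print(f"u_D={u:.2f}  estimated={mte(u):.2f}  true={true_mte(u):.2f}")
```

Because the simulated gains fall with V, the estimated curve slopes downward in u_D. A low-order polynomial is only a rough stand-in for K(p), especially near the boundaries of the unit interval, which is one reason to prefer flexible local methods in practice.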
How this heterogeneity varies across subpopulations is of less importance, and the literature has also focused on MTEs where the X-part is averaged out. On the other hand, we gain with this approach by considerably relaxing our identifying assumption from an unconditional to a conditional independence of the instrument. A further advantage of not estimating heterogeneity in the observables can arise if X contains many variables that each take many different values. In this case, problems of weak instruments can inflate the results.13

In estimating (6), we again follow Carneiro et al. (2010, 2011) and use semi-parametric techniques as suggested by Robinson (1988).14 Standard errors are clustered at the district level and were generated by bootstrapping the entire procedure using 200 replications.

## 4. Data

### 4.1. Sample Selection and College Education

Our main data source is individual-level data from the German National Educational Panel Study (NEPS), see Blossfeld et al. (2011). The NEPS data map the educational trajectories of more than 60,000 individuals in total. The data set consists of a multi-cohort sequence design and covers six age groups, called “starting cohorts”: newborns and their parents, pre-school children, children in school grades 5 and 9, college freshmen students, and adults. Within each starting cohort the data are organized in a longitudinal manner, that is, individuals are interviewed repeatedly. For each starting cohort, the interviews cover extensive information on competence development, learning environments, educational decisions, migrational background, and socioeconomic outcomes. We aim at analyzing longer-term effects of college education and, therefore, restrict the analysis to the “adults starting cohort”. For this age group six waves are available with interviews conducted between 2007/2008 (wave 1) and 2013 (wave 6), see LIfBi (2015).
Moreover, the NEPS includes detailed retrospective information on the educational and occupational history as well as the living conditions at the age of 15, about three years before individuals decide on higher education.

Of the original 17,000 respondents in the adults starting cohort, born between 1944 and 1989, we exclude observations for four reasons. First, we focus on individuals from West Germany due to the different educational system in the former German Democratic Republic (GDR), thereby dropping 3,500 individuals living in the GDR at the age of the college decision. Second, to allow for long-term effects we make a cut-off at college attendance before 1990 and drop 2,800 individuals who graduated from secondary school in 1990 or later. Third, we drop 1,000 individuals with missing geographic information. An attractive (and for our analysis necessary) feature of the NEPS data is that they include information on the district (German Kreis) of residence during secondary schooling, which is used in assigning the instrument in the selection equation. The fourth reason for losing observations is that the dependent variables are not available for each respondent, see in what follows. Our final sample includes between 2,904 and 4,813 individuals, depending on the outcome variable.

The explanatory variable “college degree” takes on the value 1 if an individual has any higher educational degree, and 0 otherwise. Dropouts are treated like all other individuals without college education. More than one fourth of the sample has a college degree, whereas the remainder (slightly less than three fourths) does not.

### 4.2. Dependent Variables

Wages. The data set covers a wide range of individual employment information such as monthly income and weekly hours worked. We calculate the hourly gross wage for 2013 (wave 6) by dividing the monthly gross labor market income by the actual weekly working hours (including extra hours) times the average number of weeks per month, 4.3.
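The hourly wage calculation just described amounts to a single division. The following minimal sketch illustrates it; the function name and the example respondent are hypothetical and not taken from the NEPS data.

```python
WEEKS_PER_MONTH = 4.3  # average number of weeks per month used in the text


def hourly_gross_wage(monthly_gross_income: float, weekly_hours: float) -> float:
    """Monthly gross income divided by (actual weekly hours x 4.3)."""
    if weekly_hours <= 0:
        raise ValueError("weekly_hours must be positive")
    return monthly_gross_income / (weekly_hours * WEEKS_PER_MONTH)


# Hypothetical respondent: 3,440 euros per month at 40 actual hours per week.
wage = hourly_gross_wage(3440.0, 40.0)
print(round(wage, 2))
```

Using actual rather than contractual hours means that overtime lowers the computed hourly wage for a given monthly income.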
A similar strategy is, for example, applied by Pischke and von Wachter (2008) to calculate hourly wages using German data. For this outcome variable, we restrict our sample to individuals of working age (up to 65 years) and drop observations with hourly wages below €5 and above the 99th percentile (€77.52), as such values might result from misreporting. Table 2 reports descriptive statistics and reveals considerably higher hourly wages for individuals with a college degree. The full distribution of wages (and the other outcomes) for both groups is shown in Figure A.2 in the Appendix. In the regression analysis we use log gross hourly wages.

Table 2. Descriptive statistics of the dependent variables.

| | (1) Gross hourly wage | (2) Health measure: PCS | (3) Health measure: MCS | (4) Cognitive ability: read. speed | (5) Cognitive ability: read. comp. | (6) Cognitive ability: math liter. |
|---|---|---|---|---|---|---|
| Observations | 3,378 | 4,813 | 4,813 | 3,995 | 4,576 | 2,904 |
| with college degree (in %) | 31.0 | 28.1 | 28.1 | 27.8 | 28.1 | 28.0 |
| *Raw values* | | | | | | |
| Mean with degree | 27.95 | 53.31 | 51.15 | 39.69 | 29.76 | 13.37 |
| Mean without degree | 19.35 | 50.39 | 50.53 | 35.99 | 22.75 | 9.36 |
| Maximum possible value | –ᵃ | 100 | 100 | 51 | 39 | 22 |
| *Transformed values* | | | | | | |
| Mean with degree | 3.25 | 0.23 | 0.04 | 0.32 | 0.63 | 0.61 |
| Mean without degree | 2.88 | −0.09 | −0.02 | −0.12 | −0.25 | −0.24 |

Notes: Own calculations based on NEPS-Starting Cohort 6 data. Gross hourly wage given in euros. Gross hourly wage is transformed to its log value; the other variables are standardized to have mean 0 and standard deviation 1. a. The gross hourly wage is truncated below at €5 and above at the 99th percentile (€77.52).

Health. Two variables from the health domain are used as outcome measures: the Physical Health Component Summary Score (PCS) and the Mental Health Component Summary Score (MCS), both from 2011/2012 (wave 4).15 These summary scores are based on the SF12 questionnaire, which is an internationally standardized set of 12 items regarding eight dimensions of the individual health status.
The eight dimensions comprise physical functioning, physical role functioning, bodily pain, general health perceptions, vitality, social role functioning, emotional role functioning, and mental health. A scale ranging from 0 to 100 is calculated for each of these eight dimensions. The eight dimensions or subscales are then aggregated to the two main dimensions, mental and physical health, using exploratory factor analysis (Andersen et al. 2007). For our regression analysis, we standardize the aggregated scales (MCS and PCS) to have mean 0 and standard deviation 1, where higher values indicate better health. Columns (2) and (3) of Table 2 report sample means of the health measures across individuals by college graduation. Those with a college degree have, on average, a better physical health score. With respect to mental health, both groups differ only marginally.

Cognitive Abilities. Cognitive abilities summarize the “ability to understand complex ideas, to adapt effectively to the environment, to learn from experience, to engage in various forms of reasoning, to overcome obstacles by taking thought” (American Psychological Association 1995), where the sum of these abilities is referred to as intelligence. Psychologists distinguish several concepts of intelligence with different cognitive abilities. However, they all include measures of verbal comprehension, memory and recall, as well as processing speed. Although comprehensive cognitive intelligence tests take hours, a growing number of socioeconomic surveys include much shorter proxies that measure specific skill components. The short ability tests are usually designed by psychologists and the results are highly correlated with the results of more comprehensive intelligence tests (cf. Lang et al. 2007 for a comparison of cognitive skill tests in the German Socio-economic Panel with larger psychological test batteries).
The NEPS includes three kinds of competence tests that cover various domains of cognitive functioning: reading speed, reading competence, and mathematical competence.16 All competence tests were conducted once in 2010/2011 (wave 3) or 2012/2013 (wave 5), respectively, as paper and pencil tests under the supervision of a trained interviewer and the test language was German. The first test measures reading speed.17 The participants receive a booklet consisting of 51 short true-or-false questions and the test duration is 2 min. Each question has between 5 and 18 words. The participants have to answer as many questions as possible in the given window. The test score is the number of correct answers. Since the test aims at the answering speed, the questions only deal with general knowledge and use easy language. One question/statement, for example, reads “There is a bath tub in every garage.” The mean number of correct answers in our estimation sample is 39.69 (out of 51) for college graduates and 35.99 for others, see Table 2. For more information, see Zimmermann et al. (2014). The reading competence test measures understanding of texts. It lasts 28 min and covers 32 items. The test consists of three different tasks. First, participants have to answer multiple choice questions about the content of a text, where only one out of four possible answers is right. In a decision-making task, the participants are asked whether statements are right or wrong according to the text. In a third task, participants need to assign possible titles out of a list to sections of the text. The test includes several types of texts, for example, comments, instructions, and advertising texts (LIfBi 2011). Again, the test score reflects the number of correct answers. Participants with college degree score on average 29.76 and without 22.75 (out of 39).18 The mathematical literacy test evaluates “recognizing and [...] 
applying [of] mathematics in realistic, mainly extra-mathematical situations” (LIfBi 2011, p. 8). The test has 22 items and takes 28 min. It follows the principle of the OECD-PISA tests and consists of the areas quantity, space and shape, change and relations, as well as data and chance, and measures the cognitive competencies in the areas of application of skills, modeling, arguing, communicating, representing, as well as problem solving; see LIfBi (2011). Individuals without a college degree score on average 9.36 (out of 22) and persons who graduated from college score about 4 points more.

Due to the rather long test duration given the total interview time, not every respondent had to do all three tests. Similarly to the OECD-PISA tests for high school students, individuals were randomly assigned a booklet with either all three or two out of the three tests. In total, 3,995 individuals took the reading speed test, 4,576 the reading competence test, and 2,904 the math test. Since the tests measure different competencies that refer to distinct cognitive abilities, we do not combine the different test scores into an overall score but report the results separately (see Anderson 2007).

### 4.3. Control Variables

Individuals in our sample made their college decision between 1958 and 1990. The NEPS allows us to consider important socioeconomic characteristics that probably affect both the college education decision and the outcomes today (variables denoted with X in Section 3). These include general demographic information such as gender, migrational background, and family structure, as well as parental characteristics such as parents’ educational background. Moreover, we include two blocks of controls that were determined before the educational decision was made. Pre-college living conditions include family structure, parental job situation, and household income at the age of 15, whereas pre-college education includes educational achievements (number of repeated grades and secondary school graduation mark).
Table A.1 in the Appendix provides more detailed descriptions of all variables and reports the sample means by treatment status. Apart from higher wages, abilities, and a better physical health status (as seen in Table 2), individuals with a college degree are more likely to be males from an urban district without a migrational background. Moreover, they are more likely to have healthy parents (in terms of mortality). Other variables seem to differ less between both groups. We also account for cohort effects of mother and father, district fixed effects, as well as district-specific time trends (see Mazumder 2008; Stephens and Yang 2014 for the importance of the latter).

### 4.4. Instrument

The processes of college expansion discussed in Section 2.2 probably also shifted individuals with a lower desire to study into college education. Such powerful exogenous variation is beneficial for our approach as we try to identify the MTE along the distribution of the desire to study. We assign each individual the college availability as instrument (that is, a variable in Z but not in X). In doing so, we use the information on the district of the secondary school graduation and the year of the college decision, which is the year of secondary school graduation. The district (there are 326 districts in West Germany) is either a city or a certain rural area. The question is how to exploit the regional variation in openings and spots most efficiently, as it is almost infeasible to control for all distances to all colleges simultaneously. Our approach to this question is to create an index that best reflects the educational environment in Germany and combines the distance with the number of college spots,

\begin{eqnarray} Z_{it}=\sum _{j=1}^{326}K( {\mathit {dist}}_{ij}) \times \Bigg (\frac{\# {\mathit {students}}_{jt}}{\# {\mathit {inhabitants}}_{jt}}\Bigg ). \end{eqnarray} (7)

The college availability instrument Zit sums, over all 326 districts j, the number of college spots (measured by the number of students) per inhabitant in district j that individual i faces in year t, weighted by the distance between i’s home district and district j. Weighting the number of students by the population of the district takes into account that districts with the same number of inhabitants might have colleges of a different size. This local availability is then weighted by the Gaussian kernel distance $$K( {\mathit {dist}}_{ij})$$ between the centroid of the home district and the centroid of district j. The kernel puts a lot of weight on close colleges and a very small weight on distant ones. Since individuals can choose between many districts with colleges, we calculate the sum of all district-specific college availabilities within the kernel bandwidth. Using a bandwidth of 250 km, this basically amounts to $$K( {\mathit {dist}}_{ij}) = \phi ( {\mathit {dist}}_{ij}/250)$$ where ϕ is the standard normal pdf. Although 250 km sounds like a large bandwidth, this implies that colleges in the same district receive a weight of 0.4, whereas the weight for colleges that are 100 km away is 0.37, and it is reduced to 0.24 for 250 km. Colleges that are 500 km away only get a very low weight of 0.05. A smaller bandwidth of, say, 100 km would mean that colleges that are 250 km away already receive a weight of only 0.02, which implies the assumption that individuals basically do not take them into account at all. Most likely this does not reflect actual behavior. As a robustness check, however, we carry out all estimations with bandwidths between 100 and 250 km and the results are remarkably stable, see Online Appendix Figure C.1. Table 3 presents the descriptive statistics. We also provide background information on certain descriptive measures of distance and student density.

Table 3.
Descriptive statistics of instruments and background information.

| Statistics | (1) Mean | (2) SD | (3) Min | (4) Max |
|---|---|---|---|---|
| Instrument: college availability | 0.459 | 0.262 | 0.046 | 1.131 |
| *Background information on college availability (implicitly included in the instrument)* | | | | |
| Distance to nearest college | 27.580 | 26.184 | 0 | 172.269 |
| At least one college in district | 0.130 | 0.337 | 0 | 1 |
| Colleges within 100 km | 5.860 | 3.401 | 0 | 16 |
| College spots per inhabitant within 100 km | 0.034 | 0.019 | 0 | 0.166 |

Notes: Own calculations based on NEPS-Starting Cohort 6 data and German Statistical Yearbooks 1959–1991 (German Federal Statistical Office various issues, 1959–1991). Distances are calculated as the Euclidean distance between two respective district centroids.

The instrument jointly uses college openings and increases in size. Size is measured in enrollment as there is no available information on actual college spots. This might be considered worrisome as enrollment might reflect demand factors that are potentially endogenous. Although we believe that this is not a major problem as most study programs in the colleges were used to capacity, we also, as a robustness check, neglect information on enrollment and merely exploit information on college openings by using

$$Z_{it}=\sum _{j=1}^{326}K( {\mathit {dist}}_{ij}) \times \boldsymbol {1}{[\mbox{college{\,\,}available}_{jt}],}$$ (8)

where $$\boldsymbol {1}{[\cdot ]}$$ is the indicator function. The results when using this instrument are comparable, with minor differences, to those from the baseline specification as shown in Figure A.3 in the Appendix. Certainly, the overall findings and conclusions are not affected by this choice.
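The construction of the availability index in equations (7) and (8) can be sketched in a few lines. This is an illustration only: the function names, the tuple layout of the district data, and the example districts are hypothetical, and the indicator in equation (8) is proxied here by a positive number of students.

```python
import math


def gaussian_kernel(dist_km: float, bandwidth_km: float = 250.0) -> float:
    """Standard normal pdf evaluated at dist / bandwidth."""
    u = dist_km / bandwidth_km
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def availability(districts, bandwidth_km=250.0, openings_only=False):
    """College availability index for one home district and year.

    `districts` is a list of (distance_km, students, inhabitants) tuples,
    one per district j. With openings_only=True the enrollment share is
    replaced by an indicator for any enrollment (assumed proxy for eq. 8).
    """
    total = 0.0
    for dist_km, students, inhabitants in districts:
        if openings_only:
            local = 1.0 if students > 0 else 0.0
        else:
            local = students / inhabitants
        total += gaussian_kernel(dist_km, bandwidth_km) * local
    return total


# Kernel weights quoted in the text for a 250 km bandwidth.
weights = {d: round(gaussian_kernel(d), 2) for d in (0, 100, 250, 500)}
print(weights)

# Hypothetical home district with a college, one college 100 km away,
# and one district 60 km away without a college.
example = [(0.0, 10_000, 500_000), (100.0, 20_000, 400_000), (60.0, 0, 300_000)]
print(availability(example), availability(example, openings_only=True))
```

The rounded kernel weights match the values quoted in the text (0.4, 0.37, 0.24, and 0.05), and rerunning `gaussian_kernel(250, 100)` gives the weight of about 0.02 mentioned for the 100 km bandwidth.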
We prefer the combined instrument as this uses information from both aspects of the educational expansion.

## 5. Results

### 5.1. OLS

Although we are primarily interested in analyzing the returns to college education for the marginal individuals, we start with ordinary least squares (OLS) estimations as a benchmark. Column (1) in Table 4, panel A, reports results for hourly wages, columns (2) and (3) for the two health measures, whereas columns (4)–(6) do the same for the three measures of cognitive abilities. Each cell reports the coefficient of college education from a separate regression. After conditioning on observables, individuals with a college degree earn approximately 28% higher wages, on average. Although PCS is higher by around 0.3 of a standard deviation (recall that all outcomes but wages are standardized), there is no significant relation with MCS. Individuals with a college degree read, on average, 0.4 SD faster than those without college education. Moreover, their text understanding and mathematical literacy are better by approximately 0.7 SD. All in all, the results are largely in line with the differences in standardized means shown in Table 2, although slightly attenuated due to the inclusion of control variables.

Table 4. Regression results for OLS and first stage estimations.

| | (1) Gross hourly wage | (2) Health measure: PCS | (3) Health measure: MCS | (4) Cognitive ability: read. speed | (5) Cognitive ability: read. comp. | (6) Cognitive ability: math liter. |
|---|---|---|---|---|---|---|
| *Panel A: OLS results* | | | | | | |
| College degree | 0.277*** | 0.277*** | 0.003 | 0.398*** | 0.729*** | 0.653*** |
| | (0.019) | (0.033) | (0.036) | (0.037) | (0.032) | (0.044) |
| *Panel B: 2SLS first-stage results* | | | | | | |
| College availability | 2.368*** | 2.576*** | 2.576*** | 2.521*** | 2.327*** | 2.454*** |
| | (0.132) | (0.122) | (0.122) | (0.132) | (0.119) | (0.159) |
| Observations | 3,378 | 4,813 | 4,813 | 3,995 | 4,576 | 2,904 |

Notes: Own calculations based on NEPS-Starting Cohort 6 data. Regressions also include a full set of control variables as well as year-of-birth and district fixed effects, and district-specific linear trends. District clustered standard errors in parentheses. ***p < 0.01.

Panel B of Table 4 reports the first stage results of the 2SLS estimations.
The coefficients of the instrument point in the expected direction and are highly significant. As expected, they barely change across the outcome variables (the first-stage specifications differ across the columns only in the number of observations). To get a feeling for the effect size of college availability in the first stage, we consider, as an example, the college opening in the city of Essen in 1972. In 1978, about 11,000 students studied there. To illustrate the effect of the opening, we assume a constant population size of 700,000 inhabitants. The kernel weight of new spots in the same district is 0.4 (=K(0)). According to equation (7), the instrument value increases by 0.006 (rounded). Given the coefficient of college availability of about 2.4, an individual who made the college decision in Essen in 1978 had a 1.44 percentage points (2.4 × 0.006 = 0.0144) higher probability of going to college due to the opening of the college in Essen, compared to an individual who made the college decision in 1971. This seems to be a plausible effect. The effect of the college opening in Essen on individuals who live in other districts is smaller, depending on the distance to Essen.

### 5.2. Marginal Treatment Effects

Figure 3(a) shows the distribution of the propensity scores used in estimating the MTE by treatment and control group. They are obtained by logit regressions of the college degree on all Z and X variables. Full regression results of the first and the second stage of the 2SLS estimations are reported in the Online Appendices. For both groups, the propensity score varies from 0 to about 1. Moreover, the common support of the propensity score covers almost the entire unit interval. Variation in the propensity score where the effects of the X variables are integrated out is used to identify local effects.

Figure 3. Distribution of propensity scores. Own illustration based on NEPS-Starting Cohort 6 data.
The left panel shows the propensity score (PS) density by treatment status. The right panel illustrates the joint PS density (dashed line). The solid line shows the PS variation solely caused by variation in Z, since the X-effects have been integrated out. Further note that in the right panel the densities were both normalized such that they sum up to one over the 100 points where we evaluate the density.

This variation is presented in Figure 3(b). It shows the conditional support of P when the influence of the linear X-index of observables on the propensity score is integrated out ($$\int f_P(Z, X)\,dX$$). Here, the support, which is driven solely by variation in the instrument (the identifying variation), ranges from nearly 0 to about 0.8. This is important in the semiparametric estimation since it shows the regions in which we can credibly identify (conditional on our assumptions) marginal effects without having to rely on inter- or extrapolations to regions where we do not have identifying variation.

We calculate the MTE using a local linear regression with a bandwidth that ranges from 0.10 to 0.16 depending on the outcome variable.19 We calculate the marginal effects along the quantiles of $$U_D$$ by evaluating the derivative of the treatment effect with respect to the propensity score (see equation (6) in Section 3). Figure 4 shows the MTE for all outcome variables. The upper left panel presents the MTE for wages.
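The derivative step described above can be sketched in a few lines of Python. The sketch is deliberately simplified and rests on assumptions that are not from the paper: it uses a Gaussian kernel, noise-free toy data, and no covariates (the paper first isolates the influence of the control variables with a Robinson estimator). The function `local_linear_slope` and the toy outcome function are hypothetical names and choices made for illustration.

```python
import math

def local_linear_slope(p, y, p0, h):
    """Slope of a kernel-weighted linear fit of y on (p - p0), evaluated at p0."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for pi, yi in zip(p, y):
        x = pi - p0
        w = math.exp(-0.5 * (x / h) ** 2)   # Gaussian kernel weight (assumed)
        s0 += w
        s1 += w * x
        s2 += w * x * x
        t0 += w * yi
        t1 += w * x * yi
    # Closed-form weighted least squares slope for the regression y ~ 1 + x
    return (s0 * t1 - s1 * t0) / (s0 * s2 - s1 * s1)

# Toy data: E[Y | P = p] = 1 + 0.8 p - 0.5 p^2, so the true derivative is 0.8 - p
grid = [i / 1000 for i in range(1001)]
outcome = [1 + 0.8 * p - 0.5 * p * p for p in grid]

# The slope at p0 = u serves as the estimate of the MTE at U_D = u
mte_curve = {u: local_linear_slope(grid, outcome, u, h=0.10) for u in (0.2, 0.5, 0.8)}
for u, mte in mte_curve.items():
    print(u, round(mte, 3))
```

In this toy example the estimated slopes decline in u, which mimics a pattern of selection into gains; close to the edges of the support the local fit is slightly biased because the kernel window is truncated.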
We find that individuals with low values of $$U_D$$ have the highest monetary returns to college education. Low values of $$U_D$$ indicate individuals who are very likely to study, as already small values of P(z) exceed $$U_D$$ (see the transformed choice equation in Section 3). The returns are close to 80% for the smallest values of $$U_D$$ and then approach 0 at $$U_D \approx 0.7$$. Thus, we tend to interpret these findings as clear and strong positive returns for the 70% of individuals with the highest desire to study, whereas there is no clear evidence for any returns for the remaining 30%. Hence, there is clear evidence of selection into gains with respect to wages, where individuals with higher (realized) returns self-select into more education. This reflects the notion that individuals make choices based on their expected gains.

Figure 4. Marginal treatment effects for cognitive abilities and health. Own illustration based on NEPS-Starting Cohort 6 data. For gross hourly wage, the log value is taken. Health and cognitive skill outcomes are standardized to mean 0 and standard deviation 1. The MTE (vertical axis) is measured in logs for wage and in units of standard deviations of the health and cognitive skill outcomes. The dashed lines give the 95% confidence intervals based on clustered bootstrapped standard errors with 200 replications. Calculations based on a local linear regression where the influence of the control variables was isolated using a semiparametric Robinson estimator (Robinson 1988) for each outcome variable. The optimal, exact bandwidths for the local linear regressions are: wage 0.10, PCS 0.13, MCS 0.16, reading competence 0.10, reading speed 0.11, math score 0.12.

The curve of marginal treatment effects resembles the one found by Carneiro et al. (2011) for the United States, with the main difference that we do not find negative effects (but just zero) for a part of the distribution. The effect sizes are also comparable although ours are somewhat smaller. For instance, Carneiro et al. (2011) find highest returns of 28% per year of college, whereas we find 80% for the college degree, which takes 4.5 years on average to be earned (roughly 18% per year under a linear approximation).

What could explain these wage returns? Two potential channels of higher earnings could be better cognitive skills and/or better health due to increased education. The findings on skills and health that we discuss in the following could, thus, be read as investigations into mechanisms for the positive wage returns. However, at least for health, this would only be one potential interpretation as health might also be directly affected by income.

The right column of Figure 4 plots the results for cognitive skills. The distribution of marginal treatment effects is remarkably similar to the one for wages. We see that, also in terms of cognitive skills, not everybody benefits from more education.
Some individuals, again those with a high desire to study, benefit strongly, while the effects approach zero for individuals with $$U_D > 0.6$$. This holds for reading speed, reading competence, as well as mathematical literacy. The largest returns are as high as 2–3 standard deviations, again only for the small group with the highest college readiness. Thus, we observe the same selection into gains as with wages, and the findings could be interpreted as returns to cognitive abilities from education being a potential pathway for positive earnings returns.

The findings are somewhat different for health, as seen in the lower left part of Figure 4. First of all, the returns are much more homogeneous than those for wages and skills. Although there is still some heterogeneity in returns to physical health (though to a smaller degree than before), returns are completely homogeneous for mental health. Moreover, the returns are zero throughout for mental health. Physical health effects are positive (although not always statistically significant) for around 75% of the individuals, whereas they are close to zero for the 25% with the lowest desire to study.

The main findings of this paper can be summarized as follows:

– Education leads to higher wages and cognitive abilities for the same approximately 60% of individuals. This can also be read as suggestive evidence for cognitive abilities being a channel for the effect of education on wages.
– Education does not pay off for everybody. However, in no case are the effects negative. Thus, education never does harm in terms of gross wages, skills, and health. (Obviously, this view only considers potential benefits and disregards costs; net benefits might well be negative for some individuals.)
– There are clear signs of selection into gains. Those individuals who realize the highest returns to education are those who are most ready to take it.
With policy initiatives such as the “Higher Education Pact 2020”, Germany continuously increases participation in higher education in order to meet OECD standards (see OECD 2015a,b). Our results imply that this might not pay off, at least in terms of productivity (measured by wages), cognitive abilities, and health. Without fully simulating the results of further increased numbers of students in Germany, it is plausible to assume that additional students would be those with higher values of $$U_D$$, as those with a high desire to study are in large part already enrolled. But these additional students are the ones that do not seem to benefit from college education. However, this projection needs to be taken with a grain of salt as our findings are based on education in the 1960s–1980s and current education might yield different effects.

We carry out two kinds of robustness checks with respect to the definition of the instrument (see Section 4.4). Figure A.3 in the Appendix reports the findings when the instrument definition does not consider the increases in college size. The MTE curves do not stay exactly the same as before, but the main conclusions are unchanged. Wage returns are slightly more homogeneous. The results for reading competence and mathematical literacy are virtually the same, whereas for reading speed homogeneously positive effects are found. However, the confidence bands of the curves for both definitions of the instrument widely overlap. This also holds for the health measures. The MTE curve for MCS is slightly shifted upward and the one for PCS is more homogeneous, but the differences in the curves across both kinds of instruments are not significant.
Although the likelihood that two valid instruments deliver exactly the same results is fairly low in any application (and basically zero when so many points are evaluated as is the case here), the broad picture that leads to the conclusions stated previously is invariant to the change in the instrument definition. In Online Appendix C, we report the results of a robustness check in which we use different kernel bandwidths to weight the college distance (bandwidths between 100 and 250 km). Here the differences are indeed largely absent. Although the condensation of college availability in equation (7) seems somewhat arbitrary, these robustness checks show that the specification of the instrument does not affect our conclusions.

### 5.3. Treatment Parameters

Table 5 reports the conventional treatment parameters estimated using the MTE and the respective weights as described previously and more formally derived and explained in, for example, Heckman et al. (2006). In particular, we calculate the average treatment effect (ATE), the average treatment effect on the treated (ATT), the average treatment effect on the untreated (ATU), and the local average treatment effect (LATE). The estimated weights applied to the returns for each $$U_D$$ on the MTE curve are shown in Figure 5.

Figure 5. Treatment parameter weights conditional on the propensity score. Own illustration based on NEPS-Starting Cohort 6 data. Weights were calculated using the entire sample of 8,672 observations for which we have instrument and control variable information, regardless of the availability of the outcome variable.

Table 5.
Estimated treatment parameters for main results.

| Treatment parameter | (1) ATE | (2) ATT | (3) ATU | (4) LATE |
|---|---|---|---|---|
| *Main outcomes* | | | | |
| Log gross wage | 0.43 | 0.59 | 0.36 | 0.49 |
| | (0.06) | (0.07) | (0.07) | (0.05) |
| PCS | 0.45 | 0.86 | 0.29 | 0.55 |
| | (0.13) | (0.13) | (0.16) | (0.09) |
| MCS | 0.10 | 0.09 | 0.10 | 0.05 |
| | (0.10) | (0.12) | (0.13) | (0.08) |
| Reading competence | 1.10 | 1.88 | 0.78 | 1.18 |
| | (0.13) | (0.15) | (0.16) | (0.08) |
| Reading speed | 0.72 | 1.17 | 0.54 | 0.70 |
| | (0.14) | (0.15) | (0.18) | (0.11) |
| Mathematical literacy | 1.11 | 1.56 | 0.93 | 1.13 |
| | (0.17) | (0.21) | (0.19) | (0.14) |

Notes: Own calculations based on NEPS-Starting Cohort 6 data. The MTE is estimated with a semiparametric Robinson estimator. The LATE is estimated using the IV weights depicted in Figure 5. Therefore, the LATE in this table deviates slightly from corresponding 2SLS estimates. Standard errors (in parentheses) estimated using a clustered bootstrap (at district level) with 200 replications.

Whereas the local average treatment effect is an average effect weighted by the conditional density of the instrument, the ATT, for example, gives more weight to those individuals who already select into higher education at low $$U_D$$ values (indicating low intrinsic reluctance for higher education); the reverse holds for the ATU. The reason is that their likelihood of being in any “treatment group” is higher compared to individuals with higher values of $$U_D$$. The ATE places equal weight over the whole support. In all cases but mental health and reading speed, the LATE parameters in column (4) are roughly twice as large as the OLS estimates.
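The weighting logic just described can be illustrated with a small Python sketch. It uses the standard unconditional weights from the MTE literature (ATT weights proportional to the share of individuals whose propensity score exceeds u, ATU weights proportional to the complementary share, equal weights for the ATE) on an equally spaced grid. This is a simplified illustration, not the estimation code behind Table 5: covariates and the IV weights for the LATE are ignored, and the function name, the toy MTE curve, and the toy propensity scores are hypothetical.

```python
def treatment_parameters(mte, pscores):
    """Aggregate an MTE curve into ATE, ATT and ATU on an equally spaced grid.

    mte: list of MTE values at grid midpoints u_k = (k + 0.5) / len(mte)
    pscores: sample propensity scores used to build the weights
    """
    n = len(mte)
    grid = [(k + 0.5) / n for k in range(n)]
    # Share of the sample whose propensity score exceeds u (treated at resistance u)
    share_treated = [sum(p > u for p in pscores) / len(pscores) for u in grid]
    total_treated = sum(share_treated)
    total_untreated = sum(1 - s for s in share_treated)
    w_att = [s / total_treated for s in share_treated]          # sums to one
    w_atu = [(1 - s) / total_untreated for s in share_treated]  # sums to one
    ate = sum(mte) / n                      # equal weight on every grid point
    att = sum(w * m for w, m in zip(w_att, mte))
    atu = sum(w * m for w, m in zip(w_atu, mte))
    return ate, att, atu

# Toy inputs (hypothetical): a declining MTE curve and six propensity scores
toy_mte = [0.8 - (k + 0.5) / 100 for k in range(100)]
toy_pscores = [0.1, 0.2, 0.3, 0.5, 0.7, 0.9]
ate, att, atu = treatment_parameters(toy_mte, toy_pscores)
print(round(ate, 3), round(att, 3), round(atu, 3))
```

With a declining MTE curve, the ATT exceeds the ATE, which in turn exceeds the ATU, because the ATT weights concentrate on low values of u where the marginal effects are largest.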
Local average treatment effects that exceed the OLS estimates seem counterintuitive, as one often expects OLS to overestimate the true effects. Yet, this is not an uncommon finding, and in a world with heterogeneous effects it is often explained by a group of compliers that potentially has higher individual treatment effects than the average individual (Card 2001). This becomes evident when comparing the LATE with the ATE in column (1), which is another indication of selection into gains. Regarding the other treatment parameters, the LATE lies within the range of the ATT and the ATU.

Note that these are the “empirical”, conditional-on-the-sample parameters as calculated in Basu et al. (2007), that is, the treatment parameters conditional on the common support of the propensity score. The population ATE, however, would require full support on the unit interval.20 As depicted in Figure 3, we do not have full support in the data at hand. Although we observe individuals with and without a college degree for most probabilities to study, we cannot observe an individual with a probability arbitrarily close to 100% without a college degree (or arbitrarily close to 0% with a degree). Instead, the parameters in Table 5 were computed using the marginal treatment effects on the common support only. However, as the common support reaches from 0.002 to 0.969, it seems fair to say that the estimates probably come very close to the true parameters.

Table 5 is informative for two reasons in particular. First, it boils down the MTE to single numbers such that the average effect size immediately becomes clear. Second, differences between the parameters again emphasize the role of effect heterogeneity. Together with the bootstrapped standard errors, the table reveals that the ATT and the ATU structurally differ from each other for all outcomes but mental health.
Hence, the treatment group of college graduates seems to benefit more from higher education in terms of wages, skills, and physical health than the non-graduates would. One reason is that they might choose to study because of their idiosyncratic skill returns. Yet, these may also be windfall gains that accompany the monetary college premiums on which the decision was more likely based. Nonetheless, this also is evidence for selection into gains.

The effect sizes for all (ATE), for the university degree subgroup (ATT), and for those without higher education (ATU) in Table 5 capture the overall returns to college education, not the per-year effects. On average, the per-year effect is approximately the overall effect divided by 4.5 years (the regular time it takes to receive a Diplom degree), if we assume linear additivity of the yearly effects. The per-year effects for mathematical literacy and reading competence are about 25% of a standard deviation for all parameters. For reading speed the effects are around 15% of an SD, whereas the wage effects are around 10%. These effects are of considerable size, yet slightly smaller than those found in the previous literature on different treatments and, importantly, different compliers. For instance, ability returns to an additional year of compulsory schooling were found to be up to 0.5 SD (see, e.g., Banks and Mazzonna 2012).

To get an idea of the total effect of college education on, say, math skills, the following example might help. If you start at the median of the standardized unconditional math score distribution (Φ(0) = 50%), the average effect of 1.11 of a standard deviation, all other things the same, will make you end up at the 87% quantile of that distribution (Φ(0 + 1.11) = 87%), in the thought experiment of being the only treated person in the peer group.
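The back-of-the-envelope calculations in the two preceding paragraphs can be reproduced with a short Python sketch. It takes the ATE values from Table 5 and the 4.5-year degree duration from the text, and it relies on the same linear additivity assumption stated above; the helper `normal_cdf` is a hypothetical name introduced for this illustration.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

YEARS_TO_DEGREE = 4.5  # regular duration of a Diplom degree, as stated in the text

# ATEs from Table 5 (log points for wage, standard deviations otherwise)
ate = {
    "log gross wage": 0.43,
    "reading speed": 0.72,
    "reading competence": 1.10,
    "mathematical literacy": 1.11,
}

# Per-year effects under the linear additivity assumption
per_year = {name: effect / YEARS_TO_DEGREE for name, effect in ate.items()}
for name, effect in per_year.items():
    print(f"{name}: {effect:.2f} per year")

# Quantile reached by a median person after a 1.11 SD gain in math skills
math_quantile = normal_cdf(0.0 + ate["mathematical literacy"])
print(f"{math_quantile:.0%}")
```

The same two functions can be applied to the ATT and ATU columns to see how the per-year figures vary across treatment parameters.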
As suggested by the pattern of the marginal treatment effects in Figure 4, the health returns to higher education are smaller than the skill returns; still, they are around 10% of an SD per year (except for the zero effect on mental health). Given the previous literature, the results seem reasonable.

Regarding statistical significance of the effects, note that we use several outcome variables and potentially run into multiple testing problems. Yet, we refrain from addressing this with a complex algorithm that also accounts for the correlation of the six outcome variables and argue as follows: apart from those for mental health, all ATEs and ATTs are highly statistically significant. Thus, multiple testing with six outcomes should not be a major issue. Even with a most conservative Bonferroni correction, critical values for statistical significance at the 5% level would increase from 1.96 to 2.65 and would not change any conclusions regarding significance.21

## 6. Potential Mechanisms for Health and Cognitive Abilities

In this section, we investigate the role of potential mechanisms through which college education may work. It is likely to affect the observed level of health and cognitive abilities through the attained stock of health capital and the cognitive reserve, that is, the mind’s ability to tolerate brain damage (Stern 2012; Meng and D’Arcy 2012). There are probably three channels through which education affects long-run health and cognitive abilities:

– in college: a direct effect from education;
– post-college: a diminished age-related decline in health and skills due to the higher health capital/cognitive reserve attained in college (e.g., the “cognitive reserve hypothesis”, Stern et al. 1999);
– post-college: different health behavior or different jobs that are less detrimental to health and more cognitively demanding (Stern 2012).
The post-college mechanisms that compensate for the decline also contain implicit multiplying factors like complementarities and self-productivity; see Cunha et al. (2006) and Cunha and Heckman (2007). The NEPS data include various job characteristics and health behaviors that potentially reduce the age-related skill/health decline. However, the data allow us neither to disentangle these components empirically (i.e., to observe changes in one channel that are exogenous from other channels) nor to analyze how the effect on the mechanism causally maps into higher skills or better health (as, e.g., in Heckman et al. 2013). Thus, it should be noted that this sub-analysis is merely suggestive and by no means a comprehensive analysis of the mechanisms behind the effects found in the previous section. Moreover, the following analysis focuses on the potential channel of different jobs and health behaviors. It proceeds as before (same controls, same estimation strategy and instrument) but replaces the outcome variables with the indicators of potential mechanisms.

Cognitive Abilities. The main driving force behind skill formation after college might lie in activities on the job. When individuals with college education engage in more cognitively demanding activities, for example, more sophisticated jobs, this might mentally exercise their minds (Rohwedder and Willis 2010). This effect of mental training is sometimes referred to as the use-it-or-lose-it hypothesis; see Rohwedder and Willis (2010) or Salthouse (2006). If such an exercise effect leads to alternating brain networks that “may compensate for the pathological disruption of preexisting networks” (Meng and D’Arcy 2012, p. 2), a higher demand for cognitively demanding tasks (as a result of college education) increases the individual’s cognitive capacity. In order to investigate if a more cognitively demanding job might be a potential mechanism (as, e.g., suggested by Fisher et al.
2014), we use information on the individual’s activities on the job. All four outcome variables considered in this subsection are binary; their definitions, sample means, and the effects of college education are given in Table 6. For the sake of brevity we focus on the most relevant treatment parameters here and do not discuss the MTE curvatures.

Table 6. Potential mechanisms for cognitive skills.

| Parameter | Definition | Sample mean | ATE | ATT | ATU |
|---|---|---|---|---|---|
| Math: percentages | =1 if job requires calculating with percentages and fractions | 0.711 | 0.20 | 0.23 | 0.19 |
| | | | (0.06) | (0.07) | (0.07) |
| Reading | =1 if respondent often spends more than 2 hours reading | 0.777 | 0.23 | 0.30 | 0.30 |
| | | | (0.03) | (0.03) | (0.04) |
| Writing | =1 if respondent often writes more than 1 page | 0.704 | 0.39 | 0.64 | 0.29 |
| | | | (0.07) | (0.09) | (0.07) |
| Learning new things | =1 if respondent reports to learn new things often | 0.671 | 0.22 | 0.31 | 0.18 |
| | | | (0.07) | (0.09) | (0.07) |

Notes: Own calculations based on NEPS-Starting Cohort 6 data. Definitions are taken from the data manual. Standard errors estimated using a clustered bootstrap (district level) and reported in parentheses.

College education has strong effects on all four outcomes. It increases the likelihood of being in a job that requires calculating with percentages and fractions, that involves reading or writing, and in which individuals often learn new things. The effect sizes are very large, which is not too surprising as many of the jobs that entail these mentally demanding tasks require a college diploma as a quasi-formal condition of employment. Moreover, as observed before, there seems to be effect heterogeneity here as well, and selection into gains, as all average treatment effects on the treated are larger than the treatment effects on the untreated (except for the case of reading more than 2 hours). The differences are particularly strong for writing and for learning new things.
All in all, the findings suggest that cognitively more demanding jobs due to college education might play a role in explaining long-run cognitive returns to education. Note again, however, that these findings are only suggestive evidence for a causal mechanism. It might as well be the other way around, with the cognitive abilities attained in college inducing a selection into these job types.

Health. Concerning the health mechanisms, we study job-related effects and effects on health behavior. The NEPS data cover engagement in several physical activities on the job, for example, working in a standing position, working in an uncomfortable position (like bending often), walking or cycling long distances, or carrying heavy loads. Table 7 reports definitions, sample means, and effects. In the upper panel of the table, the binary indicators are coded as 1 if the respondent reports engaging in the activity (and 0 otherwise).

Table 7. Potential mechanisms for health.

| Parameter | Definition | Sample mean | ATE | ATT | ATU |
|---|---|---|---|---|---|
| *Physically demanding activities on the job* | | | | | |
| Standing position | =1 if often working in a standing position for 2 or more hours | 0.302 | −0.37 | −0.56 | −0.30 |
| | | | (0.07) | (0.09) | (0.08) |
| Uncomfortable pos. | =1 if respondent needs to bend, crawl, lie down, kneel or squat | 0.190 | −0.20 | −0.37 | −0.13 |
| | | | (0.05) | (0.06) | (0.06) |
| Walking | =1 if job often requires walking, running or cycling | 0.242 | −0.39 | −0.56 | −0.32 |
| | | | (0.06) | (0.07) | (0.07) |
| Carrying | =1 if often carrying a load of at least 10 kg | 0.182 | −0.40 | −0.50 | −0.37 |
| | | | (0.05) | (0.05) | (0.05) |
| *Health behaviors* | | | | | |
| Obesity | =1 if body mass index (= weight in kg / (height in m)²) > 30 | 0.155 | −0.08 | −0.15 | −0.05 |
| | | | (0.04) | (0.05) | (0.05) |
| Smoking | =1 if currently smoking | 0.270 | −0.18 | −0.23 | −0.16 |
| | | | (0.06) | (0.06) | (0.07) |
| Alcohol amount | =1 if three or more drinks when consuming alcohol | 0.187 | −0.14 | −0.13 | −0.14 |
| | | | (0.05) | (0.06) | (0.06) |
| Sport | =1 if any sporting exercise in the previous 3 months | 0.717 | 0.16 | 0.31 | 0.10 |
| | | | (0.07) | (0.07) | (0.09) |

Notes: Own calculations based on NEPS-Starting Cohort 6 data.
Definitions are taken from the data manual. Standard errors are estimated using a clustered bootstrap (at district level) and reported in parentheses.

Table A.1. Control variables and means by college degree.

| Variable | Definition | Respondents with college degree | Respondents w/o college degree |
| --- | --- | --- | --- |
| General information | | | |
| Female | =1 if respondent is female | 40.38 | 54.18 |
| Year of birth (FE) | Year of birth of the respondent | 1959 | 1959 |
| Migrational background | =1 if respondent was born abroad | 0.89 | 0.64 |
| No native speaker | =1 if mother tongue is not German | 0.30 | 0.43 |
| Rural district | =1 if current district is rural | 16.79 | 24.96 |
| Mother still alive | =1 if mother is still alive in 2009/10 | 65.38 | 63.83 |
| Father still alive | =1 if father is still alive in 2009/10 | 45.27 | 42.3 |
| Pre-college living conditions | | | |
| Married before college | =1 if respondent got married before the year of the college decision or in the same year | 0.20 | 0.44 |
| Parent before college | =1 if respondent became a parent before the year of the college decision or in the same year | 0.30 | 0.17 |
| Siblings | Number of siblings | 1.56 | 1.87 |
| First born | =1 if respondent was the first born in the family | 33.73 | 29.01 |
| Age 15: lived by single parent | =1 if respondent was raised by single parent | 5.33 | 5.32 |
| Age 15: lived in patchwork family | =1 if respondent was raised in a patchwork family | 1.11 | 0.27 |
| Age 15: orphan | =1 if respondent was an orphan at the age of 15 | 0.10 | 0.20 |
| Age 15: mother employed | =1 if mother was employed at the respondent’s age of 15 | 45.93 | 46.87 |
| Age 15: mother never unemployed | =1 if mother was never unemployed until the respondent’s age of 15 | 61.24 | 62.29 |
| Age 15: father employed | =1 if father was employed at the respondent’s age of 15 | 92.46 | 90.73 |
| Age 15: father never unemployed | =1 if father was never unemployed until the respondent’s age of 15 | 98.45 | 97.14 |
| Pre-college education | | | |
| Final school grade: excellence | =1 if the overall grade of the highest school degree was excellent | 4.59 | 1.79 |
| Final school grade: good | =1 if the overall grade of the highest school degree was good | 31.51 | 25.83 |
| Final school grade: satisfactory | =1 if the overall grade of the highest school degree was satisfactory | 17.97 | 28.03 |
| Final school grade: sufficient or worse | =1 if the overall grade of the highest school degree was sufficient or worse | 1.04 | 1.42 |
| Repeated one grade | =1 if student needed to repeat one grade in elementary or secondary school | 19.97 | 20.51 |
| Repeated two or more grades | =1 if student needed to repeat two or more grades in elementary or secondary school | 2.74 | 1.85 |
| Military service | =1 if respondent was drafted for compulsory military service | 28.03 | 23.89 |
| Parental characteristics (M: mother, F: father) | | | |
| M: year of birth (FE) | Year of birth of the respondent’s mother | 1930 | 1932 |
| M: migrational background | =1 if mother was born abroad | 5.47 | 4.85 |
| M: at least inter. edu | =1 if mother has at least an intermediate secondary school degree | 17.97 | 5.95 |
| M: vocational training | =1 if mother’s highest degree is vocational training | 20.86 | 16.18 |
| M: further job qualification | =1 if mother has further job qualification (e.g., Meister degree) | 4.29 | 1.73 |
| F: year of birth (FE) | Year of birth of the respondent’s father | 1927 | 1929 |
| F: migrational background | =1 if father was born abroad | 6.36 | 5.54 |
| F: at least inter. edu | =1 if father has at least an intermediate secondary school degree | 20.86 | 8.09 |
| F: vocational training | =1 if father’s highest degree is vocational training | 19.12 | 21.99 |
| F: further job qualification | =1 if father has further job qualification (e.g., Meister degree) | 11.46 | 6.76 |
| Number of observations (PCS and MCS sample) | | 1,352 | 3,461 |

Notes: Own calculations based on NEPS-Starting Cohort 6 data. Definitions are taken from the data manual. Mean values refer to the MCS and PCS sample. FE = variable values are included as fixed effects in the analysis.
We find that college education reduces the probability of engaging in all four physically demanding activities. Again, the estimated effects are very large in size, implying that it is the college diploma that qualifies for a white-collar office-job position. These effects might explain why we find physical health effects of education and are in line with the absence of mental health effects. White-collar jobs are usually less demanding with respect to physical health but not at all less stressful.

Besides physical activities on the job, health behaviors may be considered an important dimension of the general formation of health over the life-cycle, see Cutler and Lleras-Muney (2010). To analyze this, we resort to the following variables in our data set: a binary indicator for obesity (body mass index exceeds 30) as a compound lifestyle measure and more direct behavioral variables like an indicator for smoking, the amount of alcohol consumption (1 if having three or more drinks when consuming alcohol), as well as physical activity measured by an indicator of having done any sporting exercise in the previous 3 months. The lower panel in Table 7 reports the sample means and treatment effects. College education leads to a decrease in the probability of being obese and of smoking. This is in line with the LATE estimates of the effect of college education in the United States by Grimard and Parent (2007) and de Walque (2007). College education also seems to reduce alcohol consumption and to increase the likelihood of engaging in sporting exercise. Again, the effect sizes are large, though not as large as those for the other potential mechanisms. Moreover, some of them are only marginally statistically significant. Taken together, college education affects potential health mechanisms in the expected direction. Again, there is effect heterogeneity, observable in different treatment parameters for the same outcome variables.
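As a rough plausibility check on the treatment parameters in Table 7, one can use the population identity ATE = P(D=1)·ATT + P(D=0)·ATU. The sketch below is illustrative only and is not part of the estimation procedure: it assumes that the share of college graduates can be approximated by the observation counts in Table A.1 (1,352 with and 3,461 without a college degree), and it uses the rounded point estimates from Table 7. Because the reported parameters are empirical versions restricted to the common support, only approximate agreement should be expected. All variable and function names are hypothetical.

```python
# Illustrative consistency check, not the paper's estimation code.
# Assumption: the treated share equals the sample share of graduates in Table A.1.
N_COLLEGE = 1352       # respondents with college degree (Table A.1)
N_NO_COLLEGE = 3461    # respondents without college degree (Table A.1)

# Rounded point estimates transcribed from Table 7: (ATE, ATT, ATU).
TABLE7 = {
    "Standing position":  (-0.37, -0.56, -0.30),
    "Uncomfortable pos.": (-0.20, -0.37, -0.13),
    "Walking":            (-0.39, -0.56, -0.32),
    "Carrying":           (-0.40, -0.50, -0.37),
    "Obesity":            (-0.08, -0.15, -0.05),
    "Smoking":            (-0.18, -0.23, -0.16),
    "Alcohol amount":     (-0.14, -0.13, -0.14),
    "Sport":              (0.16, 0.31, 0.10),
}


def implied_ate(att: float, atu: float, share_treated: float) -> float:
    """Weighted average of ATT and ATU with the treated share as weight."""
    return share_treated * att + (1.0 - share_treated) * atu


share = N_COLLEGE / (N_COLLEGE + N_NO_COLLEGE)
implied = {
    name: implied_ate(att, atu, share)
    for name, (_, att, atu) in TABLE7.items()
}

for name, (ate, _, _) in TABLE7.items():
    print(f"{name:20s} reported ATE {ate:+.2f}   implied ATE {implied[name]:+.3f}")
```

Under these assumptions the implied values stay within rounding distance of the reported ATEs, which is what one would expect if the heterogeneity across ATT and ATU is driven by the same underlying MTE curve.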
Since health is a high-dimensional measure, the potential mechanisms at hand are of course not able to explain the health returns to college education entirely. Nevertheless, the findings encourage us in our interpretation of the effects of college education on physical health.

7. Conclusion

This paper uses the Marginal Treatment Effect framework introduced and advanced by Björklund and Moffitt (1987) and Heckman and Vytlacil (2005, 2007) to estimate returns to college education under essential heterogeneity. We use representative data from the German National Educational Panel Study (NEPS). Our outcome measures are wages, cognitive abilities, and health. Cognitive abilities are assessed using state-of-the-art cognitive competence tests on individual reading speed, text understanding, and mathematical literacy. As expected, all outcome variables are positively correlated with having a college degree in our data set. Using an instrument that exploits exogenous variation in the supply of colleges, we estimate marginal returns to college education. The main findings of this paper are as follows: College education improves average wages, cognitive abilities and physical health (but not mental health). There is heterogeneity in the effects and clear signs of selection into gains. Those individuals who realize the highest returns to education are those who are most ready to take it. Moreover, education does not pay off for everybody. Although it is never harmful, we find zero causal effects for around 30%–40% of the population. Thus, although college education is beneficial on average, further increasing the number of students, as sometimes called for, is less likely to pay off, as the current marginal students are mostly in the range of zero causal effects. Potential mechanisms of skill returns are more demanding jobs that slow down the cognitive decline in later ages.
Regarding health we find positive effects of higher education on BMI, non-smoking, sports participation and alcohol consumption. All in all, given that the average individual clearly seems to benefit from education, and provided that continuing technological progress makes skills more and more valuable, education should still be an answer to technological change for the average individual. One limitation of this paper is that we are not able to stratify the analysis by study subject. This is left for future work.

Appendix: Additional Figures and Tables

Figure A.1. Spatial variation of colleges across districts and over time. Own illustration based on the German Statistical Yearbooks 1959–1991 (German Federal Statistical Office various issues, 1959–1991). The maps show all 326 West-German districts (Kreise, spatial units of 2009) except Berlin in the years 1958 (first year in the sample), 1970, 1980, and 1990 (last year in the sample). Districts usually cover a bigger city or some administratively connected villages. If a district has at least one college, the district is depicted in black. Only a few districts have more than one college. For those districts the number of students is added up in the calculations but multiple colleges are not depicted separately in the maps.

Figure A.2. Distribution of dependent variables by college graduation. Own illustration based on NEPS-Starting Cohort 6 data.

Figure A.3. Sensitivity in marginal treatment effects when using only the sum of the kernel weighted college distances. Own illustration based on NEPS-Starting Cohort 6 data. For gross hourly wage, the log value is taken. Health and cognitive skill outcomes are standardized to mean 0 and standard deviation 1. The MTE (vertical axis) is measured in logs for wage and in units of standard deviations of the health and cognitive skill outcomes. The dashed lines give the 95% confidence intervals. Calculations based on a local linear regression where the influence of the control variables was isolated using a semiparametric Robinson estimator (Robinson 1988) for each outcome variable.

The editor in charge of this paper was Claudio Michelacci.
Acknowledgments

We thank the editor and two anonymous referees for many helpful suggestions which improved the paper considerably. We are grateful to Pedro Carneiro, Arnaud Chevalier, Damon Clark, Eleonora Fichera, Martin Fischer, Hendrik Jürges and Corinna Kleinert for valuable comments and Claudia Fink for excellent research assistance. Furthermore, we would like to thank the participants of several conferences and seminars for helpful discussions. Access to Micro Census data at the GESIS-German Microdata Lab, Mannheim, is gratefully acknowledged. Financial support from the Deutsche Forschungsgemeinschaft (DFG, Grant number SCHM 3140/1-1) is gratefully acknowledged. Matthias Westphal is affiliated with and was also partly funded by the Ruhr Graduate School in Economics. Hendrik Schmitz and Matthias Westphal are furthermore affiliated with the Leibniz Science Campus Ruhr. This paper uses data from the National Educational Panel Study (NEPS): Starting Cohort Adults, 10.5157/NEPS:SC6:5.1.0. From 2008 to 2013, NEPS data was collected as part of the Framework Program for the Promotion of Empirical Educational Research funded by the German Federal Ministry of Education and Research (BMBF). As of 2014, NEPS is carried out by the Leibniz Institute for Educational Trajectories (LIfBi) at the University of Bamberg in cooperation with a nationwide network.

Footnotes

1 The Economist, edition March 28th to April 3rd 2015.

2 Hansen et al. (2004) use a control function approach to adjust for education in the short-term development of cognitive abilities. Carneiro et al. (2001, 2003) analyze the short-term effects of college education. Glymour et al. (2008), Banks and Mazzonna (2012), Schneeweis et al. (2014), and Kamhöfer and Schmitz (2016) analyze the effects of secondary schooling on long-term cognitive skills.

3 See Section 4 for a detailed definition of cognitive abilities. We use the terms “cognitive abilities”, “cognitive skills”, and “skills” interchangeably.

4 We use the words university and college as synonyms to refer to German Universitäten and closely related institutions like institutes of technology (Technische Universitäten/Technische Hochschulen), an institutional type that combines features of colleges and universities of applied science (Gesamthochschulen) and universities of the armed forces (Bundeswehruniversitäten/Bundeswehrhochschulen).

5 The working paper version Kamhöfer et al. (2015) also uses the introduction of a student loan program as a further source of exogenous variation. Using this instrument does not affect the findings at all but is not considered here for the sake of readability of the paper.

6 All data are taken from the German Statistical Yearbooks, 1959–1991, see German Federal Statistical Office (various issues, 1959–1991). We only use colleges and no other higher educational institutes described in Section 2.1 (e.g., universities of applied science). Administrative data on openings and the number of students are not available for institutions other than colleges. However, since other higher educational institutions are small in size and highly specialized, they should be less relevant for the higher education decision and, thus, neglecting them should not affect the results.

7 Table 1 uses a different data source than the main analysis and the local level is slightly broader than districts, see the notes to the table.

8 Note that the general derivation does not require linear indices. However, it is standard to assume linearity when it comes to estimation.

9 By applying, for instance, the standard normal distribution function to both sides of the inequality: Z′δ ≥ V ⇔ Φ(Z′δ) ≥ Φ(V) ⇔ P(Z) ≥ U_D, where P(Z) ≡ P(D = 1|Z) = Φ(Z′δ).

10 In this model the exclusion restriction is implicit since Z has an effect on D* but not on Y1, Y0. Monotonicity is implied by the choice equation since D* either increases or decreases monotonically in the values of Z.

11 To make this explicit, all treatment parameters (TE_j(x)) can be decomposed into a weight (h_j(x, u_D)) and the MTE: $$TE_j(x)=\int _{0}^{1} \mathit{MTE}(x,u_D)\,h_j(x,u_D)\,du_D$$. See, for example, Heckman and Vytlacil (2007) for the exact expressions of the weights for common parameters.

12 Essentially, this is equivalent to a simple 2SLS case. If one wants to identify observable effect heterogeneity (i.e., interact the treatment indicator with control variables in the regression model) the instrument needs to be independent unconditional of these controls.

13 On the other hand, estimating with heterogeneity in the observables can lead to an efficiency gain.

14 Semi-parametrically, the MTE can only be identified over the support of P. The greater the variation in Z (conditional on X) and, thus, in P(Z), the larger the range over which the MTE can be identified. This may be considered a drawback of the MTE approach, in particular, because treatment parameters that have weight unequal to zero outside the support of the propensity score are not identified using semi-parametric techniques. This is sometimes called the “identification at infinity” requirement (see Heckman 1990) of the MTE. However, we argue that the MTE over the support of P is already very informative. We use semi-parametric estimates of the MTE and restrict the results to the empirical ATE or ATT that are identified for those individuals who are in the sample (see Basu et al. 2007). Alternatively one might use a flexible approximation of K(p) based on a polynomial of the propensity score as done by Basu et al. (2007). This amounts to estimating $$E(Y|X, p) = X^{\prime }\beta + (\alpha _1 -\alpha _0) \cdot p + \sum _{j=1}^k \phi _j p^j$$ by OLS and using the estimated coefficients to calculate $$\widehat{\mathit{MTE}}(x,p) = (\widehat{\alpha }_1 - \widehat{\alpha }_0) + \sum _{j=1}^k \widehat{\phi }_j\, j\, p^{j-1}$$.

15 The working paper version also considers health satisfaction with results very similar to PCS (Kamhöfer et al. 2015).

16 For a general overview of test designs and applications in the NEPS, see Weinert et al. (2011).

17 The test measures the “assessment of automatized reading processes”, where a “low degree of automation in decoding [...] will hinder the comprehension process”, that is, understanding of texts (Zimmermann et al. 2014, p. 1). The test was newly designed for NEPS but based on the well-established Salzburg reading screening test design principles (LIfBi 2011).

18 The total number of possible points exceeds 32 because some items were worth more than one point.

19 We assess the optimal bandwidth in the local linear regression using Stata’s lpoly rule of thumb. Our results are also robust to the inclusion of higher order polynomials in the local (polynomial) regression. The optimal, exact bandwidths are: wage 0.10, PCS 0.13, MCS 0.16, reading competence 0.10, reading speed 0.11, math score 0.12.

20 The ATT would require for every college graduate in the population a non-graduate with the same propensity score (including 0%). For the ATU one would need the opposite: a graduate for every non-graduate with the same propensity score including 100%.

21 Also taking into account the outcomes from Section 6 and assuming that we test 18 times would increase the critical value to 2.98 in the (overly conservative) Bonferroni correction.

References

Acemoglu, Daron and Simon Johnson (2007). “Disease and Development: The Effect of Life Expectancy on Economic Growth.” Journal of Political Economy, 115, 925–985.

American Psychological Association (1995). “Intelligence: Knowns and Unknowns.” Report of a task force convened by the American Psychological Association.

Andersen, Hanfried H., Axel Mühlbacher, Matthias Nübling, Jürgen Schupp, and Gert G. Wagner (2007). “Computation of Standard Values for Physical and Mental Health Scale Scores Using the SOEP Version of SF-12v2.” Schmollers Jahrbuch: Journal of Applied Social Science Studies/Zeitschrift für Wirtschafts- und Sozialwissenschaften, 127, 171–182.

Anderson, John (2007). Cognitive Psychology and its Implications, 7th ed. Worth Publishers, New York.

Banks, James and Fabrizio Mazzonna (2012). “The Effect of Education on Old Age Cognitive Abilities: Evidence from a Regression Discontinuity Design.” The Economic Journal, 122, 418–448.

Barrow, Lisa and Ofer Malamud (2015). “Is College a Worthwhile Investment?” Annual Review of Economics, 7, 519–555.

Bartz, Olaf (2007). “Expansion und Umbau–Hochschulreformen in der Bundesrepublik Deutschland zwischen 1964 und 1977.” Die Hochschule, 2007, 154–170.

Basu, Anirban (2011). “Estimating Decision-Relevant Comparative Effects using Instrumental Variables.” Statistics in Biosciences, 3, 6–27.

Basu, Anirban (2014). “Person-Centered Treatment (PeT) Effects using Instrumental Variables: An Application to Evaluating Prostate Cancer Treatments.” Journal of Applied Econometrics, 29, 671–691.

Basu, Anirban, James J. Heckman, Salvador Navarro-Lozano, and Sergio Urzua (2007). “Use of Instrumental Variables in the Presence of Heterogeneity and Self-selection: An Application to Treatments of Breast Cancer Patients.” Health Economics, 16, 1133–1157.

Björklund, Anders and Robert Moffitt (1987). “The Estimation of Wage Gains and Welfare Gains in Self-Selection.” The Review of Economics and Statistics, 69, 42–49.

Blossfeld, H.-P., H.-G. Roßbach, and J. von Maurice (2011). “Education as a Lifelong Process—The German National Educational Panel Study (NEPS).” Zeitschrift für Erziehungswissenschaft, 14 [Special Issue 14-2011].

Brinch, Christian N., Magne Mogstad, and Matthew Wiswall (2017). “Beyond LATE with a Discrete Instrument.” Journal of Political Economy, 125, 985–1039.

Card, David (1995). “Using Geographic Variation in College Proximity to Estimate the Return to Schooling.” In Aspects of Labour Market Behaviour: Essays in Honour of John Vanderkamp, edited by K. Grant, L. Christofides, and R. Swidinsky. University of Toronto Press, pp. 201–222.

Card, David (2001). “Estimating the Return to Schooling: Progress on Some Persistent Econometric Problems.” Econometrica, 69, 1127–1160.

Carneiro, Pedro, Karsten T. Hansen, and James J. Heckman (2003). “2001 Lawrence R. Klein Lecture: Estimating Distributions of Treatment Effects with an Application to the Returns to Schooling and Measurement of the Effects of Uncertainty on College Choice.” International Economic Review, 44(2), 361–422.

Carneiro, Pedro, Karsten T. Hansen, and James J. Heckman (2001). “Removing the Veil of Ignorance in Assessing the Distributional Impacts of Social Policies.” Swedish Economic Policy Review, 8, 273–301.

Carneiro, Pedro, James J. Heckman, and Edward J. Vytlacil (2010). “Evaluating Marginal Policy Changes and the Average Effect of Treatment for Individuals at the Margin.” Econometrica, 78, 377–394.

Carneiro, Pedro, James J. Heckman, and Edward J. Vytlacil (2011). “Estimating Marginal Returns to Education.” American Economic Review, 101(6), 2754–2781.

Cawley, John, James J. Heckman, and Edward J. Vytlacil (2001). “Three Observations on Wages and Measured Cognitive Ability.” Labour Economics, 8, 419–442.
Google Scholar CrossRef Search ADS Cervellati Matteo , Sunde Uwe ( 2005 ). “Human Capital Formation, Life Expectancy, and the Process of Development.” American Economic Review , 95 ( 5 ), 1653 – 1672 . Google Scholar CrossRef Search ADS PubMed Clark Damon , Martorell Paco ( 2014 ). “The Signaling Value of a High School Diploma.” Journal of Political Economy , 122 , 282 – 318 . Google Scholar CrossRef Search ADS Cornelissen Thomas , Dustmann Christian , Raute Anna , Schönberg Uta ( forthcoming ). “Who Benefits from Universal Childcare? Estimating Marginal Returns to Early Childcare Attendance.” Journal of Political Economy . Costa Dora L. ( 2015 ). “Health and the Economy in the United States from 1750 to the Present.” Journal of Economic Literature , 53 , 503 – 570 . Google Scholar CrossRef Search ADS PubMed Cunha F. , Heckman J. J. , Lochner L. J. , Masterov D. V. ( 2006 ). “Interpreting the Evidence on Life Cycle Skill Formation.” In Handbook of the Economics of Education , Vol. 1 , edited by Hanushek E. A. , Welch F. . North-Holland . Cunha Flavio , Heckman James J. ( 2007 ). “The Technology of Skill Formation.” American Economic Review , 97 ( 2 ), 31 – 47 . Google Scholar CrossRef Search ADS Currie Janet , Moretti Enrico ( 2003 ). “Mother’s Education and The Intergenerational Transmission of Human Capital: Evidence From College Openings.” The Quarterly Journal of Economics , 118 , 1495 – 1532 . Google Scholar CrossRef Search ADS Cutler David M. , Lleras-Muney Adriana ( 2010 ). “Understanding Differences in Health Behaviors by Education.” Journal of Health Economics , 29 , 1 – 28 . Google Scholar CrossRef Search ADS PubMed de Walque Damien ( 2007 ). “Does Education Affect Smoking Behaviors?: Evidence using the Vietnam Draft as an Instrument for College Education.” Journal of Health Economics , 26 , 877 – 895 . Google Scholar CrossRef Search ADS PubMed Fisher Gwenith , Stachowski Alicia , Infurna Frank , Faul Jessica , Grosch James , Tetrick Lois ( 2014 ). 
“Mental Work Demands, Retirement, and Longitudinal Trajectories of Cognitive Functioning.” Journal of Occupational Health Psychology , 19 , 231 – 242 . Google Scholar CrossRef Search ADS PubMed German Federal Statistical Office (various issues , 1959–1991 ). “Statistisches Jahrbuch für die Bundesrepublik Deutschland.” Tech. rep. , German Federal Statistical Office (Statistisches Bundesamt) , Wiesbaden . Glymour M. , Kawachi I. , Jencks C. , Berkman L. ( 2008 ). “Does Childhood Schooling Affect Old Age Memory or Mental Status? Using State Schooling Laws as Natural Experiments.” Journal of Epidemiology and Community Health , 62 , 532 – 537 . Google Scholar CrossRef Search ADS PubMed Grimard Franque , Parent Daniel ( 2007 ). “Education and Smoking: Were Vietnam War Draft Avoiders also more Likely to Avoid Smoking?” Journal of Health Economics , 26 , 896 – 926 . Google Scholar CrossRef Search ADS PubMed Hansen Karsten T. , Heckman James J. , Mullen Kathleen J. ( 2004 ). “The Effect of Schooling and Ability on Achievement Test Scores.” Journal of Econometrics , 121 , 39 – 98 . Google Scholar CrossRef Search ADS Heckman J. J. , Lochner L. J. , Todd P. E. ( 1999 ). “Earnings Equations and Rates of Return: The Mincer Equation and Beyond.” In Handbook of the Economics of Education , Vol. 1 , edited by Hanushek E. , Welch F. . Elsevier . Heckman James J. ( 1990 ). “Varieties of Selection Bias.” American Economic Review , 80 ( 2 ), 313 – 318 . Heckman James J. , Pinto Rodrigo , Savelyev Peter ( 2013 ). “Understanding the Mechanisms through Which an Influential Early Childhood Program Boosted Adult Outcomes.” American Economic Review , 103 ( 6 ), 2052 – 2086 . Google Scholar CrossRef Search ADS PubMed Heckman James J. , Urzua Sergio , Vytlacil Edward J. ( 2006 ). “Understanding Instrumental Variables in Models with Essential Heterogeneity.” The Review of Economics and Statistics , 88 , 389 – 432 . Google Scholar CrossRef Search ADS Heckman James J. , Vytlacil Edward J. 
( 2005 ). “Structural Equations, Treatment Effects, and Econometric Policy Evaluation.” Econometrica , 73 , 669 – 738 . Google Scholar CrossRef Search ADS Heckman James J. , Vytlacil Edward J. ( 2007 ). “Econometric Evaluation of Social Programs, Part II: Using the Marginal Treatment Effect to Organize Alternative Econometric Estimators to Evaluate Social Programs, and to Forecast their Effects in New.” In Handbook of Econometrics , Vol. 6 , edited by Heckman J. J. , Leamer E. E. . Elsevier . Google Scholar CrossRef Search ADS Jürges Hendrik , Reinhold Steffen , Salm Martin ( 2011 ). “Does Schooling Affect Health Behavior? Evidence from the Educational Expansion in Western Germany.” Economics of Education Review , 30 , 862 – 872 . Google Scholar CrossRef Search ADS Kamhöfer Daniel , Schmitz Hendrik ( 2016 ). “Reanalyzing Zero Returns to Education in Germany.” Journal of Applied Econometrics , 31 , 912 – 919 . Google Scholar CrossRef Search ADS Kamhöfer Daniel , Schmitz Hendrik , Westphal Matthias ( 2015 ). “Heterogeneity in Marginal Non-monetary Returns to Higher Education.” Tech. rep. , Ruhr Economic Papers , RWI Essen , No. 591 . Google Scholar CrossRef Search ADS Lang Frieder , Weiss David , Stocker Andreas , Rosenbladt Bernhard von ( 2007 ). “The Returns to Cognitive Abilities and Personality Traits in Germany.” Schmollers Jahrbuch: Journal of Applied Social Science Studies/Zeitschrift für Wirtschafts- und Sozialwissenschaften , 127 , 183 – 192 . Lengerer Andrea , Schroedter Julia , Boehle Mara , Hubert Tobias , Wolf Christof ( 2008 ). “Harmonisierung der Mikrozensen 1962 bis 2005.” GESIS-Methodenbericht 12/2008 . GESIS–Leibniz Institute for the Social Sciences , German Microdata Lab, Mannheim . LIfBi ( 2011 ). “Starting Cohort 6 Main Study 2010/11 (B67) Adults Information on the Competence Test.” Tech. rep. , Leibniz Institute for Educational Trajectories (LIfBi) – National Educational Panel Study . LIfBi ( 2015 ). 
“Startkohorte 6: Erwachsene (SC6) – Studienübersicht Wellen 1 bis 5.” Tech. rep. , Leibniz Institute for Educational Trajectories (LIfBi) – National Educational Panel Study . Mazumder Bhashkar ( 2008 ). “Does Education Improve Health? A Reexamination of the Evidence from Compulsory Schooling Laws.” Economic Perspectives , 32 , 2 – 16 . Meng Xiangfei , D’Arcy Carl ( 2012 ). “Education and Dementia in the Context of the Cognitive Reserve Hypothesis: A Systematic Review with Meta-Analyses and Qualitative Analyses.” PLoS ONE , 7 , e38268 . Google Scholar CrossRef Search ADS PubMed Nybom Martin ( 2017 ). “The Distribution of Lifetime Earnings Returns to College.” Journal of Labor Economics , 35 , 903 – 952 . Google Scholar CrossRef Search ADS OECD ( 2015a ). “Education Policy Outlook 2015: Germany.” Report, Organisation for Economic Co-operation and Development (OECD) . OECD ( 2015b ). “Education Policy Outlook 2015: Making Reforms Happen.” Report, Organisation for Economic Co-operation and Development (OECD) . Oreopoulos Philip , Petronijevic Uros ( 2013 ). “Making College Worth It: A Review of the Returns to Higher Education.” The Future of Children , 23 , 41 – 65 . Google Scholar CrossRef Search ADS PubMed Oreopoulos Philip , Salvanes Kjell ( 2011 ). “Priceless: The Nonpecuniary Benefits of Schooling.” Journal of Economic Perspectives , 25 ( 3 ), 159 – 184 . Google Scholar CrossRef Search ADS Picht Georg ( 1964 ). Die deutsche Bildungskatastrophe: Analyse und Dokumentation . Walter Verlag . Pischke Jörn-Steffen , Wachter Till von ( 2008 ). “Zero Returns to Compulsory Schooling in Germany: Evidence and Interpretation.” The Review of Economics and Statistics , 90 , 592 – 598 . Google Scholar CrossRef Search ADS Robinson Peter M ( 1988 ). “Root-N-Consistent Semiparametric Regression.” Econometrica , 56 , 931 – 954 . Google Scholar CrossRef Search ADS Rohwedder Susann , Willis Robert J. ( 2010 ). “Mental Retirement.” Journal of Economic Perspectives , 24 , 119 – 138 . 
Google Scholar CrossRef Search ADS PubMed Salthouse Timothy A. ( 2006 ). “Mental Exercise and Mental Aging: Evaluating the Validity of the “Use It or Lose It” Hypothesis.” Perspectives on Psychological Science , 1 , 68 – 87 . Google Scholar CrossRef Search ADS PubMed Schneeweis Nicole , Skirbekk Vegard , Winter-Ebmer Rudolf ( 2014 ). “Does Education Improve Cognitive Performance Four Decades After School Completion?” Demography , 51 , 619 – 643 . Google Scholar CrossRef Search ADS PubMed Stephens Melvin Jr , Yang Dou-Yan ( 2014 ). “Compulsory Education and the Benefits of Schooling.” American Economic Review , 104 ( 6 ), 1777 – 1792 . Google Scholar CrossRef Search ADS Stern Yaakov ( 2012 ). “Cognitive Reserve in Ageing and Alzheimer’s Disease.” The Lancet Neurology , 11 , 1006 – 1012 . Google Scholar CrossRef Search ADS PubMed Stern Yaakov , Albert Steven , Tang Ming-Xin , Tsai Wei-Yen ( 1999 ). “Rate of Memory Decline in AD is Related to Education and Occupation: Cognitive Reserve?” Neurology , 53 , 1942 – 1947 . Google Scholar CrossRef Search ADS PubMed Vytlacil Edward ( 2002 ). “Independence, Monotonicity, and Latent Index Models: An Equivalence Result.” Econometrica , 70 , 331 – 341 . Google Scholar CrossRef Search ADS Weinert S. , Artelt C. , Prenzel M. , Senkbeil M. , Ehmke T. , Carstensen C. ( 2011 ). “Development of Competencies across the Life Span.” Zeitschrift für Erziehungswissenschaft , 14 , 67 – 86 . Google Scholar CrossRef Search ADS Weisser Ansgar ( 2005 ). “18. Juli 1961 – Entscheidung zur Gründung der Ruhr-Universität Bochum.” Tech. rep. , Internet-Portal Westfälische Geschichte , http://www.westfaelische-geschichte.de/web495 . Zimmermann Stefan , Artelt Cordula , Weinert Sabine ( 2014 ). “The Assessment of Reading Speed in Adults and First-Year Students.” Tech. rep. , Leibniz Institute for Educational Trajectories (LIfBi) – National Educational Panel Study . Supplementary Data Supplementary data are available at JEEA online. 
© The Author(s) 2018. Published by Oxford University Press on behalf of European Economic Association. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com

### Journal

Journal of the European Economic Association, Oxford University Press

Published: Feb 2, 2018
