The thief’s wages: theft and human capital development

Abstract

In this paper, a model is developed to investigate whether theft can be economically rational. It is shown that heterogeneity in capital accumulation rates (or ‘learning ability’) cannot create any noticeable difference in incentives to steal. Further, heterogeneity in instantaneous opportunity cost is both too low and runs in the wrong direction to have any explanatory role. However, heterogeneity in discount rates in combination with differences in initial human capital can create an incentive for theft. The model is calibrated from the National Longitudinal Study of Youth 1997 with data from 1997 to 2011.

1. Introduction

The rational agent model can explain variation in criminal activity across individuals with two different types of heterogeneity. One potential explanation is that criminals face objectively different costs and benefits from crime than non-criminals. For example, criminals may systematically lack access to employment opportunities, and turn to crime as a partial or total substitute for legitimate earnings. The other general explanation is that criminals have systematically different preference structures than non-criminals. In particular, it may be the case that criminals are more present-oriented and less patient than non-criminals.

In this paper I develop a model that integrates human capital development and criminal behaviour choices as a way to compare the predictions of these two types of heterogeneity. I look at theft only (excluding violent crime or drug dealing) in an effort to focus on economic trade-offs across time. Working with three parameters that can drive variation in human capital production and earnings (initial human capital, human capital growth, and time preference), I analyse how variation in each might influence an individual’s incentives to commit theft.
Using parameterizations consistent with previous literature and the National Longitudinal Study of Youth (NLSY) data, I use the model to understand how these parameters affect hourly earnings (a measure of instantaneous opportunity cost) as well as discounted present value of future earnings. The model builds on previous work in this area, but differs in several significant ways. As with Lochner (2004), I add criminal behaviour choices to an adjusted Ben-Porath model that looks at trade-offs between training and schooling on one side against work on the other. In contrast, I focus on only the crime of theft and simplify the model. This allows me to make precise estimates of how thieves differ from non-thieves in several core human capital parameters, something not done in previous literature on crime and human capital development. More generally, the exercise illuminates how different traits (reflected in the parameters) interact to influence theft within a rational agent model. The first specific result, coming from a simple analysis of the model, is that differences in human capital growth, what we might think of as ‘potential’, fail to create any difference in incentives for individuals in their early adulthood, the period of peak property crime offences. Individuals with lower capacity for human capital development, even though they will have significantly lower future income, are not predicted to face lower present value of future income and in fact are predicted to have higher hourly earnings in early life. Moving beyond this theoretical analysis, I then estimate the parameters of the model using generalized method of moments (GMM), using wages and annual income of male respondents in the NLSY 1997 cohort from 1997 to 2011. Across a range of specifications, estimated initial human capital for thieves is roughly $0.60–0.67/hour less than non-thieves (honest individuals). For annual income, the estimated difference is $2,200–2,450/year. 
Similarly, the estimated discount rates for thieves are about 0.3–0.4% lower than those of non-thieves. A third result comes from using the model with the data of the first several rounds of the NLSY, during the period of peak property crime activity: wages, income and employment levels are the same if not higher for thieves than for non-thieves (honest individuals). This is significant evidence against explanations for theft that assume lower instantaneous opportunity cost, or that treat theft as a substitute for legitimate labour market opportunities. The difference in earnings and employment shows up in the mid- to late-twenties, after virtually all self-reported thieves have stopped committing property crime (the empirical observation, without theoretical analysis, can also be found in Williams, 2015).

The combination of low initial human capital and high impatience suggested by wage development is also consistent with other measures in the data. Thieves have lower average standardized test scores and worse average mental health than non-thieves. They show less stability over jobs (on average having more employers) and are more likely to repeat grades, even when controlling for a range of measures of socioeconomic status and ability.

Beyond the literature on theft and crime, the analysis here has a broader utility, in suggesting that individuals with significantly limited human capital may have equal or better labour market outcomes in their early careers. While thieves show no difference in wages and hours worked at age 20, a range of parameterizations suggest that by age 40 there could be a gap of 20% or more in income. Analysis of labour market outcomes that focuses on individuals in their twenties (for example, Angrist et al., 2010) may be missing significant long-term differences.

The paper proceeds as follows.
In Section 2 I review the literature on earnings from crime and the honest earnings of criminals, as well as the literature on human capital. In Section 3 I develop and solve a model that links the decision to steal with the decision to invest in human capital. I then look at what the model predicts across a range of parameterizations. In Section 4 I review the data from the NLSY. In Section 5 I estimate the model’s parameters using GMM based on data from the NLSY. In Section 6 I summarize the results and discuss implications. Appendices A through C outline various technical issues.

2. Literature

2.1 Incentives for crime

There has been substantial debate for decades as to how to use the rational agent model to understand criminal behaviour. Some economists have focused on crime as a substitute for legal work, beginning with Becker (1968) and Ehrlich (1973). The labour economist Richard Freeman is perhaps the researcher of the past few decades who has done the most to develop this idea empirically (Freeman, 1991, 1999). A range of researchers have worked with individual-level data on actual wages and criminal behaviour, and have found some evidence in support of the idea (Grogger, 1998; Lochner, 2004). A number of papers work at the aggregate level, trying to find links between unemployment and crime, and have found a limited link (Gould et al., 2002). Helpful literature reviews include Piehl (1998) and Lin (2008).

Criminologists have been more skeptical of the idea of crime as a labour substitute, and a number of researchers in that field find little or no evidence. As Wilson and Abrahamse (1992) argue, while a single crime can be a rational choice, given a low probability of capture (‘[t]he wonder is that more people don’t steal’), a criminal career seems not to pay off at all. One of the dominant theories of crime, outlined in Gottfredson and Hirschi (1990), can be summarized as arguing that criminals simply have less self-control.
Even a number of economists have found evidence directly contradicting the idea of crime as a substitute for labour. Levitt and Venkatesh (2000) examine the revenues and wages provided by drug-dealing in a gang in Chicago and conclude ‘it is difficult (but not impossible) to reconcile the behaviour of the gang members with an optimizing economic model without assuming nonstandard preferences or bringing in social/nonpecuniary benefits of gang participation’. Lee and McCrary (2005) look at the deterrent power of prison sentences and conclude ‘criminal behavior – at least for the kinds of crimes that we focus on – could be thought of as the consequence of a self-control problem and a taste for immediate gratification’.

Many researchers have found that criminals actually have slightly higher wages than non-criminals, especially in early periods. Specifically, Nagin and Waldfogel (1995) find that conviction leads to greater instability but also higher pay. They reference West and Farrington (1977) who noted a:

… tendency for delinquents and especially recidivists to take up laboring or unskilled occupations offering relatively high rates of pay for beginners, but with relatively few prospects of long-term advancement. Non-delinquents were more likely to defer immediate material rewards for the sake of obtaining apprenticeships or training for skilled work.

Paternoster et al. (2003) provide an excellent review of the literature on the link between early work and antisocial behaviour. They establish a strong correlation between intensive work in adolescence and theft (and other behaviours), but show that unobserved heterogeneity appears to be the cause (see also Holzman, 1982; Brame et al., 2004; Apel et al., 2008, Table 4).

All in all, the evidence is contradictory. While certain crimes, in certain places, may make sense as a source of income (drug-dealing, in particular), there is no clear-cut case that crime ‘pays’, particularly for theft.
Even as there is debate about how much criminals earn, there is also debate about the long-term costs of a criminal record. Freeman (1991) finds that imprisonment reduces weeks worked by somewhere between eight and 20 weeks per year, and reduces the probability of work by 15% to 30%. Grogger (1995) and Nagin and Waldfogel (1995) (cited above) are much more cautious; both find limited effects on wages and employment (in some cases positive).

An important stylized fact is that criminal activity is particularly intense during adolescence, and individuals are actively engaged in property crime for extremely short periods, usually less than a year (Williams, 2015). This adolescent aspect of crime, and particularly property crime, has been carefully documented but there is no consensus explanation as to why it occurs (Levitt, 1997; Grogger, 1998; Lee and McCrary, 2005; Levitt and Lochner, 2001). Overall, the rational agent model has shown significant value in understanding criminal behaviour, but there are important gaps in our knowledge.

2.2 Human capital and crime

A number of researchers have developed models looking at different types of human capital and the interaction of education, work and criminal behaviour, for example Merlo and Wolpin (2009).¹ The paper that most resembles this one is Lochner (2004), in that both use a human capital model based on the Ben-Porath model and then test this model against NLSY data. It is worth discussing the similarities and differences in detail.

Both papers work with the Ben-Porath model of human capital development, and incorporate choices to commit criminal activity. The key differences stem from a difference in approach: this paper uses a simpler human capital model, focuses purely on theft, and takes advantage of the additional details provided by the NLSY 1997 in both modelling and estimating the logic of theft choices.
The simplified model of human capital development satisfies both Kuhn-Tucker necessity and Kuhn-Tucker sufficiency allowing me to identify a single optimum. This allows me to estimate the differences in average values of key parameters between thieves and non-thieves, something not done in Lochner (2004). The choice to focus on theft only, to the exclusion of other crimes, is based on the very different patterns that can be seen in violent crime, property crime and drug dealing. All three peak at around 20 years of age, but property crime peaks earlier and drops off very rapidly while the other two fall much more slowly. Moreover, most perpetrators of violent crime do it very rarely, while during periods of activity thieves and drug dealers average multiple offences in a year. Drug dealing and theft offer direct financial rewards, while violent crime does not, and both theft and violent crime are directly predatory which is not true of drug dealing. The choice to focus on theft increases my ability to make precise estimates without having to make additional strong assumptions. Additionally, I use the NLSY 1997 which provides more detailed data as opposed to the NLSY 1979 that Lochner uses. Specifically, the NLSY 1997 asks about property crime every year, while the 1979 only asks in a single year. From this, it becomes clear that virtually all property criminals in the NLSY 1997 are active in only one or two years, and show very low earnings. Moreover, punishment and imprisonment rates are low for thieves at about 20%. The data informs a number of differences in modelling and estimation. To match the data, I model theft as a ‘negative lottery’ that has a strong stochastic element. Because of the low rates of punishment and short periods of imprisonment, I model expected punishment as a potential disutility with a single scalar value, while Lochner focuses on loss of consumption due to time in prison. 
Because within the NLSY virtually no respondents report theft after the age of 21, I focus on the decision to steal in adolescence/early adulthood.

In contrast, Lochner (2004) develops a more general understanding of the link between all criminal behaviour and human capital development. He does not focus on a single optimum and views all crime (property crime, violent crime, and drug dealing) as providing fungible utility. Instead of estimating specific values he generates important but broad insights about trade-offs in the timing of criminal behaviour, work, and human capital investment.

Our conclusions are similar, in that we both place significant emphasis on differences in initial human capital levels. However, the combination of more comprehensive data and my modelling choices allows me to make several contributions not seen in Lochner (2004) or elsewhere in the literature. First, I show that heterogeneity in capital accumulation rates (or ‘learning ability’) cannot create any noticeable difference in incentives to steal. Second, I show that heterogeneity in instantaneous opportunity cost is both too low and runs in the wrong direction to have any explanatory role. Third, I establish that heterogeneity in discount rates works in combination with differences in initial human capital to create the incentive for theft. Finally, I estimate average values for differences in key parameters between thieves and non-thieves (‘honest’ types) in the NLSY 1997 data.

3. Model

The model focuses on understanding the decision to steal in the context of overall career choices and lifetime income. Thus, one component of the model is a human capital investment problem. The other component of the model is a decision to steal or not to steal.
The decision to steal is modelled in a way that might be described as a ‘negative lottery’, where theft is largely about the trade-off between immediate returns and long-term risks.² Partly because it fits the data well, and partly for reasons of mathematical simplicity, the human capital investment decision and the decision to steal are not incorporated in a single equation, but instead are kept separate. Moreover, because the individual’s decisions in these two areas are made at separate points in time, interactions are limited and the effects are tractable.

Each agent lives for T periods. At the beginning of each period (indexed 1 to T) agents set their human capital investment levels, based on their parameters (human capital level, discount rate, human capital production parameters). During the first period they have the opportunity to steal: they are exposed to a criminal opportunity, and make the binary choice to take the opportunity or to pass on it. If they choose to steal, there is a risk of punishment by the end of that period. The punishment inflicts direct disutility as well as a negative shock to human capital levels, requiring revised production choices. If captured and punished, an agent’s human capital is ‘tarnished’ by the end of period 1. Human capital investment and earnings continue until the last period T. Table 1 briefly summarizes the process.

Table 1.
Model timeline

| Period t = | 1 | 2 | … | T |
|---|---|---|---|---|
| Decisions | Choose investment for period 1; Steal/Don’t Steal | Choose investment for period 2 | Choose investment for period 3 | |
| Payouts | All theft payouts and punishments; Earn wage for period 1 | Earn wage for period 2 | | Earn wage for period T |

Source: This table outlines the model discussed in the text.

The decision to allow theft in only a single period fits the NLSY data very closely (as discussed above, virtually no respondents report theft after the age of 21) as well as the literature on patterns of theft.³

3.1 Human capital component

There is a population of N agents, indexed \(i \in \{1,\ldots,N\}\), who are maximizing utility over T periods. At the outset of each period t each agent chooses what percentage of their human capital, K, to hold back from the market as ‘investment’, which I denote with the variable \(x_{it} \in [0,1]\). The value of \(x_{it}\) ranges from 1 for a full-time student with no job to 0 for a full-time worker at a job receiving no training, and would be somewhere in the range (0,1) if i is receiving some wage compensation but is also receiving some training, formal or informal. The value of future earnings is discounted at the rate \(\beta\),⁴ and the investment grows according to the expression \(d_0 (x_{it}K_{it})^{d_1}\), where \(\beta, d_0, d_1\) vary across individuals. The problem in period 1 is thus formally:

\[
\max_{\{x_{it}\}_{1}^{T}} \; \sum_{t=1}^{T} E_1\left(\beta_i^{t-1}(1-x_{it})K_{it}\right)
\quad \text{s.t.} \quad
K_{i,t+1} = d_{0i}\,(x_{it}K_{it})^{d_{1i}} + K_{it}, \qquad K_{i1} = \hat{K}_i \tag{1}
\]

That is, the agent i needs to find an optimal sequence of investment decisions \(\{x_{it}\}_{1}^{T}\) given the parameter values and his initial level of human capital \(\hat{K}_i\).

Four brief points should be made about the model at this juncture. First, this is a simplified variant of the Ben-Porath human capital model, which in turn is a variant of the standard dynamic optimization capital accumulation model, with an added wrinkle that the capital is not consumed. Second, as with Ben-Porath (1967) and Lochner (2004), this model views all human capital investment as a fungible component of \(x_{it}\); an hour of schooling and an hour of job training look exactly the same. Even jobs that pay a salary, where there is no explicit, separate training but only a general, ongoing on-the-job training, are modelled as an increase in \(x_{it}\) that leads to a partial reduction in salary in exchange for a chance to learn. Third, the model does not include a depreciation parameter; this decision is made for simplicity, and because for the population we will be focusing on, youths age 17 to 31, all evidence suggests that human capital growth swamps human capital decay and it is difficult to separate any decline in human capital from a general failure to invest in human capital. Fourth, there is individual heterogeneity in human capital and time preference parameters. We can think of each of these parameters, for each individual, as a draw from a continuous distribution.

At time \(t>1\), in response to any stochastic shocks (to be discussed below), agent i can reassess his level of human capital \(K_{it}\) and reset his planned sequence of investment \(\{x_{it}\}_{t}^{T}\). In the context of this paper, the only stochastic shock possible would be ‘tarnishing’ due to punishment by the end of period 1.

3.2 Criminal opportunities component

The second component of the model involves criminal activity.
There is a single point in time when individuals have the opportunity to steal: during period 1 (after the investment decisions of period 1, but before reinvesting for period 2), the agent is offered one opportunity to steal. For the remainder of the agent’s life, periods 2 through T, there will be no further theft opportunities, and no need for any change of behaviour.

The opportunity can be thought of as a ‘negative lottery’. A standard lottery asks you to pay out a sum of money in exchange for the possibility of winning a positive sum in the future; in contrast, the theft opportunity offers the agent a positive sum of money in exchange for the possibility of a future loss. The cash value of the opportunity is denoted \(r_i\) and has a random distribution \(f(r_i;\theta)\), where \(f(\cdot)\) has support \((0,\infty)\) and is decreasing in \(r_i\). To simplify matters, I assume that when the opportunity is presented the agent can perfectly perceive \(r_i\) and has no risk of being stopped or captured while committing the theft, only after consuming the value \(r_i\).

His decision to steal or not is a very simple cost/benefit analysis of the two paths. If he does not take opportunity \(r_i\), his expected payoff at the beginning of period 2 is simply the future stream of revenue from human capital investment and ‘rental’ (as outlined in eq. 1):

\[
\sum_{t=2}^{T} E_1\left(\beta_i^{t-1}(1-x_{it})K_{it}\right) \tag{2}
\]

If he does take the opportunity \(r_i\), his expected payoff is:

\[
r_i + (1-p)\left(\sum_{t=2}^{T}\beta_i^{t-1}(1-x_{it})K_{it}\right)
+ p\left(\sum_{t=2}^{T}\beta_i^{t-1}\left(1-x^{(\tau)}_{it}\right)K^{(\tau)}_{it}\right)
+ \beta_i^{\delta}\,p\,D + \gamma\,(1-x_{i1})\hat{K}_i \tag{3}
\]

where:

\(p\) is the probability of being caught and punished by the criminal justice system. For simplicity, I do not try to separate out the impact of being arrested vs being charged, being charged vs being convicted, being convicted vs being sentenced, etc.

\(D<0\) is the disutility from any detection, capture or punishment (including any social stigma from an arrest record, a conviction record, etc.).

\(\gamma(1-x_{i1})\hat{K}_i<0\) is the opportunity cost of the crime, with \(|\gamma|\) the amount of time the crime takes (as a fraction of the five-year period of work).

\((\tau)\) (\(\tau\) for ‘tarnishing’) marks the shift in his human capital \(K_{it}\) and his human capital investment strategy \(x_{it}\) if he is caught and develops a criminal justice record that limits employers’ willingness to hire and pay him. Thus, \(\tau\in(0,1)\) and \(K^{(\tau)}_{it}=(1-\tau)K_{it}\).

\(\delta\) is the length of time, less than a year, between the commission of the crime and the beginning of punishment, so \(\delta\in(0,1)\).

A few other modelling decisions are worth brief comments:

Theft opportunities are modelled as a random process, where the distribution is the same across all individuals. The logic behind this is that per-act earnings from theft in the NLSY 1997 follow a roughly lognormal distribution. Average earnings per theft are low, and negatively correlated with number of acts (i.e. people who steal frequently tend to make less than people who steal rarely). There does not seem to be any stability in per-act earnings from year to year, and correlations in earnings from theft are low between siblings (for a fuller discussion of the patterns in the NLSY 1997 data, see Williams, 2015). The NLSY patterns are extremely consistent with the bulk of research into earnings from crime, whether data is from criminal self-reports, police reports or victimization surveys (see Gottfredson and Hirschi, 1990, part I, chapter 2, or reports such as Catalano, 2010 or Doyle, 2012). While it is difficult to prove a negative, there does not appear to be any meaningful variation of ‘theft ability’ in populations the size of the NLSY panel.

Given that individual human capital and time preference parameters vary randomly as do theft opportunities, we can thus view the variation in theft opportunities as providing a stochastic error term.
Assuming that theft opportunities have the same mean across all individuals, we can thus look for different means in the human capital and time preference parameters for the explanation of why some individuals steal and others do not. As will be discussed further below, this matches the data well; while individuals with lower human capital values (either in standardized test scores or mental health) or with lower family resources (measured by income of family of origin) are statistically significantly more likely to commit theft, the difference is on the order of 10–20 percentage points. Males with particularly high human capital or family resources have at least a 10% probability of committing theft, while males with particularly low values have at worst a 31% probability of committing theft.

Finally, one could potentially develop a process by which expected risk of punishment is updated over time, or by which an initial tarnishing (from a criminal record) can be worsened by a second offence. A dynamic model, where the decision to commit a crime can be made in many or even all periods, may make sense for the very small group of individuals who are ‘hard cases’ and commit theft into their thirties or forties. However, theft behaviour is generally very short-lived for the vast majority of thieves (roughly 99% of all theft occurs by the age of 21), and so a more complicated dynamic model is unlikely to yield significantly different results in the case of the NLSY. It would be an interesting extension of the model, to be estimated with a dataset that included more ‘professional’ thieves.⁵

3.3 Solving the model

We begin by focusing on the decision to steal during period 1, as outlined in eqs 2 and 3 above, and then work backward to solve the human capital optimization problem of eq. 1.

3.3.1 Criminal opportunities

It is advantageous to put \(r_i\) on one side, as the direct benefit of stealing, and put all the other terms on the other side, simplifying a bit, to measure the cost of stealing. We then get an indicator function c that becomes the optimal policy in response to opportunity \(r_i\):

\[
c = \begin{cases}
1 & \text{if } r_i \ge p\displaystyle\sum_{t=2}^{T}\beta_i^{t-1}\left((1-x_{it})K_{it} - \left(1-x^{(\tau)}_{it}\right)K^{(\tau)}_{it}\right) + \beta_i^{\delta}\,p\,D + \gamma\,(1-x_{i1})\hat{K}_i \\
0 & \text{else}
\end{cases} \tag{4}
\]

As constructed, the optimal choice in response to every criminal opportunity \(r_i\) is to ‘play’ c; that is, to steal if and only if the return is greater than the future cost. The probability of becoming a thief is decreasing in all variables that drive the right-hand side costs. In the empirical work, we can therefore focus on understanding how differences in \(K\), \(\beta\) and \(d_0, d_1\) lead to differences in the following three costs:

Lost earnings from developing a criminal record, the difference between earnings from ‘pure’ and ‘tarnished’ human capital: \(\sum_{t=2}^{T}\beta_i^{t-1}\left((1-x_{it})K_{it} - (1-x^{(\tau)}_{it})K^{(\tau)}_{it}\right)\);

expected future disutility from criminal justice punishment: \(\beta_i^{\delta}\,p\,D\); and

immediate opportunity cost: \(\gamma\,(1-x_{i1})K_{i1}\).

3.3.2 Human capital

Having resolved the criminal decision problem, we can then move backward to look at optimal human capital investment. This is most easily understood by first looking at the general deterministic case, and then at the special stochastic case of the first period.

The human capital optimization problem is a dynamic optimization problem, and with no uncertainty, the policy function for the optimal \(x_{it}\) (see online appendix for derivation) is:

\[
x_{it} = \frac{1}{K_{it}}\left(d_{0i}\,d_{1i}\,\beta_{it}\,\frac{1-\beta_{it}^{\,T-t}}{1-\beta_{it}}\right)^{\frac{1}{1-d_{1i}}} \tag{5}
\]

for \(t \in \{1, 2, \ldots, T\}\). As would be expected, optimal human capital investment for agent i in period t is increasing in time preference \(\beta\), in productivity of investment (\(d_0, d_1\)) and in \(T-t\), the time remaining in the agent’s career, so that investment declines as the agent ages. Perhaps a bit surprising, it is decreasing in \(K_{it}\).
The intuition for this is actually straightforward: in this case human capital is specifically marketable skills, as opposed to general potential. It is a measure of the money agent i can make right now, and so higher values of \(K_{it}\) are direct measures of opportunity cost of investment. The production parameters \(d_0, d_1\) measure the sensitivity of future growth in marketable skills to the level of investment, and are thus more closely tied to the related idea of general potential.

The uncertainty regarding theft adds one minor bit of complexity. Every individual has some non-zero probability of committing a theft. Thus, in period 1, every individual must optimize with a non-zero probability of committing a theft and hence some probability of being tarnished. The solution (see the appendix for derivation) is to make an adjustment for the stochastic component using the expression:

\[
\left(\pi(1-\tau)^{d_{1i}} + (1-\pi)\right) \tag{6}
\]

and set \(x_{i1}\) to:

\[
x_{i1} = \frac{\pi(1-\tau)^{d_{1i}} + (1-\pi)}{K_{i1}}\left(d_{0i}\,d_{1i}\,\beta_i\,\frac{1-\beta_i^{\,T-1}}{1-\beta_i}\right)^{\frac{1}{1-d_{1i}}} \tag{7}
\]

where \(\pi\) is the ex ante probability of committing a theft and being caught, that is: (i) the probability that individual i seizes his theft opportunity, multiplied by (ii) the probability of being caught, p. With advance knowledge of the distribution of opportunities the individual can determine the probability of a draw of \(r_i\) that would induce theft, and hence compute \(\pi\). As will be discussed further below, it appears that for NLSY 1997 males \(\pi\) takes values between 0.03 and 0.06. Because empirical sources strongly suggest the odds of punishment and the effect of tarnishing are low, we can put a likely bound on the stochastic adjustment expression of eq. 6 as varying from 0.97 to 0.99. Moreover, because variation depends on unobservables, there is no unbiased way to estimate it. In the work on estimation, I use a range of probable values to look at effects.
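The rules in eqs 4 to 7 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the estimation code behind the paper: the function names are hypothetical, \(\beta\) is treated as constant over time, the horizon of T = 9 periods used in the example is an assumption, the cap of the investment share at 1 is added here to respect \(x_{it}\in[0,1]\), and the punishment and opportunity-cost terms of eq. 4 are passed in as positive magnitudes.

```python
def investment_level(beta, d0, d1, t, T):
    """Amount x_it * K_it implied by eq. 5 (it does not depend on K_it)."""
    annuity = beta * (1.0 - beta ** (T - t)) / (1.0 - beta)
    return (d0 * d1 * annuity) ** (1.0 / (1.0 - d1))


def investment_share(K, beta, d0, d1, t, T):
    """Eq. 5: share of human capital held back, capped at 1 (cap is an assumption)."""
    return min(1.0, investment_level(beta, d0, d1, t, T) / K)


def tarnish_adjustment(pi, tau, d1):
    """Eq. 6: adjustment for the chance pi of stealing and being caught."""
    return pi * (1.0 - tau) ** d1 + (1.0 - pi)


def first_period_share(K1, beta, d0, d1, T, pi, tau):
    """Eq. 7: the period-1 share is the eq. 5 share scaled by the eq. 6 factor."""
    level = investment_level(beta, d0, d1, 1, T)
    return min(1.0, tarnish_adjustment(pi, tau, d1) * level / K1)


def steals(r, p, lost_earnings_pv, beta, delta, punishment, opportunity_cost):
    """Eq. 4 with punishment and opportunity cost given as positive magnitudes
    (an assumption of this sketch): steal iff the payoff r covers expected costs."""
    expected_cost = p * lost_earnings_pv + beta ** delta * p * punishment + opportunity_cost
    return r >= expected_cost


if __name__ == "__main__":
    # Baseline parameter values quoted in the text; T = 9 is an assumed horizon.
    beta, d0, d1, K1, T = 0.82, 0.308, 0.948, 15.0, 9
    print("x at t=1, no theft risk:", investment_share(K1, beta, d0, d1, 1, T))
    print("eq. 6 factor (pi=0.06, tau=0.3):", tarnish_adjustment(0.06, 0.3, d1))
    print("x at t=1 with theft risk:", first_period_share(K1, beta, d0, d1, T, 0.06, 0.3))
```

Splitting out `investment_level` makes visible that, in eq. 5, the amount invested does not depend on \(K_{it}\); only the share \(x_{it}\) does, which is why the share falls as \(K_{it}\) rises.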
3.4 Wage development under different parameterizations of the model

If we focus on the three key parameters that are important in both the human capital component of the model and the theft component (initial capital K̂_i, discount rate β, and human capital production parameters d0 and d1), we can see that variation in these parameters will induce predictable changes in measurable variables such as wages, hours worked and years of education. In this subsection we focus on three questions: How does variation in these parameters lead to variation in wage development? Does variation in these parameters lead to a lower wage (opportunity cost) in early adulthood? And finally, does variation lead to a lower present value of future earnings in early adulthood?

I work with a baseline case for the parameters that is consistent with other literature and the data. The specific values of the baseline case are:

$$\beta = 0.82, \qquad d_0 = 0.308, \qquad d_1 = 0.948, \qquad \hat{K}_i = 15.00$$

In Figs 1, 2 and 3 I show the development of wages within the baseline and with illustrative variants (low initial human capital, low patience, and low human capital growth parameters). (Further discussion of how the parameters were selected can be found in Section 5 and in the online Appendix 5.1.)

Fig. 1. Baseline and Low K̂_i
Notes: Wage development in the human capital model outlined in the text. The baseline parameterization is compared with the low K̂_i parameterization (in the baseline, initial K̂_i is $15; in the low K̂_i parameterization it is $14). As can be seen, wages are shifted systematically downward in the low K̂_i version.

Fig. 2. Baseline and Low β_i
Notes: Wage development in the human capital model outlined in the text. The baseline parameterization with an annual discount factor of 0.961 is compared with the low β parameterization (where β is 0.959). As can be seen, wages are extremely similar, but start slightly higher in the low β version, dropping relative to the baseline. This is because human capital investment yields a lower discounted value for a low β individual.

Fig. 3. Baseline and low d0, d1
Notes: Wage development in the human capital model outlined in the text. The baseline parameterization, where d0 = 0.308 and d1 = 0.948, is compared with the low d0 and d1 parameterization (where d0 = 0.28 and d1 = 0.92). As can be seen, wages start slightly higher in the low d0 and d1 version, dropping relative to the baseline. This is because human capital investment yields a lower return for an individual with lower d0 and d1 values. Although this is far more extreme than any pattern in the data, the discounted value of lost wages is virtually identical for both individuals.

A few things are worth noting about the exercise in general, and about the different parameterizations. First, there is the general pattern of development of wages over the agent's 'lifetime' (i.e. to period T). Figs 1, 2 and 3 give a straightforward sense of the overall pattern: investment in early life, which reduces slowly, leading to a plateau by the forties.

A second important note is how variation in different parameters leads to different patterns. For instance, low initial human capital K̂_i acts to reduce the overall development of human capital K_t and wages (1 − x_t)K_t. We see roughly parallel development in wages in the baseline and the low K̂_i cases (Fig. 1). The intuition here is simple: in the case of low K̂_i, investment levels are higher, and there is some catch-up effect, but it is not worth the loss of present wages to fully catch up to the baseline.

In contrast, low discount rates⁶ (Fig. 2) and low human capital production parameters (Fig. 3) both create a 'crossing' pattern, where wages start high relative to the baseline but then develop more slowly and drop below the baseline wages. In the case of a low discount rate β or low human capital production parameters d0 and d1, the utility payoff of human capital investment is lower than in the baseline, and so human capital investment is lower over all periods. This has the effect of increasing wages in the early periods but reducing them in later periods. It is thus clear that lower initial human capital could potentially explain lower instantaneous opportunity cost for the commission of crime, but lower patience and lower human capital potential could not.
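The notes to Fig. 2 quote annual discount factors (0.961 and 0.959), whereas the model parameters are stated as five-year values (0.82 in the baseline and 0.81 in the low β case of Table 2). A minimal sketch of the conversion between the two, assuming simple geometric compounding within each five-year period; the function name is illustrative and does not come from the paper:

```python
def per_subperiod_factor(beta_five_year: float, subperiods: int) -> float:
    """Convert a five-year discount factor into the equivalent factor for a
    shorter sub-period, assuming geometric compounding (an assumption of this
    sketch, not a statement of the paper's method)."""
    return beta_five_year ** (1.0 / subperiods)


# Five-year values taken from the baseline and the low-beta case in Table 2.
baseline_annual = per_subperiod_factor(0.82, 5)
low_beta_annual = per_subperiod_factor(0.81, 5)

print(round(baseline_annual, 3), round(low_beta_annual, 3))
```

Under this assumption the two five-year values round to the annual factors quoted in the Fig. 2 notes, which shows how small the annual gap behind the 'low patience' variant is. Passing 60 instead of 5 gives the corresponding monthly factor.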
An additional note is that relatively minor changes in the discount factor β and the human capital production parameters d0 and d1 lead to very significant changes in wage development.

The next question is the impact of the parameterizations on the present value (in period 0) of lost future earnings from a criminal record. To provide a simple estimate of this, I need a sense of τ, the loss to human capital from 'tarnishing'. While there is no definite consensus on the impact of conviction on wages or employment, the highest estimates (from Freeman, 1991) are roughly a 30% drop in employment prospects, and others are much more conservative (for example, Grogger, 1995 and Nagin and Waldfogel, 1995). I compute what impact a loss of 10% of human capital after period 1 would have on the present value of earnings. I then compare this impact across the parameterizations in Table 2.

Table 2. Impact of a 10% loss in income under representative parameterizations

| Model name | β | d0 | d1 | K0 | Lost income from criminal conviction: present value of 10% loss | Difference from baseline |
|---|---|---|---|---|---|---|
| Baseline | 0.82 | 0.308 | 0.948 | 15 | $56,884 | NA |
| Low initial human capital K̂ | 0.82 | 0.308 | 0.948 | 14 | $53,092 | −$3,791 |
| Low discount rate β | 0.81 | 0.308 | 0.948 | 15 | $54,350 | −$2,534 |
| Low human capital production coefficients | 0.82 | 0.28 | 0.92 | 15 | $56,879 | −$4 |
| Low K̂ and low β | 0.81 | 0.308 | 0.948 | 14 | $50,726 | −$6,157 |

Notes: Representative parameterizations of the model, showing how a 10% loss of income from a conviction differentially impacts those with lower initial human capital, discount rates, or human capital potential. Notice that while low initial human capital and a low discount rate lead to a lower anticipated present value from a conviction, lower 'potential' (lower human capital development parameters) does not substantially vary lost future income. The combination of low initial human capital and low discount rate leads to a very substantial reduction in lost future income and is a close match to the wage progression in the data. Note that d0, d1, β are five-year values (the model works in 10 periods, representing a 50-year career length). More detail on these computations is provided in Table C1 in online Appendix C. More background on the source of parameters is given in online Appendix 5.1.

The results are striking. Both the low β and low K̂_i parameterizations do lead to greater incentives to steal; discounting all fifty years and summing, we get a total difference in lost wages ranging between $3,000 and $7,000. Interestingly, lower human capital production capacity (low values of d0, d1), even though it leads to substantially lower earnings at the end of life (periods 5–10, or ages 40–65), has effectively no impact whatsoever on the discounted present value of the individual's income. Using a very extreme and unrealistic difference in parameterizations I get at most a total difference in the discounted value of lost lifetime earnings of $4. Since differences in d0, d1 cannot help to explain any differences in incentives for theft, I spend little time on this possibility in the discussion that follows.

4. Data

In this section I begin with a general discussion of the data, including summary statistics comparing thieves and non-thieves (honest individuals). The NLSY 1997 cohort tracks 8,984 individuals from 1997 onwards. The majority of respondents, roughly three quarters, come from a simple random sample of the US youth population in 1997. The remaining one quarter of respondents come from an over-sampling of ethnic minority populations, specifically black and non-black Hispanic respondents. Respondents were evenly distributed across the ages of 12 to 16 in the initial round in 1997. The rate of attrition across years is quite low for both thieves and non-thieves.

In addition to a wide range of questions about work, earnings, education, assets, beliefs, health, family and other issues, every round of the NLSY 1997 includes a self-administered questionnaire that asks respondents about potentially compromising issues such as criminal behaviour. Of the respondents, 1,202 (approximately 12.5%) admitted to stealing items worth more than $50 at any point. Thieves are defined as individuals who reported stealing at least one item worth more than $50 by 2003.

In Table 3 I compare some basic attributes of male thieves and non-thieves in the data. There is some evidence that thieves have lower potential human capital: they come from slightly poorer homes, on average, and have lower evaluated test scores and mental health. Household income is the reported income of the household of origin in 1997. Mental health scores are from an evaluation performed in 2000 as part of the NLSY panel, with 20 being the best possible evaluation, and 0 the lowest, or least healthy.

Table 3. Summary statistics for males

| | All | Non-thieves (honest) | Thieves |
|---|---|---|---|
| Age in 1997 | 14.83 | 14.84 | 14.86 |
| (St. Dev) | (1.45) | (1.45) | (1.42) |
| Household of origin income (N=3,316) | 47,647 | 48,672 | 43,046 |
| (St. Dev) | (42,441) | (42,723) | (40,871) |
| White | 52.5% | 53.2% | 49.2% |
| Black | 25.4% | 25.1% | 27.1% |
| Hispanic | 21.2% | 20.8% | 23.1% |
| ASVAB Scores (Math and Verbal, N=3,575) | 44,518 | 46,158 | 36,793 |
| (St. Dev) | (29,656) | (29,799) | (27,716) |
| Mental Health (20 best, 0 worst, N=4,085) | 15.80 | 15.92 | 15.23 |
| (St. Dev) | (2.47) | (2.38) | (2.78) |
| Estimated Years in School to 20th Birthday (N=3,386) | | 12.7 | 11.9 |
| Hours Working to 20th Birthday (N=3,386) | | 3,342.1 | 3,615.9 |

Notes: A comparison of basic data between male thieves and non-thieves. When not otherwise given, N=4,599. While only very minor age or ethnicity differences emerge in the data, there are significant differences on measures directly related to human capital. In particular, thieves are more likely to come from a family of origin with lower income and to have lower test scores and evaluated mental health. To compare time working with time in education up to the respondent's twentieth birthday, an estimate is made of how much time they spent in school (using highest grade up through a second year of college, plus any repeated grades) combined with the NLSY's numbers for total hours working as a teenager. Source: NLSY 1997, author's computations.

Table 4. Percentage of males committing theft, by quartiles

| | First quartile | Second quartile | Third quartile | Fourth quartile |
|---|---|---|---|---|
| Household of origin income (N=3,316) | 21.6% | 20.3% | 16.6% | 14.4% |
| ASVAB Scores (Math and Verbal, N=3,575) | 24.2% | 17.9% | 16.8% | 11.2% |
| Mental Health (20 best, 0 worst, N=4,085) | 27.9% | 17.5% | 16.6% | 15.1% |
| Individuals in the same quartile for all three measures (N=220) | 31.0% | 14.6% | 14.0% | 10.9% |

Notes: The table uses three measures that show correlation with theft behaviour, and looks at the percentage of male respondents in each quartile who report theft. As can be seen, while the odds of theft fall as you move to higher quartiles, the difference in rates of theft from the most likely to the least likely groups is roughly 20 percentage points (from 31.0% to 10.9%). This supports the modelling assumptions of a strong stochastic element in theft decision-making, and estimation results that support a small difference in ex ante risk of criminal behaviour and a criminal record. Source: Author's computations, NLSY 1997, waves 1997–2011.

The differences are statistically significant. The differences that are most striking are on Armed Services Vocational Aptitude Battery (ASVAB) combined test scores and mental health, where the differences in mean are 28% and 24% of the respective standard deviations. These differences in test scores and mental health would naively lead us to expect some differences in human capital. Theft could be associated with lower K̂_i (initial capital) values, but also with lower β_i (patience).
(It is also true that they are consistent with differences in d0i, d1i, but as can be seen in the fourth set of parameters in Table 2 these cannot help to explain a greater incentive to steal, so they are ignored.)

At the bottom of Table 3 I provide a comparison of years in school and hours working for males, up to the age of 20. As can be seen, thieves spend slightly less time in school, about 0.85 less of a school year (perhaps seven to eight months). Additionally, they have worked about 300 hours more, or nearly seven weeks. In theory, this might represent approximately five to six more months of idleness among thieves (because it is likely that some education and work are concurrent, this figure is approximate).

In Table 4 I compute the probability of theft for males grouped by several observable traits that are solid proxies for either human capital or family resources: ASVAB scores, mental health and household income for family of origin. As can be seen, more than 10% of males in the quartiles associated with high human capital levels still commit theft, while no more than 31% of males in the quartiles associated with low human capital commit theft. We can see that, as discussed in the review of the model above, while mean human capital measures vary between thieves and non-thieves, there is a significant stochastic component to the decision to steal.

In Fig. 4 I show the development of hourly wages for thieves and non-thieves to 2011. The means are very similar up to the early twenties, after which thieves begin to fall below non-thieves. There is no evidence that this similarity is due to access to opportunities, attrition, non-reporting or selection bias in the NLSY; attrition rates are about the same for both thieves and non-thieves. Imprisonment appears to have an effect after the ages of 21 to 23; if we exclude men who have been imprisoned, the lines stay together to the late twenties.
Labour market participation data, as measured by the probability of reporting income from wages or salaries, present an identical picture, as seen in Fig. 5.

Fig. 4. Wage development by theft behaviour
Notes: Actual development of hourly compensation in the NLSY 1997, in 2009 dollars, for thieves and non-thieves. Notice that: (i) wages are extremely similar across all years, making any opportunity cost very difficult to substantiate; and (ii) wages are slightly higher for thieves in the teen years but then become lower in the late twenties, very much in keeping with differences in discount rates. Comparison is among males in the simple random sample. Source: NLSY 1997, males in simple random sample only, waves 1997 to 2011.

Fig. 5. Employment development by theft behaviour
Notes: Development of labour market engagement in the NLSY 1997, for thieves and non-thieves. Notice that the likelihood of employment is extremely similar up to ages 21–23 (after which point virtually no respondents report theft), supporting the pattern in the previous figure. Comparison is among males in the simple random sample.

Fig. 6. Wage development by theft behaviour
Note: Development of wages for thieves and non-thieves to age 30, using the estimated GMM parameters from Table 5 (second specification) with the model of the text.

5. GMM analysis

5.1 Overview of methodology

Having used the initial analysis of the model to focus on heterogeneity of: (i) time preference; and (ii) initial human capital, I now use the generalized method of moments (GMM) to estimate parameters of the model for thieves and non-thieves from the NLSY data. Online Appendix 5.1 provides a fuller discussion of the entire approach; in this section I focus on the basic method and overall results.

It should be emphasized that the focus of this exercise is on likely and consistent differences between those who commit theft and those who do not. Because of the nonlinearity of the model and the need to fix parameters, some standard errors are quite low and significance quite high. I would urge readers to focus on the pattern of differences in key parameters between thieves and non-thieves over the many estimations.

Following Heckman et al. (2002), I group the NLSY wage data in five-year periods: period 1, ages 17–21; period 2, ages 22–26; and period 3, ages 27–31. Within each period, I average observations of real hourly compensation and annual income for each individual, ignoring missing values. Dollar values are adjusted for inflation to 2009 dollars using the implicit price deflator for personal consumption⁷.
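The grouping step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: the function names, the data layout (a list of age and wage pairs per respondent) and the sample values are all hypothetical, and the deflator index numbers passed to `to_real` are placeholders rather than actual price data.

```python
from statistics import mean

# Age bands for the three observed five-year periods, as given in the text.
PERIODS = {1: (17, 21), 2: (22, 26), 3: (27, 31)}


def to_real(nominal, deflator, base_deflator):
    """Rescale a nominal dollar amount into base-year dollars."""
    return nominal * base_deflator / deflator


def period_averages(observations):
    """Average wages within each period, ignoring missing values.

    `observations` is a list of (age, wage_or_None) pairs for one respondent.
    A period with no valid observation is reported as None.
    """
    averages = {}
    for period, (low, high) in PERIODS.items():
        values = [w for age, w in observations
                  if low <= age <= high and w is not None]
        averages[period] = mean(values) if values else None
    return averages


# Hypothetical respondent with gaps in the wage history.
sample = [(17, 7.0), (18, None), (19, 9.0), (22, 12.0), (23, 14.0), (28, None)]
averages = period_averages(sample)
print(averages)
```

Averaging within periods in this way means a respondent contributes to a period whenever at least one wave in that age band has a valid observation, which is why most respondents are observed in some or all of the three periods.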
Based on the 10-period model, we are able to observe some or all of three periods for most NLSY respondents. This is sufficient to get estimates of model parameters, and precise estimates of the differences between thieves and non-thieves. The core GMM model is thus three equations:

$$
\begin{aligned}
0 &= W_{1i} - \hat{K} + \left(d_0 d_1 \beta\,\frac{1-\beta^{9}}{1-\beta}\right)^{\frac{1}{1-d_1}} \\
0 &= W_{2i} - d_0\left[\left(d_0 d_1 \beta\,\frac{1-\beta^{9}}{1-\beta}\right)^{\frac{1}{1-d_1}}\right]^{d_1} - \hat{K} + \left(d_0 d_1 \beta\,\frac{1-\beta^{8}}{1-\beta}\right)^{\frac{1}{1-d_1}} \\
0 &= W_{3i} - d_0\left[\left(d_0 d_1 \beta\,\frac{1-\beta^{8}}{1-\beta}\right)^{\frac{1}{1-d_1}}\right]^{d_1} - d_0\left[\left(d_0 d_1 \beta\,\frac{1-\beta^{9}}{1-\beta}\right)^{\frac{1}{1-d_1}}\right]^{d_1} - \hat{K} + \left(d_0 d_1 \beta\,\frac{1-\beta^{7}}{1-\beta}\right)^{\frac{1}{1-d_1}}
\end{aligned}
$$

where W1, W2, W3 are a measure of wages or earnings for ages 17–21, 22–26 and 27–31, respectively. In the analysis that follows I use hourly compensation and annual income, getting very similar results for both.

As discussed in online Appendix 5.1, I have experimented with running the regressions with various parameters set, focusing on approaches where β is set. As is standard in the literature (see for example Heckman et al., 1998, page 27), I begin the estimation process with some parameters derived from other sources, in particular the value of the discount rate parameter β. As a robustness check I rerun the estimation with a range of values. Results are stable within a range of values that is consistent with the literature. In particular, relative differences between thieves and non-thieves are extremely stable.⁸ Further details can be seen in the online Appendix.

Having found a set of values that is consistent with both the data and the literature, it now becomes possible to estimate the difference in K0 and β between non-thieves (honest individuals) and thieves. I now run identical equations for thieves and non-thieves, which creates six equations (three for each category). By using a range of reasonable values for d1 and then d0, d1, I was able to get estimates of K̂_h, K̂_t, β_h, β_t that showed stable and robust differences between thieves and non-thieves. For a fuller discussion of the process and additional regression output, see online Appendix 5.1.
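To make the structure of the three moment conditions easier to follow, the sketch below evaluates the wages they imply for a given parameter vector. It rests on one reading of the equations, which is an assumption of this sketch: the bracketed power term is treated as the investment level in period t (with 10 − t periods remaining), human capital grows by d0 times investment raised to d1, and the wage is human capital minus investment. All function names are illustrative, and the parameter values are the baseline and low K̂ cases from Section 3.4.

```python
def investment(t, beta, d0, d1, horizon=10):
    """Investment level in period t implied by the bracketed term,
    with (horizon - t) periods of returns still ahead."""
    remaining = horizon - t
    discounted_sum = beta * (1.0 - beta ** remaining) / (1.0 - beta)
    return (d0 * d1 * discounted_sum) ** (1.0 / (1.0 - d1))


def predicted_wages(k_hat, beta, d0, d1, periods=3, horizon=10):
    """Wages for periods 1..periods: current human capital minus investment."""
    capital = k_hat
    wages = []
    for t in range(1, periods + 1):
        inv = investment(t, beta, d0, d1, horizon)
        wages.append(capital - inv)
        capital += d0 * inv ** d1  # human capital produced this period
    return wages


def moment_residuals(observed, k_hat, beta, d0, d1):
    """Observed minus predicted wages; GMM drives their averages to zero."""
    predicted = predicted_wages(k_hat, beta, d0, d1, periods=len(observed))
    return [w - p for w, p in zip(observed, predicted)]


baseline = predicted_wages(15.0, 0.82, 0.308, 0.948)
low_k = predicted_wages(14.0, 0.82, 0.308, 0.948)
print([round(w, 2) for w in baseline])
print([round(w, 2) for w in low_k])
```

Under this reading, investment does not depend on K̂, so lowering initial human capital by $1 shifts every predicted wage down by $1. That is one way to see why K̂ is identified from the level of the wage path while β, d0 and d1 govern its shape.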
As can be seen in Table 5, the regressions estimate the difference in initial human capital for thieves as about $0.60–0.67/hour, and the difference in discount rate at 0.3%. I rerun all these regressions using annual income data. The results from these are very similar to the hourly wage regressions, and can be seen in Table 6. The difference in initial human capital is estimated at $2,200 to $2,450, with a difference in discount rate of 0.4%.

Table 5. Establishing β_h, β_t, K̂_h, K̂_t from d1 and d0 for hourly wages
GMM regression. Hourly wages, males only. Parameters set: d1, d0. Parameters estimated: β_h, β_t, K̂_h, K̂_t.

| Parameters | Values | | |
|---|---|---|---|
| Value of d1 | 0.940 | 0.948 | 0.954 |
| Value of d0 | 0.286 | 0.308 | 0.328 |
| β_h | 0.838*** | 0.817*** | 0.799*** |
| | (0.000) | (0.000) | (0.000) |
| β_t | 0.835*** | 0.814*** | 0.796*** |
| | (0.001) | (0.001) | (0.001) |
| K̂_h | 15.395*** | 15.227*** | 15.088*** |
| | (0.079) | (0.077) | (0.075) |
| K̂_t | 14.727*** | 14.599*** | 14.493*** |
| | (0.168) | (0.162) | (0.157) |
| Criterion Q(b) | 0.000 | 0.000 | 0.000 |
| No. of parameters | 4 | 4 | 4 |
| No. of moments | 6 | 6 | 6 |
| No. of observations | 3833 | 3833 | 3833 |
| Hansen's J test | .0013 | .0003 | .0001 |
| J test d.f. | 2 | 2 | 2 |

Notes: * p<0.05, ** p<0.01, *** p<0.001; robust standard errors in parentheses. d0, d1 represent development parameters; β represents discount rate; K̂ represents initial capital; h, t distinguish honest and thief types. Source: GMM regression, using NLSY 1997, 1997 to 2011 waves. Section 5 discusses methodology, with additional details in Appendix B.

Table 6. Establishing β_h, β_t, K̂_h, K̂_t from d1 and d0 for annual income
GMM regression. Annual income, males only. Parameters set: d1, d0. Parameters estimated: β_h, β_t, K̂_h, K̂_t.

| Parameters | Values | | |
|---|---|---|---|
| Value of d1 | 0.942 | 0.933 | 0.924 |
| Value of d0 | 0.550 | 0.553 | 0.561 |
| β_h | 0.7962*** | 0.8171*** | 0.8358*** |
| | (0.0001) | (0.0002) | (0.0002) |
| β_t | 0.7930*** | 0.8134*** | 0.8317*** |
| | (0.0005) | (0.0006) | (0.0006) |
| K̂_h | 24,795*** | 25,407*** | 25,962*** |
| | (162) | (167) | (172) |
| K̂_t | 22,586*** | 23,069*** | 23,508*** |
| | (350) | (366) | (380) |
| Criterion Q(b) | 0.001 | 0.001 | 0.001 |
| No. of parameters | 4 | 4 | 4 |
| No. of moments | 6 | 6 | 6 |
| No. of observations | 3811 | 3811 | 3811 |
| Hansen's J test | 2.48 | 2.52 | 2.56 |
| J test d.f. | 2 | 2 | 2 |

Notes: * p<0.05, ** p<0.01, *** p<0.001; robust standard errors in parentheses. d0, d1 represent development parameters; β represents discount rate; K̂ represents initial capital; h, t distinguish honest and thief types. Source: GMM regression, using NLSY 1997, 1997 to 2011 waves. Section 5 discusses methodology, with additional details in Appendix B.

What would this difference mean in the real world? The difference in human capital when analysed as hourly wage is about 5%. Using annual earnings data, the difference is greater, about 10% at the beginning. If there were no discount rate difference, these differences would stay fixed. However, because of lower investment in earlier years, the model predicts that the differences will widen over time, so that thieves will be earning about 10% less per hour, and as much as 13% less in annual earnings, for the second half of their career.

The final stage is to estimate the wage equations with the period 1 correction for possible future punishment and tarnishing. As a reminder, all individuals have the possibility of committing crime and ex ante all should be ready to commit crime; we are looking for ex post differences between those who did and those who did not. The correction done in period 1 is fairly simple, and requires individuals to take into account the odds that: (i) an attractive theft opportunity will come; (ii) they will take it; (iii) they will be caught and punished; and (iv) their human capital will be tarnished with damage τ. As discussed above (more details can be seen in Appendices A and 5.1), the adjustment term is

$$\pi(1-\tau)^{d_{1i}} + (1-\pi),$$

where π is the product of the ex ante odds of an attractive opportunity multiplied by the odds of punishment and tarnishing, conditional on committing the crime, and τ is the tarnishing loss. Since roughly 20% of thieves report criminal convictions for theft, and the variation in probability of stealing by observables goes from about 10% to about 30%, I focus on values for π that go from 0.2×0.10=0.02 to 0.2×0.30=0.06. The value for τ shown in Table 7 is 0.10. In the Appendix I show additional work for values of τ at 0.05 and 0.30.

Table 7. Evaluating model with period 1 investment adjustment
GMM regression. Parameters set: d1, d0, τ, π_h, π_t. Parameters estimated: β_h, β_t, K̂_h, K̂_t.

| | Hourly wages (d1=0.948, d0=0.308, τ=0.1) | Hourly wages (d1=0.948, d0=0.308, τ=0.1) | Annual income (d1=0.933, d0=0.553, τ=0.1) | Annual income (d1=0.933, d0=0.553, τ=0.1) |
|---|---|---|---|---|
| π_h | 0.01 | 0.02 | 0.01 | 0.02 |
| π_t | 0.10 | 0.06 | 0.10 | 0.06 |
| β_h | 0.817*** | 0.817*** | 0.817*** | 0.818*** |
| | (0.000) | (0.000) | (0.000) | (0.000) |
| β_t | 0.817*** | 0.815*** | 0.817*** | 0.815*** |
| | (0.001) | (0.001) | (0.001) | (0.001) |
| K̂_h | 15.251*** | 15.270*** | 25,496.172*** | 25,560.540*** |
| | (0.077) | (0.077) | (168.389) | (169.032) |
| K̂_t | 14.730*** | 14.679*** | 23,616.785*** | 23,397.353*** |
| | (0.169) | (0.166) | (387.083) | (377.894) |
| Criterion Q(b) | 0.001 | 0.000 | 0.004 | 0.002 |
| No. of parameters | 4 | 4 | 4 | 4 |
| No. of moments | 6 | 6 | 6 | 6 |
| No. of observations | 3833 | 3833 | 3811 | 3811 |
| Hansen's J test | 2.413769 | .994485 | 14.75184 | 7.61716 |
| J test d.f. | 2 | 2 | 2 | 2 |

Notes: * p<0.05, ** p<0.01, *** p<0.001; robust standard errors in parentheses. d0, d1 represent development parameters; τ represents loss of human capital from criminal record; π represents ex ante expectation of developing criminal record; β represents discount rate; K̂ represents initial capital; h, t distinguish honest and thief types. Source: GMM regression, using NLSY 1997, 1997 to 2011 waves. Section 5 discusses methodology, with additional details in Appendix B.

Reviewing the values in Table 7 we can see that the period 1 adjustment changes values very little: all the parameter estimates are virtually identical to those shown in Tables 5 and 6. Across all the estimation exercises, the patterns remain very stable across a range of parameter values and modelling choices.

In addition to the GMM results detailed above, I have also run a range of multiple regressions comparing: (i) number of grades repeated; and (ii) number of employers for thieves and non-thieves. Across a range of specifications, using a variety of controls, and limiting the population to individuals with no criminal justice penalties, theft is positively and significantly associated with both repeating grades and number of employers. Thus, we can see a strong pattern of underinvestment in schooling and at the same time active engagement in the labour market. As Figs 4 and 5 show, between the ages of 21 and 23, thieves appear to have at least as much access to work as non-thieves.

5.2 Analysis: heterogeneity leading to theft

In eq. 4, theft is predicted if the return is greater than the sum of: (i) the expected lost future earnings (from the tarnishing of human capital if caught); (ii) the expected disutility from punishment; and (iii) the opportunity cost of lost time. The preceding analysis points strongly towards certain meaningful differences between thieves and non-thieves.
In particular, differences in lost future earnings and expected disutility from punishment can help to explain differences in behaviour, while there is little sign of differences in the opportunity cost of lost time.

As shown in Table 2, if being caught and sentenced for theft is likely to reduce future earnings by 10%, then the individuals who have reported theft would on average expect to lose about $6,500 less in net present value from such a sentence than those who have not. In the NLSY 1997 data, about 20% of those reporting thefts also report being charged with a crime (about 15% report any conviction). Thus, the difference in expected cost of punishment between thieves and non-thieves might be as much as 0.2×$6,500=$1,300 on average⁹. Since the reported thefts are for items valued at $50 or more, this suggests that thieves' decisions can be justified on a rational basis.

Additionally, the subjective difference in discount rate leads to a difference in the discounted disutility of punishment, although not as great: since the difference on a monthly basis is about one one-hundredth of a percent, disutility from punishment would have to be on the order of $100,000 or more for it to make so much as $1.00 difference.

One plausible source of heterogeneity is not supported in this analysis or in the NLSY 1997 data: differences in opportunity cost, which would make theft reasonable as a substitute for labour, do not show up. The hourly wages and annual incomes are effectively the same for both groups.

6. Conclusion

I have developed and solved a model that links human capital investment and theft behaviour. I focus on the three sets of human capital parameters as potential sources of variation in theft behaviour: initial human capital K̂_i, patience β_i, and human capital production potential d0, d1. At first blush, all are plausible candidates for particularly low perceived costs to theft.
Very low initial human capital, within the model, could significantly reduce the opportunity cost of theft and the perceived future lost wages from developing a criminal record. Similarly, very low discount rates (high impatience) could reduce the present discounted value of future punishment and future lost wages. Low human capital production potential would mean lower earnings further on in the individual's career, and might reduce the expected cost of theft.

The last of these is the most easily disposed of: low human capital production potential (low d0, d1), while a reasonable match to the differential wage development of thieves and non-thieves, does not actually lead to any difference in the present value of lost future wages. Low potential does not appear to lower any of the other costs of theft: it does not affect disutility of criminal justice sanctions or the immediate opportunity cost of crime. These parameters can thus be rejected as a potential explanation within the rational agent framework. This is a completely novel result within the literature.

Low initial human capital appears to play a role, but only with regard to net present value of future earnings. One obvious explanation of theft, lower opportunity cost, cannot be supported by the data. Hourly and annual earnings for thieves are about the same as, if not higher than, those of non-thieves, a pattern that can be seen in other data as well (see for example Holzman, 1982, Table 4; Paternoster et al., 2003; Brame et al., 2004; Apel et al., 2008).

Lower discount rate also appears to play a role. Estimated discount rates for thieves are slightly lower than for non-thieves; combined with the lower initial human capital, this leads to significant differences in the discounted value of lost future earnings. More generally, lower discount rates may be linked to lack of self-control and a lower fear of punishment.
Combined, the decision to steal appears to have some rational basis: if we estimate that a criminal justice record reduces future earnings by 10%, and that thieves have a 20% risk of punishment (as appears to be the case in the NLSY data), individuals reporting having committed a theft face an expected loss from theft that is $1,300 less in discounted lost earnings than the individuals not reporting a theft. Thus, the decision to steal can be matched to a rational analysis of utility maximization.

Importantly, and in a substantial contribution to previous research, the model and empirical work together establish that the opportunity cost of time during the period of theft cannot help explain the decision to steal, at least within the NLSY 1997 data. The entire difference between thieves and non-thieves comes from expected future earnings. This novel result makes it extremely difficult to sustain arguments that view property crime as a substitute for legitimate labour (see footnote 10). By allowing heterogeneity in both initial human capital and in discount rates, this exercise makes it possible to relate both objective and subjective aspects of individual circumstances to clearly measurable differences in career path.

Finally, combined with Lochner (2004), these results have the potential to inform policy at two levels. With regard to theft and crime generally, the results reinforce much work of the last few decades, which finds that interventions in early-stage human capital have the most promise (Heckman, 2006). With the focus on a single type of crime, the results help us to understand with much greater specificity how human capital deficiencies affect behaviour. At a much broader level, the changing relative earnings of thieves and non-thieves show that differences in human capital do not have simple, predictable effects on earnings.

Supplementary Material

Supplementary material (the Appendix) is available online at the OUP website.
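The expected-loss arithmetic used in Section 5 and in the conclusion can be checked with a short sketch. The Python below is illustrative only: the function name is hypothetical, and it assumes that the $6,500 net present value loss (stated for a 10% earnings penalty) scales linearly with the size of the penalty, which is consistent with the sensitivity case in footnote 9 but is not spelled out in the text.

```python
def expected_loss_difference(
    npv_loss_at_10pct: float,
    earnings_penalty: float,
    punishment_risk: float,
) -> float:
    """Expected difference in discounted lost earnings from one theft.

    npv_loss_at_10pct: NPV loss quoted in the text for a 10% earnings penalty.
    earnings_penalty:  assumed proportional cut in future earnings (0.10 = 10%).
    punishment_risk:   probability of being caught and punished.

    Assumption (not stated in the paper): the NPV loss is linear in the penalty.
    """
    scaled_loss = npv_loss_at_10pct * (earnings_penalty / 0.10)
    return punishment_risk * scaled_loss


if __name__ == "__main__":
    # Baseline case from the text: 10% penalty, 20% punishment risk.
    print(expected_loss_difference(6500.0, 0.10, 0.20))
    # Sensitivity case from footnote 9: 5% penalty, 5% punishment risk.
    print(expected_loss_difference(6500.0, 0.05, 0.05))
```

Up to floating-point rounding, the two calls correspond to the $1,300 and $162.50 figures quoted in the text and in footnote 9.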
Footnotes

1 The literature on human capital goes back to Friedman and Kuznets (1954), and Becker (1962) seems to have been the critical article to put all the pieces in place (with additional work by Mincer (1974) and Ben-Porath (1967)). The most active work in modelling and estimating human capital development has been by James Heckman and his students (Cunha et al., 2006, 2010; Cunha and Heckman, 2007).
2 A more exact but less intuitive description is that it is like selling a call option.
3 Another modelling approach would be to assume that upon reaching adulthood all individuals become drastically more concerned about reputation, and avoid all theft. An even simpler approach would be to focus on the human capital model and related estimation as an exercise in understanding the differences between thieves and non-thieves. Both of these approaches yield the same results as the approach used in the main text. An interesting extension of this work might be to extend the criminal opportunities to multiple periods and then estimate the model using a data set with numerous individuals stealing into their twenties and thirties. Given that such individuals are all but non-existent in this dataset, a multi-period theft model applied to the NLSY 1997 would be highly unlikely to yield different results.
4 The parameter β is referred to as the discount factor or measure of time preference and takes values on the interval [0,1], with lower values showing greater impatience (less patience). The use of the parameter is common to human capital models (Lochner, 2004); further information can be found in Frederick et al. (2002).
5 Obviously, the modelling and estimation here would be biased with regard to understanding the ‘professional’ thieves who remain active over much of adulthood. Since they are a minor part of the population, the estimates in this paper should be unbiased for understanding most thieves, but not for all thieves.
6 That is to say, lower patience levels.
7 Specifically, this is annual, not seasonally adjusted, BEA series DPCERD3A086NBEA.
8 Because of the strong assumptions of the model, and the fact that β and d0, d1 cannot be estimated in a single run, standard errors are small and the significance levels for individual parameters are high.
9 Reducing the damage to future earnings and/or the odds of punishment reduces the difference, but even a 5% risk of punishment and 5% damage to earnings yields a difference of $162.50, enough to rationally justify the pattern.
10 Other crimes, particularly drug dealing, are different in important ways, and labour substitution may play a significant role with them.

Acknowledgements

I would like to thank my dissertation committee Tomas Sjöström, Anne Morrison Piehl, Roger Klein and Lance Lochner, as well as Richard McLean, Francis Teal and several anonymous referees for helpful comments.

References

Angrist J., Lavy V., Schlosser A. (2010) Multiple experiments for the causal link between the quantity and quality of children, Journal of Labor Economics, 28, 773–824.
Apel R., Bushway S.D., Paternoster R., Brame R., Sweeten G. (2008) Using state child labor laws to identify the causal effect of youth employment on deviant behavior and academic achievement, Journal of Quantitative Criminology, 24, 337–62.
Becker G.S. (1962) Investment in human capital: a theoretical analysis, Journal of Political Economy, 70, 9–49.
Becker G.S. (1968) Crime and punishment: an economic approach, Journal of Political Economy, 76, 169–217.
Ben-Porath Y. (1967) The production of human capital and the life cycle of earnings, The Journal of Political Economy, 75, 352–65.
Brame R., Bushway S.D., Paternoster R., Apel R. (2004) Assessing the effect of adolescent employment on involvement in criminal activity, Journal of Contemporary Criminal Justice, 20, 236–56.
Catalano S. (2010) Victimization during burglary, Technical Report NCJ 227379, US Department of Justice Office of Justice Programs Bureau of Justice Statistics, Washington, DC.
Cunha F., Heckman J. (2007) The technology of skill formation, The American Economic Review, 97, 31–47.
Cunha F., Heckman J.J., Lochner L., Masterov D.V. (2006) Interpreting the Evidence on Life Cycle Skill Formation, Vol. 1, Elsevier, Oxford.
Cunha F., Heckman J.J., Schennach S.M. (2010) Estimating the technology of cognitive and noncognitive skill formation, Econometrica, 78, 883–931.
Doyle M. (2012) 24th Annual Retail Theft Survey, Jack L. Hayes International, Inc., 27520 Water Ash Drive, Wesley Chapel, FL 33544.
Ehrlich I. (1973) Participation in illegitimate activities: a theoretical and empirical investigation, Journal of Political Economy, 81, 521–65.
Frederick S., Loewenstein G., O’Donoghue T. (2002) Time discounting and time preference: a critical review, Journal of Economic Literature, 40, 351–401.
Freeman R.B. (1991) Income from independent professional practice, NBER Working Paper No. 3875, Cambridge, MA.
Freeman R.B. (1999) The Economics of Crime, Vol. 5, Elsevier, Oxford.
Friedman M., Kuznets S. (1954) Income from independent professional practice, NBER, Cambridge, MA. Available at: http://papers.nber.org/books/frie54–1 (accessed 18 September 2017).
Gottfredson M.R., Hirschi T. (1990) A General Theory of Crime, Stanford University Press, Stanford, CA.
Gould E.D., Weinberg B.A., Mustard D.B. (2002) Crime rates and local labor market opportunities in the United States: 1979–1997, Review of Economics and Statistics, 84, 45–61.
Grogger J. (1995) The effect of arrests on the employment and earnings of young men, The Quarterly Journal of Economics, 110, 51–71.
Grogger J. (1998) Market wages and youth crime, Journal of Labor Economics, 16, 756–91.
Heckman J.J. (2006) Skill formation and the economics of investing in disadvantaged children, Science (New York, N.Y.), 312, 1900–1902. doi: 10.1126/science.1128898.
Heckman J., Lochner L., Cossa R. (2002) Learning-by-doing vs. on-the-job training: Using variation induced by the EITC to distinguish between models of skill formation, NBER Working Paper No. 9083, Cambridge, MA.
Heckman J.J., Lochner L., Taber C. (1998) Explaining rising wage inequality: explorations with a dynamic general equilibrium model of labor earnings with heterogeneous agents, Review of Economic Dynamics, 1, 1–58.
Holzman H.R. (1982) The serious habitual property offender as ‘moonlighter’: an empirical study of labor force participation among robbers and burglars, The Journal of Criminal Law and Criminology, 73, 1774–92.
Lee D., McCrary J. (2005) Crime, punishment and myopia, NBER Working Paper No. 11491, Cambridge, MA.
Levitt S.D. (1997) Juvenile crime and punishment, Journal of Political Economy, 106, 1156–85.
Levitt S.D., Lochner L. (2001) The determinants of juvenile crime, in Gruber J. (ed.) Risky Behavior Among Youths: An Economic Analysis, NBER, University of Chicago Press, 327–73.
Levitt S.D., Venkatesh S.A. (2000) An economic analysis of a drug-selling gang’s finances, Quarterly Journal of Economics, 115, 755–89.
Lin M.-J. (2008) Does unemployment increase crime? Evidence from US data 1974–2000, Journal of Human Resources, 43, 413–436.
Lochner L. (2004) Education, work, and crime: a human capital approach, International Economic Review, 45, 811–43.
Merlo A., Wolpin K.I. (2009) The transition from school to jail: youth crime and high school completion among black males, Penn Institute for Economic Research Working Paper No. 09–002.
Mincer J. (1974) Schooling, Experience, and Earnings, Columbia University Press, New York.
Nagin D., Waldfogel J. (1995) The effects of criminality and conviction on the labor market status of young British offenders, International Review of Law and Economics, 15, 109–26.
Paternoster R., Bushway S., Apel R., Brame R. (2003) The effect of teenage employment on delinquency and problem behaviors, Social Forces, 82, 297–335.
Piehl A.M. (1998) Economic Conditions, Work, and Crime. Handbook on Crime and Punishment, Oxford University Press, Oxford.
West D.J., Farrington D.P. (1977) The Delinquent Way of Life, Heinemann, Oxford.
Williams G.F. (2015) Property crime: investigating career patterns and earnings, Journal of Economic Behavior & Organization, 119, 124–38.
Wilson J.Q., Abrahamse A. (1992) Does crime pay?, Justice Quarterly, 9, 359–77.

© Oxford University Press 2017. All rights reserved. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices).
Oxford Economic Papers, Oxford University Press

Publisher: Oxford University Press
ISSN: 0030-7653
eISSN: 1464-3812
DOI: 10.1093/oep/gpx046

Abstract

In this paper, a model is developed to investigate whether theft can be economically rational. It is shown that heterogeneity in capital accumulation rates (or ‘learning ability’) cannot create any noticeable difference in incentives to steal. Further, heterogeneity in instantaneous opportunity cost is both too low and runs in the wrong direction to have any explanatory role. However, heterogeneity in discount rates in combination with differences in initial human capital can create an incentive for theft. The model is calibrated from the National Longitudinal Study of Youth 1997 with data from 1997 to 2011.

1. Introduction

The rational agent model can explain variation in criminal activity across individuals with two different types of heterogeneity. One potential explanation is that criminals face objectively different costs and benefits from crime than non-criminals. For example, criminals may systematically lack access to employment opportunities, and turn to crime as a partial or total substitute for legitimate earnings. The other general explanation may be that criminals have systematically different preference structures than non-criminals. In particular, it may be the case that criminals are more present-oriented and less patient than non-criminals. In this paper I develop a model that integrates human capital development and criminal behaviour choices as a way to compare the predictions of these two types of heterogeneity. I look at theft only (excluding violent crime or drug dealing) in an effort to focus on economic trade-offs across time. Working with three parameters that could vary human capital production and earnings (initial human capital, human capital growth, and time preference), I analyse how variation might influence an individual’s incentives to commit theft.
Using parameterizations consistent with previous literature and the National Longitudinal Study of Youth (NLSY) data, I use the model to understand how these parameters affect hourly earnings (a measure of instantaneous opportunity cost) as well as discounted present value of future earnings. The model builds on previous work in this area, but differs in several significant ways. As with Lochner (2004), I add criminal behaviour choices to an adjusted Ben-Porath model that looks at trade-offs between training and schooling on one side against work on the other. In contrast, I focus on only the crime of theft and simplify the model. This allows me to make precise estimates of how thieves differ from non-thieves in several core human capital parameters, something not done in previous literature on crime and human capital development. More generally, the exercise illuminates how different traits (reflected in the parameters) interact to influence theft within a rational agent model. The first specific result, coming from a simple analysis of the model, is that differences in human capital growth, what we might think of as ‘potential’, fail to create any difference in incentives for individuals in their early adulthood, the period of peak property crime offences. Individuals with lower capacity for human capital development, even though they will have significantly lower future income, are not predicted to face lower present value of future income and in fact are predicted to have higher hourly earnings in early life. Moving beyond this theoretical analysis, I then estimate the parameters of the model using generalized method of moments (GMM), using wages and annual income of male respondents in the NLSY 1997 cohort from 1997 to 2011. Across a range of specifications, estimated initial human capital for thieves is roughly $0.60–0.67/hour less than non-thieves (honest individuals). For annual income, the estimated difference is $2,200–2,450/year. 
Similarly, the estimated discount rates for thieves are about 0.3–0.4% lower than those of non-thieves. A third result comes from using the model with the data of the first several rounds of the NLSY, during the period of peak property crime activity: wages, income and employment levels are the same if not higher for thieves than for non-thieves (honest individuals). This is significant evidence against explanations for theft that assume lower instantaneous opportunity cost, or that treat theft as a substitute for legitimate labour market opportunities. The difference in earnings and employment shows up in the mid- to late twenties, after virtually all self-reported thieves have stopped committing property crime (the empirical observation, without theoretical analysis, can also be found in Williams, 2015).

The combination of low initial human capital and high impatience suggested by wage development is also consistent with other measures in the data. Thieves have lower average standardized test scores and worse average mental health than non-thieves (honest individuals). They show less stability over jobs (on average having more employers) and are more likely to repeat grades, even when controlling for a range of measures of socioeconomic status and ability.

Beyond the literature on theft and crime, the analysis here has a broader utility, in suggesting that individuals with significantly limited human capital may have equal or better labour market outcomes in their early careers. While thieves show no difference in wages and hours worked at age 20, a range of parameterizations suggest that by age 40 there could be a gap of 20% or more in income. Analysis of labour market outcomes that focuses on individuals in their twenties (for example, Angrist et al., 2010) may be missing significant long-term differences.

The paper proceeds as follows.
In Section 2 I review the literature on earnings from crime and the honest earnings of criminals, as well as the literature on human capital. In Section 3 I develop and solve a model that links the decision to steal with the decision to invest in human capital. I then look at what the model predicts across a range of parameterizations. In Section 4 I review the data from the NLSY. In Section 5 I estimate the model’s parameters using GMM based on data from the NLSY. In Section 6 I summarize the results and discuss implications. Appendices A through C outline various technical issues.

2. Literature

2.1 Incentives for crime

There has been substantial debate for decades as to how to use the rational agent model to understand criminal behaviour. Some economists have focused on crime as a substitute for legal work, beginning with Becker (1968) and Ehrlich (1973). The labour economist Richard Freeman is perhaps the researcher of the past few decades who has done the most to develop this idea empirically (Freeman, 1991, 1999). A range of researchers have worked with individual-level data on actual wages and criminal behaviour, and have found some evidence in support of the idea (Grogger, 1998; Lochner, 2004). A number of papers work at the aggregate level, trying to find links between unemployment and crime, and have found a limited link (Gould et al., 2002). Helpful literature reviews include Piehl (1998) and Lin (2008).

Criminologists have been more skeptical of the idea of crime as a labour substitute, and a number of researchers in that field find little or no evidence. As Wilson and Abrahamse (1992) argue, while a single crime can be a rational choice, given a low probability of capture (‘[t]he wonder is that more people don’t steal’), a criminal career seems not to pay off at all. One of the dominant theories of crime, outlined in Gottfredson and Hirschi (1990), can be summarized as arguing that criminals simply have less self-control.
Even a number of economists have found evidence directly contradicting the idea of crime as a substitute for labour. Levitt and Venkatesh (2000) examine the revenues and wages provided by drug-dealing in a gang in Chicago and conclude ‘it is difficult (but not impossible) to reconcile the behaviour of the gang members with an optimizing economic model without assuming nonstandard preferences or bringing in social/nonpecuniary benefits of gang participation’. Lee and McCrary (2005) look at the deterrent power of prison sentences and conclude ‘criminal behavior – at least for the kinds of crimes that we focus on – could be thought of as the consequence of a self-control problem and a taste for immediate gratification’. Many researchers have found that criminals actually have slightly higher wages than non-criminals, especially in early periods. Specifically, Nagin and Waldfogel (1995) find that conviction leads to greater instability but also higher pay. They reference West and Farrington (1977), who noted a:

    … tendency for delinquents and especially recidivists to take up laboring or unskilled occupations offering relatively high rates of pay for beginners, but with relatively few prospects of long-term advancement. Non-delinquents were more likely to defer immediate material rewards for the sake of obtaining apprenticeships or training for skilled work.

Paternoster et al. (2003) provide an excellent review of the literature on the link between early work and antisocial behaviour. They establish a strong correlation between intensive work in adolescence and theft (and other behaviours), but show that unobserved heterogeneity appears to be the cause (see also Holzman, 1982, Table 4; Brame et al., 2004; Apel et al., 2008). All in all, the evidence is contradictory. While certain crimes, in certain places, may make sense as a source of income (drug-dealing, in particular), there is no clear-cut case that crime ‘pays’, particularly for theft.
Even as there is debate about how much criminals earn, there is also debate about the long-term costs of a criminal record. Freeman (1991) finds that imprisonment reduces weeks worked by somewhere between eight and 20 weeks per year, and reduces the probability of work by 15% to 30%. Grogger (1995) and Nagin and Waldfogel (1995) (cited above) are much more cautious; both find limited effects on wages and employment (in some cases positive).

An important stylized fact is that criminal activity is particularly intense during adolescence, and individuals are actively engaged in property crime for extremely short periods, usually less than a year (Williams, 2015). This adolescent aspect of crime, and particularly property crime, has been carefully documented but there is no consensus explanation as to why it occurs (Levitt, 1997; Grogger, 1998; Lee and McCrary, 2005; Levitt and Lochner, 2001). Overall, the rational agent model has shown significant value in understanding criminal behaviour, but there are important gaps in our knowledge.

2.2 Human capital and crime

A number of researchers have developed models looking at different types of human capital and the interaction of education, work and criminal behaviour, for example Merlo and Wolpin (2009) (see footnote 1). The paper that most resembles this one is Lochner (2004), in that both use a human capital model based on the Ben-Porath model and then test this model against NLSY data. It is worth discussing the similarities and differences in detail. Both papers work with the Ben-Porath model of human capital development, and incorporate choices to commit criminal activity. The key differences stem from a difference in approach: this paper uses a simpler human capital model, focuses purely on theft, and takes advantage of the additional details provided by the NLSY 1997 in both modelling and estimating the logic of theft choices.
The simplified model of human capital development satisfies both Kuhn-Tucker necessity and Kuhn-Tucker sufficiency allowing me to identify a single optimum. This allows me to estimate the differences in average values of key parameters between thieves and non-thieves, something not done in Lochner (2004). The choice to focus on theft only, to the exclusion of other crimes, is based on the very different patterns that can be seen in violent crime, property crime and drug dealing. All three peak at around 20 years of age, but property crime peaks earlier and drops off very rapidly while the other two fall much more slowly. Moreover, most perpetrators of violent crime do it very rarely, while during periods of activity thieves and drug dealers average multiple offences in a year. Drug dealing and theft offer direct financial rewards, while violent crime does not, and both theft and violent crime are directly predatory which is not true of drug dealing. The choice to focus on theft increases my ability to make precise estimates without having to make additional strong assumptions. Additionally, I use the NLSY 1997 which provides more detailed data as opposed to the NLSY 1979 that Lochner uses. Specifically, the NLSY 1997 asks about property crime every year, while the 1979 only asks in a single year. From this, it becomes clear that virtually all property criminals in the NLSY 1997 are active in only one or two years, and show very low earnings. Moreover, punishment and imprisonment rates are low for thieves at about 20%. The data informs a number of differences in modelling and estimation. To match the data, I model theft as a ‘negative lottery’ that has a strong stochastic element. Because of the low rates of punishment and short periods of imprisonment, I model expected punishment as a potential disutility with a single scalar value, while Lochner focuses on loss of consumption due to time in prison. 
Because within the NLSY virtually no respondents report theft after the age of 21, I focus on the decision to steal in adolescence/early adulthood. In contrast, Lochner (2004) develops a more general understanding of the link between all criminal behaviour and human capital development. He does not focus on a single optimum and views all crime (property crime, violent crime, and drug dealing) as providing fungible utility. Instead of estimating specific values he generates important but broad insights about trade-offs in the timing of criminal behaviour, work, and human capital investment. Our conclusions are similar, in that we both place significant emphasis on differences in initial human capital levels.

However, the combination of more comprehensive data and my modelling choices allows me to make several contributions not seen in Lochner (2004) or elsewhere in the literature. First, I show that heterogeneity in capital accumulation rates (or ‘learning ability’) cannot create any noticeable difference in incentives to steal. Second, I show that heterogeneity in instantaneous opportunity cost is both too low and runs in the wrong direction to have any explanatory role. Third, I establish that heterogeneity in discount rates works in combination with differences in initial human capital to create the incentive for theft. Finally, I estimate average values for differences in key parameters between thieves and non-thieves (‘honest’ types) in the NLSY 1997 data.

3. Model

The model focuses on understanding the decision to steal in the context of overall career choices and lifetime income. Thus, one component of the model is a human capital investment problem. The other component of the model is a decision to steal or not to steal.
The decision to steal is modelled in a way that might be described as a ‘negative lottery’, where theft is largely about the trade-off between immediate returns and long-term risks (see footnote 2). Partly because it fits the data well, and partly for reasons of mathematical simplicity, the human capital investment decision and the decision to steal are not incorporated in a single equation, but instead are kept separate. Moreover, because the individual’s decisions in these two areas are made at separate points in time, interactions are limited and the effects tractable.

Each agent lives for \(T\) periods. At the beginning of each period (indexed 1 to \(T\)) agents set their human capital investment levels, based on their parameters (human capital level, discount rate, human capital production parameters). During the first period they have the opportunity to steal: they are exposed to a criminal opportunity, for which they can make the binary choice to take the opportunity or to pass on it. If they choose to steal, there is a risk of punishment by the end of that period. The punishment inflicts direct disutility as well as a negative shock to human capital levels, requiring revised production choices. If captured and punished, an agent’s human capital is ‘tarnished’ by the end of period 1. Human capital investment and earnings continue until the last period \(T\). Table 1 briefly summarizes the process.

Table 1. Model timeline

| Period t = | 1 | 2 | … | T |
| Decisions: | Choose investment for period 1; Steal/Don’t Steal | Choose investment for period 2 | Choose investment for period 3 | |
| Payouts: | All theft payouts and punishments; Earn wage for period 1 | Earn wage for period 2 | | Earn wage for period T |

Source: This table outlines the model discussed in the text.

The decision to allow theft in only a single period fits the NLSY data very closely (as discussed above, virtually no respondents report theft after the age of 21), as well as the literature on patterns of theft (see footnote 3).

3.1 Human capital component

There is a population of \(N\) agents, indexed \(i \in \{1,\dots,N\}\), who are maximizing utility over \(T\) periods. At the outset of each period \(t\) each agent chooses what percentage of their human capital, \(K\), to hold back from the market as ‘investment’, which I denote with the variable \(x_{it} \in [0,1]\). The value of \(x_{it}\) ranges from 1 for a full-time student with no job to 0 for a full-time worker at a job receiving no training, and would be somewhere in the range \((0,1)\) if \(i\) is receiving some wage compensation but is also receiving some training, formal or informal. The value of future earnings is discounted at the rate \(\beta\) (see footnote 4) and the investment grows according to the expression \(d_0 (x_{it} K_{it})^{d_1}\), where \(\beta, d_0, d_1\) vary across individuals. The problem in period 1 is thus formally:

\[
\max_{\{x_{it}\}_{1}^{T}} \; \sum_{t=1}^{T} E_1\left(\beta_i^{t-1}(1-x_{it})K_{it}\right)
\quad \text{s.t.} \quad
K_{i,t+1} = d_{0i}\,(x_{it}K_{it})^{d_{1i}} + K_{it}, \qquad K_{i1} = \hat{K}_i \tag{1}
\]

That is, the agent \(i\) needs to find an optimal sequence of investment decisions \(\{x_{it}\}_{1}^{T}\) given the parameter values and his initial level of human capital \(\hat{K}_i\). Four brief points should be made about the model at this juncture.

First, this is a simplified variant of the Ben-Porath human capital model, which in turn is a variant of the standard dynamic optimization capital accumulation model, with an added wrinkle that the capital is not consumed.

Second, as with Ben-Porath (1967) and Lochner (2004), this model views all human capital investment as a fungible component of \(x_{it}\); an hour of schooling and an hour of job training look exactly the same. Even jobs that pay a salary, where there is no explicit, separate training but only a general, ongoing on-the-job training, are modelled as an increase in \(x_{it}\) that leads to a partial reduction in salary in exchange for a chance to learn.

Third, the model does not include a depreciation parameter; this decision is made for simplicity, and because for the population we will be focusing on, youths aged 17 to 31, all evidence suggests that human capital growth swamps human capital decay and it is difficult to separate any decline in human capital from a general failure to invest in human capital.

Fourth, there is individual heterogeneity in human capital and time preference parameters. We can think of each of these parameters, for each individual, as a draw from a continuous distribution.

At time \(t>1\), in response to any stochastic shocks (to be discussed below), agent \(i\) can reassess his level of human capital \(K_{it}\) and reset his planned sequence of investment \(\{x_{it}\}_{t}^{T}\). In the context of this paper, the only stochastic shock possible would be ‘tarnishing’ due to punishment by the end of period 1.

3.2 Criminal opportunities component

The second component of the model involves criminal activity.
There is a single point in time when individuals have the opportunity to steal: during period 1 (after the investment decisions of period 1, but before reinvesting for period 2), the agent is offered one opportunity to steal. For the remainder of the agent’s life, periods 2 through \(T\), there will be no further theft opportunities, and no need for any change of behaviour. The opportunity can be thought of as a ‘negative lottery’. A standard lottery asks you to pay out a sum of money in exchange for the possibility of winning a positive sum in the future; in contrast, the theft opportunity offers the agent a positive sum of money in exchange for the possibility of a future loss. The cash value of the opportunity is denoted \(r_i\) and has a random distribution \(f(r_i;\theta)\), where \(f(\cdot)\) has support \((0,\infty)\) and is decreasing in \(r_i\).

To simplify matters, I assume that when the opportunity is presented the agent can perfectly perceive \(r_i\) and has no risk of being stopped or captured while committing the theft, only after consuming the value \(r_i\). His decision to steal or not is a very simple cost/benefit analysis of the two paths. If he does not take opportunity \(r_i\), his expected payoff at the beginning of period 2 is simply the future stream of revenue from human capital investment and ‘rental’ (as outlined in eq. 1):

\[
\sum_{t=2}^{T} E_1\!\left(\beta_i^{\,t-1}(1-x_{it})K_{it}\right) \tag{2}
\]

If he does take the opportunity \(r_i\), his expected payoff is:

\[
r_i + (1-p)\left(\sum_{t=2}^{T} \beta_i^{\,t-1}(1-x_{it})K_{it}\right)
+ p\left(\sum_{t=2}^{T} \beta_i^{\,t-1}\bigl(1-x^{(\tau)}_{it}\bigr)K^{(\tau)}_{it}\right)
+ \beta_i^{\,\delta}\, p\, D + \gamma\,(1-x_{i1})\hat{K}_i \tag{3}
\]

where:

\(p\) is the probability of being caught and punished by the criminal justice system. For simplicity, I do not try to separate out the impact of being arrested vs being charged, being charged vs being convicted, being convicted vs being sentenced, etc.

\(D<0\) is the disutility from any detection, capture or punishment (including any social stigma from an arrest record, a conviction record, etc.).

\(\gamma(1-x_{i1})\hat{K}_i<0\) is the opportunity cost of the crime, with \(|\gamma|\) the amount of time the crime takes (as a fraction of the five-year period of work).

\((\tau)\) (\(\tau\) for ‘tarnishing’) marks the shift in his human capital \(K_{it}\) and his human capital investment strategy \(x_{it}\) if he is caught and develops a criminal justice record that limits employers’ willingness to hire and pay him. Thus, \(\tau\in(0,1)\) and \(K^{(\tau)}_{it}=(1-\tau)K_{it}\).

\(\delta\) is the length of time, less than a year, between the commission of the crime and the beginning of punishment, so \(\delta\in(0,1)\).

A few other modelling decisions are worth brief comments:

Theft opportunities are modelled as a random process, where the distribution is the same across all individuals. The logic behind this is that per-act earnings from theft in the NLSY 1997 follow a roughly lognormal distribution. Average earnings per theft are low, and negatively correlated with number of acts (i.e. people who steal frequently tend to make less than people who steal rarely). There does not seem to be any stability in per-act earnings from year to year, and correlations in earnings from theft are low between siblings (for a fuller discussion of the patterns in the NLSY 1997 data, see Williams, 2015). The NLSY patterns are extremely consistent with the bulk of research into earnings from crime, whether data is from criminal self-reports, police reports or victimization surveys (see Gottfredson and Hirschi, 1990, part I, chapter 2, or reports such as Catalano, 2010 or Doyle, 2012). While it is difficult to prove a negative, there does not appear to be any meaningful variation of ‘theft ability’ in populations the size of the NLSY panel.

Given that individual human capital and time preference parameters vary randomly as do theft opportunities, we can thus view the variation in theft opportunities as providing a stochastic error term.
Assuming that theft opportunities have the same mean across all individuals, we can thus look for different means in the human capital and time preference parameters for the explanation of why some individuals steal and others do not. As will be discussed further below, this matches the data well; while individuals with lower human capital values (either in standardized test scores or mental health) or with lower family resources (measured by income of family of origin) are statistically significantly more likely to commit theft, the difference is on the order of 10–20 percentage points. Males with particularly high human capital or family resources have at least a 10% probability of committing theft, while males with particularly low values have at worst a 31% probability of committing theft.

Finally, one could potentially develop a process by which expected risk of punishment is updated over time, or by which an initial tarnishing (from a criminal record) can be worsened by a second offence. A dynamic model, where the decision to commit a crime can be made in many or even all periods, may make sense for the very small group of individuals who are ‘hard cases’ and commit theft into their thirties or forties. However, theft behaviour is generally very short-lived for the vast majority of thieves (roughly 99% of all theft occurs by the age of 21), and so a more complicated dynamic model is unlikely to yield significantly different results in the case of the NLSY. It would be an interesting extension of the model, to be estimated with a dataset that included more ‘professional’ thieves.⁵

3.3 Solving the model

We begin by focusing on the decision to steal during period 1, as outlined in eqs 2 and 3 above, and then work backward to solve the human capital optimization problem of eq. 1.

3.3.1 Criminal opportunities

It is advantageous to put \(r_i\) on one side, as the direct benefit of stealing, and put all the other terms on the other side, simplifying a bit, to measure the cost of stealing. Setting eq. 3 greater than or equal to eq. 2 and rearranging, we get an indicator function \(c\) that becomes the optimal policy in response to opportunity \(r_i\):

\[
c =
\begin{cases}
1 & \text{if } r_i \;\ge\; p \displaystyle\sum_{t=2}^{T} \beta_i^{\,t-1}\Bigl((1-x_{it})K_{it} - \bigl(1-x^{(\tau)}_{it}\bigr)K^{(\tau)}_{it}\Bigr) \;-\; \beta_i^{\,\delta}\, p\, D \;-\; \gamma\,(1-x_{i1})\hat{K}_i \\[6pt]
0 & \text{else}
\end{cases}
\tag{4}
\]

Recall that \(D<0\) and \(\gamma<0\), so both subtracted terms are positive and raise the threshold. As constructed, the optimal choice in response to every criminal opportunity \(r_i\) is to ‘play’ \(c\); that is, to steal if and only if the return is at least as large as the expected cost. The probability of becoming a thief is decreasing in all variables that drive the right-hand side costs. In the empirical work, we can therefore focus on understanding how differences in \(K\), \(\beta\) and \(d_0, d_1\) lead to differences in the following three costs:

Lost earnings from developing a criminal record, the difference between earnings from ‘pure’ and ‘tarnished’ human capital: \(\sum_{t=2}^{T} \beta_i^{\,t-1}\bigl((1-x_{it})K_{it} - (1-x^{(\tau)}_{it})K^{(\tau)}_{it}\bigr)\);

expected future disutility from criminal justice punishment: \(-\beta_i^{\,\delta}\, p\, D\); and

immediate opportunity cost: \(-\gamma\,(1-x_{i1})K_{i1}\).

3.3.2 Human capital

Having resolved the criminal decision problem, we can then move backward to look at optimal human capital investment. This is most easily understood by first looking at the general deterministic case, and then at the special stochastic case of the first period.

The human capital optimization problem is a dynamic optimization problem, and with no uncertainty, the policy function for the optimal \(x_{it}\) (see online appendix for derivation) is:

\[
x_{it} = \frac{1}{K_{it}} \left( d_{0i}\, d_{1i}\, \beta_i\, \frac{1-\beta_i^{\,T-t}}{1-\beta_i} \right)^{\frac{1}{1-d_{1i}}} \tag{5}
\]

for \(t \in \{1,2,\dots,T\}\). As would be expected, optimal human capital investment for agent \(i\) in period \(t\) is increasing in time preference \(\beta\), in productivity of investment (\(d_0, d_1\)) and in \(T-t\), the time remaining in the agent’s career, so that investment falls as the agent ages. Perhaps a bit surprisingly, it is decreasing in \(K_{it}\).
The intuition for this is actually straightforward: in this case human capital is specifically marketable skills, as opposed to general potential. It is a measure of the money agent \(i\) can make right now, and so higher values of \(K_{it}\) are direct measures of opportunity cost of investment. The production parameters \(d_0, d_1\) measure the sensitivity of future growth in marketable skills to the level of investment, and are thus more closely tied to the related idea of general potential.

The uncertainty regarding theft adds one minor bit of complexity. Every individual has a non-zero probability of committing a theft. Thus, in period 1, every individual must optimize with a non-zero probability of committing a theft and hence some probability of being tarnished. The solution (see the appendix for derivation) is to make an adjustment for the stochastic component using the expression:

\[
\pi (1-\tau)^{d_{1i}} + (1-\pi) \tag{6}
\]

and set \(x_{i1}\) to:

\[
x_{i1} = \frac{\pi (1-\tau)^{d_{1i}} + (1-\pi)}{K_{i1}} \left( d_{0i}\, d_{1i}\, \beta_i\, \frac{1-\beta_i^{\,T-1}}{1-\beta_i} \right)^{\frac{1}{1-d_{1i}}} \tag{7}
\]

where \(\pi\) is the ex ante probability of committing a theft and being caught, that is: (i) the probability that individual \(i\) seizes his theft opportunity, multiplied by (ii) the probability of being caught, \(p\). With advance knowledge of the distribution of opportunities the individual can determine the probability of a draw of \(r_i\) that would induce theft, and hence compute \(\pi\). As will be discussed further below, it appears that for NLSY 1997 males \(\pi\) takes values between 0.03 and 0.06. Because empirical sources strongly suggest the odds of punishment and the effect of tarnishing are low, we can put a likely bound on the stochastic adjustment expression of eq. 6 as varying from 0.97 to 0.99. Moreover, because variation depends on unobservables, there is no unbiased way to estimate it. In the work on estimation, I use a range of probable values to look at effects.
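The policy function of eq. 5 and the accumulation rule of eq. 1 can be traced out numerically. The following is a minimal sketch rather than the code used in the paper: the function names are hypothetical, the parameter values are the five-year baseline values quoted elsewhere in the text, and reading \(d_1\) as an exponent in the adjustment term of eq. 6 is an assumption about the intended notation.

```python
def investment_level(t, T, beta, d0, d1):
    """Optimal investment x_t * K_t implied by eq. 5 (it does not depend on K_t)."""
    remaining = T - t
    # beta + beta^2 + ... + beta^remaining: discounted value of one extra unit of K
    annuity = beta * (1.0 - beta ** remaining) / (1.0 - beta)
    return (d0 * d1 * annuity) ** (1.0 / (1.0 - d1))


def stochastic_adjustment(pi, tau, d1):
    """Period-1 adjustment of eq. 6 (assumption: d1 enters as an exponent)."""
    return pi * (1.0 - tau) ** d1 + (1.0 - pi)


def optimal_path(K_hat, T, beta, d0, d1, pi=0.0, tau=0.0):
    """Return investment shares x_t and capital K_t for t = 1..T."""
    K = K_hat
    shares, capital = [], []
    for t in range(1, T + 1):
        invest = investment_level(t, T, beta, d0, d1)
        if t == 1:
            invest *= stochastic_adjustment(pi, tau, d1)  # eq. 7
        x = invest / K
        shares.append(x)
        capital.append(K)
        K = K + d0 * (x * K) ** d1  # accumulation rule from eq. 1
    return shares, capital


# Baseline five-year parameters, 10 periods (a 50-year career).
shares, capital = optimal_path(K_hat=15.0, T=10, beta=0.82, d0=0.308, d1=0.948)
for t, (x, K) in enumerate(zip(shares, capital), start=1):
    print(f"period {t:2d}: x = {x:.3f}, K = {K:.2f}, wage = {(1 - x) * K:.2f}")
```

Under these assumptions the investment share falls in every period and reaches zero in period \(T\), while the level of investment \(x_{it}K_{it}\) is the same for any starting \(K\).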
3.4 Wage development under different parameterizations of the model

If we focus on the three key parameters that are important in both the human capital component of the model and the theft component (initial capital \(\hat{K}_i\), discount rate \(\beta\), and human capital production parameters \(d_0\) and \(d_1\)), we can see that variation in these parameters will induce predictable changes in measurable variables such as wages, hours worked and years of education. In this subsection we focus on three questions: How does variation in these parameters lead to variation in wage development? Does variation in these parameters lead to lower wage (opportunity cost) in early adulthood? And finally, does variation lead to a lower present value of future earnings in early adulthood?

I work with a baseline case for the parameters that is consistent with other literature and the data. The specific values of the baseline case are:

\[
\beta = 0.82, \qquad d_0 = 0.308, \qquad d_1 = 0.948, \qquad \hat{K}_i = 15.00
\]

In Figs 1, 2 and 3 I show the development of wages within the baseline and with illustrative variants (low initial human capital, low patience, and low human capital growth parameters). (Further discussion of how the parameters were selected can be found in Section 5 and in the online Appendix 5.1.)

Fig. 1. Baseline and low \(\hat{K}_i\)
Notes: Wage development in the human capital model outlined in the text. The baseline parameterization is compared with the low \(\hat{K}_i\) parameterization (in the baseline, initial \(\hat{K}_i\) is $15; in the low \(\hat{K}_i\) parameterization it is $14). As can be seen, wages are shifted systematically downward in the low \(\hat{K}_i\) version.

Fig. 2.
Baseline and low \(\beta_i\)
Notes: Wage development in the human capital model outlined in the text. The baseline parameterization with an annual discount factor of 0.961 is compared with the low \(\beta\) parameterization (where \(\beta\) is 0.959). As can be seen, wages are extremely similar, but start slightly higher in the low \(\beta\) version, dropping relative to the baseline. This is because human capital investment yields a lower discounted value for a low \(\beta\) individual.

Fig. 3. Baseline and low \(d_0, d_1\)
Notes: Wage development in the human capital model outlined in the text. The baseline parameterization, where \(d_0=0.308\) and \(d_1=0.948\), is compared with the low \(d_0\) and \(d_1\) parameterization (where \(d_0=0.28\) and \(d_1=0.92\)). As can be seen, wages start slightly higher in the low \(d_0\) and \(d_1\) version, dropping relative to the baseline. This is because human capital investment yields a lower return for an individual with lower \(d_0\) and \(d_1\) values. Although this is far more extreme than any pattern in the data, the discounted value of lost wages is virtually identical for both individuals.

A few things are worth noting about the exercise in general, and about the different parameterizations. First, there is the general pattern of development of wages over the agent’s ‘lifetime’ (i.e. to period \(T\)). Figures 1, 2 and 3 give a straightforward sense of the overall pattern: investment in early life, which reduces slowly, leading to a plateau by the forties.

A second important note is how variation in different parameters leads to different patterns. For instance, low initial human capital \(\hat{K}_i\) acts to reduce the overall development of human capital \(K_t\) and wages \((1-x_t)K_t\). We see roughly parallel development in wages in the baseline and the low \(\hat{K}_i\) cases (Fig. 1). The intuition here is simple: in the case of low \(\hat{K}_i\), investment levels are higher, and there is some catch-up effect, but it is not worth the loss of present wages to fully catch up to the baseline.

In contrast, low discount rates⁶ (Fig. 2) and low human capital production parameters (Fig. 3) both create a ‘crossing’ pattern, where wages start high relative to the baseline but then develop more slowly and drop below the baseline wages. In the case of low discount rate \(\beta\) or low human capital production parameters \(d_0\) and \(d_1\), the utility payoff of human capital investment is lower than in the baseline, and so human capital investment is lower over all periods. This has the effect of increasing wages in the early periods but reducing them in later periods. It is thus clear that lower initial human capital could potentially explain lower instantaneous opportunity cost for the commission of crime, but lower patience and lower human capital potential could not.
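The direction of these patterns can be checked with a short simulation. The sketch below is illustrative only and makes several assumptions: it applies the deterministic policy of eq. 5 in every period (ignoring the period-1 theft adjustment), it uses five-year parameter values (0.81 being roughly the five-year counterpart of the 0.959 annual factor quoted in the notes to Fig. 2, since 0.959⁵ ≈ 0.81), and the helper names `wage_path` and `crosses_from_above` are hypothetical. It is meant to reproduce the sign of each comparison, not the magnitudes shown in the figures.

```python
def wage_path(K_hat, beta, d0, d1, T=10):
    """Wages (1 - x_t) * K_t under the eq. 5 policy and the eq. 1 accumulation rule."""
    K, wages = K_hat, []
    for t in range(1, T + 1):
        annuity = beta * (1.0 - beta ** (T - t)) / (1.0 - beta)
        # Cap investment at K so that x_t stays in [0, 1] (guard added for the sketch).
        invest = min(K, (d0 * d1 * annuity) ** (1.0 / (1.0 - d1)))
        wages.append(K - invest)
        K += d0 * invest ** d1
    return wages


def crosses_from_above(variant, base):
    """True if the variant starts above the baseline and ends below it."""
    return variant[0] > base[0] and variant[-1] < base[-1]


baseline = wage_path(15.0, 0.82, 0.308, 0.948)
low_K = wage_path(14.0, 0.82, 0.308, 0.948)     # low initial human capital
low_beta = wage_path(15.0, 0.81, 0.308, 0.948)  # low patience
low_d = wage_path(15.0, 0.82, 0.28, 0.92)       # low production parameters

print("low K below baseline in every period:",
      all(v < b for v, b in zip(low_K, baseline)))
print("low beta crosses:", crosses_from_above(low_beta, baseline))
print("low d0, d1 crosses:", crosses_from_above(low_d, baseline))
```

Only the low initial capital variant lowers the first-period wage, which is the measure of instantaneous opportunity cost used in the text.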
An additional note is that relatively minor changes in discount factor \(\beta\) and human capital production parameters \(d_0\) and \(d_1\) lead to very significant changes in wage development.

The next question is the impact of the parameterizations on the present value (in period 0) of lost future earnings from a criminal record. To provide some simple estimate of this, I need a sense of \(\tau\), the loss to human capital from ‘tarnishing’. While there is no definite consensus on the impact of conviction on wages or employment, the highest estimates (from Freeman, 1991) are roughly a 30% drop in employment prospects, and others are much more conservative (for example, Grogger, 1995 and Nagin and Waldfogel, 1995). I compute what impact a loss of 10% of human capital after period 1 would have on the present value of earnings. I then compare this impact across the parameterizations in Table 2.

Table 2. Impact of a 10% loss in income under representative parameterizations

| Model | β | d0 | d1 | K0 | Present value of 10% loss | Difference from baseline |
| Baseline | 0.82 | 0.308 | 0.948 | 15 | $56,884 | NA |
| Low initial human capital K̂ | 0.82 | 0.308 | 0.948 | 14 | $53,092 | −$3,791 |
| Low discount rate β | 0.81 | 0.308 | 0.948 | 15 | $54,350 | −$2,534 |
| Low human capital production coefficients | 0.82 | 0.28 | 0.92 | 15 | $56,879 | −$4 |
| Low K̂ and low β | 0.81 | 0.308 | 0.948 | 14 | $50,726 | −$6,157 |

Notes: Representative parameterizations of the model, showing how a 10% loss of income from a conviction differentially impacts those with lower initial human capital, discount rates, or human capital potential. Notice that while low initial human capital and a low discount rate lead to a lower anticipated present value from a conviction, lower ‘potential’ (lower human capital development parameters) does not substantially vary lost future income. The combination of low initial human capital and low discount rate leads to a very substantial reduction in lost future income and is a close match to the wage progression in the data. Note that \(d_0, d_1, \beta\) are five-year values (the model works in 10 periods, representing a 50-year career length). More detail on these computations is provided in Table C1 in online Appendix C. More background on the source of parameters is given in online Appendix 5.1.

The results are striking. Both the low \(\beta\) and low \(\hat{K}_i\) parameterizations do lead to greater incentives to steal; discounting all fifty years and summing, we get a total difference in lost wages ranging from roughly $2,500 to $6,200 (Table 2). Interestingly, lower human capital production capacity (low values of \(d_0, d_1\)), even though it leads to substantially lower earnings at the end of life (period 5–10, or ages 40–65), has effectively no impact whatsoever on the discounted present value of the individual’s income. Using a very extreme and unrealistic difference in parameterizations I get at most a total difference in the discounted value of lost lifetime earnings of $4. Since differences in \(d_0, d_1\) cannot help to explain any differences in incentives for theft, I spend little time on this possibility in the discussion that follows.

4.
Data

In this section I begin with a general discussion of the data, including summary statistics comparing thieves and non-thieves (honest individuals). The NLSY 1997 cohort tracks 8,984 individuals from 1997 onwards. The majority of respondents, roughly three quarters, come from a simple random sample of the US youth population in 1997. The remaining one quarter of respondents come from an over-sampling of ethnic minority populations, specifically black and non-black Hispanic respondents. Respondents were spread roughly evenly across ages 12 to 16 in the initial round in 1997. The rate of attrition across years is quite low both for thieves and non-thieves.

In addition to a wide range of questions about work, earnings, education, assets, beliefs, health, family and other issues, every round of the NLSY 1997 includes a self-administered questionnaire that asks respondents about potentially compromising issues such as criminal behaviour. Of the respondents, 1202 (approximately 12.5%) admitted to stealing items worth more than $50 at any point. Thieves are defined as individuals who reported stealing at least one item worth more than $50 by 2003.

In Table 3 I compare some basic attributes of male thieves and non-thieves in the data. There is some evidence that thieves have lower potential human capital: they come from slightly poorer homes, on average, and have lower evaluated test scores and mental health. Household income is reported household of origin income in 1997. Mental health scores are from an evaluation performed in 2000 as part of the NLSY panel, with 20 being the best possible evaluation, and 0 the lowest, or least healthy.

Table 3. Summary statistics for males

| | All | Non-thieves (honest) | Thieves |
| Age in 1997 | 14.83 | 14.84 | 14.86 |
| (St. Dev) | (1.45) | (1.45) | (1.42) |
| Household of origin income (N=3,316) | 47,647 | 48,672 | 43,046 |
| (St. Dev) | (42,441) | (42,723) | (40,871) |
| White | 52.5% | 53.2% | 49.2% |
| Black | 25.4% | 25.1% | 27.1% |
| Hispanic | 21.2% | 20.8% | 23.1% |
| ASVAB scores (Math and Verbal, N=3,575) | 44,518 | 46,158 | 36,793 |
| (St. Dev) | (29,656) | (29,799) | (27,716) |
| Mental health (20 best, 0 worst, N=4,085) | 15.80 | 15.92 | 15.23 |
| (St. Dev) | (2.47) | (2.38) | (2.78) |
| Estimated years in school to 20th birthday (N=3,386) | | 12.7 | 11.9 |
| Hours working to 20th birthday (N=3,386) | | 3,342.1 | 3,615.9 |

Notes: A comparison of basic data between male thieves and non-thieves. When not otherwise given, N=4,599. While only very minor age or ethnicity differences emerge in the data, there are significant differences on measures directly related to human capital. In particular, thieves are more likely to come from a family of origin with lower income and have lower test scores and evaluated mental health. To compare time working with time in education up to the respondents’ twentieth birthday, an estimate is made of how much time they spent in school (using highest grade up through a second year of college, plus any repeated grades) combined with the NLSY’s numbers for total hours working as a teenager.
Source: NLSY 1997, author’s computations.

Table 4.
Percentage of males committing theft, by quartiles

| | First quartile | Second quartile | Third quartile | Fourth quartile |
| Household of origin income (N=3,316) | 21.6% | 20.3% | 16.6% | 14.4% |
| ASVAB scores (Math and Verbal, N=3,575) | 24.2% | 17.9% | 16.8% | 11.2% |
| Mental health (20 best, 0 worst, N=4,085) | 27.9% | 17.5% | 16.6% | 15.1% |
| Individuals in the same quartile for all three measures (N=220) | 31.0% | 14.6% | 14.0% | 10.9% |

Notes: The table uses three measures that show correlation with theft behaviour, and looks at the percent of male respondents in each quartile who report theft. As can be seen, while the odds of theft fall as you move to higher quartiles, the difference in rates of theft from the most likely to the least likely groups is roughly 20 percentage points (from 31.0% to 10.9%). This supports the modelling assumptions of a strong stochastic element in theft decision-making, and estimation results that support a small difference in ex ante risk of criminal behaviour and a criminal record.
Source: Author’s computations, NLSY 1997, waves 1997–2011.

The differences are statistically significant. The differences that are most striking are on Armed Services Vocational Aptitude Battery (ASVAB) combined test scores and mental health, where the differences in mean are 28% and 24% of the respective standard deviations. These differences in test scores and mental health would naively lead us to expect some differences in human capital. Theft could be associated with lower \(\hat{K}_i\) (initial capital) values, but also with lower \(\beta_i\) (patience).
(It is also true that they are consistent with differences in \(d_{0i}, d_{1i}\), but as can be seen in the fourth set of parameters in Table 2 these cannot help to explain greater incentive to steal, so are ignored.)

At the bottom of Table 3 I provide a comparison of hours in school and hours working for males, up to the age of 20. As can be seen, thieves spend slightly less time in school, about 0.85 of a school year less (perhaps seven to eight months). Additionally, they have worked about 300 hours more, or nearly seven weeks. In theory, this might represent approximately five to six more months of idleness among thieves (because it is likely that some education and work are concurrent, this figure is approximate).

In Table 4 I compute probability of theft for males as grouped by several observable traits that are solid proxies for either human capital or family resources: ASVAB scores, mental health and household income for family of origin. As can be seen, 10% of males in the quartiles associated with high human capital levels still commit theft, while no more than 31% of males in the quartiles associated with low human capital commit theft. We can see that, as discussed in the review of the model above, while mean human capital measures vary between thieves and non-thieves, there is a significant stochastic component to the decision to steal.

In Fig. 4 I show the development of hourly wages for thieves and non-thieves to 2011. The means are very similar to the early twenties, after which thieves begin to fall below non-thieves. There is no evidence that this similarity is due to access to opportunities, attrition, non-reporting or selection bias in the NLSY; attrition rates are about the same for both thieves and non-thieves. Imprisonment appears to have an effect after the ages of 21 to 23; if we exclude men who have been imprisoned, the lines stay together to the late twenties.
Labour market participation data, as measured by the probability of reporting income from wages or salaries, presents an identical picture, as seen in Fig. 5.

Fig. 4. Wage development by theft behaviour
Notes: Actual development of hourly compensation in the NLSY 1997, in 2009 dollars, for thieves and non-thieves. Notice that: (i) wages are extremely similar across all years, making any opportunity cost very difficult to substantiate; and (ii) wages are slightly higher for thieves in the teen years but then become lower in the late twenties, very much in keeping with differences in discount rates. Comparison is among males in the simple random sample.
Source: NLSY 1997, males in simple random sample only, waves 1997 to 2011.

Fig. 5. Employment development by theft behaviour
Notes: Development of labour market engagement in the NLSY 1997, for thieves and non-thieves. Notice that likelihood of employment is extremely similar up to age 21–23 (after which point virtually no respondents report theft), supporting the pattern in the previous figure. Comparison is among males in the simple random sample.

Fig. 6. Wage development by theft behaviour
Note: Development of wages for thieves and non-thieves to age 30, using the estimated GMM parameters from Table 5 (second specification) with the model of the text.

5. GMM analysis

5.1 Overview of methodology

Having used the initial analysis of the model to focus on heterogeneity of: (i) time preference; and (ii) initial human capital, I now use the generalized method of moments (GMM) to estimate parameters for the model for thieves and non-thieves from the NLSY data. Online Appendix 5.1 provides a fuller discussion of the entire approach; in this section I focus on the basic method and overall results.

It should be emphasized that the focus of this exercise is looking at likely and consistent differences between those who commit theft and those who do not. Because of the nonlinearity of the model and the need to fix parameters, some standard errors are quite low and significance quite high. I would urge readers to focus on the pattern of differences in key parameters between thieves and non-thieves over the many estimations.

Following Heckman et al. (2002), I group the NLSY wage data in five-year periods: period 1, ages 17–21; period 2, ages 22–26; and period 3, ages 27–31. Within each period, I average observations of real hourly compensation and annual income for each individual, ignoring missing values. Dollar values are adjusted for inflation to 2009 dollars using the implicit price deflator for personal consumption.⁷
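The grouping step can be illustrated with a small sketch. The data layout (a dictionary from age to an already deflated wage, with `None` for a missing wave) and the function name `period_means` are assumptions made for the example, not the structure of the NLSY files.

```python
# Five-year periods used in the text: ages 17-21, 22-26 and 27-31.
PERIODS = {1: range(17, 22), 2: range(22, 27), 3: range(27, 32)}


def period_means(obs_by_age):
    """Average one respondent's observations within each period, skipping missing values."""
    means = {}
    for period, ages in PERIODS.items():
        values = [obs_by_age[a] for a in ages if obs_by_age.get(a) is not None]
        # A period with no valid observation stays missing rather than becoming zero.
        means[period] = sum(values) / len(values) if values else None
    return means


# Hypothetical respondent: real hourly wage by age, with gaps.
respondent = {17: 6.0, 18: None, 19: 8.0, 22: 10.0, 23: 12.0, 24: 14.0}
print(period_means(respondent))
```

A respondent observed in only some waves still contributes to the periods in which at least one wage is recorded.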
Based on the 10-period model, we are able to observe some or all of three periods for most NLSY respondents. This is sufficient to get estimates of model parameters, and precise estimates of the differences between thieves and non-thieves. The core GMM model is thus three equations:

\[
\begin{aligned}
0 &= W_{1i} - \hat{K} + \left(d_0 d_1 \beta \frac{1-\beta^{9}}{1-\beta}\right)^{\frac{1}{1-d_1}} \\[4pt]
0 &= W_{2i} - d_0\left[\left(d_0 d_1 \beta \frac{1-\beta^{9}}{1-\beta}\right)^{\frac{1}{1-d_1}}\right]^{d_1} - \hat{K} + \left(d_0 d_1 \beta \frac{1-\beta^{8}}{1-\beta}\right)^{\frac{1}{1-d_1}} \\[4pt]
0 &= W_{3i} - d_0\left[\left(d_0 d_1 \beta \frac{1-\beta^{8}}{1-\beta}\right)^{\frac{1}{1-d_1}}\right]^{d_1} - d_0\left[\left(d_0 d_1 \beta \frac{1-\beta^{9}}{1-\beta}\right)^{\frac{1}{1-d_1}}\right]^{d_1} - \hat{K} + \left(d_0 d_1 \beta \frac{1-\beta^{7}}{1-\beta}\right)^{\frac{1}{1-d_1}}
\end{aligned}
\]

where \(W_1, W_2, W_3\) are a measure of wages or earnings for ages 17–21, 22–26 and 27–31, respectively. Each equation sets the observed wage equal to the model wage \((1-x_t)K_t\), that is, accumulated human capital less the period’s optimal investment from eq. 5 with \(T=10\). In the analysis that follows I use hourly compensation and annual income, getting very similar results for both.

As discussed in online Appendix 5.1, I have experimented with running the regressions with various parameters set, focusing on approaches where \(\beta\) is set. As is standard in the literature (see for example Heckman et al., 1998, page 27) I begin the estimation process with some parameters derived from other sources, in particular the value of discount rate parameter \(\beta\). As a robustness check I rerun the estimation with a range of values. Results are stable within a range of values that is consistent with the literature. In particular, relative differences between thieves and non-thieves are extremely stable.⁸ Further details can be seen in the online Appendix.

Having found a set of values that are consistent with both the data and the literature, it now becomes possible to estimate the difference in \(\hat{K}\) and \(\beta\) between non-thieves (honest individuals) and thieves. I now run identical equations for thieves and non-thieves, which creates six equations (three for each category). By using a range of reasonable values for \(d_1\) and then \(d_0, d_1\) I was able to get estimates of \(\hat{K}_h, \hat{K}_t, \beta_h, \beta_t\) that showed stable and robust differences between thieves and non-thieves. For a fuller discussion of the process and additional regression output, see online Appendix 5.1.
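To make the structure of the six moment conditions concrete, the sketch below builds the sample moments for given parameter values. It is a hedged illustration only: the function names are hypothetical, the weighting matrix and the numerical minimization of the GMM criterion are omitted, and the parameter values in the usage example are the second-specification estimates reported in Table 5.

```python
def investment(remaining, beta, d0, d1):
    """Optimal investment level with `remaining` periods left (eq. 5, times K_t)."""
    annuity = beta * (1.0 - beta ** remaining) / (1.0 - beta)
    return (d0 * d1 * annuity) ** (1.0 / (1.0 - d1))


def predicted_wages(K_hat, beta, d0, d1, T=10):
    """Model wages for periods 1-3: accumulated capital minus current investment."""
    i1, i2, i3 = (investment(T - t, beta, d0, d1) for t in (1, 2, 3))
    w1 = K_hat - i1
    w2 = K_hat + d0 * i1 ** d1 - i2
    w3 = K_hat + d0 * i1 ** d1 + d0 * i2 ** d1 - i3
    return w1, w2, w3


def group_moments(wages, K_hat, beta, d0, d1):
    """Mean of (observed - predicted) wage per period over one group's individuals."""
    pred = predicted_wages(K_hat, beta, d0, d1)
    n = len(wages)
    return [sum(w[k] - pred[k] for w in wages) / n for k in range(3)]


def stacked_moments(honest_wages, thief_wages, params, d0, d1):
    """Six sample moments: three for honest individuals, three for thieves."""
    beta_h, beta_t, K_h, K_t = params
    return (group_moments(honest_wages, K_h, beta_h, d0, d1)
            + group_moments(thief_wages, K_t, beta_t, d0, d1))


# Usage: wages generated by the model itself give moments of (numerically) zero.
d0, d1 = 0.308, 0.948
params = (0.817, 0.814, 15.227, 14.599)
honest = [predicted_wages(15.227, 0.817, d0, d1)] * 2
thieves = [predicted_wages(14.599, 0.814, d0, d1)]
print(stacked_moments(honest, thieves, params, d0, d1))
```

With four free parameters and six moments the system is over-identified by two, which matches the two degrees of freedom reported for the J test.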
As can be seen in Table 5, the regressions estimate the difference in initial human capital for thieves as about $0.60–0.67/hour, and the difference in discount rate at 0.3%. I rerun all these regressions using annual income data. The results from these are very similar to the hourly wage regressions, and can be seen in Table 6. The difference in initial human capital is estimated at $2,200 to $2,450, with a difference in discount rate of 0.4%.

Table 5. Establishing \(\beta_h\), \(\beta_t\), \(\hat{K}_h\), \(\hat{K}_t\) from \(d_1\) and \(d_0\) for hourly wages

GMM regression: hourly wages, males only. Parameters set: \(d_1\), \(d_0\). Parameters estimated: \(\beta_h\), \(\beta_t\), \(\hat{K}_h\), \(\hat{K}_t\).

| Parameters | (1) | (2) | (3) |
| --- | --- | --- | --- |
| Value of \(d_1\) | 0.940 | 0.948 | 0.954 |
| Value of \(d_0\) | 0.286 | 0.308 | 0.328 |
| \(\beta_h\) | 0.838*** | 0.817*** | 0.799*** |
| | (0.000) | (0.000) | (0.000) |
| \(\beta_t\) | 0.835*** | 0.814*** | 0.796*** |
| | (0.001) | (0.001) | (0.001) |
| \(\hat{K}_h\) | 15.395*** | 15.227*** | 15.088*** |
| | (0.079) | (0.077) | (0.075) |
| \(\hat{K}_t\) | 14.727*** | 14.599*** | 14.493*** |
| | (0.168) | (0.162) | (0.157) |
| Criterion Q(b) | 0.000 | 0.000 | 0.000 |
| No. of parameters | 4 | 4 | 4 |
| No. of moments | 6 | 6 | 6 |
| No. of observations | 3833 | 3833 | 3833 |
| Hansen’s J test | 0.0013 | 0.0003 | 0.0001 |
| J test d.f. | 2 | 2 | 2 |

Notes: * p<0.05, ** p<0.01, *** p<0.001; robust standard errors in parentheses. \(d_0, d_1\) represent development parameters, \(\beta\) represents discount rate, \(\hat{K}\) represents initial capital; \(h, t\) distinguish honest and thief types. Source: GMM regression, using NLSY 1997, 1997 to 2011 waves. Section 5 discusses methodology, with additional details in Appendix B.

Table 6. Establishing \(\beta_h\), \(\beta_t\), \(\hat{K}_h\), \(\hat{K}_t\) from \(d_1\) and \(d_0\) for annual income

GMM regression: annual income, males only. Parameters set: \(d_1\), \(d_0\). Parameters estimated: \(\beta_h\), \(\beta_t\), \(\hat{K}_h\), \(\hat{K}_t\).

| Parameters | (1) | (2) | (3) |
| --- | --- | --- | --- |
| Value of \(d_1\) | 0.942 | 0.933 | 0.924 |
| Value of \(d_0\) | 0.550 | 0.553 | 0.561 |
| \(\beta_h\) | 0.7962*** | 0.8171*** | 0.8358*** |
| | (0.0001) | (0.0002) | (0.0002) |
| \(\beta_t\) | 0.7930*** | 0.8134*** | 0.8317*** |
| | (0.0005) | (0.0006) | (0.0006) |
| \(\hat{K}_h\) | 24,795*** | 25,407*** | 25,962*** |
| | (162) | (167) | (172) |
| \(\hat{K}_t\) | 22,586*** | 23,069*** | 23,508*** |
| | (350) | (366) | (380) |
| Criterion Q(b) | 0.001 | 0.001 | 0.001 |
| No. of parameters | 4 | 4 | 4 |
| No. of moments | 6 | 6 | 6 |
| No. of observations | 3811 | 3811 | 3811 |
| Hansen’s J test | 2.48 | 2.52 | 2.56 |
| J test d.f. | 2 | 2 | 2 |

Notes: * p<0.05, ** p<0.01, *** p<0.001; robust standard errors in parentheses. \(d_0, d_1\) represent development parameters, \(\beta\) represents discount rate, \(\hat{K}\) represents initial capital; \(h, t\) distinguish honest and thief types. Source: GMM regression, using NLSY 1997, 1997 to 2011 waves.
Section 5 discusses methodology, with additional details in Appendix B.

What would this difference mean in the real world? The difference in human capital when analysed as hourly wage is about 5%. Using annual earnings data, the difference is greater, about 10% at the beginning. If there were no difference in discount rates, these differences would stay fixed. However, because of lower investment in earlier years, the model predicts that the differences will grow over time, so that thieves will be earning about 10% less per hour, and as much as 13% less in annual earnings, for the second half of their career.

The final stage is to estimate the wage equations with the period 1 correction for possible future punishment and tarnishing. As a reminder, all individuals have the possibility of committing crime and ex ante all should be ready to commit crime; we are looking for ex post differences between those who did and those who did not. The correction done in period 1 is fairly simple, and requires individuals to take into account the odds that: (i) an attractive theft opportunity will come; (ii) they will take it; (iii) they will be caught and punished; and (iv) their human capital will be tarnished with damage \(\tau\). As discussed above (more details can be seen in Appendices A and 5.1), the adjustment term is \(\left(\pi(1-\tau)^{d_{1i}} + (1-\pi)\right)\), where \(\pi\) is the product of the ex ante odds of an attractive opportunity multiplied by the odds of punishment and tarnishing, conditional on committing the crime, and \(\tau\) is the tarnishing damage. Since roughly 20% of thieves report criminal convictions for theft, and the variation in probability of stealing by observables goes from about 10% to about 30%, I focus on values for \(\pi\) that go from 0.2×0.10=0.02 to 0.2×0.30=0.06. The value for \(\tau\) shown in Table 7 is 0.10. In the Appendix I show additional work for values of \(\tau\) at 0.05 and 0.30.

Table 7. Evaluating model with period 1 investment adjustment

GMM regression. Parameters set: \(d_1\), \(d_0\), \(\tau\), \(\pi_h\), \(\pi_t\). Parameters estimated: \(\beta_h\), \(\beta_t\), \(\hat{K}_h\), \(\hat{K}_t\).

| Parameters | Hourly wages (1) | Hourly wages (2) | Annual income (1) | Annual income (2) |
| --- | --- | --- | --- | --- |
| \(d_1\) | 0.948 | 0.948 | 0.933 | 0.933 |
| \(d_0\) | 0.308 | 0.308 | 0.553 | 0.553 |
| \(\tau\) | 0.1 | 0.1 | 0.1 | 0.1 |
| \(\pi_h\) | 0.01 | 0.02 | 0.01 | 0.02 |
| \(\pi_t\) | 0.10 | 0.06 | 0.10 | 0.06 |
| \(\beta_h\) | 0.817*** | 0.817*** | 0.817*** | 0.818*** |
| | (0.000) | (0.000) | (0.000) | (0.000) |
| \(\beta_t\) | 0.817*** | 0.815*** | 0.817*** | 0.815*** |
| | (0.001) | (0.001) | (0.001) | (0.001) |
| \(\hat{K}_h\) | 15.251*** | 15.270*** | 25,496.172*** | 25,560.540*** |
| | (0.077) | (0.077) | (168.389) | (169.032) |
| \(\hat{K}_t\) | 14.730*** | 14.679*** | 23,616.785*** | 23,397.353*** |
| | (0.169) | (0.166) | (387.083) | (377.894) |
| Criterion Q(b) | 0.001 | 0.000 | 0.004 | 0.002 |
| No. of parameters | 4 | 4 | 4 | 4 |
| No. of moments | 6 | 6 | 6 | 6 |
| No. of observations | 3833 | 3833 | 3811 | 3811 |
| Hansen’s J test | 2.413769 | 0.994485 | 14.75184 | 7.61716 |
| J test d.f. | 2 | 2 | 2 | 2 |

Notes: * p<0.05, ** p<0.01, *** p<0.001; robust standard errors in parentheses. \(d_0, d_1\) represent development parameters, \(\tau\) represents loss of human capital from criminal record, \(\pi\) represents ex ante expectation of developing criminal record, \(\beta\) represents discount rate, \(\hat{K}\) represents initial capital; \(h, t\) distinguish honest and thief types. Source: GMM regression, using NLSY 1997, 1997 to 2011 waves. Section 5 discusses methodology, with additional details in Appendix B.

Reviewing the values in Table 7, we can see that the period 1 adjustment changes the values very little: all the parameter estimates are virtually identical to those shown in Tables 5 and 6. Across all the estimation exercises, the patterns remain very stable across a range of parameter values and modelling choices.

In addition to the GMM results detailed above, I have also run a range of multiple regressions comparing: (i) number of grades repeated; and (ii) number of employers for thieves and non-thieves. Across a range of specifications, using a variety of controls, and limiting the population to individuals with no criminal justice penalties, theft is positively and significantly associated with both repeating grades and number of employers. Thus, we can see a strong pattern of underinvestment in schooling and at the same time active engagement in the labour market. As Figs 4 and 5 show, between the ages of 21 and 23, thieves appear to have at least as much access to work as non-thieves.

5.2 Analysis: heterogeneity leading to theft

In eq. 4, theft is predicted if the return is greater than the sum of: (i) the expected lost future earnings (from the tarnishing of human capital if caught); (ii) the expected disutility from punishment; and (iii) the opportunity cost of lost time. The preceding analysis points strongly towards certain meaningful differences between thieves and non-thieves.
In particular, differences in lost future earnings and expected disutility from punishment can help to explain differences in behaviour, while there is little sign of differences in the opportunity cost of lost time. As shown in Table 2, if being caught and sentenced for theft is likely to reduce future earnings by 10%, then the individuals who have reported theft would on average expect to lose about $6,500 in net present value from such a sentence. In the NLSY 1997 data, about 20% of those reporting thefts also report being charged with a crime (about 15% report any conviction). Thus, the difference in expected cost of punishment between thieves and non-thieves might be as much as 0.2×$6,500=$1,300 on average (see footnote 9). Since the reported thefts are for items valued at $50 or more, this suggests that thieves’ decisions can be justified on a rational basis. Additionally, the subjective difference in discount rate also leads to a difference in the discounted disutility of punishment, although not as great: since the difference on a monthly basis is about one one-hundredth of a percent, disutility from punishment would have to be on the order of $100,000 or more for it to make so much as $1.00 difference.

One plausible source of heterogeneity is not supported in this analysis or in the NLSY 1997 data: differences in opportunity cost, which would make theft reasonable as a substitute for labour, do not show up. The hourly wages and annual incomes are effectively the same for both groups.

6. Conclusion

I have developed and solved a model that links human capital investment and theft behaviour. I focus on three sets of human capital parameters as potential sources of variation in theft behaviour: initial human capital \(\hat{K}_i\), patience \(\beta_i\), and human capital production potential \(d_0, d_1\). At first blush, all are plausible candidates for particularly low perceived costs to theft.
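The expected-cost comparison made in Section 5.2 is simple arithmetic, and a short sketch makes its sensitivity to the two inputs easy to check. The function name is hypothetical, and the sketch assumes that the net-present-value figure scales linearly with the percentage damage to earnings, which is how the 5% case in footnote 9 appears to be computed. The $6,500, 10% and 20% inputs are the values quoted in the text.

```python
def expected_cost_difference(npv_at_10pct, damage, punishment_risk):
    """Expected difference in lost earnings: risk of punishment times NPV difference.

    npv_at_10pct is the net-present-value figure for a 10% reduction in future
    earnings; it is rescaled linearly for other damage levels (an assumption).
    """
    npv = npv_at_10pct * (damage / 0.10)
    return punishment_risk * npv


# Baseline from the text: 10% damage to earnings, 20% risk of punishment.
baseline = expected_cost_difference(6500.0, damage=0.10, punishment_risk=0.20)

# Low case from footnote 9: 5% damage, 5% risk of punishment.
low_case = expected_cost_difference(6500.0, damage=0.05, punishment_risk=0.05)

print(round(baseline, 2), round(low_case, 2))
```

Both cases remain above the $50 minimum value of the reported thefts, which is the comparison the argument relies on.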
Very low initial human capital, within the model, could significantly reduce the opportunity cost of theft and the perceived future lost wages from developing a criminal record. Similarly, very low discount rates (high impatience) could reduce the present discounted value of future punishment and future lost wages. Low human capital production potential would mean lower earnings further on in the individual’s career, and might reduce the expected cost of theft.

The last of these is the most easily disposed of: low human capital production potential (low \(d_0, d_1\)), while a reasonable match to the differential wage development of thieves and non-thieves, does not actually lead to any difference in the present value of lost future wages. Low potential does not appear to lower any of the other costs of theft: it does not affect disutility of criminal justice sanctions or the immediate opportunity cost of crime. These parameters can thus be rejected as a potential explanation within the rational agent framework. This is a completely novel result within the literature.

Low initial human capital appears to play a role, but only with regard to net present value of future earnings. One obvious explanation of theft, lower opportunity cost, cannot be supported by the data. Hourly and annual earnings for thieves are about the same as, if not higher than, those for non-thieves, a pattern that can be seen in other data as well (see for example Holzman, 1982, Table 4; Paternoster et al., 2003; Brame et al., 2004; Apel et al., 2008).

Lower discount rate also appears to play a role. Estimated discount rates for thieves are slightly lower than for non-thieves; combined with the lower initial human capital, this leads to significant differences in the discounted value of lost future earnings. More generally, lower discount rates may be linked to lack of self-control and a lower fear of punishment.
Combined, the decision to steal appears to have some rational basis: if we estimate that a criminal justice record reduces future earnings by 10%, and that thieves have a 20% risk of punishment (as appears to be the case in the NLSY data), individuals reporting having committed a theft face an expected loss from theft that is $1,300 less in discounted lost earnings than that faced by individuals not reporting a theft. Thus, the decision to steal can be matched to a rational analysis of utility maximization.

Importantly, and in a substantial contribution to previous research, the model and empirical work together establish that the opportunity cost of time during the period of theft cannot help explain the decision to steal, at least within the NLSY 1997 data. The entire difference between thieves and non-thieves comes from expected future earnings. This novel result makes it extremely difficult to sustain arguments that view property crime as a substitute for legitimate labour (see footnote 10). By allowing heterogeneity in both initial human capital and in discount rates, this exercise makes it possible to relate both objective and subjective aspects of individual circumstances to clearly measurable differences in career path.

Finally, combined with Lochner (2004), these results have the potential to inform policy at two levels. With regard to theft and crime generally, the results reinforce much work of the last few decades showing that interventions in early-stage human capital have the most promise (Heckman, 2006). With the focus on a single type of crime, the results help us to understand with much greater specificity how human capital deficiencies affect behaviour. At a much broader level, the changing relative earnings of thieves and non-thieves show that differences in human capital do not have simple, predictable effects on earnings.

Supplementary Material

Supplementary material (the Appendix) is available online at the OUP website.
Footnotes

1 The literature on human capital goes back to Friedman and Kuznets (1954), and Becker (1962) seems to have been the critical article to put all the pieces in place (with additional work by Mincer (1974) and Ben-Porath (1967)). The most active work in modelling and estimating human capital development has been by James Heckman and his students (Cunha et al., 2006, 2010; Cunha and Heckman, 2007).

2 A more exact but less intuitive description is that it is like selling a call option.

3 Another modelling approach would be to assume that upon reaching adulthood all individuals become drastically more concerned about reputation, and avoid all theft. An even simpler approach would be to focus on the human capital model and related estimation as an exercise in understanding the differences between thieves and non-thieves. Both of these approaches yield results identical to those of the approach used in the main text. An interesting extension of this work might be to extend the criminal opportunities to multiple periods and then estimate the model using a data set with numerous individuals stealing into their twenties and thirties. Given that such individuals are all but non-existent in this dataset, a multi-period theft model applied to the NLSY 1997 would be highly unlikely to yield different results.

4 The parameter \(\beta\) is referred to as the discount factor or measure of time preference and takes values on the interval \([0,1]\), with lower values showing greater impatience (less patience). The use of the parameter is common to human capital models (Lochner, 2004); further information can be seen in Frederick et al. (2002).

5 Obviously, the modelling and estimation here would be biased with regard to understanding the ‘professional’ thieves who remain active over much of adulthood. Since they are a minor part of the population, the estimates in this paper should be unbiased for understanding most thieves, but not for all thieves.

6 That is to say, lower patience levels.
7 Specifically, this is annual, not seasonally adjusted, BEA series DPCERD3A086NBEA.

8 Because of the strong assumptions of the model, and the fact that \(\beta\) and \(d_0, d_1\) cannot be estimated in a single run, standard errors are small and the significance levels for individual parameters are high.

9 Reducing the damage to future earnings and/or the odds of punishment reduces the difference, but even a 5% risk of punishment and 5% damage to earnings yields a difference of $162.50, enough to rationally justify the pattern.

10 Other crimes, particularly drug dealing, are different in important ways, and labour substitution may play a significant role with them.

Acknowledgements

I would like to thank my dissertation committee Tomas Sjöström, Anne Morrison Piehl, Roger Klein and Lance Lochner, as well as Richard McLean, Francis Teal and several anonymous referees for helpful comments.

References

Angrist J., Lavy V., Schlosser A. (2010) Multiple experiments for the causal link between the quantity and quality of children, Journal of Labor Economics, 28, 773–824.
Apel R., Bushway S.D., Paternoster R., Brame R., Sweeten G. (2008) Using state child labor laws to identify the causal effect of youth employment on deviant behavior and academic achievement, Journal of Quantitative Criminology, 24, 337–62.
Becker G.S. (1962) Investment in human capital: a theoretical analysis, Journal of Political Economy, 70, 9–49.
Becker G.S. (1968) Crime and punishment: an economic approach, Journal of Political Economy, 76, 169–217.
Ben-Porath Y. (1967) The production of human capital and the life cycle of earnings, The Journal of Political Economy, 75, 352–65.
Brame R., Bushway S.D., Paternoster R., Apel R. (2004) Assessing the effect of adolescent employment on involvement in criminal activity, Journal of Contemporary Criminal Justice, 20, 236–56.
Catalano S. (2010) Victimization during burglary, Technical Report NCJ 227379, US Department of Justice Office of Justice Programs Bureau of Justice Statistics, Washington, DC.
Cunha F., Heckman J. (2007) The technology of skill formation, The American Economic Review, 97, 31–47.
Cunha F., Heckman J.J., Lochner L., Masterov D.V. (2006) Interpreting the Evidence on Life Cycle Skill Formation, Vol. 1, Elsevier, Oxford.
Cunha F., Heckman J.J., Schennach S.M. (2010) Estimating the technology of cognitive and noncognitive skill formation, Econometrica, 78, 883–931.
Doyle M. (2012) 24th Annual Retail Theft Survey, Jack L. Hayes International, Inc., 27520 Water Ash Drive, Wesley Chapel, FL 33544.
Ehrlich I. (1973) Participation in illegitimate activities: a theoretical and empirical investigation, Journal of Political Economy, 81, 521–65.
Frederick S., Loewenstein G., O’Donoghue T. (2002) Time discounting and time preference: a critical review, Journal of Economic Literature, 40, 351–401.
Freeman R.B. (1991) Income from independent professional practice, NBER Working Paper No. 3875, Cambridge, MA.
Freeman R.B. (1999) The Economics of Crime, Vol. 5, Elsevier, Oxford.
Friedman M., Kuznets S. (1954) Income from independent professional practice, NBER, Cambridge, MA. Available at: http://papers.nber.org/books/frie54-1 (accessed 18 September 2017).
Gottfredson M.R., Hirschi T. (1990) A General Theory of Crime, Stanford University Press, Stanford, CA.
Gould E.D., Weinberg B.A., Mustard D.B. (2002) Crime rates and local labor market opportunities in the United States: 1979–1997, Review of Economics and Statistics, 84, 45–61.
Grogger J. (1995) The effect of arrests on the employment and earnings of young men, The Quarterly Journal of Economics, 110, 51–71.
Grogger J. (1998) Market wages and youth crime, Journal of Labor Economics, 16, 756–91.
Heckman J.J. (2006) Skill formation and the economics of investing in disadvantaged children, Science (New York, N.Y.), 312, 1900–1902. doi: 10.1126/science.1128898.
Heckman J., Lochner L., Cossa R. (2002) Learning-by-doing vs. on-the-job training: Using variation induced by the EITC to distinguish between models of skill formation, NBER Working Paper No. 9083, Cambridge, MA.
Heckman J.J., Lochner L., Taber C. (1998) Explaining rising wage inequality: explorations with a dynamic general equilibrium model of labor earnings with heterogeneous agents, Review of Economic Dynamics, 1, 1–58.
Holzman H.R. (1982) The serious habitual property offender as ‘moonlighter’: an empirical study of labor force participation among robbers and burglars, The Journal of Criminal Law and Criminology, 73, 1774–92.
Lee D., McCrary J. (2005) Crime, punishment and myopia, NBER Working Paper No. 11491, Cambridge, MA.
Levitt S.D. (1997) Juvenile crime and punishment, Journal of Political Economy, 106, 1156–85.
Levitt S.D., Lochner L. (2001) The determinants of juvenile crime, in Gruber J. (ed.) Risky Behavior Among Youths: An Economic Analysis, NBER, University of Chicago Press, 327–73.
Levitt S.D., Venkatesh S.A. (2000) An economic analysis of a drug-selling gang’s finances, Quarterly Journal of Economics, 115, 755–89.
Lin M.-J. (2008) Does unemployment increase crime? Evidence from US data 1974–2000, Journal of Human Resources, 43, 413–436.
Lochner L. (2004) Education, work, and crime: a human capital approach, International Economic Review, 45, 811–43.
Merlo A., Wolpin K.I. (2009) The transition from school to jail: youth crime and high school completion among black males, Penn Institute for Economic Research Working Paper No. 09–002.
Mincer J. (1974) Schooling, Experience, and Earnings, Columbia University Press, New York.
Nagin D., Waldfogel J. (1995) The effects of criminality and conviction on the labor market status of young British offenders, International Review of Law and Economics, 15, 109–26.
Paternoster R., Bushway S., Apel R., Brame R. (2003) The effect of teenage employment on delinquency and problem behaviors, Social Forces, 82, 297–335.
Piehl A.M. (1998) Economic Conditions, Work, and Crime. Handbook on Crime and Punishment, Oxford University Press, Oxford.
West D.J., Farrington D.P. (1977) The Delinquent Way of Life, Heinemann, Oxford.
Williams G.F. (2015) Property crime: investigating career patterns and earnings, Journal of Economic Behavior & Organization, 119, 124–38.
Wilson J.Q., Abrahamse A. (1992) Does crime pay?, Justice Quarterly, 9, 359–77.

© Oxford University Press 2017. All rights reserved. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

Journal

Oxford Economic Papers, Oxford University Press

Published: Apr 1, 2018
