The Review of Economic Studies, Volume 85 (4) – Oct 1, 2018


- Publisher: Oxford University Press
- Copyright: © The Author(s) 2017. Published by Oxford University Press on behalf of The Review of Economic Studies Limited.
- ISSN: 0034-6527
- eISSN: 1467-937X
- D.O.I.: 10.1093/restud/rdx068

Abstract

We develop a model of crime in which the number of police, the crime rate, the arrest rate, the employment rate, and the wage rate are joint outcomes of a subgame perfect Nash equilibrium. The local government chooses the size of its police force and citizens choose among work, home, and crime alternatives. We estimate the model using metropolitan statistical area (MSA)-level data. We use the estimated model to examine the effects on crime of targeted federal transfers to local governments to increase police. We find that knowledge about unobserved MSA-specific attributes is critical for the optimal allocation of police across MSA’s.

1. Introduction

The modern literature on the economics of crime, originating with Becker (1968), recognized that the crime rate is the equilibrium outcome of the joint determination of the supply of crime, resulting from the uncoordinated decisions of citizens about the supply of their labour to legal and/or illegal income-generating activities, and the demand for crime, determined by a government policymaker who decides on the level of resources to commit to preventing crime.1 Much of the empirical literature, beginning with Ehrlich (1973), has used aggregate data based on either cross-sectional or time-series variation.2 Coupled with the supply/demand theoretical framework, the use of aggregate data led naturally to the adoption of a simultaneous equations econometric structure.3 The estimating equations in that system consisted of a supply of offences function representing the decision rule of potential criminals, an apprehension production function and the policymaker’s decision rule governing the level of resources devoted to apprehension (and punishment). The econometric structure was meant to approximate the solution to the equilibrium model.
As such, its parameters are combinations of those of the underlying behavioural structure, that is, of the preference function of potential offenders (the citizenry), the apprehension production function, the objective function of the policymaker and the distribution of the unobservables that enter those functions. An alternative approach, pursued in this article, is to estimate the behavioural structure by specifying and solving a parametric model of agent decision-making. This approach relaxes two potentially important aspects of the approximation in the traditional approach. First, it explicitly aggregates the individual decisions about whether to engage in illegal activities over the citizen population, which naturally leads to nonlinearities and location-specific effects. The existence of location-specific effects, as derived from the model, implies that allowing for random coefficients would be a more appropriate approximation in the traditional regression-based approach. This observation is particularly important in the context of instrumental variables (IVs) estimation (see Angrist and Imbens, 1994; Heckman and Vytlacil, 1998), a common procedure for estimating the deterrent effect of apprehension with aggregate data. Secondly, this approach accounts for the multiple equilibria that are inherent in the structure. Multiple equilibria arise because the probability that a criminal is apprehended is assumed to decrease with the number of crimes committed, which creates a positive externality for engaging in crime. The existence of multiple equilibria confounds the interpretation of regressions in the traditional approach. To see how, note that because each city faces different fundamentals, the equilibrium sets differ across cities. Such sets may differ both in the number of equilibria they contain and in the equilibrium outcomes (the crime rate and the apprehension rate) within them.
Estimation aims to recover what would happen to the crime rate within a location if the apprehension rate were to change exogenously. For the regression coefficient on, for example, the apprehension rate to reflect this, the equilibrium selected across locations that differ in their apprehension rates must be in some sense the same. Examples of equilibrium selection rules that satisfy such a sameness criterion include choosing the equilibrium with the lowest crime rate or choosing the Pareto-dominant equilibrium (if one always exists). If different cities select different equilibria, then the regression coefficient would also reflect that difference in the selected equilibria. The model that we estimate specifies the optimization problems of potential criminals and of the policymaker in a city. Each city has a continuum of individuals whose types are unobservable to the researcher. Types differ in legal sector human capital and in preferences for staying at home and for committing crimes. Types are correlated with observable characteristics such as age, gender, education, and race. Besides the distributions of their citizens’ characteristics, cities also differ in their production technology, in their apprehension efficiency and in the opportunity cost of the resources devoted to law enforcement. Each citizen chooses whether to be a criminal, work in the legal sector, or stay at home. City output is produced with the human capital of citizens who choose to work in the legal sector. Criminals meet victims randomly and receive a fraction of the victim’s legal sector income. For a given city, the arrest rate varies with the crime rate and the size of the police force. Given any size of the police force, it is feasible to solve for all possible equilibria. This is because the model structure yields an ordering of citizens by their propensity to be criminals, and that ordering is invariant across equilibria.
Acting as a Stackelberg leader, the policymaker chooses the number of police to maximize the expected value of an objective function that includes the number of police (negatively), the crime rate (negatively) and the apprehension rate (positively), where the expectation is taken over the distribution of equilibria that we estimate. The solution to the model yields equilibrium values of the number of police, the employment rate, the crime rate, the apprehension rate, and the competitively determined rental price of human capital. The model is estimated by simulation. At any given set of parameter values and location-specific unobservable characteristics, the model can be solved for all of the equilibrium outcomes in each location. Multiple simulations are performed at the same parameter values by randomly drawing from the distribution of unobservables. Averaging over these simulations within each location provides statistics for the equilibrium outcomes that can be matched with the data. The data are from two sources: the Uniform Crime Reports (UCRs) of the FBI and the Current Population Survey (CPS). The CPS provides demographics, employment, and wage data, while the UCR provides data on crimes, arrests and the number of police. We apply the model to property crimes. We estimate the model using 2008 data on the 238 MSA’s that can be matched between the two data sets. We use data for 2003 to conduct an out-of-sample validation. On the whole, we find the out-of-sample fit to be reasonably good. With our estimated model, we examine the factors underlying the different crime rates across MSA’s: the observable and unobservable characteristics of the MSA’s and their citizens, and the selection of different equilibria.
We develop a method for distinguishing how much of the crime rate differences are due to differences in observable characteristics versus differences in the selected equilibrium. The method is based on a metric we propose for comparing equilibria across cities. We illustrate the decomposition by comparing pairs of MSA’s. We find that the relative importance of equilibrium selection is heterogeneous across pairs. For example, controlling for unobservables, the crime rate in Atlanta would fall by 15% if the equilibrium selection had been the same as in Philadelphia, resulting in a crime rate essentially equal to that in Philadelphia. On the other hand, the difference in crime rates between Houston and Philadelphia is hardly driven at all by the different equilibria that are selected. In addition, conditioning on the equilibrium selection being the same across MSA’s, we find that observable MSA characteristics also affect crime rates; for example, crime rates are higher in MSA’s with a younger and less educated population. These findings are qualitatively similar to those in the “reduced form” literature, which does not account for multiple equilibria. We then conduct two sets of counterfactual policy experiments that are not feasible within the conventional simultaneous equations estimation framework using cross-sectional data. These experiments are motivated by federal programmes that subsidize local governments for increasing the number of police. For example, the Community Oriented Policing Services (COPS) programme, initiated by the Clinton administration in 1994, aimed at a nationwide increase in the number of police of 20%.4 In all of the experiments, we assume that the federal government can perfectly monitor the use of the resources transferred to any locality, ensuring that the intended increase in the size of the police force is realized. In the first set of experiments, we explore two scenarios.
In the first, unlike the COPS programme, the planner (federal government) uniformly increases the size of the police force by 20% in each MSA; this programme leads to an 8.2% reduction in the national crime rate. In the second scenario, as in the COPS programme, instead of a uniform transfer, the planner subsidizes newly hired police, where the number of new hires is chosen by each local government. First, we determine the subsidy rate such that the planner’s total spending is the same as under the uniform transfer. At that subsidy rate (48%), the number of police increases by 42% and the national crime rate falls by 21%, or 2.6 times the reduction under the uniform policy. Secondly, we determine the subsidy rate such that the increase in police nationwide, and hence the total cost (federal and local), is the same as in the uniform transfer case, i.e., a 20% increase. The subsidy rate in that case is 27%, which leads to a 10.6% decrease in crime, compared with 8.2% in the uniform transfer case. Thus, allowing local governments to choose the number of police optimally, instead of making a uniform transfer, leads to a greater reduction in crime. Finally, we determine the subsidy rate such that the total spending mimics the intended cost of the COPS programme. That subsidy rate (34%) leads to a 26% increase in the number of police, compared with the 20% goal set by COPS, and reduces the crime rate by 13%. This would be the effect of a programme like COPS if the federal government were able to perfectly monitor the use of its grants. Our second set of experiments explores the effects of various targeting schemes to allocate federally sponsored additional police across locations, given a fixed resource constraint. We illustrate the idea by focusing on pair-wise allocation problems, where the total additional resource to be allocated between a pair of MSA’s is equivalent to 20% of their current total police force.
To achieve the optimal targeting, we first use the model estimates to determine, for each MSA, the values of the MSA-level unobserved characteristics: the MSA’s arrest efficiency, marginal cost of police, value of leisure and productivity. We explore five different allocation rules for each of three different planner objective functions. The first allocation rule assumes the planner has information on both the observable and unobservable characteristics of the MSA’s and optimizes the objective function. The second rule assumes the planner has information only on observables and optimizes the objective function. The other three rules are not based on optimizing an objective function, but instead allocate police using either current crime rates, current arrest rates, or current GDP per capita. We study five pairs of MSA’s. In the case of Philadelphia versus Phoenix, for example, the allocation with complete information, based on minimizing the overall crime rate, leads to a 16% decrease in the crime rate. Under the other four allocation rules, the crime reduction ranges from 5.6% (allocation based on crime rates) to 10.1% (allocation based on observables). Although the allocation with complete information always dominates the other rules, the relative effectiveness of the other four allocation rules is case-dependent.

1.1. Related literature

Our article contributes to the literature that studies crime from an equilibrium perspective. Via different channels, various theoretical studies have demonstrated the existence of multiple equilibria in models of crime. For example, Fender (1999) studies a model of crime where agents with different earnings abilities decide whether to work or to commit crimes. The arrest rate depends on the crime rate and the expenditure on law enforcement. Fender shows that multiple equilibria may exist, and that for certain parameter values there may be multiple stable equilibrium levels of crime.
Conley and Wang (2006) study an equilibrium model where agents, with heterogeneous working abilities and tastes for crime, choose either to commit crimes or to invest in education and become workers. The arrest rate depends on the crime rate and the number of police. They establish that when individuals differ in more than one dimension, multiple interior equilibria with different positive crime rates may exist. Taking the arrest rate as an exogenous parameter, Burdett et al. (2003, 2004) introduce crime into an otherwise classical random search equilibrium framework, where firms post wages and meet workers with an exogenous probability.5 Besides the random job offers they receive, workers (unemployed or employed) may also receive a criminal opportunity at random. Workers choose whether to accept a job offer if they receive one. They also choose whether to commit a crime if they receive a criminal opportunity. Multiple interior equilibria with a positive crime rate may arise due to matching externalities.6 Our article is the first to empirically implement a model of crime with multiple equilibria. Imrohoroglu et al. (2004) develop and calibrate a dynamic (supply-side) equilibrium model to study the trend in crime between 1980 and 1996. Taking the arrest rate as exogenous, they study individuals’ dynamic choices of whether or not to be a criminal after a stochastic period-specific employment status is realized. An equilibrium requires, among other conditions, that the aggregate crime rate be consistent with individuals’ choices. Imrohoroglu et al. (2000) embed a static equilibrium model of crime in a political economy framework. Individuals choose to specialize in either legitimate or criminal activities. The police force is funded by tax revenues from labour income, where the tax rate is determined via a majority-voting rule. The number of police is the sole determinant of the arrest rate, regardless of the crime rate.
In Fella and Gallipoli (2014), heterogeneous agents live for three periods: a schooling period, a work period, and a retirement period. An agent decides on savings and consumption, subject to borrowing constraints, in each period. In the first period, the agent also chooses an educational level. In the second period, the agent inelastically supplies labour and decides on crime involvement while facing an exogenous arrest rate per crime. This life cycle model is embedded in a general equilibrium framework, where the rental rates for human capital and physical capital are determined in competitive markets. The government administers a pay-as-you-go pension system and collects taxes to maintain the justice system (a fixed cost per convicted criminal) and to fund transfers and wasteful public expenditure. Using their calibrated model, they find that, compared to lengthening sentences, subsidizing high school graduation (balanced via adjusting income tax rates) is much more effective in reducing crime and improving total welfare. Adda et al. (2014) study the impact of a cannabis depenalization experiment within an equilibrium framework. The experiment depenalized cannabis purchase in the London borough of Lambeth. The model incorporates the decision by potential cannabis consumers as to their location of purchase, which depends on the allocation of police resources in Lambeth and all other London boroughs. The supply of other non-drug-related crimes is modelled similarly. Police resources in each borough are allocated between apprehending drug and non-drug offenders. The equilibrium rate at which crimes of each type are detected is determined by the interaction of the crime-specific supply of offenders and the supply of police. The model is used to perform the counterfactual of depenalizing cannabis purchases city-wide. They find that cannabis consumption rises modestly in all boroughs. Moreover, non-drug crimes fall in all boroughs as police are reallocated to those crimes.
Our article complements these papers by incorporating the direct impact of the crime rate on the arrest rate within a general equilibrium framework and by confronting the potential multiple equilibria problem that results. Given the complications arising from potential multiple equilibria, we do not consider the political economy aspect studied in Imrohoroglu et al. (2000) or the life cycle aspect studied in Fella and Gallipoli (2014). As with Adda et al. (2014), we focus on the allocation of police resources, though in our case in the context of a centralized policy.7 To make it feasible to capture the equilibrium determination of crime, our model necessarily makes some simplifications. Draca and Machin (2015) review several extensions that we do not pursue. One is to model the choice between criminal and legal activities as an allocation problem with continuous hours. Another is to allow for different types of criminal specialization ($$e.g.$$ shoplifting versus auto theft) and/or differential (monetary) payoffs from crime. A third is to allow for dynamics. Incorporating some of these aspects into an equilibrium setting like ours would constitute a substantial agenda for future research. In the simultaneous equations framework, it is possible to derive a relationship between crime and police. Estimating that full structure would enable one to determine how an exogenous increase in police would affect the apprehension rate and how the apprehension rate affects criminal behaviour. There have been many criticisms of this methodology.8 The more recent literature that has sought to estimate the effect of police on crime has adopted a “search for instruments” approach. Using our model estimates to mimic that approach, we shed light on the interpretation of IV estimates in the literature. In particular, we show how IV estimates vary with the particular sample and with the available instrument. The rest of the article is organized as follows. The next section describes the model.
Section 3 describes the data and Section 4 our estimation strategy and results. Section 5 explains the differences in crime rates across MSA’s. Counterfactual experiments are presented in Section 6. The last section concludes the article. Some details and additional tables are in the Appendix.9

2. Model

There are $$J$$ cities $$j=1,...,J$$, each with a government and a continuum of individual citizens. Cities are treated as closed economies.10 Each government acts as a Stackelberg leader by choosing the size of its police force.11 Observing their government’s decision, individuals in each city choose one of three mutually exclusive and exhaustive discrete options: work in the legal sector, work in the criminal sector, or remain at home.12 Each citizen is endowed with a human capital level $$\left( l\right)$$, a taste for crime $$\left( \eta \right)$$ and a value of “leisure” when at home $$\left( \kappa \right)$$. The triplet $$\left( l,\eta ,\kappa \right)$$ defines an individual’s type, which is unobservable to the researcher. Each component of an individual’s type is assumed to be discrete, with $$l\in \{l_{1},...,l_{N_{l}}\}$$, $$\eta \in \left\{ \eta _{1},...,\eta _{N_{\eta }}\right\}$$ and $$\kappa \in \left\{ \kappa _{1},...,\kappa _{N_{\kappa }}\right\}$$. Therefore, there are $$N=N_{l}\times N_{\eta }\times N_{\kappa }$$ types of individuals. We let $$n\in \left\{ 1,...,N\right\}$$ be the index of a type defined as $$\left( l_{n},\eta _{n},\kappa _{n}\right) \in \{l_{1},...,l_{N_{l}}\}\times \left\{ \eta _{1},...,\eta _{N_{\eta }}\right\} \times \left\{ \kappa _{1},...,\kappa _{N_{\kappa }}\right\}$$ and denote the proportion of individuals of type $$n$$ in city $$j$$ as $$p_{jn}$$.
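The type space is a Cartesian product of the three discrete grids, indexed by $$n=1,...,N$$. The Python sketch below shows one way to hold it. It assumes the grid values are arbitrary placeholders and uses equal shares as stand-ins for $$p_{jn}$$, which in the model are related to observables. The helper names `build_type_space` and `check_proportions` are hypothetical.

```python
from itertools import product


def build_type_space(l_levels, eta_levels, kappa_levels):
    """Enumerate the N = N_l * N_eta * N_kappa types (l_n, eta_n, kappa_n).

    Types are keyed by a 1-based index n, mirroring n in {1, ..., N};
    the ordering produced by itertools.product is arbitrary.
    """
    grid = product(l_levels, eta_levels, kappa_levels)
    return {n: triple for n, triple in enumerate(grid, start=1)}


def check_proportions(p_j, n_types, tol=1e-9):
    """Check that one city's type shares p_jn form a probability distribution."""
    if set(p_j) != set(range(1, n_types + 1)):
        raise ValueError("p_j must assign a share to every type n = 1..N")
    if any(p < 0 for p in p_j.values()) or abs(sum(p_j.values()) - 1) > tol:
        raise ValueError("shares must be non-negative and sum to one")
    return True


# Hypothetical grids: N_l = 3, N_eta = 2, N_kappa = 2, so N = 12.
types = build_type_space([1.0, 2.0, 3.0], [0.0, 0.5], [0.1, 0.4])
N = len(types)
p_j = {n: 1 / N for n in types}  # placeholder shares, not estimated ones
check_proportions(p_j, N)
```

Keying types by $$n$$ keeps the notation of the text. The later criminal-propensity ordering is a separate re-indexing of these same triplets.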
The type proportions are city-specific and related to observable characteristics $$\left( x\right)$$: age, gender, race, education, and the number of young children in the household. All are treated as discrete, where the numbers of discrete values are 4, 4, 2, 2, and 3. In total, there are 192 distinct demographic groups. The distribution of $$x$$, $$G_{j}(x)$$, is city-specific. We denote the discrete choices for a type $$n$$ individual as $$d_{n1}=1$$ if working in the legal sector ($$=0$$ otherwise), $$d_{n2}=1$$ if at home ($$=0$$ otherwise) and $$d_{n3}=1$$ if working in the criminal sector ($$=0$$ otherwise).

2.1. The legal sector ($$d_{n1}=1$$)

Legal sector output in city $$j$$, $$Y_{j}$$, is produced using the aggregate stock of human capital of those citizens in city $$j$$ who choose to work in that sector, $$L_{j}$$. The production technology is given by \[ Y_{j}=\tau _{j}L_{j}^{\theta }, \] where $$\theta \in (0,1)$$ is the elasticity of output with respect to aggregate human capital and $$\tau _{j}$$ is a city-specific Hicks-neutral technology factor drawn from the distribution $$\tau _{j}\thicksim \ln N(-0.5\sigma _{\tau }^{2},\sigma _{\tau }^{2})$$. Assuming a competitive labour market in each city, the rental rate for a unit of human capital is given by its marginal product, \begin{equation} r_{j}=\tau _{j}\theta L_{j}^{\theta -1}, \label{rental} \tag{1} \end{equation} and earnings for an individual of type $$n$$ residing in city $$j$$, $$y_{jn}$$, are the product of the rental price in city $$j$$ and the individual’s level of human capital, that is, $$y_{jn}=r_{j}l_{n}$$.

2.2. The home sector ($$d_{n2}=1$$)

The income of those who choose to be at home, denoted as $$b_{j}$$, is assumed to be equal to the human capital rental price in the city times the lowest level of human capital, $$l_{1}$$, that is, $$b_{j}=r_{j}l_{1}$$. Because we do not distinguish between non-participation and unemployment, this income is intended to capture both unearned income and unemployment insurance.
It is set to a low value, although one that varies with a city’s productivity via $$r_{j}$$, because those sources of income comprise, on average, a small proportion of total household income across all households within the population.

2.3. The criminal sector ($$d_{n3}=1$$)

Each law-abiding citizen in city $$j$$, whether working or at home, faces an equal probability $$\mu _{j}$$ of being the victim of a crime. If victimized, an individual loses a fraction $$\alpha$$ of his income to the criminal.13 A citizen who chooses to work in the criminal sector faces probability $$\pi _{j}$$ of being arrested. The probability of an arrest (the arrest production function) depends positively on the size of the police force, $$s_{j}$$, and negatively on the crime (victimization) rate, $$\mu _{j}$$. There is a city-specific unobserved component of technology, $$\epsilon _{j}$$, which captures the unobservable factors that affect the efficiency of criminals in avoiding arrests, or equivalently police inefficiency.14 A city with a lower value of $$\epsilon _{j}$$, that is, with higher police efficiency, has a higher arrest rate for a given number of police. The arrest technology function is given by \begin{equation} \pi _{j}=\Pi (s_{j},\mu _{j},\epsilon _{j})=\exp \left(-\frac{\gamma (\mu _{j}\epsilon _{j})^{\rho }}{s_{j}}\right), \label{arrest} \tag{2} \end{equation} where $$\rho >0$$, $$\epsilon _{j}\thicksim \ln N(-0.5\sigma _{\epsilon }^{2},\sigma _{\epsilon }^{2})$$, and $$\gamma >0$$ is a normalizing constant.15 The functional form ensures that $$\Pi \left( \cdot \right) \in \left[ 0,1\right]$$. Notice that the arrest rate declines with the crime rate and that the rate of decline depends on whether $$\rho \gtreqless 1$$. Note also that the parameterization of the degree of police (in)efficiency, which has mean one and is multiplicative with the crime rate, implies that a city with twice the crime rate and twice the police efficiency of another city will have the same arrest rate
(given the same number of police).

2.4. The individual’s decision problem

The decision of an individual in city $$j$$ depends on his type, $$\left( l,\eta ,\kappa \right)$$, as well as on city-level variables: the crime rate $$\left( \mu _{j}\right)$$, the human capital rental rate $$\left( r_{j}\right)$$, the aggregate labour input $$\left( L_{j}\right)$$, and the arrest rate $$\left( \pi _{j}\right)$$. It also depends on the expected utility of being a successful criminal $$\left( A_{j}\right)$$, where the expectation is taken over the victim’s income.16 Define the vector $$\Omega _{j}( s_{j}) \equiv \left[ \mu _{j},r_{j},L_{j},\pi _{j},A_{j}\right]$$, which varies with the number of police $$(s_{j})$$.17 Flow utility is assumed to be logarithmic in disposable income (consumption) and additive in the taste for crime and in the value of staying home. We also assume that criminals cannot target their victims and that they cannot steal from other criminals. Let $$d_{n}=\left[ d_{n1},d_{n2},d_{n3}\right]$$ be the choice vector of an individual of type $$n$$. The alternative-specific values for such an individual residing in city $$j$$ are given by \begin{equation} V_{nj}\left( d_{n}|\Omega _{j}\left( s_{j}\right) \right) =\left\{ \begin{array}{l} \mu _{j}\ln \left( (1-\alpha )y_{jn}\right) +(1-\mu _{j})\ln (y_{jn})\ \text{ if }d_{n1}=1, \\ \mu _{j}\ln \left( (1-\alpha )b_{j}\right) +(1-\mu _{j})\ln (b_{j})+\kappa _{n}\ \text{if }d_{n2}=1, \\ \mu _{j}\ln (b_{j})+(1-\mu _{j})\left[ \pi _{j}\ln (\underline{c})+(1-\pi _{j})A_{j}\right] +\eta _{n}\text{ if }d_{n3}=1. \end{array} \right. \label{value} \tag{3} \end{equation} The first (second) row in (3) shows the value if the individual chooses to work (stay home). With probability $$\mu _{j}$$, the individual is victimized and consumes $$(1-\alpha )y_{jn}$$ if employed or $$(1-\alpha )b_{j}$$ if at home. With probability $$(1-\mu _{j})$$, the individual is not victimized and consumes his income.
If he chooses to be at home, he also enjoys the value of staying home, $$\kappa _{n}$$. The third row shows the value if the individual chooses to be a criminal. With probability $$\mu _{j}$$, a criminal meets another criminal and so fails to find a victim (criminals cannot be victims); in that case, we assume that the criminal has the same income as a law-abiding non-worker, $$b_{j}$$. With probability $$(1-\mu _{j})$$, a criminal meets a victim. In this case, with probability $$\pi _{j}$$, he is arrested and punished, consuming $$\underline{c}$$.18 With probability $$\left(1-\pi _{j}\right)$$, he is not arrested, and has expected utility $$A_{j}$$. Engaging in crime also directly increases (or decreases) utility by the value $$\eta_{n}$$.

2.4.1. Optimal decisions

It can be shown that an individual in city $$j$$ of type $$\left( l_{n},\eta _{n},\kappa _{n}\right)$$ will engage in crime if and only if19 \begin{equation} \begin{aligned} &(1-\mu _{j})\left[ \pi _{j}\ln (\underline{c})+(1-\pi _{j})A_{j}\right] +\mu _{j}\ln \left(\frac{b_{j}}{1-\alpha }\right)-\ln (r_{j}) \\ &\quad >\max \left\{ \ln (l_{n}),\ln (l_{1})+\kappa _{n}\right\} -\eta _{n}. \end{aligned} \label{c1} \tag{4} \end{equation} If the individual does not choose to be a criminal, the individual will choose to work if and only if \begin{equation} \ln (l_{n})\geq \ln (l_{1})+\kappa _{n}. \label{c1a} \tag{5} \end{equation} We denote the optimal decision of an individual by $$d_{n}(\Omega _{j}\left( s_{j}\right) )$$. From condition (4), it can be seen that an individual’s propensity to engage in crime can be summarized by \begin{equation} T_{n}\equiv \max \left\{ \ln (l_{n}),\ln (l_{1})+\kappa _{n}\right\} -\eta _{n}. \label{Tn} \tag{6} \end{equation} We index these criminal propensities such that $$T_{n}\leq T_{n+1}$$, in which case the lower is $$n$$, the higher is one’s criminal propensity. Thus, if the type with propensity $$T_{n}$$ chooses to be a criminal, all types with $$T_{n^{\prime }}$$, $$n^{\prime }<n$$, will do so as well.20

2.5. Market equilibrium

Definition 1.
Given the size of the police force $$s_{j}$$, a market equilibrium in city $$j$$ consists of a vector $$\widetilde{\Omega }_{j}\left( s_{j}\right) =\left[ \mu _{j},r_{j},L_{j},\pi _{j},A_{j}\right]$$, together with a set of optimal individual decision rules $$\left\{ d_{n}(\cdot )\right\}$$ for $$n=1,...,N$$ such that (a) for all $$n$$, $$d_{n}\left( \widetilde{\Omega}_{j}\left( s_{j}\right) \right)$$ is an optimal decision for type $$n$$, i.e., conditions (4) and (5) hold; and (b) $$\widetilde{\Omega }_{j}\left( s_{j}\right)$$ is consistent with individual choices, where \begin{equation} \begin{aligned} \textit{crime rate}: \quad & \mu _{j}=\sum_{n=1}^{N}p_{jn}d_{n3}\left( \widetilde{\Omega }_{j}\left( s_{j}\right) \right) , \\ \textit{rental rate}: \quad & r_{j}=\tau _{j}\theta L_{j}^{\theta -1}, \\ \textit{aggregate labour}: \quad & L_{j}=\sum_{n=1}^{N}p_{jn}l_{n}d_{n1}\left( \widetilde{\Omega }_{j}\left( s_{j}\right) \right) , \\ \textit{arrest rate}: \quad & \pi _{j}=\Pi (s_{j},\mu _{j},\epsilon _{j}), \\ \textit{crime utility} \mid \textit{not apprehended}: \quad & A_{j}=\sum_{n=1}^{N}p_{jn}\frac{1-d_{n3}\left( \widetilde{\Omega }_{j}\left( s_{j}\right) \right) }{1-\mu _{j}}\ln \left( \alpha \left[ y_{jn}d_{n1}\left( \widetilde{\Omega }_{j}\left( s_{j}\right) \right) +b_{j}d_{n2}\left( \widetilde{\Omega }_{j}\left( s_{j}\right) \right) \right] +b_{j}\right) . \end{aligned} \label{A} \tag{7} \end{equation} In the equation for $$A_{j}$$, with probability $$p_{jn}$$, a criminal meets a type-$$n$$ citizen in city $$j$$, who can be a victim if and only if he is not a criminal ($$1-d_{n3}\left( \cdot \right) =1$$).
Besides the government subsidy $$b_{j}$$, a criminal who gets away also consumes a fraction $$\alpha$$ of the victim’s income. That income is $$y_{jn}$$ if the victim works $$\left( d_{n1}\left( \cdot \right) =1\right)$$ and $$b_{j}$$ if he stays home $$\left( d_{n2}\left( \cdot \right) =1\right)$$. Summing over all types and dividing by the probability that the criminal meets a victim, $$\left( 1-\mu _{j}\right)$$, gives $$A_{j}$$ as in the value function (3). This is the criminal’s expected pecuniary utility conditional on successfully stealing from a victim and not being arrested. As is clear from the arrest technology, a higher crime rate implies a lower arrest rate for a given size of the police force, thereby increasing the expected value of becoming a criminal. That is, there is a positive externality among criminals. As a result, multiple $$\Omega _{j}\left( s_{j}\right)$$’s can be supported as market equilibria. However, as seen from (6), if the type with propensity $$T_{n}$$ chooses to be a criminal, all types with $$T_{n^{\prime }}$$, $$n^{\prime }<n$$, will do so as well, and the ranking of the $$T_{n}$$ types is independent of equilibrium objects. Thus, all of the equilibria can be strictly ordered by their equilibrium crime rates, the latter being the total measure of types that choose to be criminals. Let $$h_{jn}$$ denote the measure of type $$T_{n}$$ in city $$j$$ and $$H_{jn}=\sum_{n^{\prime }\leq n}h_{jn^{\prime }}$$ the cumulative distribution of criminal propensities. Then $$\Omega _{j}^{n}\left( s_{j}\right)$$ with $$\mu _{j}^{n}=H_{jn}$$ can be supported as an equilibrium if \begin{equation} T_{n}<(1-\mu _{j}^{n})\left[ \pi _{j}^{n}\ln (\underline{c})+(1-\pi _{j}^{n})A_{j}^{n}\right] +\mu _{j}^{n}\ln \left(\frac{b_{j}^{n}}{1-\alpha }\right)-\ln (r_{j}^{n})\leq T_{n+1}, \label{equil} \tag{8} \end{equation} where the superscript $$n$$ indexes the $$n^{th}$$ potential equilibrium.
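Condition (8) suggests a direct way to list every market equilibrium for a given police force. First rank types by $$T_{n}$$ from (6). Then, for each cutoff $$k$$, make the $$k$$ highest-propensity types criminals, compute the aggregates in (7), and check whether the right-hand side of (8) lies in $$(T_{k},T_{k+1}]$$. The Python sketch below does this for a single city under stated assumptions. The type values, shares and parameters are hypothetical. The all-criminal cutoff ($$\mu_j=1$$) is skipped because $$A_{j}$$ divides by $$1-\mu_{j}$$. Cutoffs with no legal-sector labour are skipped because $$r_{j}$$ is then undefined. Tied propensities are never split. The names `CitizenType`, `arrest_rate`, `enumerate_equilibria` and `c_low` (for $$\underline{c}$$) are illustrative, and this is not the authors' estimation code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CitizenType:
    l: float      # human capital level l_n
    eta: float    # taste for crime eta_n
    kappa: float  # value of staying home kappa_n
    p: float      # population share p_jn in the city


def arrest_rate(s, mu, eps, gamma, rho):
    """Arrest technology (2): pi = exp(-gamma * (mu * eps)**rho / s)."""
    return math.exp(-gamma * (mu * eps) ** rho / s)


def enumerate_equilibria(types, s, tau, theta, eps, gamma, rho, alpha, c_low):
    """Return every cutoff equilibrium (k, mu, pi) of one city for police force s.

    Cutoff k makes the k types with the lowest T_n (highest criminal
    propensity) criminals; k = 0 is the zero-crime candidate. The cutoff with
    every type criminal (mu = 1) is skipped because A_j divides by 1 - mu.
    """
    l1 = min(t.l for t in types)

    def propensity(t):  # T_n from (6)
        return max(math.log(t.l), math.log(l1) + t.kappa) - t.eta

    ranked = sorted(types, key=propensity)
    T = [propensity(t) for t in ranked]
    found = []
    for k in range(len(ranked)):
        others = ranked[k:]                 # non-criminals at this cutoff
        mu = sum(t.p for t in ranked[:k])   # crime rate mu^k = H_k
        # A non-criminal works iff ln l >= ln l1 + kappa, condition (5).
        works = [math.log(t.l) >= math.log(l1) + t.kappa for t in others]
        L = sum(t.p * t.l for t, w in zip(others, works) if w)
        if L <= 0:
            continue                        # no legal labour: r in (1) undefined
        r = tau * theta * L ** (theta - 1)  # rental rate (1)
        b = r * l1                          # home-sector income b_j
        pi = arrest_rate(s, mu, eps, gamma, rho)
        # A_j from (7): expected log consumption of an unarrested criminal.
        A = sum(t.p / (1 - mu) * math.log(alpha * (r * t.l if w else b) + b)
                for t, w in zip(others, works))
        rhs = ((1 - mu) * (pi * math.log(c_low) + (1 - pi) * A)
               + mu * math.log(b / (1 - alpha)) - math.log(r))
        lower = T[k - 1] if k > 0 else -math.inf
        if lower < rhs <= T[k]:             # condition (8)
            found.append((k, mu, pi))
    return found


types = [  # hypothetical types; shares sum to one
    CitizenType(l=1.0, eta=0.8, kappa=0.1, p=0.2),
    CitizenType(l=2.0, eta=0.1, kappa=0.2, p=0.3),
    CitizenType(l=3.0, eta=0.0, kappa=0.3, p=0.5),
]
eqs = enumerate_equilibria(types, s=0.5, tau=1.0, theta=0.6, eps=1.0,
                           gamma=1.0, rho=1.0, alpha=0.5, c_low=0.01)
print(eqs)  # [(0, 0, 1.0)]
```

With these made-up values, only the zero-crime cutoff satisfies (8). At $$\mu_j=0$$ the arrest technology gives $$\pi_j=1$$. Varying `c_low`, `s` or `eps` changes which cutoffs survive, and several can survive at once. The helper `arrest_rate` also makes the earlier observation easy to check: doubling $$\mu_j$$ while halving $$\epsilon_j$$ leaves $$\pi_j$$ unchanged.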
In equilibrium $$\widetilde{\Omega }_{j}^{n}\left( s_{j}\right)$$, an individual chooses to be a criminal if and only if the individual’s criminal propensity is ranked among the top $$n$$ groups. The total number of equilibria, $$N^{\ast }$$, is no greater than the total number of $$T_{n}$$ types and is bounded above by $$N_{\eta }\times \left( N_{\kappa }+N_{l}-1\right)$$.21 Because there are at most $$N^{\ast }$$ equilibria and they can be ordered as in (8), we can compute all of the market equilibria for a given $$s_{j}$$.22
2.6. Government problem
A government cares about the crime rate in its city and can affect the level of criminal activity by choosing the size of the police force. A government may also care directly about the arrest rate for political reasons: a government unable to catch criminals could be seen as ineffectual in combatting crime. The government’s loss function is therefore increasing in the crime rate and in the size of the (costly) police force, and decreasing in the arrest rate. The government is assumed to minimize its expected loss, where the expectation is taken over all possible market equilibria. Formally, the government in city $$j$$ solves
\begin{equation}
\min_{s_{j}}\left\{ \sum_{n=1}^{N^{\ast }}q_{j}^{n}(s_{j})\left( \omega _{1}\exp (\mu _{j}^{n})-\omega _{2}\ln (\pi _{j}^{n})+\nu _{j}s_{j}\right) \right\} ,
\tag{9}
\end{equation}
where $$\omega _{1}$$ and $$\omega _{2}$$ are the weights the government puts on the crime and arrest rates relative to the cost of the police force, and $$\nu _{j}\thicksim \ln N(-0.5\sigma _{\nu }^{2},\sigma _{\nu }^{2})$$ is the city-specific marginal (opportunity) cost of the police force.23 Finally, $$q_{j}^{n}\left( s_{j}\right)$$ is the probability that $$\Omega _{j}^{n}\left( s_{j}\right)$$ is realized as a market equilibrium, to be specified in Section 4.
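The government's problem (9) can be illustrated with a grid search, mirroring the grid over $$s_{j}$$ used later in the estimation routine. The Python sketch below is illustrative only. Here `equilibria_at` is a hypothetical callback that returns, for a given $$s_{j}$$, the probabilities $$q_{j}^{n}$$, crime rates $$\mu _{j}^{n}$$ and arrest rates $$\pi _{j}^{n}$$ (all assumed positive) of the supportable equilibria. The sketch also draws $$\nu _{j}$$ from the stated log-normal distribution, whose mean is $$\exp (-0.5\sigma _{\nu }^{2}+0.5\sigma _{\nu }^{2})=1$$.

```python
import math
import random


def expected_loss(q, mu, pi, s, nu, omega1, omega2):
    """Objective in (9): sum_n q^n [omega1*exp(mu^n) - omega2*ln(pi^n) + nu*s]."""
    return sum(q_n * (omega1 * math.exp(mu_n) - omega2 * math.log(pi_n) + nu * s)
               for q_n, mu_n, pi_n in zip(q, mu, pi))


def optimal_police(grid, equilibria_at, nu, omega1, omega2):
    """Grid search for s_j* (illustrative).

    equilibria_at(s) is a hypothetical callback returning (q, mu, pi) lists for
    the supportable equilibria at police size s; every pi must be positive.
    """
    best_s, best_loss = None, math.inf
    for s in grid:
        q, mu, pi = equilibria_at(s)
        loss = expected_loss(q, mu, pi, s, nu, omega1, omega2)
        if loss < best_loss:
            best_s, best_loss = s, loss
    return best_s, best_loss


def draw_nu(sigma_nu, rng=random):
    """Draw nu_j ~ lnN(-0.5*sigma^2, sigma^2), a log-normal with mean 1."""
    return rng.lognormvariate(-0.5 * sigma_nu ** 2, sigma_nu)
```

Consider a toy single-equilibrium callback in which the crime rate is $$1/s$$ and the arrest rate is 0.5, with all weights set to 1. The search over the grid {1, 2, 3} then picks the interior value $$s=2$$, where the falling crime term and the rising police cost balance.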
Although there may be multiple market equilibria for any given $$s_{j}$$, the government’s optimal choice is generically unique.24
2.7. Subgame perfect Nash equilibrium
Definition 2. A subgame perfect Nash equilibrium in city $$j$$ is $$\left\{ s_{j}^{\ast },d\left( \cdot \right) ,\left\{ \widetilde{\Omega }_{j}^{n}\left( \cdot \right) \right\} \right\}$$ such that (a) given any $$s_{j}$$, $$\left( d\left( \cdot \right),\widetilde{\Omega}_{j}^{n}\left( s_{j}\right) \right)$$ is a market equilibrium; and (b) $$s_{j}^{\ast }$$ solves the government’s problem.
3. Data
We use data from two sources. The first is the CPS, which provides micro-level data on demographics $$(x)$$, wages and employment. We focus on the population aged 16 to 64.25 We aggregate individuals within an MSA, which is the counterpart of a “city” in our model, and take the distribution of $$x$$ within an MSA as the “city-specific” distribution of $$x$$ in the model. We focus on MSA’s rather than cities because city identity is not available in the CPS and a reporting unit in the crime data may cover multiple cities. We define an individual as employed if he/she works more than 13 weeks during the year, and define the employment rate as the fraction employed among the population aged 16–64.26 For those who are employed, we also use information on annual earnings. The second data source is the UCR, which contains agency-level reports on crimes, arrests, number of police, and population size. As in previous studies (e.g., Imrohoroglu et al., 2000, 2004), we focus on property crimes, which include robbery, burglary, larceny-theft, and motor vehicle theft.27 We made this choice because our model is better suited to studying crimes that are driven mainly by financial incentives. Each agency is supposed to report monthly; however, not all agencies have data for all months.
Given, for example, the well-known seasonality of crime, it may be inappropriate to assume that crime patterns in the months reported are representative of those in the whole year. Therefore, within each MSA, we aggregate only agencies for which data are available for all 12 months of the year. These agencies account for 70% of all agencies and 92% of the population covered in the UCR. As shown in Appendix Table A11, the population covered by agencies with non-missing reports is typically much larger than that covered by agencies with missing reports (average population 23,400 versus 4,800). We cannot compare crime rates or arrest rates across these two groups of agencies, because 90% of the agencies with missing reports did not report crime statistics in any month. Their police-to-population ratios, however, are similar. Both the population coverage and the representativeness of the police-to-population ratio among the agencies in our final sample suggest that our results may not be sensitive to this sample selection. For each MSA, we define three quantities. The crime rate is the total number of actual crimes divided by the total 16–64 population covered by non-missing agencies. The arrest rate is the total number of arrests divided by the total number of actual crimes. The size of the police force is the total number of police divided by the total 16–64 population covered by non-missing agencies. For both the CPS and the UCR, we estimate our model using data for the year 2008, the most recent year for which we have usable agency-level data. In 2008, the CPS data cover 265 MSA’s and the UCR data cover 366 MSA’s (including those with some non-missing agencies). We are able to match 245 MSA’s between the two data sets. We exclude as extreme outliers the seven MSA’s with zero arrests, which leaves a final sample of 238 MSA’s and 86,248 CPS individuals living in these MSA’s.
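The MSA-level definitions above amount to a filter-and-sum over agency reports. The following is a minimal Python sketch, assuming agency-year records with hypothetical field names (`msa`, `months_reported`, `crimes`, `arrests`, `police`, `pop_16_64`). It scales the rates to the units used in Table 2 (per 1,000 population and percent). It is not the authors' data-processing code.

```python
from collections import defaultdict


def msa_outcomes(agencies):
    """Aggregate agency-year records (dicts with hypothetical keys) to MSA outcomes."""
    totals = defaultdict(lambda: {"crimes": 0, "arrests": 0, "police": 0, "pop": 0})
    for a in agencies:
        if a["months_reported"] != 12:  # keep only agencies reporting all 12 months
            continue
        t = totals[a["msa"]]
        t["crimes"] += a["crimes"]
        t["arrests"] += a["arrests"]
        t["police"] += a["police"]
        t["pop"] += a["pop_16_64"]
    out = {}
    for msa, t in totals.items():
        # Drop zero-arrest MSAs (as in the text); the other checks only
        # guard against division by zero in this sketch.
        if t["arrests"] == 0 or t["crimes"] == 0 or t["pop"] == 0:
            continue
        out[msa] = {
            "crime_per_1000": 1000 * t["crimes"] / t["pop"],
            "arrest_pct": 100 * t["arrests"] / t["crimes"],
            "police_per_1000": 1000 * t["police"] / t["pop"],
        }
    return out
```

In this sketch, an agency with only six months of reports is ignored even if its counts are large. An MSA whose remaining agencies record no arrests drops out entirely.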
As shown in Appendix Table A12, the MSA’s we use for estimation are on average larger than those we fail to match between the two data sets (average population 588,410 versus 142,700). However, the two groups of MSA’s have similar crime rates, arrest rates and police-to-population ratios, which suggests that dropping the unmatched MSA’s should not be problematic. Table 1 summarizes the MSA-specific marginal distributions of individual characteristics. The first (second) row gives the mean (standard deviation) across the 238 MSA’s of the within-MSA marginal distribution. MSA’s are more diverse in their educational and racial compositions than in their age or gender compositions. For example, the coefficient of variation (CV) across MSA’s in the fraction of people without a high school degree is 0.43, but only 0.23 for the fraction of residents under the age of 25.
Table 1 Summary statistics: $$x$$ distribution
              Education         Age               Race              Gender   Num. young children
(%)           < HS     ≥ BA     < 25     > 50     Black/Hispanic    Male     0        1
Mean          16.08    24.32    21.43    25.75    24.07             49.32    84.09    10.85
Std. Dev.     6.99     9.34     5.03     6.78     18.70             3.80     4.95     4.16
Number of Obs: 238 MSA’s in the U.S.
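The CV figures quoted above are the ratio of the standard deviation to the mean in Table 1. A short Python check using only the table's values is shown below. Column labels are abbreviated, and the young-children columns are omitted.

```python
# Cross-MSA means and standard deviations (percent) taken from Table 1.
TABLE1 = {
    "% < HS": (16.08, 6.99),
    "% >= BA": (24.32, 9.34),
    "% < 25": (21.43, 5.03),
    "% > 50": (25.75, 6.78),
    "% Black/Hispanic": (24.07, 18.70),
    "% Male": (49.32, 3.80),
}


def coefficient_of_variation(mean, sd):
    """CV = standard deviation / mean."""
    return sd / mean


cv = {k: round(coefficient_of_variation(m, s), 2) for k, (m, s) in TABLE1.items()}
for name, value in cv.items():
    print(f"{name:18s} {value:.2f}")
```

The check gives 0.43 for the share without a high school degree and 0.23 for the share under 25, as stated in the text. The education and race CVs (0.38 to 0.78) all exceed the age and gender CVs (0.08 to 0.26).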
Table 2 reports statistics from the data on the equilibrium outcomes of the model. Specifically, it shows the cross-MSA mean, standard deviation and CV of five outcomes: the number of reported crimes per 1,000 population, the arrest rate (percent), the number of police per 1,000 population, the employment rate (percent) and within-MSA mean earnings. The mean crime rate across MSA’s is 56.5, with a standard deviation of 17.0. As the CVs show, the variation in the crime rate (0.30) is similar to that in the arrest rate (0.36) and in the number of police (0.40). Labour market outcomes, on the other hand, vary less across MSA’s: the CV is 0.09 for the employment rate and 0.19 for earnings.
Table 2 Summary statistics: outcomes
            Crime          Arrest   Police         Employment   Mean earnings
            (per 1,000)    (%)      (per 1,000)    (%)          (dollars)
Mean        56.5           19.0     4.78           73.3         38,142
Std. Dev.   17.0           6.85     1.91           6.87         7,340
CV          0.30           0.36     0.40           0.09         0.19
Number of Obs: 238 MSA’s in the U.S.
4. Empirical Implementation and Estimation
4.1. Additional empirical specifications
4.1.1. Distribution of individual types
To implement the model, we need to specify the relationship between individual observable demographics and the unobservable type $$\left( l_{n},\eta _{n},\kappa _{n}\right)$$.28 Because MSA’s differ in the population distribution of observables, type distributions also differ across MSA’s. The components $$l$$, $$\eta$$ and $$\kappa$$ are assumed to be independent conditional on $$x$$. For an individual $$i$$ with demographics $$x_{i}$$, regardless of the MSA, the probability that the human capital of $$i$$ is of level $$l_{m}$$ is given by
\begin{equation}
p_{l_{m}}\left( x_{i}\right) =\begin{cases}
\Phi \left(\frac{\ln (l_{m})-x_{i}^{\prime }\lambda ^{l}}{\sigma _{l}}\right)-\Phi \left(\frac{\ln (l_{m-1})-x_{i}^{\prime }\lambda ^{l}}{\sigma _{l}}\right) & \text{for }1<m<N_{l}, \\
\Phi \left(\frac{\ln (l_{m})-x_{i}^{\prime }\lambda ^{l}}{\sigma _{l}}\right) & \text{for }m=1, \\
1-\Phi \left(\frac{\ln (l_{m-1})-x_{i}^{\prime }\lambda ^{l}}{\sigma _{l}}\right) & \text{for }m=N_{l}.
\end{cases}
\tag{10}
\end{equation}
The mass points $$l$$’s are assumed to be quantiles from $$\ln N(\overline{x}^{\prime }\lambda ^{l},\sigma _{l}^{2})$$, where $$\overline{x}$$ is the national (population) mean of $$x$$.29 The probability that the preference of $$i$$ for engaging in crime is of level $$\eta _{m}$$ is given by
\begin{equation}
p_{\eta _{m}}\left( x_{i}\right) =\begin{cases}
\Phi \left(\frac{\eta _{m}-x_{i}^{\prime }\lambda ^{\eta }}{\sigma _{\eta }}\right)-\Phi \left(\frac{\eta _{m-1}-x_{i}^{\prime }\lambda ^{\eta }}{\sigma _{\eta }}\right) & \text{for }1<m<N_{\eta }, \\
\Phi \left(\frac{\eta _{m}-x_{i}^{\prime }\lambda ^{\eta }}{\sigma _{\eta }}\right) & \text{for }m=1, \\
1-\Phi \left(\frac{\eta _{m-1}-x_{i}^{\prime }\lambda ^{\eta }}{\sigma _{\eta }}\right) & \text{for }m=N_{\eta }.
\end{cases}
\tag{11}
\end{equation}
It can be shown that there exists an $$\eta ^{\ast }$$ such that people with $$\eta \leq \eta ^{\ast }$$ never commit a crime, regardless of equilibrium outcomes.30 We therefore set the lowest mass point to $$\eta _{1}=\eta ^{\ast }$$. We also assume that the largest mass point, $$\eta _{N_{\eta }}$$, is such that individuals with $$\eta _{N_{\eta }}$$ always commit crimes.31 The other mass points are assumed to be the quantiles of the distribution $$N(\overline{x}^{\prime }\lambda ^{\eta },\sigma _{\eta }^{2})$$ that lie above $$\eta ^{\ast }$$. The probability that the value of the home option for $$i$$ is of level $$\kappa _{m}$$ is given by
\begin{equation}
p_{\kappa _{m}}\left( x_{i},j\right) =\begin{cases}
\Phi \left(\frac{\ln (\kappa _{m})-x_{i}^{\prime }\lambda ^{\kappa }}{\sigma _{\kappa }}\right)-\Phi \left(\frac{\ln (\kappa _{m-1})-x_{i}^{\prime }\lambda ^{\kappa }}{\sigma _{\kappa }}\right) & \text{for }1<m<N_{\kappa }, \\
\Phi \left(\frac{\ln (\kappa _{m})-x_{i}^{\prime }\lambda ^{\kappa }}{\sigma _{\kappa }}\right) & \text{for }m=1, \\
1-\Phi \left(\frac{\ln (\kappa _{m-1})-x_{i}^{\prime }\lambda ^{\kappa }}{\sigma _{\kappa }}\right) & \text{for }m=N_{\kappa },
\end{cases}
\tag{12}
\end{equation}
where the mass points $$\kappa$$’s are quantiles from the distribution $$\ln N(\overline{x}^{\prime }\lambda ^{\kappa },\sigma _{\kappa }^{2})$$.32
4.1.2. Probability distribution of market equilibria
Given the size of the police force $$s_{j}$$, $$q_{j}^{n}\left( s_{j}\right) =0$$ if $$\Omega _{j}^{n}\left( s_{j}\right)$$ is not supportable as a market equilibrium. As shown above, all supportable $$\Omega _{j}^{n}\left( s_{j}\right)$$’s can be ranked by their crime rates. Based on this fact, we assume that the probability that a particular market equilibrium is realized depends on the rank of its crime rate within the set of equilibria.
It is not clear a priori whether a low-crime or a high-crime equilibrium is more likely. We therefore use the following flexible yet parsimonious structure, which allows for various possible scenarios with only two parameters.33 Let $$N_{j}^{\ast }\left( s_{j}\right)$$ be the number of market equilibria in city $$j$$ under $$s_{j}$$, and let $$\left\{ \widetilde{\Omega }_{j}^{n^{\prime }}\left( s_{j}\right) \right\} _{n^{\prime }=1}^{N_{j}^{\ast }\left( s_{j}\right) }$$ be the set of these equilibria ranked from the lowest to the highest crime rate. The probability that the $$n$$th-ranked equilibrium $$\widetilde{\Omega }_{j}^{n}\left( s_{j}\right)$$ is realized is
\begin{equation}
q_{j}^{n}\left( s_{j}\right) =\frac{\exp \left[ 1+\zeta _{1}\left( n_{j1}^{\ast }-n\right) I\left( n<n_{j1}^{\ast }\right) +\zeta _{2}\left( n-n_{j2}^{\ast }\right) I\left( n>n_{j2}^{\ast }\right) \right] }{\sum_{n^{\prime }=1}^{N_{j}^{\ast }\left( s_{j}\right) }\exp \left[ 1+\zeta _{1}\left( n_{j1}^{\ast }-n^{\prime }\right) I\left( n^{\prime }<n_{j1}^{\ast }\right) +\zeta _{2}\left( n^{\prime }-n_{j2}^{\ast }\right) I\left( n^{\prime }>n_{j2}^{\ast }\right) \right] },
\tag{13}
\end{equation}
where $$I\left( \cdot \right)$$ is the indicator function.
If $$N_{j}^{\ast }\left( s_{j}\right)$$ is odd, $$n_{j1}^{\ast }\left( s_{j}\right) =n_{j2}^{\ast }\left( s_{j}\right) =\frac{N_{j}^{\ast }\left( s_{j}\right) +1}{2}$$, the median of $$\left\{ 1,...,N_{j}^{\ast }\left( s_{j}\right) \right\}$$. If $$N_{j}^{\ast }\left( s_{j}\right)$$ is even, $$n_{j1}^{\ast }\left( s_{j}\right) =\frac{N_{j}^{\ast }\left( s_{j}\right) }{2}+1$$, the first number to the right of the median, and $$n_{j2}^{\ast }\left( s_{j}\right) =\frac{N_{j}^{\ast }\left( s_{j}\right) }{2}$$, the first number to the left of the median. The two parameters $$\zeta _{1}$$ and $$\zeta _{2}$$ capture the relationship between the probability of an equilibrium and its rank, as the following cases show:
- if $$\zeta _{1}=\zeta _{2}=0$$, all equilibria are equally likely;
- if $$\zeta _{1}>0$$ and $$\zeta _{2}>0$$, the distribution of equilibria is U-shaped;
- if $$\zeta _{1}<0$$ and $$\zeta _{2}<0$$, it is inverse-U-shaped;
- if $$\zeta _{1}>0$$ and $$\zeta _{2}<0$$, lower-crime equilibria are more likely;
- if $$\zeta _{1}<0$$ and $$\zeta _{2}>0$$, higher-crime equilibria are more likely.34
4.2. Estimation
We estimate the model by simulated generalized method of moments. For each parameter configuration, we solve for the equilibria of the model, compute the model-predicted moments for each equilibrium, and integrate over the equilibria.
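The equilibrium-selection probabilities in (13), and the step of integrating moments over equilibria, can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the authors' code. It uses 1-based ranks ordered from low to high crime. When $$N_{j}^{\ast }$$ is odd, both cutoffs equal the median rank; when it is even, $$n_{j1}^{\ast }=N_{j}^{\ast }/2+1$$ and $$n_{j2}^{\ast }=N_{j}^{\ast }/2$$ (the first ranks to the right and to the left of the median). Function names are hypothetical.

```python
import math


def median_cutoffs(n_eq):
    """Return (n1*, n2*) for ranks 1..n_eq ordered from low to high crime."""
    if n_eq % 2 == 1:
        mid = (n_eq + 1) // 2
        return mid, mid
    return n_eq // 2 + 1, n_eq // 2  # first ranks right / left of the median


def equilibrium_probs(n_eq, zeta1, zeta2):
    """Probabilities q^n in (13) for ranks n = 1..n_eq (returned as a 0-based list)."""
    n1, n2 = median_cutoffs(n_eq)
    weights = []
    for n in range(1, n_eq + 1):
        score = 1.0
        if n < n1:
            score += zeta1 * (n1 - n)  # below-median slope
        if n > n2:
            score += zeta2 * (n - n2)  # above-median slope
        weights.append(math.exp(score))
    total = sum(weights)
    return [w / total for w in weights]


def integrate_moments(q, moments):
    """Average per-equilibrium moment vectors using the weights q."""
    return [sum(q_n * m[k] for q_n, m in zip(q, moments))
            for k in range(len(moments[0]))]
```

With the point estimates reported in Table 3, $$\zeta _{1}=-0.20$$ and $$\zeta _{2}=-0.60$$, both slopes are negative. The implied distribution is therefore inverse-U-shaped: for five equilibria, the median-ranked one is the most likely.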
The parameter estimates minimize the distance between the model-predicted moments $$M\left( \Theta \right)$$ and the data moments $$M^{d}$$:
\[ \widehat{\Theta }=\arg \min_{\Theta }\left\{ \left( M\left( \Theta \right) -M^{d}\right) ^{\prime }W\left( M\left( \Theta \right) -M^{d}\right) \right\} , \]
where $$\Theta$$ is the vector of structural parameters and $$W$$ is a positive-definite weighting matrix.35 $$\Theta$$ includes the parameters governing the following:
- the distributions of individuals’ human capital and tastes for crime and for home;
- the distributions of city-level unobservables;
- the arrest technology and the production technology;
- the return to crime and the consumption level if arrested;
- government preferences;
- the probability distribution of market equilibria.
4.2.1. Identification
We provide the following intuition for the identification of the model. First, differences in outcomes across MSA’s with the same observable characteristics arise from differences in unobservable characteristics and/or in equilibrium realizations. On the one hand, if there are no differences in unobservables, then all of these MSA’s should have the same size of police force, because the government’s decision depends on the (common) expectation of outcomes over all possible equilibria. The other outcomes could differ, and “discretely” so, because different equilibria are realized. For example, with the same size of police force, crime rates across these MSA’s will be distributed over “discretely” different mass points, because there are discrete types of citizens and hence discrete levels of potential crime rates.36 On the other hand, if a unique equilibrium is always selected, then differences in all outcomes arise from MSA unobservables. Given the assumption that these unobservables are continuous random variables drawn from a smooth distribution, the observed differences in outcomes should be much smoother than in the previous case.
For example, because the government then knows exactly which equilibrium will be realized, it is hard to rationalize the same size of police force being optimal across MSA’s with “discretely” different crime rates. Between these two extremes, the parameters governing the distribution of the multiple equilibria and those governing the distribution of unobservables adjust so as to match the data. Second, the reduced form of the model is a set of equations relating the equilibrium outcomes in a given MSA to its distribution of population observables. In particular, each of the six equilibrium outcomes is a nonlinear function of the proportions of people in the MSA belonging to each of the 192 different observable types. The six outcomes are the crime rate, the arrest rate, police expenditures, the employment rate, the mean wage, and the standard deviation of the wage. In the linear case, there would be 192 × 6 = 1,152 reduced-form parameters, which are functions of fifty-six structural parameters.37 There are thus many within- and cross-equation restrictions. The moments we choose relate these equilibrium outcomes to the population proportions. The necessary condition for identification is satisfied. To provide further evidence, we conducted Monte Carlo exercises: we first simulated data with known parameter values, treated as the “truth”. Then, using moments from the simulated data, we started the estimation of the model from a wide range of initial guesses of the parameter values. In all cases, we were able to recover parameter values close to the “truth”. In addition, as in Adda et al. (2017), we show in Appendix A5 that our objective function is sensitive to discrete changes in each model parameter, suggesting that it is at a (local) minimum.
4.2.2. Estimation routine
The estimation routine involves an outer loop that searches over the parameter space and an inner loop that determines the set of equilibria.
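The outer loop minimizes the quadratic-form distance defined at the start of Section 4.2. The following is a minimal Python sketch under two assumptions. First, $$W$$ is simply given, since the text only requires it to be positive definite. Second, a hypothetical `simulate_moments` function stands in for the inner loop described next. Scanning a fixed list of candidates is purely illustrative and is not the authors' optimizer.

```python
def smm_objective(model_moments, data_moments, W):
    """(M(Theta) - M^d)' W (M(Theta) - M^d) with plain lists; W is k x k."""
    g = [m - d for m, d in zip(model_moments, data_moments)]
    k = len(g)
    return sum(g[i] * W[i][j] * g[j] for i in range(k) for j in range(k))


def outer_loop(candidates, simulate_moments, data_moments, W):
    """Return the candidate Theta with the smallest objective, and that value.

    simulate_moments(theta) is a hypothetical stand-in for the inner loop: it
    should return the model-predicted moment vector M(Theta). A real search
    would use a numerical optimizer instead of a fixed list of candidates.
    """
    scored = [(smm_objective(simulate_moments(th), data_moments, W), th)
              for th in candidates]
    best_value, best_theta = min(scored, key=lambda pair: pair[0])
    return best_theta, best_value
```

For instance, if the simulated moments are $$(\theta ,2\theta )$$ and the data moments are $$(3,6)$$, then with an identity weighting matrix the search returns $$\theta =3$$ with an objective of zero.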
The following describes the inner loop for each city.
Step 1: Given a set of parameters $$\Theta$$, calculate the mass points of $$l$$ and $$\kappa$$ as the quantiles from $$\ln N(\overline{x}^{\prime }\lambda ^{l},\sigma _{l}^{2})$$ and $$\ln N(\overline{x}^{\prime }\lambda ^{\kappa },\sigma _{\kappa }^{2})$$, respectively. Calculate the first mass point $$\eta _{1}=\eta ^{\ast }$$ as defined in Appendix A5, and the other mass points $$\eta$$ as the quantiles from $$N(\overline{x}^{\prime }\lambda ^{\eta },\sigma _{\eta }^{2})$$ that are above $$\eta ^{\ast }$$. Calculate the criminal propensity $$T_{n}$$ according to equation (6), and the number of $$T_{n}$$ types, $$N^{\ast }$$, as in Appendix A3. Index the $$T_{n}$$ such that $$T_{n}\leq T_{n+1}$$.
Step 2: For each city $$j$$, draw the city-level unobservable characteristics $$\left( \nu _{j},\tau _{j},\epsilon _{j},\overline{\kappa }_{j}\right)$$. For each vector of observable characteristics $$x$$, which is assumed to be discrete, calculate the probability vectors $$p_{l}\left( x\right)$$, $$p_{\eta }\left( x\right)$$ and $$p_{\kappa }\left( x,j\right)$$ according to equations (10) to (12). Given the distribution of observable citizen characteristics $$G_{j}\left( x\right)$$, calculate the measure of each $$\left( l,\eta ,\kappa \right)$$-type in city $$j$$. Derive the measure $$h_{jn}$$ of each $$T_{n}$$-type in city $$j$$ according to Appendix A3.
Step 3: Pick an $$s_{j}$$ from the grid for the size of the police force and solve for all market equilibria.38 For each $$n\in \left\{ 1,...,N^{\ast }\right\}$$, carry out the following calculation:
(1) Suppose the crime rate is $$\mu _{j}^{n}=\sum_{n^{\prime }\leq n}h_{jn^{\prime }}$$. This happens when all types $$T_{n^{\prime }}$$ with $$n^{\prime }\leq n$$ are criminals and all types $$T_{n^{\prime \prime }}$$ with $$n^{\prime \prime }>n$$ choose between work and home according to condition (5).
(2) The aggregate human capital employed, $$L_{j}^{n}$$, is the total human capital of those who choose to work, as in (7).
(3) Derive the remaining components of $$\Omega _{j}^{n}\left( s_{j}\right)$$. Given $$\mu _{j}^{n}$$ and $$s_{j}$$, calculate the arrest rate according to equation (2). Given the choices specified in (1), calculate $$A_{j}^{n}$$ as in equation (7) and $$r_{j}^{n}$$ as in equation (1).
(4) Calculate the middle expression of the equilibrium condition (8). $$\Omega _{j}^{n}\left( s_{j}\right)$$ is an equilibrium if and only if inequality (8) is satisfied.
Step 4: Calculate the government’s loss under $$s_{j}$$ by integrating over potential equilibria, as in (9). Repeat Steps 3–4 until the optimal size of the police force $$s_{j}^{\ast }$$ and the associated set of market equilibria $$\left\{ \widetilde{\Omega }_{j}^{n}\left( s_{j}^{\ast }\right) \right\}$$ are found.
Step 5: Repeat this $$R$$ times for every city $$j\in \left\{ 1,...,238\right\}$$. Each simulated replica serves as the counterpart of a city in our model.
Step 6: Calculate the model-predicted moments as
\[ \frac{1}{238R}\sum_{j}\sum_{n=1}^{N^{\ast }}q_{j}^{n}\left( s_{j}^{\ast }\right) M_{j}^{n}\left( s_{j}^{\ast };\Theta \right) , \]
where $$M_{j}^{n}\left( s_{j}^{\ast };\Theta \right)$$ is the vector of model-predicted statistics in city $$j$$ if $$\Omega _{j}^{n}\left( s_{j}^{\ast }\right)$$ is a market equilibrium.
4.2.3. Target moments
We target 171 moments.39 These moments include, among others:
A set of unconditional moments with each MSA as an observation:
i) First moments across MSA’s of the number of police, the crime rate, the arrest rate, the employment rate, and the within-MSA average and standard deviation of earnings;
ii) Cross moments of the variables in i), except that between mean earnings and the standard deviation of earnings;
iii) Second moments of the first five outcome variables.
iv) The fractions of cities with crime rates below the 10th, 20th, ..., 90th percentiles of crime rates in the data.
First moments of MSA outcomes by MSA characteristics.40 These are the average size of the police force, the average crime rate and the average arrest rate conditional on the within-MSA marginal distributions of age, gender, education, and race, all treated as discrete variables. For example, we target the average outcomes among MSA’s where the fraction of college graduates is ranked below the 50th percentile among all MSA’s.
First moments of individual outcomes by individual characteristics:
i) The employment rate and average earnings among individuals by age, by education, by gender, by race, by the number of kids, and by the number of kids among females.
ii) It is well documented that the crime rate is significantly higher among youths, but our data do not contain age-specific crime rates. To extract as much information as possible about the choice between criminal and non-criminal activities, we also use CPS data on school enrolment status. Specifically, for individuals under age 25, we target the employment rate and the school attendance rate by current educational attainment.41
4.3. Parameter estimates
Table 3 shows selected parameter estimates.
Standard errors (in parentheses) are calculated via bootstrap.42 As shown in the upper panel of Table 3, the elasticity of output with respect to aggregate human capital in the legal sector, $$\theta$$, is 0.84. The estimated return to crime, $$\alpha$$, implies that a criminal steals 10% of the victim’s income (approximately 3,250 dollars).43 However, consumption is only 900 dollars if the criminal is apprehended.44
Table 3 Selected parameter estimates
Production                                       $$\theta$$                           0.84     (0.01)
Return to crime                                  $$\alpha$$                           0.10     (0.003)
Consumption if arrested                          $$\underline{c}$$                    0.09     (0.003)
Dispersion of MSA-level unobservables
  Productivity                                   $$\sigma _{\tau }$$                  0.12     (0.09)
  Arrest efficiency                              $$\sigma _{\epsilon }$$              5.89     (0.03)
  Value of leisure                               $${\sigma }_{\overline{\kappa }}$$   0.13     (0.07)
  Marginal cost of police                        $$\sigma _{\nu }$$                   0.25     (0.10)
Probability distribution of market equilibria
  Below median slope                             $${\zeta }_{1}$$                     −0.20    (0.18)
  Above median slope                             $$\zeta _{2}$$                       −0.60    (0.24)