Dependence modeling in stochastic frontier analysis

1 Introduction

The roots of stochastic frontier analysis (SFA) can be traced back to the origins of classical growth accounting [82] and, perhaps, production planning [43]. These fields deal with production relationships, which are usually modeled through production functions or, more generally, transformation functions (e.g., Shephard's distance functions, directional distance functions, cost functions, etc.). In the classical growth accounting approach, all variation in growth, apart from the variation of inputs, is attributed to the so-called Solow residual, which under certain restrictions measures what is referred to as the change in total factor productivity (TFP).

Early models assumed that all the decision making units (DMUs) represented in the data as observations (e.g., firms, countries, etc.) were independent of one another and fully efficient. This ignored the numerous inefficiencies that arise in practice, which can have arbitrary dependence and can arise due to such factors as simultaneity in DMU decisions, unobserved heterogeneity, common sources of information asymmetry and other market imperfections [84], managerial practices [18], cultural beliefs [34], traditions, expectations, and other unobserved factors inducing unaccounted dependence in models of production [16].

A key feature of several recent developments in SFA is the construction of a statistical model with as few restrictions on its dependence properties as possible. This implicitly recognizes that various forms of dependence are empirical questions that can and should be statistically tested against the data.
Modern implementations of SFA provide a framework where shortfalls from the production potential are decomposed into two terms, statistical noise and inefficiency, both of which are unobserved by the researcher but can be estimated for the sample as a whole (e.g., representing an industry) or for each individual DMU under a variety of possible dependence scenarios.

The extensions of SFA that allow for forms of dependence which have until recently been ignored or restricted too heavily are the focus of this review. To a large extent, they are triggered by the fact that restricting the nature of dependence of the composed error can lead to severe biases in estimators and incorrect inference. For example, within the confines of the traditional SFA approach one can test for the presence of either inefficiency or noise [25,73]. Thus, the model encompasses the classical approach with a naive assumption of full efficiency (conditional mean) and the deterministic production frontier as special cases. However, such estimators and tests themselves have been derived under the assumption of independence between inefficiency and noise; empirical results suggest that allowing for dependence changes the estimates and tests significantly, so the conclusions of the classical models may be distorted.

Thus, SFA under dependence is a natural relaxation of the extreme assumptions of full efficiency and independence, yet it also encompasses them as special cases, which can still be adopted if the data and the statistical tests do not suggest otherwise. If there is evidence in favor of the full efficiency hypothesis after allowing for dependence, one can proceed with regression techniques or growth accounting, but this inference would no longer hinge on these assumptions.

Accounting for dependence within a production model could be critical for both quantitative and qualitative conclusions and, perhaps more importantly, for the resulting policy implications.
For example, El Mehdi and Hafner [28] found that estimated technical efficiency scores across the financing of Moroccan rural districts tend to be lower when dependence is allowed than under the assumption of independence, while the rankings remained essentially the same. Thus, whether ignoring dependence matters depends on whether the goal is to identify the best performers or to measure how much improvement can be made.

While some of the methods and models we present here can also be found in previous reviews, e.g., [67] and [9], and it is impossible to give a good review without following them to some degree, here we also summarize many of (what we believe to be) key recent developments and, with their help, offer some novel perspectives on the workhorse methods. We do not claim, however, that this survey comprehensively covers all of the relevant recent developments in modelling dependence in SFA. Many other important references can be found elsewhere.

The rest of the article is structured as follows. Section 2 introduces the classical cross-sectional stochastic frontier models (SFMs) and focuses on dependence between error components in such models. Section 3 considers dependence via sample selection. Section 4 surveys dependence models used in panels. Section 5 discusses dependence that underlies endogeneity in SFM, which is a situation when there is dependence between production inputs and error terms. Section 6 discusses how dependence can help obtain more precise estimates of inefficiency. Section 7 concludes.

2 The benchmark SFM and dependence within the composed error

In cross-sectional settings, one of the main approaches to study productivity and efficiency of firms is the SFM, independently proposed by Aigner et al. [3] and Meeusen and van den Broeck [61]. ([15] and [62], while appearing in the same year, are applications of the methods.)

Using conventional notation, let $Y_i$ be the single output for observation (e.g., firm) $i$ and let $y_i=\ln(Y_i)$.
The cross-sectional SFM can be written for a production frontier as

$$y_i = m(\boldsymbol{x}_i;\boldsymbol{\beta}) - u_i + v_i = m(\boldsymbol{x}_i;\boldsymbol{\beta}) + \varepsilon_i. \tag{2.1}$$

Here $m(\boldsymbol{x}_i;\boldsymbol{\beta})$ represents the production frontier of a firm (or more generally a DMU) with given input vector $\boldsymbol{x}_i$. Observations indexed by $i=1,\ldots,N$ are assumed to be independent and identically distributed. Our use of $\boldsymbol{\beta}$ is to clearly signify that we are parametrically specifying our production function, most commonly as a Cobb-Douglas or translog (see, e.g., [67] or [68] for a detailed treatment on nonparametric estimation of the SFM).

2.1 Canonical independence framework

The main difference between a standard production function setup and the SFM is the presence of two distinct error terms in the model. The $u_i$ term captures inefficiency, i.e., the shortfall from the maximal output dictated by the production technology, while the $v_i$ term captures stochastic shocks. The standard neoclassical production function model assumes full efficiency, so SFA embraces it as a special case, when $u_i=0$ for all $i$, and allows the researcher to test this statistically. (Prior to the development of the SFM, approaches which intended to model inefficiency typically ignored $v_i$, leading to estimators of the SFM with less desirable statistical properties: see [1,2,27,71,72,86].)

It is commonly assumed that inputs are exogenous, in the sense that $\boldsymbol{x}$ is independent of $u$ and $v$, $\boldsymbol{x}\perp(u,v)$, and that the two components of the error term are independent, $u\perp v$.

Many estimation methods require distributional assumptions for both $u$ and $v$ (beyond the assumption of independence).
For an assumed distributional pair, one can obtain the implied distribution for $\varepsilon_i$ and then estimate all of the parameters of the SFM with the maximum likelihood estimator (MLE). The most common assumption is that $v_i\sim N(0,\sigma_v^2)$ and $u_i$ is from a Half Normal distribution, $N_+(0,\sigma_u^2)$, or $u_i$ is from an Exponential distribution with parameter $\sigma_u$.

The most popular case for the density of the composed error $\varepsilon$ is obtained for the Normal Half Normal specification under independence, $u\perp v$. (According to Aigner et al. [3], the distribution function of the sum of a Normal and a Truncated Normal random variable was first derived by Weinstein [94].) Let $f_v$ and $f_u$ denote the densities of $v$ and $u$, respectively. For the Normal Half Normal case, $f_v(v)=\frac{1}{\sigma_v}\phi\left(\frac{v}{\sigma_v}\right)$ and $f_u(u)=\frac{2}{\sigma_u}\phi\left(\frac{u}{\sigma_u}\right)$, where $\phi(\cdot)$ is the standard Normal probability density function (pdf). The closed form expression for the pdf of $\varepsilon$ can be obtained by convolution as follows:

$$f(\varepsilon)=\int_0^\infty f_v(\varepsilon+u)f_u(u)\,\mathrm{d}u=\frac{2}{\sigma}\phi\left(\frac{\varepsilon}{\sigma}\right)\Phi\left(-\frac{\varepsilon\lambda}{\sigma}\right), \tag{2.2}$$

where $\Phi(\cdot)$ is the standard Normal cumulative distribution function (cdf), with the parameterization $\sigma=\sqrt{\sigma_u^2+\sigma_v^2}$ and $\lambda=\sigma_u/\sigma_v$. The parameter $\lambda$ is commonly interpreted as measuring the variation in $\varepsilon$ due to inefficiency relative to that due to noise.
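As a quick numerical check of (2.2), the following minimal sketch compares the closed form with a direct midpoint-rule evaluation of the convolution integral. The function names, the grid size, the truncation of the integral at ten standard deviations of $u$, and the parameter values are illustrative assumptions, not part of the original text.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal: provides pdf() and cdf()


def nhn_density(eps, sigma_u, sigma_v):
    """Closed-form density (2.2) of eps = v - u in the Normal Half Normal case."""
    sigma = math.sqrt(sigma_u**2 + sigma_v**2)
    lam = sigma_u / sigma_v
    return (2.0 / sigma) * STD_NORMAL.pdf(eps / sigma) * STD_NORMAL.cdf(-eps * lam / sigma)


def nhn_density_by_convolution(eps, sigma_u, sigma_v, n_grid=20000, u_max_sd=10.0):
    """Midpoint-rule approximation of the integral of f_v(eps + u) f_u(u) over u >= 0."""
    upper = u_max_sd * sigma_u  # assumption: the tail beyond 10 sd is negligible
    h = upper / n_grid
    total = 0.0
    for k in range(n_grid):
        u = (k + 0.5) * h
        f_v = STD_NORMAL.pdf((eps + u) / sigma_v) / sigma_v    # Normal density of v
        f_u = 2.0 * STD_NORMAL.pdf(u / sigma_u) / sigma_u      # Half Normal density of u
        total += f_v * f_u
    return total * h


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    for eps in (-1.0, -0.2, 0.0, 0.5):
        print(eps, nhn_density(eps, 0.8, 0.4), nhn_density_by_convolution(eps, 0.8, 0.4))
```

The two columns should agree closely, and the density puts more mass on negative values of $\varepsilon$, reflecting the one-sided inefficiency term.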
The density of $\varepsilon$ in (2.2) can be characterized as that of a Skew Normal random variable with location parameter 0, scale parameter $\sigma$, and skew parameter $-\lambda$. (The pdf of a standard Skew Normal random variable $x$ is $f(x)=2\phi(x)\Phi(\alpha x)$. The distribution is right skewed if $\alpha>0$ and left skewed if $\alpha<0$. We can also place the Normal, Truncated Normal pair of distributional assumptions in this class. The pdf of $x$ with location $\xi$, scale $\omega$, and skew parameter $\alpha$ is $f(x)=\frac{2}{\omega}\phi\left(\frac{x-\xi}{\omega}\right)\Phi\left(\alpha\,\frac{x-\xi}{\omega}\right)$. See [10] and [65] for more details.) This connection has only recently appeared in the efficiency and productivity literature [24].

It is worth noting that the closed form expression in (2.2) is equivalent to

$$f(\varepsilon)=\mathbb{E}_u f_v(\varepsilon+u)=\mathbb{E}_v f_u(v-\varepsilon),$$

where expectations are taken with respect to the relevant distribution. This suggests an alternative, simulation-based way to construct the density by sampling from the distribution of $u$ or $v$ and evaluating the corresponding sample averages. Of the two sampling options (from the distribution of $u$ or from the distribution of $v$), sampling the $u$'s is more practical as it avoids the need to ensure that $v-\varepsilon>0$.
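The simulation-based construction can be sketched as follows, assuming Half Normal inefficiency and Normal noise. The helper names, the number of draws, and the seed are illustrative assumptions; the closed form is included only as a benchmark for the sample average.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def closed_form_density(eps, sigma_u, sigma_v):
    """Benchmark: Normal Half Normal density of eps = v - u under independence."""
    sigma = math.hypot(sigma_u, sigma_v)
    lam = sigma_u / sigma_v
    return 2.0 / sigma * STD_NORMAL.pdf(eps / sigma) * STD_NORMAL.cdf(-eps * lam / sigma)


def draw_half_normal(n_draws, sigma_u, seed=0):
    """Half Normal draws obtained as sigma_u * |N(0, 1)|."""
    rng = random.Random(seed)
    return [sigma_u * abs(rng.gauss(0.0, 1.0)) for _ in range(n_draws)]


def simulated_density(eps, u_draws, sigma_v):
    """Sample analogue of E_u[f_v(eps + u)], averaging the Normal pdf over draws of u."""
    total = sum(STD_NORMAL.pdf((eps + u) / sigma_v) for u in u_draws)
    return total / (sigma_v * len(u_draws))


if __name__ == "__main__":
    draws = draw_half_normal(20_000, sigma_u=0.8, seed=1)  # hypothetical settings
    for eps in (-1.0, 0.0, 0.5):
        print(eps, simulated_density(eps, draws, 0.4), closed_form_density(eps, 0.8, 0.4))
```

The same averaging step works for any one-sided distribution of $u$ that can be sampled, which is what makes the approach attractive when no closed form is available.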
Sampling $u$ can be easily done by sampling from the standard Normal distribution and taking absolute values, $\sigma_u\lvert N(0,1)\rvert$ (in the case of the Half Normal distribution).

With modern statistical software it is straightforward to sample from a wide swath of one-sided distributions that have been suggested in the SFA literature: Exponential, Gamma, Truncated Normal, Weibull, Beta, Uniform, Binomial, Generalized Exponential, etc.

Our mathematical formulation will focus on a production frontier as it is the most popular object of study. The framework for dual characterizations (e.g., cost, revenue, profit) or other frontiers is similar and follows with only minor changes in notation. For example, the cost function formulation is obtained by changing the sign in front of $u$ to a “+,” which will represent excess, rather than shortfall, of cost above the minimum level.

2.2 Modeling dependence

Smith [81] relaxed the assumption of independence between $u$ and $v$ by introducing a copula function to model their joint distribution. This is one of the first relaxations of the independence assumptions available for SFA and it allowed testing the adequacy of this assumption. If the marginal distributions of $u$ and $v$ are linked by a copula density $c(\cdot,\cdot)$, then their joint density can be expressed as follows:

$$f(v,u)=f_v(v)f_u(u)\,c(F_v(v),F_u(u)), \tag{2.3}$$

where $F_u$ and $F_v$ denote the respective cdfs.
It then follows by a similar construction to (2.2) that the density of $\varepsilon$ can be written as

$$f(\varepsilon)=\int_0^\infty f_v(\varepsilon+u)f_u(u)\,c(F_v(\varepsilon+u),F_u(u))\,\mathrm{d}u.$$

For commonly used copula families, this density does not have a closed-form expression similar to (2.2), even in the Normal Half Normal case, so a simulation-based approach would often need to be used, where we simulate many draws of $u$ and evaluate the sample analogue of the following expectation with respect to the distribution of $u$:

$$f(\varepsilon)=\mathbb{E}_u\left[f_v(\varepsilon+u)\,c(F_v(\varepsilon+u),F_u(u))\right],$$

where the factor $f_u(u)$ of the integrand is absorbed into the expectation over $u$. Smith [81] found that ignoring the dependence can lead to biased estimates and discussed how one can test whether the independence assumption nested within this model is adequate. It is easy to see that the density in (2.2) is the special case implied by (2.3) when $c(\cdot,\cdot)$ is the independence (or product) copula.

From $f(\varepsilon)$, along with the assumption of independence over $i$, the log-likelihood function can be written as follows:

$$\ln\mathcal{L}=\ln\left(\prod_{i=1}^{n}f(\varepsilon_i)\right)=\sum_{i=1}^{n}\ln f(\varepsilon_i), \tag{2.4}$$

where $\varepsilon_i=y_i-m(\boldsymbol{x}_i;\boldsymbol{\beta})$.
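A minimal sketch of the simulated density and the resulting simulated log-likelihood is given below. It assumes a bivariate Gaussian copula with parameter `rho` and Normal Half Normal marginals; the copula choice, the function names, the clipping constant, and the use of fixed standard draws that are rescaled by $\sigma_u$ are all illustrative assumptions rather than the specification of any particular paper.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
TINY = 1e-12  # keeps cdf values strictly inside (0, 1) before inverting them


def _clip(p):
    return min(max(p, TINY), 1.0 - TINY)


def gaussian_copula_density(a, b, rho):
    """Bivariate Gaussian copula density c(a, b; rho) for a, b in (0, 1)."""
    x, y = STD_NORMAL.inv_cdf(_clip(a)), STD_NORMAL.inv_cdf(_clip(b))
    r2 = rho * rho
    quad = (r2 * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * (1.0 - r2))
    return math.exp(-quad) / math.sqrt(1.0 - r2)


def half_normal_base_draws(n_draws, seed=0):
    """|N(0,1)| draws, held fixed across parameter values and rescaled by sigma_u."""
    rng = random.Random(seed)
    return [abs(rng.gauss(0.0, 1.0)) for _ in range(n_draws)]


def simulated_density(eps, base_draws, sigma_u, sigma_v, rho):
    """Sample analogue of E_u[f_v(eps + u) c(F_v(eps + u), F_u(u))]."""
    total = 0.0
    for a in base_draws:
        u = sigma_u * a
        z = (eps + u) / sigma_v
        f_v = STD_NORMAL.pdf(z) / sigma_v
        F_v = STD_NORMAL.cdf(z)
        F_u = 2.0 * STD_NORMAL.cdf(a) - 1.0  # Half Normal cdf evaluated at u
        total += f_v * gaussian_copula_density(F_v, F_u, rho)
    return total / len(base_draws)


def simulated_loglik(residuals, base_draws, sigma_u, sigma_v, rho):
    """Simulated counterpart of (2.4): sum of log simulated densities."""
    return sum(math.log(simulated_density(e, base_draws, sigma_u, sigma_v, rho))
               for e in residuals)
```

With `rho = 0` the copula density equals one and the function reduces to the simulated density under independence, so it can be checked against the closed form in (2.2). In an actual MSLE, `simulated_loglik` would be passed to a numerical optimizer over the frontier and distributional parameters, with the residuals recomputed at each parameter value.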
The SFM can be estimated using the traditional MLE, if an analytic expression for the integrals is available, or the maximum simulated likelihood estimator (MSLE), if we need to use a simulation-based approach to evaluate the integrals of the density [60]. (Burns [21], who was Smith's student, proposed using MSLE; although that work predates Smith [81], it cites a working paper version of Smith's article.)

The benefit of MLE/MSLE is that, under the assumption of correct distributional specification of $\varepsilon$, the MLE is asymptotically efficient (i.e., it is consistent and asymptotically Normal, and its asymptotic variance reaches the Cramér–Rao lower bound). A further benefit is that a range of testing options are available. For instance, tests related to $\boldsymbol{\beta}$ can easily be undertaken using any of the classic trilogy of tests: Wald, Lagrange multiplier, or likelihood ratio. The ability to readily and directly conduct asymptotic inference is one of the major benefits of SFA over data envelopment analysis (DEA). (This in no way suggests that inference cannot be undertaken when the DEA estimator is deployed; rather, the DEA estimator has an asymptotic distribution which is much more complicated than that of the MLE for the SFM, and so direct asymptotic inference is not available; bootstrapping techniques are required for many of the most popular DEA estimators [78,79].)

Two main issues that practitioners face when confronting dependence are the choice of copula model and the choice of the error distributions that best fit the data. As Wiboonpongse et al. ([95], p. 34) note “The impact of the independence assumption on technical efficiency estimation has long remained an open issue.” Information criteria such as the AIC or BIC can be used for these purposes; see both [9] and [95] for detailed reviews.

More specifically, Wiboonpongse et al.
[95] use MSLE and systematically consider several copula families, including the Student-t, Clayton, Gumbel and Joe families as well as their relevant rotated versions. Their data are a cross section of 111 observations from coffee production in Thailand, and they use the AIC and BIC to determine which copula model is most appropriate. Wiboonpongse et al. [95] assume the marginals of the two error components are Normal and Half Normal and then apply a range of copulas to inject dependence. The Clayton copula is found to fit best, and plots of technical efficiencies across the 111 farmers show that the best-fitting copula model yields almost uniformly lower (though not very different) TE scores than the independence model. Finally, it also appears that the ranks are preserved (see their Figure 3).

An unintended benefit of modelling dependence is that it may alleviate the “wrong skewness” issue that is common in the canonical Normal Half Normal SFM [77,89]. The wrong skewness issue arises when the OLS residuals display skewness that is of the wrong sign compared to that stemming from the frontier model (so positive when estimating a production frontier). When this happens, for specific distributional pairs the model cannot separately identify the variance parameters of $v$ and $u$. It was noted in [19] that the third central moment of the composed error is

$$E[(\varepsilon-E[\varepsilon])^3]=E[(v-E[v])^3]-E[(u-E[u])^3]+3\,\mathrm{Cov}(u^2,v)-3\,\mathrm{Cov}(u,v^2)-6\,(E[u]-E[v])\,\mathrm{Cov}(u,v). \tag{2.5}$$

It is clear that the skewness of $\varepsilon$ only depends on the skewness of $u$ when $v$ is assumed to be symmetric and independent of $u$. Once $u$ and $v$ are allowed to be dependent and/or $v$ is allowed to be asymmetric, the skewness of the composed error does not have to align with the skewness of inefficiency.
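The decomposition in (2.5) can be checked numerically. The sketch below draws dependent $(u, v)$ pairs, assuming a Gaussian copula with parameter `rho`, a Normal $v$, and a Half Normal $u$ (all illustrative choices, as are the function names and sample size), and compares the sample third central moment of $\varepsilon=v-u$ with the right-hand side of (2.5) evaluated at sample moments.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def draw_dependent_errors(n, sigma_u, sigma_v, rho, seed=0):
    """Draw (u, v) linked by a Gaussian copula: v Normal, u Half Normal."""
    rng = random.Random(seed)
    u, v = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
        v.append(sigma_v * z1)
        # Half Normal quantile function applied to the uniform variate Phi(z2);
        # assumes Phi(z2) < 1 in floating point, which holds for moderate sample sizes.
        u.append(sigma_u * STD_NORMAL.inv_cdf(0.5 + 0.5 * STD_NORMAL.cdf(z2)))
    return u, v


def mean(x):
    return math.fsum(x) / len(x)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return mean([(ai - ma) * (bi - mb) for ai, bi in zip(a, b)])


def third_central(x):
    m = mean(x)
    return mean([(xi - m) ** 3 for xi in x])


def third_moment_decomposition(u, v):
    """Right-hand side of (2.5), evaluated with sample moments."""
    u2 = [x * x for x in u]
    v2 = [x * x for x in v]
    return (third_central(v) - third_central(u)
            + 3.0 * cov(u2, v) - 3.0 * cov(u, v2)
            - 6.0 * (mean(u) - mean(v)) * cov(u, v))


if __name__ == "__main__":
    for rho in (0.0, 0.6):  # hypothetical dependence levels
        u, v = draw_dependent_errors(20_000, 1.0, 0.5, rho, seed=3)
        eps = [vi - ui for ui, vi in zip(u, v)]
        print(rho, third_central(eps), third_moment_decomposition(u, v))
```

Because (2.5) is an algebraic identity, the two printed numbers coincide up to rounding for any sample; what changes with `rho` is how much of the third moment comes from the covariance terms rather than from the skewness of $u$.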
Thus, modelling dependence is one way in which some of the empirical vagaries of the SFM can be overcome [93].

2.2.1 Asymmetric dependence

A common feature of all of the papers that have allowed dependence in the SFM is the use of copulas that introduce symmetric dependence. Symmetric dependence assumes that the noise $v$ and inefficiency $u$ components are treated equally in the SFM. However, a recent suggestion by Wei et al. [91] offers a set of copulas that allow for asymmetric dependence. As Wei et al. ([91], p. 57) note “[…]in practical situations, the inefficiency component $u$ and the error component often play different roles in global inefficiency, and in such cases, the symmetric copulas are not suitable.” They define asymmetric copulas as those that have non-exchangeable and/or radially asymmetric properties [92].

Wei et al. [91] introduced the Skew Normal copula and used it to construct their SFM with dependence. An interesting feature of their general setup is that they allow both $v$ and $u$ to be asymmetric along with an asymmetric copula (see their Proposition 3.1). As in [95], Wei et al. [91] recommended selecting the copula model based on AIC/BIC. In their empirical application, 31 out of 108 farms have the same efficiency rank across the standard SFM and the asymmetric copula SFM (the bottom 5 are in complete agreement, as are 4 of the top 5). The point estimates of technical efficiency, however, show large differences between the two competing models, which again provides evidence that ignoring dependence can have an undue influence on the point estimates of technical inefficiency.

3 Dependence via sample selection

Another way in which dependence can arise is through sample selection. By itself, sample selection has only recently been a serious area of focus in the stochastic frontier literature. Several early approaches to deal with potential sample selection follow the two-step correction [37].
In the first stage, the probability of selection is estimated and the inverse Mills ratio is calculated for each observation. This estimated inverse Mills ratio is then included as a regressor in the final SFM. An example of this is the Finnish farming study [80]. As Greene [33] makes clear, this limited-information two-step approach works in a standard linear regression framework because of linearity. However, as shown in [51], when inefficiency is present no two-step approach will work and full information maximum likelihood estimation is required.

Recognizing the limitations of direct application of the two-stage approach, both Kumbhakar et al. [51] and Greene [33] proposed alternative stochastic frontier selection models. The two approaches differ in how selection arises in the model. The Greene [33] model allows the choice of technology to be influenced by correlation between random error in the selection and frontier models, whereas Kumbhakar et al. [51] constructed a model where the choice of technology is based on some aspect of inefficiency, inducing a different form of sample selection. Beyond the difference in how selection arises, the sample selection stochastic production frontier models [51] and [33] are identical.

Sriboonshitta et al. [83] were the first to recognize that dependence could enter the sample selection model. They work with the Greene [33] stochastic frontier sample selection model and admit dependence into the composite error term. This is termed a double-copula model because it has one copula in the sample selection equation and another copula in the SFM. See ([83], equation (20)) for the likelihood function.

Beyond a small set of simulations, Sriboonshitta et al. [83] applied the double-copula sample selection SFM to 200 rice farmers from Kamphaeng Phet province, Thailand, in 2012 using a Cobb-Douglas production frontier and considered eight different copula functions (see their Table 4).
Their preferred model based on the AIC pairs a Gaussian copula with a 270-degree rotated Clayton copula. They find a substantial difference in estimated TE scores between the Greene [33] model, which assumes independence, and their double-copula model (see their Figure 5). As Sriboonshitta et al. ([83], p. 183) note “[…]improperly assuming independence between the two components of the error term in the SFM may result in biased estimates of technical efficiency scores, hence potentially leading to wrong conclusions and recommendations.”

As a further extension of [83], Liu et al. [59] noted that “this double-copula model neglects the correlation between the unobservables in the selection model and the random error in the SFM, in contrast to Greene’s model.” Liu et al. [59] generalized the Greene [33] model by modeling the dependence between the unobservables in the selection equation and the two error terms in the production equation using a trivariate Gaussian copula. The key feature is that the trivariate and double-copula models rely on different assumptions concerning the joint distribution of $v$, $u$, and $\xi$ (where $\xi$ is the error in the selection equation). Liu et al. [59] made note of the decomposition

$$f(v,u,\xi)=f(v)\,f(u\mid v)\,f(\xi\mid v,u), \tag{3.1}$$

and note that the double-copula model assumes that $f(\xi\mid v,u)=f(\xi\mid v-u)$, i.e., the distribution of $\xi$ depends only on the composite error, not on its individual pieces. This also implies that the double-copula model and the trivariate copula model are nonnested.

Liu et al. [59] provided an application that focuses on Jasmine/non-Jasmine rice farming in Thailand. The data suggest that $u$ is Gamma distributed in the most preferred model. As with some of the earlier papers, Liu et al. ([59], p. 193) noted that “[…]both Greene’s model and the double-copula model appear to overestimate technical efficiency.
According to the [trivariate Gaussian copula] model, farmers also exhibit a wider range of production technical efficiency in Jasmine rice farming […].”

4 Dependence in panel SFM

When repeated observations of the firms are available, we can allow for richer models that incorporate unobserved components and various other dependence structures. Most importantly, we can extract information about likely time trends in inefficiency and time-constant firm-specific characteristics. Pitt and Lee [69] seem to be the first to extend the cross-sectional SFM to a panel structure, and Schmidt and Sickles [76] were the first to propose a panel-specific methodology for SFA.

4.1 A benchmark specification

The benchmark panel SFM can be written as follows:

$$y_{it}=m(\boldsymbol{x}_{it};\boldsymbol{\beta})+c_i-\eta_i-u_{it}+v_{it}=m(\boldsymbol{x}_{it};\boldsymbol{\beta})-\alpha_i+\varepsilon_{it}. \tag{4.1}$$

This model differs from (2.1) in many ways. All observed variables and error terms inherited from (2.1) now have a double index for both firms, $i$, and time, $t=1,\ldots,T$. In addition, the model contains the so-called firm-specific heterogeneity, $c_i$, and the time-invariant component of inefficiency, $\eta_i$. Compared with (2.1), $c_i$ encapsulates any unobserved factors that affect output (other than inputs) without changing over time, such as unmeasured management or operational specifics of the firm. If such factors are present, the dependence between $c_i$ and $\boldsymbol{x}_i$ causes omitted variable biases and invalidates inference based on cross-sectional SFM.
In panel models, when ignored, such factors can serve as common sources of dependence in the error term $\varepsilon_{it}$, which also leads to invalid inference.

Another distinguishing feature of (4.1) is the presence of $\eta_i$, a component of inefficiency which is time-invariant. This means that inefficiency is composed of both time-invariant and time-varying components, which are sometimes interpreted as long-run and short-run inefficiency. Since both $c_i$ and $\eta_i$ are unobserved, it will generally be difficult to decompose $\alpha_i$ into its constituent firm-specific heterogeneity and time-invariant inefficiency components.

Classical panel methods (i.e., methods that assume that $\eta_i$ and $u_{it}$ do not exist) allow for various forms of dependence between $c_i$ and $\boldsymbol{x}_{it}$. For example, estimation under the fixed effects (FE) framework allows $\boldsymbol{x}_{it}$ to be correlated with $c_i$ and uses a within transformation to obtain a consistent estimator of $\boldsymbol{\beta}$. Alternatively, estimation in the random effects (RE) framework assumes that $\boldsymbol{x}_{it}$ and $c_i$ are independent and uses OLS or GLS. The difference between OLS and GLS arises because the variance-covariance matrix of the composed error term $c_i+v_{it}$ is no longer diagonal, so feasible GLS, unlike OLS, is asymptotically efficient.

The early work on panel SFM assumed inefficiency to be time-invariant. This allowed handling dependence within panels using classical panel methods such as FE and RE estimation [76].
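The role of the within transformation under dependence between $c_i$ and $\boldsymbol{x}_{it}$ can be illustrated with a small simulation. The sketch below uses a hypothetical data-generating process with a single input, no inefficiency terms, and a firm effect that is correlated with the input; the coefficient values, the function names, and the seed are illustrative assumptions.

```python
import random
from collections import defaultdict


def simulate_panel(n_firms, n_periods, beta, seed=0):
    """Rows (firm, x, y) from y_it = c_i + beta * x_it + v_it with x_it correlated with c_i."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_firms):
        c_i = rng.gauss(0.0, 1.0)
        for _ in range(n_periods):
            x = 0.8 * c_i + rng.gauss(0.0, 1.0)   # dependence between x_it and c_i
            y = c_i + beta * x + rng.gauss(0.0, 0.5)
            rows.append((i, x, y))
    return rows


def ols_slope(xs, ys):
    """Slope of a simple regression of y on x with an intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def within_slope(rows):
    """FE estimator: subtract firm-specific means from x and y, then run OLS."""
    by_firm = defaultdict(list)
    for firm, x, y in rows:
        by_firm[firm].append((x, y))
    xs, ys = [], []
    for obs in by_firm.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        xs.extend(x - mx for x, _ in obs)
        ys.extend(y - my for _, y in obs)
    return ols_slope(xs, ys)


if __name__ == "__main__":
    rows = simulate_panel(500, 5, beta=0.6, seed=4)
    pooled = ols_slope([r[1] for r in rows], [r[2] for r in rows])
    print("pooled OLS slope:", pooled)
    print("within (FE) slope:", within_slope(rows))
```

In this design the pooled OLS slope absorbs part of the firm effect and overstates the input coefficient, while the within estimator removes $c_i$ before estimation and stays close to the value used to generate the data.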
The standard time-invariant SFM is

$$y_{it}=\beta_0+\boldsymbol{x}_{it}'\boldsymbol{\beta}+v_{it}-\eta_i=(\beta_0-\eta_i)+\boldsymbol{x}_{it}'\boldsymbol{\beta}+v_{it}=c_i+\boldsymbol{x}_{it}'\boldsymbol{\beta}+v_{it}, \tag{4.2}$$

where $c_i\equiv\beta_0-\eta_i$. Under the FE framework, the serially independent inefficiency $\eta_i$, $i=1,\ldots,n$, is allowed to have arbitrary dependence with $\boldsymbol{x}_{it}$. In cases in which there are time-invariant variables of interest in the production model, one can use the RE framework, which also requires no distributional assumptions on $v$ and $\eta$ and can be estimated with OLS or GLS. Alternatively, in such cases, one can rely on distributional assumptions as in [69], where $v_{it}$ is assumed to follow a Normal distribution and $\eta_i$ a Half Normal distribution.

Table 1 contains a summary of the classical SFMs allowing for specific forms of serial dependence in $u_{it}$. It also lists any additional dependence structures permitted in these different models, such as dependence between $c_i$ and $\boldsymbol{x}_{it}$. See [67] and [50] for a detailed discussion of these methods.

Table 1: Selection of panel data methods

| Paper | Serial dependence in inefficiency | Other dependence allowed |
|---|---|---|
| Schmidt and Sickles [76] | None | Between $\eta_i$ and $\boldsymbol{x}_{it}$ |
| Cornwell et al. [26] | $u_{it}=c_{0i}+c_{1i}t+c_{2i}t^2$ | Between $\eta_i$ and $\boldsymbol{x}_{it}$ |
| Kumbhakar [47] | $u_{it}=[1+\exp(\gamma_1 t+\gamma_2 t^2)]^{-1}u_i$ | None |
| Battese and Coelli [13] | $u_{it}=\exp[\gamma(t-T)]\,G(t)\,u_i$ | None |
| Lee and Schmidt [57] | $u_{it}=\eta_i l_t$ | Between $\eta_i$ and $\boldsymbol{x}_{it}$ |
| Battese and Coelli [14] | None | $u_{it}\sim N_+(\boldsymbol{z}_{it}'\boldsymbol{\delta},\sigma_u^2)$ |
| Kumbhakar and Heshmati [48] | $u_{it}=\eta_i+\tau_{it}$ | None |
| Greene [31] | None | Between $c_i$ and $\boldsymbol{x}_{it}$ |
| Greene [32] | None | Between $c_i$ and $\boldsymbol{x}_{it}$; $u_{it}\sim\lvert N(\boldsymbol{z}_{it}'\boldsymbol{\delta},\sigma_u^2)\rvert$ |
| Wang and Ho [90] | None | Between $c_i$ and $\boldsymbol{x}_{it}$; $u_{it}=\exp(\boldsymbol{z}_i'\boldsymbol{\delta})\,u_i^{\ast}$, $u_i^{\ast}\sim N_+(0,\sigma_u^2)$ |
| Badunenko and Kumbhakar [11] | $u_{it}=\eta_i+\tau_{it}$ | $\eta_i\sim N_+(0,\sigma_{\eta i}^2)$, $\sigma_{\eta i}=\sigma_\eta\exp(\boldsymbol{z}_{0i}'\boldsymbol{\delta}_0)$; $\tau_{it}\sim N_+(0,\sigma_{\tau it}^2)$, $\sigma_{\tau it}=\sigma_\tau\exp(\boldsymbol{z}_{it}'\boldsymbol{\delta})$ |

4.2 Quasi MLE

If there is no $(c_i,\eta_i)$ or if $(c_i,\eta_i)$ is assumed to be part of $u_{it}$ or $v_{it}$, which are independent of $\boldsymbol{x}$, then we can view the panel model as a special
case of the cross-sectional model (2.1), only with the double index $it$. The MLE method described in Section 2 applies in this case, but it uses the sample likelihood obtained by leveraging the assumption of independence over both $i$ and $t$, not just $i$. Because independence over $t$ is questionable in panels, this version of MLE is often referred to as quasi-MLE (QMLE).

Let $\boldsymbol{\theta}=(\boldsymbol{\beta}',\sigma^2,\lambda)'$ and let $f_{it}$ denote the density of the composed error term evaluated at $\varepsilon_{it}=v_{it}-u_{it}$. Then,

$$f_{it}(\boldsymbol{\theta})=f(\varepsilon_{it})=f(y_{it}-\boldsymbol{x}_{it}'\boldsymbol{\beta}) \tag{4.3}$$

and the QMLE of $\boldsymbol{\theta}$ can be written as

$$\hat{\boldsymbol{\theta}}_{\text{QMLE}}=\arg\max_{\boldsymbol{\theta}}\sum_i\sum_t\ln f_{it}(\boldsymbol{\theta}). \tag{4.4}$$

The QMLE is known to be consistent even if there is no independence over $t$, but to obtain the correct standard errors one needs to use the so-called “sandwich,” or misspecification-robust, estimator of the QMLE asymptotic variance matrix. QMLE is known to be dominated in terms of precision by several other estimators that use the dependence information explicitly (and correctly). However, an appeal of the QMLE in this setting is that wrongly assuming independence is relatively innocuous: it costs precision but does not bias the estimator, whereas misspecifying the type of dependence can lead to bias.

Amsler et al. [5] proposed several estimators that model time dependence in panels. One such estimator can be obtained in the Generalized Method of Moments (GMM) framework.
Let $s_{it}(\boldsymbol{\theta})$ denote the score of the density function $f_{it}(\boldsymbol{\theta})$, i.e.,

$$s_{it}(\boldsymbol{\theta})=\nabla_{\theta}\ln f_{it}(\boldsymbol{\theta}), \tag{4.5}$$

where $\nabla_{\theta}$ denotes the gradient with respect to $\boldsymbol{\theta}$. Then, the QMLE of $\boldsymbol{\theta}$ solves $\sum_{i}\sum_{t}s_{it}(\hat{\boldsymbol{\theta}}_{\text{QMLE}})=0$ and is identical to the GMM estimator based on the moment condition

$$\mathbb{E}\sum_{t}s_{it}(\boldsymbol{\theta})=0, \tag{4.6}$$

where the expectation is with respect to the distribution of $\varepsilon$. However, under time dependence, summation (over $t$) of the scores in (4.6) is not the optimal weighting. The theory of optimal GMM suggests exploiting the correlation of $s_{it}$ over $t$ by applying the GMM machinery to the $T$ score functions written as follows:

$$\mathbb{E}\begin{bmatrix} s_{i1}(\boldsymbol{\theta})\\ \vdots\\ s_{iT}(\boldsymbol{\theta})\end{bmatrix}=0.$$

The optimal GMM estimator based on these moment conditions has the smallest asymptotic variance among all estimators using these moment conditions. In a classical (non-SFA) panel data setting, Prokhorov and Schmidt [70] call this estimator Improved QMLE (IQMLE).

4.3 Using a Copula

Alternative estimators that allow explicit modelling of dependence between cross-sectional errors over $t$ have to construct a joint distribution of those errors. Amsler et al. [5] offered two ways of doing so. One is to apply a copula to form $f_{\boldsymbol{\varepsilon}}$, the joint (over $t$) density of the composed errors $\boldsymbol{\varepsilon}_i=(\varepsilon_{i1},\ldots,\varepsilon_{iT})$.
The other is to use a copula to form $f_{\mathbf{u}}$, the joint distribution of $\boldsymbol{u}_i=(u_{i1},\ldots,u_{iT})$.

Given the Normal/Half Normal marginals of the $\varepsilon$'s in (2.2) and a copula density $c(\cdot,\ldots,\cdot)$, the joint density $f_{\boldsymbol{\varepsilon}}$ can be written as follows:

$$f_{\boldsymbol{\varepsilon}}(\boldsymbol{\varepsilon}_i;\boldsymbol{\theta})=c(F_{i1}(\boldsymbol{\theta}),\ldots,F_{iT}(\boldsymbol{\theta}))\cdot f_{i1}(\boldsymbol{\theta})\cdots f_{iT}(\boldsymbol{\theta}),$$

where, as before, $f_{it}(\boldsymbol{\theta})\equiv f(\varepsilon_{it})=f(y_{it}-\boldsymbol{x}_{it}'\boldsymbol{\beta})$ is the pdf of the composed error term evaluated at $\varepsilon_{it}$ and $F_{it}(\boldsymbol{\theta})\equiv\int_{-\infty}^{\varepsilon_{it}}f(s)\,\mathrm{d}s$ is the corresponding cdf. Once the joint density is obtained we can construct a log-likelihood and run MLE.
If we let the copula density have a scalar parameter $\rho$, then the sample log-likelihood can be written as follows:

$$\ln\mathcal{L}(\boldsymbol{\theta},\rho)=\sum_{i=1}^{n}\left(\ln c(F_{i1}(\boldsymbol{\theta}),\ldots,F_{iT}(\boldsymbol{\theta});\rho)+\ln f_{i1}(\boldsymbol{\theta})+\ldots+\ln f_{iT}(\boldsymbol{\theta})\right). \tag{4.7}$$

The first term in the summation is what distinguishes this likelihood from QMLE: an explicit modelling of dependence between the composed errors at different $t$.

In a GMM framework, the MLE that maximizes (4.7) is identical to the GMM estimator based on the moment conditions

$$\mathbb{E}\begin{bmatrix}\nabla_{\theta}\ln c_i(\boldsymbol{\theta},\rho)+\nabla_{\theta}\ln f_{i1}(\boldsymbol{\theta})+\ldots+\nabla_{\theta}\ln f_{iT}(\boldsymbol{\theta})\\ \nabla_{\rho}\ln c_i(\boldsymbol{\theta},\rho)\end{bmatrix}=0, \tag{4.8}$$

where $c_i(\boldsymbol{\theta},\rho)=c(F_{i1}(\boldsymbol{\theta}),\ldots,F_{iT}(\boldsymbol{\theta});\rho)$.
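To make the structure of (4.7) concrete, the following is a minimal sketch rather than the estimator of [5]. It assumes an equicorrelated Gaussian copula with scalar parameter `rho` (any other one-parameter copula could be substituted), and it takes the marginal cdf values $F_{it}(\boldsymbol{\theta})$ and pdf values $f_{it}(\boldsymbol{\theta})$ as precomputed $n\times T$ arrays. The function names `gaussian_copula_logdensity` and `copula_panel_loglik` are hypothetical.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def gaussian_copula_logdensity(U, rho):
    """Log density of an equicorrelated Gaussian copula, one value per row of U.

    U   : (n, T) array of cdf values F_it(theta), strictly inside (0, 1)
    rho : scalar dependence parameter; the exchangeable correlation structure
          is an assumption of this sketch and needs -1/(T-1) < rho < 1
    """
    U = np.atleast_2d(np.asarray(U, dtype=float))
    T = U.shape[1]
    R = (1.0 - rho) * np.eye(T) + rho * np.ones((T, T))
    # Normal scores z_it = Phi^{-1}(F_it)
    Z = np.vectorize(_STD_NORMAL.inv_cdf)(U)
    _, logdet = np.linalg.slogdet(R)
    A = np.linalg.inv(R) - np.eye(T)
    quad = np.einsum("nt,ts,ns->n", Z, A, Z)
    return -0.5 * logdet - 0.5 * quad


def copula_panel_loglik(F, f, rho):
    """Sample log-likelihood (4.7): copula term plus the sum of ln f_it."""
    F = np.atleast_2d(np.asarray(F, dtype=float))
    f = np.atleast_2d(np.asarray(f, dtype=float))
    copula_term = gaussian_copula_logdensity(F, rho)
    marginal_term = np.log(f).sum(axis=1)
    return float(np.sum(copula_term + marginal_term))
```

With `rho = 0` the copula term is identically zero and the function returns the QMLE objective in (4.4), which is a convenient sanity check when coding the likelihood.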
Again, efficiency improvement is, in some circumstances, possible if we instead use the optimal GMM machinery on the moment conditions

$$\mathbb{E}\begin{bmatrix}\nabla_{\theta}\ln f_{i1}(\boldsymbol{\theta})\\ \vdots\\ \nabla_{\theta}\ln f_{iT}(\boldsymbol{\theta})\\ \nabla_{\theta}\ln c_i(\boldsymbol{\theta},\rho)\\ \nabla_{\rho}\ln c_i(\boldsymbol{\theta},\rho)\end{bmatrix}=0.$$

However, this improvement may now come at the price of a bias, as the copula-based moment conditions may be misspecified, causing inconsistency of GMM and offsetting any benefit of higher precision. So assuming a wrong kind of time dependence may be worse than assuming independence (over time). Prokhorov and Schmidt [70] explored these circumstances.

The alternative copula-based specification is to form the joint distribution of $(u_{i1},\ldots,u_{iT})$ rather than that of $(\varepsilon_{i1},\ldots,\varepsilon_{iT})$. A challenge of this specification is that a $T$-dimensional integration will be needed to form the likelihood in this case. Let $f_{\mathbf{u}}(\mathbf{u};\boldsymbol{\theta},\rho)$ denote the copula-based joint density of the one-sided error vector and let $f_{u}(u;\boldsymbol{\theta})$ denote the marginal density of an individual one-sided (Half Normal) error term.
Then,

$$f_{\mathbf{u}}(\mathbf{u};\boldsymbol{\theta},\rho)=c(F_{u}(u_1;\boldsymbol{\theta}),\ldots,F_{u}(u_T;\boldsymbol{\theta});\rho)\cdot f_{u}(u_1;\boldsymbol{\theta})\cdots f_{u}(u_T;\boldsymbol{\theta}), \tag{4.9}$$

where $F_{u}(u;\boldsymbol{\theta})\equiv\int_{0}^{u}f_{u}(s;\boldsymbol{\theta})\,\mathrm{d}s$ is the cdf of the Half Normal error term.

To form the sample likelihood we need the joint density of the composed error vector $\boldsymbol{\varepsilon}$. Given the density of $\mathbf{u}$ and assuming, as before, that $v\perp u$, this density can be obtained as follows:

$$f_{\boldsymbol{\varepsilon}}(\boldsymbol{\varepsilon};\boldsymbol{\theta},\rho)=\int_{0}^{\infty}\cdots\int_{0}^{\infty}f_{\mathbf{v}}(\boldsymbol{\varepsilon}+\mathbf{u};\boldsymbol{\theta})\,f_{\mathbf{u}}(\mathbf{u};\boldsymbol{\theta},\rho)\,\mathrm{d}u_1\cdots\mathrm{d}u_T=\mathbb{E}_{\mathbf{u}(\boldsymbol{\theta},\rho)}\,\phi(\boldsymbol{\varepsilon}+\mathbf{u}), \tag{4.10}$$

where $\mathbb{E}_{\mathbf{u}(\boldsymbol{\theta},\rho)}$ denotes the expectation with respect to $f_{\mathbf{u}}(\mathbf{u};\boldsymbol{\theta},\rho)$ and $f_{\mathbf{v}}(\mathbf{v};\boldsymbol{\theta})=\phi(\mathbf{v})$ is the multivariate Normal pdf of $\mathbf{v}$, where all $v$'s are independent and have equal variance $\sigma_v^{2}$.

Similar to the previous section, this integral has no analytical form.
Additionally, this is a $T$-dimensional integral, which is computationally demanding to evaluate using numerical methods. However, it has the form of an expectation over a distribution we can sample from and this, as before, permits application of MSLE, where we simulate the $\mathbf{u}$'s and estimate $f_{\boldsymbol{\varepsilon}}(\boldsymbol{\varepsilon})$ by averaging $\phi(\boldsymbol{\varepsilon}+\mathbf{u})$ over the draws. To be precise, let $S$ denote the number of simulations. The direct simulator of $f_{\boldsymbol{\varepsilon}}(\boldsymbol{\varepsilon})$ can be written as follows:

$$\hat{f}_{\boldsymbol{\varepsilon}}(\boldsymbol{\varepsilon};\boldsymbol{\theta},\rho)=\frac{1}{S}\sum_{s=1}^{S}\phi(\boldsymbol{\varepsilon}+\mathbf{u}^{s}(\boldsymbol{\theta},\rho)),$$

where $\mathbf{u}^{s}(\boldsymbol{\theta},\rho)$ is a draw from $f_{\mathbf{u}}(\mathbf{u};\boldsymbol{\theta},\rho)$ constructed in (4.9). Then, a simulated log-likelihood can be obtained as follows:

$$\ln\mathcal{L}^{s}(\boldsymbol{\theta},\rho)=\sum_{i}\ln\hat{f}_{\boldsymbol{\varepsilon}}(\boldsymbol{\varepsilon}_i;\boldsymbol{\theta},\rho),$$

where, as before, $\boldsymbol{\varepsilon}_i=(\varepsilon_{i1},\ldots,\varepsilon_{iT})$ and $\varepsilon_{it}=y_{it}-\boldsymbol{x}_{it}'\boldsymbol{\beta}$.

This method is a multivariate extension of the simulation-based estimation of univariate densities discussed earlier. An important additional requirement is the ability to sample from the copula; see ([64], Ch. 2) for a discussion of how to sample from the copula to allow dependence.
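The direct simulator can be sketched as follows. This is an illustration under stated assumptions, not the implementation of [5]: the copula is taken to be an equicorrelated Gaussian copula with parameter `rho`, the marginals of $u_{it}$ are Half Normal with scale `sigma_u`, and the $v_{it}$ are independent $N(0,\sigma_v^{2})$. The function names are hypothetical. Draws are obtained by correlating standard normals, mapping them to uniforms through $\Phi$, and applying the Half Normal quantile function $\sigma_u\Phi^{-1}((1+p)/2)$.

```python
import numpy as np
from statistics import NormalDist

_N = NormalDist()
_cdf = np.vectorize(_N.cdf)
_inv = np.vectorize(_N.inv_cdf)


def draw_half_normal_gaussian_copula(S, T, sigma_u, rho, rng):
    """Draw S vectors u^s of length T: Half Normal margins, Gaussian copula."""
    R = (1.0 - rho) * np.eye(T) + rho * np.ones((T, T))
    # Correlated standard normals with correlation matrix R
    Z = rng.standard_normal((S, T)) @ np.linalg.cholesky(R).T
    # Uniforms that carry the copula dependence (clipped away from 0 and 1)
    U = np.clip(_cdf(Z), 1e-12, 1.0 - 1e-12)
    # Half Normal quantile function applied to each uniform
    return sigma_u * _inv(0.5 * (1.0 + U))


def simulated_density(eps, u_draws, sigma_v):
    """Direct simulator: average of phi(eps + u^s) over the S draws."""
    resid = (np.asarray(eps, dtype=float)[None, :] + u_draws) / sigma_v
    log_phi = -0.5 * resid**2 - np.log(sigma_v) - 0.5 * np.log(2.0 * np.pi)
    # Product over t of independent normal pdfs, then mean over draws
    return float(np.mean(np.exp(log_phi.sum(axis=1))))


def simulated_loglik(E, u_draws, sigma_v):
    """Simulated log-likelihood: sum over units i of ln f_hat(eps_i)."""
    return float(sum(np.log(simulated_density(e, u_draws, sigma_v)) for e in E))
```

In an optimizer one would typically fix the underlying standard normal draws once and only re-transform them as `sigma_u` and `rho` change, so that the simulated log-likelihood is a smooth function of the parameters.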
Apart from this sampling requirement, similar asymptotic arguments suggest that MSLE is asymptotically equivalent to MLE [29].

5 Dependence due to endogeneity

A common assumption in the SFM is that $\boldsymbol{x}$ is either exogenous or independent of both $u$ and $v$. If either of these conditions is violated, then all of the estimators discussed so far will be biased and most likely inconsistent. Yet, it is not difficult to think of settings where endogeneity is likely to exist. For example, if shocks are observed before inputs are chosen, then producers may respond to good or bad shocks by adjusting inputs, leading to correlation between $\boldsymbol{x}$ and $v$. Alternatively, if managers know they are inefficient, they may use this information to guide their level of inputs, again producing endogeneity. In a regression model, dealing with endogeneity is well understood. However, in the composed error setting, these methods cannot simply be transferred over; they require care in how they are implemented [6].

To incorporate endogeneity into the SFM in (2.1), we set $m(\boldsymbol{x}_i;\boldsymbol{\beta})=\beta_0+\boldsymbol{x}_{1i}'\boldsymbol{\beta}_1+\boldsymbol{x}_{2i}'\boldsymbol{\beta}_2$, where $\boldsymbol{x}_1$ are our exogenous inputs and $\boldsymbol{x}_2$ are the endogenous inputs, where endogeneity may arise through correlation of $\boldsymbol{x}_2$ with $u$, $v$, or both. To deal with endogeneity we require instruments, $\boldsymbol{w}$, and identification necessitates that the dimension of $\boldsymbol{w}$ is at least as large as the dimension of $\boldsymbol{x}_2$. The natural assumption for valid instrumentation is that $\boldsymbol{w}$ is independent of both $u$ and $v$.

Why worry about endogeneity?
Economic endogeneity means that the inputs in question are choice variables, chosen to optimize some objective function such as cost minimization or profit maximization. Statistical endogeneity arises from simultaneity, omitted variables, and measurement errors. For example, if the omitted variable is managerial ability, which is part of inefficiency, inefficiency is likely to be correlated with inputs because managerial ability affects inputs. This is the Mundlak argument for why omitting a management quality variable (for us, inefficiency) will cause biased parameter estimates. Endogeneity can also be caused by simultaneity, meaning that more than one variable in the model is jointly determined. In many applied settings, it is not clear what researchers mean when they attempt to handle endogeneity inside the SFM. An excellent introduction to the myriad influences that endogeneity can have on the estimates stemming from the SFM can be found in [63]. Mutter et al. [63] used simulations designed around data based on the California nursing home industry to understand the impact of endogeneity of nursing home quality on inefficiency measurement.

The simplest approach to accounting for endogeneity is to use a corrected two-stage least squares (C2SLS) approach, similar to the common corrected ordinary least squares (COLS) approach that has been used to estimate the SFM. This method estimates the SFM using standard two-stage least squares (2SLS) with instruments $\boldsymbol{w}$. This produces consistent estimators for $\boldsymbol{\beta}_1$ and $\boldsymbol{\beta}_2$ but not $\beta_0$, as this is obscured by the presence of $E[u]$ (to ensure that the residuals have mean zero). The second and third moments of the 2SLS residuals are then used to recover estimators of $\sigma_v^{2}$ and $\sigma_u^{2}$.
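The moment step can be sketched as follows, assuming the Normal/Half Normal specification, for which $\mathrm{Var}(\varepsilon)=\sigma_v^{2}+(1-2/\pi)\sigma_u^{2}$ and the third central moment of $\varepsilon=v-u$ equals $\sqrt{2/\pi}\,(1-4/\pi)\,\sigma_u^{3}$. The function name `sigmas_from_residual_moments` is hypothetical, and the input is assumed to be the vector of 2SLS residuals computed with the observed (not first-stage fitted) regressors.

```python
import numpy as np


def sigmas_from_residual_moments(resid):
    """Method-of-moments estimates (sigma_u, sigma_v) from frontier residuals.

    Assumes eps = v - u with v ~ N(0, sigma_v^2) and u Half Normal
    with scale sigma_u.
    """
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()                      # residuals are centred at zero
    m2 = np.mean(e**2)                    # second central moment
    m3 = np.mean(e**3)                    # third central moment
    # Third central moment of v - u is k3 * sigma_u^3, with k3 < 0
    k3 = np.sqrt(2.0 / np.pi) * (1.0 - 4.0 / np.pi)
    if m3 >= 0.0:
        # Positive skew: sigma_u is not identified, so its estimate is 0
        return 0.0, float(np.sqrt(m2))
    sigma_u = (m3 / k3) ** (1.0 / 3.0)
    # Var(v - u) = sigma_v^2 + (1 - 2/pi) * sigma_u^2
    sigma_v2 = m2 - (1.0 - 2.0 / np.pi) * sigma_u**2
    return float(sigma_u), float(np.sqrt(max(sigma_v2, 0.0)))
```

Third-moment estimators are noisy, so in small samples the positive-skew branch can be triggered even when inefficiency is present in the data-generating process.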
Once $\hat{\sigma}_u^{2}$ is determined, the intercept can be corrected by adding $\sqrt{2/\pi}\,\hat{\sigma}_u$. See Section 4.1 of [6] for details.

This represents a simple avenue to account for endogeneity, and it does not require specifying how endogeneity enters the model, i.e., through correlation with $v$, with $u$, or both. However, as with other corrected procedures based on calculations of the second and third (or even higher) moments of the residuals, from [66] and [89], if the initial 2SLS residuals have positive skew (instead of negative), then $\sigma_u^{2}$ cannot be identified and its estimator is 0. Furthermore, the standard errors from this approach need to be modified for the estimator of the intercept to account for the step-wise nature of the estimation.

5.1 A likelihood framework

Likelihood-based alternatives allow for explicit modelling and estimation of the dependence structure that underlies endogeneity. This has recently been studied by Kutlu [54], Karakaplan and Kutlu [44], Tran and Tsionas [87,88], and Amsler et al. [6]. Our discussion here follows [6], as their derivation of the likelihood relies on a simple conditioning argument as opposed to the earlier work relying on the Cholesky decomposition or alternative approaches. While all approaches lead to a likelihood function, the conditioning idea of Amsler et al.
[6] is simpler and more intuitive.

Consider the stochastic frontier system:

$$y_i=\boldsymbol{x}_i\boldsymbol{\beta}+\varepsilon_i \tag{5.1}$$

$$\boldsymbol{x}_{2i}=\boldsymbol{w}_i\boldsymbol{\Gamma}+\boldsymbol{\eta}_i, \tag{5.2}$$

where $\boldsymbol{x}_i=(\boldsymbol{x}_{1i},\boldsymbol{x}_{2i})$, $\boldsymbol{\beta}=(\boldsymbol{\beta}_1,\boldsymbol{\beta}_2)$, $\boldsymbol{w}_i=(\boldsymbol{x}_{1i},\boldsymbol{q}_i)$ is the vector of instruments, $\boldsymbol{\eta}_i$ (different from $\eta_i$ in Sections 3.1–3.2) is uncorrelated with $\boldsymbol{w}_i$, and endogeneity of $\boldsymbol{x}_{2i}$ arises through $\mathrm{cov}(\varepsilon_i,\boldsymbol{\eta}_i)\ne 0$. Here simultaneity bias (and the resulting inconsistency) exists because $\boldsymbol{\eta}_i$ is correlated with either $v_i$, $u_i$, or both.

We start with the case of dependence between $\boldsymbol{\eta}_i$ and $v_i$ while $u_i$ is independent of $(\boldsymbol{\eta}_i,v_i)$. Assume that, conditional on $\boldsymbol{w}_i$, $\psi_i=(v_i,\boldsymbol{\eta}_i')'\sim N(\boldsymbol{0},\Omega)$, where

$$\Omega=\begin{bmatrix}\sigma_v^{2} & \Sigma_{v\eta}\\ \Sigma_{\eta v} & \Sigma_{\eta\eta}\end{bmatrix}.$$

To derive the likelihood function, [6] condition on the instruments, $\boldsymbol{w}$. Doing this yields $f(y,\boldsymbol{x}_2\mid\boldsymbol{w})=f(y\mid\boldsymbol{x}_2,\boldsymbol{w})\cdot f(\boldsymbol{x}_2\mid\boldsymbol{w})$.
With the density in this form, the log-likelihood follows suit: $\ln\mathcal{L}=\ln\mathcal{L}_1+\ln\mathcal{L}_2$, where $\ln\mathcal{L}_1$ corresponds to $f(y\mid\boldsymbol{x}_2,\boldsymbol{w})$ and $\ln\mathcal{L}_2$ corresponds to $f(\boldsymbol{x}_2\mid\boldsymbol{w})$. These two components can be written as

$$\begin{aligned}\ln\mathcal{L}_1 &= -(n/2)\ln\sigma^{2}-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}\tilde{\varepsilon}_i^{2}+\sum_{i=1}^{n}\ln[\Phi(-\lambda_c\tilde{\varepsilon}_i/\sigma)]\\ \ln\mathcal{L}_2 &= -(n/2)\ln|\Sigma_{\eta\eta}|-0.5\sum_{i=1}^{n}\boldsymbol{\eta}_i'\Sigma_{\eta\eta}^{-1}\boldsymbol{\eta}_i,\end{aligned}$$

where $\tilde{\varepsilon}_i=y_i-\beta_0-\boldsymbol{x}_i\boldsymbol{\beta}-\mu_{ci}$, $\mu_{ci}=\Sigma_{v\eta}\Sigma_{\eta\eta}^{-1}\boldsymbol{\eta}_i$, $\sigma_c^{2}=\sigma_v^{2}-\Sigma_{v\eta}\Sigma_{\eta\eta}^{-1}\Sigma_{\eta v}$, $\lambda_c=\sigma_u/\sigma_c$, and $\sigma^{2}=\sigma_c^{2}+\sigma_u^{2}$ is the variance scale of the composed error once $v_i$ is conditioned on $\boldsymbol{\eta}_i$ (it reduces to $\sigma_v^{2}+\sigma_u^{2}$ when $\Sigma_{v\eta}=0$).
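A minimal sketch of how the two components could be evaluated is given below. It is an illustration of the conditioning argument rather than the code of [6]: the function name and argument layout are hypothetical, `X` is assumed to contain a constant column (so `beta` also holds $\beta_0$), and additive constants are dropped as in the text. Conditional on $\boldsymbol{\eta}_i$, $v_i$ is Normal with mean $\mu_{ci}$ and variance $\sigma_c^{2}$, so the sketch uses $\sigma^{2}=\sigma_c^{2}+\sigma_u^{2}$, which coincides with $\sigma_v^{2}+\sigma_u^{2}$ when $\Sigma_{v\eta}=0$.

```python
import math
import numpy as np

# ln Phi(a) computed through the complementary error function
_log_Phi = np.vectorize(lambda a: math.log(0.5 * math.erfc(-a / math.sqrt(2.0))))


def endogenous_sf_loglik(y, X, x2, W, beta, Gamma, sigma_u, sigma_v,
                         Sigma_v_eta, Sigma_eta_eta):
    """ln L = ln L1 + ln L2, up to additive constants.

    X is (n, k) with a constant column; x2 is (n, k2); W is (n, m);
    Gamma is (m, k2); Sigma_v_eta is (k2,); Sigma_eta_eta is (k2, k2).
    """
    n = len(y)
    eta = x2 - W @ Gamma                                  # reduced-form errors, (5.2)
    S_inv = np.linalg.inv(Sigma_eta_eta)
    mu_c = eta @ S_inv @ Sigma_v_eta                      # E[v | eta]
    sigma_c2 = sigma_v**2 - Sigma_v_eta @ S_inv @ Sigma_v_eta   # Var(v | eta)
    sigma2 = sigma_c2 + sigma_u**2
    sigma = math.sqrt(sigma2)
    lam_c = sigma_u / math.sqrt(sigma_c2)
    eps_t = y - X @ beta - mu_c                           # endogeneity-corrected residual
    lnL1 = (-0.5 * n * math.log(sigma2)
            - 0.5 * np.sum(eps_t**2) / sigma2
            + np.sum(_log_Phi(-lam_c * eps_t / sigma)))
    _, logdet = np.linalg.slogdet(Sigma_eta_eta)
    lnL2 = -0.5 * n * logdet - 0.5 * np.einsum("ij,jk,ik->", eta, S_inv, eta)
    return float(lnL1 + lnL2)
```

A numerical optimizer would maximize this function jointly over all arguments after `W`, with the variance parameters reparameterized so that $\sigma_c^{2}$ stays positive.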
The subtraction of $\mu_{ci}$ in $\ln\mathcal{L}_1$ is an endogeneity correction, while $\ln\mathcal{L}_2$ is nothing more than the standard likelihood function of a multivariate normal regression model (as in (5.2)). Estimates of the model parameters $(\boldsymbol{\beta},\sigma_v^{2},\sigma_u^{2},\boldsymbol{\Gamma},\Sigma_{v\eta})$ and $\Sigma_{\eta\eta}$ can be obtained by maximizing $\ln\mathcal{L}$.

While direct estimation of the likelihood function is possible, a two-step approach is also available [54]. However, as pointed out by both Kutlu [54] and Amsler et al. [6], this two-step approach will have incorrect standard errors. Even though the two-step approach might be computationally simpler, it is, in general, different from full optimization of the likelihood function of Amsler et al. [6]. This is due to the fact that the two-step approach ignores the information provided by $\boldsymbol{\Gamma}$ and $\Sigma_{\eta\eta}$ in $\ln\mathcal{L}_1$. In general, full optimization of the likelihood function is recommended as the standard errors (obtained in the usual manner from the inverse of the Fisher information matrix) are valid.

Typically, the standard errors can be obtained either through use of the outer product of gradients (OPG) or direct estimation of the Hessian matrix of the log-likelihood function. Given the nascency of these methods, it has yet to be determined which of the two is more reliable in practice, though in other settings both tend to work well. One argument for the OPG is that, since it only requires calculation of the first derivatives, it can be more stable (and more likely to be invertible) than calculation of the Hessian.
Also note that in finite samples, the different estimators of the covariance of the MLE can give different numerical estimates, even suggesting different conclusions for inference (reject or do not reject the null hypothesis). So, for small samples, it is often advised to check all feasible estimates whenever there is suspicion of ambiguity in the conclusions (e.g., when a hypothesis is rejected only at around the 10% significance level).

5.2 A GMM framework

An insightful avenue to model dependence due to endogeneity in the SFM that differs from the traditional corrected methods or maximum likelihood stems from the GMM framework as proposed by Amsler et al. [6], who used the insights of Hansen et al. [35]. Similar to our discussion on the use of GMM in panel estimation, the idea is to use the first-order conditions for maximization of the likelihood function under exogeneity as a GMM problem:

$$E[\varepsilon_i^{2}/\sigma^{2}-1]=0, \tag{5.3}$$

$$E\left[\frac{\varepsilon_i\phi_i}{1-\Phi_i}\right]=0, \tag{5.4}$$

$$E\left[\boldsymbol{x}_i\varepsilon_i/\sigma+\lambda\boldsymbol{x}_i\frac{\phi_i}{1-\Phi_i}\right]=0, \tag{5.5}$$

where $\phi_i=\phi\left(\frac{\lambda\varepsilon_i}{\sigma}\right)$ and $\Phi_i=\Phi\left(\frac{\lambda\varepsilon_i}{\sigma}\right)$. Note that these expectations are taken over $\boldsymbol{x}_i$ and $y_i$ (and by default, $\varepsilon_i$) and solved for the parameters of the SFM.

The key here is that these first-order conditions (one for $\sigma^{2}$, one for $\lambda$, and the vector for $\boldsymbol{\beta}$) are valid under exogeneity and this implies that the MLE is equivalent to the GMM estimator. Under endogeneity, however, this relationship does not hold directly. But the seminal idea of Amsler et al.
[6] is that the first-order conditions (5.3) and (5.4) are based on the distributional assumptions on $v$ and $u$, not on the relationship of $\boldsymbol{x}$ with $v$ and/or $u$. Thus, these moment conditions are valid whether $\boldsymbol{x}$ contains endogenous components or not. The only moment condition that needs to be adjusted is (5.5). In this case, the first-order condition needs to be taken with respect to $\boldsymbol{w}$, the exogenous variable, not $\boldsymbol{x}$. Doing so results in the following amended first-order condition:

$$E\left[\boldsymbol{w}_i\varepsilon_i/\sigma+\lambda\boldsymbol{w}_i\frac{\phi_i}{1-\Phi_i}\right]=0, \tag{5.6}$$

where $\phi_i$ and $\Phi_i$ are identical to those in (5.5). It is important to acknowledge that this moment condition is valid when $\varepsilon_i$ and $\boldsymbol{w}_i$ are independent. This is a more stringent requirement than the typical regression setup with $E[\varepsilon_i\mid\boldsymbol{w}_i]=0$. As with the C2SLS approach, the source of endogeneity for $\boldsymbol{x}_2$ does not need to be specified (through $v$ and/or $u$).

5.3 An economic model of dependence

From an economic theory perspective, there are grounds to model dependence between $u_i$ and $\boldsymbol{\eta}_i$, not only between $v_i$ and $\boldsymbol{\eta}_i$.
A system similar to (5.1)–(5.2) arises as a result of appending the SFM with the first-order conditions of cost minimization ([52], Chapter 8). (It is possible to treat a subset of $\boldsymbol{x}$ as endogenous, i.e., $\boldsymbol{x}=(\boldsymbol{x}_1,\boldsymbol{x}_2)$, where $\boldsymbol{x}_1$ is endogenous and $\boldsymbol{x}_2$ is exogenous.) The cost minimization problem is

$$\min \boldsymbol{P}'\boldsymbol{x},\quad\text{s.t.}\quad y=m(\boldsymbol{x};\boldsymbol{\beta})+v-u, \tag{5.7}$$

for input prices $\boldsymbol{P}$. The first-order conditions in this case are

$$\frac{m_j(\boldsymbol{x};\boldsymbol{\beta})}{m_1(\boldsymbol{x};\boldsymbol{\beta})}=\frac{P_j}{P_1},\quad j=2,\ldots,J, \tag{5.8}$$

where $m_j(\boldsymbol{x};\boldsymbol{\beta})$ is the partial derivative of $m(\boldsymbol{x};\boldsymbol{\beta})$ with respect to $x_j$. These first-order conditions are exact, which usually does not arise in practice; rather, a stochastic term is added which is designed to capture allocative inefficiency. That is, our empirical first-order conditions are $\frac{m_j(\boldsymbol{x};\boldsymbol{\beta})}{m_1(\boldsymbol{x};\boldsymbol{\beta})}=\frac{P_j}{P_1}e^{\eta_j}$ for $j=2,\ldots,J$, where $\eta_j$ captures allocative inefficiency for the $j$th input relative to input one (the choice of input to compare is without loss of generality).

The idea behind allocative inefficiency is that firms could be fully technically efficient and still have room for improvement due to over- or under-use of inputs, relative to another input, given the price ratio.
On the other hand, firms can be technically inefficient because of allocative inefficiency and vice versa, so independence between $u$ and $\boldsymbol{\eta}$ is hard to justify. Additionally, if firms are cost minimizers and one estimates a production function, the inputs will be endogenous as these are choice variables to the firm. In this case, input prices can serve as instruments.

Combining the SFM, under the Cobb–Douglas production function, with the information in the $J-1$ conditions in (5.8) with allocative inefficiency built in, results in the following system:

$$y_i=\boldsymbol{x}_i\boldsymbol{\beta}+\varepsilon_i \tag{5.9}$$

$$x_{i1}-x_{ij}=\ln(\beta_1)-\ln(\beta_j)+p_{ij}-p_{i1}+\eta_{ij},\quad j=2,\ldots,J, \tag{5.10}$$

where $x_{ij}$ is the log of input $j$ of firm $i$, $p_{ij}$ is the log of the price of input $j$ for firm $i$, $\beta_j$ is the coefficient on input $j$ in (5.9), and $\boldsymbol{\eta}_i=(\eta_{i2},\ldots,\eta_{iJ})$ are the allocative inefficiencies for $J-1$ inputs with respect to input one. See Schmidt and Lovell [74,75] for details.

5.4 A copula-based approach

Amsler et al. [6] used copulas to obtain a joint distribution for $u$, $v$, and $\boldsymbol{\eta}$, whereas Amsler et al. [8] developed a new copula family for $u$ and $\boldsymbol{\eta}$ with properties that reflect the nature of allocative (symmetric) and technical (one-sided) inefficiencies. Here we provide the derivation of a copula-based likelihood for the most general case that allows us to model dependence between all the components of $(u,v,\boldsymbol{\eta})$.

We keep the Half Normal marginal for $u$, Normal marginals for the elements of $\psi=(v,\boldsymbol{\eta}')'$ as before, and assume a copula density $c(\cdot,\ldots,\cdot)$. Amsler et al.
[6] used the Gaussian copula, which implies that the joint distribution of $\psi$ is Normal, but this is largely done for convenience. This gives the joint density of $(u,v,\boldsymbol{\eta})$:

$$f_{u,v,\boldsymbol{\eta}}(u,v,\boldsymbol{\eta})=c(F_u(u),F_v(v),F_{\eta}(\eta_2),\ldots,F_{\eta}(\eta_J))\,f_u(u)\,f_v(v)\,f_{\eta}(\eta_2)\cdots f_{\eta}(\eta_J).$$

However, we need the joint density of $\varepsilon$ and $\boldsymbol{\eta}$ in order to form a sample log-likelihood. This density can be obtained by integrating $u$ out of $f_{u,v,\boldsymbol{\eta}}(u,v,\boldsymbol{\eta})$ as follows:

$$f_{\varepsilon,\eta}(\varepsilon,\boldsymbol{\eta})=\int_{0}^{\infty}f_{u,v,\boldsymbol{\eta}}(u,\varepsilon+u,\boldsymbol{\eta})\,\mathrm{d}u=\int_{0}^{\infty}\left[\frac{f_{u,v,\boldsymbol{\eta}}(u,\varepsilon+u,\boldsymbol{\eta})}{f_u(u)}\right]f_u(u)\,\mathrm{d}u=\mathbb{E}_u\left[\frac{f_{u,v,\boldsymbol{\eta}}(u,\varepsilon+u,\boldsymbol{\eta})}{f_u(u)}\right]. \tag{5.11}$$

Again, we can use simulation techniques to evaluate this density.
Specifically, given $S$ draws $u_s$, $s=1,\ldots,S$, from $f_u$, the direct simulator can be written as

$$\hat{f}_{\varepsilon,\eta}(\varepsilon,\boldsymbol{\eta})=\frac{1}{S}\sum_{s=1}^{S}\left[\frac{f_{u,v,\boldsymbol{\eta}}(u_s,\varepsilon+u_s,\boldsymbol{\eta})}{f_u(u_s)}\right],$$

and this leads to MSLE using the log-likelihood

$$\ln\mathcal{L}^{s}=\sum_{i=1}^{n}\ln\hat{f}_{\varepsilon,\eta}(\varepsilon_i,\boldsymbol{\eta}_i), \tag{5.12}$$

where $\varepsilon_i=y_i-\boldsymbol{x}_i'\boldsymbol{\beta}$ and $\boldsymbol{\eta}_i=\boldsymbol{x}_{2i}-\boldsymbol{w}_i\boldsymbol{\Gamma}$ as follows from the system in (5.1)–(5.2).

The MSLE will produce estimates of all the parameters of the model, that is, $\boldsymbol{\beta},\boldsymbol{\Gamma},\sigma_u^{2},\sigma_v^{2}$, the variances of $\eta_j$, and whatever copula parameters appear in $c$. This permits modelling and testing the validity of independence assumptions between all error terms in the system, including the assumption of exogeneity.

5.5 Dependence on determinants of inefficiency

To conclude this section, we consider the extension to a setting where inefficiency depends on covariates and some of these determinants of inefficiency may be endogenous [7,55]. These models can be estimated using traditional instrumental variable methods. However, given that the determinants of inefficiency enter the model nonlinearly, nonlinear methods are required.

Amsler et al.
[7] considered the model

$$y_i=\boldsymbol{x}_i'\boldsymbol{\beta}+v_i-u_i=\boldsymbol{x}_i'\boldsymbol{\beta}+v_i-u_i^{\ast}e^{\boldsymbol{z}_i'\boldsymbol{\delta}}, \tag{5.13}$$

where $u^{\ast}$ is the baseline inefficiency and $u$ has the property that the scale of its distribution (relative to the distribution of $u^{\ast}$) changes depending on the determinants $\boldsymbol{z}$ (the so-called scaling property). The covariates $\boldsymbol{x}_i$ and $\boldsymbol{z}_i$ are partitioned as

$$\boldsymbol{x}_i=\begin{bmatrix}\boldsymbol{x}_{1i}\\ \boldsymbol{x}_{2i}\end{bmatrix},\quad \boldsymbol{z}_i=\begin{bmatrix}\boldsymbol{z}_{1i}\\ \boldsymbol{z}_{2i}\end{bmatrix},$$

where $\boldsymbol{x}_{1i}$ and $\boldsymbol{z}_{1i}$ are exogenous and $\boldsymbol{x}_{2i}$ and $\boldsymbol{z}_{2i}$ are endogenous. The set of instruments used to combat endogeneity is defined as

$$\boldsymbol{w}_i=\begin{bmatrix}\boldsymbol{x}_{1i}\\ \boldsymbol{z}_{1i}\\ \boldsymbol{q}_i\end{bmatrix},$$

where $\boldsymbol{q}_i$ are the traditional outside instruments. Identification of all the parameters requires that the dimension of $\boldsymbol{q}$ be at least as large as the dimension of $\boldsymbol{x}_2$ plus the dimension of $\boldsymbol{z}_2$ (the rank condition).

In the model of Amsler et al. [7], endogeneity arises through dependence between a variable in the model ($\boldsymbol{x}_2$ and/or $\boldsymbol{z}_2$) and noise, $v$. That is, both $\boldsymbol{x}$ and $\boldsymbol{z}$ are assumed to be independent of baseline inefficiency $u^{\ast}$. Given that $E[u_i]$ is not constant, the C2SLS approach to dealing with endogeneity proposed by Amsler et al. [6] cannot be used here.
To develop an appropriate estimator, add and subtract the mean of inefficiency to produce a composed error term that has mean 0,

$$y_i=\boldsymbol{x}_i'\boldsymbol{\beta}-\mu^{*}e^{\boldsymbol{z}_i'\boldsymbol{\delta}}+v_i-(u_i^{*}-\mu^{*})e^{\boldsymbol{z}_i'\boldsymbol{\delta}}. \tag{5.14}$$

Proper estimation through instrumental variables requires that the following moment condition holds:

$$\mathbb{E}\left[v_i-(u_i^{*}-\mu^{*})e^{\boldsymbol{z}_i'\boldsymbol{\delta}}\mid \boldsymbol{w}_i\right]=0. \tag{5.15}$$

The nonlinearity of these moment conditions necessitates the use of nonlinear two-stage least squares (NL2SLS) [4].

Latruffe et al. [55] have a similar setup to Amsler et al. [7], using the model in (5.13), but develop a four-step estimator for the parameters; additionally, only $\boldsymbol{x}_2$ is treated as endogenous. Latruffe et al.'s [55] approach is based on [23] and uses the construction of efficient moment conditions. The vector of instruments proposed in [55] is defined as

$$\boldsymbol{w}_i(\boldsymbol{\gamma},\boldsymbol{\delta})=\begin{bmatrix}\boldsymbol{x}_{1i}\\ \boldsymbol{q}_i'\boldsymbol{\gamma}\\ \boldsymbol{z}_i e^{\boldsymbol{z}_i'\boldsymbol{\delta}}\end{bmatrix}, \tag{5.16}$$

where $\boldsymbol{q}_i'\boldsymbol{\gamma}$ captures the linear projection of $\boldsymbol{x}_2$ on the external instruments $\boldsymbol{q}$. The four-step estimator is defined as follows.

Step 1. Regress $\boldsymbol{x}_2$ on $\boldsymbol{q}$ to estimate $\boldsymbol{\gamma}$. Denote the OLS estimator of $\boldsymbol{\gamma}$ as $\widehat{\boldsymbol{\gamma}}$.

Step 2. Use NLS to estimate the SFM in (5.13). Denote the NLS estimates of $(\boldsymbol{\beta},\boldsymbol{\delta})$ as $(\ddot{\boldsymbol{\beta}},\ddot{\boldsymbol{\delta}})$. Use the NLS estimate of $\boldsymbol{\delta}$ and the OLS estimate of $\boldsymbol{\gamma}$ from Step 1 to construct the instruments $\boldsymbol{w}_i(\widehat{\boldsymbol{\gamma}},\ddot{\boldsymbol{\delta}})$.

Step 3. Using the estimated instrument vector $\boldsymbol{w}_i(\widehat{\boldsymbol{\gamma}},\ddot{\boldsymbol{\delta}})$, calculate the NL2SLS estimator of $(\boldsymbol{\beta},\boldsymbol{\delta})$ as $(\widetilde{\boldsymbol{\beta}},\widetilde{\boldsymbol{\delta}})$. Use the NL2SLS estimate of $\boldsymbol{\delta}$ and the OLS estimate of $\boldsymbol{\gamma}$ from Step 1 to construct the instruments $\boldsymbol{w}_i(\widehat{\boldsymbol{\gamma}},\widetilde{\boldsymbol{\delta}})$.

Step 4. Using the estimated instrument vector $\boldsymbol{w}_i(\widehat{\boldsymbol{\gamma}},\widetilde{\boldsymbol{\delta}})$, calculate the NL2SLS estimator of $(\boldsymbol{\beta},\boldsymbol{\delta})$ as $(\widehat{\boldsymbol{\beta}},\widehat{\boldsymbol{\delta}})$.

This multi-step estimator is necessary in the context of efficient moments because the actual set of instruments is not used directly; rather, $\boldsymbol{w}_i(\boldsymbol{\gamma},\boldsymbol{\delta})$ is used, and this instrument vector requires estimates of $\boldsymbol{\gamma}$ and $\boldsymbol{\delta}$. The first two steps of the algorithm are designed to construct estimates of these two unknown parameter vectors.
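The four steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of [55]: the function name `four_step_nl2sls` and its arguments are hypothetical, `x2` is a single endogenous input, `x1` and `q` each contain a constant column, and `z` contains a constant column so that $\mu^{*}$ is absorbed into the first element of $\boldsymbol{\delta}$ (the mean function becomes $\boldsymbol{x}_i'\boldsymbol{\beta}-e^{\boldsymbol{z}_i'\boldsymbol{\delta}}$). NL2SLS is computed by minimizing the residual projected onto the column space of the instruments.

```python
import numpy as np
from scipy.optimize import least_squares


def four_step_nl2sls(y, x1, x2, z, q):
    """Hypothetical sketch of the four-step estimator.

    Mean function: x'beta - exp(z'delta); z includes a constant that
    absorbs ln(mu*) (an assumption made for this sketch).
    y: (n,), x1: (n, k1), x2: (n,), z: (n, kz), q: (n, kq).
    """
    x = np.column_stack([x1, x2])
    kx, kz = x.shape[1], z.shape[1]

    def resid(theta):
        beta, delta = theta[:kx], theta[kx:]
        return y - x @ beta + np.exp(z @ delta)

    def instruments(delta, x2_hat):
        # w_i(gamma, delta) = [x1_i, q_i'gamma, z_i * exp(z_i'delta)], cf. (5.16)
        return np.column_stack([x1, x2_hat, z * np.exp(z @ delta)[:, None]])

    def nl2sls(w, start):
        # Minimise r' P_w r, where P_w = Q Q' and Q comes from the thin QR of w.
        qmat, _ = np.linalg.qr(w)
        return least_squares(lambda t: qmat.T @ resid(t), start).x

    # Step 1: OLS of x2 on q gives gamma_hat and the fitted projection.
    gamma_hat, *_ = np.linalg.lstsq(q, x2, rcond=None)
    x2_hat = q @ gamma_hat

    # Step 2: NLS that ignores the endogeneity of x2.
    start = np.full(kx + kz, 0.1)  # arbitrary starting values for delta
    start[:kx] = np.linalg.lstsq(x, y, rcond=None)[0]
    theta_nls = least_squares(resid, start).x

    # Step 3: NL2SLS with instruments built from the NLS estimate of delta.
    theta_3 = nl2sls(instruments(theta_nls[kx:], x2_hat), theta_nls)

    # Step 4: NL2SLS with instruments rebuilt from the Step 3 estimate of delta.
    theta_4 = nl2sls(instruments(theta_3[kx:], x2_hat), theta_3)
    return theta_4[:kx], theta_4[kx:]
```

The sketch returns point estimates only; standard errors would need to account for the estimated instruments and are not attempted here.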
The third step then is designed to construct a consistent estimator of $\boldsymbol{w}_i(\boldsymbol{\gamma},\boldsymbol{\delta})$, which is not done in Step 2 given that the endogeneity of $\boldsymbol{x}_2$ is ignored (note that NLS is used as opposed to NL2SLS). The iteration from Step 2 to Step 3 does produce a consistent estimator of $\boldsymbol{w}_i(\boldsymbol{\gamma},\boldsymbol{\delta})$, and as such, Step 4 produces consistent estimators for $\boldsymbol{\beta}$ and $\boldsymbol{\delta}$. While Latruffe et al. [55] proposed a set of efficient moment conditions to handle endogeneity, the model of Amsler et al. [7] is more general because it can handle endogeneity in the determinants of inefficiency as well. Finally, the presence of $\boldsymbol{z}$ is attractive since this allows the researcher to dispense with distributional assumptions on $v$ and $u$.

6 Estimation of individual inefficiency using dependence information

Once the parameters of the SFM have been estimated, estimates of firm-level productivity and efficiency can be recovered. Observation-specific estimates of inefficiency are one of the main benefits of the SFM relative to neoclassical models of production. Firms can be ranked according to estimated efficiency; the identity of under-performing firms, as well as those that are deemed best practice, can also be gleaned from the estimated SFM.
All of this information is useful in helping to design more efficient public policy or subsidy programs aimed at improving the market, for example, insulating consumers from the poor performance of heavily inefficient firms.

As a concrete illustration, consider firms operating electricity distribution networks that typically possess a natural local monopoly given that the construction of competing networks over the same terrain is prohibitively expensive.

The SFA literature contains a fairly rich set of examples for the estimation and use of efficiency estimates in different fields of research. For example, in the context of electricity providers, see [36,45,53]; for banking efficiency, see [22] and references cited therein; for the analysis of the efficiency of national health care systems, see [30] and the review [40]; for analyzing efficiency in agriculture, see [13,14,20,58], to mention just a few.

It is not uncommon for national governments to establish regulatory agencies which monitor the provision of electricity to ensure that abuse of the inherent monopoly power is not occurring. Regulators face the task of determining an acceptable price for the provision of electricity while having to balance the heterogeneity that exists across the firms (in terms of size of the firm and length of the network). Firms which are inefficient may charge too high a price to recoup a profit, but at the expense of operating below capacity. However, given production and distribution shocks, not all departures from the frontier represent inefficiency. Thus, precise measures designed to account for noise are required to parse information from $\varepsilon_i$ regarding $u_i$.

Alternatively, further investigation could reveal what it is that makes these establishments attain such high levels of performance.
This could then be used to identify appropriate government policy implications and responses or identify processes and/or management practices that should be spread (or encouraged) across the less efficient, but otherwise similar, units. This is the essence of the determinants of inefficiency approach discussed in the previous section. More directly, efficiency rankings are used in regulated industries such that regulators can set tougher future cost reduction targets for the more inefficient companies, in order to ensure that customers do not pay for the inefficiency of firms.

The only direct estimate coming from the Normal Half Normal SFM is $\widehat{\sigma}_u^2$. This provides context regarding the shape of the Half Normal distribution of $u_i$ and the industry average inefficiency $\mathbb{E}[u]$, but not on the absolute level of inefficiency for a given firm. If we are only concerned with the average level of technical efficiency for the population, then this is all the information that is needed. Yet, if we want to know about a specific firm, then something else is required. The main approach to estimating firm-level inefficiency is the conditional mean estimator [42], commonly known as the JLMS estimator. Their idea was to calculate the expected value of $u_i$ conditional on the realization of the composed error of the model, $\varepsilon_i\equiv v_i-u_i$, i.e., $\mathbb{E}[u_i\mid\varepsilon_i]$.

JLMS [42] also suggested an alternative estimator based on the conditional mode.

This conditional mean of $u_i$ given $\varepsilon_i$ gives a point prediction of $u_i$.
The composed error contains individual-specific information, and the conditional expectation is one measure of firm-specific inefficiency.

JLMS [42] show that for the Normal Half Normal specification of the SFM, the conditional density function of $u_i$ given $\varepsilon_i$, $f(u_i\mid\varepsilon_i)$, is $N_{+}(\mu_{*i},\sigma_{*}^{2})$, where

$$\mu_{*i}=\frac{-\varepsilon_i\sigma_u^2}{\sigma^2} \tag{6.1}$$

and

$$\sigma_{*}^{2}=\frac{\sigma_v^2\sigma_u^2}{\sigma^2}. \tag{6.2}$$

Given results on the mean of a Truncated Normal density, it follows that

$$\mathbb{E}[u_i\mid\varepsilon_i]=\mu_{*i}+\frac{\sigma_{*}\,\phi\!\left(\frac{\mu_{*i}}{\sigma_{*}}\right)}{\Phi\!\left(\frac{\mu_{*i}}{\sigma_{*}}\right)}. \tag{6.3}$$

The individual estimates are then obtained by replacing the true parameters in (6.3) with MLE (or MSLE or GMM) estimates from the SFM.

Another measure of interest is the Afriat-type level of technical efficiency, defined as $e^{-u_i}=Y_i/\left(e^{m(\boldsymbol{x}_i)}e^{v_i}\right)\in[0,1]$. This is useful in cases where output is measured in logarithmic form. Furthermore, technical efficiency is bounded between 0 and 1, making it somewhat easier to interpret relative to a raw inefficiency score. Since $e^{-u_i}$ is not directly observable, the idea of JLMS [42] can be deployed here, and $\mathbb{E}[e^{-u_i}\mid\varepsilon_i]$ can be calculated [12,56].
For the Normal Half Normal model, we have

$$\mathbb{E}[e^{-u_i}\mid\varepsilon_i]=e^{\left(-\mu_{*i}+\frac{1}{2}\sigma_{*}^{2}\right)}\frac{\Phi\!\left(\frac{\mu_{*i}}{\sigma_{*}}-\sigma_{*}\right)}{\Phi\!\left(\frac{\mu_{*i}}{\sigma_{*}}\right)}, \tag{6.4}$$

where $\mu_{*i}$ and $\sigma_{*}$ were defined in (6.1) and (6.2), respectively. Technical efficiency estimates are obtained by replacing the true parameters in (6.4) with MLE estimates from the SFM. When ranking efficiency scores, one should use estimates of $1-\mathbb{E}[u_i\mid\varepsilon_i]$, which is the first-order approximation of (6.4). Similar expressions for the JLMS [42] and Battese and Coelli [12] efficiency scores can be derived under the assumption that $u$ is Exponential ([49], p. 82), Truncated Normal ([49], p. 86), and Gamma ([49], p. 89); see also [52].

An interesting and important finding from [5] and [6] is that when we allow for dependence of the kinds described in Sections 4 and 5, we can potentially improve estimation of inefficiency through the JLMS estimator. We focus on the case of endogeneity (Section 5), but the case of dependence over $t$ in panels (Section 4) is similar. The traditional predictor [42] is $\mathbb{E}(u_i\mid\varepsilon_i)$. However, more information is available when dependence is allowed, namely via $\boldsymbol{\eta}_i$. This calls for a modified JLMS estimator, $\mathbb{E}(u_i\mid\varepsilon_i,\boldsymbol{\eta}_i)$. Note that even though it is assumed that $u_i$ is independent of $\boldsymbol{\eta}_i$, similar to [6], because $\boldsymbol{\eta}_i$ is correlated with $v_i$, there is information that can be used to help predict $u_i$ even after conditioning on $\varepsilon_i$.

Amsler et al. [6] showed that $\boldsymbol{\eta}_i$ is independent of $(u_i,\tilde{\varepsilon}_i)$:

$$\mathbb{E}(u_i\mid\varepsilon_i,\boldsymbol{\eta}_i)=\mathbb{E}(u_i\mid\tilde{\varepsilon}_i,\boldsymbol{\eta}_i)=\mathbb{E}(u_i\mid\tilde{\varepsilon}_i),$$

and that the distribution of $u_i$ conditional on $\tilde{\varepsilon}_i=y_i-\boldsymbol{x}_i'\boldsymbol{\beta}-\mu_{ci}$ is $N_{+}(\mu_{*i},\sigma_{*}^{2})$ with $\mu_{*i}=-\sigma_u^2\tilde{\varepsilon}_i/\sigma^2$ and $\sigma_{*}^{2}=\sigma_u^2\sigma_c^2/\sigma^2$, which is identical to the original JLMS estimator, except that $\sigma_v^2$ is replaced with $\sigma_c^2$ and $\tilde{\varepsilon}_i$ takes the place of $\varepsilon_i$. The modified JLMS estimator in the presence of endogeneity becomes

$$\mathbb{E}(u_i\mid\varepsilon_i,\boldsymbol{\eta}_i)=\sigma_{*}\left(\frac{\phi(\xi_i)}{1-\Phi(\xi_i)}-\xi_i\right)$$

with $\xi_i=\lambda\tilde{\varepsilon}_i/\sigma$. Note that $\mathbb{E}(u_i\mid\varepsilon_i,\boldsymbol{\eta}_i)$ is a better predictor than $\mathbb{E}(u_i\mid\varepsilon_i)$ because $\sigma_c^2<\sigma_v^2$.
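The predictors in (6.3) and (6.4), and the modified predictor under endogeneity, can be sketched in a few lines of Python. The function names are hypothetical, and the inputs (`eps`, `eps_tilde`, and the variance parameters) are assumed to have been computed already from a fitted SFM; in particular, `eps_tilde` and `sigma_c2` are taken as given rather than derived here. Following the text, the modified predictor simply reuses the JLMS formula with $\sigma_c^2$ in place of $\sigma_v^2$ and $\tilde{\varepsilon}_i$ in place of $\varepsilon_i$.

```python
import numpy as np
from scipy.stats import norm


def _conditional_moments(eps, sigma_u2, sigma_v2):
    """Return mu_* and sigma_* from (6.1)-(6.2)."""
    eps = np.asarray(eps, dtype=float)
    sigma2 = sigma_u2 + sigma_v2
    mu_star = -eps * sigma_u2 / sigma2
    sigma_star = np.sqrt(sigma_u2 * sigma_v2 / sigma2)
    return mu_star, sigma_star


def jlms(eps, sigma_u2, sigma_v2):
    """Conditional mean E[u | eps] in (6.3), Normal Half Normal case."""
    mu_star, sigma_star = _conditional_moments(eps, sigma_u2, sigma_v2)
    r = mu_star / sigma_star
    # phi(r) / Phi(r), computed on the log scale to limit underflow.
    ratio = np.exp(norm.logpdf(r) - norm.logcdf(r))
    return mu_star + sigma_star * ratio


def battese_coelli(eps, sigma_u2, sigma_v2):
    """Conditional mean E[exp(-u) | eps] in (6.4)."""
    mu_star, sigma_star = _conditional_moments(eps, sigma_u2, sigma_v2)
    r = mu_star / sigma_star
    log_ratio = norm.logcdf(r - sigma_star) - norm.logcdf(r)
    return np.exp(-mu_star + 0.5 * sigma_star**2 + log_ratio)


def jlms_endogenous(eps_tilde, sigma_u2, sigma_c2):
    """Modified predictor: (6.3) with sigma_c2 and eps_tilde substituted."""
    return jlms(eps_tilde, sigma_u2, sigma_c2)
```

For example, `jlms(residuals, 0.5, 0.2)` would return one inefficiency prediction per residual, and `battese_coelli` the matching technical efficiency scores; the numbers 0.5 and 0.2 are placeholders, not estimates from any data set.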
The improvement in prediction follows from the textbook identity for variances: for any random vector $(X,Z)$, where $X$ and $Z$ are random sub-vectors, we have

$$\mathbb{V}(X)=\underbrace{\mathbb{V}\left[\mathbb{E}(X\mid Z)\right]}_{\text{Explained}}+\underbrace{\mathbb{E}\left(\mathbb{V}[X\mid Z]\right)}_{\text{Unexplained}}.$$

In this case, conditioning on both $\varepsilon_i$ and $\boldsymbol{\eta}_i$ uses a larger conditioning set than conditioning on $\varepsilon_i$ alone, and so it must hold that the unexplained portion for $\mathbb{E}(u_i\mid\varepsilon_i,\boldsymbol{\eta}_i)$ is smaller than that for $\mathbb{E}(u_i\mid\varepsilon_i)$. It then holds that the prediction error of $\mathbb{E}(u_i\mid\varepsilon_i,\boldsymbol{\eta}_i)$ has less variation than that of $\mathbb{E}(u_i\mid\varepsilon_i)$, which is a good thing. A similar result is obtained by [5] in a panel setting, where the new estimator $\mathbb{E}(u_{it}\mid\varepsilon_{i1},\ldots,\varepsilon_{iT})$ dominates the traditional estimator $\mathbb{E}(u_{it}\mid\varepsilon_{it})$ due to dependence over $t$. While it is not obvious at first glance, one benefit of allowing for a richer dependence structure in SFM is that researchers may be able to more accurately predict firm-level inefficiency, though it comes at the expense of having to deal with a more complex model. This improvement in prediction may also be accompanied by narrower prediction intervals; however, this is not known, as Amsler et al. [6] did not study the prediction intervals.

A prediction interval for $\mathbb{E}[u_i\mid\varepsilon_i]$ was first derived by Taube [85] and also appeared in [39], [41], and [17] (see a discussion of this in [77]).
The prediction interval is based on $f(u_i\mid\varepsilon_i)$. The lower ($L_i$) and upper ($U_i$) bounds for a $(1-\alpha)100\%$ prediction interval are

$$L_i=\mu_{*i}+\Phi^{-1}\left(1-\left(1-\frac{\alpha}{2}\right)\left[1-\Phi\left(-\frac{\mu_{*i}}{\sigma_{*}}\right)\right]\right)\sigma_{*}, \tag{6.5}$$

$$U_i=\mu_{*i}+\Phi^{-1}\left(1-\frac{\alpha}{2}\left[1-\Phi\left(-\frac{\mu_{*i}}{\sigma_{*}}\right)\right]\right)\sigma_{*}, \tag{6.6}$$

where $\mu_{*i}$ and $\sigma_{*}$ are defined in (6.1) and (6.2), respectively, and replacing them with their MLE estimates will give estimated prediction intervals for $\mathbb{E}[u_i\mid\varepsilon_i]$. Using the above result from [6] that $u_i$ conditional on $\tilde{\varepsilon}_i$ is $N_{+}(\mu_{*i},\sigma_{*}^{2})$ with $\mu_{*i}=-\sigma_u^2\tilde{\varepsilon}_i/\sigma^2$ and $\sigma_{*}^{2}=\sigma_u^2\sigma_c^2/\sigma^2$, one can easily obtain analogous prediction intervals for $\mathbb{E}(u_i\mid\varepsilon_i,\boldsymbol{\eta}_i)$. These new intervals will potentially be narrower.

7 Conclusion

In this article, we surveyed the workhorse SFM and various recent extensions that permit a wide range of dependence modelling within SFM. We discussed dependencies that arise in panels and that underpin endogeneity in production systems.
Copulas play a key role in these settings because they naturally permit the construction of a likelihood, often based on simulated draws, while preserving the desired Half Normal distribution of technical inefficiency.

While this is not a survey of SFA applications, it is worth pointing out that SFA has become a popular tool to isolate inefficiency in the behavior of economic agents, e.g., banks and non-financial firms. As just a couple of examples, Koetter et al. [46] showed that ignoring inefficiency, i.e., assuming all banks in the US economy are on the frontier, leads to substantial downward bias in the estimates of the banks' market power, as measured by the Lerner index of price mark-ups; Henry et al. [38] applied SFA to obtain a correct estimate of TFP at the country level in a panel of 57 developing economies over the 1970–1998 period. The list of applications is long and growing.

This survey's goal was to provide a comprehensive overview of the state of the art in methods of dependence modeling (either directly with noise, through endogeneity, through sample selection or across time), but it is clear that many important issues still remain and this is an active area of research for the field. We remain excited about the potential developments in this area and the insights that they can provide in applications. At present, little is known about the direction of the impact of unmodelled, or misspecified, dependence on efficiency scores.

Dependence Modeling, de Gruyter

Publisher
de Gruyter
Copyright
© 2022 Mikhail E. Mamonov et al., published by De Gruyter
ISSN
2300-2298
eISSN
2300-2298
DOI
10.1515/demo-2022-0107

Abstract

1IntroductionThe roots of stochastic frontier analysis (SFA) can be traced back to the origins of classical growth accounting [82] and, perhaps, production planning [43]. These fields deal with production relationships, which are usually modeled through production functions or, more generally, tranfsformation functions (e.g., Shephard’s distance functions, directional distance functions, cost functions, etc.). In the classical growth accounting approach, all variation in growth, apart from the variation of inputs, is attributed to the so-called Solow’s residual, which under certain restrictions measures what is referred to as the change in total factor productivity (TFP).Early models assumed that all the decision making units (DMUs) represented in the data as observations (e.g., firms, countries, etc.) were independent of one another and fully efficient. This ignored the numerous inefficiencies that arise in practice, which have arbitrary dependence and can arise due to such factors as simultaneity in DMU decisions, unobserved heterogeneity, common sources of information asymmetry and other market imperfections [84], managerial practices [18], cultural beliefs [34], traditions, expectations, and other unobserved factors inducing unaccounted dependence in models of production [16].A key feature of several recent developments in SFA is the construction of a statistical model with as few restrictions on its dependence properties as possible. This implicitly recognizes that various forms of dependence are empirical questions that can and should be statistically tested against the data. 
Modern implements of SFA provide a framework where shortfalls from the production potential are decomposed into two terms – statistical noise and inefficiency, both of which are unobserved by a researcher but can be estimated for the sample as a whole (e.g., representing an industry) or for each individual DMU under a variety of possible dependence scenarios.The extensions of SFA allowing for dependence that has until recently been ignored or overly restrictive are the focus of this review. To a large extent, they are triggered by the fact that restricting the nature of dependence of the composed error can lead to severe biases in estimators and incorrect inference. For example, within the confines of the traditional SFA approach one can test for the presence of either inefficiency or noise [25,73]. Thus, the model encompasses the classical approach with a naive assumption of full efficiency (conditional mean) and the deterministic production frontier as special cases. However, such estimators and tests themselves have been derived under the assumption of independence between inefficiency and noise; empirical results suggest that allowing for dependence changes the estimates and tests significantly, potentially distorting the conclusions of the classical models.Thus, SFA under dependence is a natural relaxation of the extreme assumptions of full efficiency and independence, yet it also encompasses them as special cases, which can still be followed if the data and the statistical tests do not recommend otherwise. If there is evidence in favor of the full efficiency hypothesis after allowing for dependence, one can proceed with regression techniques or growth accounting, but this inference would now be robust to these assumptions.Accounting for dependence within a production model could be critical for both quantitative and qualitative conclusions and, perhaps more importantly, for the resulting policy implications. 
For example, El Mehdi and Hafner [28] found that estimated technical efficiency scores across the financing of Moroccan rural districts allowing for dependence tend to be lower than under the assumption of independence but the rankings remained basically the same. Thus, a key difference emerges if one is looking to identify the best versus measure how much improvement can be made.While some of the methods and models we present here can also be found in previous reviews, e.g., [67] and [9], and it is impossible to give a good review without following them to some degree, here we also summarize many of (what we believe to be) key recent developments as well as (with their help) shed some novel perspectives onto the workhorse methods. We do not claim, however, that this survey comprehensively covers allallof the relevant recent developments in modelling dependence in SFA. Many other important references can be found elsewhere.The rest of the article is structured as follows. Section 2 introduces the classical cross-sectional stochastic frontier models (SFMs) and focuses on dependence between error components in such models. Section 3 considers dependence via sample selection. Section 4 surveys dependence models used in panels. Section 5 discusses dependence that underlies endogeneity in SFM, which is a situation when there is dependence between production inputs and error terms. Section 6 discusses how dependence can help obtain more precise estimates of inefficiency. Section 7 concludes.2The benchmark SFM and dependence within the composed errorIn cross-sectional settings, one of the main approaches to study productivity and efficiency of firms is the SFM, independently proposed by Aigner et al. [3] and Meeusen and van den Broeck [61].[15] and [62], while appearing in the same year, are applications of the methods.Using conventional notation, let Yi{Y}_{i}be the single output for observation (e.g., firm) iiand let yi=ln(Yi){y}_{i}=\mathrm{ln}\left({Y}_{i}). 
The cross-sectional SFM can be written for a production frontier as (2.1)yi=m(xi;β)−ui+vi=m(xi;β)+εi.{y}_{i}=m\left({{\boldsymbol{x}}}_{i};\hspace{0.33em}{\boldsymbol{\beta }})-{u}_{i}+{v}_{i}=m\left({{\boldsymbol{x}}}_{i};\hspace{0.33em}{\boldsymbol{\beta }})+{\varepsilon }_{i}.Here m(xi;β)m\left({{\boldsymbol{x}}}_{i};\hspace{0.33em}{\boldsymbol{\beta }})represents the production frontier of a firm (or more generally a DMU), with given input vector xi{{\boldsymbol{x}}}_{i}. Observations indexed by i=1,…,Ni=1,\ldots ,Nare assumed to be independent and identically distributed. Our use of β{\boldsymbol{\beta }}is to clearly signify that we are parametrically specifying our production function, most commonly as a Cobb-Douglas or translog (see, e.g., [67] or [68] for a detailed treatment on nonparametric estimation of the SFM).2.1Canonical independence frameworkThe main difference between a standard production function setup and the SFM is the presence of two distinct error terms in the model. The ui{u}_{i}term captures inefficiency, shortfall from maximal output dictated by the production technology, while the vi{v}_{i}term captures stochastic shocks. The standard neoclassical production function model assumes full efficiency – so SFA embraces it as a special case, when ui=0{u}_{i}=0, ∀i\forall i, and allows the researcher to test this statistically.Prior to the development of the SFM, approaches which intended to model inefficiency typically ignored vi{v}_{i}leading to estimators of the SFM with less desirable statistical properties: see [1,2,27,71,72,86].It is commonly assumed that inputs are exogenous, in the sense that x{\boldsymbol{x}}is independent of uuand vv, x⊥(u,v){\boldsymbol{x}}\perp \left(u,v), and the two components of the error term are independent, u⊥vu\perp v.Many estimation methods require distributional assumptions for both uuand vv(beyond the assumption of independence). 
For an assumed distributional pair, one can obtain the implied distribution for εi{\varepsilon }_{i}and then estimate all of the parameters of the SFM with the maximum likelihood estimator (MLE). The most common assumption is that vi∼N(0,σv2){v}_{i}\hspace{0.33em} \sim \hspace{0.33em}N\left(0,{\sigma }_{v}^{2})and ui{u}_{i}is from a Half Normal distribution, N+(0,σu2){N}_{+}\left(0,{\sigma }_{u}^{2}), or ui{u}_{i}is from an Exponential distribution with parameter σu{\sigma }_{u}.The most popular case for the density of the composed error ε\varepsilon is obtained for the Normal Half Normal specification under independence u⊥vu\perp v. According to Aigner et al. [3], the distribution function of a sum of a normal and Truncated Normal was first derived by Weinstein [94]. Let fv{f}_{v}and fu{f}_{u}denote the density of vvand uu, respectively. For the Normal Half Normal case, fv(v)=ϕvσv{f}_{v}\left(v)=\phi \left(\frac{v}{{\sigma }_{v}}\right)and fu(u)=2σuϕuσu{f}_{u}\left(u)=\frac{2}{{\sigma }_{u}}\phi \left(\frac{u}{{\sigma }_{u}}\right), where ϕ(⋅)\phi \left(\cdot )is the standard Normal probability density function (pdf). The closed form expression for the pdf can be obtained by convolution as follows: (2.2)f(ε)=∫0∞fv(ε+u)fu(u)du=2σϕεσΦ−ελσ,f\left(\varepsilon )=\underset{0}{\overset{\infty }{\int }}{f}_{v}\left(\varepsilon +u){f}_{u}\left(u){\rm{d}}u=\frac{2}{\sigma }\phi \left(\frac{\varepsilon }{\sigma }\right)\Phi \left(-\frac{\varepsilon \lambda }{\sigma }\right),where Φ(⋅)\Phi \left(\cdot )is the standard normal cumulative distribution function (cdf), with the parameterization σ=σu2+σv2\sigma =\sqrt{{\sigma }_{u}^{2}+{\sigma }_{v}^{2}}and λ=σu/σv\lambda ={\sigma }_{u}\hspace{0.1em}\text{/}\hspace{0.1em}{\sigma }_{v}. λ\lambda is commonly interpreted as the proportion of variation in ε\varepsilon due to inefficiency. 
The density of ε\varepsilon in (2.2) can be characterized as that of a Skew Normal random variable with location parameter 0, scale parameter σ\sigma , and skew parameter −λ-\lambda .The pdf of a Skew Normal random variable xxis f(x)=2ϕ(x)Φ(αx)f\left(x)=2\phi \left(x)\Phi \left(\alpha x). The distribution is right skewed if α>0\alpha \gt 0and is left skewed if α<0\alpha \lt 0. We can also place the Normal, Truncated Normal pair of distributional assumptions in this class. The pdf of xxwith location ξ\xi , scale ω\omega , and skew parameter α\alpha is f(x)=2ωϕx−ξωΦαx−ξωf\left(x)=\frac{2}{\omega }\phi \left(\frac{x-\xi }{\omega }\right)\Phi \left(\alpha \left(\frac{x-\xi }{\omega }\right)\right). See [10] and [65] for more details.This connection has only recently appeared in the efficiency and productivity literature [24].It is worth noting that the closed form expression in (2.2) is equivalent to f(ε)=Eufv(ε+u)=Evfu(v−ε),f\left(\varepsilon )={{\mathbb{E}}}_{u}{f}_{v}\left(\varepsilon +u)={{\mathbb{E}}}_{v}{f}_{u}\left(v-\varepsilon ),where expectations are taken with respect to the relevant distribution. This suggests an alternative, simulation-based, way to construct the density by sampling from the distribution of uuor vvand evaluating the corresponding sample averages. Among the two sampling options (from the distribution of uuor from the distribution of vv), sampling the uu’s is more practical as it avoids to need to ensure that v−ε>0v-\varepsilon \gt 0. 
Sampling uucan be easily done by sampling from the standard normal distribution and taking the absolute values σu∣N(0,1)∣{\sigma }_{u}| N\left(0,1)| (in the case of the Half Normal distribution).With modern statistical software it is straightforward to sample from a wide swath of one-sided distributions that have been suggested in the SFA literature: Exponential, Gamma, Truncated Normal, Weibull, Beta, Uniform, Binomial, Generalized Exponential, etc.Our mathematical formulation will focus on a production frontier as it is the most popular object of study. The framework for dual characterizations (e.g., cost, revenue, profit) or other frontiers is similar and follows with only minor changes in notation. For example, the cost function formulation is obtained by changing the sign in front of uuto a “++,” which will represent excess, rather than shortfall, of cost above the minimum level.2.2Modeling dependenceSmith [81] relaxed the assumption of independence between uuand vvby introducing a copula function to model their joint distribution. This is one of the first relaxations of the independence assumptions available for SFA and it allowed testing the adequacy of this assumption. If the marginal distributions of uuand vvare linked by a copula density c(⋅,⋅)c\left(\cdot ,\cdot ), then their joint density can be expressed as follows: (2.3)f(v,u)=fv(v)fu(u)c(Fv(v),Fu(u)),f\left(v,u)={f}_{v}\left(v){f}_{u}\left(u)c\left({F}_{v}\left(v),{F}_{u}\left(u)),where Fu{F}_{u}and Fv{F}_{v}denote the respective cdfs. 
It then follows by a similar construction to (2.2) that the density of ε\varepsilon can be written as f(ε)=∫0∞fv(ε+u)fu(u)c(Fv(ε+u),Fu(u))du.f\left(\varepsilon )=\underset{0}{\overset{\infty }{\int }}{f}_{v}\left(\varepsilon +u){f}_{u}\left(u)c\left({F}_{v}\left(\varepsilon +u),{F}_{u}\left(u)){\rm{d}}u.For commonly used copula families, this density does not have a close form expression similar to (2.2), even in the Normal Half Normal case, so a simulation-based approach would often need to be used, where we simulate many draws of uuand evaluate the sample analogue of the following expectation with respect to the distribution of uu: f(ε)=Eu[fv(ε+u)fu(u)c(Fv(ε+u),Fu(u))].f\left(\varepsilon )={{\mathbb{E}}}_{u}[{f}_{v}\left(\varepsilon +u){f}_{u}\left(u)c\left({F}_{v}\left(\varepsilon +u),{F}_{u}\left(u))].Smith [81] found that ignoring the dependence can lead to biased estimates and discussed how one can test whether the independence assumption nested within this model is adequate. It is easy to see that the model in (2.2) is a special case of (2.3) when c(⋅,⋅)c\left(\cdot ,\cdot )is the independence (or product) copula.From f(ε)f\left(\varepsilon ), along with the assumption of independence over ii, the log-likelihood function can be written as follows: (2.4)lnℒ=ln∏i=1nf(εi)=∑i=1nlnf(εi),\mathrm{ln}{\mathcal{ {\mathcal L} }}=\mathrm{ln}\left(\mathop{\prod }\limits_{i=1}^{n}f\left({\varepsilon }_{i})\right)=\mathop{\sum }\limits_{i=1}^{n}\mathrm{ln}f\left({\varepsilon }_{i}),where εi=yi−m(xi;β){\varepsilon }_{i}={y}_{i}-m\left({{\boldsymbol{x}}}_{i};\hspace{0.33em}{\boldsymbol{\beta }}). 
The SFM can be estimated using the traditional MLE, if an analytic expression for the integrals is available, or the maximum simulated likelihood estimator (MSLE), if we need to use a simulation-based approach to evaluate the integrals of the density [60]. Burns [21], who was Smith’s student, proposed using MSLE; although that work predates Smith [81], it attributes the approach to a working paper version of that article.

The benefit of MLE/MSLE is that, under the assumption of correct distributional specification of $\varepsilon$, the MLE is asymptotically efficient (i.e., consistent, asymptotically Normal, and its asymptotic variance reaches the Cramér–Rao lower bound). A further benefit is that a range of testing options are available. For instance, tests related to $\boldsymbol{\beta}$ can easily be undertaken using any of the classic trilogy of tests: Wald, Lagrange multiplier, or likelihood ratio. The ability to readily and directly conduct asymptotic inference is one of the major benefits of SFA over data envelopment analysis (DEA). This in no way suggests that inference cannot be undertaken when the DEA estimator is deployed; rather, the DEA estimator has an asymptotic distribution that is much more complicated than that of the MLE for the SFM, so direct asymptotic inference is not available and bootstrapping techniques are required for many of the most popular DEA estimators [78,79].

Two main issues that practitioners face when confronting dependence are the choice of copula model and the assumed error distributions that best fit the data. As Wiboonpongse et al. ([95], p. 34) note, “The impact of the independence assumption on technical efficiency estimation has long remained an open issue.” Information criteria such as AIC or BIC can be used for these purposes; see both [9] and [95] for detailed reviews.

More specifically, Wiboonpongse et al.
[95] use MSLE and systematically consider several copula families, including the Student-t, Clayton, Gumbel, and Joe families as well as their relevant rotated versions. Their data are a cross section of 111 observations from coffee production in Thailand, and they use AIC and BIC to determine which copula model is most appropriate. Wiboonpongse et al. [95] assume that the marginals of the two error components are Normal and Half Normal and then apply a range of copulas to inject dependence. The Clayton copula is found to fit best, and plots of technical efficiency (TE) across the 111 farmers show that the best-fitting copula model yields almost uniformly lower TE scores than the independence model, although the differences are small. Finally, it also appears that the ranks are preserved (see their Figure 3).

An unintended benefit of modelling dependence is that it may alleviate the “wrong skewness” issue that is common in the canonical Normal Half Normal SFM [77,89]. The wrong skewness issue arises when the OLS residuals display skewness that is of the wrong sign compared to that stemming from the frontier model (so positive when estimating a production frontier). For specific distributional pairs the model cannot separately identify the variance parameters for both $v$ and $u$. It was noted in [19] that the third central moment of the composed error is

$$E[(\varepsilon-E[\varepsilon])^3]=E[(v-E[v])^3]-E[(u-E[u])^3]+3\,\mathrm{Cov}(u^2,v)-3\,\mathrm{Cov}(u,v^2)-6(E[u]-E[v])\,\mathrm{Cov}(u,v). \tag{2.5}$$

It is clear that the skewness of $\varepsilon$ depends only on the skewness of $u$ when $v$ is assumed to be symmetric and independent of $u$. Once $u$ and $v$ are allowed to be dependent and/or $v$ is allowed to be asymmetric, the skewness of the composed error does not have to align with the skewness of inefficiency.
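The decomposition in (2.5) is an algebraic identity for $\varepsilon=v-u$, so it can be checked on simulated data. The sketch below is illustrative only: it assumes Normal and Half Normal marginals tied together by a Gaussian copula (one convenient choice, not the one favored in the studies above), and the helper names are hypothetical.

```python
import numpy as np
from scipy.stats import norm, halfnorm


def draw_dependent_errors(n, sigma_v, sigma_u, rho, seed=0):
    """Draw (v, u): v ~ N(0, sigma_v^2), u ~ Half Normal, linked by a Gaussian copula."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    v = sigma_v * z[:, 0]
    u = halfnorm.ppf(norm.cdf(z[:, 1]), scale=sigma_u)   # probability integral transform
    return v, u


def cov(a, b):
    """Sample covariance with divisor n, matching the sample moments used below."""
    return np.mean((a - a.mean()) * (b - b.mean()))


def third_moment_terms(v, u):
    """Return both sides of (2.5) evaluated with sample moments."""
    eps = v - u
    lhs = np.mean((eps - eps.mean()) ** 3)
    rhs = (np.mean((v - v.mean()) ** 3) - np.mean((u - u.mean()) ** 3)
           + 3.0 * cov(u**2, v) - 3.0 * cov(u, v**2)
           - 6.0 * (u.mean() - v.mean()) * cov(u, v))
    return lhs, rhs
```

Calling `third_moment_terms(*draw_dependent_errors(200_000, 0.5, 1.0, rho=0.6))` returns two numbers that coincide up to floating-point rounding. Varying `rho` shows how the covariance terms change the third moment relative to the independence case, in which only the skewness of $u$ contributes.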
Thus, modelling dependence is one way in which some of the empirical vagaries of the SFM can be overcome [93].

2.2.1 Asymmetric dependence

A common feature of all of the papers that have allowed dependence in the SFM is the use of copulas that introduce symmetric dependence. Symmetric dependence assumes that the noise, $v$, and inefficiency, $u$, components are treated equally in the SFM. However, a recent suggestion by Wei et al. [91] offers a set of copulas that allow for asymmetric dependence. As Wei et al. ([91], p. 57) note, “[…]in practical situations, the inefficiency component $u$ and the error component often play different roles in global inefficiency, and in such cases, the symmetric copulas are not suitable.” They define asymmetric copulas as those that have non-exchangeable and/or radial asymmetric properties [92].

Wei et al. [91] introduced the Skew Normal copula and used it to construct their SFM with dependence. An interesting feature of their general setup is that they allow both $v$ and $u$ to be asymmetric along with an asymmetric copula (see their Proposition 3.1). As in [95], Wei et al. [91] recommended selecting the copula model based on AIC/BIC. In their empirical application 31 out of 108 farms have the same efficiency rank (the bottom 5 are in complete agreement as are 4 of the top 5) across the standard SFM and the asymmetric copula SFM. The point estimates of technical efficiency, however, show large differences between the two competing models, which again provides evidence that ignoring dependence can have an undue influence on the point estimates of technical inefficiency.

3 Dependence via sample selection

Another way in which dependence can arise is through sample selection. By itself sample selection has only recently been a serious area of focus in the stochastic frontier literature. Several early approaches to deal with potential sample selection follow the two-step correction [37].
In the first stage, the probability of selection is estimated and the inverse Mills ratio is calculated for each observation. This estimated inverse Mills ratio is then included as a regressor in the final SFM. An example of this is the Finnish farming study [80]. This limited information two-step approach works in a standard linear regression framework because of linearity, which Greene [33] makes clear. However, as shown in [51], when inefficiency is present no two-step approach will work and full information maximum likelihood estimation is required.

Recognizing the limitations of direct application of the two-stage approach, both Kumbhakar et al. [51] and Greene [33] proposed alternative stochastic frontier selection models. The two approaches differ in how selection arises in the model. The Greene [33] model allows the choice of technology to be influenced by correlation between random error in the selection and frontier models, whereas Kumbhakar et al. [51] constructed a model where the choice of technology is based on some aspect of inefficiency, inducing a different form of sample selection. Beyond the difference in how selection arises, the sample selection stochastic production frontier models [51] and [33] are identical.

Sriboonshitta et al. [83] were the first to recognize that dependence could enter the sample selection model. They work with the Greene [33] stochastic frontier sample selection model and admit dependence into the composite error term. This is termed a double-copula model because there is a copula in the sample selection equation and a copula in the SFM. See ([83], equation (20)) for the likelihood function.

Beyond a small set of simulations, Sriboonshitta et al. [83] applied the double-copula sample selection SFM to 200 rice farmers from Kamphaeng Phet province, Thailand, in 2012 using a Cobb-Douglas production frontier and considered eight different copula functions (see their Table 4).
Their preferred model based on the AIC is a Gaussian copula with a 270-degree rotated Clayton copula. They find a substantial difference in estimated TE scores between the Greene [33] model, which assumes independence, and their double-copula model (see their Figure 5). As Sriboonshitta et al. ([83], p. 183) note, “[…]improperly assuming independence between the two components of the error term in the SFM may result in biased estimates of technical efficiency scores, hence potentially leading to wrong conclusions and recommendations.”

As a further extension of [83], Liu et al. [59] noted that “this double-copula model neglects the correlation between the unobservables in the selection model and the random error in the SFM, in contrast to Greene’s model.” Liu et al. [59] generalized the Greene [33] model by modeling the dependence between the unobservables in the selection equation and the two error terms in the production equation using a trivariate Gaussian copula. The key feature is that the trivariate and double-copula models rely on different assumptions concerning the joint distribution of $v$, $u$, and $\xi$ ($\xi$ here is the error in the selection equation). Liu et al. [59] noted the decomposition

$$f(v,u,\xi)=f(v)\,f(u\mid v)\,f(\xi\mid v,u), \tag{3.1}$$

and observed that the double-copula model assumes that $f(\xi\mid v,u)=f(\xi\mid v-u)$, i.e., the distribution of $\xi$ depends only on the composite error, not on its individual pieces. This also implies that the double-copula model and the trivariate copula model are nonnested.

Liu et al. [59] provided an application that focuses on Jasmine/non-Jasmine rice farming in Thailand. The data suggest $u$ is Gamma distributed for the most preferred model. As with some of the earlier papers, Liu et al. ([59], p. 193) noted that “[…]both Greene’s model and the double-copula model appear to overestimate technical efficiency.
According to the [trivariate Gaussian copula] model, farmers also exhibit a wider range of production technical efficiency in Jasmine rice farming […].”

4 Dependence in panel SFM

When repeated observations of the firms are available, then we can allow for richer models that incorporate unobserved components and various other dependence structures. Most importantly, we can extract information about likely time trends in inefficiency and time constant firm-specific characteristics. Pitt and Lee [69] seem to be the first to extend the cross-sectional SFM to a panel structure, and Schmidt and Sickles [76] were the first to propose a panel-specific methodology for SFA.

4.1 A benchmark specification

The benchmark panel SFM can be written as follows:

$$y_{it}=m(\boldsymbol{x}_{it};\boldsymbol{\beta})+c_i-\eta_i-u_{it}+v_{it}=m(\boldsymbol{x}_{it};\boldsymbol{\beta})-\alpha_i+\varepsilon_{it}. \tag{4.1}$$

This model differs from (2.1) in many ways. All observed variables and error terms inherited from (2.1) now have a double-index for both firms, $i$, and time, $t=1,\ldots,T$. In addition, the model contains the so-called firm-specific heterogeneity, $c_i$, and the time-invariant component of inefficiency, $\eta_i$. Compared with (2.1), $c_i$ encapsulates any unobserved factors that affect output (other than inputs) without changing over time such as unmeasured management or operational specifics of the firm. If such factors are present, the dependence between $c_i$ and $\boldsymbol{x}_i$ causes omitted variable biases and invalidates inference based on cross-sectional SFM.
In panel models, when ignored, such factors can serve as common sources of dependence in the error term $\varepsilon_{it}$, which also leads to invalid inference.

Another distinguishing feature of (4.1) is the presence of $\eta_i$, a component of inefficiency which is time-invariant. This means that inefficiency is composed of both time-invariant and time-variant components, which are sometimes interpreted as long-run and short-run inefficiency. Since both $c_i$ and $\eta_i$ are unobserved, it will generally be difficult to decompose $\alpha_i$ into its constituent firm-specific heterogeneity and time-invariant inefficiency components.

Classical panel methods (i.e., methods that assume that $\eta_i$ and $u_{it}$ do not exist) allow for various forms of dependence between $c_i$ and $\boldsymbol{x}_{it}$. For example, estimation under the fixed effects (FE) framework allows $\boldsymbol{x}_{it}$ to be correlated with $c_i$ and uses a within transformation to obtain a consistent estimator of $\boldsymbol{\beta}$. Alternatively, estimation in the random effects (RE) framework assumes that $\boldsymbol{x}_{it}$ and $c_i$ are independent and uses OLS or GLS. The difference between OLS and GLS arises because the variance-covariance matrix of the composed error term $c_i+v_{it}$ is no longer diagonal, so feasible GLS is asymptotically efficient.

The early work on panel SFM assumed inefficiency to be time-invariant. This allowed handling dependence within panels using classical panel methods such as FE and RE estimation [76].
The standard time-invariant SFM is

$$y_{it}=\beta_0+\boldsymbol{x}_{it}'\boldsymbol{\beta}+v_{it}-\eta_i=(\beta_0-\eta_i)+\boldsymbol{x}_{it}'\boldsymbol{\beta}+v_{it}=c_i+\boldsymbol{x}_{it}'\boldsymbol{\beta}+v_{it}, \tag{4.2}$$

where $c_i\equiv\beta_0-\eta_i$. Under the FE framework, the serially independent inefficiency $\eta_i$, $i=1,\ldots,n$, is allowed to have arbitrary dependence with $\boldsymbol{x}_{it}$. In cases in which there are time-invariant variables of interest in the production model, one can use the RE framework, which also requires no distributional assumptions on $v$ and $\eta$ and can be estimated with OLS or GLS. Alternatively, in such cases, one can rely on distributional assumptions as in [69], where $v_{it}$ is assumed to follow a Normal distribution and $\eta_i$ a Half Normal distribution.

Table 1 contains a summary of the classical SFMs allowing for specific forms of serial dependence in $u_{it}$. It also lists any additional dependence structures permitted in these different models such as dependence between $c_i$ and $\boldsymbol{x}_{it}$. See [67] and [50] for a detailed discussion of these methods.

Table 1: Selection of panel data methods

| Paper | Serial dependence in inefficiency | Other dependence allowed |
|---|---|---|
| Schmidt and Sickles [76] | None | Between $\eta_i$ and $\boldsymbol{x}_{it}$ |
| Cornwell et al. [26] | $u_{it}=c_{0i}+c_{1i}t+c_{2i}t^2$ | Between $\eta_i$ and $\boldsymbol{x}_{it}$ |
| Kumbhakar [47] | $u_{it}=[1+\exp(\gamma_1 t+\gamma_2 t^2)]^{-1}u_i$ | None |
| Battese and Coelli [13] | $u_{it}=\exp[\gamma(t-T)]G(t)u_i$ | None |
| Lee and Schmidt [57] | $u_{it}=\eta_i l_t$ | Between $\eta_i$ and $\boldsymbol{x}_{it}$ |
| Battese and Coelli [14] | None | $u_{it}\sim N_+(\boldsymbol{z}_{it}'\boldsymbol{\delta},\sigma_u^2)$ |
| Kumbhakar and Heshmati [48] | $u_{it}=\eta_i+\tau_{it}$ | None |
| Greene [31] | None | Between $c_i$ and $\boldsymbol{x}_{it}$ |
| Greene [32] | None | Between $c_i$ and $\boldsymbol{x}_{it}$; $u_{it}\sim\lvert N(\boldsymbol{z}_{it}'\boldsymbol{\delta},\sigma_u^2)\rvert$ |
| Wang and Ho [90] | None | Between $c_i$ and $\boldsymbol{x}_{it}$; $u_{it}=\exp(\boldsymbol{z}_i'\boldsymbol{\delta})u_i^{\ast}$, $u_i^{\ast}\sim N_+(0,\sigma_u^2)$ |
| Badunenko and Kumbhakar [11] | $u_{it}=\eta_i+\tau_{it}$ | $\eta_i\sim N_+(0,\sigma_{\eta i}^2)$, $\sigma_{\eta i}=\sigma_\eta\exp(\boldsymbol{z}_{0i}'\boldsymbol{\delta}_0)$; $\tau_{it}\sim N_+(0,\sigma_{\tau it}^2)$, $\sigma_{\tau it}=\sigma_\tau\exp(\boldsymbol{z}_{it}'\boldsymbol{\delta})$ |

4.2 Quasi MLE

If there is no $(c_i,\eta_i)$ or if $(c_i,\eta_i)$ is assumed to be part of $u_{it}$ or $v_{it}$, which are independent of $\boldsymbol{x}$, then we can view the panel model as a special
case of the cross-sectional model (2.1), only with the double-index $it$. The MLE method described in Section 2 applies in this case but it uses the sample likelihood obtained leveraging the assumption of independence over both $i$ and $t$, not just $i$. Because independence over $t$ is questionable in panels, this version of MLE is often referred to as quasi-MLE (QMLE).

Let $\boldsymbol{\theta}=(\boldsymbol{\beta}',\sigma^2,\lambda)'$ and let $f_{it}$ denote the density of the composed error term evaluated at $\varepsilon_{it}=v_{it}-u_{it}$. Then,

$$f_{it}(\boldsymbol{\theta})=f(\varepsilon_{it})=f(y_{it}-\boldsymbol{x}_{it}'\boldsymbol{\beta}) \tag{4.3}$$

and the QMLE of $\boldsymbol{\theta}$ can be written as

$$\hat{\boldsymbol{\theta}}_{\text{QMLE}}=\arg\max_{\boldsymbol{\theta}}\sum_i\sum_t\ln f_{it}(\boldsymbol{\theta}). \tag{4.4}$$

The QMLE is known to be consistent even if there is no independence over $t$ but to obtain the correct standard errors one needs to use the so-called “sandwich,” or misspecification-robust, estimator of the QMLE asymptotic variance matrix. QMLE is known to be dominated in terms of precision by several other estimators that use the dependence information explicitly (and correctly). However, an appeal of the QMLE in this setting is that assuming independence is more innocuous in the sense that it does not lead to estimation bias, only to a lack of precision, when compared to a misspecification of the type of dependence that can lead to distinct biases.

Amsler et al. [5] proposed several estimators that model time dependence in panels. One such estimator can be obtained in the Generalized Method of Moments (GMM) framework.
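The QMLE in (4.4) and its sandwich variance can be sketched as follows. This is a rough illustration under stated assumptions, not the procedure of [5]: the frontier is linear, the marginals are Normal and Half Normal, the parameter vector is $(\boldsymbol{\beta},\ln\sigma_v,\ln\sigma_u)$ instead of $(\boldsymbol{\beta}',\sigma^2,\lambda)'$ so that the optimizer is unconstrained, scores and the Hessian are obtained by finite differences, and the function names are hypothetical. The "meat" of the sandwich sums the scores over $t$ within each firm, which is what makes the variance robust to serial dependence.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def log_density(theta, y, X):
    """ln f_it(theta) for the Normal-Half Normal composed error, one value per row.

    theta = (beta, ln sigma_v, ln sigma_u); the log scale is an illustrative choice.
    """
    k = X.shape[1]
    beta, sigma_v, sigma_u = theta[:k], np.exp(theta[k]), np.exp(theta[k + 1])
    sigma, lam = np.hypot(sigma_v, sigma_u), sigma_u / sigma_v
    eps = y - X @ beta
    return np.log(2.0 / sigma) + norm.logpdf(eps / sigma) + norm.logcdf(-lam * eps / sigma)


def scores(theta, y, X, h=1e-5):
    """Central-difference approximation to the scores s_it(theta), shape (N, p)."""
    out = np.empty((y.size, theta.size))
    for j in range(theta.size):
        step = np.zeros(theta.size)
        step[j] = h
        out[:, j] = (log_density(theta + step, y, X) - log_density(theta - step, y, X)) / (2 * h)
    return out


def qmle_sandwich(y, X, firm, theta0):
    """Pooled QMLE of theta with a firm-clustered sandwich variance estimate."""
    fit = minimize(lambda th: -log_density(th, y, X).sum(), theta0, method="BFGS")
    theta, p = fit.x, theta0.size
    # Bread: numerical Hessian of the pooled log-likelihood.
    H, h = np.empty((p, p)), 1e-4
    for j in range(p):
        step = np.zeros(p)
        step[j] = h
        H[:, j] = (scores(theta + step, y, X).sum(0) - scores(theta - step, y, X).sum(0)) / (2 * h)
    H = 0.5 * (H + H.T)
    # Meat: scores summed over t within each firm, then outer products over firms.
    _, idx = np.unique(firm, return_inverse=True)
    G = np.zeros((idx.max() + 1, p))
    np.add.at(G, idx, scores(theta, y, X))
    bread = np.linalg.inv(H)
    return theta, bread @ (G.T @ G) @ bread
```

Here `y` and `X` are stacked over firms and periods, and `firm` is an integer label per row. Starting values taken from OLS (coefficients plus the log of the residual standard deviation for both scale parameters) are usually adequate for a sketch like this.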
Let $s_{it}(\boldsymbol{\theta})$ denote the score of the density function $f_{it}(\boldsymbol{\theta})$, i.e.,

$$s_{it}(\boldsymbol{\theta})=\nabla_\theta\ln f_{it}(\boldsymbol{\theta}), \tag{4.5}$$

where $\nabla_\theta$ denotes the gradient with respect to $\boldsymbol{\theta}$. Then, the QMLE of $\boldsymbol{\theta}$ solves $\sum_i\sum_t s_{it}(\hat{\boldsymbol{\theta}}_{\text{QMLE}})=0$ and is identical to the GMM estimator based on the moment condition

$$\mathbb{E}\sum_t s_{it}(\boldsymbol{\theta})=0, \tag{4.6}$$

where expectation is with respect to the distribution of $\varepsilon$. However, under time dependence, summation (over $t$) of the scores in (4.6) is not the optimal weighting. The theory of optimal GMM suggests using the correlation of $s_{it}$ over $t$ by applying the GMM machinery to the $T$ score functions written as follows:

$$\mathbb{E}\begin{bmatrix}s_{i1}(\boldsymbol{\theta})\\ \vdots\\ s_{iT}(\boldsymbol{\theta})\end{bmatrix}=0.$$

The optimal GMM estimator based on these moment conditions has the smallest asymptotic variance among all estimators that use these moment conditions. In a classical (non-SFA) panel data setting, Prokhorov and Schmidt [70] call this estimator Improved QMLE (IQMLE).

4.3 Using a Copula

Alternative estimators that allow explicit modelling of dependence between cross-sectional errors over $t$ have to construct a joint distribution of those errors. Amsler et al. [5] offered two ways of doing so. One is to apply a copula to form $f_{\boldsymbol{\varepsilon}}$, the joint (over $t$) density of the composed errors $\boldsymbol{\varepsilon}_i=(\varepsilon_{i1},\ldots,\varepsilon_{iT})$.
The other is to use a copula to form $f_{\mathbf{u}}$, the joint distribution of $\boldsymbol{u}_i=(u_{i1},\ldots,u_{iT})$.

Given the Normal/Half Normal marginals of the $\varepsilon$’s in (2.2) and a copula density $c(\cdot,\ldots,\cdot)$, the joint density $f_{\boldsymbol{\varepsilon}}$ can be written as follows:

$$f_{\boldsymbol{\varepsilon}}(\boldsymbol{\varepsilon}_i;\boldsymbol{\theta})=c(F_{i1}(\boldsymbol{\theta}),\ldots,F_{iT}(\boldsymbol{\theta}))\cdot f_{i1}(\boldsymbol{\theta})\cdots f_{iT}(\boldsymbol{\theta}),$$

where, as before, $f_{it}(\boldsymbol{\theta})\equiv f(\varepsilon_{it})=f(y_{it}-\boldsymbol{x}_{it}'\boldsymbol{\beta})$ is the pdf of the composed error term evaluated at $\varepsilon_{it}$ and $F_{it}(\boldsymbol{\theta})\equiv\int_{-\infty}^{\varepsilon_{it}}f(s)\,\mathrm{d}s$ is the corresponding cdf. Once the joint density is obtained we can construct a log-likelihood and run MLE.
If we let the copula density have a scalar parameter $\rho$, then the sample log-likelihood can be written as follows:

$$\ln\mathcal{L}(\boldsymbol{\theta},\rho)=\sum_{i=1}^{n}\left(\ln c(F_{i1}(\boldsymbol{\theta}),\ldots,F_{iT}(\boldsymbol{\theta});\rho)+\ln f_{i1}(\boldsymbol{\theta})+\ldots+\ln f_{iT}(\boldsymbol{\theta})\right). \tag{4.7}$$

The first term in the summation is what distinguishes this likelihood from QMLE: it explicitly models the dependence between the composed errors at different $t$.

In a GMM framework, the MLE that maximizes (4.7) is identical to the GMM estimator based on the moment conditions

$$\mathbb{E}\begin{bmatrix}\nabla_\theta\ln c_i(\boldsymbol{\theta},\rho)+\nabla_\theta\ln f_{i1}(\boldsymbol{\theta})+\ldots+\nabla_\theta\ln f_{iT}(\boldsymbol{\theta})\\ \nabla_\rho\ln c_i(\boldsymbol{\theta},\rho)\end{bmatrix}=0, \tag{4.8}$$

where $c_i(\boldsymbol{\theta},\rho)=c(F_{i1}(\boldsymbol{\theta}),\ldots,F_{iT}(\boldsymbol{\theta});\rho)$.
Again, efficiency improvement is, in some circumstances, possible if we instead use the optimal GMM machinery on the moment conditions

$$\mathbb{E}\begin{bmatrix}\nabla_\theta\ln f_{i1}(\boldsymbol{\theta})\\ \vdots\\ \nabla_\theta\ln f_{iT}(\boldsymbol{\theta})\\ \nabla_\theta\ln c_i(\boldsymbol{\theta},\rho)\\ \nabla_\rho\ln c_i(\boldsymbol{\theta},\rho)\end{bmatrix}=0.$$

However, this improvement may now come at the price of a bias as the copula-based moment conditions may be misspecified, causing inconsistency of GMM and offsetting any benefit of higher precision. So assuming a wrong kind of time dependence may be worse than assuming independence (over time). Prokhorov and Schmidt [70] explored these circumstances.

The alternative copula-based specification is to form the joint distribution of $(u_{i1},\ldots,u_{iT})$ rather than that of $(\varepsilon_{i1},\ldots,\varepsilon_{iT})$. A challenge of this specification is that a $T$-dimensional integration will be needed to form the likelihood in this case. Let $f_{\mathbf{u}}(\mathbf{u};\boldsymbol{\theta},\rho)$ denote the copula-based joint density of the one-sided error vector and let $f_u(u;\boldsymbol{\theta})$ denote the marginal density of an individual one-sided (Half Normal) error term.
Then,

$$f_{\mathbf{u}}(\mathbf{u};\boldsymbol{\theta},\rho)=c(F_u(u_1;\boldsymbol{\theta}),\ldots,F_u(u_T;\boldsymbol{\theta});\rho)\cdot f_u(u_1;\boldsymbol{\theta})\cdots f_u(u_T;\boldsymbol{\theta}), \tag{4.9}$$

where $F_u(u;\boldsymbol{\theta})\equiv\int_0^u f_u(s;\boldsymbol{\theta})\,\mathrm{d}s$ is the cdf of the Half Normal error term.

To form the sample likelihood we need the joint density of the composed error vector $\boldsymbol{\varepsilon}$. Given the density of $\mathbf{u}$ and assuming, as before, that $v\perp u$, this density can be obtained as follows:

$$f_{\boldsymbol{\varepsilon}}(\boldsymbol{\varepsilon};\boldsymbol{\theta},\rho)=\int_0^\infty\cdots\int_0^\infty f_{\mathbf{v}}(\boldsymbol{\varepsilon}+\mathbf{u};\boldsymbol{\theta})\,f_{\mathbf{u}}(\mathbf{u};\boldsymbol{\theta},\rho)\,\mathrm{d}u_1\cdots\mathrm{d}u_T=\mathbb{E}_{\mathbf{u}(\boldsymbol{\theta},\rho)}\,\phi(\boldsymbol{\varepsilon}+\mathbf{u}), \tag{4.10}$$

where $\mathbb{E}_{\mathbf{u}(\boldsymbol{\theta},\rho)}$ denotes the expectation with respect to $f_{\mathbf{u}}(\mathbf{u};\boldsymbol{\theta},\rho)$ and $f_{\mathbf{v}}(\mathbf{v};\boldsymbol{\theta})=\phi(\mathbf{v})$ is the multivariate Normal pdf of $\mathbf{v}$, where all $v$’s are independent and have equal variance $\sigma_v^2$.

Similar to the previous section, this integral has no analytical form.
Additionally, this is a $T$-dimensional integral, which is computationally demanding to evaluate using numerical methods. However, it has the form of an expectation over a distribution we can sample from and this, as before, permits application of MSLE, where we simulate the $\mathbf{u}$’s and estimate $f_{\boldsymbol{\varepsilon}}(\boldsymbol{\varepsilon})$ by averaging $\phi(\boldsymbol{\varepsilon}+\mathbf{u})$ over the draws. To be precise, let $S$ denote the number of simulations. The direct simulator of $f_{\boldsymbol{\varepsilon}}(\boldsymbol{\varepsilon})$ can be written as follows:

$$\hat{f}_{\boldsymbol{\varepsilon}}(\boldsymbol{\varepsilon};\boldsymbol{\theta},\rho)=\frac{1}{S}\sum_{s=1}^{S}\phi(\boldsymbol{\varepsilon}+\mathbf{u}^s(\boldsymbol{\theta},\rho)),$$

where $\mathbf{u}^s(\boldsymbol{\theta},\rho)$ is a draw from $f_{\mathbf{u}}(\mathbf{u};\boldsymbol{\theta},\rho)$ constructed in (4.9). Then, a simulated log-likelihood can be obtained as follows:

$$\ln\mathcal{L}^s(\boldsymbol{\theta},\rho)=\sum_i\ln\hat{f}_{\boldsymbol{\varepsilon}}(\boldsymbol{\varepsilon}_i;\boldsymbol{\theta},\rho),$$

where, as before, $\boldsymbol{\varepsilon}_i=(\varepsilon_{i1},\ldots,\varepsilon_{iT})$ and $\varepsilon_{it}=y_{it}-\boldsymbol{x}_{it}'\boldsymbol{\beta}$.

This method is a multivariate extension of the simulation-based estimation of univariate densities discussed earlier. An important additional requirement is the ability to sample from the copula; see ([64], Ch. 2) for a discussion of how to sample from the copula to allow dependence.
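The direct simulator and the simulated log-likelihood above can be sketched as follows. This is a minimal illustration under assumptions not made in the text: the copula is an equicorrelated Gaussian copula with a single parameter `rho` (valid for `rho > -1/(T-1)`), the frontier is linear, and the function names are hypothetical. A fixed seed keeps the underlying random numbers the same at every parameter value, which an optimizer needs for a smooth objective.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm, halfnorm


def draw_u(n_draws, T, sigma_u, rho, rng):
    """Draw u = (u_1, ..., u_T): Half Normal marginals, equicorrelated Gaussian copula."""
    R = np.full((T, T), rho)
    np.fill_diagonal(R, 1.0)
    z = rng.multivariate_normal(np.zeros(T), R, size=n_draws)
    return halfnorm.ppf(norm.cdf(z), scale=sigma_u)          # shape (S, T)


def simulated_loglik(beta, sigma_v, sigma_u, rho, y, X, n_draws=500, seed=0):
    """Simulated log-likelihood: sum_i ln[(1/S) sum_s phi(eps_i + u^s)].

    y has shape (n, T) and X has shape (n, T, k).
    """
    rng = np.random.default_rng(seed)            # fixed seed: common random numbers
    u = draw_u(n_draws, y.shape[1], sigma_u, rho, rng)
    eps = y - X @ beta                           # residuals, shape (n, T)
    # ln phi(eps_i + u^s): T independent N(0, sigma_v^2) log-densities summed over t.
    log_phi = norm.logpdf(eps[:, None, :] + u[None, :, :], scale=sigma_v).sum(axis=2)
    # Average over draws on the log scale to avoid underflow for larger T.
    return (logsumexp(log_phi, axis=1) - np.log(n_draws)).sum()
```

Passing the negative of `simulated_loglik` to a numerical optimizer over `(beta, sigma_v, sigma_u, rho)` gives a basic MSLE; in practice the scale parameters and `rho` would be reparameterized to respect their bounds.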
Other than that, similar asymptotic arguments suggest that MSLE is asymptotically equivalent to MLE [29].

5 Dependence due to endogeneity

A common assumption in the SFM is that $\boldsymbol{x}$ is either exogenous or independent of both $u$ and $v$. If either of these conditions is violated, then all of the estimators discussed so far will be biased and most likely inconsistent. Yet, it is not difficult to think of settings where endogeneity is likely to exist. For example, if shocks are observed before inputs are chosen, then producers may respond to good or bad shocks by adjusting inputs, leading to correlation between $\boldsymbol{x}$ and $v$. Alternatively, if managers know they are inefficient, they may use this information to guide their level of inputs, again producing endogeneity. In a regression model, dealing with endogeneity is well understood. However, in the composed error setting, these methods cannot simply be transferred over, but require care in how they are implemented [6].

To incorporate endogeneity into the SFM in (2.1), we set $m(\boldsymbol{x}_i;\boldsymbol{\beta})=\beta_0+\boldsymbol{x}_{1i}'\boldsymbol{\beta}_1+\boldsymbol{x}_{2i}'\boldsymbol{\beta}_2$, where $\boldsymbol{x}_1$ are our exogenous inputs and $\boldsymbol{x}_2$ are the endogenous inputs, where endogeneity may arise through correlation of $\boldsymbol{x}_2$ with $u$, $v$, or both. To deal with endogeneity we require instruments, $\boldsymbol{w}$, and identification necessitates that the dimension of $\boldsymbol{w}$ is at least as large as the dimension of $\boldsymbol{x}_2$. The natural assumption for valid instrumentation is that $\boldsymbol{w}$ is independent of both $u$ and $v$.

Why worry about endogeneity?
Economic endogeneity means that the inputs in question are choice variables and chosen to optimize some objective function such as cost minimization or profit maximization. Statistical endogeneity arises from simultaneity, omitted variables, and measurement errors. For example, if the omitted variable is managerial ability, which is part of inefficiency, inefficiency is likely to be correlated with inputs because managerial ability affects inputs. This is the Mundlak argument for why omitting a management quality variable (for us inefficiency) will cause biased parameter estimates. Endogeneity can also be caused by simultaneity, meaning that two or more variables in the model are jointly determined. In many applied settings, it is not clear what researchers mean when they attempt to handle endogeneity inside the SFM. An excellent introduction to the myriad influences that endogeneity can have on the estimates stemming from the SFM can be found in [63]. Mutter et al. [63] used simulations designed around data based on the California nursing home industry to understand the impact of endogeneity of nursing home quality on inefficiency measurement.

The simplest approach to accounting for endogeneity is to use a corrected two-stage least squares (C2SLS) approach, similar to the common corrected ordinary least squares (COLS) approach that has been used to estimate the SFM. This method estimates the SFM using standard two-stage least squares (2SLS) with instruments $\boldsymbol{w}$. This produces consistent estimators for $\boldsymbol{\beta}_1$ and $\boldsymbol{\beta}_2$ but not $\beta_0$, as this is obscured by the presence of $E[u]$ (to ensure that the residuals have mean zero). The second and third moments of the 2SLS residuals are then used to recover estimators of $\sigma_v^2$ and $\sigma_u^2$.
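A rough sketch of C2SLS under Normal and Half Normal errors is given below, including the intercept shift by $\sqrt{2/\pi}\,\hat{\sigma}_u$ that the text describes next. It is not the implementation of [6]. The moment formulas used are the standard Normal Half Normal ones, $\mathrm{Var}(\varepsilon)=\sigma_v^2+(1-2/\pi)\sigma_u^2$ and $E[(\varepsilon-E[\varepsilon])^3]=\sqrt{2/\pi}\,(1-4/\pi)\,\sigma_u^3$; the function name `c2sls` and the argument layout (excluded instruments passed separately as `W`) are assumptions made for illustration.

```python
import numpy as np


def c2sls(y, X1, X2, W):
    """Corrected 2SLS for a Normal-Half Normal frontier y = b0 + X1 b1 + X2 b2 + v - u.

    X1: exogenous inputs (n, k1); X2: endogenous inputs (n, k2);
    W: excluded instruments (n, q) with q >= k2. Returns (beta, sigma_v, sigma_u).
    """
    n = y.size
    ones = np.ones((n, 1))
    X = np.hstack([ones, X1, X2])                 # regressors, intercept first
    Z = np.hstack([ones, X1, W])                  # full instrument set
    # Stage 1: project the regressors on the instruments.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Stage 2: regress y on the projected regressors.
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    resid = y - X @ beta                          # 2SLS residuals use the actual X
    m2 = np.mean((resid - resid.mean()) ** 2)
    m3 = np.mean((resid - resid.mean()) ** 3)
    if m3 >= 0:                                   # "wrong skewness": sigma_u set to 0
        sigma_u = 0.0
    else:
        sigma_u = (m3 / (np.sqrt(2 / np.pi) * (1 - 4 / np.pi))) ** (1 / 3)
    sigma_v = np.sqrt(max(m2 - (1 - 2 / np.pi) * sigma_u**2, 0.0))
    beta = beta.copy()
    beta[0] += np.sqrt(2 / np.pi) * sigma_u       # shift the intercept up by E[u]
    return beta, sigma_v, sigma_u
```

The sketch returns point estimates only; the standard errors of the corrected intercept need a separate adjustment for the step-wise estimation.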
Once $\hat{\sigma}_u^2$ is determined, the intercept can be corrected by adding $\sqrt{2/\pi}\,\hat{\sigma}_u$. See Section 4.1 of [6] for details.

This represents a simple avenue to account for endogeneity, and it does not require specifying how endogeneity enters the model, i.e., through correlation with $v$, with $u$, or both. However, as with other corrected procedures based on calculations of the second and third (or even higher) moments of the residuals (see [66] and [89]), if the initial 2SLS residuals have positive skew (instead of negative), then $\sigma_u^2$ cannot be identified and its estimator is 0. Furthermore, the standard errors from this approach need to be modified for the estimator of the intercept to account for the step-wise nature of the estimation.

5.1 A likelihood framework

Likelihood-based alternatives allow for explicit modelling and estimation of the dependence structure that underlies endogeneity. This has recently been studied by Kutlu [54], Karakaplan and Kutlu [44], Tran and Tsionas [87,88], and Amsler et al. [6]. Our discussion here follows [6] as their derivation of the likelihood relies on a simple conditioning argument as opposed to the earlier work relying on the Cholesky decomposition or alternative approaches. While all approaches lead to a likelihood function, the conditioning idea of Amsler et al.
[6] is simpler and more intuitive.Consider the stochastic frontier system: (5.1)yi=xiβ+εi{y}_{i}={{\boldsymbol{x}}}_{i}{\boldsymbol{\beta }}+{\varepsilon }_{i}(5.2)x2i=wiΓ+ηi,{{\boldsymbol{x}}}_{2i}={{\boldsymbol{w}}}_{i}{\boldsymbol{\Gamma }}+{{\boldsymbol{\eta }}}_{i},where xi=(x1i,x2i){{\boldsymbol{x}}}_{i}=\left({{\boldsymbol{x}}}_{1i},{{\boldsymbol{x}}}_{2i}), β=(β1,β2){\boldsymbol{\beta }}=\left({{\boldsymbol{\beta }}}_{1},{{\boldsymbol{\beta }}}_{2}), wi=(x1i,qi){{\boldsymbol{w}}}_{i}=\left({{\boldsymbol{x}}}_{1i},{{\boldsymbol{q}}}_{i})is the vector of instruments, ηi{{\boldsymbol{\eta }}}_{i}(different from ηi{\eta }_{i}in Sections 3.1–3.2) is uncorrelated with wi{{\boldsymbol{w}}}_{i}and endogeneity of x2i{{\boldsymbol{x}}}_{2i}arises through cov(εi,ηi)≠0cov\left({\varepsilon }_{i},{{\boldsymbol{\eta }}}_{i})\ne 0. Here simultaneity bias (and the resulting inconsistency) exists because ηi{{\boldsymbol{\eta }}}_{i}is correlated with either vi{v}_{i}, ui{u}_{i}, or both.We start with the case of dependence between ηi{{\boldsymbol{\eta }}}_{i}and vi{v}_{i}while ui{u}_{i}is independent of (ηi,vi)\left({{\boldsymbol{\eta }}}_{i},{v}_{i}). Assume that, conditional on wi{{\boldsymbol{w}}}_{i}, ψi=(vi,ηi)′∼N(0,Ω){\psi }_{i}={\left({v}_{i},{{\boldsymbol{\eta }}}_{i})}^{^{\prime} }\hspace{0.33em} \sim \hspace{0.33em}N\left({\boldsymbol{0}},\Omega ), where Ω=σv2ΣvηΣηvΣηηηi.\Omega =\left[\begin{array}{cc}{\sigma }_{v}^{2}& {\Sigma }_{v\eta }\\ {\Sigma }_{\eta v}& {\Sigma }_{\eta \eta }{{\boldsymbol{\eta }}}_{i}\end{array}\right].To derive the likelihood function, [6] condition on the instruments, w{\boldsymbol{w}}. Doing this yields f(y,x2∣w)=f(y∣x2,w)⋅f(x2∣w)f(y,{{\boldsymbol{x}}}_{2}| {\boldsymbol{w}})=f(y| {{\boldsymbol{x}}}_{2},{\boldsymbol{w}})\cdot f\left({{\boldsymbol{x}}}_{2}| {\boldsymbol{w}}). 
With the density in this form, the log-likelihood follows suit: $\ln\mathcal{L} = \ln\mathcal{L}_1 + \ln\mathcal{L}_2$, where $\ln\mathcal{L}_1$ corresponds to $f(y \mid \boldsymbol{x}_2, \boldsymbol{w})$ and $\ln\mathcal{L}_2$ corresponds to $f(\boldsymbol{x}_2 \mid \boldsymbol{w})$. These two components can be written as
$$\begin{aligned}
\ln\mathcal{L}_1 &= -(n/2)\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\tilde{\varepsilon}_i^2 + \sum_{i=1}^{n}\ln\left[\Phi(-\lambda_c\tilde{\varepsilon}_i/\sigma)\right], \\
\ln\mathcal{L}_2 &= -(n/2)\ln|\Sigma_{\eta\eta}| - 0.5\sum_{i=1}^{n}\boldsymbol{\eta}_i'\Sigma_{\eta\eta}^{-1}\boldsymbol{\eta}_i,
\end{aligned}$$
where $\tilde{\varepsilon}_i = y_i - \beta_0 - \boldsymbol{x}_i\boldsymbol{\beta} - \mu_{ci}$, $\mu_{ci} = \Sigma_{v\eta}\Sigma_{\eta\eta}^{-1}\boldsymbol{\eta}_i$, $\sigma^2 = \sigma_c^2 + \sigma_u^2$, $\lambda_c = \sigma_u/\sigma_c$, and $\sigma_c^2 = \sigma_v^2 - \Sigma_{v\eta}\Sigma_{\eta\eta}^{-1}\Sigma_{\eta v}$. Here $\sigma_c^2$ is the variance of $v_i$ conditional on $\boldsymbol{\eta}_i$, so it takes the place of $\sigma_v^2$ in the usual Normal Half Normal expressions.
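To make the two components concrete, the following sketch evaluates $\ln\mathcal{L}=\ln\mathcal{L}_1+\ln\mathcal{L}_2$ for the simplest case of one endogenous regressor and one instrument, so that $\Sigma_{\eta\eta}$ and $\Sigma_{v\eta}$ are scalars. It takes $\sigma^2=\sigma_c^2+\sigma_u^2$, the variance scale of the composed error conditional on $\eta_i$, and drops additive constants. The function and argument names are illustrative assumptions and do not come from [6].

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def log_likelihood(y, x2, w, beta0, beta2, gamma,
                   sigma_v2, sigma_u2, sigma_veta, sigma_eta2):
    """ln L = ln L1 + ln L2 for a scalar endogenous regressor x2 with instrument w."""
    n = len(y)
    sigma_c2 = sigma_v2 - sigma_veta ** 2 / sigma_eta2  # Var(v | eta)
    sigma2 = sigma_c2 + sigma_u2
    sigma = math.sqrt(sigma2)
    lambda_c = math.sqrt(sigma_u2 / sigma_c2)

    ln_l1 = -0.5 * n * math.log(sigma2)
    ln_l2 = -0.5 * n * math.log(sigma_eta2)
    for y_i, x_i, w_i in zip(y, x2, w):
        eta_i = x_i - gamma * w_i               # reduced-form residual from (5.2)
        mu_c = sigma_veta / sigma_eta2 * eta_i  # endogeneity correction
        eps_tilde = y_i - beta0 - beta2 * x_i - mu_c
        ln_l1 += -eps_tilde ** 2 / (2.0 * sigma2)
        ln_l1 += math.log(_STD_NORMAL.cdf(-lambda_c * eps_tilde / sigma))
        ln_l2 += -0.5 * eta_i ** 2 / sigma_eta2
    return ln_l1 + ln_l2


# One observation with zero residuals and unit variances: -0.5*ln(2) + ln(0.5).
value = log_likelihood([0.0], [0.0], [0.0], 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0)
```

In practice the negative of such a function would be passed to a numerical optimizer. For very large positive $\tilde{\varepsilon}_i$ the normal CDF can underflow to zero, so a more careful implementation would work with a log-CDF routine.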
The subtraction of $\mu_{ci}$ in $\ln\mathcal{L}_1$ is an endogeneity correction, while $\ln\mathcal{L}_2$ is nothing more than the standard likelihood function of a multivariate normal regression model (as in (5.2)). Estimates of the model parameters $(\boldsymbol{\beta}, \sigma_v^2, \sigma_u^2, \boldsymbol{\Gamma}, \Sigma_{v\eta})$ and $\Sigma_{\eta\eta}$ can be obtained by maximizing $\ln\mathcal{L}$.

While direct estimation of the likelihood function is possible, a two-step approach is also available [54]. However, as pointed out by both Kutlu [54] and Amsler et al. [6], this two-step approach will have incorrect standard errors. Even though the two-step approach might be computationally simpler, it is, in general, different from full optimization of the likelihood function of Amsler et al. [6]. This is because the two-step approach ignores the information provided by $\boldsymbol{\Gamma}$ and $\Sigma_{\eta\eta}$ in $\ln\mathcal{L}_1$. In general, full optimization of the likelihood function is recommended, as the standard errors (obtained in the usual manner from the inverse of the Fisher information matrix) are valid.

Typically, the standard errors can be obtained either through the outer product of gradients (OPG) or through direct estimation of the Hessian matrix of the log-likelihood function. Given the nascency of these methods, it has yet to be determined which of the two is more reliable in practice, though in other settings both tend to work well. One argument in favor of the OPG is that, since it requires only first derivatives, it can be more stable (and more likely to be invertible) than the estimated Hessian.
Also note that in finite samples, the different estimators of the covariance of the MLE can give different numerical estimates, even suggesting different conclusions for inference (reject or do not reject the null hypothesis). So, for small samples, it is often advised to check all feasible estimates whenever there is suspicion of ambiguity in the conclusions (e.g., when a hypothesis is rejected only at around the 10% significance level).

5.2 A GMM framework

An insightful avenue to model dependence due to endogeneity in the SFM, one that differs from the traditional corrected methods or maximum likelihood, stems from the GMM framework proposed by Amsler et al. [6], who used the insights of Hansen et al. [35]. Similar to our discussion on the use of GMM in panel estimation, the idea is to use the first-order conditions for maximization of the likelihood function under exogeneity as a GMM problem:
$$E\left[\varepsilon_i^2/\sigma^2 - 1\right] = 0, \tag{5.3}$$
$$E\left[\frac{\varepsilon_i\phi_i}{1-\Phi_i}\right] = 0, \tag{5.4}$$
$$E\left[\boldsymbol{x}_i\varepsilon_i/\sigma + \lambda\boldsymbol{x}_i\frac{\phi_i}{1-\Phi_i}\right] = 0, \tag{5.5}$$
where $\phi_i = \phi\left(\frac{\lambda\varepsilon_i}{\sigma}\right)$ and $\Phi_i = \Phi\left(\frac{\lambda\varepsilon_i}{\sigma}\right)$. Note that these expectations are taken over $\boldsymbol{x}_i$ and $y_i$ (and by default, $\varepsilon_i$) and solved for the parameters of the SFM.

The key here is that these first-order conditions (one for $\sigma^2$, one for $\lambda$, and the vector for $\boldsymbol{\beta}$) are valid under exogeneity, and this implies that the MLE is equivalent to the GMM estimator. Under endogeneity, however, this relationship does not hold directly. But the seminal idea of Amsler et al. [6] is that the first-order conditions (5.3) and (5.4) are based on the distributional assumptions on $v$ and $u$, not on the relationship of $\boldsymbol{x}$ with $v$ and/or $u$. Thus, these moment conditions are valid whether $\boldsymbol{x}$ contains endogenous components or not. The only moment condition that needs to be adjusted is (5.5). In this case, the first-order condition needs to be taken with respect to $\boldsymbol{w}$, the exogenous variable, not $\boldsymbol{x}$. Doing so results in the following amended first-order condition:
$$E\left[\boldsymbol{w}_i\varepsilon_i/\sigma + \lambda\boldsymbol{w}_i\frac{\phi_i}{1-\Phi_i}\right] = 0, \tag{5.6}$$
where $\phi_i$ and $\Phi_i$ are identical to those in (5.5). It is important to acknowledge that this moment condition is valid when $\varepsilon_i$ and $\boldsymbol{w}_i$ are independent. This is a more stringent requirement than the typical regression setup with $E[\varepsilon_i \mid \boldsymbol{w}_i] = 0$. As with the C2SLS approach, the source of endogeneity for $\boldsymbol{x}_2$ does not need to be specified (through $v$ and/or $u$).

5.3 An economic model of dependence

From an economic theory perspective, there are grounds to model dependence between $u_i$ and $\boldsymbol{\eta}_i$, not only between $v_i$ and $\boldsymbol{\eta}_i$.
A system similar to (5.1)–(5.2) arises as a result of appending the SFM with the first-order conditions of cost minimization ([52], Chapter 8):
$$\min \boldsymbol{P}'\boldsymbol{x}, \quad \text{s.t.} \quad y = m(\boldsymbol{x}; \boldsymbol{\beta}) + v - u, \tag{5.7}$$
for input prices $\boldsymbol{P}$. (It is possible to treat a subset of $\boldsymbol{x}$ as endogenous, i.e., $\boldsymbol{x} = (\boldsymbol{x}_1, \boldsymbol{x}_2)$, where $\boldsymbol{x}_1$ is endogenous and $\boldsymbol{x}_2$ is exogenous.) The first-order conditions in this case are
$$\frac{m_j(\boldsymbol{x}; \boldsymbol{\beta})}{m_1(\boldsymbol{x}; \boldsymbol{\beta})} = \frac{P_j}{P_1}, \quad j = 2, \ldots, J, \tag{5.8}$$
where $m_j(\boldsymbol{x}; \boldsymbol{\beta})$ is the partial derivative of $m(\boldsymbol{x}; \boldsymbol{\beta})$ with respect to $x_j$. These first-order conditions are exact, which rarely holds in practice. Instead, a stochastic term is added to capture allocative inefficiency. That is, our empirical first-order conditions are
$$\frac{m_j(\boldsymbol{x}; \boldsymbol{\beta})}{m_1(\boldsymbol{x}; \boldsymbol{\beta})} = \frac{P_j}{P_1}e^{\eta_j}$$
for $j = 2, \ldots, J$, where $\eta_j$ captures allocative inefficiency for the $j$th input relative to input one (the choice of reference input is without loss of generality).

The idea behind allocative inefficiency is that firms could be fully technically efficient and still have room for improvement due to over- or under-use of inputs, relative to another input, given the price ratio.
On the other hand, firms can be technically inefficient because of allocative inefficiency and vice versa, so independence between $u$ and $\boldsymbol{\eta}$ is hard to justify. Additionally, if firms are cost minimizers and one estimates a production function, the inputs will be endogenous, as these are choice variables for the firm. In this case, input prices can serve as instruments.

Combining the SFM, under the Cobb–Douglas production function, with the information in the $J-1$ conditions in (5.8) with allocative inefficiency built in, results in the following system:
$$y_i = \boldsymbol{x}_i\boldsymbol{\beta} + \varepsilon_i, \tag{5.9}$$
$$x_{i1} - x_{ij} = \ln(\beta_1) - \ln(\beta_j) + p_{ij} - p_{i1} + \eta_{ij}, \quad j = 2, \ldots, J, \tag{5.10}$$
where $x_{ij}$ is the log of input $j$ of firm $i$, $p_{ij}$ is the log of the price of input $j$, $\beta_j$ is the coefficient on input $j$ in (5.9), and $\boldsymbol{\eta}_i = (\eta_{i2}, \ldots, \eta_{iJ})$ are the allocative inefficiencies for the $J-1$ inputs with respect to input one. See Schmidt and Lovell [74,75] for details.

5.4 A copula-based approach

Amsler et al. [6] used copulas to obtain a joint distribution for $u$, $v$, and $\boldsymbol{\eta}$, whereas Amsler et al. [8] developed a new copula family for $u$ and $\boldsymbol{\eta}$ with properties that reflect the nature of allocative (symmetric) and technical (one-sided) inefficiencies. Here we provide the derivation of a copula-based likelihood for the most general case, which allows us to model dependence between all the components of $(u, v, \boldsymbol{\eta})$.

We keep the Half Normal marginal for $u$ and Normal marginals for the elements of $\psi = (v, \boldsymbol{\eta}')'$ as before, and assume a copula density $c(\cdot, \ldots, \cdot)$. Amsler et al. [6] used the Gaussian copula, which implies that the joint distribution of $\psi$ is Normal, but this is largely done for convenience. The copula assumption gives the joint density of $(u, v, \boldsymbol{\eta})$:
$$f_{u,v,\boldsymbol{\eta}}(u, v, \boldsymbol{\eta}) = c\big(F_u(u), F_v(v), F_\eta(\eta_2), \ldots, F_\eta(\eta_J)\big)\, f_u(u)\, f_v(v)\, f_\eta(\eta_2) \cdots f_\eta(\eta_J).$$
However, we need the joint density of $\varepsilon$ and $\boldsymbol{\eta}$ in order to form a sample log-likelihood. This density can be obtained by integrating $u$ out of $f_{u,v,\boldsymbol{\eta}}(u, v, \boldsymbol{\eta})$ as follows:
$$f_{\varepsilon,\eta}(\varepsilon, \boldsymbol{\eta}) = \int_0^\infty f_{u,v,\boldsymbol{\eta}}(u, \varepsilon + u, \boldsymbol{\eta})\,\mathrm{d}u = \int_0^\infty \left[\frac{f_{u,v,\boldsymbol{\eta}}(u, \varepsilon + u, \boldsymbol{\eta})}{f_u(u)}\right] f_u(u)\,\mathrm{d}u = \mathbb{E}_u\left[\frac{f_{u,v,\boldsymbol{\eta}}(u, \varepsilon + u, \boldsymbol{\eta})}{f_u(u)}\right]. \tag{5.11}$$
Again, we can use simulation techniques to evaluate this density.
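The expectation form in (5.11) suggests a simple Monte Carlo average over draws of $u$. The sketch below assumes Half Normal $u$, Normal $v$, and a scalar $\eta$. Purely so that the result can be checked against a closed form, it uses the independence copula ($c \equiv 1$); any other copula density could be passed in through the hypothetical `copula_density` argument. All names are illustrative.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()


def simulated_density(eps, eta, sigma_u, sigma_v, sigma_eta, copula_density, u_draws):
    """Average f_{u,v,eta}(u_s, eps + u_s, eta) / f_u(u_s) over draws u_s from f_u."""
    f_v = NormalDist(0.0, sigma_v)
    f_eta = NormalDist(0.0, sigma_eta)
    total = 0.0
    for u in u_draws:
        v = eps + u
        # Half Normal CDF of u is 2 * Phi(u / sigma_u) - 1.
        cdf_u = 2.0 * STD.cdf(u / sigma_u) - 1.0
        # The f_u(u_s) factor of the joint density cancels with the denominator.
        total += (copula_density(cdf_u, f_v.cdf(v), f_eta.cdf(eta))
                  * f_v.pdf(v) * f_eta.pdf(eta))
    return total / len(u_draws)


def independence_copula(a, b, c):
    """Copula density under independence: identically one."""
    return 1.0


rng = random.Random(1)
sigma_u, sigma_v, sigma_eta = 1.0, 1.0, 1.0
draws = [abs(rng.gauss(0.0, sigma_u)) for _ in range(100_000)]
f_hat = simulated_density(-0.5, 0.3, sigma_u, sigma_v, sigma_eta,
                          independence_copula, draws)

# Closed-form Normal Half Normal density of eps times the Normal density of eta.
sigma = math.sqrt(sigma_u ** 2 + sigma_v ** 2)
lam = sigma_u / sigma_v
f_exact = ((2.0 / sigma) * STD.pdf(-0.5 / sigma) * STD.cdf(lam * 0.5 / sigma)
           * NormalDist(0.0, sigma_eta).pdf(0.3))
```

Under independence the simulated value should be close to the closed-form product; with a non-trivial copula no such closed form is generally available, which is why simulation is used.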
Specifically, given $S$ draws $u_s$, $s = 1, \ldots, S$, the direct simulator can be written as
$$\hat{f}_{\varepsilon,\eta}(\varepsilon, \boldsymbol{\eta}) = \frac{1}{S}\sum_{s=1}^{S}\left[\frac{f_{u,v,\boldsymbol{\eta}}(u_s, \varepsilon + u_s, \boldsymbol{\eta})}{f_u(u_s)}\right],$$
and this leads to MSLE using the log-likelihood
$$\ln\mathcal{L}^s = \sum_{i=1}^{n}\ln\hat{f}_{\varepsilon,\eta}(\varepsilon_i, \boldsymbol{\eta}_i), \tag{5.12}$$
where $\varepsilon_i = y_i - \boldsymbol{x}_i'\boldsymbol{\beta}$ and $\boldsymbol{\eta}_i = \boldsymbol{x}_{2i} - \boldsymbol{w}_i\boldsymbol{\Gamma}$, as follows from the system in (5.1)–(5.2).

The MSLE will produce estimates of all the parameters of the model, that is, $\boldsymbol{\beta}$, $\boldsymbol{\Gamma}$, $\sigma_u^2$, $\sigma_v^2$, the variances of $\eta_j$, and whatever copula parameters appear in $c$. This permits modelling and testing the validity of independence assumptions between all error terms in the system, including the assumption of exogeneity.

5.5 Dependence on determinants of inefficiency

To conclude this section, we consider the extension to a setting in which inefficiency depends on covariates and some of these determinants of inefficiency may be endogenous [7,55]. These models can be estimated using traditional instrumental variable methods. However, given that the determinants of inefficiency enter the model nonlinearly, nonlinear methods are required.

Amsler et al. [7] considered the model
$$y_i = \boldsymbol{x}_i'\boldsymbol{\beta} + v_i - u_i = \boldsymbol{x}_i'\boldsymbol{\beta} + v_i - u_i^{\ast}e^{\boldsymbol{z}_i'\boldsymbol{\delta}}, \tag{5.13}$$
where $u^{\ast}$ is the baseline inefficiency and $u$ has the property that the scale of its distribution (relative to the distribution of $u^{\ast}$) changes depending on the determinants $\boldsymbol{z}$ (the so-called scaling property). The covariates $\boldsymbol{x}_i$ and $\boldsymbol{z}_i$ are partitioned as
$$\boldsymbol{x}_i = \begin{bmatrix}\boldsymbol{x}_{1i} \\ \boldsymbol{x}_{2i}\end{bmatrix}, \quad \boldsymbol{z}_i = \begin{bmatrix}\boldsymbol{z}_{1i} \\ \boldsymbol{z}_{2i}\end{bmatrix},$$
where $\boldsymbol{x}_{1i}$ and $\boldsymbol{z}_{1i}$ are exogenous and $\boldsymbol{x}_{2i}$ and $\boldsymbol{z}_{2i}$ are endogenous. The set of instruments used to combat endogeneity is defined as
$$\boldsymbol{w}_i = \begin{bmatrix}\boldsymbol{x}_{1i} \\ \boldsymbol{z}_{1i} \\ \boldsymbol{q}_i\end{bmatrix},$$
where $\boldsymbol{q}_i$ are the traditional outside instruments. Identification of all the parameters requires that the dimension of $\boldsymbol{q}$ be at least as large as the dimension of $\boldsymbol{x}_2$ plus the dimension of $\boldsymbol{z}_2$ (the rank condition).

In the model of Amsler et al. [7], endogeneity arises through dependence between a variable in the model ($\boldsymbol{x}_2$ and/or $\boldsymbol{z}_2$) and noise, $v$. That is, both $\boldsymbol{x}$ and $\boldsymbol{z}$ are assumed to be independent of baseline inefficiency $u^{\ast}$. Given that $E[u_i]$ is not constant, the COLS approach to deal with endogeneity proposed by Amsler et al. [6] cannot be used here.
To develop an appropriate estimator, add and subtract the mean of inefficiency to produce a composed error term that has mean 0,
$$y_i = \boldsymbol{x}_i'\boldsymbol{\beta} - \mu^{\ast}e^{\boldsymbol{z}_i'\boldsymbol{\delta}} + v_i - (u_i^{\ast} - \mu^{\ast})e^{\boldsymbol{z}_i'\boldsymbol{\delta}}. \tag{5.14}$$
Proper estimation through instrumental variables requires that the following moment condition holds:
$$\mathbb{E}\left[v_i - (u_i^{\ast} - \mu^{\ast})e^{\boldsymbol{z}_i'\boldsymbol{\delta}} \mid \boldsymbol{w}_i\right] = 0. \tag{5.15}$$
The nonlinearity of these moment conditions necessitates the use of nonlinear two-stage least squares (NL2SLS) [4].

Latruffe et al. [55] have a similar setup to Amsler et al. [7], using the model in (5.13), but develop a four-step estimator for the parameters; additionally, only $\boldsymbol{x}_2$ is treated as endogenous. The approach of Latruffe et al. [55] is based on [23], using the construction of efficient moment conditions. The vector of instruments proposed in [55] is defined as
$$\boldsymbol{w}_i(\boldsymbol{\gamma}, \boldsymbol{\delta}) = \begin{bmatrix}\boldsymbol{x}_{1i} \\ \boldsymbol{q}_i'\boldsymbol{\gamma} \\ \boldsymbol{z}_ie^{\boldsymbol{z}_i'\boldsymbol{\delta}}\end{bmatrix}, \tag{5.16}$$
where $\boldsymbol{q}_i'\boldsymbol{\gamma}$ captures the linear projection of $\boldsymbol{x}_2$ on the external instruments $\boldsymbol{q}$. The four-step estimator is defined as follows.

Step 1. Regress $\boldsymbol{x}_2$ on $\boldsymbol{q}$ to estimate $\boldsymbol{\gamma}$. Denote the OLS estimator of $\boldsymbol{\gamma}$ as $\widehat{\boldsymbol{\gamma}}$.

Step 2. Use NLS to estimate the SFM in (5.13). Denote the NLS estimates of $(\boldsymbol{\beta}, \boldsymbol{\delta})$ as $(\ddot{\boldsymbol{\beta}}, \ddot{\boldsymbol{\delta}})$. Use the NLS estimate of $\boldsymbol{\delta}$ and the OLS estimate of $\boldsymbol{\gamma}$ from Step 1 to construct the instruments $\boldsymbol{w}_i(\widehat{\boldsymbol{\gamma}}, \ddot{\boldsymbol{\delta}})$.

Step 3. Using the estimated instrument vector $\boldsymbol{w}_i(\widehat{\boldsymbol{\gamma}}, \ddot{\boldsymbol{\delta}})$, calculate the NL2SLS estimator of $(\boldsymbol{\beta}, \boldsymbol{\delta})$ as $(\widetilde{\boldsymbol{\beta}}, \widetilde{\boldsymbol{\delta}})$. Use the NL2SLS estimate of $\boldsymbol{\delta}$ and the OLS estimate of $\boldsymbol{\gamma}$ from Step 1 to construct the instruments $\boldsymbol{w}_i(\widehat{\boldsymbol{\gamma}}, \widetilde{\boldsymbol{\delta}})$.

Step 4. Using the estimated instrument vector $\boldsymbol{w}_i(\widehat{\boldsymbol{\gamma}}, \widetilde{\boldsymbol{\delta}})$, calculate the NL2SLS estimator of $(\boldsymbol{\beta}, \boldsymbol{\delta})$ as $(\widehat{\boldsymbol{\beta}}, \widehat{\boldsymbol{\delta}})$.

This multi-step estimator is necessary in the context of efficient moments because the actual set of instruments is not used directly; rather, $\boldsymbol{w}_i(\boldsymbol{\gamma}, \boldsymbol{\delta})$ is used, and this instrument vector requires estimates of $\boldsymbol{\gamma}$ and $\boldsymbol{\delta}$. The first two steps of the algorithm are designed to construct estimates of these two unknown parameter vectors.
The third step is then designed to construct a consistent estimator of $\boldsymbol{w}_i(\boldsymbol{\gamma}, \boldsymbol{\delta})$, which is not done in Step 2 given that the endogeneity of $\boldsymbol{x}_2$ is ignored there (note that NLS is used as opposed to NL2SLS). The iteration from Step 2 to Step 3 does produce a consistent estimator of $\boldsymbol{w}_i(\boldsymbol{\gamma}, \boldsymbol{\delta})$, and as such, Step 4 produces consistent estimators for $\boldsymbol{\beta}$ and $\boldsymbol{\delta}$. While Latruffe et al. [55] proposed a set of efficient moment conditions to handle endogeneity, the model of Amsler et al. [7] is more general because it can handle endogeneity in the determinants of inefficiency as well. Finally, the presence of $\boldsymbol{z}$ is attractive since this allows the researcher to dispense with distributional assumptions on $v$ and $u$.

6 Estimation of individual inefficiency using dependence information

Once the parameters of the SFM have been estimated, estimates of firm-level productivity and efficiency can be recovered. Observation-specific estimates of inefficiency are one of the main benefits of the SFM relative to neoclassical models of production. Firms can be ranked according to estimated efficiency; the identity of under-performing firms, as well as those deemed best practice, can also be gleaned from the estimated SFM.
All of this information is useful in helping to design more efficient public policy or subsidy programs aimed at improving the market, for example, insulating consumers from the poor performance of heavily inefficient firms.

As a concrete illustration, consider firms operating electricity distribution networks, which typically possess a natural local monopoly given that the construction of competing networks over the same terrain is prohibitively expensive. (The SFA literature contains a fairly rich set of examples for the estimation and use of efficiency estimates in different fields of research. For example, in the context of electricity providers, see [36,45,53]; for banking efficiency, see [22] and references cited therein; for the analysis of the efficiency of national health care systems, see [30] and the review [40]; for analyzing efficiency in agriculture, see [13,14,20,58], to mention just a few.) It is not uncommon for national governments to establish regulatory agencies that monitor the provision of electricity to ensure that the inherent monopoly power is not being abused. Regulators face the task of determining an acceptable price for the provision of electricity while having to balance the heterogeneity that exists across the firms (in terms of size of the firm and length of the network). Firms which are inefficient may charge too high a price to recoup a profit, but at the expense of operating below capacity. However, given production and distribution shocks, not all departures from the frontier represent inefficiency. Thus, precise measures designed to account for noise are required to parse information from $\varepsilon_i$ regarding $u_i$.

Alternatively, further investigation could reveal what it is that makes the best-practice establishments attain such high levels of performance.
This could then be used to identify appropriate government policy implications and responses, or to identify processes and/or management practices that should be spread (or encouraged) across the less efficient, but otherwise similar, units. This is the essence of the determinants of inefficiency approach discussed in the previous section. More directly, efficiency rankings are used in regulated industries so that regulators can set tougher future cost reduction targets for the more inefficient companies, in order to ensure that customers do not pay for the inefficiency of firms.

The only direct estimate coming from the Normal Half Normal SFM is $\widehat{\sigma}_u^2$. This provides context regarding the shape of the Half Normal distribution of $u_i$ and the industry average inefficiency $\mathbb{E}[u]$, but not the absolute level of inefficiency for a given firm. If we are only concerned with the average level of technical efficiency for the population, then this is all the information that is needed. Yet, if we want to know about a specific firm, then something else is required. The main approach to estimating firm-level inefficiency is the conditional mean estimator [42], commonly known as the JLMS estimator. Their idea was to calculate the expected value of $u_i$ conditional on the realization of the composed error of the model, $\varepsilon_i \equiv v_i - u_i$, i.e., $\mathbb{E}[u_i \mid \varepsilon_i]$. (JLMS [42] also suggested an alternative estimator based on the conditional mode.) This conditional mean of $u_i$ given $\varepsilon_i$ gives a point prediction of $u_i$.
The composed error contains individual-specific information, and the conditional expectation is one measure of firm-specific inefficiency.

JLMS [42] show that for the Normal Half Normal specification of the SFM, the conditional density function of $u_i$ given $\varepsilon_i$, $f(u_i \mid \varepsilon_i)$, is $N_+(\mu_{\ast i}, \sigma_{\ast}^2)$, where
$$\mu_{\ast i} = \frac{-\varepsilon_i\sigma_u^2}{\sigma^2} \tag{6.1}$$
and
$$\sigma_{\ast}^2 = \frac{\sigma_v^2\sigma_u^2}{\sigma^2}. \tag{6.2}$$
Given results on the mean of a Truncated Normal density, it follows that
$$\mathbb{E}[u_i \mid \varepsilon_i] = \mu_{\ast i} + \frac{\sigma_{\ast}\phi\left(\frac{\mu_{\ast i}}{\sigma_{\ast}}\right)}{\Phi\left(\frac{\mu_{\ast i}}{\sigma_{\ast}}\right)}. \tag{6.3}$$
The individual estimates are then obtained by replacing the true parameters in (6.3) with MLE (or MSMLE or GMM) estimates from the SFM.

Another measure of interest is the Afriat-type level of technical efficiency, defined as $e^{-u_i} = Y_i / \left(e^{m(\boldsymbol{x}_i)}e^{v_i}\right) \in [0, 1]$. This is useful in cases where output is measured in logarithmic form. Furthermore, technical efficiency is bounded between 0 and 1, making it somewhat easier to interpret relative to a raw inefficiency score. Since $e^{-u_i}$ is not directly observable, the idea of JLMS [42] can be deployed here, and $\mathbb{E}[e^{-u_i} \mid \varepsilon_i]$ can be calculated [12,56].
For the Normal Half Normal model, we have
$$\mathbb{E}[e^{-u_i} \mid \varepsilon_i] = e^{\left(-\mu_{\ast i} + \frac{1}{2}\sigma_{\ast}^2\right)}\frac{\Phi\left(\frac{\mu_{\ast i}}{\sigma_{\ast}} - \sigma_{\ast}\right)}{\Phi\left(\frac{\mu_{\ast i}}{\sigma_{\ast}}\right)}, \tag{6.4}$$
where $\mu_{\ast i}$ and $\sigma_{\ast}$ were defined in (6.1) and (6.2), respectively. Technical efficiency estimates are obtained by replacing the true parameters in (6.4) with MLE estimates from the SFM. When ranking efficiency scores, one should use estimates of $1 - \mathbb{E}[u_i \mid \varepsilon_i]$, which is the first-order approximation of (6.4). Similar expressions for the JLMS [42] and Battese and Coelli [12] efficiency scores can be derived under the assumption that $u$ is Exponential ([49], p. 82), Truncated Normal ([49], p. 86), and Gamma ([49], p. 89); see also [52].

An interesting and important finding from [5] and [6] is that when we allow for dependence of the kinds described in Sections 4 and 5, we can potentially improve estimation of inefficiency through the JLMS estimator. We focus on the case of endogeneity (Section 5), but the case of dependence over $t$ in panels (Section 4) is similar. The traditional predictor [42] is $\mathbb{E}(u_i \mid \varepsilon_i)$. However, more information is available when dependence is allowed, namely via $\boldsymbol{\eta}_i$. This calls for a modified JLMS estimator, $\mathbb{E}(u_i \mid \varepsilon_i, \boldsymbol{\eta}_i)$. Note that even though it is assumed that $u_i$ is independent of $\boldsymbol{\eta}_i$, similar to [6], because $\boldsymbol{\eta}_i$ is correlated with $v_i$, there is information that can be used to help predict $u_i$ even after conditioning on $\varepsilon_i$.

Amsler et al. [6] showed that $\boldsymbol{\eta}_i$ is independent of $(u_i, \tilde{\varepsilon}_i)$, so that
$$\mathbb{E}(u_i \mid \varepsilon_i, \boldsymbol{\eta}_i) = \mathbb{E}(u_i \mid \tilde{\varepsilon}_i, \boldsymbol{\eta}_i) = \mathbb{E}(u_i \mid \tilde{\varepsilon}_i),$$
and that the distribution of $u_i$ conditional on $\tilde{\varepsilon}_i = y_i - \boldsymbol{x}_i'\boldsymbol{\beta} - \mu_{ci}$ is $N_+(\mu_{\ast i}, \sigma_{\ast}^2)$ with $\mu_{\ast i} = -\sigma_u^2\tilde{\varepsilon}_i/\sigma^2$ and $\sigma_{\ast}^2 = \sigma_u^2\sigma_c^2/\sigma^2$. This is identical to the original JLMS estimator, except that $\sigma_v^2$ is replaced with $\sigma_c^2$ and $\tilde{\varepsilon}_i$ takes the place of $\varepsilon_i$. The modified JLMS estimator in the presence of endogeneity becomes
$$\mathbb{E}(u_i \mid \varepsilon_i, \boldsymbol{\eta}_i) = \sigma_{\ast}\left(\frac{\phi(\xi_i)}{1 - \Phi(\xi_i)} - \xi_i\right)$$
with $\xi_i = \lambda_c\tilde{\varepsilon}_i/\sigma$, where $\lambda_c = \sigma_u/\sigma_c$ as in Section 5.1. Note that $\mathbb{E}(u_i \mid \varepsilon_i, \boldsymbol{\eta}_i)$ is a better predictor than $\mathbb{E}(u_i \mid \varepsilon_i)$ because $\sigma_c^2 \lt \sigma_v^2$.
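A small numerical sketch may help fix ideas. Assuming a scalar $\eta_i$ and hypothetical parameter values, the function below evaluates the modified predictor through $\xi_i$, taking $\lambda_c=\sigma_u/\sigma_c$ and $\sigma^2=\sigma_u^2+\sigma_c^2$. Setting the covariance between $v$ and $\eta$ to zero recovers the traditional JLMS predictor, and the conditional standard deviation $\sigma_{\ast}$ shrinks once that covariance is nonzero. The function name and arguments are illustrative only.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def modified_jlms(eps, eta, sigma_u2, sigma_v2, sigma_veta, sigma_eta2):
    """Return (E[u | eps, eta], sigma_star) for a scalar eta."""
    mu_c = sigma_veta / sigma_eta2 * eta                # E[v | eta]
    sigma_c2 = sigma_v2 - sigma_veta ** 2 / sigma_eta2  # Var(v | eta)
    sigma2 = sigma_u2 + sigma_c2
    eps_tilde = eps - mu_c
    sigma_star = math.sqrt(sigma_u2 * sigma_c2 / sigma2)
    xi = math.sqrt(sigma_u2 / sigma_c2) * eps_tilde / math.sqrt(sigma2)
    predictor = sigma_star * (STD.pdf(xi) / (1.0 - STD.cdf(xi)) - xi)
    return predictor, sigma_star


# No endogeneity (zero covariance): the traditional JLMS predictor.
u_hat_exog, sd_exog = modified_jlms(0.0, 0.0, 1.0, 1.0, 0.0, 1.0)
# Correlated v and eta: the same eps, but eta now carries information about v.
u_hat_endog, sd_endog = modified_jlms(0.0, 0.5, 1.0, 1.0, 0.6, 1.0)
```

With these hypothetical values, `sd_endog` is smaller than `sd_exog`, which is the sense in which conditioning on $\boldsymbol{\eta}_i$ sharpens the prediction.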
The improvement in prediction follows from the textbook identity for variances: for any random vector $(X, Z)$, where $X$ and $Z$ are random sub-vectors, we have
$$\mathbb{V}(X) = \underbrace{\mathbb{V}[\mathbb{E}(X \mid Z)]}_{\text{Explained}} + \underbrace{\mathbb{E}(\mathbb{V}[X \mid Z])}_{\text{Unexplained}}.$$
In this case, by conditioning on both $\varepsilon_i$ and $\boldsymbol{\eta}_i$, the conditioning set is larger than when conditioning on $\varepsilon_i$ alone, and so it must hold that the unexplained portion of $\mathbb{E}(u_i \mid \varepsilon_i, \boldsymbol{\eta}_i)$ is smaller than that of $\mathbb{E}(u_i \mid \varepsilon_i)$. It then holds that there is less variation in $\mathbb{E}(u_i \mid \varepsilon_i, \boldsymbol{\eta}_i)$ as a predictor than in $\mathbb{E}(u_i \mid \varepsilon_i)$, which is a good thing. A similar result is obtained by [5] in a panel setting, where the new estimator $\mathbb{E}(u_{it} \mid \varepsilon_{i1}, \ldots, \varepsilon_{iT})$ dominates the traditional estimator $\mathbb{E}(u_{it} \mid \varepsilon_{it})$ due to dependence over $t$. While it is not obvious at first glance, one benefit of allowing for a richer dependence structure in the SFM is that researchers may be able to predict firm-level inefficiency more accurately, though this comes at the expense of having to deal with a more complex model. This improvement in prediction may also be accompanied by narrower prediction intervals; however, this is not known, as Amsler et al. [6] did not study the prediction intervals.

A prediction interval for $\mathbb{E}[u_i \mid \varepsilon_i]$ was first derived by Taube [85] and also appeared in [39], [41], and [17] (see a discussion of this in [77]).
The prediction interval is based on $f(u_i \mid \varepsilon_i)$. The lower ($L_i$) and upper ($U_i$) bounds for a $(1-\alpha)100\%$ prediction interval are
$$L_i = \mu_{\ast i} + \Phi^{-1}\left(1 - \left(1 - \frac{\alpha}{2}\right)\left[1 - \Phi\left(-\frac{\mu_{\ast i}}{\sigma_{\ast}}\right)\right]\right)\sigma_{\ast}, \tag{6.5}$$
$$U_i = \mu_{\ast i} + \Phi^{-1}\left(1 - \frac{\alpha}{2}\left[1 - \Phi\left(-\frac{\mu_{\ast i}}{\sigma_{\ast}}\right)\right]\right)\sigma_{\ast}, \tag{6.6}$$
where $\mu_{\ast i}$ and $\sigma_{\ast}$ are defined in (6.1) and (6.2), respectively, and replacing them with their maximum likelihood estimates gives estimated prediction intervals for $\mathbb{E}[u_i \mid \varepsilon_i]$. Using the above result from [6] that $u_i$ conditional on $\tilde{\varepsilon}_i$ is $N_{+}(\mu_{\ast i}, \sigma_{\ast}^{2})$ with $\mu_{\ast i} = -\sigma_u^2 \tilde{\varepsilon}_i / \sigma^2$ and $\sigma_{\ast}^{2} = \sigma_u^2 \sigma_c^2 / \sigma^2$, one can easily obtain analogous prediction intervals for $\mathbb{E}(u_i \mid \varepsilon_i, \boldsymbol{\eta}_i)$. These new intervals will potentially be narrower.

7 Conclusion

In this article, we surveyed the workhorse SFM and various recent extensions that permit a wide range of dependence modeling within SFM. We discussed dependencies that arise in panels and that underpin endogeneity in production systems.
Copulas play a key role in these settings because they naturally permit the construction of a likelihood, often based on simulated draws, while preserving the desired Half Normal distribution of technical inefficiency.

While this is not a survey of SFA applications, it is worth pointing out that SFA has become a popular tool for isolating inefficiency in the behavior of economic agents, e.g., banks and non-financial firms. To give just two examples, Koetter et al. [46] showed that ignoring inefficiency, i.e., assuming all banks in the US economy are on the frontier, leads to substantial downward bias in the estimates of the banks' market power, as measured by the Lerner index of price mark-ups; Henry et al. [38] applied SFA to obtain a correct estimate of TFP at the country level in a panel of 57 developing economies over the 1970–1998 period. The list of applications is long and growing.

This survey's goal was to provide a comprehensive overview of the state of the art in methods of dependence modeling (either directly with noise, through endogeneity, through sample selection or across time), but it is clear that many important issues remain and that this is an active area of research. We remain excited about the potential developments in this area and the insights they can offer in applications. At present, little is known about the direction of the impact of unmodelled, or misspecified, dependence on efficiency scores.

Journal

Dependence Modeling, de Gruyter

Published: Jan 1, 2022

Keywords: efficiency; productivity; panel data; endogeneity; determinants of inefficiency; dependence; copulas; 62J02; 62H05; 91B38
