Disentangling the impact of mean reversion in estimating policy response with dynamic panels

1 Introduction

The phenomenon of regression toward the mean (mean reversion) is observed in longitudinal observations of a variable that is susceptible to random variation. In this case, exceptionally low or high values of the variable in the initial measurement tend to be closer to the center of the distribution in subsequent measurements [24]. In short, mean reversion is an inherent part of a stationary process and implies the return of the process to its mean value [25,31]. For instance, mean reversion may manifest itself in the intersection between the mean and the trajectory of the process as it varies over time [25] or in the return of an autoregressive process to its long-term mean [31].

Historically, the term "mean reversion" is associated with the seminal works by Galton, who discovered an inverse relationship between the heights of parents and children [30] and hence framed the term "regression" as the tendency of the dependent variable to revert to the mean value. Recent examples of the analysis of processes which exhibit mean reversion in various fields of economics include the current account of countries [81] and their productivity [29], profitability of banks [48], housing prices [31], tax avoidance by companies [3], blood pressure and cholesterol levels of patients [5], and birthweight of children in successive pregnancies of the same mother [79].

Mean reversion contaminates judgment about the time profile of the dependent variable in groupwise estimations. If the value of the dependent variable for a certain observation is lower than average in period $t$, it is likely to be higher in period $t+1$ than in period $t$. Similarly, observations with high values in period $t$ tend to be followed by lower values in period $t+1$.
Accordingly, mean reversion leads to an increase in the expected value of the dependent variable in the group of observations belonging to the lower percentiles of $y$, and to a decrease in the expected value in the higher percentiles of $y$. Therefore, the impact of mean reversion needs to be excluded in econometric analysis which evaluates the longitudinal impact of policy interventions on groups of economic agents.

The purpose of this article is to model the multivariate dependence of the variable of policy interest by disentangling the two sources of intertemporal dependence: one from the effect of the policy of interest per se and the other from mean reversion. Specifically, we show a way of separating the effect of mean reversion from the policy effect when evaluating the impact of an incentive scheme with intertemporal stimuli and intertemporal variation of the parameter of reform intensity.

Although mean reversion is inherent to any stationary process, it is most often noted in the analysis of dynamic panels. The dynamic panel data model is a generalization of the panel data fixed-effect regression for cases where the dynamic structure of the process needs to be introduced. In our article we use the example of Medicare's incentive contract applied to the observed quality of services, which has to be described as an autoregressive process. Hence, in evaluating the effect of this incentive scheme on hospital quality, we follow a handful of articles which deal with mean reversion in dynamic panels [25,31,48,81].

We focus on the pay-for-performance mechanism: an innovative method of remuneration which originally emerged in corporate finance and managerial economics and has since been much used in the public sector (civil service, education, social work, and healthcare). In order to quantify the unobserved quality of work, the incentive scheme computes the performance level using imprecisely measured proxies for various dimensions of quality.
Next, the regulator imposes an incentive contract which relates remuneration to performance, so that agents with higher performance in the current period receive higher payment for their services in future periods than agents with lower performance. The reform intensity parameter in this context is the share of the agent's income which is "at risk" under the incentive contract.

Assuming a direct association between demand for services and quality of work, higher payment to agents with high performance incentivizes agents to improve their level of quality in order to raise demand for their services. In such a setting, if the unobserved quality could be measured precisely, each agent would sustain a fixed level of performance.

However, performance is in fact a noisy signal. First, there is imprecision in measuring performance, since it is only a proxy for true quality. Second, in the case of healthcare, the unobserved true quality of services is itself subject to random variation, due, for instance, to patient non-compliance with medical treatment [62]. So it is plausible to assume that performance contains a random error. Hence, performance may unexpectedly be valued as having improved in period $t$ due to this random error, and then the payment in period $t+1$ (which is a function of current performance) will increase. Accordingly, the incentive to improve quality in the future period becomes stronger for agents with higher performance, so the performance of these agents in period $t+1$ will on average be higher than their performance in period $t$. The reverse argument applies in the case of an unexpected lowering of the performance valuation in period $t$.

What therefore happens is that the performance of the economic agent becomes a process with serial correlation. So the evolution of the variable of policy interest when such incentives are applied can be viewed as an autoregressive process.
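The mechanism just described, noisy performance reverting toward an agent-specific mean, can be illustrated with a short simulation. This is a stylized sketch: the AR(1) coefficient, mean, and noise scale below are hypothetical, not estimates from this article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stylized AR(1) performance process: m_t - mu = lam * (m_{t-1} - mu) + eps_t
# (all parameter values are hypothetical)
mu, lam, sigma = 50.0, 0.6, 5.0
n = 100_000

# Draw period-t performance from the stationary distribution, then one step ahead.
m_t = mu + rng.normal(0.0, sigma / np.sqrt(1 - lam**2), n)
m_t1 = mu + lam * (m_t - mu) + rng.normal(0.0, sigma, n)

# Groupwise expected values: low performers rise, high performers fall.
low = m_t < np.quantile(m_t, 0.2)    # bottom quintile in period t
high = m_t > np.quantile(m_t, 0.8)   # top quintile in period t

print(m_t1[low].mean() - m_t[low].mean())    # positive: reversion upward
print(m_t1[high].mean() - m_t[high].mean())  # negative: reversion downward
```

The bottom quintile in period $t$ improves on average in period $t+1$ and the top quintile deteriorates, even though no policy is at work: this is exactly the contamination that a groupwise evaluation must strip out.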
In a situation where the policy variable changes over time, we estimate the unconditional mean of the autoregressive process as a function of the agent's characteristics and of policy intensity. Comparison of the fitted values of the unconditional mean under different values of the reform intensity enables us to identify the reform effect cleared of mean reversion. For instance, we contrast the unconditional means estimated under the values of the policy variable in two consecutive time periods. Alternatively, we compare the fitted value of the unconditional mean in period $t$ with its counterfactual analogue: the unconditional mean at zero value of policy intensity. The article closest to our latter approach in assessing the policy effect in dynamic panels is [48]: the actual value of return on equity (ROE) at merged banks is compared with the fitted value of ROE, measured as the unconditional mean of the AR(1) process for the whole banking industry (i.e., the counterfactual value of ROE in the absence of the merger).

It should be noted that our identification strategy is close to difference-in-differences analysis with a non-binary treatment: the intensity of the reform is the analogue of the treatment variable and the share of Medicare patients at the hospital is the analogue of the variable for the treatment/control groups. If the treatment is binary, our approach does not differ from the conventional difference-in-differences estimation with the interaction term of the pre/post-treatment dummy and the dummy for the treatment/control group.

We use the example of Medicare's value-based purchasing, implemented at the national level in the US since 2013 on the basis of a reward function that relates the aggregate measure of hospital performance to remuneration.
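In a stylized AR(1) with a policy-dependent intercept, the comparison of fitted unconditional means reduces to simple arithmetic. The coefficient values below are hypothetical and serve only to illustrate the counterfactual comparison.

```python
# Stylized AR(1): y_t = c + beta * alpha + lam * y_{t-1} + eps_t
# Unconditional mean: mu(alpha) = (c + beta * alpha) / (1 - lam)
def unconditional_mean(c, beta, lam, alpha):
    return (c + beta * alpha) / (1.0 - lam)

c, beta, lam = 20.0, 200.0, 0.5   # hypothetical fitted coefficients

mu_reform = unconditional_mean(c, beta, lam, alpha=0.02)         # observed intensity
mu_counterfactual = unconditional_mean(c, beta, lam, alpha=0.0)  # no reform

# The reform effect cleared of mean reversion is the gap in long-run means.
effect = mu_reform - mu_counterfactual
print(effect)  # beta * alpha / (1 - lam) = 200 * 0.02 / 0.5 = 8.0
```

The transient dynamics (the mean-reverting part governed by `lam`) cancel in the difference, which is the sense in which the comparison of unconditional means is "cleared of" mean reversion.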
Overall, applications of pay-for-performance are very numerous in healthcare, since healthcare is the classic example of an industry with asymmetric information where sustained quality of service is extremely important. Research in health economics is vulnerable to random shocks in the dependent variable and hence to the phenomenon of mean reversion. Yet, as regards incentive schemes, to the best of our knowledge, only one article explicitly discusses the impact of random variation of quality [62] and only a few articles point to the need for reassessing the impact of Medicare's pay-for-performance incentive mechanisms in view of the potential impact of mean reversion [58,63].

Our estimations of the association between the observed level of prior quality and measured quality improvement employ nationwide data for 2,984 acute-care Medicare hospitals which were financed according to the quality-incentive mechanism in 2013–2019. The empirical approach uses annual variation in the size of quality incentives in order to estimate the effect of pay-for-performance cleansed of mean reversion. We control for other potential channels of quality improvement by Medicare hospitals, using data on the Hospital Readmissions Reduction Program (HRRP) and on the meaningful use of Electronic Health Records (EHR).

We find that the higher the quintile of the composite quality measure at Medicare hospitals, the larger the estimated effect of the reform. Our empirical results suggest that the stylized fact of an inverse relationship between improvement owing to the incentive scheme and baseline performance should be revisited.
This inverse relationship has been found by most empirical assessments of the impact of incentive contracts on healthcare quality and seems to hold for various designs of pay-for-performance: it is observed for general practitioners in the UK; physician groups in California, Chicago, and Ontario; US hospitals in Michigan, New York, and Wisconsin; and hospitals involved in Medicare's pilot project for quality improvement [19,26,34,40,42,51,52,63,67,76]. However, we argue that the finding of an inverse relationship may be incorrect when the empirical approach fails to account for the impact of random shocks on the time profile of quality under the intertemporal incentive scheme.

The remainder of the article is structured as follows. Section 2 reviews the design of Medicare's quality incentive and sets up the framework for evaluating its outcomes. Section 3 outlines the empirical methodology, and Section 4 describes the data for Medicare hospitals. The results of the empirical analysis are presented in Section 5. Section 6 contains a discussion of our approach in view of conventional methods for policy evaluation, and Section 7 supports the quantitative findings of our analysis by suggesting potential channels used for quality improvement at hospitals.

2 Medicare's incentive contract

2.1 Policy setting

The mechanism provides an incentive proportional to measured quality and has been applied to discharges in the inpatient prospective payment system at acute-care Medicare hospitals since 2013. (Two US states are exceptions to the rule: Puerto Rico, which only started innovating its healthcare system in 2015, and Maryland, which has a unique model for hospital financing.) The scheme reduced Medicare's base payment (the base payment is linked to each diagnosis-related group) to each hospital by a factor $\alpha_t$, which equaled 0.01 in 2013. The amount of the reduction was increased annually by 0.0025 in 2014–2017 and has remained flat at 0.02 since 2017.
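The schedule of the reduction factor can be written out directly from the dates in the text; the function below is a sketch of that schedule.

```python
def alpha(year):
    """Share of base payment withheld under the scheme, per the schedule above."""
    if year < 2013:
        return 0.0  # reform not yet in force
    if year <= 2017:
        # 0.01 in 2013, raised by 0.0025 each year through 2017
        return round(0.01 + 0.0025 * (year - 2013), 4)
    return 0.02     # flat at 0.02 since 2017

print([alpha(y) for y in range(2013, 2020)])
# [0.01, 0.0125, 0.015, 0.0175, 0.02, 0.02, 0.02]
```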
Note that $\alpha$ is the parameter of the reform intensity, varying over time, and $\alpha = 0$ would correspond to a counterfactual setting with the absence of the reform.

The accumulated saving from the reduction in base payment is redistributed across hospitals according to an adjustment coefficient, which is computed as a linear function of the composite quality measure: $1+\left(\kappa_t \frac{m_{it}}{100}-1\right)\cdot \alpha_t$, where $i$ is the index of a hospital, $t$ indicates the year, and $m_{it}$ is the hospital's total performance score (TPS), with $0 \le m_{it} \le 100$. A hospital is rewarded in period $t+2$ if the adjustment coefficient based on $m_{it}$ is above one and is penalized otherwise. The quality incentive scheme is budget-neutral and the value of the slope $\kappa_t$ is chosen to ensure budget neutrality, so that hospitals with a value of TPS above the empirical mean gained under the reform. In the first years of the reform $\kappa_t$ was close to 2, so hospitals with a value of the composite quality measure above 50 were winners from the incentive scheme.

The TPS is a weighted sum of scores for measures in several domains: timely implementation of recommended medical interventions (clinical process of care), quality of healthcare as perceived by patients (patient experience of care), survival rates for AMI, heart failure, and pneumonia patients and other proxies for outcome of care, healthcare-associated infections and other measures of safety of care, and spending per beneficiary as a measure of efficiency of care. The domain score is the sum of the scores for its measures. A higher score for a measure reflects a higher position of the hospital in the empirical distribution of the quality measure in a given year or a greater improvement of the quality measure relative to the baseline period.
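Budget neutrality pins down the slope: the withheld funds must equal the redistributed funds, which gives $\kappa_t = 100 / (\text{payment-weighted mean TPS})$. The sketch below uses hypothetical base payments and scores to show why $\kappa \approx 2$ when the mean TPS is near 50.

```python
import numpy as np

def budget_neutral_kappa(base, tps):
    """Solve sum(base * alpha * (kappa * tps / 100 - 1)) = 0 for kappa."""
    return 100.0 * base.sum() / (base * tps).sum()

base = np.array([1.0, 1.0, 1.0, 1.0])     # hypothetical base payments
tps = np.array([30.0, 50.0, 60.0, 60.0])  # hypothetical TPS values, mean 50

kappa = budget_neutral_kappa(base, tps)
print(kappa)  # 2.0, since the payment-weighted mean TPS is 50

alpha = 0.02
adjust = 1 + (kappa * tps / 100 - 1) * alpha
print(base @ (adjust - 1))  # net transfers sum to approx 0: budget-neutral
```

With $\kappa = 2$, hospitals scoring above 50 get an adjustment coefficient above one (a reward) and hospitals below 50 are penalized, matching the cutoff described in the text.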
Specifically, achievement points are computed for each measure evaluating a hospital's performance relative to other hospitals in a given year, and improvement points for each measure are computed to assess the change in the hospital's own performance in the given year relative to the baseline period. Then, for each measure, the higher of the two (achievement points or improvement points) is used as the hospital's score for that measure.

A hospital's intertemporal incentive in Medicare's scheme is based on the expectation that the quality payments will continue over the long term, so the hospital's executives and physicians realize that demand is proportionate to quality and that their current policies toward quality of care will influence future reimbursement [46,73].

2.2 Autoregressive process and quality convergence

The evolution of measured quality constitutes a process with serial correlation. If the process for measured quality is stationary, then it may be treated as an autoregressive process
$$m_t - \mu(\theta) = \phi_1 (m_{t-1} - \mu(\theta)) + \cdots + \phi_p (m_{t-p} - \mu(\theta)) + \varepsilon_t.$$
Here $\mu(\theta) = E(m_t \mid \theta)$ denotes the mean value of measured quality for a hospital of type $\theta$.
As the absolute values of the reciprocals of the roots of the characteristic equation of an AR($p$) process are less than one, the maximum absolute value of these reciprocals (denoted $\lambda$) may be used as the measure of persistence for the process of measured quality [74]. Using the definitions in [29], we can disentangle a permanent component in $m_t$, which is related to the economic impact of pay-for-performance, from a transient component (a pure dynamic effect), which may be referred to as "mean reversion" or "regression toward the mean" [30].

The reason for the phenomenon of mean reversion is the existence of the random error $\varepsilon_t$ in the measured quality $m_t$. Indeed, in the absence of $\varepsilon_t$ the process quickly converges to its mean $\mu(\theta)$ and does not exhibit mean reversion because it always sits at the mean. The random error in measured quality is largely attributed to imprecision in quality measurement: it is hard to reveal true quality using observable proxies. Another reason is random variation in true quality, which may be explained by the fact that patients do not always comply with the prescribed treatment [62]. Combined with the fact that hospitals make an intertemporal decision in respect of quality-based reimbursement, the random error leads to the autoregressive form of measured quality $m_t$.

The autoregressive specification can be taken as equivalent to convergence of measured quality toward the value $\mu(\theta)$, and $\lambda$ is associated with the speed of quality convergence. The persistence parameter $\lambda$ essentially describes how quickly the effect of any unexpected shock in the value of the dependent variable fades over time. For example, consider a simple AR(1) process with $0 < \lambda < 1$ and the conditional mean $E(m_t \mid m_{t-1}, \theta) = \mu(\theta) + \lambda (m_{t-1} - \mu(\theta))$.
Here the expected value of current measured quality $E m_t$ is closer to the mean value $\mu(\theta)$ than is the value of measured quality in the previous period, $m_{t-1}$. The expression for $E(m_t \mid m_{t-1}, \theta)$ becomes more complicated for AR($p$) processes with $p > 1$, but $\lambda$ can still be used as a measure of persistence of the process.

The hospital receives higher profits for improvement of performance under higher values of $\alpha$ than under lower values of $\alpha$. This, combined with the serial correlation between performance in consecutive periods, implies a direct association between the persistence parameter $\lambda$ and $\alpha$. Higher values of $\lambda$ imply a lower rate of convergence of quality and hence a weaker effect of mean reversion.

2.3 Expected outcomes of the reform and time profiles of the quality measure

2.3.1 Mean effect of the reform

The payment schedule makes the hospital adjustment coefficient a linear function of TPS, so each hospital has an incentive to raise the value of the observed composite quality measure. Hence, the introduction of pay-for-performance is expected to have a positive effect on the mean value of the composite quality measure. Indeed, the mean level of hospital performance improved even in the case of a continuous reward function applied to hospitals above the threshold values of quality indicators (Medicare's pilot program, Phase I) [18,34,37,52,68]. Specifically, the value of the composite performance score at Medicare's pay-for-performance hospitals was higher than in the control group of hospitals [52,78].
Moreover, sociological evidence points to the fact that hospitals participating in incentive schemes are likely to improve performance as they implement a larger number of quality-improving activities that non-incentivized hospitals do not carry out [41]. The higher the value of $\alpha$, the higher may be the hospital's loss under the reform in case of an insufficient value of TPS. Indeed, the empirical evidence points to larger incentives being more effective than smaller ones in such reforms [8,15,60]. Accordingly, the expected mean effects of the reform may be formulated as follows:

Hypothesis H1a: The introduction of pay-for-performance and the increase of parameter $\alpha$ in the context of pay-for-performance lead to a positive mean effect on observed quality.

Hypothesis H1a implies that hospitals can be treated as agents which take their future payments into account. The intertemporal stimuli result in mean reversion with respect to observed quality. However, the strength of mean reversion is interrelated with parameter $\alpha$ as follows:

Hypothesis H1b: The increase in the share of hospital funds at risk in pay-for-performance weakens the effect of convergence of measured quality to the mean value.

2.3.2 Groupwise effects of the reform

We assume that the effect of Medicare's reform will be larger at hospitals with higher quality, based on findings in the health policy literature that emphasis on quality improvement in incentive schemes is greater at high-quality hospitals or among high-quality physicians in comparison with low-quality hospitals and physicians [21,37,69,77,78]. For instance, [77] conducted structured surveys at hospitals in the top two and bottom two deciles of the performance measure in Medicare's pilot program and discovered stronger involvement in quality-improving activities among top-performing hospitals.
Statistically significant differences between top- and bottom-performing hospitals were observed in the numerical values assigned to the following components of quality improvement: organizational culture, multidisciplinary teams, "adequate human resources for projects to increase adherence to quality indicators," and "new activities or policies related to quality improvement" (Tables 3 and 4 on pp. 836–837). Interviews with the leaders of California physician organizations [21] similarly discovered that physicians with high performance placed higher emphasis on the support that "the organization dedicates to addressing quality issues" than medium- and low-performing physicians (Exhibit 3, p. 521). Moreover, papers that apply policy evaluation techniques to assessment of the effect of the pilot pay-for-performance program at Medicare hospitals report that hospitals in the top two deciles of quality measures showed the fastest improvement, while hospitals in the lowest deciles raised their quality to a much lesser extent or may even have failed to improve [69,78].

To sum up, the hypothesis on groupwise effects of pay-for-performance is as follows:

Hypothesis H2: The introduction of pay-for-performance leads to a larger boost of measured quality at high-quality hospitals than at low-quality hospitals.

2.3.3 Net total effect over time at groups of hospitals

Consider the multivariate dependence of the variable of interest on two sources of intertemporal dependence: the policy reform and mean reversion. The effect of mean reversion implies a differential time profile of measured quality: measured quality increases at hospitals in low percentiles of the quality distribution and decreases at hospitals in high percentiles. Combined with the positive effect of pay-for-performance on the mean value of measured quality (Hypothesis H1a), mean reversion is likely to result in a heterogeneous net total effect of change in measured quality over time.

Hypothesis H3a:
High-quality hospitals experience a decrease of measured quality owing to regression toward the mean. However, the introduction of pay-for-performance and the increase of the share of hospital funds at risk in pay-for-performance lead to improvements in measured quality at these hospitals. The net total effect may vary.

Hypothesis H3b: Low-quality hospitals increase their measured quality owing to regression toward the mean. The introduction of pay-for-performance and the increase of the share of hospital funds at risk in pay-for-performance also cause a rise in measured quality, so the net total effect at these hospitals is positive.

If $\alpha$ is gradually raised in the course of implementation of the incentive scheme, then, according to H1b, convergence of measured quality weakens over time. The net total effect at high-quality hospitals is the sum of the positive effect of the quality incentive and the negative effect of quality convergence. With the increase in $\alpha$, the number of hospitals where the positive effect outweighs the negative becomes larger.

Hypothesis H3c: The increase of hospital funds at risk under pay-for-performance weakens the effect of convergence of measured quality, so the number of high-quality hospitals with a negative net total effect decreases.

3 Empirical approach

3.1 Specification

The dependent variable $y_{it}$ is the TPS of hospital $i$ in year $t$. The value of $y_{it}$ is used for remuneration of Medicare hospitals at time $t+2$, so we employ the second-order dynamic panel. Formally, a model with at least two lags must be used for describing the evolution of the TPS.
However, the coefficients for the variables with the third lag of TPS turned out to be insignificant in our empirical estimation, so we treat TPS as an AR(2) process:

$$y_{it} = \phi_0 + \phi_1 y_{it-1} + \phi_2 y_{it-2} + \phi_3 \alpha_t s_{it} + \phi_4 \alpha_t s_{it} y_{it-1} + \phi_5 \alpha_t s_{it} y_{it-2} + \delta_0 s_{it} + z_{it}' \delta_1 + \alpha_t s_{it} \cdot z_{it}' \delta_2 + d_t' \delta_3 + u_i + \varepsilon_{it}, \qquad (1)$$

where $z_{it}$ are hospital time-varying characteristics; $u_i$ are individual hospital effects (in particular, they incorporate altruistic effects); the size of quality incentives $\alpha_t$ varies across years and enters the equation multiplied by the share of Medicare discharges $s_{it}$, which indicates that the quality incentives apply only to treatment of Medicare patients; and $d_t$ is a set of dummy variables which capture external time effects (effects unrelated to hospital decisions). The following restrictions are used to identify the constant term $\phi_0$: the sum of the coefficients for the components of $d_t$ is normalized to zero, and the expected value $E(u_i) = 0$.

The hospital time-varying characteristics are the disproportionate share index, the casemix index, and the number of hospital beds. As the distribution of the number of hospital beds is extremely skewed, we take the log of hospital beds. This approach is in line with [22] and makes it possible to account for a nonlinear effect of hospital beds. It is less restrictive than the alternative approach, employing a list of dummies based on ranges of hospital beds (e.g., fewer than 99, 100–199, etc.).
Use of a list of dummies would condemn the effect to be piecewise constant, prohibiting variation within a category of hospitals with a given range of beds. The other time-varying characteristics are the physician-to-bed ratio and the nurse-to-bed ratio. The posterior analysis of the effect of quality incentives deals with hospital grouping according to time-invariant characteristics which could not be incorporated in the empirical specification with fixed effects: the geographic region where the hospital is located, public ownership, urban location, and teaching status.

We use two hospital control variables which affect quality improvement and allow us to mitigate potential biases that might occur if the pay-for-performance effect were identified based only on the variation of $\alpha$ over time. The HRRP penalty captures the impact of a simultaneously adopted incentive program with similar incentives. Moreover, the readmission reduction program targets improvement of quality measures which are components of the TPS (30-day unplanned readmission rates for acute myocardial infarction, heart failure, and pneumonia). The binary variable for successful attestation of meaningful use of EHR accounts for the effect of another compulsory program, which provides bonuses to attested hospitals. The variable controls for the fixed cost incurred by a hospital to improve its quality through installing and using health information technology systems.

Eq. (1) can be estimated using the generalized method of moments: the [2] and [12] methodology for dynamic panel data. Examples of the use of this methodology in health economics include analysis of the quality of care at Medicare hospitals in [56], a study of length of stay at Japanese hospitals in [10], and investigations of labor supply by Norwegian physicians in [4] and of the health status of individuals in the US in [57].

The first set of moment conditions in GMM comes from the approach of [2] and [12]. We take the first difference of both sides of Eq.
(1): (2)Δyit=ϕ1Δyit−1+ϕ2Δyit−2+ϕ3Δ(αtsit)+ϕ4Δ(αtsityit−1)+ϕ5Δ(αtsityit−2)+δ0Δsit+Δ(zit)′δ1+Δ(αtsit⋅zit)′δ2+Δdt′δ3+Δεit.\Delta {y}_{it}={\phi }_{1}\Delta {y}_{it-1}+{\phi }_{2}\Delta {y}_{it-2}+{\phi }_{3}\Delta \left({\alpha }_{t}{s}_{it})+{\phi }_{4}\Delta \left({\alpha }_{t}{s}_{it}{y}_{it-1})+{\phi }_{5}\Delta \left({\alpha }_{t}{s}_{it}{y}_{it-2})+{\delta }_{0}\Delta {s}_{it}+\Delta \left({z}_{it})^{\prime} {\delta }_{1}+\Delta \left({\alpha }_{t}{s}_{it}\cdot {z}_{it})^{\prime} {\delta }_{2}+\Delta {d}_{t}^{^{\prime} }{\delta }_{3}+\Delta {\varepsilon }_{it}.Since εit{\varepsilon }_{it}cannot be predicted using the information available at period t−1t-1, εit{\varepsilon }_{it}is uncorrelated with any variable known at time t−1t-1, t−2t-2etc. Therefore, Δεit\Delta {\varepsilon }_{it}is uncorrelated with any variable known at time t−2t-2, t−3t-3etc. Hence, the following set of moment conditions can be imposed to estimate the model parameters in Eq. (2), see [2] and [12]: t=3:E(Δei3Zi1)=0,t=4:E(Δei4Zi1)=0,E(Δei4Zi2)=0,t=5:E(Δei5Zi1)=0,E(Δei5Zi2)=0,E(Δei5Zi3)=0,etc.\begin{array}{rcl}t=3:& & E\left(\Delta {e}_{i3}{Z}_{i1})=0,\\ t=4:& & E\left(\Delta {e}_{i4}{Z}_{i1})=0,\hspace{1.0em}E\left(\Delta {e}_{i4}{Z}_{i2})=0,\\ t=5:& & E\left(\Delta {e}_{i5}{Z}_{i1})=0,\hspace{1.0em}E\left(\Delta {e}_{i5}{Z}_{i2})=0,\hspace{1.0em}E\left(\Delta {e}_{i5}{Z}_{i3})=0,\\ \hspace{0.1em}\text{etc.}\hspace{0.1em}& & \end{array}where eit{e}_{it}is the regression residual and Zit{Z}_{it}is any variable known at time tt.For instance, yit{y}_{it}may serve as Zit{Z}_{it}.Another set of moment conditions comes from [12] for the level Eq. 
(1): $u_i + \varepsilon_{it}$ has to be uncorrelated with $\Delta Z_{it-1}$ for any stationary variable $Z_{it}$, where $Z_{it-1}$ is known at time $t-1$:

$$E((u_i + e_{it}) \Delta Z_{it-1}) = 0, \quad t = 3, 4, \ldots \qquad (3)$$

So $Z_{it}$ includes lagged values of predetermined and endogenous variables (the first set of moment conditions) and differenced predetermined and endogenous variables (the second set of moment conditions). All moment conditions are formulated separately for different years, so the number of observations for asymptotics equals the number of hospitals. (The separate formulation of moment conditions for different years makes it impossible to apply the exclusion restrictions which are commonly used in the instrumental variables approach.)

More specifically, the lagged value of TPS and the other hospital control variables in $z_{it}$ (beds, physician-to-bed and nurse-to-bed ratios, the HRRP penalty, and the binary variable for hospital EHR attestation) are taken as predetermined and do not require the use of instruments in estimations. Casemix and the disproportionate share index are assumed to be endogenous: we rely on the empirical evidence of manipulation by hospitals of patient diagnoses (i.e., of casemix) and of reluctance to admit low-income patients under quality-incentive schemes [17,23,28]. We assume that the Medicare share is endogenous, too: this may be explained by the demand-side response of Medicare patients to publicly reported hospital quality [44,53,72].

It should be noted that the use of dynamic panel data methodology requires justification on economic grounds. This is because the approach uses lags and lagged differences as instruments, and there are potential problems with using lags as instruments even when they pass the Arellano-Bond tests.
Specifically, lags may prove to be weak and invalid [7]: weakness may occur when lags are distant [59], and invalidity arises due to overfitting of the endogenous variable under large $T$ [66]. However, neither of these problems (weakness and invalidity) is likely to be present in our analysis, since we restrict our instruments to only the first appropriate lag.

The validity of instruments is assessed through the statistics of the Arellano-Bond test. We employ [80] robust standard errors for estimation. (The Sargan statistic may be used in dynamic panels for assessing validity of instruments under the homoskedasticity assumption, but it is not applicable to our specification with robust standard errors.)

But formal tests are insufficient for establishing a causal relationship in models which use an instrumental variable approach [1,7]. Accordingly, it is necessary to provide an economic justification for the exclusion restriction of the instruments, i.e., that the instruments are exogenous and impact the dependent variable through no channels other than the endogenous variable and, possibly, also through exogenous covariates. An example of such justification on theoretical grounds can be found in [6], who uses lags of GDP and lags of the inflation rate as instruments for GDP and inflation. Another way of arguing for the exclusion restriction is given in [38], which estimates per capita output in various countries as a function of social infrastructure. Owing to endogeneity of social infrastructure, variables related to exposure to Western culture are used as instruments, and there is a discussion of the absence of any direct channels through which these variables could impact a country's per capita output.

We follow the latter approach to provide an economic justification for the validity of instruments in the dynamic panel data model for the composite quality measure at Medicare hospitals.
Our arguments below, which advocate the applicability of lagged first differences as instruments for the level equation (1) and of first lagged levels as instruments for the difference equation (2), are based on the plausible assumption of a short adjustment period in the values of the dependent variable. Specifically, we assume that hospital managers take prompt action upon learning the TPS in year $t$, so that adjustment is observed in the next period and is not delayed until a more remote future. This assumption is supported by interviews with hospital managers [21,46,37,55,73,77], which show real-time assessment of the performance of hospital personnel and immediate feedback initiatives aimed at correcting possible lapses in quality. For instance, at Medicare hospitals which participated in the pilot pay-for-performance program, “progress reports were routinely delivered to hospital leadership and regional boards” ([37], p. 45S). Hospital-specific and physician-specific compliance reports were collected at least every 1.5 months on average, and the results of these reports were delivered to individual physicians once in 5 months on average at both top-performing and bottom-performing hospitals ([77], Table 4, p. 837). As regards nationwide implementation of pay-for-performance at Medicare hospitals, the TPS is calculated annually, but values for the quality dimensions of the TPS are made publicly available on a quarterly basis. (Exceptions, which are updated annually, are one measure in the clinical process of care domain (influenza immunization), one measure in the safety domain (PSI-90), and a measure in the efficiency domain, Medicare spending per beneficiary. See measure dates in the quarterly data archives available at https://data.cms.gov/provider-data/archived-data/hospitals.) Frequent announcements of quality scores make it possible to expedite quality adjustment at each hospital and to improve the value of the TPS within a year.
For instance, the survey of hospital CEOs, physicians, nurses, and board members showed that, since implementation of the value-based purchasing program, “data were shared with their board and discussed at least quarterly with senior leadership” ([55], p. 435).

As regards our formal analysis, equation (1) has TPS as the dependent variable and its first and second lags as explanatory variables. $\Delta y_{t-1}$ is used as an instrument for $y_{t-1}$. We assume that the change in TPS from period $t-2$ to $t-1$, i.e., $\Delta y_{t-1}$, which is observed at a hospital at $t-1$, is immediately followed by the hospital's action in period $t-1$. So the instrument $\Delta y_{t-1}$ affects the dependent variable $y_t$ through the endogenous variable $y_{t-1}$, i.e., through improved quality in period $t-1$ (and potentially also through the predetermined variable $y_{t-2}$, i.e., quality adjustment may start as early as period $t-2$), but not through other channels. Without the short adjustment period, these other channels might have included some postponed effects which only come into effect in period $t$. Note that the equation has hospital control variables, and we follow the empirical literature on the US Medicare reform by treating some of them as endogenous. One such variable, the share of Medicare patients, reflects the desire of the regulator to sign contracts with the hospital to treat Medicare patients, and it is a function of the hospital's quality-enhancing efforts [46]. Our empirical strategy relies on the fact that $\Delta x_{t-1}$ is an excludable instrument for $x_t$. It is, indeed, plausible to assume that an increase of quality efforts from period $t-2$ to period $t-1$ results in a positive value of $\Delta s_{t-1}$ (where $s_t$ denotes the share of Medicare patients) and impacts the value of the TPS in period $t$.
A similar argument applies to another endogenous control variable, casemix, which reflects the share of patients with complicated diagnoses. If we ignore potential dumping of patients by hospitals, hospitals are interested in treating patients with complicated diagnoses, since compensation in the system of diagnosis-related groups is higher for severe cases. But patient demand responds to public reports on hospital quality [20,42,44], so the share of Medicare cases becomes a function of hospital quality.

The other equation, (2), models first differences, i.e., changes in quality. The dependent variable is $\Delta y_t$, and it is a function of the endogenous variable $\Delta y_{t-1}$, the predetermined variable $\Delta y_{t-2}$, and the differences in the values of the hospital control variables $\Delta x_t$. The instrument for $\Delta y_{t-1}$ is $y_{t-2}$, and the instrument for each endogenous hospital control variable is $x_{t-2}$. Following the above logic about the prompt response of TPS to its values in the previous period, we presume that $y_{t-2}$ will affect the change in the value of the TPS from period $t-2$ to period $t-1$. So $y_{t-2}$ impacts $y_t$ through $\Delta y_{t-1}$ (and potentially even through the predetermined variable $\Delta y_{t-2}$), but not through other channels (i.e., not through processes that occur as late as period $t$). Similarly, upon learning the value of $x_{t-2}$, hospitals speedily adjust their quality; this changes $\Delta x_t$, which affects $\Delta y_t$.

Note that [56] used similar arguments in discussing the applicability of the dynamic panel data model to the analysis of in-hospital mortality and the complication rate, which are used as measures of hospital quality at US Medicare hospitals.
They write: “We believe our approach is appropriate because (i) changes to in-hospital mortality and complications should be immediately affected by changes in staffing levels, not after a long adjustment period, and (ii) the influence of the past is incorporated through the lagged value of the dependent variable” (p. 296, Footnote 3).

A related study applying dynamic panel data models to hospital performance indicators deals with the average length of stay at Japanese acute-care hospitals that plan to introduce a prospective payment system [10]. The variable is treated in Japan as a proxy for hospital efficiency. It is regularly monitored and analyzed by the regulator and by hospital management, with feedback actions by hospital personnel in response to annual updates on levels of the variable [9,10,43,45,75]. Accordingly, the assumption of a short adjustment period for the length of stay is likely to hold at Japanese hospitals, and the use of lagged levels and lagged differences as instruments is justified.

Note that potential violations of the exclusion restriction may occur in instances where the quality measure requires long periods to adjust. In such instances, the causal impact of the Medicare reform on the quality of care cannot be established [1,7].

We note other limitations of our approach. First, the analysis deals with the composite quality measure. While the quality-related efforts of a hospital and the TPS composite quality measure are multi-dimensional, we do not touch upon multi-tasking in the empirical estimations. Our approach considers a one-dimensional effort, a one-dimensional true quality, and its measurable proxy. Note that in the case of Medicare's formula, the true multi-dimensional quality of hospitals (and hence quality-related efforts) is transformed into measured quality (i.e., TPS) in a non-linear manner, owing to the step-wise scale used for computing the points for each measure.
We might nonetheless assume that quality is transformed into TPS monotonically and can be linearized in the empirical part of the article. Several arguments support this conjecture. First, the data for Medicare hospitals show that no hospital attains the highest possible step-wise values for all its measures. So even the best hospitals have an incentive to work on increasing at least one of their measure scores in order to improve TPS. Second, we can neglect disincentives within the step-wise scale used for aggregating measure scores, which may cause deterioration of quality at hospitals that are already positioned at the highest step. Such hospitals could afford only a slight decrease of their quality (due to slackened efforts) while remaining at this step, and the impact on the value of TPS of a fall in quality in only one quality measure would be negligible. Third, interviews with executives of hospitals subject to value-based purchasing show that a hospital rarely gives special attention to a given subset of measures or shifts its administrative and other efforts across measures. All dimensions of TPS are monitored, and actions to improve each dimension are implemented [46,73].

Second, we do not touch on the rules for computing the scores of each dimension of the composite measure or on the aggregation of dimension scores. It is important to note that Medicare uses whichever is higher, improvement points or achievement points, as the score for each dimension. The choice between achievement and improvement points stimulates low-performing hospitals, and the uniform formula assumes that all groups of hospitals have equal margin for improvement.
A minor exception is the protection of hospitals above the benchmark value of the 95th percentile of a corresponding measure score: these hospitals receive 10 points for their achievement on a $[0,10]$ scale, while the maximum number of points for improvement by any hospital is 9. The approach used by Medicare contrasts with the methodology used in France, where all providers are stimulated according to improvement, while only providers with quality above the mean value are also rewarded for their achievement [14].

Third, the weighting of scores across domains is another feature of the design of the incentive mechanism which is not analyzed in our article. So the dichotomous variables for annual periods in the empirical specification capture time effects unrelated to Medicare's value-based purchasing, as well as time effects not associated with the size of incentives but potentially linked to changes in other elements of the reform design (i.e., changes in weights).

Finally, conventional policy evaluation using a control group of hospitals is not possible because quality measures for non-Medicare hospitals are not available. The TPS and all its components are available only for hospitals in the Hospital Compare database. The Hospital Compare database does include a small group of non-incentivized hospitals together with value-based purchasing hospitals: children's hospitals and critical-access hospitals. But both groups offer a special type of healthcare and are not comparable with acute-care hospitals. Moreover, critical-access hospitals usually have no more than 20 beds, which makes it impossible to find a close match with acute-care hospitals.
See [70] for an attempt at matching acute-care and critical-access hospitals. The empirical part of the article therefore focuses solely on pay-for-performance hospitals and identifies the effect of quality incentives based on variation in $\alpha_t$ and in the share of Medicare patients in the hospital, $s_{it}$. Variation in $\alpha_t$ plays the role of the dummy for treatment/pre-treatment periods, and variation in $s_{it}$ acts similarly to the dummy for the treatment/control groups.

3.2 Multivariate dependence of the quality variable

3.2.1 Calculation of the mean in the autoregressive process

We interpret the second-order dynamic panel (1) as a second-order autoregressive process. The coefficients of the first and second lags of $y_{it}$ in this AR(2) process are equal to $\phi_1+\phi_4\alpha_t s_{it}$ and $\phi_2+\phi_5\alpha_t s_{it}$, respectively. Note that both coefficients are linear functions of $\alpha_t$. While the standard form of the AR(2) process contains only the lags of the dependent variable, the right-hand side of our empirical equation includes various hospital characteristics and control variables.

To test the hypotheses which concern the mean value of measured quality $\mu$, we compute the mean fitted value of $y_{it}$ as follows. For a fixed value of $\alpha$ we take the unconditional expected values of both sides of (1) and denote $\mu(\alpha)=E(y_{it})$:

(4)$$\begin{array}{rcl}\mu(\alpha)&=&\phi_0+\phi_1\mu(\alpha)+\phi_2\mu(\alpha)+\phi_3\alpha E(s_{it})+\phi_4\alpha E(s_{it})\mu(\alpha)+\phi_4\alpha\,\mathrm{cov}(s_{it},y_{it-1})+\phi_5\alpha E(s_{it})\mu(\alpha)\\&&+\,\phi_5\alpha\,\mathrm{cov}(s_{it},y_{it-2})+\delta_0 E(s_{it})+E(z_{it})'\delta_1+\alpha E(s_{it}z_{it})'\delta_2+E(d_t)'\delta_3,\end{array}$$

where $E(d_t)'\delta_3=0$ because of the normalization of the coefficients $\delta_3$ in (1). After collecting the terms with $\mu$ and rearranging them, we obtain:

$$\mu(\alpha)=\frac{\phi_0+\phi_3\alpha E(s_{it})+\delta_0 E(s_{it})+E(z_{it})'\delta_1+\alpha E(s_{it}z_{it})'\delta_2+\phi_4\alpha\,\mathrm{cov}(s_{it},y_{it-1})+\phi_5\alpha\,\mathrm{cov}(s_{it},y_{it-2})}{1-\phi_1-\phi_2-\phi_4\alpha E(s_{it})-\phi_5\alpha E(s_{it})}.$$

Since $\alpha$ differs across $t$, we use sample means across hospitals for fixed $t$ to obtain estimates of the expectations. The estimate of $\mu(\alpha)$ is constructed by replacing the expected values and covariances by the corresponding sample means and sample covariances:

$$\mu(\alpha)=\frac{\phi_0+\phi_3\alpha\bar{s}+\delta_0\bar{s}+\bar{z}'\delta_1+\alpha\overline{sz}'\delta_2+\phi_4\alpha\,\widehat{\mathrm{cov}}(s,L(y))+\phi_5\alpha\,\widehat{\mathrm{cov}}(s,L^2(y))}{1-\phi_1-\phi_2-\phi_4\alpha\bar{s}-\phi_5\alpha\bar{s}}.$$

Note that the expression for $\mu(\alpha)$ does not contain the time effects $d_t'\delta_3$, as they represent shifts in quality which are common to all hospitals and are caused by external circumstances.

3.2.2 Intertemporal dependence due to the policy reform

The policy parameter $\alpha$ increases in
2013–2017 and remains unchanged in 2017–2019. As follows from hypothesis $H1a$, the value of $\mu(\alpha_t)$ is expected to increase through 2013–2017 and to become flat in 2017–2019. Accordingly, we examine the difference between $\mu(\alpha_t)$ and $\mu(\alpha_{t-1})$:

$$\begin{array}{rcl}\mu(\alpha_t)-\mu(\alpha_{t-1})&=&\dfrac{\phi_0+\phi_3\alpha_t\bar{s}+\delta_0\bar{s}+\bar{z}'\delta_1+\alpha_t\overline{sz}'\delta_2+\phi_4\alpha_t\widehat{\mathrm{cov}}(s,L(y))+\phi_5\alpha_t\widehat{\mathrm{cov}}(s,L^2(y))}{1-\phi_1-\phi_2-\phi_4\alpha_t\bar{s}-\phi_5\alpha_t\bar{s}}\\&&-\,\dfrac{\phi_0+\phi_3\alpha_{t-1}\bar{s}+\delta_0\bar{s}+\bar{z}'\delta_1+\alpha_{t-1}\overline{sz}'\delta_2+\phi_4\alpha_{t-1}\widehat{\mathrm{cov}}(s,L(y))+\phi_5\alpha_{t-1}\widehat{\mathrm{cov}}(s,L^2(y))}{1-\phi_1-\phi_2-\phi_4\alpha_{t-1}\bar{s}-\phi_5\alpha_{t-1}\bar{s}}.\end{array}$$

The null hypothesis

$$H_0:\ \mu(\alpha_t)-\mu(\alpha_{t-1})=0$$

is tested against the positive alternative. Equivalently, we compute the difference between $\mu(\alpha)$ and $\mu(0)$:

$$\mu(\alpha)-\mu(0)=\frac{\phi_0+\phi_3\alpha\bar{s}+\delta_0\bar{s}+\bar{z}'\delta_1+\alpha\overline{sz}'\delta_2+\phi_4\alpha\widehat{\mathrm{cov}}(s,L(y))+\phi_5\alpha\widehat{\mathrm{cov}}(s,L^2(y))}{1-\phi_1-\phi_2-\phi_4\alpha\bar{s}-\phi_5\alpha\bar{s}}-\frac{\phi_0+\bar{z}'\delta_1}{1-\phi_1-\phi_2}.$$

Note that $\mu(0)$ represents the mean value in the pre-reform years when $\alpha=0$ and is obtained analytically by plugging $\alpha=0$ into the expression for $\mu(\alpha)$. The null hypothesis

$$H_0:\ \mu(\alpha_t)-\mu(0)=0$$

is tested against the positive alternative. In conjunction with hypothesis $H1a$, $\mu(\alpha_t)-\mu(\alpha_{t-1})$ should be positive in 2013–2017 and close to zero in 2017–2019. Equivalently, $\mu(\alpha_t)-\mu(0)$ should be positive in 2013–2019 and should increase over the period 2013–2017.

Now consider hypothesis $H1b$. The persistence parameter $\lambda(\alpha)$ describes how quickly the effect of a random shock to quality fades over time. For a second-order autoregressive process, the conditional expected value of $y_{it}$ converges exponentially at a rate equal to the reciprocal of the smallest root of the characteristic equation of the AR(2) process:

$$1-(\phi_1+\phi_4\alpha_t s_{it})\lambda-(\phi_2+\phi_5\alpha_t s_{it})\lambda^2=0$$

([39], Section 2.3).
Again, for a fixed value of $\alpha$ we take expectations:

$$1-(\phi_1+\phi_4\alpha E(s_{it}))\lambda-(\phi_2+\phi_5\alpha E(s_{it}))\lambda^2=0.$$

Then we replace the expected values by sample means and solve this quadratic equation to obtain the following formula for $\lambda(\alpha)$:

$$\lambda(\alpha)=\frac{\phi_1+\phi_4\alpha\bar{s}+\sqrt{(\phi_1+\phi_4\alpha\bar{s})^2+4(\phi_2+\phi_5\alpha\bar{s})}}{2},$$

where $\bar{s}$ is the mean value of the share of Medicare cases for a given year.

An alternative approach takes the first-order autocorrelation, ACF(1) (the correlation coefficient between $y_{it}$ and $y_{it-1}$), as the persistence parameter $\lambda$. Specifically, for the second-order autoregressive process (1) the estimated value of ACF(1) becomes

$$\lambda(\alpha)=\frac{\phi_1+\phi_4\alpha\bar{s}}{1-\phi_2-\phi_5\alpha\bar{s}}$$

([39], Section 3.4).

Testing $H1b$ amounts to analyzing whether $\lambda(\alpha)$ is an increasing function of $\alpha$. So, similarly to $H1a$, the null hypothesis

$$H_0:\ \lambda(\alpha_t)-\lambda(\alpha_{t-1})=0$$

is tested against the positive alternative. Alternatively, we assess whether $\lambda(\alpha)-\lambda(0)$ is positive, whether it increases in 2013–2017, and whether it changes only negligibly in 2017–2019.

To assess $H2$ we compute the effect of pay-for-performance as $\mu(\alpha_t)-\mu(0)$ or $\mu(\alpha_t)-\mu(\alpha_{t-1})$ at different quintiles of the lagged $\mathrm{TPS}_{it}$, where quintile 1 denotes the lowest quality and quintile 5 the highest.
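The two persistence measures can be computed directly from estimated coefficients. A minimal sketch with purely hypothetical values of $\phi_1,\phi_2,\phi_4,\phi_5$ and $\bar{s}$ (not the paper's estimates), showing that both $\lambda(\alpha)$ formulas increase in $\alpha$ when $\phi_4,\phi_5>0$:

```python
# Two persistence measures λ(α) for the AR(2) panel: (i) the reciprocal of
# the smallest root of the characteristic equation, (ii) the ACF(1) formula.
import math

def lam_root(phi1, phi2, phi4, phi5, alpha, s_bar):
    """λ(α) as the reciprocal of the smallest characteristic root."""
    a = phi1 + phi4 * alpha * s_bar   # effective first-lag coefficient
    b = phi2 + phi5 * alpha * s_bar   # effective second-lag coefficient
    return (a + math.sqrt(a**2 + 4 * b)) / 2

def lam_acf1(phi1, phi2, phi4, phi5, alpha, s_bar):
    """λ(α) as the first-order autocorrelation of the AR(2) process."""
    a = phi1 + phi4 * alpha * s_bar
    b = phi2 + phi5 * alpha * s_bar
    return a / (1 - b)

# Hypothetical coefficient values, for illustration only
phi1, phi2, phi4, phi5, s_bar = 0.25, 0.05, 0.08, 0.02, 0.38
for alpha in (0.0, 1.0, 2.0):
    print(alpha,
          round(lam_root(phi1, phi2, phi4, phi5, alpha, s_bar), 3),
          round(lam_acf1(phi1, phi2, phi4, phi5, alpha, s_bar), 3))
```

Both functions return values in $(0,1)$ for a stationary process, and larger values mean slower convergence to the mean, i.e., weaker mean reversion.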
We investigate whether the effect is positive for $\mu(\alpha_t)-\mu(0)$ in 2013–2019 (and for $\mu(\alpha_t)-\mu(\alpha_{t-1})$ in 2013–2017) and whether the effect increases by quintile.

Testing $H3a$ and $H3b$ involves computing the predicted values of $\mathrm{TPS}_{it}$ at the mean value of each covariate for different quintiles of the lagged $\mathrm{TPS}_{it}$ and examining whether in 2013–2019 they change from positive in the lowest quintiles to negative or insignificant in the highest quintiles. The average difference between predicted TPS and lagged TPS shows the expected change in quality in consecutive years (the net total effect), which is the sum of the effect of pay-for-performance and the impact of mean reversion.

3.2.3 Estimation of the multivariate effect due to policy reform and mean reversion

Evaluation of $H3c$ involves calculating annual values of the net total effect for quintiles of the lagged TPS and examining whether the negative values of the net total effect become less frequent across the highest quintiles during the period 2013–2017 and remain constant in 2017–2019.

4 Data

4.1 Data sources and variables

The analysis uses data for Medicare hospitals in 2011–2019 from several sources. We use Hospital Compare data archives (January 2021 update) for quality measures, hospital ownership, and geographic location. The medical school affiliation of a hospital and the numbers of hospital beds, nurses, and physicians come from Provider of Service files. Other hospital control variables are taken from the Final Rules, which are Medicare's annual documents on reimbursement rates in the inpatient prospective payment system. Specifically, we use information from the Impact Files, which accompany the Final Rules and estimate the impact of the reimbursement mechanism on hospital characteristics.
The variables taken from the Impact Files are the share of Medicare discharges, ownership, and urban location. Patient characteristics are also taken from the Impact Files. The casemix variable reflects the relative weight of each DRG in financial terms and is adjusted for transfers of patients between hospitals. (If a patient was transferred to/from a hospital, then the transfer-adjustment factor is the lesser of one and the value of the patient's length of stay relative to the geometric mean of the national length of stay for this DRG. See Federal Register 2011, 42 CFR, Part 412.) Casemix makes it possible to control for the composition of patient cases, taking account of the objective link between severity of illness and hospital resources. The disproportionate-share index accounts for the share of low-income patients and makes it possible to proxy patient income.

To account for other major channels of quality improvement at Medicare hospitals over the observed time period, we use data for two programs run by the Centers for Medicare and Medicaid Services. One of them is the HRRP, which has applied to Medicare hospitals since fiscal year 2013 and penalizes them for excess readmissions. Specifically, a payment reduction of between 0 and 3% is applied to the hospital's Medicare remuneration; higher values of the penalty percentage represent more excess readmissions at the hospital. Using the HRRP Supplemental data files, which accompany the annual Final Rules on the acute inpatient PPS (June 2020 update), we find the HRRP penalty for 2013–2019 and use it as one of the control variables in the empirical analysis.

We also consider the EHR Incentive Program, which has been in force since 2011. The program establishes hospital attestation on the use of EHR.
The adoption of quality-improving information technology requires a substantial fixed cost, so the binary variable for hospital attestation within the EHR program makes it possible to control for this fixed cost in the empirical analysis. The EHR promotion program consists of three stages (sequentially introduced in 2011, 2014, and 2017). Using data from the Eligible Hospitals Public Use Files on the EHR incentive program (February 2020 update), we set the EHR attestation dummy equal to one if the hospital passed its attestation for the given year at any stage. Owing to the non-availability of data on the third stage of the program, we extend the second-stage data from year 2016 to years 2017–2019. Use of the attestation dummy lets us control for the fact of incurring the fixed cost of quality-improvement efforts. Owing to the small size of the non-EHR group (only 8–10% of the sample), we do not analyze whether quality rises faster in the attested group of hospitals (for instance, we do not interact the attestation dummy with $\alpha$).

4.2 Sample

The non-anonymous character of the data sources allows us to merge them by year and hospital name. Our analysis focuses on Medicare's acute-care hospitals, as the pay-for-performance incentive contract applies exclusively to this group. We restrict the sample by considering only hospitals with a share of Medicare cases greater than 5%. The specification with a second-order lag enables estimation of the fitted values of $\mathrm{TPS}_t$ and of the values of $\mu_t$ only starting in 2013. Accordingly, we can evaluate the impact of the incentive contract on quality improvement in 2013–2019.
There are 2,984 hospitals in our sample for 2013–2019, which yields 18,545 observations (Table 1).

Table 1: Descriptive statistics for Medicare's acute-care hospitals in 2013–2019

Variable | Definition | Obs | Mean | St.Dev | Min | Max
Hospital performance
TPS | Hospital TPS | 18,545 | 37.265 | 11.468 | 2.727 | 98.182
Patient characteristics
Casemix | Transfer-adjusted casemix index | 18,545 | 1.599 | 0.298 | 0.834 | 3.972
Dsh | Disproportionate share index, reflecting the prevalence of low-income patients | 18,545 | 0.307 | 0.165 | 0 | 1.232
Hospital characteristics
Nurses/beds | Nurse-to-bed ratio | 18,545 | 1.312 | 3.849 | 0 | 170.479
Physicians/beds | Physician-to-bed ratio | 18,545 | 0.099 | 0.947 | 0 | 70.992
Beds | Number of beds | 18,545 | 272.158 | 241.538 | 3 | 2,891
log(beds) | Number of beds (in logs) | 18,545 | 5.283 | 0.819 | 1.099 | 7.969
Medicare share | Share of Medicare cases | 18,545 | 0.378 | 0.118 | 0.050 | 0.983
HRRP penalty | Percentage reduction of the Medicare payments under HRRP | 18,545 | 0.498 | 0.590 | 0 | 3.000
MUEHR | =1 if passed attestation for meaningful usage of EHR | 18,545 | 0.924 | 0.265 | 0 | 1
Urban | =1 if an urban hospital | 18,545 | 0.711 | 0.453 | 0 | 1
Public | =1 if managed by federal, state or local government, or hospital district or authority | 18,545 | 0.147 | 0.354 | 0 | 1
Teaching | =1 if hospital has medical school affiliation | 18,545 | 0.364 | 0.481 | 0 | 1
Hospital location
New England | =1 if located in Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, or Vermont | 18,545 | 0.046 | 0.210 | 0 | 1
Mid-Atlantic | =1 if located in New Jersey, New York, or Pennsylvania | 18,545 | 0.123 | 0.328 | 0 | 1
East North Central | =1 if located in Illinois, Indiana, Michigan, Ohio, or Wisconsin | 18,545 | 0.168 | 0.374 | 0 | 1
West North Central | =1 if located in Iowa, Kansas, Minnesota, Missouri, Nebraska, North Dakota, or South Dakota | 18,545 | 0.081 | 0.272 | 0 | 1
South Atlantic | =1 if located in Delaware, Florida, Georgia, Maryland, North Carolina, South Carolina, Virginia, District of Columbia, or West Virginia | 18,545 | 0.177 | 0.381 | 0 | 1
East South Central | =1 if located in Alabama, Kentucky, Mississippi, or Tennessee | 18,545 | 0.087 | 0.282 | 0 | 1
West South Central | =1 if located in Arkansas, Louisiana, Oklahoma, or Texas | 18,545 | 0.129 | 0.335 | 0 | 1
Mountain | =1 if located in Arizona, Colorado, Idaho, Montana, Nevada, New Mexico, Utah, or Wyoming | 18,545 | 0.069 | 0.253 | 0 | 1
Pacific | =1 if located in California, Oregon, Washington, Alaska, or Hawaii | 18,545 | 0.115 | 0.319 | 0 | 1

Note: Section 401 hospitals are treated as rural hospitals.

4.3 Flow of quality and evidence of mean reversion

Descriptive analysis of the values of TPS offers suggestive evidence in support of some of the main hypotheses generated by the model. Specifically, we focus on the flow of hospitals between quintiles of TPS in different years. The Sankey diagrams in Figures 1 and 2 use the width of the arrows to represent the intensity of flows and demonstrate how hospitals change their position in quintiles of the composite quality measure after the introduction of pay-for-performance (e.g., from 2012 to 2013).

Figure 1: Flow of hospitals between quintiles of TPS in 2012–2013.

Figure 2: Flow of hospitals between quintiles of TPS in 2018–2019.

As can be inferred from Figure 1, there is considerable movement of hospitals between quintiles. For instance, consider hospitals which in 2012 belonged to the fifth quintile of TPS (the quintile with the highest performance). Fewer than half of these hospitals remained in the fifth quintile of TPS in 2013, and the rest saw a decline in their position relative to other hospitals by moving to quintiles one through four. Similar tendencies are observed for hospitals in any other given quintile of TPS in 2012: only a small share of hospitals remain in the same quintile in the subsequent year. This can be viewed as graphic support for the phenomenon of mean reversion, since hospitals would rarely change their quintile from year to year in the absence of mean reversion.

It is plausible to assume that mean reversion becomes weaker when $\alpha$ increases. Figure 2 supports this prediction. It shows the flow of hospitals between quintiles of TPS from 2018 to 2019, when the value of $\alpha$ was 0.02.
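The quintile flows visualized in the Sankey diagrams can equivalently be summarized as a transition matrix. A minimal sketch using simulated mean-reverting TPS data (the persistence value and noise scale are illustrative assumptions, not the paper's estimates):

```python
# Quintile transition matrix: row = quintile in year 1, column = quintile
# in year 2, entry = share of row hospitals ending up in that column.
# Simulated mean-reverting TPS data; all parameters are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n = 3000
tps_y1 = rng.normal(37, 11, size=n)                       # TPS in year 1
tps_y2 = 37 + 0.5 * (tps_y1 - 37) + rng.normal(0, 9, n)   # mean-reverting year 2

def quintile(x):
    """Assign quintile labels 0..4 using the empirical quintile cutoffs."""
    return np.searchsorted(np.quantile(x, [0.2, 0.4, 0.6, 0.8]), x)

q1, q2 = quintile(tps_y1), quintile(tps_y2)
trans = np.zeros((5, 5))
for a, b in zip(q1, q2):
    trans[a, b] += 1
trans /= trans.sum(axis=1, keepdims=True)   # row-normalize to shares

print(np.round(trans, 2))
# With moderate persistence, typically fewer than half of top-quintile
# hospitals stay on the diagonal, mirroring the pattern in Figure 1.
```

Raising the persistence coefficient toward 1 concentrates mass on the diagonal, which is the pattern the article predicts for later years with larger $\alpha$.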
Compared with Figure 1, where $\alpha$ equals 0.01 (the 2013 value), the flows in 2018–2019 are much weaker than the flows in 2012–2013, so hospitals change their position in quintiles less often.

5 Empirical results

The first set of our results is reported in Table 2 and concerns the mean effect of pay-for-performance at Medicare hospitals.

Table 2: Effect of pay-for-performance on the mean quality

 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019
α_t | 1.00 | 1.25 | 1.50 | 1.75 | 2.00 | 2.00 | 2.00
μ(α_t) | 30.546*** | 31.879*** | 34.823*** | 36.226*** | 38.407*** | 38.684*** | 38.841***
 | (0.973) | (0.678) | (0.385) | (0.453) | (0.970) | (0.961) | (0.956)
μ(α_t)−μ(0) | 2.255*** | 3.179*** | 4.762*** | 6.487*** | 8.461*** | 8.530*** | 8.495***
 | (0.737) | (1.009) | (1.349) | (1.800) | (2.355) | (2.309) | (2.266)
μ(α_t)−μ(α_{t−1}) | 2.255*** | 1.333*** | 2.944*** | 1.403*** | 2.181*** | 0.277 | 0.157
 | (0.737) | (0.375) | (0.474) | (0.542) | (0.634) | (0.240) | (0.237)

Notes: Standard errors calculated using the delta method are in parentheses. *, **, and *** show significance at the 0.1, 0.05, and 0.01 levels, respectively.

Measured as $\mu(\alpha_t)-\mu(0)$, the mean effect of pay-for-performance is positive in 2013–2019. The value of $\mu(\alpha_t)$ increases in $\alpha_t$ in 2013–2017. However, the increase in 2017–2019 is negligible, in line with the fact that $\alpha$ has remained flat since 2017.
Similarly, the change in the effect of pay-for-performance in consecutive years, defined as $\mu(\alpha_t)-\mu(\alpha_{t-1})$, is positive for 2013–2017 but is very small and statistically insignificant in 2018–2019 in comparison with the previous years. This finding corresponds to our hypothesis $H1a$ of an improvement in mean quality owing to the introduction of pay-for-performance (i.e., the increase of $\alpha$ from 0 to 1) and of an expected rise in mean quality due to the linearly increasing reward function ($\alpha$ gradually goes up from 1 to 2 in 2013–2017). Note that the mean value $\mu(\alpha_t)$ increases in $\alpha_t$, which supports our supposition that hospital managers take account of future benefits from improving current values of hospital quality.

Table 3 shows the second set of results, concerning the heterogeneity of hospital response to pay-for-performance. The parameter $\lambda$ is estimated as the inverse of the smaller root of the AR(2) characteristic equation or as ACF(1). The values are significant and less than one under both approaches. This points to mean reversion: quality decreases toward the mean at high-quality hospitals and rises toward the mean at hospitals with low quality. The values of $\lambda$ rise with an increase in the size of incentives $\alpha$, which implies that the persistence of the dynamic process increases, and hence the effect of mean reversion becomes weaker. Similarly, the values of $\lambda(\alpha_t)-\lambda(0)$ are positive and increase in $\alpha_t$. The time change in the convergence parameter, $\lambda(\alpha_t)-\lambda(\alpha_{t-1})$, is positive for 2013–2017. The results support hypothesis $H1b$ of a weakening of quality convergence to the mean value with a rise in $\alpha$.
(The value of λ(α_t) − λ(α_{t−1}) approaches zero in 2018–2019, when parameter α remains flat.)

Table 3: Effect of pay-for-performance on mean reversion

                                   2013      2014      2015      2016      2017      2018      2019
α_t                                1.00      1.25      1.50      1.75      2.00      2.00      2.00
λ(α_t)                            0.286***  0.435***  0.531***  0.598***  0.659***  0.651***  0.642***
                                 (0.112)   (0.032)   (0.020)   (0.017)   (0.016)   (0.016)   (0.016)
λ(α_t) − λ(0)                    −0.169    −0.020     0.076     0.144***  0.204***  0.196***  0.187***
                                 (0.151)   (0.069)   (0.055)   (0.048)   (0.044)   (0.044)   (0.045)
λ(α_t) − λ(α_{t−1})              −0.169     0.149*    0.096***  0.068***  0.061*** −0.008*   −0.009*
                                 (0.151)   (0.082)   (0.016)   (0.009)   (0.007)   (0.005)   (0.005)
λ(α_t) (alternative)              0.408***  0.442***  0.482***  0.519***  0.561***  0.554***  0.548***
                                 (0.018)   (0.015)   (0.013)   (0.013)   (0.015)   (0.014)   (0.014)
λ(α_t) − λ(0) (alternative)       0.132***  0.166***  0.206***  0.244***  0.285***  0.278***  0.272***
                                 (0.019)   (0.024)   (0.029)   (0.034)   (0.039)   (0.038)   (0.038)
λ(α_t) − λ(α_{t−1}) (alternative) 0.132***  0.034***  0.040***  0.037***  0.041*** −0.006*   −0.006*
                                 (0.019)   (0.005)   (0.006)   (0.006)   (0.006)   (0.003)   (0.003)

Notes: Standard errors calculated using the delta-method are in parentheses. *, **, and *** denote significance at the 0.1, 0.05, and 0.01 levels, respectively. The persistence parameter λ(α_t) is estimated as the inverse of the smaller root of AR(2) or as ACF(1); the latter is denoted as "alternative."

Since the values of λ(α_t) are well below 1, we can conclude that the estimated AR(2) processes are indeed stationary for each α_t.

The heterogeneous changes in hospital quality owing to pay-for-performance are given in Tables 4, 5, and 6, where hospitals are divided into quintiles according to the values of their TPS. Note that the change in hospital quality is a function of the regression coefficient and the mean values of covariates, so its standard error consists of two parts: the error of the estimated regression coefficient and the error of the mean values of covariates. Only the second part of this error depends on sample size and should go up approximately √5 times due to analysis by quintiles.
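The √5 factor is simply the 1/√n scaling of a group mean's standard error: splitting the sample into quintiles cuts each group to n/5 observations. A minimal check (sample size and standard deviation are hypothetical):

```python
import math

# Standard error of a group mean scales as sd / sqrt(n). Splitting the
# sample into quintiles leaves n/5 observations per group, so the
# covariate-mean part of the standard error grows by sqrt(5) ~= 2.236.
n, sd = 2500, 10.0  # hypothetical sample size and covariate s.d.

se_full = sd / math.sqrt(n)           # SE of the mean over the full sample
se_quintile = sd / math.sqrt(n / 5)   # SE of the mean within one quintile
ratio = se_quintile / se_full

print(round(ratio, 3))  # 2.236
```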
However, the weight of this second part proves to be relatively small in the case of our data, so the standard errors in Tables 4–6 are only slightly larger than the standard errors in Table 2.

Table 4: Effect of pay-for-performance as μ(α_t) − μ(0) by quintiles of TPS_{t−1}

                          2013      2014      2015      2016      2017      2018      2019
Quintile 1               1.828***  2.910***  4.149***  5.798***  7.673***  7.518***  7.683***
                        (0.741)   (1.007)   (1.334)   (1.727)   (2.264)   (2.162)   (2.202)
Quintile 2               2.150***  2.881***  4.396***  5.953***  8.293***  8.063***  8.360***
                        (0.745)   (1.013)   (1.358)   (1.803)   (2.375)   (2.293)   (2.250)
Quintile 2 − Quintile 1  0.321**  −0.029     0.247     0.155     0.620*    0.545*    0.677**
                        (0.138)   (0.124)   (0.174)   (0.233)   (0.328)   (0.327)   (0.313)
Quintile 3               2.187***  3.072***  4.279***  6.368***  8.093***  8.602***  8.301***
                        (0.738)   (1.023)   (1.378)   (1.849)   (2.346)   (2.381)   (2.278)
Quintile 3 − Quintile 2  0.037     0.191*   −0.117     0.416*   −0.200     0.539    −0.059
                        (0.078)   (0.114)   (0.155)   (0.232)   (0.321)   (0.356)   (0.355)
Quintile 4               2.298***  3.218***  4.742***  6.534***  8.287***  8.260***  8.087***
                        (0.739)   (1.027)   (1.392)   (1.850)   (2.391)   (2.338)   (2.279)
Quintile 4 − Quintile 3  0.110     0.146     0.463**   0.165     0.194    −0.342    −0.214
                        (0.074)   (0.111)   (0.230)   (0.235)   (0.368)   (0.419)   (0.349)
Quintile 5               2.483***  3.261***  5.135***  6.529***  8.381***  8.282***  8.501***
                        (0.741)   (1.030)   (1.418)   (1.849)   (2.506)   (2.476)   (2.421)
Quintile 5 − Quintile 4  0.186*    0.043     0.393    −0.004     0.094     0.022     0.414
                        (0.101)   (0.251)   (0.503)   (0.433)   (0.568)   (0.503)   (0.497)

Notes: Quintile 1 denotes the lowest quality and quintile 5 the highest. The table reports the effect at each corresponding quintile and the differences in the effects at consecutive quintiles. *, **, and *** denote significance at the 0.1, 0.05, and 0.01 levels, respectively. Standard errors calculated using the delta-method are in parentheses. There are two sources of error in the estimates shown in the table: the error of the regression coefficient and the error of the mean values of covariates. The first part of the error does not vary across the result tables, while the second part depends on the group size and is approximately √5 times larger than its counterpart in Table 2.
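The delta-method standard errors reported throughout the tables can be sketched for the long-run effect g(β, λ) = βα/(1 − λ) of the AR(1) illustration. All numbers below (point estimates and their covariance matrix) are hypothetical, not the paper's estimates:

```python
import numpy as np

# Delta-method standard error for g(beta, lam) = beta * alpha / (1 - lam),
# i.e. mu(alpha) - mu(0) in an AR(1) sketch of the model.
alpha = 2.0
beta, lam = 2.0, 0.6                    # hypothetical point estimates
V = np.array([[0.0400, 0.0010],         # hypothetical covariance matrix
              [0.0010, 0.0025]])        # of (beta_hat, lam_hat)

# Gradient of g with respect to (beta, lam):
grad = np.array([alpha / (1 - lam),               # dg/dbeta
                 beta * alpha / (1 - lam) ** 2])  # dg/dlam

var_g = grad @ V @ grad                 # first-order (delta-method) variance
se_g = np.sqrt(var_g)
print(round(float(se_g), 3))            # 1.677
```

The same recipe, with the appropriate gradient, yields the standard errors for differences such as μ(α_t) − μ(α_{t−1}) or for contrasts across hospital groups.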
However, the errors of the regression coefficient are considerably bigger than those of the mean values of covariates, so the increase in the standard errors in this table and the two subsequent tables relative to the standard errors in Table 2 is only minor.

Table 5: Effect of pay-for-performance as μ(α_t) − μ(α_{t−1}) by quintiles of TPS_{t−1}

                          2013      2014      2015      2016      2017      2018      2019
Quintile 1               1.828*** −0.257     1.032**   2.442***  1.738***  0.536    −0.215
                        (0.741)   (0.517)   (0.497)   (0.582)   (0.656)   (0.440)   (0.434)
Quintile 2               2.150***  0.678     2.131***  1.908***  2.405*** −0.125     0.655
                        (0.745)   (0.489)   (0.546)   (0.626)   (0.736)   (0.465)   (0.505)
Quintile 2 − Quintile 1  0.321**   0.935*    1.099**  −0.534     0.667    −0.661     0.869
                        (0.138)   (0.522)   (0.501)   (0.545)   (0.585)   (0.629)   (0.676)
Quintile 3               2.187***  0.933**   1.561***  3.086***  1.940***  0.743     0.104
                        (0.738)   (0.458)   (0.530)   (0.669)   (0.723)   (0.501)   (0.531)
Quintile 3 − Quintile 2  0.037     0.255    −0.570     1.178**  −0.465     0.868    −0.551
                        (0.078)   (0.490)   (0.514)   (0.545)   (0.609)   (0.677)   (0.758)
Quintile 4               2.298***  1.656***  3.222***  0.929     2.397***  0.505    −0.045
                        (0.739)   (0.464)   (0.626)   (0.702)   (0.789)   (0.552)   (0.510)
Quintile 4 − Quintile 3  0.110     0.722     1.662*** −2.157***  0.458    −0.238    −0.149
                        (0.074)   (0.500)   (0.565)   (0.630)   (0.654)   (0.745)   (0.748)
Quintile 5               2.483***  3.445***  6.247*** −1.518*    2.105*** −0.631     0.666
                        (0.741)   (0.620)   (0.864)   (0.875)   (0.892)   (0.583)   (0.567)
Quintile 5 − Quintile 4  0.186*    1.789***  3.024*** −2.447*** −0.292    −1.136     0.711
                        (0.101)   (0.599)   (0.829)   (0.876)   (0.780)   (0.803)   (0.762)

Notes: Quintile 1 denotes the lowest quality and quintile 5 the highest. The table reports the effect at each corresponding quintile and the differences in the effects at consecutive quintiles. *, **, and *** denote significance at the 0.1, 0.05, and 0.01 levels, respectively. Standard errors calculated using the delta-method are in parentheses.

Table 6: Net total effect by quintiles of TPS_{t−1}: predicted TPS minus lagged TPS

                           2013       2014       2015       2016       2017       2018       2019
Quintile 1                2.686***   4.989***   1.153***   7.715***   5.647***   5.351***   0.226
                         (0.458)    (0.329)    (0.298)    (0.271)    (0.263)    (0.273)    (0.266)
Quintile 2               −3.796***   0.708***  −2.275***   4.388***   2.586***   2.145***  −2.572***
                         (0.288)    (0.249)    (0.236)    (0.223)    (0.212)    (0.224)    (0.238)
Quintile 2 − Quintile 1  −6.482***  −4.281***  −3.428***  −3.327***  −3.061***  −3.205***  −2.799***
                         (0.367)    (0.285)    (0.254)    (0.215)    (0.207)    (0.221)    (0.228)
Quintile 3               −7.249***  −1.938***  −5.054***   2.405***   0.646***   0.383*    −4.523***
                         (0.282)    (0.236)    (0.237)    (0.218)    (0.216)    (0.228)    (0.235)
Quintile 3 − Quintile 2  −3.453***  −2.646***  −2.778***  −1.982***  −1.940***  −1.762***  −1.950***
                         (0.276)    (0.243)    (0.212)    (0.201)    (0.193)    (0.214)    (0.231)
Quintile 4              −10.783***  −4.199***  −6.789***  −0.182     −1.337***  −1.861***  −6.893***
                         (0.321)    (0.273)    (0.280)    (0.250)    (0.256)    (0.272)    (0.270)
Quintile 4 − Quintile 3  −3.534***  −2.261***  −1.736***  −2.587***  −1.983***  −2.244***  −2.370***
                         (0.289)    (0.243)    (0.247)    (0.230)    (0.228)    (0.267)    (0.229)
Quintile 5              −15.403***  −8.318*** −10.637***  −3.815***  −4.806***  −5.654*** −10.791***
                         (0.439)    (0.402)    (0.476)    (0.388)    (0.381)    (0.381)    (0.396)
Quintile 5 − Quintile 4  −4.620***  −4.119***  −3.847***  −3.634***  −3.469***  −3.794***  −3.898***
                         (0.353)    (0.330)    (0.406)    (0.336)    (0.330)    (0.323)    (0.330)

Notes: Quintile 1 denotes the lowest quality and quintile 5 the highest. The table reports the effect at each corresponding quintile and the differences in the effects at consecutive quintiles. *, **, and *** denote significance at the 0.1, 0.05, and 0.01 levels, respectively. Standard errors calculated using the delta-method are in parentheses.

The estimates of the effect of pay-for-performance, in terms of μ(α_t) − μ(0) or of μ(α_t) − μ(α_{t−1}), show that the higher the quintile of the quality distribution in the previous year, the larger the impact of the reform (Tables 4 and 5). Statistically significant differences in the effect of pay-for-performance across consecutive quintiles of lagged TPS are observed in many years: for instance, in 4 years out of 7 for quintiles 1–2 in the case of μ(α_t) − μ(0) and for quintiles 4–5 in the case of μ(α_t) − μ(α_{t−1}). The change of the effect of pay-for-performance over time, μ(α_t) − μ(α_{t−1}), increases with a rise of the quality incentive α but almost stops increasing in 2018–2019, when α becomes constant, as shown in Table 5. So pay-for-performance stimulates quality increase in all groups of Medicare's hospitals, and the impact of pay-for-performance is greater at higher-quality hospitals.

Table 6 gives estimates of the net total effect, i.e., the expected change in hospital quality over time, measured as the difference between the predicted TPS and the lagged TPS. The net total effect is the sum of the impact of mean reversion and the effect of pay-for-performance. Note that the estimation of the fitted value of TPS includes time effects, which account both for the time trend and for important changes in the incentive mechanism not captured by variation in α.
An example of such a change occurred in 2015 and temporarily decreased the value of TPS for each hospital. Specifically, the pneumonia cohort was expanded, which caused a rise in pneumonia readmission rates in 2015. Additionally, the safety domain, with relatively low scores in comparison to measures of other domains, was added to the list of measures which constitute TPS. Accordingly, Table 6 shows that the values of predicted TPS minus lagged TPS go down in 2015 for each quintile.

The values of the net total effect reveal an increase of quality in the groups of low-quality hospitals, while quality deteriorates in the high-quality groups. The negative total effect is less prevalent, or is smaller in absolute terms, at high-quality hospitals in 2016–2017. The result can be attributed to the weakening of mean reversion with the increase in α. Yet, when α becomes constant in 2018–2019, the prevalence of the negative total effect and its absolute value return to those of 2015.

Finally, we focus on the effect of pay-for-performance for groups of Medicare hospitals according to their ownership, teaching status, urban location, and geographic region.
The mean effect increases in α for public and private hospitals, for urban and rural hospitals, for teaching and non-teaching hospitals, and for hospitals in each geographic region (Tables 7 and 8).

Table 7: Effect of pay-for-performance as μ(α_t) − μ(0) by hospital ownership, teaching status, and urban location

                            2013      2014      2015      2016      2017      2018      2019
Public                     2.099***  2.945***  4.226***  5.869***  7.413***  7.365***  7.475***
                          (0.723)   (0.984)   (1.304)   (1.739)   (2.254)   (2.203)   (2.148)
Private                    2.274***  3.214***  4.858***  6.595***  8.637***  8.726***  8.656***
                          (0.740)   (1.015)   (1.360)   (1.814)   (2.377)   (2.333)   (2.289)
Private − Public           0.174     0.269     0.632***  0.725**   1.224***  1.361***  1.181***
                          (0.118)   (0.168)   (0.256)   (0.323)   (0.432)   (0.437)   (0.414)
Urban                      2.383***  3.291***  4.735***  6.253***  8.059***  8.037***  7.929***
                          (0.739)   (1.003)   (1.314)   (1.721)   (2.203)   (2.146)   (2.104)
Rural                      2.048***  3.069***  4.353***  6.649***  9.195***  9.275***  9.321***
                          (0.741)   (1.070)   (1.494)   (2.054)   (2.802)   (2.706)   (2.578)
Rural − Urban             −0.335    −0.222    −0.382     0.396     1.136     1.239*    1.392**
                          (0.227)   (0.333)   (0.493)   (0.599)   (0.832)   (0.749)   (0.615)
Teaching                   2.363***  3.294***  4.667***  6.327***  8.177***  8.284***  8.309***
                          (0.755)   (1.013)   (1.334)   (1.735)   (2.224)   (2.198)   (2.156)
Non-teaching               2.238***  3.177***  4.807***  6.546***  8.664***  8.688***  8.610***
                          (0.732)   (1.018)   (1.373)   (1.854)   (2.467)   (2.407)   (2.362)
Non-teaching − Teaching   −0.125    −0.118     0.140     0.219     0.487     0.404     0.301
                          (0.179)   (0.236)   (0.366)   (0.430)   (0.609)   (0.578)   (0.560)

Notes: Standard errors (calculated using the delta-method for the difference of the reform effects across the corresponding two categories of each time-invariant hospital characteristic) are in parentheses. *, **, and *** denote significance at the 0.1, 0.05, and 0.01 levels, respectively.

Table 8: Effect of pay-for-performance as μ(α_t) − μ(0) for hospitals in different geographic regions

                                 2013      2014      2015      2016      2017      2018      2019
New England                     1.864**   2.080     3.605**   6.858*** 10.211*** 10.071***  9.934***
                               (0.815)   (1.264)   (1.752)   (2.105)   (3.078)   (2.976)   (2.888)
Mid-Atlantic                    1.800***  2.550***  3.617***  5.189***  7.013***  7.194***  7.030***
                               (0.737)   (0.999)   (1.321)   (1.739)   (2.253)   (2.219)   (2.146)
Mid-Atlantic − New England     −0.064     0.470     0.012    −1.670*** −3.198*** −2.877*** −2.904***
                               (0.328)   (0.702)   (0.978)   (0.574)   (1.082)   (0.997)   (0.981)
East North Central              2.078***  3.084***  4.773***  6.552***  8.142***  8.205***  8.246***
                               (0.751)   (1.036)   (1.404)   (1.865)   (2.393)   (2.387)   (2.347)
East North Central − New England  0.215   1.004     1.168    −0.307    −2.069**  −1.866*** −1.688**
                               (0.316)   (0.694)   (0.955)   (0.451)   (0.905)   (0.801)   (0.775)
West North Central              2.318***  3.329***  5.154***  7.342*** 10.138*** 10.140*** 10.172***
                               (0.741)   (1.052)   (1.431)   (2.025)   (2.842)   (2.797)   (2.765)
West North Central − New England  0.455   1.249*    1.549     0.483    −0.073     0.069     0.238
                               (0.332)   (0.709)   (1.001)   (0.520)   (0.808)   (0.780)   (0.757)
South Atlantic                  2.248***  3.210***  4.946***  6.803***  8.886***  8.863***  8.744***
                               (0.761)   (1.040)   (1.422)   (1.887)   (2.481)   (2.407)   (2.374)
South Atlantic − New England    0.384     1.131     1.341    −0.056    −1.325    −1.208    −1.190
                               (0.326)   (0.699)   (0.949)   (0.434)   (0.826)   (0.788)   (0.760)
East South Central              2.295***  3.118***  4.962***  6.913***  9.422***  8.821***  8.643***
                               (0.780)   (1.065)   (1.492)   (2.042)   (2.803)   (2.597)   (2.460)
East South Central − New England  0.432   1.038     1.357     0.054    −0.789    −1.250*   −1.291*
                               (0.356)   (0.720)   (0.989)   (0.517)   (0.719)   (0.705)   (0.726)
West South Central              2.276***  3.225***  5.193***  6.324***  7.926***  8.380***  8.274***
                               (0.735)   (0.990)   (1.325)   (1.755)   (2.241)   (2.227)   (2.175)
West South Central − New England  0.413   1.146     1.588    −0.535    −2.286**  −1.691*   −1.659*
                               (0.347)   (0.723)   (1.022)   (0.583)   (1.061)   (0.969)   (0.933)
Mountain                        1.795***  2.809***  4.303***  5.537***  7.291***  7.686***  7.941***
                               (0.647)   (0.863)   (1.119)   (1.415)   (1.809)   (1.819)   (1.861)
Mountain − New England         −0.069     0.729     0.698    −1.322    −2.920**  −2.385*   −1.993
                               (0.371)   (0.756)   (1.073)   (0.869)   (1.468)   (1.345)   (1.242)
Pacific                         2.524***  3.324***  4.276***  5.957***  7.923***  7.910***  8.190***
                               (0.716)   (0.957)   (1.238)   (1.613)   (2.101)   (2.067)   (2.066)
Pacific − New England           0.661*    1.245     0.671    −0.901    −2.288*   −2.161*   −1.744
                               (0.388)   (0.765)   (1.072)   (0.827)   (1.315)   (1.245)   (1.189)

Notes: Standard errors (calculated using the delta-method for the difference of the reform effects across New England hospitals and hospitals in each corresponding geographic region) are in parentheses. *, **, and *** denote significance at the 0.1, 0.05, and 0.01 levels, respectively.

The effect of pay-for-performance is greater for private hospitals than for public hospitals, which corresponds to the findings in [13] and [78]. The result can be explained by a greater emphasis on financial incentives at these healthcare institutions. These profit constraints, combined with the altruistic character of healthcare services, induce more effective quality competition at non-public hospitals [16].
The difference in the effect for private and public hospitals is statistically significant in most years. As for teaching status, quality improvement owing to the incentive scheme is often higher at non-teaching hospitals, which may be because they can devote all of their labor resources to patient treatment, while teaching hospitals lose some efficiency due to their educational activities [64]. Also, teaching hospitals may be treating more difficult cases. This complexity may not be fully captured by the casemix variable in our analysis and may cause a downward bias of the estimated effect at teaching hospitals, explaining the lower value of the effect at teaching than at non-teaching hospitals. Yet, the difference in the values at teaching and non-teaching hospitals is statistically insignificant in each year. Statistically significant differences in the effect of pay-for-performance for urban and rural hospitals are observed only in the last 2 years: the effect is larger at rural hospitals. As regards geographic location, there is practically no variation in the effect across groups of hospitals in the early years of pay-for-performance. The differences are present mainly in the later years: for instance, the mean effect of pay-for-performance is greater in New England than in the Mid-Atlantic region in 2016–2019, and than in the East North Central and West South Central regions in 2017–2019.

6 Discussion

In this article, we focused on the exclusion of mean reversion in evaluating the response of TPS at Medicare hospitals to an incentive contract. Since TPS under this contract becomes an autoregressive process, our analysis deals with dynamic panels. It should be noted that dynamic panel data models are prevalent in various fields of economics. Examples in macroeconomics include the analysis of a country's growth [11,50] or its current account [81].
Applications in corporate finance deal with the study of such firm-level variables as size [33,61], profit [54], and leverage [32,36], and such proxies of firm performance as return on assets and Tobin's Q [49,65]. In the banking sphere, dynamic panels are applied to ROE and profitability [35,48], while in finance they are used for housing prices [31] and fuel prices [71]. Papers in the economics of labor, health, and welfare employ dynamic panel data models to analyze physician labor supply [4], hospital staffing intensity [82], wealth of households and health status of individuals [57], and quality and efficiency of hospitals (e.g., mortality ratio in [56] and average length of stay in [10]).

The approach used in our study estimates the unconditional mean of the dependent variable in the dynamic panel data model and employs it for policy evaluation. Specifically, the comparison of the fitted values of the unconditional mean at different values of policy intensity offers a measure of the effect of the reform. The advantages of the approach are twofold. First, it excludes the impact of mean reversion in groupwise estimations (e.g., in lower and in higher quantiles of hospitals according to their TPS). Second, the approach may also be used in the analysis of the mean effect of the reform if we focus on effects in the long run. Indeed, the unconditional mean in dynamic panel data analysis is sometimes called the long-term mean, as it reflects the mean value in the long run. It should be noted that an alternative approach, which uses the estimated coefficient for the policy variable as a measure of the mean effect of the reform, does not suffer from the problem of mean reversion; but in dynamic panel data models it evaluates only the short-term impact of policy.

As regards the exclusion of mean reversion in dynamic panels, we note a limitation on the character of mean reversion, imposed by the nature of the dynamic panel model, where the unconditional mean is the long-term mean.
Mean reversion is not instantaneous: if a deviation from the mean is observed in period t, the return to the mean occurs not in period t+1 but only in later periods.

It may be noted that our approach is similar to difference-in-differences estimation. The long-run effect of the reform under our approach is the difference between the fitted value of the long-term mean under the value α_t and under the counterfactual value of zero (similar to [48]). Alternatively, we can take the difference in the fitted values of the long-term means under the values α_t and α_{t−1}. To summarize: in focusing on the long-run impact of the reform in dynamic panel data models, the estimation of either the mean effect or the groupwise effects requires an unconditional mean. The approach also excludes mean reversion, which contaminates policy evaluation in the case of groupwise estimations.

As regards policy evaluation based on panel data fixed-effects methodology, our approach of computing the unconditional mean as a function of the policy variable α produces the conventional linear prediction of the dependent variable. The mean effect of the reform in the static panel is either the coefficient for the reform variable or the difference between the fitted value of y under α_t and the fitted value of y under 0 (the counterfactual).

Finally, we note the prerequisites for identification of the unconditional mean, which are similar to the assumptions in difference-in-differences estimations. Two requirements apply both to static and dynamic panels. First, time variation in the policy variable is required for identification of the coefficient for the policy variable in the unconditional mean function. Second, if there is only time variation in the policy variable α (and no cross-section variation in α_t at a given value of t, i.e., no control group), the reform effect cannot be distinguished from other time effects.
So cross-section variation in another variable, which is correlated with the policy variable, is required. In our case, this variable is the Medicare share: the higher the share of Medicare patients in the hospital, the stronger the impact of α (the share of hospital funds at risk under the Medicare program becomes more important for the total revenues of the hospital). The use of dynamic panel data models requires a third assumption: the unconditional mean must be defined, and for this reason the process y has to be stationary.

7 Conclusion

Studies of incentive contracts usually focus on the mean tendency and give scant attention to potentially heterogeneous responses to the policy of interest by agents at different percentiles of the distribution of the dependent variable. But insufficient analysis of such heterogeneity may lead to speculation about ceiling effects and a belief among agents with better values of the variable of interest that there is no way of making further financial gains through further improvements. This article highlights the fact that there is a multivariate dependence of the variable of interest in such incentive contracts. Specifically, a part of the intertemporal dependence can be attributed to the policy reform and a part to mean reversion. So the article proposes a method to help model such multivariate dependence by excluding the impact of mean reversion.
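The contamination the method removes can be illustrated with a small simulation of a pure AR(1) panel with no policy at all: grouping hospitals on the lagged value alone manufactures an apparent "improvement" at the bottom quintile and a "decline" at the top, purely through mean reversion (all parameters are hypothetical):

```python
import numpy as np

# A stationary AR(1) panel with NO policy: any groupwise "effects" from
# sorting on the lagged value are mean reversion alone.
rng = np.random.default_rng(0)
n, lam, mu, sigma = 100_000, 0.6, 30.0, 5.0  # hypothetical parameters

# Draw y_{t-1} from the stationary distribution, then take one AR(1) step.
y_prev = mu + rng.normal(0.0, sigma / np.sqrt(1 - lam**2), n)
y_next = mu + lam * (y_prev - mu) + rng.normal(0.0, sigma, n)

# Group by quintile of the lagged value, as in the TPS tables.
lo, hi = np.quantile(y_prev, [0.2, 0.8])
change = y_next - y_prev
bottom = change[y_prev <= lo].mean()  # bottom quintile appears to "improve"
top = change[y_prev >= hi].mean()     # top quintile appears to "decline"
print(bottom > 0, top < 0)            # True True
```

Since E[y_t − y_{t−1} | y_{t−1}] = (λ − 1)(y_{t−1} − μ), the spurious groupwise changes vanish only when they are separated from the fitted unconditional mean, which is what the proposed method does.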
As mean reversion contaminates judgment regarding the time profile of the dependent variable, and this contamination differs for agents in lower and higher percentiles of the variable of interest, clearing the reform effect of mean reversion makes the method suitable for assessing the heterogeneity of incentive schemes.

In an application to longitudinal data for Medicare's acute-care hospitals taking part in the nationwide quality incentive mechanism ("value-based purchasing"), we find that the higher the quintile of quality in the prior period, the larger the increase in the composite quality measure owing to the reform. Quality improvement in each quintile increases with the size of the quality incentive. Our results reveal that the increase in the quality measure owing to pay-for-performance is greater at hospitals with higher levels of quality. The finding suggests a stronger emphasis on quality activities at high-quality hospitals, and this is indeed documented in a number of works. For instance, top-performing hospitals in the US pilot program paid more attention to quality enhancement than bottom-performing hospitals [77]. Under the proportional pay-for-performance mechanism in California, high-quality physicians similarly placed more emphasis on an organizational culture of quality and demonstrated stronger dedication to addressing quality issues than low-quality physicians [21]. The desire of high-quality hospitals, which have reached the top deciles of hospital performance, to pursue quality improvement by means additional to those proposed by the policy regulator is further evidence in support of our findings [37].

Directions for future work in health economics applications may include the analysis of heterogeneous hospital response to quality incentives by considering different dimensions of the composite quality measure.
A related field of research is the study of the potential sacrifice of quality on non-incentivized measures in favor of measures incentivized by pay-for-performance. This has been analyzed at the mean level [27,47] and may be expanded to account for different behavior by high-quality and low-quality hospitals.

Disentangling the impact of mean reversion in estimating policy response with dynamic panels

Dependence Modeling , Volume 10 (1): 29 – Jan 1, 2022

Publisher
de Gruyter
Copyright
© 2022 Galina Besstremyannaya and Sergei Golovan, published by De Gruyter
ISSN
2300-2298
eISSN
2300-2298
DOI
10.1515/demo-2022-0104


1 Introduction

The phenomenon of regression toward the mean (mean reversion) is observed in the case of longitudinal observations of a variable which is susceptible to random variations. In this case, exceptionally low or high values of the variable in an initial measurement tend to be closer to the center of the distribution in subsequent measurements [24]. In short, mean reversion is an inherent part of a stationary process and implies the return of the process to its mean value [25,31]. For instance, mean reversion may manifest itself in the intersection between the mean and the trajectory of the process as it varies over time [25] or in the return of an autoregressive process to its long-term mean [31].

Historically, the appearance of the term "mean reversion" is associated with the seminal works of Galton, who discovered an inverse relationship between the heights of parents and children [30] and hence framed the term "regression" as the tendency of the dependent variable to revert to the mean value. Recent examples of the analysis of processes which exhibit mean reversion in various fields of economics include the current account of countries [81] and their productivity [29], profitability of banks [48], housing prices [31], tax avoidance by companies [3], blood pressure and cholesterol levels of patients [5], and birthweight of children in successive pregnancies of the same mother [79].

Mean reversion contaminates judgment about the time profile of the dependent variable in the case of groupwise estimations. If the value of the dependent variable for a certain observation is lower than average in period t, it is likely to be higher in period t+1 than in period t. Similarly, observations with high values in period t tend to be followed by lower values in period t+1.
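This pattern can be illustrated with a short simulation. The sketch below uses a hypothetical stationary AR(1) process with made-up parameters (not estimates from this article): sorting observations into bottom and top quintiles at period t, the bottom group drifts up and the top group drifts down in period t+1, with no policy acting on either group.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stationary AR(1): y_t - mu = phi*(y_{t-1} - mu) + eps_t
n, mu, phi, sigma = 100_000, 50.0, 0.6, 10.0
y_t = mu + rng.normal(0, sigma / np.sqrt(1 - phi**2), n)  # draws from the stationary distribution
y_next = mu + phi * (y_t - mu) + rng.normal(0, sigma, n)

low = y_t < np.quantile(y_t, 0.2)    # bottom quintile in period t
high = y_t > np.quantile(y_t, 0.8)   # top quintile in period t

# Pure mean reversion: the low group rises toward mu, the high group falls.
print(y_next[low].mean() > y_t[low].mean())    # True
print(y_next[high].mean() < y_t[high].mean())  # True
```

The same shrinkage toward the mean would be mistaken for a groupwise "treatment effect" if the grouping is based on the period-t values.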
Accordingly, mean reversion leads to an increase in the expected value of the dependent variable in the group of observations belonging to the lower percentiles of y, and to a decrease in the expected value in the higher percentiles of y. Therefore, the impact of mean reversion needs to be excluded in econometric analysis which evaluates the longitudinal impact of policy interventions on groups of economic agents.

The purpose of this article is to model the multivariate dependence of the variable of policy interest by disentangling the two sources of intertemporal dependence: one from the effect of the policy of interest per se and the other from mean reversion. Specifically, we show a way of separating the effect of mean reversion from the policy effect when evaluating the impact of an incentive scheme with intertemporal stimuli and intertemporal variation of the parameter of the reform intensity.

Although mean reversion is inherent to any stationary process, it is most often noted in the analysis of dynamic panels. The dynamic panel data model is a generalization of the panel data fixed-effect regression when the dynamic structure of the process needs to be introduced. In our article we use the example of Medicare's incentive contract applied to the observed quality of services, which has to be described as an autoregressive process. Hence, in evaluating the effect of this incentive scheme on hospital quality, we follow a handful of articles which deal with mean reversion in dynamic panels [25,31,48,81].

We focus on the pay-for-performance mechanism – an innovative method of remuneration, which originally emerged in corporate finance and managerial economics, and has since been much used in the public sector (civil service, education, social work, and healthcare). In order to quantify the unobserved quality of work, the incentive scheme computes the performance level using imprecisely measured proxies for various dimensions of quality.
Next, the regulator imposes an incentive contract which relates remuneration to performance, so that agents with higher performance in the current period receive higher payment for their services in future periods than agents with lower performance. The reform intensity parameter in this context is the share of the agent's income which is "at risk" under the incentive contract. Assuming a direct association between demand for services and quality of work, higher payment to agents with high performance incentivizes agents to improve their level of quality in order to raise demand for their services. In such a setting, if the unobserved quality could be measured precisely, each agent would have sustained their fixed level of performance.

However, performance is in fact a noisy signal. First, there is imprecision in measuring performance, since it is only a proxy for true quality. Second, in the case of healthcare, the unobserved true quality of services is itself subject to random variation, due, for instance, to patient non-compliance with medical treatment [62]. So it is plausible to assume that performance contains a random error. Hence, performance may unexpectedly be valued as having improved in period t due to this random error, and then the payment in period t+1 (which is a function of current performance) will increase. Accordingly, the incentive to improve quality in the future period becomes stronger for agents with higher performance. So the performance of these agents in period t+1 will on average be higher than their performance in period t. The reverse argument applies in case of an unexpected lowering of the performance valuation in period t.

What therefore happens is that the performance of the economic agent becomes a process with serial correlation. So the evolution of the variable of policy interest when such incentives are applied can be viewed as an autoregressive process.
In a situation where the policy variable changes over time, we estimate the unconditional mean of the autoregressive process as a function of the agent's characteristics and of the policy intensity. Comparison of the fitted values of the unconditional mean under different values of the reform intensity enables us to identify the reform effect cleared of mean reversion. For instance, we contrast the unconditional means estimated under the values of the policy variable in two consecutive time periods. Alternatively, we compare the fitted value of the unconditional mean in period t with its counterfactual analogue: the unconditional mean at zero value of the policy intensity. The article which is closest to our latter approach in assessing the policy effect in dynamic panels is [48]: the actual value of return on equity (ROE) at merged banks is compared with the fitted value of ROE, measured as the unconditional mean of the AR(1) process for the whole banking industry (i.e., the counterfactual value of ROE in the absence of the merger).

It should be noted that our identification strategy is close to difference-in-difference analysis with a non-binary treatment: the intensity of reform is the analogue of the treatment variable, and the share of Medicare patients at the hospital is the analogue of the variable for the treatment/control groups. If the treatment is binary, our approach does not differ from the conventional difference-in-difference estimation with the interaction term of the pre/post treatment dummy and the dummy for the treatment/control group.

We use the example of Medicare's value-based purchasing, implemented at the national level in the US since 2013 on the basis of a reward function that relates the aggregate measure of hospital performance to remuneration.
Overall, applications of pay-for-performance are very numerous in healthcare, since healthcare is the classic example of an industry with asymmetric information where sustained quality of service is extremely important. It should be noted that research in health economics is vulnerable to random shocks in the dependent variable and hence to the phenomenon of mean reversion. Yet, as regards incentive schemes, to the best of our knowledge, only one article explicitly discusses the impact of random variation of quality [62], and only a few articles point to the need for reassessing the impact of Medicare's pay-for-performance incentive mechanisms in view of the potential impact of mean reversion [58,63].

Our estimations of the association between the observed level of prior quality and measured quality improvement employ nationwide data for 2,984 acute-care Medicare hospitals which were financed according to the quality-incentive mechanism in 2013–2019. The empirical approach uses annual variation in the size of quality incentives in order to estimate the effect of pay-for-performance cleansed of mean reversion. We control for other potential channels of quality improvement by Medicare hospitals, using data on the Hospital Readmissions Reduction Program (HRRP) and on the meaningful use of Electronic Health Records (EHR).

We find that the higher the quintile of the composite quality measure at Medicare hospitals, the larger the estimated effect of the reform. Our empirical results suggest that the stylized fact of an inverse relationship between improvement owing to the incentive scheme and baseline performance should be revisited.
This inverse relationship has been found by most empirical assessments of the impact of incentive contracts on healthcare quality and seems to hold for various designs of pay-for-performance: it is observed for general practitioners in the UK; physician groups in California, Chicago, and Ontario; US hospitals in Michigan, New York, and Wisconsin; and hospitals involved in Medicare's pilot project for quality improvement [19,26,34,40,42,51,52,63,67,76]. However, we argue that the finding of an inverse relationship may be incorrect when the empirical approach fails to account for the impact of random shocks on the time profile of quality under the intertemporal incentive scheme.

The remainder of the article is structured as follows. Section 2 reviews the design of Medicare's quality incentive and sets up the framework for evaluating its outcomes. Section 3 outlines the empirical methodology, and Section 4 describes the data for Medicare hospitals. The results of the empirical analysis are presented in Section 5. Section 6 contains a discussion of our approach in view of conventional methods for policy evaluation, and Section 7 supports the quantitative findings of our analysis by suggesting potential channels used for quality improvement at hospitals.

2 Medicare's incentive contract

2.1 Policy setting

The mechanism provides an incentive proportional to measured quality and has been applied to discharges in the inpatient prospective payment system at acute-care Medicare hospitals since 2013. (Two jurisdictions are exceptions to the rule: Puerto Rico, which only started innovating its healthcare system in 2015, and Maryland, which has a unique model for hospital financing.) The scheme reduced Medicare's base payment (which is linked to each diagnosis-related group) to each hospital by a factor α_t, which equaled 0.01 in 2013. The reduction was increased annually by 0.0025 in 2014–2017 and has remained flat at 0.02 since 2017.
Note that α is the parameter of the reform intensity, varying over time, and α = 0 would correspond to a counterfactual setting with no reform. The accumulated saving from the reduction in base payment is redistributed across hospitals according to an adjustment coefficient, which is computed as a linear function of the composite quality measure: 1 + (κ_t · m_it/100 − 1) · α_t, where i is the index of a hospital, t indicates the year, and m_it is the hospital's total performance score (TPS), 0 ≤ m_it ≤ 100. A hospital is rewarded in period t+2 if the adjustment coefficient based on m_it is above one and is penalized otherwise. The quality incentive scheme is budget-neutral, and the value of the slope κ_t is chosen to ensure budget neutrality, so that hospitals with a value of TPS above the empirical mean gained under the reform. In the first years of the reform κ_t was close to 2, so hospitals with a value of the composite quality measure above 50 were winners from the incentive scheme.

The TPS is a weighted sum of scores for measures in several domains: timely implementation of recommended medical interventions (clinical process of care), quality of healthcare as perceived by patients (patient experience of care), survival rates for AMI, heart failure, and pneumonia patients and other proxies for outcome of care, healthcare-associated infections and other measures of safety of care, and spending per beneficiary as a measure of efficiency of care. The domain score is the sum of the scores for its measures. A higher score for a measure reflects a higher position of the hospital in the empirical distribution of the quality measure in a given year or a greater improvement of the quality measure relative to the baseline period.
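The payment rule described above can be sketched as a one-line function. The sketch below uses the formula from the text, with κ_t set to 2 (its approximate value in the early reform years) and α_t = 0.01 (the 2013 value); the break-even TPS of 50 follows immediately.

```python
def adjustment_coefficient(tps, alpha, kappa):
    """Payment adjustment applied to the base payment, as described in the text:
    1 + (kappa * TPS/100 - 1) * alpha."""
    return 1 + (kappa * tps / 100 - 1) * alpha

# With kappa close to 2, TPS = 50 is the break-even point of the scheme.
alpha_2013 = 0.01
print(adjustment_coefficient(50, alpha_2013, 2.0))       # 1.0 -> neither reward nor penalty
print(adjustment_coefficient(80, alpha_2013, 2.0) > 1)   # True -> rewarded in period t+2
print(adjustment_coefficient(30, alpha_2013, 2.0) < 1)   # True -> penalized in period t+2
```

The linearity of the coefficient in m_it is what gives every hospital, at any quality level, a marginal incentive to raise the TPS.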
Specifically, achievement points for each measure evaluate a hospital's performance relative to other hospitals in a given year, and improvement points for each measure assess the change in the hospital's own performance in the given year relative to the baseline period. Then, for each measure, the higher of the two (achievement points or improvement points) is used as the hospital's score for that measure.

A hospital's intertemporal incentive in Medicare's scheme is based on the expectation that the quality payments will continue over the long term, so the hospital's executives and physicians realize that demand is proportionate to quality and that their current policies toward quality of care will influence future reimbursement [46,73].

2.2 Autoregressive process and quality convergence

The evolution of the measured quality constitutes a process with serial correlation. If the process for the measured quality is stationary, then it may be treated as an autoregressive process

m_t − μ(θ) = ϕ_1 (m_{t−1} − μ(θ)) + ⋯ + ϕ_p (m_{t−p} − μ(θ)) + ε_t.

Here μ(θ) = E(m_t | θ) denotes the mean value of the measured quality for a hospital of type θ.
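As a numerical illustration of the deviation form above, the sketch below simulates a hypothetical AR(2) with made-up coefficients (not estimates from this article): started far above its long-run mean, the process is pulled back and thereafter fluctuates around μ(θ).

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical AR(2) in deviation form:
# m_t - mu = phi1*(m_{t-1} - mu) + phi2*(m_{t-2} - mu) + eps_t
mu, phi1, phi2, sigma, T = 60.0, 0.5, 0.2, 3.0, 200_000
m = np.empty(T)
m[0] = m[1] = 90.0   # start far above the long-run mean
for t in range(2, T):
    m[t] = mu + phi1 * (m[t - 1] - mu) + phi2 * (m[t - 2] - mu) + rng.normal(0, sigma)

# The coefficients satisfy the stationarity conditions, so after a burn-in
# the sample mean converges to mu.
print(abs(m[5000:].mean() - mu) < 1.0)  # True
```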
As the absolute values of the reciprocals of the roots of the characteristic equation of an AR(p) process are less than one, the maximum absolute value of these reciprocals (denoted λ) may be used as the measure of persistence of the process of measured quality [74]. Using definitions in [29], we can disentangle a permanent component in m_t, which is related to the economic impact of pay-for-performance, from a transient component (a pure dynamic effect), which may be referred to as "mean reversion" or "regression toward the mean" [30].

The reason for the phenomenon of mean reversion is the existence of the random error ε_t in the measured quality m_t. Indeed, in the absence of ε_t the process quickly converges to its mean μ(θ) and thereafter does not exhibit mean reversion because it always sits at the mean. The random error in the measured quality is largely attributed to imprecision in quality measurement: it is hard to reveal true quality using observable proxies. Another reason is random variation in true quality, which may be explained by the fact that patients do not always comply with the prescribed treatment [62]. Combined with the fact that hospitals make an intertemporal decision in respect of the quality-based reimbursement, the random error leads to the autoregressive form of the measured quality m_t.

The autoregressive specification can be taken as equivalent to convergence of the measured quality toward the value μ(θ), and λ is associated with the speed of quality convergence. The persistence parameter λ essentially describes how quickly the effect of any unexpected shock in the value of the dependent variable fades over time. For example, consider a simple AR(1) process with 0 < λ < 1 and the conditional mean E(m_t | m_{t−1}, θ) = μ(θ) + λ(m_{t−1} − μ(θ)).
Here the expected value of current measured quality, E(m_t | m_{t−1}, θ), is closer to the mean value μ(θ) than is the value of the measured quality in the previous period, m_{t−1}. The expression for E(m_t | m_{t−1}, θ) becomes more complicated for AR(p) processes with p > 1, but λ can still be used as a measure of persistence of the process.

The hospital receives higher profits for improvement of performance under higher values of α than under lower values of α. This, combined with the serial correlation between performance in consecutive periods, implies a direct association between the persistence parameter λ and α. Higher values of λ imply a lower rate of convergence of quality and hence a weaker effect of mean reversion.

2.3 Expected outcomes of the reform and time profiles of the quality measure

2.3.1 Mean effect of the reform

The payment schedule makes the hospital adjustment coefficient a linear function of TPS, so each hospital has an incentive to raise the value of the observed composite quality measure. Hence, the introduction of pay-for-performance is expected to have a positive effect on the mean value of the composite quality measure. Indeed, the mean level of hospital performance improved even in the case of a continuous reward function applied to hospitals above the threshold values of quality indicators (Medicare's pilot program, Phase I) [18,34,37,52,68]. Specifically, the value of the composite performance score in Medicare's pay-for-performance hospitals was higher than in the control group of hospitals [52,78].
Moreover, sociological evidence points to the fact that hospitals participating in incentive schemes are likely to improve performance as they implement a larger number of quality-improving activities that non-incentivized hospitals do not carry out [41]. The higher the value of α, the greater the hospital's potential loss under the reform in case of an insufficient value of TPS. Indeed, the empirical evidence points to larger incentives being more effective than smaller ones in such reforms [8,15,60]. Accordingly, the expected mean effects of the reform may be formulated as follows:

Hypothesis H1a: The introduction of pay-for-performance and the increase of the parameter α in the context of pay-for-performance lead to a positive mean effect on observed quality.

Hypothesis H1a implies that hospitals can be treated as agents which take their future payments into account. The intertemporal stimuli result in mean reversion with respect to observed quality. However, the strength of mean reversion is interrelated with the parameter α as follows:

Hypothesis H1b: The increase in the share of hospital funds at risk in pay-for-performance weakens the effect of convergence of the measured quality to the mean value.

2.3.2 Groupwise effects of the reform

We assume that the effect of Medicare's reform will be larger at hospitals with higher quality, based on findings in the health policy literature that emphasis on quality improvement in incentive schemes is greater at high-quality hospitals or among high-quality physicians in comparison with low-quality hospitals and physicians [21,37,69,77,78]. For instance, [77] conducted structural surveys at hospitals in the top two and bottom two deciles of the performance measure in Medicare's pilot program and discovered stronger involvement in quality-improving activities among top-performing hospitals.
Statistically significant differences between top- and bottom-performing hospitals were observed in the numerical values assigned to the following components of quality improvement: organizational culture, multidisciplinary teams, "adequate human resources for projects to increase adherence to quality indicators," and "new activities or policies related to quality improvement" (Tables 3 and 4 on pp. 836–837). Interviews with the leaders of California physician organizations [21] similarly discovered that physicians with high performance placed higher emphasis on the support that "the organization dedicates to addressing quality issues" than medium- and low-performing physicians (Exhibit 3, p. 521). Moreover, papers that apply policy evaluation techniques to assessment of the effect of the pilot pay-for-performance program at Medicare hospitals report that hospitals in the top two deciles of quality measures showed the fastest improvement, while hospitals in the lowest deciles raised their quality to a much lesser extent or may even have failed to improve [69,78].

To sum up, the hypothesis on groupwise effects of pay-for-performance is as follows:

Hypothesis H2: The introduction of pay-for-performance leads to a larger boost of measured quality at high-quality hospitals than at low-quality hospitals.

2.3.3 Net total effect over time at groups of hospitals

Consider the multivariate dependence of the variable of interest on two sources of intertemporal dependence: the policy reform and mean reversion. The effect of mean reversion implies a differential time profile of measured quality: measured quality increases at hospitals in low percentiles of the quality distribution and decreases at hospitals in high percentiles. Combined with the positive effect of pay-for-performance on the mean value of measured quality (Hypothesis H1a), mean reversion is likely to result in a heterogeneous net total effect of change in measured quality over time.

Hypothesis H3a:
High-quality hospitals experience a decrease of measured quality owing to regression toward the mean. However, the introduction of pay-for-performance and the increase of the share of hospital funds at risk in pay-for-performance lead to improvements in measured quality at these hospitals. The net total effect may vary.

Hypothesis H3b: Low-quality hospitals increase their measured quality owing to regression toward the mean. The introduction of pay-for-performance and the increase of the share of hospital funds at risk in pay-for-performance also cause a rise in measured quality, so the net total effect at these hospitals is positive.

If α is gradually raised in the course of implementation of the incentive scheme, then, according to H1b, convergence of measured quality weakens over time. The net total effect at high-quality hospitals is the sum of the positive effect of the quality incentive and the negative effect of quality convergence. With the increase in α, the number of hospitals where the positive effect outweighs the negative becomes larger.

Hypothesis H3c: The increase of hospital funds at risk under pay-for-performance weakens the effect of convergence of measured quality, so the number of high-quality hospitals with a negative net total effect decreases.

3 Empirical approach

3.1 Specification

The dependent variable y_it is the TPS of hospital i in year t. The value of y_it is used for remuneration of Medicare hospitals at time t+2, so we employ a second-order dynamic panel. Formally, a model with at least two lags must be used to describe the evolution of the TPS.
However, the coefficients for the variables with the third lag of TPS turned out to be insignificant in our empirical estimation, so we treat TPS as an AR(2) process:

(1) y_it = ϕ_0 + ϕ_1 y_{i,t−1} + ϕ_2 y_{i,t−2} + ϕ_3 α_t s_it + ϕ_4 α_t s_it y_{i,t−1} + ϕ_5 α_t s_it y_{i,t−2} + δ_0 s_it + z_it′ δ_1 + α_t s_it · z_it′ δ_2 + d_t′ δ_3 + u_i + ε_it,

where z_it are hospital time-varying characteristics; u_i are individual hospital effects (in particular, they incorporate the altruistic effects); the size of quality incentives α_t varies across years and enters the equation multiplied by the share of Medicare discharges s_it, which indicates that the quality incentives apply only to treatment of Medicare patients; and d_t is a set of dummy variables which capture external time effects (effects unrelated to hospital decisions). The following restrictions are used to identify the constant term ϕ_0: the sum of the coefficients for the components of d_t is normalized to zero, and the expected value E(u_i) = 0.

The hospital time-varying characteristics are the disproportionate share index, the casemix index, and the number of hospital beds. As the distribution of the number of hospital beds is extremely skewed, we take the log of hospital beds. This approach is in line with [22] and makes it possible to account for a nonlinear effect of hospital beds. It is less restrictive than the alternative approach employing a list of dummies based on ranges of hospital beds (e.g., fewer than 99, 100–199, etc.).
Use of a list of dummies condemns the effect to be piecewise constant, prohibiting variation within the category of hospitals with a given range of beds. The remaining time-varying characteristics are the physician-to-bed ratio and the nurse-to-bed ratio. The subsequent analysis of the effect of quality incentives deals with hospital grouping according to time-invariant characteristics, which could not be incorporated in the empirical specification with fixed effects: geographic region where the hospital is located, public ownership, urban location, and teaching status.

We use two hospital control variables which affect quality improvement and allow us to mitigate potential biases that might occur if the pay-for-performance effect were identified based only on the variation of α over time. The HRRP penalty captures the impact of a simultaneously adopted incentive program with similar incentives. Moreover, the readmission reduction program targets improvement of quality measures which are components of TPS (30-day unplanned readmission rates for acute myocardial infarction, heart failure, and pneumonia). The binary variable for successful attestation of meaningful usage of EHR accounts for the effect of another compulsory program, which provides bonuses to attested hospitals. The variable controls for the fixed cost incurred by a hospital to improve its quality through installing and using health information technology systems.

Eq. (1) can be estimated using the generalized method of moments: the [2] and [12] methodology for dynamic panel data. Examples of use of the methodology in health economics include analysis of the quality of care at Medicare's hospitals in [56], study of the length of stay at Japanese hospitals in [10], investigation of labor supply by Norwegian physicians in [4], and of the health status of individuals in the US in [57].

The first set of moment conditions in GMM comes from the approach of [2] and [12]. We take the first difference of both sides of Eq.
(1):

(2) Δy_it = ϕ_1 Δy_{i,t−1} + ϕ_2 Δy_{i,t−2} + ϕ_3 Δ(α_t s_it) + ϕ_4 Δ(α_t s_it y_{i,t−1}) + ϕ_5 Δ(α_t s_it y_{i,t−2}) + δ_0 Δs_it + Δ(z_it)′ δ_1 + Δ(α_t s_it · z_it)′ δ_2 + Δd_t′ δ_3 + Δε_it.

Since ε_it cannot be predicted using the information available at period t−1, ε_it is uncorrelated with any variable known at time t−1, t−2, etc. Therefore, Δε_it is uncorrelated with any variable known at time t−2, t−3, etc. Hence, the following set of moment conditions can be imposed to estimate the model parameters in Eq. (2), see [2] and [12]:

t = 3: E(Δe_i3 Z_i1) = 0,
t = 4: E(Δe_i4 Z_i1) = 0, E(Δe_i4 Z_i2) = 0,
t = 5: E(Δe_i5 Z_i1) = 0, E(Δe_i5 Z_i2) = 0, E(Δe_i5 Z_i3) = 0,
etc.,

where e_it is the regression residual and Z_it is any variable known at time t (for instance, y_it may serve as Z_it). Another set of moment conditions comes from [12] for the level Eq.
(1): u_i + ε_it has to be uncorrelated with ΔZ_{i,t−1} for any stationary variable Z_it, where Z_{i,t−1} is known at time t−1:

(3) E((u_i + e_it) ΔZ_{i,t−1}) = 0, t = 3, 4, …

So Z_it includes lagged values of predetermined and endogenous variables (the first set of moment conditions) and differenced predetermined and endogenous variables (the second set of moment conditions). All moment conditions are formulated separately for different years, so the number of observations for asymptotics equals the number of hospitals. The separate formulation of moment conditions for different years makes it impossible to apply the exclusion restrictions which are commonly used in the instrumental variables approach.

More specifically, the lagged value of TPS and the other hospital control variables in z_it (beds, physician-to-bed and nurse-to-bed ratios, the HRRP penalty, and the binary variable for hospital EHR attestation) are taken as predetermined and do not require the use of instruments in estimations. Casemix and the disproportionate share index are assumed to be endogenous: we rely on the empirical evidence of manipulation by hospitals of patient diagnoses (i.e., of casemix) and of reluctance to admit low-income patients under quality-incentive schemes [17,23,28]. We assume that the Medicare share is endogenous, too: this may be explained by a demand-side response from Medicare patients to publicly reported hospital quality [44,53,72].

It should be noted that the use of dynamic panel data methodology requires justification on economic grounds. This is because the approach uses lags and lagged differences as instruments, and there are potential problems with using lags as instruments even when they pass the Arellano-Bond tests.
Specifically, lags may prove to be weak or invalid [7]: weakness may occur when lags are distant [59], and invalidity arises from overfitting of the endogenous variable under large T [66]. However, neither of these problems is likely to be present in our analysis, since we restrict our instruments to the first appropriate lag. The validity of instruments is assessed through the statistics of the Arellano-Bond test. We employ the robust standard errors of [80] in estimation. (The Sargan statistic may be used in dynamic panels for assessing validity of instruments under the homoskedasticity assumption, but it is not applicable to our specification with robust standard errors.)

Formal tests are, however, insufficient for establishing a causal relationship in models which use an instrumental variable approach [1,7]. Accordingly, it is necessary to provide an economic justification for the assumption of the exclusion restriction of the instruments, i.e., that the instruments are exogenous and impact the dependent variable through no channels other than the endogenous variable and, possibly, also through exogenous covariates. An example of such justification on theoretical grounds can be found in [6], who uses lags of GDP and lags of the inflation rate as instruments for GDP and inflation. Another way of arguing for the exclusion restriction is given in [38], which estimates per capita output in various countries as a function of social infrastructure. Owing to the endogeneity of social infrastructure, variables related to exposure to Western culture are used as instruments, and there is a discussion of the absence of any direct channels through which these variables could impact a country's per capita output. We follow the latter approach to provide an economic justification for the validity of instruments in the dynamic panel data model for the composite quality measure at Medicare hospitals.
Our arguments below, which advocate the applicability of lagged first differences as instruments for the level Eq. (1) and first lagged levels as instruments for the difference Eq. (2), are based on the plausible assumption of a short adjustment period in the values of the dependent variable. Specifically, we assume that hospital managers take prompt action upon learning the TPS in year $t$, so that adjustment is observed in the next period and is not delayed until a more remote future. This assumption is supported by interviews with hospital managers [21,46,37,55,73,77], which show real-time assessment of the performance of hospital personnel and immediate feedback initiatives aimed at correcting possible lack of quality. For instance, at Medicare hospitals which participated in the pilot pay-for-performance program, "progress reports were routinely delivered to hospital leadership and regional boards" ([37], p. 45S). Hospital-specific and physician-specific compliance reports were collected at least every 1.5 months on average, and the results of these reports were delivered to individual physicians once in 5 months on average at both top-performing and bottom-performing hospitals ([77], Table 4, p. 837). As regards nationwide implementation of pay-for-performance at Medicare hospitals, the TPS is calculated annually, but values for the quality dimensions of the TPS are made publicly available on a quarterly basis. (Exceptions are one measure in the clinical process of care domain (influenza immunization), one measure in the safety domain (PSI-90), and a measure in the efficiency domain, Medicare spending per beneficiary, which are updated annually. See measure dates in the quarterly data archives available at https://data.cms.gov/provider-data/archived-data/hospitals.) Frequent announcements of quality scores make it possible to expedite quality adjustment at each hospital and improve the value of the TPS within a year.
For instance, the survey of hospital CEOs, physicians, nurses, and board members showed that, since implementation of the value-based purchasing program, "data were shared with their board and discussed at least quarterly with senior leadership" ([55], p. 435). As regards our formal analysis, Eq. (1) has TPS as the dependent variable and its first and second lags as explanatory variables. $\Delta y_{t-1}$ is used as an instrument for $y_{t-1}$. We assume that the change in TPS from period $t-2$ to $t-1$, i.e., $\Delta y_{t-1}$, which is observed at a hospital at $t-1$, is immediately followed by the hospital's action in period $t-1$. So the instrument $\Delta y_{t-1}$ affects the dependent variable $y_t$ through the endogenous variable $y_{t-1}$, i.e., through improved quality in period $t-1$ (and potentially also through the predetermined variable $y_{t-2}$, i.e., quality adjustment may start as early as period $t-2$), but not through other channels. Without the short adjustment period, these other channels might have included some postponed effects which only come into effect in period $t$. Note that the equation has hospital control variables, and we follow the empirical literature on the US Medicare reform by treating some of them as endogenous. One such variable, the share of Medicare patients, reflects the desire of the regulator to sign contracts with the hospital to treat Medicare patients, and it is a function of the hospital's quality-enhancing efforts [46]. Our empirical strategy relies on the fact that $\Delta x_{t-1}$ is an excludable instrument for $x_t$. It is, indeed, plausible to assume that an increase of quality efforts from period $t-2$ to period $t-1$ results in a positive value of $\Delta s_{t-1}$ (where $s_t$ denotes the share of Medicare patients) and impacts the value of the TPS in period $t$.
A similar argument applies to another endogenous control variable, casemix, which reflects the share of patients with complicated diagnoses. If we ignore potential dumping of patients by hospitals, hospitals are interested in treating patients with complicated diagnoses, since compensation in the system of diagnosis-related groups is higher for severe cases. But patient demand responds to public reports on hospital quality [20,42,44], so the share of Medicare cases becomes a function of hospital quality. The other equation is (2), which models first differences, i.e., changes in quality. The dependent variable is $\Delta y_t$, and it is a function of the endogenous variable $\Delta y_{t-1}$, a predetermined variable $\Delta y_{t-2}$, and the difference in the values of the hospital control variables $\Delta x_t$. The instrument for $\Delta y_{t-1}$ is $y_{t-2}$, and the instrument for each endogenous hospital control variable is $x_{t-2}$. Following the above logic about the prompt response of TPS to its values in the previous period, we presume that $y_{t-2}$ will affect the change in the value of the TPS from period $t-2$ to period $t-1$. So $y_{t-2}$ impacts $y_t$ through $\Delta y_{t-1}$ (and potentially even through the predetermined variable $\Delta y_{t-2}$) but not through other channels (i.e., not through processes that occur as late as period $t$). Similarly, upon learning the value of $x_{t-2}$, hospitals speedily adjust their quality to change $\Delta x_t$, and it affects $\Delta y_t$. Note that [56] used similar arguments in discussing the applicability of the dynamic panel data model to the analysis of in-hospital mortality and the complication rate, which are used as measures of hospital quality at US Medicare hospitals.
They write: "We believe our approach is appropriate because (i) changes to in-hospital mortality and complications should be immediately affected by changes in staffing levels, not after a long adjustment period, and (ii) the influence of the past is incorporated through the lagged value of the dependent variable" (p. 296, Footnote 3). A related study applying dynamic panel data models to hospital performance indicators deals with the average length of stay at Japanese acute-care hospitals that plan to introduce a prospective payment system [10]. The variable is treated in Japan as a proxy for hospital efficiency. It is regularly monitored and analyzed by the regulator and by hospital management, with feedback actions by hospital personnel in response to annual updates on levels of the variable [9,10,43,45,75]. Accordingly, the assumption of a short adjustment period for the length of stay is likely to hold at Japanese hospitals, and the use of lagged levels and lagged differences as instruments is justified. Note that potential violations of the exclusion restriction may occur in instances where the quality measure requires long periods to adjust. In such instances, a causal impact of the Medicare reform on the quality of care cannot be established [1,7]. We note other limitations of our approach. First, the analysis deals with the composite quality measure. While the quality-related efforts of a hospital and the TPS composite quality measure are multi-dimensional, we do not touch upon multi-tasking in the empirical estimations. Our approach considers a one-dimensional effort, a one-dimensional true quality, and its measurable proxy. Note that in the case of Medicare's formula, the true multi-dimensional quality of hospitals (and hence quality-related efforts) is transformed into measured quality (i.e., TPS) in a non-linear manner, owing to the step-wise scale used for computing the points for each measure.
We might nonetheless assume that quality is transformed into TPS monotonically and can be linearized in the empirical part of the article. Several arguments can be listed to support this conjecture. First, the data for Medicare hospitals show that no hospital has the highest possible step-wise values for all of its measures. So even the best hospitals have an incentive to work to increase at least one of their measure scores in order to improve TPS. Second, we can neglect disincentives within the step-wise scale used for aggregating measure scores, which may cause deterioration of quality for hospitals that are already positioned at the highest step. Such hospitals could afford only a slight decrease of their quality (due to slackening effort) while remaining at this step, and the impact on the value of TPS of a fall in quality in only one quality measure would be negligible. Third, interviews with executives of hospitals using value-based purchasing show that a hospital rarely gives special attention to a given subset of measures or shifts its administrative and other efforts across measures. All dimensions of TPS are monitored, and actions to improve each dimension are implemented [46,73]. Second, we do not touch on the rules for computing the scores of each dimension of the composite measure or on aggregation of dimension scores. It is important to note that Medicare uses whichever is higher, improvement points or achievement points, as the score for each dimension. The choice between achievement and improvement points stimulates low-performing hospitals, and the uniform formula assumes that all groups of hospitals have equal margin for improvement.
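The "whichever is higher" rule can be sketched in a few lines. This is a toy illustration only: the function name and inputs are ours, and we do not reproduce Medicare's step-wise computation of the underlying achievement and improvement points.

```python
def domain_score(achievement_pts, improvement_pts):
    """Medicare takes whichever is higher: achievement or improvement points.

    Achievement points lie on a [0, 10] scale (10 for hospitals at or above
    the 95th-percentile benchmark), while improvement points top out at 9, so
    the maximum score of 10 is reachable only through achievement.
    """
    return max(achievement_pts, improvement_pts)

# A low performer with a large year-on-year gain is still rewarded...
print(domain_score(2, 7))    # 7
# ...while a top performer at the benchmark receives the full 10
print(domain_score(10, 0))   # 10
```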
A minor exception is the protection of hospitals above the benchmark value of the 95th percentile of a corresponding measure score: these hospitals receive 10 points for their achievement on a $[0,10]$ scale, while the maximum number of points for improvement by any hospital is 9. The approach used by Medicare is in contrast with the methodology used in France, where all providers are stimulated according to improvement, while only providers with quality above the mean value are also rewarded for their achievement [14]. Third, the weighting of scores across domains is another feature of the design of the incentive mechanism which is not analyzed in our article. So the dichotomous variables for annual periods in the empirical specification capture time effects unrelated to Medicare's value-based purchasing as well as time effects not associated with the size of incentives but potentially linked to changes in other elements of the reform design (i.e., changes in weights). Finally, conventional policy evaluation using a control group of hospitals is not possible because quality measures for non-Medicare hospitals are not available. (The TPS and all its components are only available for hospitals in the Hospital Compare database.) The Hospital Compare database does include a small group of non-incentivized hospitals together with value-based purchasing hospitals: children's hospitals and critical-access hospitals. But both groups offer a special type of healthcare and are not comparable with acute-care hospitals. Moreover, critical-access hospitals usually have no more than 20 beds, which makes it impossible to find a close match with acute-care hospitals.
See [70] for an attempt at matching acute-care and critical-access hospitals. The empirical part of the article therefore focuses solely on pay-for-performance hospitals and identifies the effect of quality incentives based on variation in $\alpha_t$ and in the share of Medicare patients in the hospital, $s_{it}$. Variation in $\alpha_t$ plays the role of the dummy for treatment/pre-treatment periods, and variation in $s_{it}$ acts similarly to the dummy for the treatment/control groups.

3.2 Multivariate dependence of the quality variable

3.2.1 Calculation of the mean in the autoregressive process

We interpret the second-order dynamic panel (1) as a second-order autoregressive process. The coefficients for the first and second lags of $y_{it}$ in this AR(2) process are equal to $\phi_1+\phi_4\alpha_t s_{it}$ and $\phi_2+\phi_5\alpha_t s_{it}$, respectively. Note that both coefficients are linear functions of $\alpha_t$. While the standard form of the AR(2) process contains only the lags of the dependent variable, the right-hand side of our empirical equation includes various hospital characteristics and control variables.

To test the hypotheses which concern the mean value of measured quality $\mu$, we measure the mean fitted value of $y_{it}$ as follows. For a fixed value of $\alpha$, we take the unconditional expected values of both sides of (1) and denote $\mu(\alpha)=E(y_{it})$:

(4) $$\mu(\alpha)=\phi_0+\phi_1\mu(\alpha)+\phi_2\mu(\alpha)+\phi_3\alpha E(s_{it})+\phi_4\alpha E(s_{it})\mu(\alpha)+\phi_4\alpha\,\mathrm{cov}(s_{it},y_{it-1})+\phi_5\alpha E(s_{it})\mu(\alpha)+\phi_5\alpha\,\mathrm{cov}(s_{it},y_{it-2})+\delta_0 E(s_{it})+E(z_{it})'\delta_1+\alpha E(s_{it}z_{it})'\delta_2+E(d_t)'\delta_3,$$

where $E(d_t)'\delta_3=0$ because of the normalization of the coefficients $\delta_3$ in (1). After collecting the terms with $\mu$ and rearranging, we obtain:

$$\mu(\alpha)=\frac{\phi_0+\phi_3\alpha E(s_{it})+\delta_0 E(s_{it})+E(z_{it})'\delta_1+\alpha E(s_{it}z_{it})'\delta_2+\phi_4\alpha\,\mathrm{cov}(s_{it},y_{it-1})+\phi_5\alpha\,\mathrm{cov}(s_{it},y_{it-2})}{1-\phi_1-\phi_2-\phi_4\alpha E(s_{it})-\phi_5\alpha E(s_{it})}.$$

Since $\alpha$ differs across $t$, we use sample means across the hospitals for fixed $t$ to obtain estimates of the expectations. The estimate of $\mu(\alpha)$ is constructed by replacing the expected values and covariances with the corresponding sample means and sample covariances:

$$\mu(\alpha)=\frac{\phi_0+\phi_3\alpha\bar{s}+\delta_0\bar{s}+\bar{z}'\delta_1+\alpha\overline{sz}'\delta_2+\phi_4\alpha\,\widehat{\mathrm{cov}}(s,L(y))+\phi_5\alpha\,\widehat{\mathrm{cov}}(s,L^2(y))}{1-\phi_1-\phi_2-\phi_4\alpha\bar{s}-\phi_5\alpha\bar{s}}.$$

Note that the expression for $\mu(\alpha)$ does not contain the time effects $d_t'\delta_3$, as they represent shifts in quality which are common to all the hospitals and are caused by external circumstances.

3.2.2 Intertemporal dependence due to the policy reform

The policy parameter $\alpha$ increases in 2013–2017 and remains unchanged in 2017–2019. As follows from hypothesis $H1a$, the value of $\mu(\alpha_t)$ is expected to increase through 2013–2017 and to become flat in 2017–2019. Accordingly, we examine the difference between $\mu(\alpha_t)$ and $\mu(\alpha_{t-1})$:

$$\mu(\alpha_t)-\mu(\alpha_{t-1})=\frac{\phi_0+\phi_3\alpha_t\bar{s}+\delta_0\bar{s}+\bar{z}'\delta_1+\alpha_t\overline{sz}'\delta_2+\phi_4\alpha_t\,\widehat{\mathrm{cov}}(s,L(y))+\phi_5\alpha_t\,\widehat{\mathrm{cov}}(s,L^2(y))}{1-\phi_1-\phi_2-\phi_4\alpha_t\bar{s}-\phi_5\alpha_t\bar{s}}-\frac{\phi_0+\phi_3\alpha_{t-1}\bar{s}+\delta_0\bar{s}+\bar{z}'\delta_1+\alpha_{t-1}\overline{sz}'\delta_2+\phi_4\alpha_{t-1}\,\widehat{\mathrm{cov}}(s,L(y))+\phi_5\alpha_{t-1}\,\widehat{\mathrm{cov}}(s,L^2(y))}{1-\phi_1-\phi_2-\phi_4\alpha_{t-1}\bar{s}-\phi_5\alpha_{t-1}\bar{s}}.$$

The null hypothesis

$$H_0:\ \mu(\alpha_t)-\mu(\alpha_{t-1})=0$$

is tested against the positive alternative. Equivalently, we compute the difference between $\mu(\alpha)$ and $\mu(0)$:

$$\mu(\alpha)-\mu(0)=\frac{\phi_0+\phi_3\alpha\bar{s}+\delta_0\bar{s}+\bar{z}'\delta_1+\alpha\overline{sz}'\delta_2+\phi_4\alpha\,\widehat{\mathrm{cov}}(s,L(y))+\phi_5\alpha\,\widehat{\mathrm{cov}}(s,L^2(y))}{1-\phi_1-\phi_2-\phi_4\alpha\bar{s}-\phi_5\alpha\bar{s}}-\frac{\phi_0+\bar{z}'\delta_1}{1-\phi_1-\phi_2}.$$

Note that $\mu(0)$ represents the mean value in the pre-reform years when $\alpha=0$ and is obtained analytically by plugging $\alpha=0$ into the expression for $\mu(\alpha)$. The null hypothesis

$$H_0:\ \mu(\alpha_t)-\mu(0)=0$$

is tested against the positive alternative. In conjunction with hypothesis $H1a$, $\mu(\alpha_t)-\mu(\alpha_{t-1})$ should be positive in 2013–2017 and close to zero in 2017–2019. Equivalently, $\mu(\alpha_t)-\mu(\alpha_0)$ should be positive in 2013–2019 and should increase over the period 2013–2017.

Now consider hypothesis $H1b$. The persistence parameter $\lambda(\alpha)$ describes how quickly the effect of a random shock to quality fades over time. For a second-order autoregressive process, the conditional expected value of $y_{it}$ converges exponentially at a rate equal to the reciprocal of the smallest root of the characteristic equation for the AR(2) process:

$$1-(\phi_1+\phi_4\alpha_t s_{it})\lambda-(\phi_2+\phi_5\alpha_t s_{it})\lambda^2=0$$

([39], Section 2.3).
Again, for a fixed value of $\alpha$, we take expectations:

$$1-(\phi_1+\phi_4\alpha E(s_{it}))\lambda-(\phi_2+\phi_5\alpha E(s_{it}))\lambda^2=0.$$

Then we replace the expected values with sample means and solve this quadratic equation to obtain the following formula for $\lambda(\alpha)$:

$$\lambda(\alpha)=\frac{\phi_1+\phi_4\alpha\bar{s}+\sqrt{(\phi_1+\phi_4\alpha\bar{s})^2+4(\phi_2+\phi_5\alpha\bar{s})}}{2},$$

where $\bar{s}$ is the mean value of the share of Medicare cases for a given year. An alternative approach takes as the persistence parameter $\lambda$ the value of the first-order autocorrelation function ACF(1), i.e., the correlation coefficient between $y_{it}$ and $y_{it-1}$. Specifically, for the second-order autoregressive process (1) the estimated value of ACF(1) becomes

$$\lambda(\alpha)=\frac{\phi_1+\phi_4\alpha\bar{s}}{1-\phi_2-\phi_5\alpha\bar{s}}$$

([39], Section 3.4). Testing $H1b$ implies analyzing whether $\lambda(\alpha)$ is an increasing function of $\alpha$. So, similarly to $H1a$, the null hypothesis

$$H_0:\ \lambda(\alpha_t)-\lambda(\alpha_{t-1})=0$$

is tested against the positive alternative. Alternatively, we assess whether $\lambda(\alpha)-\lambda(0)$ is positive, whether it increases in 2013–2017, and whether it changes only negligibly in 2017–2019. To assess $H2$, we compute the effect of pay-for-performance as $\mu(\alpha_t)-\mu(0)$ or $\mu(\alpha_t)-\mu(\alpha_{t-1})$ at different quintiles of the lagged $\mathrm{TPS}_{it}$, where quintile 1 denotes the lowest quality and quintile 5 the highest.
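The two closed-form objects above, the mean $\mu(\alpha)$ and the persistence $\lambda(\alpha)$, can be coded directly. The sketch below uses invented coefficient values (the $\phi$ vector, the $\delta$ terms, and the covariances are placeholders, not our estimates); only the mean Medicare share is taken from the descriptive statistics. The helper names `mu_alpha` and `lambda_alpha` are ours.

```python
import numpy as np

def mu_alpha(alpha, phi, delta0, zbar_d1, szbar_d2, sbar, cov1, cov2):
    """Sample analogue of mu(alpha): mean fitted quality for a given alpha.

    phi = (phi0, ..., phi5); the remaining arguments are the sample means
    and covariances that replace the population moments in the formula.
    """
    phi0, phi1, phi2, phi3, phi4, phi5 = phi
    num = (phi0 + phi3 * alpha * sbar + delta0 * sbar + zbar_d1
           + alpha * szbar_d2 + phi4 * alpha * cov1 + phi5 * alpha * cov2)
    den = 1 - phi1 - phi2 - phi4 * alpha * sbar - phi5 * alpha * sbar
    return num / den

def lambda_alpha(alpha, phi, sbar, acf=False):
    """Persistence lambda(alpha): quadratic-root formula or, if acf, ACF(1)."""
    _, phi1, phi2, _, phi4, phi5 = phi
    a1 = phi1 + phi4 * alpha * sbar      # effective first-lag coefficient
    a2 = phi2 + phi5 * alpha * sbar      # effective second-lag coefficient
    if acf:
        return a1 / (1 - a2)
    return (a1 + np.sqrt(a1 ** 2 + 4 * a2)) / 2

# Illustrative (made-up) coefficients; sbar is the mean Medicare share
phi = (10.0, 0.30, 0.05, 1.5, 0.20, 0.05)
sbar = 0.378
m0 = mu_alpha(0.0, phi, 0.5, 8.0, 0.4, sbar, 2.0, 1.5)
m2 = mu_alpha(2.0, phi, 0.5, 8.0, 0.4, sbar, 2.0, 1.5)
print(m2 - m0 > 0, lambda_alpha(2.0, phi, sbar) > lambda_alpha(0.0, phi, sbar))
```

With any coefficients of this sign pattern, both the mean and the persistence rise in $\alpha$, which is exactly the shape of $H1a$ and $H1b$.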
We investigate whether the effect is positive for $\mu(\alpha_t)-\mu(0)$ in 2013–2019 (and for $\mu(\alpha_t)-\mu(\alpha_{t-1})$ in 2013–2017) and whether the effect increases by quintile. Testing $H3a$ and $H3b$ involves computing the predicted values of $\mathrm{TPS}_{it}$ at the mean value of each covariate for different quintiles of the lagged $\mathrm{TPS}_{it}$ and examining whether in 2013–2019 they change from positive in the lowest quintiles to negative or insignificant in the highest quintiles. The average difference between predicted TPS and lagged TPS shows the expected change in quality in consecutive years (the net total effect), which is the sum of the effect of pay-for-performance and the impact of mean reversion.

3.2.3 Estimation of the multivariate effect due to policy reform and mean reversion

Evaluation of $H3c$ involves calculating annual values of the net total effect for quintiles of the lagged TPS and examining whether the negative values of the net total effect become less frequent across the highest quintiles during 2013–2017 and stay constant in 2017–2019.

4 Data

4.1 Data sources and variables

The analysis uses data for Medicare hospitals in 2011–2019 from several sources. We use Hospital Compare data archives (January 2021 update) for quality measures, hospital ownership, and geographic location. The medical school affiliation of a hospital and the numbers of hospital beds, nurses, and physicians come from Provider of Service files. Other hospital control variables are taken from the Final Rules, which are Medicare's annual documents on reimbursement rates in the inpatient prospective payment system. Specifically, we use information from the Impact Files, which accompany the Final Rules and estimate the impact of the reimbursement mechanism on hospital characteristics.
The variables taken from the Impact Files are the share of Medicare discharges, ownership, and urban location. Patient characteristics are also taken from the Impact Files. The casemix variable reflects the relative weight of each DRG in financial terms and is adjusted for transfers of patients between hospitals. (If a patient was transferred to/from a hospital, then the transfer-adjustment factor is the lesser of one and the value of the patient's length of stay relative to the geometric mean of the national length of stay for this DRG. See Federal Register 2011, 42 CFR, Part 412.) Casemix makes it possible to control for the composition of patient cases, taking account of the objective link between severity of illness and hospital resources. The disproportionate-share index accounts for the share of low-income patients and makes it possible to proxy patient income. To account for other major channels of quality improvement by Medicare hospitals over the observed time period, we use data for two programs run by the Centers for Medicare and Medicaid Services. One of them is the HRRP, which has applied to Medicare hospitals since fiscal year 2013 and penalizes them for excess readmissions. Specifically, a payment reduction of between 0 and 3% is applied to the hospital's Medicare remuneration; higher values of the penalty percentage represent more excess readmissions at the hospital. Using the HRRP Supplemental Data Files, which accompany the annual Final Rules on the acute inpatient PPS (June 2020 update), we find the HRRP penalty for 2013–2019 and use it as one of the control variables in the empirical analysis. We also consider the EHR Incentive Program, which has been in force since 2011. The program establishes hospital attestation on the use of EHR.
The adoption of quality-improving information technology requires substantial fixed costs, so the binary variable for hospital attestation within the EHR program makes it possible to control for these fixed costs in the empirical analysis. The EHR promotion program consists of three stages (sequentially introduced in 2011, 2014, and 2017). Using data from the Eligible Hospitals Public Use Files on the EHR Incentive Program (February 2020 update), we set the EHR attestation dummy equal to one if the hospital passed its attestation for the given year at any stage. Owing to non-availability of data on the third stage of the program, we extend the second-stage data from year 2016 to years 2017–2019. Use of an attestation dummy lets us control for the fact of incurring the fixed cost of quality-improvement efforts. Owing to the small size of the non-EHR group (only 8–10% of the sample), we do not analyze whether quality goes up faster in the EHR group of hospitals (for instance, we do not interact the attestation dummy with $\alpha$).

4.2 Sample

The non-anonymous character of the data sources allows us to merge them by year and hospital name. Our analysis focuses on acute-care Medicare hospitals, as the pay-for-performance incentive contract applies exclusively to this group. We restrict the sample by considering only hospitals with a share of Medicare cases greater than 5%. The specification with a second-order lag enables estimation of the fitted values of $\mathrm{TPS}_t$ and of the values of $\mu_t$ only starting in 2013. Accordingly, we can evaluate the impact of the incentive contract on quality improvement in 2013–2019.
There are 2,984 hospitals in our sample for 2013–2019, which give 18,545 hospital-year observations (Table 1).

Table 1: Descriptive statistics for Medicare's acute-care hospitals in 2013–2019

| Variable | Definition | Obs | Mean | St.Dev | Min | Max |
|---|---|---|---|---|---|---|
| *Hospital performance* | | | | | | |
| TPS | Hospital TPS | 18,545 | 37.265 | 11.468 | 2.727 | 98.182 |
| *Patient characteristics* | | | | | | |
| Casemix | Transfer-adjusted casemix index | 18,545 | 1.599 | 0.298 | 0.834 | 3.972 |
| Dsh | Disproportionate share index, reflecting the prevalence of low-income patients | 18,545 | 0.307 | 0.165 | 0 | 1.232 |
| *Hospital characteristics* | | | | | | |
| Nurses/beds | Nurse-to-bed ratio | 18,545 | 1.312 | 3.849 | 0 | 170.479 |
| Physicians/beds | Physician-to-bed ratio | 18,545 | 0.099 | 0.947 | 0 | 70.992 |
| Beds | Number of beds | 18,545 | 272.158 | 241.538 | 3 | 2,891 |
| log(beds) | Number of beds (in logs) | 18,545 | 5.283 | 0.819 | 1.099 | 7.969 |
| Medicare share | Share of Medicare cases | 18,545 | 0.378 | 0.118 | 0.050 | 0.983 |
| HRRP penalty | Percentage reduction of the Medicare payments under HRRP | 18,545 | 0.498 | 0.590 | 0 | 3.000 |
| MU | =1 if passed attestation for meaningful usage of EHR | 18,545 | 0.924 | 0.265 | 0 | 1 |
| Urban | =1 if an urban hospital | 18,545 | 0.711 | 0.453 | 0 | 1 |
| Public | =1 if managed by federal, state or local government, or hospital district or authority | 18,545 | 0.147 | 0.354 | 0 | 1 |
| Teaching | =1 if hospital has medical school affiliation | 18,545 | 0.364 | 0.481 | 0 | 1 |
| *Hospital location* | | | | | | |
| New England | =1 if located in Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, or Vermont | 18,545 | 0.046 | 0.210 | 0 | 1 |
| Mid-Atlantic | =1 if located in New Jersey, New York, or Pennsylvania | 18,545 | 0.123 | 0.328 | 0 | 1 |
| East North Central | =1 if located in Illinois, Indiana, Michigan, Ohio, or Wisconsin | 18,545 | 0.168 | 0.374 | 0 | 1 |
| West North Central | =1 if located in Iowa, Kansas, Minnesota, Missouri, Nebraska, North Dakota, or South Dakota | 18,545 | 0.081 | 0.272 | 0 | 1 |
| South Atlantic | =1 if located in Delaware, Florida, Georgia, Maryland, North Carolina, South Carolina, Virginia, District of Columbia, or West Virginia | 18,545 | 0.177 | 0.381 | 0 | 1 |
| East South Central | =1 if located in Alabama, Kentucky, Mississippi, or Tennessee | 18,545 | 0.087 | 0.282 | 0 | 1 |
| West South Central | =1 if located in Arkansas, Louisiana, Oklahoma, or Texas | 18,545 | 0.129 | 0.335 | 0 | 1 |
| Mountain | =1 if located in Arizona, Colorado, Idaho, Montana, Nevada, New Mexico, Utah, or Wyoming | 18,545 | 0.069 | 0.253 | 0 | 1 |
| Pacific | =1 if located in California, Oregon, Washington, Alaska, or Hawaii | 18,545 | 0.115 | 0.319 | 0 | 1 |

Note: Section 401 hospitals are treated as rural hospitals.

4.3 Flow of quality and evidence of mean reversion

Descriptive analysis of the values of TPS offers suggestive evidence in support of some of the main hypotheses generated by the model. Specifically, we focus on the flow of hospitals between quintiles of TPS in different years. The Sankey diagrams in Figures 1 and 2 use the width of arrows to indicate the intensity of flow rates and demonstrate how hospitals change their position in the quintiles of the composite quality measure after the introduction of pay-for-performance (e.g., from 2012 to 2013).

Figure 1: Flow of hospitals between quintiles of TPS in 2012–2013.

Figure 2: Flow of hospitals between quintiles of TPS in 2018–2019.

As can be inferred from Figure 1, there is considerable movement of hospitals between quintiles. For instance, consider hospitals which in 2012 belonged to the fifth quintile of TPS (the quintile with the highest performance). Fewer than half of these hospitals remained in the fifth quintile of TPS in 2013, and the rest saw a decline of their position relative to other hospitals by moving to quintiles one through four. Similar tendencies are observed for hospitals in any other given quintile of TPS in 2012: only a small share of hospitals continue to belong to the same quintile in the subsequent year. This can be viewed as graphic support for the phenomenon of mean reversion, since hospitals would rarely change their quintile from year to year in the absence of mean reversion. It is plausible to assume that mean reversion becomes weaker when there is an increase of $\alpha$. Figure 2 supports this prediction. It shows the flow of hospitals between quintiles of TPS from 2018 to 2019, when the value of $\alpha$ was 0.02.
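The link between persistence and quintile flows can be checked with a stylized simulation. This is a sketch under strong assumptions: a Gaussian AR(1) stand-in replaces our AR(2) panel, `quintile_retention` is an illustrative helper of ours, and the two persistence values are chosen to be of the order of the early and late $\lambda$ estimates reported in Table 3.

```python
import numpy as np

rng = np.random.default_rng(1)

def quintile_retention(lam, n=20000):
    """Share of units staying in the same quintile across two consecutive
    periods of a stationary AR(1) with persistence lam (unit variance)."""
    y1 = rng.normal(size=n)
    y2 = lam * y1 + np.sqrt(1 - lam ** 2) * rng.normal(size=n)  # same variance
    q1 = np.searchsorted(np.quantile(y1, [0.2, 0.4, 0.6, 0.8]), y1)
    q2 = np.searchsorted(np.quantile(y2, [0.2, 0.4, 0.6, 0.8]), y2)
    return (q1 == q2).mean()

low, high = quintile_retention(0.29), quintile_retention(0.66)
print(f"retention at lambda=0.29: {low:.2f}, at lambda=0.66: {high:.2f}")
```

Higher persistence mechanically raises the share of units that keep their quintile, which is the qualitative pattern the two Sankey diagrams display.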
Compared to Figure 1, with $\alpha$ equal to 0.01 in 2013, the flows in 2018–2019 are much weaker than the flows in 2012–2013, so hospitals change their position in the quintiles less often.

5 Empirical results

The first set of our results is reported in Table 2 and concerns the mean effect of pay-for-performance at Medicare hospitals.

Table 2: Effect of pay-for-performance on the mean quality

| | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
|---|---|---|---|---|---|---|---|
| $\alpha_t$ | 1.00 | 1.25 | 1.50 | 1.75 | 2.00 | 2.00 | 2.00 |
| $\mu(\alpha_t)$ | 30.546*** | 31.879*** | 34.823*** | 36.226*** | 38.407*** | 38.684*** | 38.841*** |
| | (0.973) | (0.678) | (0.385) | (0.453) | (0.970) | (0.961) | (0.956) |
| $\mu(\alpha_t)-\mu(0)$ | 2.255*** | 3.179*** | 4.762*** | 6.487*** | 8.461*** | 8.530*** | 8.495*** |
| | (0.737) | (1.009) | (1.349) | (1.800) | (2.355) | (2.309) | (2.266) |
| $\mu(\alpha_t)-\mu(\alpha_{t-1})$ | 2.255*** | 1.333*** | 2.944*** | 1.403*** | 2.181*** | 0.277 | 0.157 |
| | (0.737) | (0.375) | (0.474) | (0.542) | (0.634) | (0.240) | (0.237) |

Notes: Standard errors calculated using the delta method are in parentheses. *, **, and *** show significance at the 0.1, 0.05, and 0.01 levels, respectively.

Measured as $\mu(\alpha_t)-\mu(0)$, the mean effect of pay-for-performance is positive in 2013–2019. The value of the effect $\mu(\alpha_t)$ increases in $\alpha_t$ in 2013–2017. However, the increase in 2017–2019 is negligible and is in line with the fact that $\alpha$ has remained flat since 2017.
Similarly, the change in the effect of pay-for-performance in consecutive years, defined as $\mu(\alpha_t)-\mu(\alpha_{t-1})$, is positive for 2013–2017 but extremely small and statistically insignificant in 2018–2019 in comparison with the previous years. This finding corresponds to our hypothesis $H1a$ of an improvement in mean quality owing to the introduction of pay-for-performance (i.e., the increase of $\alpha$ from 0 to 1) and of an expected rise of mean quality due to the linearly increasing reward function ($\alpha$ gradually goes up from 1 to 2 in 2013–2017). Note that the mean value of $\mu(\alpha_t)$ increases in $\alpha_t$, which supports our supposition that hospital managers take account of future benefits from improving current values of hospital quality. Table 3 shows the second set of results, concerning the heterogeneity of hospital response to pay-for-performance. The parameter $\lambda$ is estimated as the inverse of the smaller root of the AR(2) characteristic equation or as ACF(1). The values are significant and less than one under both approaches. This points to mean reversion: quality decreases toward the mean at high-quality hospitals and goes up toward the mean at hospitals with low quality. The values of $\lambda$ rise with an increase in the size of incentives $\alpha$, which implies that the persistence of the dynamic process increases, and hence the effect of mean reversion becomes weaker. Similarly, the values of $\lambda(\alpha_t)-\lambda(0)$ are positive and increase in $\alpha_t$. The time change in the convergence parameter, $\lambda(\alpha_t)-\lambda(\alpha_{t-1})$, is positive for 2013–2017. The results support hypothesis $H1b$ of a weakening of quality convergence to the mean value with a rise in $\alpha$.
The value of λ(α_t) − λ(α_{t−1}) approaches zero in 2018–2019, when the parameter α remains flat.

Table 3. Effect of pay-for-performance on mean reversion

                                       2013      2014      2015      2016      2017      2018      2019
α_t                                    1.00      1.25      1.50      1.75      2.00      2.00      2.00
λ(α_t)                               0.286***  0.435***  0.531***  0.598***  0.659***  0.651***  0.642***
                                    (0.112)   (0.032)   (0.020)   (0.017)   (0.016)   (0.016)   (0.016)
λ(α_t) − λ(0)                       −0.169    −0.020     0.076     0.144***  0.204***  0.196***  0.187***
                                    (0.151)   (0.069)   (0.055)   (0.048)   (0.044)   (0.044)   (0.045)
λ(α_t) − λ(α_{t−1})                 −0.169     0.149*    0.096***  0.068***  0.061*** −0.008*   −0.009*
                                    (0.151)   (0.082)   (0.016)   (0.009)   (0.007)   (0.005)   (0.005)
λ(α_t) (alternative)                 0.408***  0.442***  0.482***  0.519***  0.561***  0.554***  0.548***
                                    (0.018)   (0.015)   (0.013)   (0.013)   (0.015)   (0.014)   (0.014)
λ(α_t) − λ(0) (alternative)          0.132***  0.166***  0.206***  0.244***  0.285***  0.278***  0.272***
                                    (0.019)   (0.024)   (0.029)   (0.034)   (0.039)   (0.038)   (0.038)
λ(α_t) − λ(α_{t−1}) (alternative)    0.132***  0.034***  0.040***  0.037***  0.041*** −0.006*   −0.006*
                                    (0.019)   (0.005)   (0.006)   (0.006)   (0.006)   (0.003)   (0.003)

Notes: Standard errors calculated using the delta method are in parentheses. *, **, and *** denote significance at the 0.1, 0.05, and 0.01 levels, respectively. The persistence parameter λ(α_t) is estimated as the inverse of the smaller root of AR(2) or as ACF(1); the latter is denoted as "alternative."

Since the values of λ(α_t) are well below 1, we can conclude that the estimated AR(2) processes are indeed stationary for each α_t.

The heterogeneous changes in hospital quality owing to pay-for-performance are given in Tables 4, 5, and 6, where hospitals are divided into quintiles according to the values of their TPS. Note that the change in hospital quality is a function of the regression coefficient and the mean values of covariates. So its standard error consists of two parts: the error of the estimated regression coefficient and the error of the mean values of covariates. Only the second part of this error depends on sample size and should go up approximately √5 times due to the analysis by quintiles.
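The delta-method standard errors reported throughout can be sketched generically: for a smooth function g of the estimated parameters θ with covariance matrix V, SE(g(θ)) ≈ sqrt(∇g(θ)′ V ∇g(θ)). A minimal numeric version with a finite-difference gradient and hypothetical numbers follows; the function used here is the long-term mean c/(1 − λ), a simplification of the paper's full expression.

```python
import numpy as np

def delta_method_se(g, theta, V, h=1e-6):
    """Approximate SE of g(theta_hat) via the delta method:
    sqrt(grad' V grad), gradient taken by central differences."""
    theta = np.asarray(theta, dtype=float)
    grad = np.empty_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = h
        grad[i] = (g(theta + step) - g(theta - step)) / (2.0 * h)
    return float(np.sqrt(grad @ V @ grad))

# hypothetical estimates: intercept c and persistence lam, with their covariance
theta = np.array([10.0, 0.65])
V = np.array([[0.25, 0.0],
              [0.0,  0.0004]])
se = delta_method_se(lambda t: t[0] / (1.0 - t[1]), theta, V)
print(se)
```

Only the covariate-mean component of V shrinks with group size, which is why splitting the sample into quintiles inflates the reported standard errors only modestly.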
However, the weight of this second part proves to be relatively small in the case of our data, so the standard errors in Tables 4–6 are only slightly larger than the standard errors in Table 2.

Table 4. Effect of pay-for-performance as μ(α_t) − μ(0) by quintiles of TPS_{t−1}

                            2013      2014      2015      2016      2017      2018      2019
Quintile 1                1.828***  2.910***  4.149***  5.798***  7.673***  7.518***  7.683***
                         (0.741)   (1.007)   (1.334)   (1.727)   (2.264)   (2.162)   (2.202)
Quintile 2                2.150***  2.881***  4.396***  5.953***  8.293***  8.063***  8.360***
                         (0.745)   (1.013)   (1.358)   (1.803)   (2.375)   (2.293)   (2.250)
Quintile 2 − Quintile 1   0.321**  −0.029     0.247     0.155     0.620*    0.545*    0.677**
                         (0.138)   (0.124)   (0.174)   (0.233)   (0.328)   (0.327)   (0.313)
Quintile 3                2.187***  3.072***  4.279***  6.368***  8.093***  8.602***  8.301***
                         (0.738)   (1.023)   (1.378)   (1.849)   (2.346)   (2.381)   (2.278)
Quintile 3 − Quintile 2   0.037     0.191*   −0.117     0.416*   −0.200     0.539    −0.059
                         (0.078)   (0.114)   (0.155)   (0.232)   (0.321)   (0.356)   (0.355)
Quintile 4                2.298***  3.218***  4.742***  6.534***  8.287***  8.260***  8.087***
                         (0.739)   (1.027)   (1.392)   (1.850)   (2.391)   (2.338)   (2.279)
Quintile 4 − Quintile 3   0.110     0.146     0.463**   0.165     0.194    −0.342    −0.214
                         (0.074)   (0.111)   (0.230)   (0.235)   (0.368)   (0.419)   (0.349)
Quintile 5                2.483***  3.261***  5.135***  6.529***  8.381***  8.282***  8.501***
                         (0.741)   (1.030)   (1.418)   (1.849)   (2.506)   (2.476)   (2.421)
Quintile 5 − Quintile 4   0.186*    0.043     0.393    −0.004     0.094     0.022     0.414
                         (0.101)   (0.251)   (0.503)   (0.433)   (0.568)   (0.503)   (0.497)

Notes: Quintile 1 denotes the lowest quality and quintile 5 the highest. The table reports the effect at each quintile and the differences in the effects at consecutive quintiles. *, **, and *** denote significance at the 0.1, 0.05, and 0.01 levels, respectively. Standard errors (calculated using the delta method for the differences across quintiles) are in parentheses. There are two sources of error in the estimates shown in the table: the error of the regression coefficient and the error of the mean values of covariates. The first part of the error does not vary across the result tables, while the second part depends on group size and is approximately √5 times larger than its counterpart in Table 2.
However, the errors of the regression coefficient are considerably bigger than those of the mean values of covariates, so the increase in the standard errors in this table and the two subsequent tables relative to the standard errors in Table 2 is only minor.

Table 5. Effect of pay-for-performance as μ(α_t) − μ(α_{t−1}) by quintiles of TPS_{t−1}

                            2013      2014      2015      2016      2017      2018      2019
Quintile 1                1.828*** −0.257     1.032**   2.442***  1.738***  0.536    −0.215
                         (0.741)   (0.517)   (0.497)   (0.582)   (0.656)   (0.440)   (0.434)
Quintile 2                2.150***  0.678     2.131***  1.908***  2.405*** −0.125     0.655
                         (0.745)   (0.489)   (0.546)   (0.626)   (0.736)   (0.465)   (0.505)
Quintile 2 − Quintile 1   0.321**   0.935*    1.099**  −0.534     0.667    −0.661     0.869
                         (0.138)   (0.522)   (0.501)   (0.545)   (0.585)   (0.629)   (0.676)
Quintile 3                2.187***  0.933**   1.561***  3.086***  1.940***  0.743     0.104
                         (0.738)   (0.458)   (0.530)   (0.669)   (0.723)   (0.501)   (0.531)
Quintile 3 − Quintile 2   0.037     0.255    −0.570     1.178**  −0.465     0.868    −0.551
                         (0.078)   (0.490)   (0.514)   (0.545)   (0.609)   (0.677)   (0.758)
Quintile 4                2.298***  1.656***  3.222***  0.929     2.397***  0.505    −0.045
                         (0.739)   (0.464)   (0.626)   (0.702)   (0.789)   (0.552)   (0.510)
Quintile 4 − Quintile 3   0.110     0.722     1.662*** −2.157***  0.458    −0.238    −0.149
                         (0.074)   (0.500)   (0.565)   (0.630)   (0.654)   (0.745)   (0.748)
Quintile 5                2.483***  3.445***  6.247*** −1.518*    2.105*** −0.631     0.666
                         (0.741)   (0.620)   (0.864)   (0.875)   (0.892)   (0.583)   (0.567)
Quintile 5 − Quintile 4   0.186*    1.789***  3.024*** −2.447*** −0.292    −1.136     0.711
                         (0.101)   (0.599)   (0.829)   (0.876)   (0.780)   (0.803)   (0.762)

Notes: Quintile 1 denotes the lowest quality and quintile 5 the highest. The table reports the effect at each quintile and the differences in the effects at consecutive quintiles. *, **, and *** denote significance at the 0.1, 0.05, and 0.01 levels, respectively. Standard errors calculated using the delta method are in parentheses.

Table 6. Net total effect by quintiles of TPS_{t−1}: predicted TPS minus lagged TPS

                            2013       2014       2015       2016       2017       2018       2019
Quintile 1                 2.686***   4.989***   1.153***   7.715***   5.647***   5.351***   0.226
                          (0.458)    (0.329)    (0.298)    (0.271)    (0.263)    (0.273)    (0.266)
Quintile 2                −3.796***   0.708***  −2.275***   4.388***   2.586***   2.145***  −2.572***
                          (0.288)    (0.249)    (0.236)    (0.223)    (0.212)    (0.224)    (0.238)
Quintile 2 − Quintile 1   −6.482***  −4.281***  −3.428***  −3.327***  −3.061***  −3.205***  −2.799***
                          (0.367)    (0.285)    (0.254)    (0.215)    (0.207)    (0.221)    (0.228)
Quintile 3                −7.249***  −1.938***  −5.054***   2.405***   0.646***   0.383*    −4.523***
                          (0.282)    (0.236)    (0.237)    (0.218)    (0.216)    (0.228)    (0.235)
Quintile 3 − Quintile 2   −3.453***  −2.646***  −2.778***  −1.982***  −1.940***  −1.762***  −1.950***
                          (0.276)    (0.243)    (0.212)    (0.201)    (0.193)    (0.214)    (0.231)
Quintile 4               −10.783***  −4.199***  −6.789***  −0.182     −1.337***  −1.861***  −6.893***
                          (0.321)    (0.273)    (0.280)    (0.250)    (0.256)    (0.272)    (0.270)
Quintile 4 − Quintile 3   −3.534***  −2.261***  −1.736***  −2.587***  −1.983***  −2.244***  −2.370***
                          (0.289)    (0.243)    (0.247)    (0.230)    (0.228)    (0.267)    (0.229)
Quintile 5               −15.403***  −8.318*** −10.637***  −3.815***  −4.806***  −5.654*** −10.791***
                          (0.439)    (0.402)    (0.476)    (0.388)    (0.381)    (0.381)    (0.396)
Quintile 5 − Quintile 4   −4.620***  −4.119***  −3.847***  −3.634***  −3.469***  −3.794***  −3.898***
                          (0.353)    (0.330)    (0.406)    (0.336)    (0.330)    (0.323)    (0.330)

Notes: Quintile 1 denotes the lowest quality and quintile 5 the highest.
The table reports the effect at each quintile and the differences in the effects at consecutive quintiles. *, **, and *** denote significance at the 0.1, 0.05, and 0.01 levels, respectively. Standard errors calculated using the delta method are in parentheses.

The estimates of the effect of pay-for-performance, whether in terms of μ(α_t) − μ(0) or in terms of μ(α_t) − μ(α_{t−1}), show that the higher the quintile of the quality distribution in the previous year, the larger the impact of the reform (Tables 4 and 5). Statistically significant differences in the effect of pay-for-performance across consecutive quintiles of lagged TPS are observed in many years, for instance, in 4 years out of 7 for quintiles 1–2 in the case of μ(α_t) − μ(0) and for quintiles 4–5 in the case of μ(α_t) − μ(α_{t−1}). The change of the effect of pay-for-performance over time, μ(α_t) − μ(α_{t−1}), increases with a rise of the quality incentive α but almost stops increasing in 2018–2019 when α becomes constant, as shown in Table 5. So pay-for-performance stimulates quality increase in all groups of Medicare hospitals, and the impact of pay-for-performance is greater at higher-quality hospitals.

Table 6 gives estimates of the net total effect, i.e., the expected change in hospital quality over time, measured as the difference between the predicted TPS and the lagged TPS. The net total effect is the sum of the impact of mean reversion and the effect of pay-for-performance. Note that the estimation of the fitted value of TPS includes time effects, which account both for the time trend and for important changes in the incentive mechanism not captured by variation in α.
An example of such a change occurred in 2015 and temporarily decreased the value of TPS for each hospital. Specifically, the pneumonia cohort was expanded, which caused a rise in pneumonia readmission rates in 2015. Additionally, the safety domain, with relatively low scores in comparison to the measures of other domains, was added to the list of measures that constitute TPS. Accordingly, Table 6 shows that the values of predicted TPS minus lagged TPS go down in 2015 for each quintile.

The values of the net total effect reveal an increase of quality in the groups of low-quality hospitals, while quality deteriorates in the high-quality groups. The negative total effect is less prevalent, or smaller in absolute terms, at high-quality hospitals in 2016–2017. This result can be attributed to the weakening of mean reversion with the increase in α. Yet, when α becomes constant in 2018–2019, the prevalence of the negative total effect and its absolute value return to those of 2015.

Finally, we focus on the effect of pay-for-performance for groups of Medicare hospitals according to their ownership, teaching status, urban location, and geographic region.
The mean effect increases in α for public and private hospitals, for urban and rural hospitals, for teaching and non-teaching hospitals, and for hospitals in each geographic region (Tables 7 and 8).

Table 7. Effect of pay-for-performance as μ(α_t) − μ(0) by hospital ownership, teaching status, and urban location

                             2013      2014      2015      2016      2017      2018      2019
Public                     2.099***  2.945***  4.226***  5.869***  7.413***  7.365***  7.475***
                          (0.723)   (0.984)   (1.304)   (1.739)   (2.254)   (2.203)   (2.148)
Private                    2.274***  3.214***  4.858***  6.595***  8.637***  8.726***  8.656***
                          (0.740)   (1.015)   (1.360)   (1.814)   (2.377)   (2.333)   (2.289)
Private − Public           0.174     0.269     0.632***  0.725**   1.224***  1.361***  1.181***
                          (0.118)   (0.168)   (0.256)   (0.323)   (0.432)   (0.437)   (0.414)
Urban                      2.383***  3.291***  4.735***  6.253***  8.059***  8.037***  7.929***
                          (0.739)   (1.003)   (1.314)   (1.721)   (2.203)   (2.146)   (2.104)
Rural                      2.048***  3.069***  4.353***  6.649***  9.195***  9.275***  9.321***
                          (0.741)   (1.070)   (1.494)   (2.054)   (2.802)   (2.706)   (2.578)
Rural − Urban             −0.335    −0.222    −0.382     0.396     1.136     1.239*    1.392**
                          (0.227)   (0.333)   (0.493)   (0.599)   (0.832)   (0.749)   (0.615)
Teaching                   2.363***  3.294***  4.667***  6.327***  8.177***  8.284***  8.309***
                          (0.755)   (1.013)   (1.334)   (1.735)   (2.224)   (2.198)   (2.156)
Non-teaching               2.238***  3.177***  4.807***  6.546***  8.664***  8.688***  8.610***
                          (0.732)   (1.018)   (1.373)   (1.854)   (2.467)   (2.407)   (2.362)
Non-teaching − Teaching   −0.125    −0.118     0.140     0.219     0.487     0.404     0.301
                          (0.179)   (0.236)   (0.366)   (0.430)   (0.609)   (0.578)   (0.560)

Notes: Standard errors (calculated using the delta method for the difference of the reform effects across the two categories of each time-invariant hospital characteristic) are in parentheses. *, **, and *** denote significance at the 0.1, 0.05, and 0.01 levels, respectively.

Table 8. Effect of pay-for-performance as μ(α_t) − μ(0) for hospitals in different geographic regions

                                    2013      2014      2015      2016      2017       2018       2019
New England                       1.864**   2.080     3.605**   6.858*** 10.211***  10.071***   9.934***
                                 (0.815)   (1.264)   (1.752)   (2.105)   (3.078)    (2.976)    (2.888)
Mid-Atlantic                      1.800***  2.550***  3.617***  5.189***  7.013***   7.194***   7.030***
                                 (0.737)   (0.999)   (1.321)   (1.739)   (2.253)    (2.219)    (2.146)
Mid-Atlantic − New England       −0.064     0.470     0.012    −1.670*** −3.198***  −2.877***  −2.904***
                                 (0.328)   (0.702)   (0.978)   (0.574)   (1.082)    (0.997)    (0.981)
East North Central                2.078***  3.084***  4.773***  6.552***  8.142***   8.205***   8.246***
                                 (0.751)   (1.036)   (1.404)   (1.865)   (2.393)    (2.387)    (2.347)
East North Central − New England  0.215     1.004     1.168    −0.307    −2.069**   −1.866***  −1.688**
                                 (0.316)   (0.694)   (0.955)   (0.451)   (0.905)    (0.801)    (0.775)
West North Central                2.318***  3.329***  5.154***  7.342*** 10.138***  10.140***  10.172***
                                 (0.741)   (1.052)   (1.431)   (2.025)   (2.842)    (2.797)    (2.765)
West North Central − New England  0.455     1.249*    1.549     0.483    −0.073      0.069      0.238
                                 (0.332)   (0.709)   (1.001)   (0.520)   (0.808)    (0.780)    (0.757)
South Atlantic                    2.248***  3.210***  4.946***  6.803***  8.886***   8.863***   8.744***
                                 (0.761)   (1.040)   (1.422)   (1.887)   (2.481)    (2.407)    (2.374)
South Atlantic − New England      0.384     1.131     1.341    −0.056    −1.325     −1.208     −1.190
                                 (0.326)   (0.699)   (0.949)   (0.434)   (0.826)    (0.788)    (0.760)
East South Central                2.295***  3.118***  4.962***  6.913***  9.422***   8.821***   8.643***
                                 (0.780)   (1.065)   (1.492)   (2.042)   (2.803)    (2.597)    (2.460)
East South Central − New England  0.432     1.038     1.357     0.054    −0.789     −1.250*    −1.291*
                                 (0.356)   (0.720)   (0.989)   (0.517)   (0.719)    (0.705)    (0.726)
West South Central                2.276***  3.225***  5.193***  6.324***  7.926***   8.380***   8.274***
                                 (0.735)   (0.990)   (1.325)   (1.755)   (2.241)    (2.227)    (2.175)
West South Central − New England  0.413     1.146     1.588    −0.535    −2.286**   −1.691*    −1.659*
                                 (0.347)   (0.723)   (1.022)   (0.583)   (1.061)    (0.969)    (0.933)
Mountain                          1.795***  2.809***  4.303***  5.537***  7.291***   7.686***   7.941***
                                 (0.647)   (0.863)   (1.119)   (1.415)   (1.809)    (1.819)    (1.861)
Mountain − New England           −0.069     0.729     0.698    −1.322    −2.920**   −2.385*    −1.993
                                 (0.371)   (0.756)   (1.073)   (0.869)   (1.468)    (1.345)    (1.242)
Pacific                           2.524***  3.324***  4.276***  5.957***  7.923***   7.910***   8.190***
                                 (0.716)   (0.957)   (1.238)   (1.613)   (2.101)    (2.067)    (2.066)
Pacific − New England             0.661*    1.245     0.671    −0.901    −2.288*    −2.161*    −1.744
                                 (0.388)   (0.765)   (1.072)   (0.827)   (1.315)    (1.245)    (1.189)

Notes: Standard errors (calculated using the delta method for the difference of the reform effects across New England hospitals and hospitals in each corresponding geographic region) are in parentheses. *, **, and *** denote significance at the 0.1, 0.05, and 0.01 levels, respectively.

The effect of pay-for-performance is greater for private hospitals than for public hospitals, which corresponds to the findings in [13] and [78]. The result can be explained by a greater emphasis on financial incentives at these healthcare institutions. These profit constraints, combined with the altruistic character of healthcare services, induce more effective quality competition at non-public hospitals [16].
The difference in the effect for private and public hospitals is statistically significant in most years.

As for teaching status, quality improvement owing to the incentive scheme is often higher at non-teaching hospitals, which may be because they can devote all of their labor resources to patient treatment, while teaching hospitals lose some efficiency due to their educational activities [64]. Also, teaching hospitals may be treating more difficult cases. This complexity could not be fully captured by the casemix variable in our analysis and may cause a downward bias of the estimated effect at teaching hospitals, which would explain the lower value of the effect at teaching than at non-teaching hospitals. Yet, the difference between teaching and non-teaching hospitals is statistically insignificant in each year.

Statistically significant differences in the effect of pay-for-performance for urban and rural hospitals are observed only in the last two years: the effect is larger at rural hospitals.

As regards geographic location, there is practically no variation in the effect across groups of hospitals in the early years of pay-for-performance. The differences appear mainly in the later years: for instance, the mean effect of pay-for-performance is greater in New England than in the Mid-Atlantic region in 2016–2019, and greater than in the East North Central and West South Central regions in 2017–2019.

6 Discussion

In this article, we focused on the exclusion of mean reversion in evaluating the response of TPS at Medicare hospitals to an incentive contract. Since TPS under this contract becomes an autoregressive process, our analysis deals with dynamic panels. It should be noted that dynamic panel data models are prevalent in various fields of economics. Examples in macroeconomics include the analysis of a country's growth [11,50] or its current account [81].
Applications in corporate finance deal with such firm-level variables as size [33,61], profit [54], and leverage [32,36], and with such proxies of firm performance as return on assets and Tobin's Q [49,65]. In the banking sphere, dynamic panels are applied to ROE and profitability [35,48], while in finance they are used for housing prices [31] and fuel prices [71]. Papers in the economics of labor, health, and welfare employ dynamic panel data models to analyze physician labor supply [4], hospital staffing intensity [82], wealth of households and health status of individuals [57], and quality and efficiency of hospitals (e.g., mortality ratio in [56] and average length of stay in [10]).

The approach used in our study estimates the unconditional mean of the dependent variable in the dynamic panel data model and employs it for policy evaluation. Specifically, the comparison of the fitted values of the unconditional mean at different values of policy intensity offers a measure of the effect of the reform. The advantages of the approach are twofold. First, it excludes the impact of mean reversion in groupwise estimations (e.g., in lower and higher quantiles of hospitals according to their TPS). Second, the approach may also be used in the analysis of the mean effect of the reform if we focus on effects in the long run. Indeed, the unconditional mean in dynamic panel data analysis is sometimes called the long-term mean, as it reflects the mean value in the long run. It should be noted that an alternative approach, which uses the estimated coefficient for the policy variable as a measure of the mean effect of the reform, does not suffer from the problem of mean reversion; in dynamic panel data models, however, it evaluates only the short-term impact of the policy.

As regards the exclusion of mean reversion in dynamic panels, we note a limitation on the character of mean reversion, imposed by the nature of the dynamic panel model, where the unconditional mean is the long-term mean.
Mean reversion is not instantaneous: if a deviation from the mean is observed in period t, the return to the mean occurs not in period t+1 but only in later periods.

It may be noted that our approach is similar to difference-in-differences estimation. The long-run effect of the reform under our approach is the difference between the fitted value of the long-term mean under the value α_t and under the counterfactual value of zero (similar to [48]). Alternatively, we can take the difference in the fitted values of the long-term means under the values α_t and α_{t−1}. To summarize: in focusing on the long-run impact of the reform in dynamic panel data models, the estimation of either the mean effect or the groupwise effects requires the unconditional mean. The approach also excludes mean reversion, which contaminates policy evaluation in the case of groupwise estimations.

As regards policy evaluation based on the panel data fixed effects methodology, our approach of computing the unconditional mean as a function of the policy variable α produces the conventional linear prediction of the dependent variable. The mean effect of the reform in the static panel is either the coefficient for the reform variable or the difference between the fitted value of y under α_t and the fitted value of y under 0 (the counterfactual).

Finally, we note the prerequisites for identification of the unconditional mean, which are similar to the assumptions in difference-in-differences estimation. Two requirements apply both to static and dynamic panels. First, time variation in the policy variable is required for identification of the coefficient for the policy variable in the unconditional mean function. Second, if there is only time variation in the policy variable α (and no cross-section variation in α_t at a given value of t, i.e., no control group), the reform effect cannot be distinguished from other time effects.
So cross-section variation in another variable, which is correlated with the policy variable, is required. In our case, this variable is the Medicare share: the higher the share of Medicare patients in the hospital, the stronger the impact of α (the share of hospital funds at risk under the Medicare program becomes more important for the total revenues of the hospital). The use of dynamic panel data models requires a third assumption: the unconditional mean must be defined, and for this reason the process y has to be stationary.

7 Conclusion

Studies of incentive contracts usually focus on the mean tendency and give scant attention to potentially heterogeneous responses to the policy of interest by agents at different percentiles of the distribution of the dependent variable. But insufficient analysis of such heterogeneity may lead to speculation about ceiling effects and to a belief among agents with better values of the variable of interest that there is no way of making further financial gains through further improvement.

This article highlights the fact that there is a multivariate dependence of the variable of interest in such incentive contracts. Specifically, a part of the intertemporal dependence can be attributed to the policy reform and a part to mean reversion. So the article proposes a method that helps model such multivariate dependence by excluding the impact of mean reversion.
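The core computation behind this exclusion can be sketched in a few lines. Assume, purely for illustration (the coefficients are hypothetical, and the paper's model also includes covariates and time effects), an AR(1)-style panel y_t = c + β·α + λ(α)·y_{t−1} + ε with policy-dependent persistence λ(α) = λ0 + λ1·α. The unconditional (long-term) mean is then μ(α) = (c + β·α)/(1 − λ(α)), and the long-run reform effect is a difference of fitted means:

```python
def long_run_mean(alpha, c=10.0, beta=1.2, lam0=0.30, lam1=0.18):
    """Unconditional mean of y_t = c + beta*alpha + lam(alpha)*y_{t-1} + eps,
    with persistence lam(alpha) = lam0 + lam1*alpha (all values hypothetical)."""
    lam = lam0 + lam1 * alpha
    if not lam < 1.0:
        raise ValueError("process must be stationary (lam < 1)")
    return (c + beta * alpha) / (1.0 - lam)

# long-run effect vs. the no-reform counterfactual, and vs. the previous year
alpha_t, alpha_prev = 2.00, 1.75
print(long_run_mean(alpha_t) - long_run_mean(0.0))
print(long_run_mean(alpha_t) - long_run_mean(alpha_prev))
```

Because the groupwise version of this calculation compares fitted unconditional means rather than raw year-on-year changes, the mechanical pull of high groups down toward the mean, and of low groups up toward it, drops out of the estimated effect.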
As mean reversion contaminates judgment regarding the time profile of the dependent variable, and this contamination differs for agents in lower and higher percentiles of the variable of interest, clearing the reform effect of mean reversion makes the method suitable for assessing the heterogeneity of incentive schemes.

In an application to the longitudinal data for Medicare's acute-care hospitals taking part in the nationwide quality incentive mechanism ("value-based purchasing"), we find that the higher the quintile of quality in the prior period, the larger the increase in the composite quality measure owing to the reform. Quality improvement in each quintile increases with the size of the quality incentive.

Our results reveal that the increase in the quality measure owing to pay-for-performance is greater at hospitals with higher levels of quality. The finding suggests a stronger emphasis on quality activities at high-quality hospitals, and this has indeed been found in a number of works. For instance, top-performing hospitals in the US pilot program paid more attention to quality enhancement than bottom-performing hospitals [77]. Under the proportional pay-for-performance mechanism in California, high-quality physicians similarly placed more emphasis on an organizational culture of quality and demonstrated stronger dedication to addressing quality issues than low-quality physicians [21]. The fact that high-quality hospitals which have reached the top deciles of hospital performance pursue quality improvement by means additional to those proposed by the policy regulator is further evidence in support of our results [37].

Directions for future work in health economics applications may include the analysis of heterogeneous hospital response to quality incentives by considering different dimensions of the composite quality measure.
A related field of research is the study of the potential sacrifice of quality on non-incentivized measures in favor of measures incentivized by pay-for-performance. This has been analyzed at the mean level [27,47] and may be expanded to account for different behavior by high-quality and low-quality hospitals.

Journal: Dependence Modeling (De Gruyter)

Published: Jan 1, 2022

Keywords: policy evaluation; autoregressive process; intertemporal optimization; incentive contract; hospitals; 62J05; 91B55; 91B69; 91B74
