The Review of Economic Studies, Volume 85 (1) – Jan 1, 2018

45 pages

/lp/ou_press/two-sided-learning-and-the-ratchet-principle-R0L9Iuy409

- Publisher
- Oxford University Press
- Copyright
- © The Author 2017. Published by Oxford University Press on behalf of The Review of Economic Studies Limited.
- ISSN
- 0034-6527
- eISSN
- 1467-937X
- D.O.I.
- 10.1093/restud/rdx019

Abstract

I study a class of continuous-time games of learning and imperfect monitoring. A long-run player and a market share a common prior about the initial value of a Gaussian hidden state, and learn about its subsequent values by observing a noisy public signal. The long-run player can nevertheless control the evolution of this signal, and thus affect the market’s belief. The public signal has an additive structure, and noise is Brownian. I derive conditions for an ordinary differential equation to characterize equilibrium behavior in which the long-run player’s actions depend on the history of the game only through the market’s correct belief. Using these conditions, I demonstrate the existence of pure-strategy equilibria in Markov strategies for settings in which the long-run player’s flow utility is nonlinear. The central finding is a learning-driven ratchet principle affecting incentives. I illustrate the economic implications of this principle in applications to monetary policy, earnings management, and career concerns.

1. Introduction

Hidden variables are at the centre of many economic interactions: firms’ true fundamentals are hidden to both managers and shareholders; workers’ innate abilities are unobserved by both employers and workers themselves; and growth and inflation trends are hidden to both policymakers and market participants. In all these settings, the economic environment is characterized by the presence of underlying uncertainty that is common to everyone, and eliminating such uncertainty can be prohibitively costly, or simply impossible; agents thus learn about such unobserved payoff-relevant states simultaneously as decisions are being made, and the incomplete information they face need not ever fully disappear. This article is concerned with examining strategic behaviour in settings characterized by such forms of fundamental uncertainty. 
When agents learn about the economic environment, behaviour can be influenced by the possibility of affecting the beliefs of others. The set of questions that can be asked in such contexts is incredibly rich. In financial markets, is it possible for markets to hold correct beliefs about a firm’s fundamentals in the presence of earnings management? In labour markets, what are the forces that shape workers’ incentives when they want to be perceived as highly skilled? In policy, how is a central bank’s behaviour shaped by the possibility of affecting markets’ beliefs about the future evolution of inflation? The challenge in answering these questions lies in developing a framework that is tractable enough to accommodate both Bayesian updating to capture ongoing learning, and imperfect monitoring to capture strategic behaviour. To make progress towards the understanding of games of learning and imperfectly observable actions, I employ continuous-time methods using Holmström’s (1999) signal-jamming technology as the key building block. In the setting I study, there is a long-run player and a market (i.e. a population of small individuals) who, starting from a common prior, learn about an unobserved process of Gaussian fundamentals by observing a public signal. The long-run player can nevertheless influence the market’s belief about the fundamentals by taking unobserved actions that affect the evolution of the publicly observed state. As in Holmström (1999), actions and the fundamentals are perfect substitutes in the signal technology, and thus the long-run player cannot affect the informativeness of the public signal (i.e. there is no experimentation). Using Brownian information, I study Markov equilibria in which the long-run player’s behaviour depends on the history of the game through the market’s belief about the hidden state. 
In an equilibrium in pure strategies, the market must anticipate the long-run player’s actions at all times; beliefs thus coincide on the equilibrium path. However, allowing for belief divergence is critical to determine the actions that arise along the path of play. Consider, for instance, the earnings management example. To show that an equilibrium in which the market holds a correct belief exists, it must be verified that the payoff that the manager obtains by reporting earnings as conjectured by the market dominates the payoff under any other reporting strategy. But if the manager deviates, the market will misinterpret the report and will form an incorrect belief about the firm’s fundamentals. Consequently, at those off-path histories, both parties’ beliefs differ. Crucially, when actions are hidden, deviations from the market’s conjectured behaviour lead the long-run player’s belief to become private. Moreover, this private information is persistent, as it is linked to a learning process. As I will explain shortly, the combination of hidden actions and private information off the path of play severely complicates the equilibrium analysis in virtually every setting that allows for learning and imperfect monitoring with frequent arrival of information.1 To address this difficulty, I follow a first-order approach to studying Markov equilibria in settings where (i) affecting the public signal is costly and (ii) the long-run player’s flow payoff is a general—in particular, nonlinear—function of the market’s belief. Specifically, I construct a necessary condition for equilibria in which on-path behaviour is a differentiable function of the common belief, and then provide conditions under which this necessary condition is also sufficient. The advantages of this approach are both conceptual and technical. 
First, the necessary condition uncovers the forces that shape the long-run player’s behaviour in any Markov equilibrium, provided that an equilibrium of this form exists. Secondly, this approach offers a tractable avenue for demonstrating the existence of such equilibria despite the intricacies of off-path private beliefs affecting behaviour. Economic contribution. The main finding of this article pertains to a ratchet principle affecting incentives. Consider a manager who evaluates boosting a firm’s earnings report above analysts’ predictions. The immediate benefit from this action is clear: abnormally high earnings lead the market to believe that the firm’s fundamentals have improved. Crucially, the manager understands that this optimism is incorrect, as the observation of high earnings was a consequence of altering the report. He then anticipates that subsequent manipulation will be required to maintain the impact on the firm’s value, as his private belief about the firm’s fundamentals indicates that the firm would otherwise underperform relative to the market’s expectations. Equally important, if the market expects firms with better prospects to manage their earnings more aggressively, this underperformance can become even more acute. In either case, exhibiting good performance results in a more demanding incentive scheme to be faced in the future; i.e. a learning-driven ratchet principle emerges.2 In this article, ratchet effects (the implications of the ratchet principle just described for behaviour) do not relate to reduced incentives for information revelation, as in models with ex ante asymmetric information (e.g. Laffont and Tirole, 1988): this is because the long-run player is unable to affect the informativeness of the public signal, which implies that the speed of learning is exogenous. Instead, these effects are captured in the form of distorted levels of costly actions relative to some benchmarks. 
More generally, their appearance is the outcome of a fundamental tension between Bayesian updating and strategic behaviour, and hence, they are not exclusive to the case of a Gaussian hidden state. Specifically, since beliefs are revised based on discrepancies between observed and expected signal realizations, actions that lead to abnormally high signals are inherently costly from a dynamic perspective: by creating higher expectations for tomorrow’s signals, such actions require stronger future actions to generate a sustained effect on beliefs. Applications. I first revisit Holmström’s (1999) seminal model of career concerns, which is a particular instance of linear payoffs within the class of games analysed. In this context, I show that the form of ratcheting previously described is embedded in the equilibrium that he finds. Importantly, by precisely quantifying the strength of this force, I show how ratcheting plays an important role in limiting the power of market-based incentives in the equilibrium found by Holmström when learning is stationary in his model. A key advantage of this article is its ability to accommodate nonlinear flow payoffs, which can be a defining feature of many economic environments. In an application to monetary policy, I consider a setting in which a price index carries noisy information about both an unobserved inflation trend and the level of money supply, and a central bank can affect employment by creating inflation surprises. The central bank’s trade-off between output and inflation is modelled via a traditional loss function that is quadratic in employment (or output) and money growth. In such a context, I show that the ratchet principle can induce a monetary authority to exhibit a stronger commitment to low inflation. Intuitively, while unanticipated inflation can be an effective tool to boost employment in the short run, it also leads the market to overestimate future inflation and, hence, to set excessively high nominal wages. 
This in turn puts downward pressure on future hiring decisions, which makes inflation more costly compared to settings in which the inflation trend is observed or simply absent. Finally, I study more subtle ratchet effects in an application that analyses managers’ incentives to boost earnings when they have a strong short-term incentive to exceed a zero-earnings threshold, captured in marginal flow payoffs that are single peaked and symmetric around that point. In such a context, I show that firms that expect to generate positive earnings can inflate reports more actively than firms at, or below, the threshold, despite their managers having weaker myopic incentives and being unable to affect firms’ market values. Intuitively, the market anticipates that successful manipulation by firms with poor (good) past performance will lead to stronger (weaker) myopic incentives in the future. Anticipating higher expectations of earnings management by the market, firms with poor profitability find it more costly to inflate earnings relative to their successful counterparts. The distortion thus takes the form of a profile of manipulation that is skewed towards firms that have exhibited better performances in the past. Technical contribution. In the class of games analysed, learning is conditionally Gaussian and stationary, and hence, beliefs can be identified with posterior means. Moreover, a nonlinear version of the Kalman filter applies. It is then natural to look for Markov perfect equilibria (MPE) using standard dynamic programming tools, with the market’s and the long-run player’s beliefs as states. However, the combination of hidden actions and hidden information off the path of play results in the long-run player’s value function no longer satisfying a traditional Hamilton–Jacobi–Bellman (HJB) equation. 
In fact, the differential equation at hand does not even have the structure of a usual partial differential equation (PDE); to the best of my knowledge, no existence theory applies. Implicit in the HJB approach is that, by demanding the determination of the long-run player’s full value function, the method requires exact knowledge of the long-run player’s off-path behaviour to determine the actions that arise along the path of play; however, the difficulty at hand is precisely that the long-run player can condition his actions on his private information in complex ways as his own belief changes. Exceptions are settings in which the long-run player’s flow payoff is linear in the market’s belief (e.g. Holmström, 1999), as in those cases the long-run player’s optimal behaviour is independent of the past history of play. However, it is exactly in those linear environments that the differential equation delivered by the HJB approach has a trivial solution. If the goal is then to analyse settings that naturally involve nonlinearities, solution methods for linear environments do not apply. The technical advantage of the first-order approach is that the ratcheting equation (the necessary condition for equilibrium behaviour) makes bypassing the exact computation of off-path payoffs possible. In fact, this ordinary differential equation (ODE) offers a method for guessing Markov equilibria without knowing how exactly the candidate equilibrium might be supported off the path of play. Importantly, provided that it is verified that a deviation from a solution to the ratcheting equation is not profitable, leaving off-path behaviour unspecified in the equilibrium concept is no disadvantage: equilibrium outcomes (i.e. actions and payoffs) are determined exclusively by the actions prescribed by the equilibrium strategy along the path of play. Therefore, for sufficiency, instead of computing off-path payoffs exactly, I approximate them. 
Specifically, building on the optimal contracting literature, I bound off-path payoffs in a way that parallels sufficiency steps in relaxed formulations of principal–agent problems (Williams, 2011; Sannikov, 2014) to obtain a verification theorem for Markov equilibria (Theorem 1). The theorem involves the ratcheting equation and the ODE that characterizes the evolution of the (candidate, on-path) payoff that results from inducing no belief divergence; i.e. a system of two ODEs rather than a non-standard differential equation or a PDE. The key requirement is that the information rent (a measure of the value of acquiring private information about the continuation game) associated with the solution of the system at hand cannot change too quickly. The advantage of this verification theorem, relative to both the HJB approach and the contracting literature, is its considerable tractability. Using this result, I determine conditions on primitives that ensure the existence of Markov equilibria in two classes of games exhibiting nonlinearities: linear quadratic games and games with bounded marginal flow payoffs (Theorems 2 and 3), which host the applications I examine. These three results address the belief divergence challenge, and the continuous-time approach is critical for their derivation. Related literature. Regarding the literature on the ratchet effect, Weitzman (1980) illustrates how revising production targets on the basis of observed performance can dampen incentives in planning economies; both the incentive scheme and the revision rule are exogenous in his analysis. Freixas et al. (1985) and Laffont and Tirole (1988) in turn endogenize ratcheting by allowing a principal to optimally revise an incentive scheme as new information about an agent’s hidden type is revealed upon observing performance; the main result is that there is considerable pooling. 
As in Weitzman (1980), my analysis focuses on the size of equilibrium actions, rather than on their informativeness. In line with the second group of papers, the strength of the ratcheting that arises in any specific setting is an equilibrium object: by conjecturing the long-run player’s behaviour, the market effectively imposes an endogenous moving target against which the long-run player’s performance is evaluated. Concurrently with this article, Bhaskar (2014), Prat and Jovanovic (2014), and Bhaskar and Mailath (2016) identify ratchet principles in principal–agent models with symmetric uncertainty: namely, that good performance can negatively affect an agent’s incentives if it leads a principal to overestimate a hidden technological parameter. My analysis differs from these papers along two dimensions. First, I show that market-based incentives can lead to quite rich behaviour on behalf of a forward-looking agent; instead, the contracts that these papers analyse implement either minimal or maximal effort. Secondly, I show that, in games of symmetric uncertainty, the ratchet principle is also determined by a market revising its expectations of future behaviour, in addition to revising its beliefs about an unobserved state.3 This article belongs to a broader class of games of ex ante symmetric uncertainty in which imperfect monitoring leads to the possibility of divergent beliefs. In the reputation literature, Holmström (1999) finds an equilibrium in which a worker’s equilibrium effort is identical on and off the path of play, in part consequence of the assumed linearity in payoffs.4 In Board and Meyer-ter-Vehn (2014), private beliefs matter non-trivially for a firm’s investment policy, and the existence of an equilibrium is shown via fixed-point arguments; my approach is instead constructive and focused on pure strategies. 
Private beliefs also arise in strategic experimentation settings involving a risky arm of two possible types and perfectly informative Poisson signals. Since beliefs are deterministic in this case, the equilibrium analysis is tractable (Bergemann and Hege (2005) derive homogeneity properties of off-path payoffs and Bonatti and Hörner (2011, 2016) apply standard optimal control techniques), and the ratcheting I find is absent, as the observation of a signal terminates the interaction. To conclude, this paper contributes to a growing literature that analyses dynamic incentives exploiting the tractability of continuous-time methods. Sannikov (2007), Faingold and Sannikov (2011) and Bohren (2016) study games with imperfect monitoring in which the continuation game is identical on and off the equilibrium path. In contrast, as in the current paper, in the principal–agent models of Williams (2011), Prat and Jovanovic (2014), and Sannikov (2014), deviations lead the agent to obtain private information about future output. All these contracting papers derive measures of information rents and general sufficient conditions that validate the first-order approach they follow. Such sufficient conditions involve endogenous variables, and their verification is usually done both ex post (i.e. using the solution to the relaxed problem) and in specific settings. Instead, the sufficient conditions that I derive can be mapped to primitives for a large class of economic environments.

1.1. Outline

Section 2 presents the model and Section 3 derives necessary conditions for Markov equilibria. Section 4 explores three applications. Section 5 states the verification theorem and Section 6 contains the existence results. Section 7 concludes. All proofs are relegated to the Appendix.

2. Model

A long-run player and a population of small players (the market) learn about a hidden state $$(\theta_t)_{t\geq0}$$ (the fundamentals) by observing a public signal $$(\xi_t)_{t\geq 0}$$. 
Their evolution is given by

$$d\theta_t=-\kappa(\theta_t-\eta)dt+\sigma_\theta dZ_t^\theta,\quad t>0,\quad \theta_0\in\mathbb{R}, \tag{1}$$

$$d\xi_t=(a_t+\theta_t)dt+\sigma_\xi dZ_t^\xi,\quad t>0,\quad \xi_0=0. \tag{2}$$

In this specification, $$(Z_t^\theta)_{t\geq 0}$$ and $$(Z^\xi_t)_{t\geq 0}$$ are independent Brownian motions, and $$\sigma_\theta$$ and $$\sigma_\xi$$ are strictly positive volatility parameters. The fundamentals follow a Gaussian diffusion (hence Markov) process where $$\kappa\geq 0$$ is the rate at which $$(\theta_t)_{t\geq 0}$$ reverts towards the long-run mean $$\eta\in \mathbb{R}$$.5 The public signal (2) carries information about the fundamentals in its drift, but it is affected by the long-run player’s choice of action $$a_t$$, $$t\geq 0$$. These actions take values in an interval $$A\subseteq\mathbb{R}$$, with $$0\in A$$, and they are never directly observed by the market. The monitoring technology (2) is the continuous-time analog of Holmström’s (1999) signal-jamming technology, and a key property of it is that it satisfies the full-support assumption with respect to the long-run player’s actions.6 Thus, the only information that the market has comes from realizations of $$(\xi_t)_{t\geq 0}$$; let $$(\mathcal {F}_t)_{t\geq 0}$$ denote the corresponding public filtration, and $$\xi^t:=(\xi_s: 0\leq s\leq t)$$ any realized public history. I will examine equilibria in pure strategies in which the long-run player’s behaviour along the path of play is, at all instants of time, an $$A$$-valued function of the current public history $$\xi^t$$, $$t\geq 0$$. The formal notion of any such pure public strategy for the long-run player is defined next; for brevity, I use the term strategy hereafter. Definition 1. A (pure public) strategy $$(a_t)_{t\geq 0}$$ is a stochastic process taking values in $$A$$ that is also progressively measurable with respect to $$(\mathcal{F}_t)_{t\geq 0}$$, and that satisfies $$\mathbb{E}\left[\int_0^t a_s^2ds\right]<\infty$$, $$t\geq 0$$. 
A strategy is feasible if, in addition, equation (2) admits a unique (in a probability law sense) solution.7 Everyone shares a prior that $$\theta_0$$ is normally distributed, with a variance $$\gamma^*$$ that ensures that learning is stationary; in this case, the Gaussian structure of both the fundamentals and noise permits posterior beliefs to be identified with posterior means. I defer the details to Section 3.1. Crucially, in order to interpret the public signal correctly, the market needs to conjecture the long-run player’s equilibrium behaviour; in this way, the market can account for how the latter agent’s actions affect the evolution of the public signal. Thus, let $$p_t^*:=\mathbb{E}^{a^*}[\theta_t|\mathcal{F}_t]$$ denote the mean of the market’s posterior belief about $$\theta_t$$ given the information up to time $$t\geq 0$$ under the assumption that the feasible strategy $$(a_t^*)_{t\geq 0}$$ is being followed. In what follows, the market’s conjecture $$(a_t^*)_{t\geq 0}$$ is fixed, and I refer to the corresponding posterior mean process $$(p_t^*)_{t\geq 0}$$ as the public belief process. The market behaves myopically given its beliefs about the fundamentals and equilibrium play.8 Specifically, there is a measurable function $$\chi: \mathbb{R}\times A \to \mathbb{R}$$ such that, at each time $$t$$, the market takes an action $$\chi(p_t^*,a_t^*)$$ that affects the long-run player’s utility. As a result, the total payoff to the long-run player of following a feasible strategy $$(a_t)_{t\geq0}$$ is given by

$$\mathbb{E}^{a}\left[\int_0^\infty e^{-rt}\big(u(\chi(p_t^*,a_t^*))-g(a_t)\big)dt\,\Big|\,p_0=p\right], \tag{3}$$

where $$p_0=p$$ denotes the prior mean of $$\theta_0$$. In this specification, the notation $$\mathbb{E}^a[\cdot]$$ emphasizes that a strategy $$(a_t)_{t\geq 0}$$ induces a distribution over the paths of $$(\xi_t)_{t\geq 0}$$, thus affecting the likelihood of any realization of $$(p_t^*)_{t\geq 0}$$. Also, $$u: \mathbb{R}\to\mathbb{R}$$ is measurable, and $$r>0$$ denotes the discount rate. 
Finally, affecting the public signal is costly according to a convex function $$g: A\to \mathbb{R}_+$$ such that $$g(0)=0$$, $$g'(a)>0$$ for $$a>0$$, $$g'(a)<0$$ for $$a<0$$ (i.e. increasing the rate of change of the public signal in either direction is costly at increasing rates). Mild technical conditions on $$u$$, $$\chi$$, and $$g$$ that are used for analyzing equilibria characterized by ODEs are presented next; these conditions are not needed for examining pure-strategy equilibria at a general level (Definition 2 below), and they are discussed at the end of this section (Remark 2). Let $$C^k(E;F)$$ be the set of $$k$$-times differentiable functions from $$E\subset \mathbb{R}^n$$ to $$F\subset\mathbb{R}$$, $$n\geq 1$$, with a continuous $$k$$-th derivative; I omit $$k$$ if $$k=0$$, and $$F$$ if $$F=\mathbb{R}$$. Assumption 1. (i) Differentiability: $$u\in C^1(\mathbb{R})$$, $$\chi\in C^1(\mathbb{R}\times A)$$ and $$g\in C^2(A;\mathbb{R}_+)$$ with $$\rho:=(g')^{-1}\in C^2(\mathbb{R})$$. (ii) Growth conditions: the partial derivatives $$\chi_p$$ and $$\chi_{a^*}$$ are bounded in $$\mathbb{R}\times A$$, and $$u$$, $$u'$$, and $$g'$$ have polynomial growth.9 (iii) Strong convexity: $$g''(\cdot)\geq \psi$$ for some $$\psi>0$$.10 As is standard in stochastic optimal control, a strategy $$(a_t)_{t\geq 0}$$ is admissible for the long-run player if it is feasible and

$$\mathbb{E}^{a}\left[\int_0^\infty e^{-rt}\big|u(\chi(p_t^*,a_t^*))-g(a_t)\big|dt\,\Big|\,p_0=p\right]<\infty$$

(see, for instance, Pham, 2009). In this case, it is said that $$(a_t,a_t^*)_{t\geq 0}$$ is an admissible pair. Definition 2. A strategy $$(a_t^*)_{t\geq 0}$$ is a pure-strategy Nash equilibrium (NE) if $$(a_t^*,a_t^*)_{t\geq 0}$$ is an admissible pair and (i) $$(a_t^*)_{t\geq 0}$$ maximizes (3) among all strategies $$(a_t)_{t\geq 0}$$ such that $$(a_t, a_t^*)_{t\geq 0}$$ is an admissible pair, and (ii) $$(p_t^*)_{t\geq 0}$$ is constructed via Bayes’ rule using $$(a_t^*)_{t\geq 0}$$. 
In a (pure-strategy) NE, the long-run player finds it optimal to follow the market’s conjecture of equilibrium play while the market is simultaneously using the same strategy to construct its belief. Thus, along the path of play, (i) the long-run player’s behaviour is sequentially rational and (ii) the long-run player and the market hold the same belief at all times. Allowing for belief divergence is, nevertheless, a critical step towards the determination of the actions that arise along the path of play, and at those off-path histories the long-run player can condition his actions on more information than that provided by the public signal; Sections 3 and 5 are devoted to this equilibrium analysis. It is important to stress, however, that for the analysis of equilibrium outcomes (i.e. actions and payoffs), leaving behaviour after deviations unspecified in the equilibrium concept is without loss, as the full-support monitoring structure (2) makes this game one of unobserved actions.11 The focus is on equilibria that are Markov in the public belief with the property that actions are interior, and in which the corresponding policy (i.e. the mapping between beliefs and actions) and payoffs exhibit enough differentiability, as defined next: Definition 3. An equilibrium is Markov if there is a Lipschitz function $$a^*\in C^2(\mathbb{R};\mathrm{int}(A))$$ such that $$(a^*(p_t^*))_{t\geq 0}$$ is an NE, and (3) under $$a_t=a_t^*=a^*(p_t^*)$$, $$t\geq 0$$, is of class $$C^2(\mathbb{R})$$ as a function of $$p\in\mathbb{R}$$. In a Markov equilibrium, behaviour depends on the public history only through the current common belief according to a sufficiently differentiable function; such equilibria are natural to analyse due to both the Markovian nature of the fundamentals and the presence of Brownian noise. 
Importantly, the long-run player’s realized actions are, at all time instants, a function of the complete current public history $$\xi^t$$ via the dependence of $$p_t^*$$ on $$\xi^t$$ (i.e. $$a_t^*=a^*(p_t^*[\xi^t])$$). Moreover, if $$a^*(\cdot)$$ is nonlinear, such path dependence will also be nonlinear. The rest of the article proceeds as follows. Necessary and sufficient conditions for Markov equilibria given a general best response $$\chi_t:=\chi(p_t^*,a_t^*)$$, $$t\geq 0$$, are stated in Sections 3 and 5, respectively. The applications that employ nonlinear flow payoffs (Sections 4.2 and 4.3) and the existence results (Section 6) in turn specialize to the case $$\chi_t=\chi(p_t^*)$$; as argued in Section 3 (specifically, the paragraph preceding Remark 3), this restriction is the natural one for studying traditional ratchet effects. Remark 1. (On Markov Perfect Equilibria). Any Markov equilibrium can be extended to an MPE (with the market’s and the long-run player’s belief as states) provided an off-path Markov best response exists; the hurdle for showing such an existence result is only technical, as the equilibrium analysis performed does not restrict the long-run player’s behaviour off the path of play.12 Importantly, if an MPE exists and the value function is of class $$C^2$$, the associated policy when beliefs are aligned in fact coincides with the policy of the Markov equilibrium found here (Remark 6, Section 5). Remark 2. (On Assumption 1 and the Lipschitz property). The differentiability and growth conditions in Assumption 1 are used to obtain necessary conditions for Markov equilibria in the form of ODEs. On the other hand, the strong convexity assumption on $$g(\cdot)$$ permits the construction of Lipschitz candidate equilibria using solutions to such ODEs. The Lipschitz property in turn guarantees that the long-run player’s best-response problem (via the market’s conjecture of equilibrium play) is well defined in the sufficiency step. 
While all these conditions can be relaxed, the extra generality brings no additional economic insights.13

3. Equilibrium Analysis: Necessary Conditions

To perform equilibrium analysis, one has to consider deviations from the market’s conjecture of equilibrium behaviour and show that they are all unprofitable. After a deviation occurs, however, there is belief divergence, and the long-run player’s belief becomes private. As I show in Section 5, the combination of hidden actions and persistent hidden information off the path of play leads traditional dynamic programming methods (i.e. HJB equations) to become particularly complex when the task is to find MPE. In order to bypass this complexity, I follow a first-order approach to performing equilibrium analysis in the Markov case. First, I derive a necessary condition for Markov equilibria: namely, if deviating from the market’s conjecture is not profitable, the value of a small degree of belief divergence must satisfy a particular ODE (Section 3.2). Secondly, I establish conditions under which a solution to this ODE used by the market to construct its conjecture of equilibrium play makes the creation of any degree of belief asymmetry suboptimal, thus validating the first-order approach (Section 5.2). As will become clear, this approach is also particularly useful for uncovering the economic forces at play.

3.1. Laws of motion of beliefs and belief asymmetry process

Standard results in filtering theory state that, given a conjecture $$(a_t^*)_{t\geq 0}$$, the market’s belief about $$\theta_t$$ given the public information up to $$t$$ is normally distributed (with a mean denoted by $$p_t^*$$).14 In the case of the long-run player, he can always subtract, regardless of the strategy followed, the effect of his action on the public signal to obtain $$dY_t:=\theta_tdt+\sigma_\xi dZ_t^\xi=d\xi_t-a_tdt$$, $$t\geq 0$$. 
Since $$(\theta_t,Y_t)_{t\geq 0}$$ is Gaussian, it follows that his posterior belief process is also Gaussian; denote by $$p_t:=\mathbb{E}[\theta_t|(Y_s)_{s\leq t}],\; t\geq 0$$, the corresponding mean process. In order for learning to be stationary, I set the common prior to have a variance equal to

$$\gamma^*=\sigma_\xi^2\left(\sqrt{\kappa^2+\sigma_\theta^2/\sigma_\xi^2}-\kappa\right)>0.$$

In this case, both the market and the long-run player’s posterior beliefs about $$\theta_t$$ have variance $$\gamma^*$$ at all times $$t\geq 0$$, and hence, $$(p_t^*)_{t\geq 0}$$ and $$(p_t)_{t\geq 0}$$ become sufficient statistics for their respective learning processes. Observe also that $$\gamma^*$$ is independent of both conjectured and actual play. In fact, because of the additively separable structure of the public signal, a change in the long-run player’s strategy shifts the distribution of the public signal without affecting its informativeness, i.e. there are no experimentation effects.15 Lemma 1. If the market conjectures $$(a_t^*)_{t\geq 0}$$, yet $$(a_t)_{t\geq 0}$$ is being followed, then

$$dp_t^*=-\kappa(p_t^*-\eta)dt+\frac{\gamma^*}{\sigma_\xi^2}\left[d\xi_t-(p_t^*+a_t^*)dt\right] \quad\text{and} \tag{4}$$

$$dp_t=-\kappa(p_t-\eta)dt+\frac{\gamma^*}{\sigma_\xi}dZ_t,\quad t\geq 0, \tag{5}$$

where $$Z_t:=\frac{1}{\sigma_\xi}\left(\xi_t-\int_0^t (p_s+a_s)ds\right)=\frac{1}{\sigma_\xi}\left(Y_t-\int_0^t p_sds\right)$$, $$t\geq 0$$, is a Brownian motion from the long-run player’s perspective. Moreover, $$(\xi_t)_{t\geq 0}$$ admits the representation $$d\xi_t=(a_t+p_t)dt+\sigma_\xi dZ_t$$, $$t\geq 0$$, from his standpoint. Proof. Refer to Theorem 12.1 for the filtering equations and to Theorem 7.12 for the rest of the results in Liptser and Shiryaev (1977). ∥ The right-hand side of equation (4) offers a natural orthogonal decomposition for the local evolution of the public belief: the trend $$-\kappa(p_t^*-\eta)dt$$, in the market’s time $$t$$-information set, plus the residual “surprise” process

$$d\xi_t-\mathbb{E}^{a^*}[d\xi_t|\mathcal{F}_t]=d\xi_t-(a_t^*+p_t^*)dt, \tag{6}$$

which is unpredictable from the market’s perspective. 
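The stationary variance and the public-belief dynamics (4) can be checked numerically. The following is a minimal sketch, not the author's code: the function names, the parameter values, and the Euler–Maruyama discretization are illustrative assumptions, and `riccati_drift` uses the standard Kalman–Bucy variance equation $$\dot\gamma=-2\kappa\gamma+\sigma_\theta^2-\gamma^2/\sigma_\xi^2$$, which is not written out in the text.

```python
import math
import random


def stationary_variance(kappa, sigma_theta, sigma_xi):
    """gamma* = sigma_xi^2 * (sqrt(kappa^2 + sigma_theta^2 / sigma_xi^2) - kappa)."""
    return sigma_xi ** 2 * (math.sqrt(kappa ** 2 + sigma_theta ** 2 / sigma_xi ** 2) - kappa)


def riccati_drift(gamma, kappa, sigma_theta, sigma_xi):
    """Time derivative of the posterior variance (standard Kalman-Bucy equation).

    Assumption: this equation is the textbook one, not stated in the article.
    """
    return -2.0 * kappa * gamma + sigma_theta ** 2 - gamma ** 2 / sigma_xi ** 2


def simulate_public_belief(kappa, eta, sigma_theta, sigma_xi, p0,
                           a_star, a_actual, T=1.0, n=1000, seed=0):
    """Euler-Maruyama path of the public belief p*_t from equations (1), (2) and (4).

    a_star   : conjectured policy, a function of the public belief (hypothetical input)
    a_actual : policy actually followed, also a function of the public belief
    """
    rng = random.Random(seed)
    dt = T / n
    gamma = stationary_variance(kappa, sigma_theta, sigma_xi)
    gain = gamma / sigma_xi ** 2  # coefficient on the surprise term in (4)
    theta = rng.gauss(p0, math.sqrt(gamma))  # draw theta_0 from the common prior
    p_star = p0
    path = [p_star]
    for _ in range(n):
        # Signal increment (2), driven by the action actually taken.
        d_xi = (a_actual(p_star) + theta) * dt + sigma_xi * rng.gauss(0.0, math.sqrt(dt))
        # Belief update (4), which uses the conjectured action.
        surprise = d_xi - (p_star + a_star(p_star)) * dt
        p_star += -kappa * (p_star - eta) * dt + gain * surprise
        # Fundamentals (1).
        theta += -kappa * (theta - eta) * dt + sigma_theta * rng.gauss(0.0, math.sqrt(dt))
        path.append(p_star)
    return path
```

With, say, `kappa=0.5`, `sigma_theta=1.0` and `sigma_xi=2.0`, `riccati_drift(stationary_variance(...), ...)` evaluates to zero up to rounding, which is the sense in which the prior variance $$\gamma^*$$ makes learning stationary.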
Positive (negative) realizations of this surprise process convey information that the fundamentals are higher (lower), and the responsiveness of the public belief to this news is constant and captured by the sensitivity

$$\beta:=\gamma^*/\sigma_\xi^2=\sqrt{\kappa^2+\sigma_\theta^2/\sigma_\xi^2}-\kappa.$$ (7)16

In the absence of news, the market adjusts its beliefs at rate $$\kappa$$, i.e. at the same speed that the fundamentals change absent any shocks to their evolution. The long-run player’s belief $$(p_t)_{t\geq 0}$$ has an analogous structure, with the Brownian motion $$Z_t=\frac{1}{\sigma_\xi}\big(\xi_t-\int_0^t (p_s+a_s)ds\big)=\frac{1}{\sigma_\xi}\big(Y_t-\int_0^t p_sds\big)$$ (or, equivalently, the surprise process $$\sigma_\xi Z_t$$) now providing news about $$(\theta_t)_{t\geq 0}$$; the last equality stresses that the realizations of $$(Z_t)_{t\geq 0}$$ are independent of the strategy followed and, thus, that $$(p_t)_{t\geq 0}$$ is exogenous.17 In contrast, the public belief is controlled by the long-run player through his actions, which affect the surprise term (6) via the realizations of $$(\xi_t)_{t\geq 0}$$.

To see how deviations from $$(a_t^*)_{t\geq 0}$$ affect the public belief, observe that Lemma 1 states that the public signal follows $$d\xi_t=(a_t+p_t)dt+\sigma_\xi dZ_t$$ from the long-run player’s perspective. Plugging this into equation (4), straightforward algebra yields that $$\Delta_t:=p_t^*-p_t$$ satisfies

$$d\Delta_t=\left[-(\beta+\kappa)\Delta_t+\beta(a_t-a_t^*)\right]dt,\quad t>0,\quad \Delta_0=0.$$ (8)

From (8), it is clear that deviations from $$(a_t^*)_{t\geq 0}$$ can lead to belief asymmetry $$\Delta\neq 0$$. Moreover, the long-run player’s belief is private in this case, as the correction $$d\xi_t-a_tdt$$ used to obtain $$dY_t$$ is incorrectly anticipated by the market. In particular, an upward deviation on the equilibrium path leads the market to hold an excessively optimistic belief about the fundamentals (i.e. $$\Delta_t=p_t^*-p_t>0$$), a consequence of the market underestimating the contribution of the long-run player’s action to the public signal.
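As a numerical illustration of the objects above, the following Python sketch computes $$\gamma^*$$ and $$\beta$$ from (7), checks that $$\gamma^*$$ is a rest point of the standard Kalman-Bucy variance equation (a textbook fact that is not derived in the text), and integrates the belief asymmetry dynamic (8) for a constant deviation $$a_t-a_t^*=\epsilon$$. The parameter values, the function names, and the constant-gap deviation are assumptions chosen purely for illustration.

```python
import math


def stationary_learning(kappa, sigma_theta, sigma_xi):
    """Return (gamma_star, beta): stationary variance and sensitivity in (7)."""
    root = math.sqrt(kappa**2 + sigma_theta**2 / sigma_xi**2)
    beta = root - kappa
    gamma_star = sigma_xi**2 * beta
    return gamma_star, beta


def riccati_residual(gamma, kappa, sigma_theta, sigma_xi):
    """Time derivative of the posterior variance in the Kalman-Bucy filter
    (standard result, assumed here); it is zero at the stationary variance."""
    return -2.0 * kappa * gamma + sigma_theta**2 - gamma**2 / sigma_xi**2


def asymmetry_path(beta, kappa, gap, horizon, steps):
    """Euler scheme for (8) with a constant gap a_t - a_t^* = gap, Delta_0 = 0."""
    dt = horizon / steps
    delta = 0.0
    path = [delta]
    for _ in range(steps):
        delta += (-(beta + kappa) * delta + beta * gap) * dt
        path.append(delta)
    return path


def asymmetry_closed_form(beta, kappa, gap, t):
    """Exact solution of the linear ODE (8) under the same constant gap."""
    return beta * gap / (beta + kappa) * (1.0 - math.exp(-(beta + kappa) * t))


# Illustrative (hypothetical) parameter values.
kappa, sigma_theta, sigma_xi = 0.5, 1.0, 2.0
gamma_star, beta = stationary_learning(kappa, sigma_theta, sigma_xi)
path = asymmetry_path(beta, kappa, gap=0.1, horizon=5.0, steps=50_000)
print(gamma_star, beta, path[-1])
```

Under a constant gap, the asymmetry rises towards $$\beta\epsilon/(\beta+\kappa)$$ at rate $$\beta+\kappa$$, which is simply the solution of the linear ODE (8); a persistent upward deviation therefore keeps the market persistently too optimistic.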
I refer to $$(\Delta_t)_{t\geq 0}$$ as the belief asymmetry process. Starting from a common prior, however, beliefs remain aligned on the equilibrium path (i.e. $$\Delta_0=0$$ and $$a_t^*=a_t$$, $$t\geq 0$$, imply $$\Delta\equiv 0$$). In particular, both parties expect any surprise realization in (6) to decay at rate $$\kappa$$ along the path of play, as the common belief evolves according to $$dp_t=-\kappa(p_t-\eta)dt+\beta\sigma_\xi dZ_t$$ going forward at any on-path history (equation (5)). Finally, for notational simplicity let $$\sigma:=\beta\sigma_\xi$$ denote the volatility of the common belief along the path of play, where the dependence of both $$\sigma$$ and $$\beta$$ on the parameters $$(\kappa,\sigma_\theta,\sigma_\xi)$$ is omitted.

3.2. Necessary conditions: the ratcheting equation

Consider the Markov case. In order to understand the form of ratcheting that arises in this model, it is useful to interpret $$(\xi_t)_{t\geq 0}$$ as a measure of performance (e.g. output) and the market’s best response $$\chi(\cdot,\cdot)$$ as a payment that rewards high performance. For expositional simplicity, suppose that the long-run player is simply paid based on the market’s belief about the fundamentals, $$\chi(p^*,a^*)=p^*$$; this can occur if, for instance, the fundamentals reflect an unobserved payoff-relevant characteristic of the long-run player (e.g. managerial ability). In this case, the dynamic of the public belief (4) is effectively an incentive scheme, i.e. a rule that determines how payments are revised in response to current performance:

$$\underbrace{dp_t^*}_{\text{change in payments}}=\underbrace{-\kappa(p_t^*-\eta)dt}_{\text{exogenous trend}}+\underbrace{\beta}_{\text{sensitivity}}\times\Big[\underbrace{d\xi_t}_{\text{performance}}-\underbrace{(p_t^*+a^*(p_t^*))dt}_{\text{target}}\Big].$$

Central to this scheme is the presence of a target in the form of expected performance: the long-run player will positively influence his payment if and only if realized performance, $$d\xi_t$$, is above the market’s expectation, $$\mathbb{E}^{a^*}[d\xi_t|\mathcal{F}_t]=(p^*_t+a^*(p_t^*))dt$$.
But observe that the market’s updated belief feeds into the target against which the long-run player’s performance is evaluated tomorrow. Moreover, an upward revision of this target leads to a more demanding incentive scheme in the future, as it then becomes harder to generate abnormally high performance subsequently: a ratchet principle ensues.18 In continuous time, the distinction between today and tomorrow disappears. It is then natural to define a ratchet as the (local) sensitivity of the performance target with respect to contemporaneous realized performance $$d\xi_t$$, namely,

$$\text{Ratchet}:=\frac{d(p_t^*+a^*(p_t^*))}{d\xi_t}=\left[1+\frac{da^*(p^*)}{dp^*}\right]\bigg|_{p^*=p_t^*}\times\underbrace{\frac{dp_t^*}{d\xi_t}}_{=\beta}=\beta+\beta\frac{da^*(p_t^*)}{dp^*}.$$ (9)19

To understand the implications of this ratchet principle on incentives, consider the following strategy $$(a_t)_{t\geq 0}$$: the long-run player deviates from $$(a_t^*)_{t\geq 0}$$ for the first time at time $$t$$ by choosing $$a_t>a_t^*$$, and he then matches the market’s expectation of performance thereafter. Intuitively, by quantifying the extra effort that the long-run player must exert to avoid disappointing the market after strategically surprising it, this deviation helps illustrate the strength of the dynamic cost of exhibiting high performance.

Matching the market’s expectation of performance at all times after a deviation occurs amounts to equating the drift of $$(\xi_s)_{s>t}$$ under the long-run player’s belief with its drift from the market’s perspective. Thus, the long-run player must take actions according to

$$\underbrace{a_s+p_s}_{\text{long-run player’s expectation of performance at instant } s>t}=\underbrace{a^*(p_s^*)+p_s^*}_{\text{market’s expectation of performance at instant } s>t}\;\Rightarrow\; a_s=a^*(p_s+\Delta_s)+\Delta_s,\quad s>t.$$

The term $$a^*(p_s+\Delta_s)$$ captures how the long-run player adjusts his actions to match the market’s expectation of future behaviour. The isolated term $$\Delta_s$$ in turn captures how his actions are modified due to holding a private belief off the path of play.
Specifically, since an upward deviation makes the market overly optimistic about the fundamentals, the long-run player anticipates that he will have to exert more effort than expected by the market to match all future “targets”, everything else equal, as his private belief indicates that the fundamentals are lower.

If the long-run player does not deviate from $$a^*(\cdot)$$, $$p_t=p_t^*$$ holds at all times, and effort is costly according to $$(g(a^*(p_t)))_{t\geq 0}$$ in this case. To compute the corresponding cost under $$(a_t)_{t\geq 0}$$, let $$\epsilon:=a_t-a^*(p_t^*)>0$$ denote the size of the initial deviation. From the dynamic of belief asymmetry (8), it follows that $$\Delta_{t+dt}=\beta\epsilon dt$$, and hence, using that $$a_s= a^*(p_s+\Delta_s)+\Delta_s$$,

$$\Delta_s=e^{-\kappa(s-t)}\beta\epsilon dt>0,\quad \forall s>t.$$ (10)

That is, the initial stock of belief asymmetry created, $$\beta\epsilon dt$$, decays at rate $$\kappa$$ under this deviation. Thus, the extra cost that the long-run player must bear to match the market expectation of performance at time $$s>t$$ corresponds, for $$\epsilon>0$$ small, to

$$g(a^*(p_s+\Delta_s)+\Delta_s)-g(a^*(p_s))=g'(a^*(p_s))\times\underbrace{\left[1+\frac{da^*(p_s)}{dp^*}\right]\beta}_{\text{ratchet}}\,\epsilon e^{-\kappa(s-t)}dt+O(\epsilon^2),$$ (11)

and the ratchet (9) naturally appears. In particular, sustaining performance becomes more costly as the strength of the ratchet grows when positive effort is exerted (i.e. $$g'(a)>0$$), as this requires more subsequent effort to match the market’s perceived distribution of $$(\xi_t)_{t\geq 0}$$.

If $$a^*(\cdot)$$ is a Markov equilibrium, this type of deviation cannot be profitable. Thus, the extra cost of effort at time $$t$$ (i.e. $$g'(a^*(p_t))\epsilon$$) must equal the change in the long-run player’s continuation payoff. The latter value consists of the extra effort costs stated in (11), plus the additional stream of payments $$(\Delta_t)_{t\geq 0}$$ that results from the public belief increasing from $$(p_s)_{s>t}$$ to $$(p_s+\Delta_s)_{s>t}$$.
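The first-order approximation in (11) can be checked numerically. The Python sketch below is only an illustration under strong assumptions: the quadratic cost $$g(a)=a^2/2$$, the linear conjecture $$a^*(p)=c_0+c_1p$$, the parameter values, and all function names are hypothetical and do not come from the model’s primitives. It compares the exact extra effort cost with the ratchet term, holding the belief $$p_s$$ fixed.

```python
import math

# Hypothetical primitives, chosen only to make (10)-(11) computable.
C0, C1 = 1.0, 0.3          # a*(p) = C0 + C1 * p, so da*/dp* = C1


def g(a):
    """Hypothetical quadratic effort cost."""
    return 0.5 * a * a


def g_prime(a):
    return a


def a_star(p):
    """Hypothetical linear Markov conjecture."""
    return C0 + C1 * p


def asymmetry_after_deviation(beta, kappa, eps, dt, elapsed):
    """Equation (10): the stock beta*eps*dt decays at rate kappa."""
    return math.exp(-kappa * elapsed) * beta * eps * dt


def exact_extra_cost(p, delta):
    """Left-hand side of (11): cost of matching the market's target."""
    return g(a_star(p + delta) + delta) - g(a_star(p))


def first_order_extra_cost(p, delta):
    """Leading term of (11): g'(a*(p)) * (1 + da*/dp*) * Delta."""
    return g_prime(a_star(p)) * (1.0 + C1) * delta


beta, kappa, p = 0.2, 0.5, 1.0
for eps in (0.1, 0.05):
    delta = asymmetry_after_deviation(beta, kappa, eps, dt=0.1, elapsed=2.0)
    gap = exact_extra_cost(p, delta) - first_order_extra_cost(p, delta)
    print(eps, delta, gap)
```

With these assumed functional forms the remainder equals $$\tfrac{1}{2}[(1+c_1)\Delta_s]^2$$, so halving $$\epsilon$$ divides it by four, in line with the $$O(\epsilon^2)$$ term in (11).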
The next proposition formalizes this discussion for a general $$\chi(\cdot,\cdot)$$ as in the baseline model; recall that $$\rho:=(g')^{-1}(\cdot)$$ and that $$\sigma:=\beta\sigma_\xi$$ denotes the volatility of the common belief along the path of play.

Proposition 1. (Necessary conditions for Markov equilibria). Consider a Markov equilibrium $$a^*(\cdot)$$. Then, $$g'(a^*(p))=\beta q(p)$$, where

$$q(p):=\mathbb{E}\left[\int_0^\infty e^{-(r+\kappa)t}\left[\frac{d}{dp^*}\big[u(\chi(p^*,a^*(p^*)))\big]\Big|_{p^*=p_t}-g'(a^*(p_t))\left(1+\frac{da^*(p_t)}{dp^*}\right)\right]dt\,\Big|\,p_0=p\right]$$ (12)

and $$dp_t=-\kappa(p_t-\eta)dt+\sigma dZ_t$$, $$p_0=p$$. The corresponding equilibrium payoff is given by

$$U(p):=\mathbb{E}\left[\int_0^\infty e^{-rt}\big[u(\chi(p_t,\rho(\beta q(p_t))))-g(\rho(\beta q(p_t)))\big]dt\,\Big|\,p_0=p\right].$$ (13)

Proof. See the Appendix. ∥

The previous result states a constraint on the structure of any Markov equilibrium. Specifically, if $$a^*(\cdot)$$ is a Markov equilibrium, the resulting dynamic gain from the deviation under study, $$q(p)$$, must satisfy $$g'(a^*(p))=\beta q(p)$$, through which current and future equilibrium behaviour are linked; $$\beta$$ in turn represents the sensitivity of the public belief to current performance. In (12), the ratchet negatively contributes to the value of the deviation whenever $$g'(a^*(p))(1+da^*/dp^*)>0$$, whereas $$\kappa$$ in the discount rate reflects that the additional payments $$(\Delta_t)_{t\geq 0}$$ generated decay at that particular rate. Finally, the equilibrium payoff (13) follows from plugging $$a^*(\cdot)=\rho(\beta q(\cdot))$$ in (3).

Observe that $$q(p)$$ is, by definition, the extra value to the long-run player of inducing a small degree of initial belief asymmetry that vanishes at rate $$\kappa>0$$, when the current common belief is $$p$$; thus, $$q(\cdot)$$ is a measure of marginal utility in which, starting from a common belief, future beliefs do not coincide.20 Proposition 1 opens the possibility of finding Markov equilibria via solving for this measure of marginal utility; the next result is central to the subsequent analysis in this respect.

Proposition 2.
(System of ODEs for $$(q,U)$$). Consider a Markov equilibrium $$a^*(\cdot)$$. Then, $$a^*(\cdot)=\rho(\beta q(\cdot))$$, where $$q(p)$$ defined in (12) satisfies the ODE

$$\left[r+\kappa+\beta+\beta^2\rho'(\beta q(p))q'(p)\right]q(p)=\frac{d}{dp}\big[u(\chi(p,\rho(\beta q(p))))\big]-\kappa(p-\eta)q'(p)+\frac{1}{2}\sigma^2q''(p),\quad p\in\mathbb{R}.$$ (14)

The long-run player’s payoff (13) in turn satisfies the linear ODE

$$rU(p)=u(\chi(p,\rho(\beta q(p))))-g(\rho(\beta q(p)))-\kappa(p-\eta)U'(p)+\frac{1}{2}\sigma^2U''(p),\quad p\in\mathbb{R}.$$ (15)

Proof. See the Appendix. ∥

Proposition 2 presents a system of ODEs that the pair $$(q,U)$$ defined by (12)–(13) must satisfy. The $$U$$-ODE (15) is a standard linear equation that captures the local evolution of a net present value.21 Instead, the $$q$$-ODE (14) is a nonlinear equation that captures the local evolution that the value of a small degree of belief asymmetry must satisfy in equilibrium. I refer to equation (14) as the ratcheting equation; this equation is novel.

To understand this equation, notice first that the long-run player faces a dynamic decision problem given any $$a^*(\cdot)$$. Thus, equation (14) behaves as an Euler equation in the sense that it optimally balances the forces that determine his intertemporal behaviour. The right-hand side of equation (14) consists of forces that strengthen his incentives: myopic benefits (the first term) and cost-smoothing motives (the second and third terms); the larger either term, the larger $$q(p)$$, everything else equal.22 The left-hand side instead consists of forces that weaken his incentives: the rate of mean reversion $$\kappa$$ (the higher this value, the more transitory any change in beliefs is) and the ratchet $$\beta+\beta da^*/dp^*=\beta+\beta^2\rho'(\beta q(\cdot))q'(\cdot)$$. The novelty of equation (14) lies in the ratcheting embedded in it, which alters its structure relative to traditional Euler equations in dynamic decision problems, and this has economic implications.
In fact, (14) is an equation for marginal utility in which the anticipation of stronger (weaker) incentives tomorrow dampens (strengthens) today’s incentives. This is seen in the interaction term $$\beta^2\rho'(\beta q(\cdot))q'(\cdot)q(\cdot)$$ on the left-hand side of equation (14), where larger values of $$da^*/dp^*=\beta\rho'(\beta q(\cdot))q'(\cdot)$$ put more downward pressure on $$q(p)$$ (and vice versa), everything else equal; in traditional Euler equations, the opposite effect arises (