Moral Hazard and the Optimality of Debt

Abstract
I show that, in a benchmark model, debt securities minimize the welfare losses associated with the moral hazards of excessive risk-taking and lax effort. For any security design, the variance of the security payoff is a statistic that summarizes these welfare losses. Debt securities have the least variance, among all limited liability securities with the same expected value. In other models, mixtures of debt and equity are exactly optimal, and pure debt securities are approximately optimal. I study both static and dynamic security design problems, and show that these two types of problems are equivalent. I use moral hazard in mortgage lending as a recurring example, but my results apply to other corporate finance and principal-agent problems.
1. Introduction
Debt contracts are widespread, even though debt encourages excessive risk-taking. In this article, I show that debt is the optimal security design in a model in which both reduced effort and excessive risk-taking are possible, even though debt leads to excessive risk-taking. In the model, the seller of the security can alter the probability distribution of outcomes in arbitrary ways. This allows the seller to both alter the mean value of the outcome (‘effort’) and change the other moments of the distribution of outcomes (‘risk-shifting’). To minimize the welfare losses arising from this moral hazard, the security’s payout must be designed to minimize variance. Debt securities are optimal because, among all limited-liability securities with the same expected value, they have the least variance.
The model is motivated by settings in which debt contracts are prevalent and both reduced effort and risk-shifting are possible. For example, in residential mortgage origination, lenders might be able to both underwrite loans more or less diligently (effort) and use private information to choose more or less risky borrowers (risk-shifting).
Prior to the 2008 financial crisis, mortgage lenders sold debt securities, backed by mortgage loans, to outside investors. The issuance of these securities may have weakened the incentives of mortgage lenders to lend prudently. Despite this effect, I argue that debt can be optimal, because debt securities balance the need to encourage effort with the need to avoid risk-shifting.
Many elements of the model are standard in the security design literature. The security is the portion of the asset value received by the outside investors, and is subject to limited liability constraints. If the seller retains a levered equity claim, she$$^{1}$$ has sold a debt security. There are gains from trade, meaning that the outside investors value the security more than the seller does, holding the distribution of outcomes fixed. Both the outside investors and the seller are risk-neutral.
The key non-standard element of the model is a flexible form of moral hazard, which builds on the work of Holmström and Milgrom (1987). The seller, through her actions, can create a “zero-cost” distribution of outcomes, which she will do if she has no stake in the outcome. If the seller creates any other probability distribution, she incurs a cost. In my benchmark model, the cost to the seller of choosing a probability distribution $$p$$ is proportional to the Kullback-Leibler divergence (or “relative entropy”) of $$p$$ from the zero-cost distribution. Under this assumption, the combined effects of reduced effort and risk-shifting can be summarized by one statistic, the variance of the security payoff. The gains from trade are proportional to another statistic, the mean security payoff. Debt securities maximize mean-variance trade-offs over the set of limited liability securities, and are therefore optimal in this benchmark model. Minimizing the variance of the security payoff is equivalent to making the security “as flat as possible”.
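As a rough numerical illustration of the minimum-variance property (a sketch, not part of the original analysis), the Python code below draws random limited-liability securities and compares each with the debt security that has the same expected value. The asset values `v`, the distribution `q`, and all function names are hypothetical choices made for illustration. The check also holds the probability distribution fixed, whereas in the model the distribution is chosen by the seller.

```python
import numpy as np

def debt_payoff(v, face):
    """Debt security: pays min(v_i, face) in state i."""
    return np.minimum(v, face)

def debt_with_mean(v, q, target_mean, iters=100):
    """Bisect on the face value until the debt security has the target mean under q."""
    lo, hi = 0.0, float(v.max())
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if q @ debt_payoff(v, mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return debt_payoff(v, 0.5 * (lo + hi))

def variance(s, q):
    """Variance of the payoff vector s under the distribution q."""
    mean = q @ s
    return q @ (s - mean) ** 2

rng = np.random.default_rng(0)
v = np.linspace(0.0, 2.0, 21)        # hypothetical asset values, with v_0 = 0
q = rng.dirichlet(np.ones(v.size))   # hypothetical full-support distribution

gaps = []
for _ in range(200):
    s = rng.uniform(0.0, 1.0, v.size) * v   # random security with 0 <= s_i <= v_i
    d = debt_with_mean(v, q, q @ s)         # debt security with the same mean
    gaps.append(variance(s, q) - variance(d, q))

min_gap = min(gaps)
print("smallest variance gap (random security minus debt):", min_gap)
```

In this sketch, every randomly drawn security should have at least as much variance as the matching debt security, up to floating-point error.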
Intuitively, if the security pays the buyer more in state $$i$$ than in state $$j$$, the seller will inefficiently act to ensure that state $$i$$ is less likely than state $$j$$. Reducing the security payoff in state $$i$$ and increasing it in state $$j$$ would cause the seller to increase the likelihood of state $$i$$ relative to state $$j$$, benefitting the buyer. Completely flat securities would be best, but because of the limited liability constraints, the security can only be completely flat if it pays nothing at all and foregoes all of the gains from trade. Debt securities are the optimal compromise: they have positive expected value, capturing some gains from trade, but are flat wherever possible, minimizing inefficient actions by the seller.
I also analyse larger classes of cost functions. When the cost function is not the KL divergence, but instead another $$\alpha$$-divergence, the optimal security designs exist on a continuum, with the “live-or-die” security of Innes (1990) at one end (see the Appendix, Figure A.1), equity at the other, and debt in the middle. In some cases, the security design is upward sloping, and can be thought of as a mix of equity and debt. In other cases, the optimal security design is downward sloping. In these cases, restricting security designs to be monotone for the buyer restores the optimality of debt.
Both the KL divergence and the other $$\alpha$$-divergences are part of a broader class of divergences, the invariant divergences. For this class of divergences, I show that debt securities, and mixtures of debt and equity, are approximately optimal. The approximation I use applies when the moral hazard and gains from trade are small relative to the scale of the assets. It is appropriate in settings in which the difference, in utility terms, between a well-designed contract and a poorly designed contract is comparable to the seller’s “value added”.
I describe the approximation in more detail, and discuss when it is and is not appropriate, in Section 5. Under this approximation, debt is first-order optimal, meaning that debt securities are a detail-free way to achieve nearly the same utility as the optimal security design. Mixtures of debt and equity, which correspond to the optimal contracts for $$\alpha$$-divergences, are second-order optimal for all invariant divergences. This can be interpreted as a “pecking order”, in which the security design grows more complex as the size of both the moral hazard problem and gains from trade grow, relative to the scale of the assets. Finally, I provide a micro-foundation for the security design problem with the KL divergence cost function, using a dynamic model. I show that a continuous-time moral hazard problem, similar to Holmström and Milgrom (1987), is equivalent to the static moral hazard problem. The equivalence of the static and dynamic problems provides an intuitive explanation for how the seller can create any probability distribution of outcomes. The key distinction between the dynamic models I discuss and the principal-agent models of Holmström and Milgrom (1987) is limited liability. In Holmström and Milgrom (1987), linear contracts for the seller (agent) are optimal, because they induce the seller to take the same (efficient) action each period. In my model, because of limited liability, the only way to implement the efficient action at every state and time is to offer the seller a very large share of the asset value. However, offering the seller a large share of the asset value limits the gains from trade. It is preferable to pay the seller nothing in the worst states of the world, and then at some point offer a linear payoff. Even though this design does not induce the seller to take the efficient action at every state and time, it achieves more gains from trade. 
The design for the retained tranche that I have just described, levered equity, corresponds to selling a debt security.
This optimality of debt in my benchmark model illustrates a key distinction between my model and the existing security design literature. The classic paper of Jensen and Meckling (1976) argues that debt securities are good at providing incentives for effort, but create incentives for risk-shifting, while equity securities avoid risk-shifting problems, but provide weak incentives for effort. A natural conjecture, based on these intuitions, is that when both risk-shifting problems and effort incentives are important, the optimal security will be “in between” debt and equity. In my benchmark model, contrary to this intuition, a debt security is optimal. The argument of Jensen and Meckling (1976) that debt is best for inducing effort relies on a restriction to monotone security designs. The “live-or-die” result of Innes (1990) shows that when the seller can supply effort to improve the distribution of outcomes (in a monotone likelihood ratio property sense), it is efficient to give the seller all of the asset value when the asset value is high, and nothing otherwise. A revised intuition, which I formalize in Section 4, is that the securities (including debt) that optimally balance encouraging effort and avoiding risk-shifting are “in between” the live-or-die security and equity.$$^{2}$$
The benchmark model in this article takes the idea of flexibility in moral hazard problems to an extreme, allowing the seller to create any probability distribution of outcomes, subject to a cost. This approach to moral hazard problems was introduced by Holmström and Milgrom (1987).
It is conceptually similar to the notion of flexible information acquisition, emphasized in Yang (2015).$$^{3}$$ However, in this article, the cost of choosing a probability distribution should be interpreted as a cost associated with the actions required to cause that distribution to occur (e.g. underwriting or not underwriting mortgage loans). In the rational inattention literature, which Yang (2015) builds on, gathering or processing information (as opposed to taking actions) is costly. This distinction is blurred in the rational inattention micro-foundation in the Online Appendix, Section 2.
In contrast, much of the literature on security design with moral hazard allows the seller to control only one or two parameters of the probability distribution. These papers do not find that debt is optimal. In Acharya et al. (2016), bank managers can both shift risk and pursue private benefits, but do this by choosing among three possible investments. Edmans and Liu (2010), who argue that it is efficient for the agent (not the principal) to hold debt claims, also have a binary project choice. Closer to this article is Biais and Casamatta (1999), in which there are three possible states and two levels of effort and risk-shifting. Biais and Casamatta (1999) interpret the optimal contracts over those three states as mixtures of debt and equity. Hellwig (2009) has a two-parameter model with continuous choices for risk-shifting and effort, and finds that a mix of debt and equity is optimal. In his model, risk-shifting is costless for the agent. Fender and Mitchell (2009) have a model of screening and tranche retention, which is a single-parameter model. This article differs from this literature by allowing for arbitrary outcome spaces, arbitrary probability distributions, and continuous moral hazard choices, which makes deriving general results difficult (Grossman and Hart, 1983), and by considering flexible models of moral hazard.
In the Online Appendix, Section 1, I discuss how to extend my results to parametric models, relating the framework I develop to this literature.
Innes (1990) advocates a moral-hazard theory of debt, but debt is optimal only when the seller controls a single parameter, and the security is constrained to be monotone. If the security does not need to be monotone, or if the seller controls both the mean and variance of a log-normal distribution, the optimal contract is not debt.$$^{4}$$ In the corporate finance setting, one argument for monotonicity is that a manager can borrow from a third party, claim higher profits, and then repay the borrowed money from the extra contract payments. In addition to the accounting and legal barriers to this kind of “secret borrowing”, the third party might find it difficult to force repayment. In the context of asset-backed securities, where cash flows are more easily verified, secret borrowing is even less plausible. Another argument in favour of monotonicity concerns the possibility of the buyer (principal, outside shareholders) sabotaging the project. In the context of securitization, the buyer exerts minimal control over the securitization trust and sabotage is not a significant concern.
There is a large literature that justifies debt for reasons other than moral hazard. Papers invoking adverse selection include Nachman and Noe (1994), DeMarzo and Duffie (1999), Dang et al. (2011), Vanasco (2017), and Yang (2015). In unreported results, I find that the benchmark model of this article and of Yang (2015) can be combined to produce debt as the optimal contract, whereas other parametric models of moral hazard, when combined with Yang (2015), would not generally result in debt. Other theories of debt include costly state verification (Townsend, 1979; Gale and Hellwig, 1985) and explanations based on control or limiting investment (Jensen, 1986; Aghion and Bolton, 1992; Hart and Moore, 1994).
I begin in Section 2 by explaining the benchmark security design problem, whose structure is used throughout the article. I then show in Section 3 that for a particular cost function, debt is optimal, and explain how this relates to a mean-variance trade-off. Next, I analyse other cost functions in Section 4, describing the optimal contracts and showing that a related mean-variance trade-off applies. I will then introduce an approximation in Section 5, and show that for an even larger class of cost functions, the same trade-offs hold in an approximate sense. In Section 6 and Section 7, I provide micro-foundations for the non-parametric models, from a continuous-time model. In the Appendix, Section C, I discuss a calibration for the example of residential mortgage lending. In the Online Appendix, Section 1, I discuss parametric models, and apply the results in the Online Appendix, Section 2, to a model of rational inattention in mortgage lending.
2. Model Framework
In this section, I introduce the security design framework that I will discuss throughout the article. The problem is close to Innes (1990) and other papers in the security design literature. There is a risk-neutral agent, called the “seller”, who owns an asset in the first period. In the second period, one of $$N+1$$ possible states, indexed by $$i\in\Omega=\{0,1,\ldots,N\}$$, occurs.$$^{5}$$ In each of these states, the seller’s asset has an undiscounted value of $$v_{i}$$. I assume that $$v_{0}=0$$, $$v_{i}$$ is non-decreasing in $$i$$, and that $$v_{N}>v_{0}$$. The seller discounts second period payoffs to the first period with a discount factor $$\beta_{s}$$. There is a second risk-neutral agent, the “buyer”, who discounts second period payoffs to the first period with a larger discount factor, $$\beta_{b}>\beta_{s}$$. Because the buyer values second period cash flows more than the seller, there are “gains from trade” if the seller gives the buyer a second period claim in exchange for a first period payment.
I will refer to the parameter $$\kappa=\frac{\beta_{b}-\beta_{s}}{\beta_{s}}$$ as the gains from trade.$$^{6}$$ I assume there is limited liability, so that in each state the seller can credibly promise to pay at most the value of the asset. I also assume that the seller must offer the buyer a security, meaning that the second period payment to the buyer must be weakly positive. In this sense, the seller must offer the buyer an “asset-backed security”. When the asset takes on value $$v_{i}$$ in the second period, the security pays $$s_{i}\in[0,v_{i}]$$ to the buyer. Following the conventions of the literature, I will say that the security is a debt security if $$s_{i}=\min(v_{i},\bar{v})$$ for some $$\bar{v}\in(0,v_{N})$$.
To simplify the exposition, I make particular assumptions about the timing of the events and the bargaining power of the agents. I will assume that, during the first period, the seller first designs the security, and then makes a “take-it-or-leave-it” offer to the buyer at price $$K$$. If the buyer rejects the offer, the seller retains the entire asset. After the buyer accepts or rejects the offer, the seller takes actions that modify the value of the assets (the moral hazard). The first period ends, uncertainty is resolved, and then in the second period payoffs are determined.
This timing convention, which is standard in principal-agent models, is not appropriate for some applications. For example, in mortgage origination, much of the lender’s moral hazard occurs when the loans are being underwritten, before they are sold to outside investors. In the Appendix, Section B, I show that this timing of events is not necessary for the main results. This robustness to the timing of events contrasts with models based on adverse selection by the seller, such as DeMarzo and Duffie (1999), in which the timing of events is crucial.
In the same Appendix section, I also show that allowing the buyer and seller to Nash-bargain over the price, or over both the price and security design, does not alter the main results.
The moral hazard problem occurs when the seller creates or modifies the asset. During this process, the seller will take a variety of actions, and these actions will alter the probability distribution of second period asset values. Following Holmström and Milgrom (1987), I model the seller as directly choosing a probability distribution, $$p$$, over the sample space $$\Omega$$, subject to a cost $$\psi(p)$$. I will focus on models in which any probability distribution $$p$$ can be chosen, which I will call “non-parametric”. In the Online Appendix, Section 1, I discuss models in which $$p$$ must belong to a parametric family of distributions.$$^{7}$$
I will make several assumptions about the cost function $$\psi(p)$$. First, I assume that there is a unique probability distribution, $$q$$, with full support over $$\Omega$$, that minimizes the cost. Second, because I will not consider participation constraints for the seller, I assume without loss of generality that $$\psi(q)=0$$. I also assume that $$\psi(p)$$ is strictly convex and at least twice differentiable. Below, I will impose additional assumptions on the cost function, but first will describe the moral hazard and security design problems.
The moral hazard occurs because the seller cares only about maximizing the value of her payoff. When the value of the asset is $$v_{i}$$, the discounted value of the seller’s retained tranche is
\[ \eta_{i}=\beta_{s}(v_{i}-s_{i}). \]
Because of the assumption that $$v_{0}=0$$, and limited liability, it is always the case that $$\eta_{0}=s_{0}=0$$. Let $$p^{i}$$ denote the probability that state $$i\in\Omega$$ occurs, under probability distribution $$p$$.
The moral hazard sub-problem of the seller can be written as
\begin{equation} \phi(\eta)=\sup_{p\in M}\left\lbrace\sum_{i>0}\eta_{i}p^{i}-\psi(p)\right\rbrace,\tag{2.1}\label{eq:MH-general} \end{equation}
where $$M$$ is the set of feasible probability distributions and $$\phi(\eta)$$ is the indirect utility function. In the non-parametric case, when $$M$$ is the set of all probability distributions on the sample space, the moral hazard problem has a unique optimal $$p$$ for each $$\eta$$. Moreover, the smoothness and convexity of $$\psi(p)$$ guarantee that this optimal policy, $$p(\eta)$$, is itself differentiable with respect to $$\eta$$. In contrast, for the parametric case (Online Appendix, Section 1), there may be multiple $$p\in M$$ that achieve the same optimal utility for the seller.
The buyer cannot observe $$p$$ directly, but can infer the seller’s choice of $$p$$ from the design of the retained tranche $$\eta$$. At the security design stage, the buyer’s valuation of a security $$s$$ is determined by both the structure of the security and the buyer’s inference about which probability distribution the seller will choose, $$p(\eta)$$.
Without loss of generality, I will define the units of the seller’s and buyer’s payoffs so that $$\beta_{s}\sum_{i}v_{i}p^{i}(\beta_{s}v_{i})=1$$. That is, if the seller retains the entire asset, and takes actions in the moral hazard problem accordingly, the discounted asset value is one. I use this convention to ensure that the units correspond to a quantity that is at least potentially observable: the value of the assets, if those assets are retained by the seller. This convention is useful in the calibration of the model in the Appendix, Section C. Let $$s_{i}(\eta)$$ be the security corresponding to retained tranche $$\eta$$.
The security design problem is
\begin{equation} U(\eta^{*})=\max_{\eta}\left\lbrace\beta_{b}\sum_{i>0}p^{i}(\eta)s_{i}(\eta)+\phi(\eta)\right\rbrace,\tag{2.2}\label{eq:sec-util-eq} \end{equation}
subject to the limited liability constraint that $$\eta_{i}\in[0,\beta_{s}v_{i}]$$. From the seller’s perspective, when she is designing the security, she internalizes the effect that her subsequent choice of $$p$$ will have on the buyer’s valuation, because that valuation determines the price at which she can sell the security. The security serves as a commitment device for the seller, providing an incentive for her to choose a favourable $$p$$. This commitment is costly, because allocating more of the available asset value to the retained tranche necessarily reduces the payout of the security, reducing the gains from trade.
Many of the results in this article are discussed using perturbation arguments. Any infinitesimal perturbation to the security design (and therefore retained tranche) has two effects on the seller’s utility in the security design problem. The first effect is the “direct” effect, which changes the seller’s utility by transferring more or less expected value from the seller to the buyer. In general, the size of this effect is controlled by the gains from trade parameter, $$\kappa$$. The second effect is the “indirect” effect, which changes the buyer’s valuation of the security, through the change in the seller’s behaviour in the moral hazard problem. There is no “indirect” effect on the seller’s utility in the moral hazard problem, because the seller is maximizing her utility in the moral hazard problem when she chooses the probability distribution (the envelope theorem).
Consider a differentiable perturbation around the optimal security design, $$\eta(\epsilon)$$, with $$\eta(0)=\eta^{*}$$, that is feasible for some $$\epsilon>0$$. As mentioned above, in the non-parametric models that I study, $$p(\eta)$$ is differentiable.
In this case, the two effects of a perturbation can be summarized by the following first-order optimality condition with respect to $$\epsilon$$, the size of the perturbation:
\begin{equation} \frac{\partial U(\eta(\epsilon))}{\partial\epsilon}\bigg|_{\epsilon=0^{+}}=- \underbrace{\kappa\sum_{i\in\Omega}p^{i}(\eta^{*})\frac{\partial\eta_{i}}{\partial\epsilon}\bigg|_{\epsilon=0^{+}}}_{\text{direct effect}}+ \underbrace{\beta_{b}\sum_{i,j\in\Omega}s_{j}^{*}\frac{\partial p^{j}(\eta)}{\partial\eta_{i}}\bigg|_{\eta=\eta^{*}}\frac{\partial\eta_{i}} {\partial\epsilon}\bigg|_{\epsilon=0^{+}}}_{\text{indirect effect}}\leq0.\tag{2.3}\label{eq:lagrangian-foc} \end{equation}
Below, I will further decompose the indirect effect into an indirect effect due to a change in effort and an indirect effect due to a change in risk-shifting. First, however, I will describe the cost functions that I will be studying in more detail.
As discussed earlier, the cost function $$\psi(p)$$ is convex and minimized at $$\psi(q)=0$$. It follows that the cost function is proportional to a divergence$$^{8}$$ between $$p$$ and the zero-cost distribution, $$q$$, defined for all $$p,q\in M$$:
\[ \psi(p)=\theta D(p||q). \]
Here, the scalar parameter $$\theta>0$$ controls how costly it is for the seller to change the probability distribution in the moral hazard problem. I introduce this parameter for the purpose of taking comparative statics.
There are many divergences that have been defined in the information theory literature (e.g. Ali and Silvey, 1966; Csiszár, 1967; Amari and Nagaoka, 2007). In Section 3, I begin the article by focusing on a particular divergence, the Kullback-Leibler divergence. The KL divergence, also called relative entropy, is defined as
\[ D_{KL}(p||q)=\sum_{i\in\Omega}p^{i}\ln\left(\frac{p^{i}}{q^{i}}\right). \]
The KL divergence has the assumed convexity and differentiability properties, and also guarantees that the $$p$$ chosen by the seller will be mutually absolutely continuous with respect to $$q$$.
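To make problems (2.1) and (2.2) concrete under the KL cost, the Python sketch below may help. It relies on a standard closed form for entropy-penalized problems, which is stated here as an assumption rather than quoted from the text: the seller's optimal distribution is $$p^{i}\propto q^{i}\exp(\eta_{i}/\theta)$$, with indirect utility $$\phi(\eta)=\theta\ln\sum_{i}q^{i}\exp(\eta_{i}/\theta)$$. The parameter values, the grids, and the function names are hypothetical. The sketch checks the closed form against the stationarity condition of (2.1), then compares the best debt security with the best equity share on a grid.

```python
import numpy as np

def seller_choice(eta, q, theta):
    """Seller's optimal p under the KL cost: p_i proportional to q_i * exp(eta_i / theta)."""
    w = q * np.exp((eta - eta.max()) / theta)   # shift by the max for numerical stability
    return w / w.sum()

def indirect_utility(eta, q, theta):
    """phi(eta) = theta * log(sum_i q_i * exp(eta_i / theta)), the value of problem (2.1)."""
    m = eta.max()
    return m + theta * np.log(q @ np.exp((eta - m) / theta))

def design_utility(s, v, q, theta, beta_s, beta_b):
    """Objective in (2.2) for a security s, with retained tranche eta = beta_s * (v - s)."""
    eta = beta_s * (v - s)
    p = seller_choice(eta, q, theta)
    return beta_b * (p @ s) + indirect_utility(eta, q, theta)

# Hypothetical parameters, chosen only for illustration.
v = np.linspace(0.0, 2.0, 21)        # asset values, with v_0 = 0
q = np.full(v.size, 1.0 / v.size)    # zero-cost distribution with full support
theta, beta_s, beta_b = 1.0, 0.9, 1.0

# Check the closed form for a retained levered equity tranche (so eta_0 = 0).
eta = beta_s * np.maximum(v - 0.8, 0.0)
p = seller_choice(eta, q, theta)
foc_gap = np.max(np.abs(eta - theta * (np.log(p / q) - np.log(p[0] / q[0]))))
phi_gap = abs(indirect_utility(eta, q, theta) - (p @ eta - theta * (p @ np.log(p / q))))

# Compare the best debt security with the best equity share on a grid.
faces = np.linspace(0.0, v.max(), 2001)
shares = np.linspace(0.0, 1.0, 2001)
best_debt = max(design_utility(np.minimum(v, f), v, q, theta, beta_s, beta_b) for f in faces)
best_equity = max(design_utility(a * v, v, q, theta, beta_s, beta_b) for a in shares)
print(foc_gap, phi_gap, best_debt - best_equity)
```

The grid search is only a coarse stand-in for the maximization in (2.2). It restricts attention to two one-parameter families of securities and does not by itself establish optimality over all limited-liability designs.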
The KL divergence has been used in a variety of economic models, notably Hansen and Sargent (2008), who use it to describe the set of models a robust decision maker considers. It also has many applications in econometrics, statistics, and information theory, and the connection between the security design problem and these topics will be discussed later in the article. I will show that when the cost function is proportional to the KL divergence, debt is the optimal security design.
The KL divergence is a member of the family of $$\alpha$$-divergences. These divergences are parametrized by a real number, $$\alpha$$, which controls how the curvature of the divergence changes as $$p$$ moves away from $$q$$. The $$\alpha$$-divergences can be written, whenever $$|\alpha|\neq1$$, as
\[ D_{\alpha}(p||q)=\sum_{i\in\Omega}\frac{4}{1-\alpha^{2}}q^{i}\left(1-\left(\frac{p^{i}}{q^{i}}\right)^{\frac{1}{2}(1-\alpha)}+\frac{1}{2}(1-\alpha)\left(\frac{p^{i}}{q^{i}}-1\right)\right). \]
The limits of $$\alpha\rightarrow-1$$ and $$\alpha\rightarrow1$$ correspond to the KL divergence and the “reversed” KL divergence, respectively.$$^{9}$$ For this class of divergences, in Section 4 I will show that, for $$\alpha\leq-1$$, the optimal contracts are mixtures of debt and equity. Commonly discussed $$\alpha$$-divergences include the Hellinger distance ($$\alpha=0$$) and the $$\chi^{2}$$-divergence ($$\alpha=-3$$).
I will also discuss a more general class of divergences, that contains the $$\alpha$$-divergences, known as the “$$f$$-divergences”. This class of divergences can be written as
\begin{equation} D_{f}(p||q)=\sum_{i\in\Omega}q^{i}f\left(\frac{p^{i}}{q^{i}}\right),\tag{2.4}\label{eq:F-Div-Def} \end{equation}
where $$f(u)$$ is a convex function on $$\mathbb{R}^{+}$$ with $$f(1)=0$$.
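The relationships among these divergence families can be checked numerically. The Python sketch below implements (2.4) with two generator functions: the $$\alpha$$-divergence generator read off from the formula above, and $$f(u)=u\ln u-u+1$$, a standard generator for the KL divergence that is not stated in the text and is assumed here. The distributions `p` and `q` are random and hypothetical, as are the function names.

```python
import numpy as np

def f_kl(u):
    """Assumed generator of the KL divergence: f(u) = u ln u - u + 1."""
    return u * np.log(u) - u + 1.0

def f_alpha(u, alpha):
    """Generator of the alpha-divergence, valid for |alpha| != 1."""
    b = 0.5 * (1.0 - alpha)
    return 4.0 / (1.0 - alpha**2) * (1.0 - u**b + b * (u - 1.0))

def f_divergence(p, q, f):
    """D_f(p||q) = sum_i q_i * f(p_i / q_i), as in (2.4)."""
    return q @ f(p / q)

rng = np.random.default_rng(1)
q = rng.dirichlet(np.ones(6))   # hypothetical zero-cost distribution
p = rng.dirichlet(np.ones(6))   # hypothetical chosen distribution

kl = p @ np.log(p / q)                                           # direct KL definition
kl_from_f = f_divergence(p, q, f_kl)                             # KL as an f-divergence
near_kl = f_divergence(p, q, lambda u: f_alpha(u, -1.0 + 1e-6))  # alpha close to -1
hellinger = f_divergence(p, q, lambda u: f_alpha(u, 0.0))        # alpha = 0
chi2 = f_divergence(p, q, lambda u: f_alpha(u, -3.0))            # alpha = -3

print(kl, kl_from_f, near_kl)
print(hellinger, 2.0 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))
print(chi2, 0.5 * np.sum((p - q) ** 2 / q))
```

Under this parametrization, the $$\alpha=0$$ case equals twice the squared Hellinger distance between $$\sqrt{p}$$ and $$\sqrt{q}$$, and the $$\alpha=-3$$ case equals one half of the Pearson $$\chi^{2}$$ statistic.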
I adopt the convention (without loss of generality) that $$f(u)\geq0$$.$$^{10}$$ I will limit my discussion to sufficiently differentiable $$f$$-functions, for mathematical convenience, and use the normalization that $$f''(1)=1$$. The $$f$$-divergences are analytically convenient because they are additively separable (or “decomposable”) across states.
The most general class of divergences that I will discuss is the class of “invariant divergences”, which contains the $$f$$-divergences, along with other divergences that are not additively separable, such as the Chernoff and Bhattacharyya distances. Invariant divergences are defined by their invariance with respect to sufficient statistics (Čencov, 2000; Amari and Nagaoka, 2007). The exact definition of an invariant divergence is rather technical; for our purposes, what is special about these divergences is that, up to second order, they resemble the KL divergence, and up to third order, they resemble the $$\alpha$$-divergences. In Section 5, I will define this “resemblance” more precisely, and define how a security design can be “approximately optimal”. I will then show that debt, or mixtures of debt and equity, are approximately optimal as a result.
To summarize, the divergences I discuss are related in the following way:
\[ \text{KL}\in\alpha\text{-divergences}\subset f\text{-divergences}\subset\text{Invariant Divergences}\subset\text{All Divergences}. \]
The KL divergence, and the broader class of invariant divergences, are interesting because they are closely related to ideas from information theory. In the Online Appendix, Section 2, I illustrate this in a model based on rational inattention (Sims, 2003), in which the cost function is related to the KL divergence. The KL divergence cost function can also be micro-founded from a dynamic moral hazard problem.
In Section 6, I show that a large class of continuous time problems are equivalent to the static moral hazard problem with a divergence cost function, and show that in a particular case, that divergence is the KL divergence. In Section 7, I extend this analysis to a more general class of continuous time problems and show that they are related, in a certain sense, to static moral hazard problems with invariant divergence cost functions.
I will refer throughout the article to “effort” and “risk-shifting” as separate components of the moral hazard problem. Next, I will define “effort” and “risk-shifting” formally, and clarify the connection between this framework and more conventional models of moral hazard. I define “effort” as the change in the discounted expected value of the assets:
\[ e=\beta_{s}\sum_{i\in\Omega}(p^{i}-q^{i})v_{i}. \]
Given a retained tranche $$\eta$$, define the effort it induces as $$e(\eta)$$. For any $$\eta$$, there is an “equivalent equity share”, $$\gamma(\eta)$$, for the seller that would induce the same amount of effort: $$e(\eta)=e(\gamma(\eta)\beta_{s}v)$$.$$^{11}$$
In the model of Innes (1990), the seller is restricted to choosing from a family of probability distributions that satisfy a monotone likelihood ratio property. As a result, effort, defined in this way, is one-to-one with the choice variable in Innes (1990). In models with more flexible moral hazard, effort is not one-to-one with the choices of the agent. In these models, we can define “risk-shifting” as the actions that the agent takes which change the probability distribution of outcomes without changing the expected value of the asset. This includes actions that change the higher moments of the asset distribution, and also actions that keep the distribution of asset values constant, but move probability between states with the same asset value ($$i,j\in\Omega$$ with $$v_{i}=v_{j}$$).
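The definitions of effort and the equivalent equity share can be illustrated with a short Python sketch. It assumes the KL cost function, under which the seller's optimal distribution has the standard form $$p^{i}\propto q^{i}\exp(\eta_{i}/\theta)$$; this closed form is an assumption of the sketch, not a statement taken from the text above. The parameter values and function names are hypothetical. The retained tranche is levered equity, for which the induced effort lies between the efforts induced by a zero and a full equity share, so bisection on $$\gamma\in[0,1]$$ suffices; other tranches may need a wider bracket.

```python
import numpy as np

def seller_choice(eta, q, theta):
    """Optimal p under a KL cost (assumed): p_i proportional to q_i * exp(eta_i / theta)."""
    w = q * np.exp((eta - eta.max()) / theta)
    return w / w.sum()

def effort(eta, v, q, theta, beta_s):
    """e(eta) = beta_s * sum_i (p_i(eta) - q_i) * v_i."""
    return beta_s * ((seller_choice(eta, q, theta) - q) @ v)

def equivalent_equity_share(eta, v, q, theta, beta_s, iters=100):
    """Bisect on gamma in [0, 1] so that e(gamma * beta_s * v) = e(eta)."""
    target = effort(eta, v, q, theta, beta_s)
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if effort(mid * beta_s * v, v, q, theta, beta_s) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical parameters, chosen only for illustration.
v = np.linspace(0.0, 2.0, 21)        # asset values, with v_0 = 0
q = np.full(v.size, 1.0 / v.size)    # zero-cost distribution
theta, beta_s = 1.0, 0.9
face = 0.8                           # face value of the debt sold to the buyer

eta = beta_s * np.maximum(v - face, 0.0)   # retained levered equity
gamma = equivalent_equity_share(eta, v, q, theta, beta_s)
residual = effort(gamma * beta_s * v, v, q, theta, beta_s) - effort(eta, v, q, theta, beta_s)

# The two tranches induce the same effort but different distributions;
# the remaining difference is what the text calls risk-shifting.
shift = seller_choice(eta, q, theta) - seller_choice(gamma * beta_s * v, q, theta)
print(gamma, residual, shift @ v)
```

By construction, `shift @ v` is approximately zero: the two distributions have the same mean asset value and differ only in how probability is spread across states.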
Using these definitions of effort and risk-shifting, I decompose the indirect effect of any security design perturbation (equation 2.3) into effort and risk-shifting components.
Lemma 1. The indirect effect of any security design perturbation can be decomposed into an effect due to the change in effort, and an effect due to the change in risk-shifting:
\begin{align*} \underbrace{\beta_{b}\sum_{j\in\Omega}s_{j}^{*}\frac{dp^{j}(\eta(\epsilon))}{d\epsilon}\bigg|_{\epsilon=0^{+}}}_{\textit{indirect effect}} & =\underbrace{\frac{\beta_{b}}{\beta_{s}}(1-\gamma(\eta^{*}))\frac{de(\eta(\epsilon))}{d\epsilon}\bigg|_{\epsilon=0^{+}}}_{\textit{indirect effect on effort}}-\\ & \underbrace{\frac{\beta_{b}}{\beta_{s}}\sum_{j\in\Omega}\frac{dp^{j}(\eta(\epsilon))}{d\epsilon}\bigg|_{\epsilon=0^{+}}(\eta_{j}^{*}-\gamma(\eta^{*})\beta_{s}v_{j})}_{\textit{indirect effect on risk-shifting}}. \end{align*}
Proof. See Online Appendix, Section 3.1. $$\square$$
This decomposition is not unique; there are many other ways of decomposing the indirect effects into different components. This particular decomposition connects the flexible moral hazard framework used in this article to other models of moral hazard. Using this definition of effort and risk-shifting, an equity contract causes no utility loss due to risk-shifting, because an equity contract is identical to its “equivalent equity” contract, consistent with the argument of Jensen and Meckling (1976). However, equity contracts might not be a very efficient way to induce effort by the seller.
If the effort level is one-to-one with the seller’s choices (as in Innes (1990)), there is no possibility of risk-shifting, and this framework reduces to the classic model of moral hazard. Moral hazard models with two choice parameters, such as Hellwig (2009), allow the seller to risk-shift in one dimension, while also incorporating an effort choice.
The non-parametric model of moral hazard emphasized in this article extends these models by allowing more dimensions of risk-shifting. In models with only one dimension of risk-shifting, if there are many possible outcomes (i.e. more than the three in Biais and Casamatta, 1999), there will in general be contracts other than equity contracts that also induce no risk-shifting. In contrast, in the non-parametric model of moral hazard, equity contracts are the only contracts that avoid risk-shifting entirely. The decomposition also illustrates the externalities associated with the seller’s choices in the moral hazard problem. The buyer benefits from an increase in the seller’s effort, assuming that the seller’s equivalent equity share is less than 100%. At the same time, the buyer can benefit or be harmed by the change in the seller’s risk shifting behaviour, depending on whether the change in the security design induces more or less risk shifting. I will show in the following sections that the effect of a perturbation to the security design on risk shifting depends on whether the security becomes more or less equity-like. The models described in the article use divergences to create cost functions, which rules out two interesting cases: free disposal of output by the seller and free risk-shifting. Free disposal of output by the seller is a common assumption in security design problems, and is used to justify restricting the set of securities to designs for which the seller’s payoff is weakly increasing in the asset value. Free disposal of output does not change any of the results in the article: all of the optimal security designs without free disposal have monotone payoffs for the seller, and are therefore still optimal among the set of monotone security designs. I discuss this more in the Appendix, Section D. Free risk-shifting is the assumption that only effort, and not risk-shifting, is costly for the agent. 
Formally, this would require that $$D(p||q)=D(p'||q)$$ for all $$p,p'$$ with the same expected value. Technically, the assumptions of strict convexity for $$D(p||q)$$ and that $$D(p||q)=0$$ only if $$p=q$$ both rule out this case. However, the analysis in this case is straightforward. As risk-shifting becomes free, concerns about risk-shifting dominate concerns about effort, and equity contracts are optimal. This result is closely related to Ravid and Spiegel (1997), Carroll (2015), and Barron et al. (2017), and is also shown in the Appendix, Section D. In this section, I have introduced the framework that I will use throughout the article. In the next section, I analyse the benchmark model, in which the cost function is the KL divergence. 3. The Benchmark Model In this section, I discuss the non-parametric version of the model, in which the set $$M$$ of feasible probability distributions is the set of all probability distributions on $$\Omega$$. I assume that the cost function is proportional to the KL divergence between $$p$$ and $$q$$, \[ \psi(p)=\theta D_{KL}(p||q). \] I will show that the optimal security design is a debt contract. In the text, I will outline the proof, using a perturbation argument; a complete proof can be found in the Appendix, Section 3.5. I will start by discussing the first-order condition of the moral hazard problem. The KL divergence cost function becomes infinitely sloped at the boundaries of the simplex, and therefore guarantees an interior solution to the moral hazard problem, equation 2.1, for all $$\eta$$. The KL divergence is also convex, consistent with the assumptions described in the previous section. As a result, the first-order condition in the moral hazard problem must hold. For any $$i>0$$, we have \[ \eta_{i}=\theta\left(\ln\left(\frac{p^{i}}{q^{i}}\right)-\ln\left(\frac{p^{0}}{q^{0}}\right)\right). 
\] Intuitively, if the seller receives a high payoff in state $$i$$, she will increase the probability of state $$i$$ relative to state $$0$$, in which she receives zero payoff. From this first-order condition, we can observe that the semi-elasticities of the relative probabilities $$p^{i}(\eta)$$ and $$p^{0}(\eta)$$ to the payoff $$\eta_{i}$$ satisfy \begin{equation} \frac{\partial\ln(p^{i}(\eta))}{\partial\eta_{i}}-\frac{\partial\ln(p^{0}(\eta))}{\partial\eta_{i}}=\theta^{-1}.\label{eq:elasticity-KL} \end{equation} (3.1) This constant difference of semi-elasticities property is part of what is special about the KL divergence. It is constant in two respects: first, the difference of the semi-elasticities does not depend on how far $$p(\eta)$$ is from $$q$$, and second, it is symmetric across the states $$i\in\Omega$$. The $$\alpha$$-divergences that will be discussed in the next section relax the first of these properties: the semi-elasticity will depend on how far the endogenous probability distribution is from the zero-cost distribution. The entire class of invariant divergences, which are used throughout the article, share the second property, imposing a sort of symmetry across states of the world (this is essentially the meaning of “invariant”). Using this property, we can construct perturbations of the retained tranche (and therefore the security design) that change the probability in two different states, $$p^{i}$$ and $$p^{j}$$, with $$i>0$$ and $$j>0$$, while leaving all other probabilities unchanged. Let $$\eta^{*}$$ be the optimal design for the retained tranche. Suppose that, starting from $$\eta^{*}$$, we increase $$\eta_{i}$$ by an amount $$\frac{\epsilon}{p^{i}(\eta^{*})}$$, while decreasing $$\eta_{j}$$ by an amount $$\frac{\epsilon}{p^{j}(\eta^{*})}$$. 
Conjecture that this perturbation, for infinitesimal values of $$\epsilon$$, increases $$p^{i}$$ and decreases $$p^{j}$$ by $$\theta^{-1}\epsilon$$, while leaving all other probabilities, and in particular $$p^{0}$$, unchanged. We can verify this conjecture by observing that equation 3.1 above is satisfied for all states, and that the sum of the probabilities across states remains equal to one. Having constructed this perturbation, I now turn to the security design problem. Consider the following property of debt: for a security $$s$$ to be a debt contract, there must be no pairs $$s_{i}$$ and $$s_{j}$$, with $$i\neq j$$, such that $$s_{j}<v_{j}$$ and $$s_{j}<s_{i}$$. This property requires that if the limited liability constraint does not bind in either state $$i$$ or state $$j$$, the security values must be equal, and if the constraint binds only in one of the two states, the payoff in that state must be smaller than in the “flat” part of the debt contract. It is essentially the definition of a debt contract, subject to the caveat that “selling everything” and “selling nothing” also have this property. Suppose that the optimal security design $$s^{*}$$ does not have this property (and therefore is not debt). For $$s^{*}$$ to be optimal, there must be no perturbation of the security design that is feasible and can improve the seller’s utility in the security design problem. Using the perturbation described above, I will show that such a perturbation does exist, and therefore that the optimal contract is a debt contract (or selling everything/nothing, which are ruled out in the proof in the Appendix). We have supposed that, for the optimal security design $$s^{*}$$, there is a pair of states $$i,j\in\Omega$$, $$i\neq j$$, with $$s_{j}^{*}<v_{j}$$ and $$s_{j}^{*}<s_{i}^{*}$$. Now imagine that we increase $$s_{j}$$ by $$\beta_{s}^{-1}\frac{\epsilon}{p^{j}(\eta^{*})}$$ while decreasing $$s_{i}$$ by $$\beta_{s}^{-1}\frac{\epsilon}{p^{i}(\eta^{*})}$$. 
The values of the retained tranche in those states, $$\eta_{i}$$ and $$\eta_{j}$$, move opposite the security design and are perturbed in exactly the manner discussed above. Note that, because $$s_{j}^{*}<v_{j}$$ and $$s_{i}^{*}>s_{j}^{*}\geq0$$, this perturbation does not violate the limited liability constraints. The effect of this perturbation on the utility in the security design problem is described by equation 2.3 in the previous section. We can see that there is no “direct effect” of this perturbation; holding the probability distribution the seller chooses fixed, the perturbation does not affect the expected value of the security design. The perturbation does increase the probability of state $$i$$ by $$\theta^{-1}\epsilon$$, and it decreases the probability of state $$j$$ by $$\theta^{-1}\epsilon$$, leaving the probability of all other states the same. Therefore, the “indirect” effect is $$\theta^{-1}(s_{i}^{*}-s_{j}^{*})$$, which was assumed to be greater than zero. It follows that this perturbation improves the seller’s utility, and therefore the optimal contract must be a debt, selling everything, or selling nothing. This argument can be summarized as showing that the security design should be “flat wherever possible.” After introducing the formal result, I will apply the decomposition between effort and risk-shifting introduced in the previous section. The following proposition summarizes this perturbation argument, rules out selling everything and selling nothing, and also establishes a result about the face value of the debt contract. Proposition 1. In the non-parametric model, with the cost function proportional to the Kullback-Leibler divergence, the optimal security design is a debt contract, \[ s_{j}^{*}=\min(v_{j},\bar{v}), \] for some $$\bar{v}>0$$. The face value of the debt satisfies \[ \beta_{b}\bar{v}-\beta_{b}\sum_{i\in\Omega}p^{i}(\eta^{*})s_{i}^{*}=\kappa\theta. 
\] If the highest possible asset value is sufficiently large ($$v_{N}>\sum_{i}q^{i}v_{i}+\frac{\kappa}{\beta_{b}}\theta$$), then $$\bar{v}<v_{N}$$. Proof. The results are proven in the proof of Proposition 3. ǁ The result in Proposition 1 shows that debt is optimal, for any full-support zero-cost distribution $$q$$. The condition that $$v_{N}$$ be “high enough” is weak. If it were not satisfied for some sample space $$\Omega$$ and zero-cost distribution $$q$$, one could include a new highest value $$v_{N+1}$$ in $$\Omega$$, occurring with vanishingly small probability under $$q$$, such that the condition was satisfied. Intuitively, the sample space must contain high enough values to observe the “flat” part of the debt security. The perturbation argument described above leads to the conclusion that the security design should be flat wherever possible. A different way to view the same idea, which is mathematically equivalent, can be derived by analysing the indirect effect described above. The following corollary describes the direct and indirect effects of any perturbation in the security design problem, and decomposes the “indirect effect” into effort-only and risk-shifting components. Corollary 1. Under the conditions of Proposition 1, the effect of any perturbation is \[ \frac{\partial U(\eta(\epsilon))}{\partial\epsilon}|_{\epsilon=0^{+}}=\underbrace{\kappa\frac{\partial}{\partial\epsilon} E^{p(\eta^{*})}[\beta_{s}s(\epsilon)]|_{\epsilon=0^{+}}}_{\textit{direct effect}}-\underbrace{\frac{1}{2}\frac{\beta_{b}}{\beta_{s}}\theta^{-1}\frac{\partial}{\partial\epsilon}V^{p(\eta^{*})}[\beta_{s}s(\epsilon)]|_{\epsilon=0^{+}}}_{\textit{indirect effect}}. 
\] The indirect effect can be decomposed into an effort-only effect \[ \frac{\beta_{b}}{\beta_{s}}(1-\gamma(\eta^{*}))\frac{de(\eta(\epsilon))}{d\epsilon}|_{\epsilon=0^{+}}=\theta^{-1}\frac{\beta_{b}}{\beta_{s}}(1-\gamma(\eta^{*}))\frac{\partial}{\partial\epsilon}Cov^{p(\eta^{*})}[\eta(\epsilon),\beta_{s}v]|_{\epsilon=0^{+}}, \] where $$Cov^{p(\eta^{*})}$$ denotes covariance, and a risk shifting effect \[ -\frac{\beta_{b}}{\beta_{s}}\sum_{j\in\Omega}\frac{dp^{j}(\eta(\epsilon))}{d\epsilon}|_{\epsilon=0^{+}}(\eta_{j}^{*}-\gamma(\eta^{*})\beta_{s}v_{j})=-\frac{1}{2}\theta^{-1}\frac{\beta_{b}}{\beta_{s}}\frac{\partial}{\partial\epsilon}V^{p(\eta^{*})}[\eta(\epsilon)-\gamma(\eta^{*})\beta_{s}v]|_{\epsilon=0^{+}}. \] Proof. The results are proven in the proof of Corollary 3. ǁ This corollary offers a different perspective on why the KL divergence cost function leads to debt contracts as the optimal security design. The perturbation argument discussed earlier led to the conclusion that the optimal security should be flat wherever possible. The perturbation was designed to have zero direct effect, and therefore, by Corollary 1, would only change utility to the extent that it changed the variance of the security payoff. Examining the equity, live-or-die, and debt securities shown in the Appendix, Figure A.1, it is clear why the debt security minimizes the variance of the payout, among all limited-liability securities with the same expected value: it is as flat as possible. The proof of Proposition 1 shows both that the variance-minimizing security is a debt contract, and that debt is optimal in the security design problem. The corollary also discusses the role of effort and risk-shifting in the problem. Intuitively, if we perturb the security design to align the seller’s retained tranche with the value of the underlying assets, this induces the seller to exert more effort. 
This extra effort benefits the buyer, assuming that the seller is not the full residual claimant. The special property of the KL divergence is that the correct notion of “alignment” is covariance. Similarly, if we perturb the security design to cause the seller’s retained tranche to vary more, relative to the equity tranche that induces the same effort, we create more opportunities for risk-shifting, reducing the value of the buyer’s security. Again, the special property of the KL divergence is that the variance summarizes this effect. Several of the assumptions in the benchmark model can be relaxed without altering the debt security result of Proposition 1. The lowest possible value, $$v_{0}$$, can be greater than zero. The buyer can be risk-averse, with any increasing, differentiable utility function. As discussed in the Appendix, Section B, the timing of the events and the bargaining power of the agents can be altered without changing the result that debt is optimal. The optimal security described in Proposition 1 has an interesting comparative static. Define the “put option value” of a debt contract as the discounted difference between its maximum payoff $$\bar{v}$$ and its expected value. Proposition 1 states that \begin{equation} P.O.V.=\beta_{b}\bar{v}-\beta_{b}E^{p(\eta^{*})}[s^{*}]=\kappa\theta.\label{eq:POV} \end{equation} (3.2) When the constant $$\theta$$ is large, meaning that it is costly for the seller to change the distribution, the put option will have a high value. Similarly, when the gains from trade, $$\kappa$$, are high, the put option will have a high value. For all distributions $$q$$, a higher put option value translates into a higher “strike” of the option, $$\bar{v}$$, although the exact mapping depends on the distribution $$q$$ and the sample space $$\Omega$$. 
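The put option condition in equation 3.2 can be solved numerically for the face value $$\bar{v}$$. The sketch below is illustrative only. It assumes that the retained tranche is $$\eta_{i}=\beta_{s}(v_{i}-s_{i})$$ and that $$\beta_{b}=(1+\kappa)\beta_{s}$$, both of which are inferred from the surrounding formulas rather than stated in this section, and it uses the closed-form seller choice $$p^{i}\propto q^{i}\exp(\eta_{i}/\theta)$$ implied by the KL first-order condition. The sample space and parameter values are hypothetical.

```python
import math


def kl_choice(eta, q, theta):
    """Seller's choice under the KL cost: p_i proportional to q_i * exp(eta_i / theta)."""
    weights = [qi * math.exp(ei / theta) for qi, ei in zip(q, eta)]
    total = sum(weights)
    return [w / total for w in weights]


def put_option_gap(v_bar, v, q, theta, kappa, beta_s):
    """Left side minus right side of equation 3.2 for a debt contract with face value v_bar."""
    beta_b = (1.0 + kappa) * beta_s                       # assumption: kappa = beta_b / beta_s - 1
    s = [min(vi, v_bar) for vi in v]                      # debt payoff
    eta = [beta_s * (vi - si) for vi, si in zip(v, s)]    # assumption: retained tranche
    p = kl_choice(eta, q, theta)
    expected_s = sum(pi * si for pi, si in zip(p, s))
    return beta_b * (v_bar - expected_s) - kappa * theta


def debt_face_value(v, q, theta, kappa, beta_s, iters=200):
    """Bisection on [0, max(v)]; assumes the gap is negative at 0 and positive at max(v)."""
    lo, hi = 0.0, max(v)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if put_option_gap(mid, v, q, theta, kappa, beta_s) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical sample space with expected asset value one under q.
v = [0.0, 0.5, 1.0, 1.5, 2.0]
q = [0.2, 0.2, 0.2, 0.2, 0.2]
theta, beta_s = 1.0, 0.95

v_bar_low = debt_face_value(v, q, theta, kappa=0.1, beta_s=beta_s)
v_bar_high = debt_face_value(v, q, theta, kappa=0.2, beta_s=beta_s)
```

In this example the face value is interior, and it rises when the gains from trade $$\kappa$$ rise with the other parameters held fixed.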
Restated, when the agents know that the moral hazard is small, or that the gains from trade are large, they will use a large amount of debt, resulting in a riskier debt security. In this section, I have shown that using the KL divergence cost function leads to debt securities as the optimal contract. In the next section, I consider alternative cost functions, applying the intuitions developed in this section. 4. The Non-Parametric Model with Invariant Divergences In this section, I analyse more general classes of divergences as cost functions. First, I will show that among the $$f$$-divergences, the Kullback-Leibler divergence is the only divergence that always results in debt as the optimal security design, allowing for non-monotone security designs, but there are many $$f$$-divergences for which the optimal monotone security design is always a debt security. Second, in the particular case of the $$\alpha$$-divergences, which are a subset of the $$f$$-divergences, I show that the optimal contract is, for some parameter values, a mix of debt and equity. I assume that the cost function is proportional to an $$f$$-divergence (equation 2.4): \[ \psi(p)=\theta D_{f}(p||q), \] with an associated $$f$$ function that is continuous on $$[0,\infty)$$ and twice-differentiable on $$(0,\infty)$$. These divergences are analytically tractable because they are additively separable. That is, the cost of choosing some $$p^{i}$$ is not affected by the value of $$p^{j},\:j\neq i$$, except through the constraint that probability distributions must add up to one. In some cases, such as the Hellinger distance or KL divergence, the seller’s choice of $$p$$ is guaranteed to be interior, but this is not true for all $$f$$-divergences. Among this family of divergences, the KL divergence is special. Proposition 2. 
In the non-parametric model, with an $$f$$-divergence cost function, if the optimal security design is debt for all sample spaces $$\Omega$$ and zero-cost probability distributions $$q$$, then that $$f$$-divergence is the Kullback-Leibler divergence. Proof. See Online Appendix Section 3.2. ǁ The statement of Proposition 2 shows that the KL divergence is special, in the sense that it is the only continuous and twice-differentiable $$f$$-divergence that always results in debt as the optimal security design. The proof uses a perturbation argument, similar to the one in the previous section. Suppose that the solution to the moral hazard problem is interior. The first-order condition in the moral hazard problem, for an arbitrary $$f$$-divergence and some $$i>0$$, is \[ \eta_{i}=\theta(f'(\frac{p^{i}(\eta)}{q^{i}})-f'(\frac{p^{0}(\eta)}{q^{0}})). \] The analogue of the difference of elasticities equation used in the previous section (equation 3.1) is \[ f''(\frac{p^{i}(\eta)}{q^{i}})\frac{p^{i}(\eta)}{q^{i}}\frac{\partial\ln(p^{i}(\eta))}{\partial\eta_{i}}-f'' (\frac{p^{0}(\eta)}{q^{0}})\frac{p^{0}(\eta)}{q^{0}}\frac{\partial\ln(p^{0}(\eta))}{\partial\eta_{i}}=\theta^{-1}. \] For the KL divergence, with $$f(u)=u\ln u-u+1$$, we have $$uf''(u)=1$$, and this equation reduces to the one introduced previously. For any other $$f$$-divergence, these terms are not constant. There is still a perturbation to the retained tranche that changes the probabilities $$p^{i}$$ and $$p^{j}$$, leaving all other probabilities unchanged. Suppose that we increase $$\eta_{i}$$ by $$\frac{\epsilon}{q^{i}}f''(\frac{p^{i}(\eta^{*})}{q^{i}})$$, and decrease $$\eta_{j}$$ by $$\frac{\epsilon}{q^{j}}f''(\frac{p^{j}(\eta^{*})}{q^{j}})$$. Using the same logic described in the previous section, this perturbation increases $$p^{i}$$ by $$\theta^{-1}\epsilon$$ and decreases $$p^{j}$$ by the same amount, leaving all other probabilities unchanged. 
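The claim that this perturbation shifts probability only between states $$i$$ and $$j$$ can be checked numerically in two cases with closed-form seller choices. This is a sketch under stated assumptions: the KL choice $$p^{i}\propto q^{i}\exp(\eta_{i}/\theta)$$, and the $$\chi^{2}$$ choice $$p^{i}=q^{i}(1+(\eta_{i}-E^{q}[\eta])/\theta)$$, which follows from the first-order condition with $$f'(u)=u-1$$ when the solution is interior. The retained tranche, the parameter values, and the finite step `eps` are hypothetical.

```python
import math


def choice_kl(eta, q, theta):
    """KL cost: p_i proportional to q_i * exp(eta_i / theta)."""
    weights = [qi * math.exp(ei / theta) for qi, ei in zip(q, eta)]
    total = sum(weights)
    return [w / total for w in weights]


def choice_chi2(eta, q, theta):
    """Chi-squared cost (f'(u) = u - 1); valid only while every p_i stays positive."""
    mean_eta = sum(qi * ei for qi, ei in zip(q, eta))
    return [qi * (1.0 + (ei - mean_eta) / theta) for qi, ei in zip(q, eta)]


def response(choice, f2, eta, q, theta, i, j, eps):
    """Change in each probability per unit eps after the perturbation described in the text."""
    p = choice(eta, q, theta)
    new_eta = list(eta)
    new_eta[i] += eps / q[i] * f2(p[i] / q[i])   # raise eta_i
    new_eta[j] -= eps / q[j] * f2(p[j] / q[j])   # lower eta_j
    new_p = choice(new_eta, q, theta)
    return [(b - a) / eps for a, b in zip(p, new_p)]


q = [0.2, 0.2, 0.2, 0.2, 0.2]
eta = [0.0, 0.0, 0.0, 0.5, 1.0]   # hypothetical retained tranche
theta, eps, i, j = 2.0, 1e-6, 3, 4

resp_kl = response(choice_kl, lambda u: 1.0 / u, eta, q, theta, i, j, eps)   # f''(u) = 1/u
resp_chi2 = response(choice_chi2, lambda u: 1.0, eta, q, theta, i, j, eps)   # f''(u) = 1
```

In both cases the per-unit response is approximately $$\theta^{-1}$$ in state $$i$$, $$-\theta^{-1}$$ in state $$j$$, and zero elsewhere, up to terms of order `eps`.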
Now suppose that a debt contract is the optimal security design, for an arbitrary $$f$$-divergence, and that there are two states associated with the flat part of the debt contract, $$i$$ and $$j$$. Consider, as before, a perturbation that decreases the value of the security in state $$i$$, while increasing the value of the security in state $$j$$, so that the values of the retained tranche, $$\eta_{i}$$ and $$\eta_{j}$$, change as described in the previous paragraph. Note that, because we have assumed that the states $$i$$ and $$j$$ are associated with the flat part of the debt contract, this perturbation is feasible. By construction, the “indirect effect” (see equation 2.3) of this perturbation is zero. The probability of state $$i$$ increases by $$\theta^{-1}\epsilon$$, while the probability of state $$j$$ decreases by the same amount, and we have assumed that $$s_{i}=s_{j}$$. However, the “direct effect” is not necessarily zero. We have \begin{equation} \frac{\partial U(\eta(\epsilon))}{\partial\epsilon}=\kappa(\frac{p^{j}(\eta^{*})}{q^{j}}f'' (\frac{p^{j}(\eta^{*})}{q^{j}})-\frac{p^{i}(\eta^{*})}{q^{i}}f''(\frac{p^{i}(\eta^{*})}{q^{i}})).\label{eq:f-perturb} \end{equation} (4.1) Of course, if $$uf''(u)$$ is constant, then this effect is also zero (the KL divergence case). However, in general this will not be the case, and either this perturbation or the “reverse” perturbation (with respect to the states $$i$$ and $$j$$) can improve the seller’s utility. The proof of Proposition 2 finishes the argument by constructing sample spaces $$\Omega$$ and zero-cost distributions $$q$$ such that, for debt to always be optimal, $$uf''(u)$$ must be constant for all $$u\in[0,\infty)$$. This result depends crucially on the possibility of non-monotone security designs. I have argued in the introduction that, in the context of securitization, there is no particular reason to think that security designs must be monotone. 
However, in other contexts, following many papers in the security design literature, it may be appropriate to require that security designs result in payoffs that are weakly increasing for both the buyer and the seller. If we impose this assumption, the perturbation logic described above leads to a very different conclusion—that debt securities are optimal as long as $$uf''(u)$$ is weakly decreasing in $$u$$. I will say that a security design is weakly monotone for the buyer if $$v_{j}\geq v_{i}$$ implies that $$s_{j}\geq s_{i}$$. Suppose that $$v_{j}\geq v_{i}$$ and $$s_{j}=s_{i}$$. In this case, $$\eta_{j}\geq\eta_{i}$$, and therefore, by the seller’s first-order condition and the convexity of the $$f$$ function, $$\frac{p^{j}(\eta)}{q^{j}}\geq\frac{p^{i}(\eta)}{q^{i}}.$$ That is, because the seller’s payoff is higher in state $$j$$ than in state $$i$$, she acts to increase the likelihood of state $$j$$ relative to state $$i$$. If $$uf''(u)$$ is weakly decreasing in $$u$$, the perturbation analysed in equation 4.1 (increasing $$s_{j}$$ and decreasing $$s_{i}$$), starting from a debt security design, reduces the seller’s welfare. Because of the requirement that security designs be monotone, the reverse perturbation (decreasing $$s_{j}$$ and increasing $$s_{i}$$) is not feasible. As a result, there is no feasible perturbation that can increase welfare, and debt is optimal. The corollary below summarizes the result: Corollary 2. In the non-parametric model, with an $$f$$-divergence cost function such that $$uf''(u)$$ is weakly decreasing in $$u$$, if security designs are required to be monotone for the buyer, then the optimal security design is debt, selling nothing, or selling everything, for all sample spaces $$\Omega$$ and zero-cost probability distributions $$q$$. Proof. See Online Appendix Section 3.3. ǁ The result of Proposition 2 raises another question: absent monotonicity constraints, what are the optimal security designs with this class of cost functions? 
The logic of the perturbation argument above leads us to conclude that the function $$uf''(u)$$ plays a critical role in determining the shape of the contract. For a particular sub-class of $$f$$-divergences, the $$\alpha$$-divergences, the resulting optimal contracts are easy to characterize. Recall that, for the $$\alpha$$-divergences, \[ f(u)=\frac{4}{1-\alpha^{2}}(1-u^{\frac{1}{2}(1-\alpha)}+\frac{1}{2}(1-\alpha)(u-1)). \] For these divergences, when $$\alpha<-1$$, it is possible that the seller will set $$p^{i}=0$$ for some $$i\in\Omega$$. The proof of Proposition 3 deals with this possibility; in the main text, I will assume that $$p(\eta)$$ is interior in the neighbourhood of the optimal security design. It follows from the iso-elastic nature of these $$f$$-functions that \[ uf''(u)=1-\frac{1+\alpha}{2}f'(u). \] The first-order condition of the moral hazard problem implies that, for any retained tranche $$\eta$$, \[ \frac{p^{j}(\eta)}{q^{j}}f''(\frac{p^{j}(\eta)}{q^{j}})-\frac{p^{i}(\eta)}{q^{i}}f'' (\frac{p^{i}(\eta)}{q^{i}})=\frac{1+\alpha}{2}\theta^{-1}(\eta_{i}-\eta_{j}). \] Consider the same perturbation discussed above: increasing $$\eta_{i}$$ by $$\frac{\epsilon}{q^{i}}f''(\frac{p^{i}(\eta^{*})}{q^{i}})$$, and decreasing $$\eta_{j}$$ by $$\frac{\epsilon}{q^{j}}f''(\frac{p^{j}(\eta^{*})}{q^{j}})$$. Suppose that this is feasible. As discussed above, this will increase $$p^{i}$$ by $$\theta^{-1}\epsilon$$ and decrease $$p^{j}$$ by the same amount. If the security is not flat, the “indirect effect” is non-zero: \[ \beta_{b}\sum_{i,j\in\Omega}s_{j}^{*}\frac{\partial p^{j}(\eta)}{\partial\eta_{i}}|_{\eta=\eta^{*}}\frac{\partial\eta_{i}} {\partial\epsilon}|_{\epsilon=0^{+}}=\theta^{-1}\beta_{b}(s_{i}-s_{j}). 
\] Similarly, as argued above, the “direct effect” is non-zero: \begin{align*} -\kappa\sum_{i\in\Omega}p^{i}(\eta^{*})\frac{\partial\eta_{i}}{\partial\epsilon}|_{\epsilon=0^{+}} & =\kappa(\frac{p^{j}(\eta^{*})}{q^{j}}f''(\frac{p^{j}(\eta^{*})}{q^{j}})-\frac{p^{i}(\eta^{*})}{q^{i}}f''(\frac{p^{i}(\eta^{*})}{q^{i}}))\\ & =\kappa\frac{1+\alpha}{2}\theta^{-1}(\eta_{i}-\eta_{j}). \end{align*} It follows that if \[ \frac{\beta_{s}(s_{i}-s_{j})}{\eta_{i}-\eta_{j}}=-\frac{\kappa}{1+\kappa}\frac{1+\alpha}{2}, \] the indirect and direct effects will cancel, and this perturbation will not change the utility in the security design problem. For the optimal security, for all $$i,j\in\Omega$$ such that the limited liability constraints do not bind, the relative slopes of the security and retained tranche are the same. For the $$\alpha$$-divergences, the optimal contracts will be straight lines wherever the limited liability constraints do not bind. When $$\alpha=-1$$ (the KL divergence case), we recover the result that the optimal contract is flat when the constraints do not bind. For $$\alpha<-1$$, the required constant is positive, which implies that both the security design and the retained tranche are upward sloping (in the region where the limited liability constraints do not bind). When $$\alpha>-1$$, the required constant is negative, implying a downward sloping (and therefore non-monotone) security design. These are the $$\alpha$$-divergences for which $$uf''(u)$$ is decreasing in $$u$$. If the security design was required to be monotone, Corollary 2 would apply, and debt (or selling everything/nothing) would be optimal. The proposition below summarizes these ideas, describing the optimal contract for all $$\alpha$$. Proposition 3. Define $$s_{\alpha,i}$$ as the optimal security design for the problem with an $$\alpha$$-divergence cost function. 
If $$\alpha<1+\frac{2}{\kappa}$$, there exists a constant $$\bar{v}\geq0$$ such that \[ s_{\alpha,i}=\begin{cases} v_{i} & if\;v_{i}<\bar{v}\\ \max[-\frac{\kappa(1+\alpha)}{2+\kappa(1-\alpha)}(v_{i}-\bar{v})+\bar{v},0] & if\;v_{i}\geq\bar{v}. \end{cases} \] If $$\alpha\geq1+\frac{2}{\kappa}$$, the optimal security design is the “live-or-die” contract, \[ s_{\alpha,i}=\begin{cases} v_{i} & if\;v_{i}<\bar{v}\\ 0 & if\;v_{i}>\bar{v}. \end{cases} \] When $$\alpha<-3$$, $$\bar{v}$$ is strictly greater than zero. In all of these cases, if the highest possible asset value is sufficiently large ($$v_{N}>\sum_{i}q^{i}v_{i}+\frac{\kappa}{\beta_{b}}\theta$$), then $$\bar{v}<v_{N}$$. Proof. See Online Appendix Section 3.10. ǁ The optimal security design can be thought of as a mixture of debt and equity (at least when $$\alpha\leq-1$$), whose slope is determined by the gains from trade parameter $$\kappa$$ and the parameter $$\alpha$$. For any $$\alpha>-1$$, the optimal contract is non-monotonic, first increasing up to $$\bar{v}$$, then decreasing, and finally paying the buyer zero for the highest asset values. In Figure A.2, in the Appendix, I illustrate the different optimal security designs associated with varying values of $$\alpha$$, holding $$\bar{v}$$ fixed. In Corollary 3 below, I decompose the effects of any perturbation into direct and indirect effects, and then further decompose the indirect effects into effort and risk-shifting components. As in the KL divergence case (Corollary 1), expectations, variances, and covariances appear in these expressions. However, the variances and covariances are taken under a probability distribution $$\hat{p}$$, which is a sort of weighted average of the probability distributions $$p(\eta^{*})$$ and $$q$$, for which the weights depend on the parameter $$\alpha$$. 
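The payoff schedule in Proposition 3 can be written as a short function. This is a sketch: the function names are hypothetical, $$\bar{v}$$ is taken as given rather than solved for, and the live-or-die payoff at exactly $$v_{i}=\bar{v}$$, which the proposition leaves unspecified, is set to zero by convention here.

```python
def equity_slope(alpha, kappa):
    """Slope of the security above v_bar: -kappa(1+alpha) / (2 + kappa(1-alpha))."""
    return -kappa * (1.0 + alpha) / (2.0 + kappa * (1.0 - alpha))


def optimal_payoff(v_i, v_bar, alpha, kappa):
    """Security payoff s_{alpha,i} from Proposition 3 for a given v_bar."""
    if v_i < v_bar:
        return v_i                      # buyer receives the whole asset below v_bar
    if alpha >= 1.0 + 2.0 / kappa:
        return 0.0                      # live-or-die (value at v_i == v_bar is a convention)
    return max(equity_slope(alpha, kappa) * (v_i - v_bar) + v_bar, 0.0)


# Hypothetical parameter values.
debt_payoff = optimal_payoff(1.5, 1.0, alpha=-1.0, kappa=0.01)     # KL case: flat above v_bar
chi2_slope = equity_slope(alpha=-3.0, kappa=0.01)                  # small positive slope
falling = [optimal_payoff(x, 1.0, alpha=1.0, kappa=0.5) for x in (1.0, 2.0, 4.0)]
live_or_die = optimal_payoff(2.0, 1.0, alpha=5.0, kappa=0.5)
```

With $$\alpha=-1$$ the schedule is standard debt, with $$\alpha<-1$$ it adds a small equity claim above $$\bar{v}$$, and with $$\alpha>-1$$ the payoff declines above $$\bar{v}$$ until it reaches zero.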
Because the optimal security designs are monotone for the seller, when $$\alpha>-1$$, $$\hat{p}(p(\eta^{*}))$$ places more mass on the best states of the world than $$p(\eta^{*})$$. In this case, the indirect effect is larger relative to the direct effect, when compared with $$\alpha=-1$$ (the KL divergence case). Put another way, the moral hazard concerns are larger relative to the gains from trade. As a result, the optimal security design gives less to the buyer than a debt contract in the best states of the world. When $$\alpha<-1$$, the reverse is true—in the best states of the world, $$p(\eta^{*})>\hat{p}(p(\eta^{*}))$$, and the direct effect is larger relative to the indirect effect, when compared with $$\alpha=-1$$. In this case, the gains from trade are larger relative to the moral hazard concerns in the best states, and the optimal security design gives more cashflows to the buyer than a debt contract. That is, the parameter $$\alpha$$ influences the balance of concern about gains from trade and moral hazard across the various states. This effect occurs because the parameter $$\alpha$$ controls the way the curvature of the cost function changes as the seller moves $$p(\eta)$$ away from $$q$$. Recall that, for all $$f$$-divergences, including the $$\alpha$$-divergences, we normalized the $$f$$ function so that $$f''(1)=1$$. For the $$\alpha$$-divergences, we have \begin{equation} f_{\alpha}'''(1)=-\frac{1}{2}(\alpha+3).\label{eq:alpha-div-third-order} \end{equation} (4.2) When $$\alpha$$ is large, the cost function becomes less curved as $$p^{i}$$ becomes large relative to $$q^{i}$$, and more curved as $$p^{i}$$ becomes small relative to $$q^{i}$$. In the best states of the world, the seller increases $$p^{i}$$ relative to $$q^{i}$$ under the optimal contract. Therefore, if a perturbation increased the variance of the security design in the best states of the world, the seller would easily be able to alter her actions in response. 
In contrast, when $$\alpha$$ is small, the increasing curvature of the cost function in the best states of the world prevents the seller from responding to perturbations that affect those states. Corollary 3. Define \[ \hat{\theta}(p)=\theta(\sum_{j\in\Omega}(p^{j})^{\frac{1}{2}(\alpha+3)}(q^{j})^{-\frac{1}{2}(\alpha+1)})^{-1}, \] \[ \hat{p}^{i}(p)=\frac{\hat{\theta}(p)}{\theta}(p^{i})^{\frac{1}{2}(\alpha+3)}(q^{i})^{-\frac{1}{2}(\alpha+1)}. \] With an $$\alpha$$-divergence cost function, the effect of any perturbation can be written as \[ \frac{\partial U(\eta(\epsilon))}{\partial\epsilon}|_{\epsilon=0^{+}}=\underbrace{\kappa\frac{\partial}{\partial\epsilon}E^{p(\eta^{*})}[\beta_{s}s(\epsilon)]|_{\epsilon=0^{+}}}_{\textit{direct effect}}-\underbrace{\frac{1}{2}\frac{\beta_{b}}{\beta_{s}}\hat{\theta}(p(\eta^{*}))^{-1}\frac{\partial}{\partial\epsilon}V^{\hat{p}(p(\eta^{*}))}[\beta_{s}s(\epsilon)]|_{\epsilon=0^{+}}}_{\textit{indirect effect}}. \] If the solution to the seller’s moral hazard problem is interior, the indirect effect can be decomposed into an effort-only effect and a risk shifting effect, \begin{align*} \frac{\beta_{b}}{\beta_{s}}(1-\gamma(\eta^{*}))\frac{de(\eta(\epsilon))}{d\epsilon}|_{\epsilon=0^{+}} & =\hat{\theta}(p(\eta^{*}))^{-1}\frac{\beta_{b}}{\beta_{s}}(1-\gamma(\eta^{*}))\frac{\partial}{\partial\epsilon}Cov^{\hat{p}(p(\eta^{*}))}[\eta(\epsilon),\beta_{s}v]|_{\epsilon=0^{+}},\\ -\frac{\beta_{b}}{\beta_{s}}\sum_{j\in\Omega}\frac{dp^{j}(\eta(\epsilon))}{d\epsilon}|_{\epsilon=0^{+}}(\eta_{j}^{*}-\gamma(\eta^{*})\beta_{s}v_{j}) & =-\frac{1}{2}\hat{\theta}(p(\eta^{*}))^{-1}\frac{\beta_{b}}{\beta_{s}}\frac{\partial}{\partial\epsilon}V^{\hat{p}(p(\eta^{*}))}[\eta(\epsilon)-\gamma(\eta^{*})\beta_{s}v]|_{\epsilon=0^{+}}. \end{align*} Proof. See Online Appendix Section 3.6. ǁ This decomposition provides an additional perspective on why contracts with low values of $$\alpha$$ end up “equity-like” in the best states of the world. 
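The definitions of $$\hat{\theta}$$ and $$\hat{p}$$ in Corollary 3 can be evaluated directly to see how the weighting shifts with $$\alpha$$. The sketch below uses a hypothetical four-state example in which $$p^{i}/q^{i}$$ is increasing in the state index, standing in for a seller who has shifted probability towards the best states. The function name is an assumption.

```python
def tilted_distribution(p, q, alpha, theta):
    """Return (theta_hat, p_hat) as defined in Corollary 3."""
    weights = [
        pi ** (0.5 * (alpha + 3.0)) * qi ** (-0.5 * (alpha + 1.0))
        for pi, qi in zip(p, q)
    ]
    total = sum(weights)
    theta_hat = theta / total
    p_hat = [w / total for w in weights]   # equals (theta_hat / theta) * weight_i
    return theta_hat, p_hat


# Hypothetical distributions; the last state is the best state.
q = [0.25, 0.25, 0.25, 0.25]
p = [0.1, 0.2, 0.3, 0.4]
theta = 1.0

theta_kl, p_hat_kl = tilted_distribution(p, q, alpha=-1.0, theta=theta)
theta_hi, p_hat_hi = tilted_distribution(p, q, alpha=1.0, theta=theta)
theta_lo, p_hat_lo = tilted_distribution(p, q, alpha=-3.0, theta=theta)
```

At $$\alpha=-1$$ the construction returns $$p$$ itself and $$\hat{\theta}=\theta$$, matching Corollary 1. In this example a higher $$\alpha$$ places more weight on the best state than $$p$$ does, and $$\alpha=-3$$ places less.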
For these cost functions, $$\hat{p}(p(\eta^{*}))$$ places low weight on the best states of the world. As a result, the increased effort that results from an alignment of the seller’s incentives and the asset value in those states (the covariance term in Corollary 3) is small. The risk-shifting that occurs because the seller’s retained tranche does not resemble an equity claim (the variance term in Corollary 3) in those states is also small. It is therefore efficient to give more of the cashflows to the buyer in the best states than it would be under the KL divergence cost function, because the gains from trade effects are larger than the moral hazard effects, and this results in an increasing, equity-like security design. In the next section, I will show that these results (the optimality of a mixture of equity and debt for the $$\alpha$$-divergences and the notion of a mean-variance tradeoff for the security design problem) apply in an approximate sense to a much larger class of cost functions. 5. Approximations In this section, I will discuss “approximately optimal” security designs. The approximation is motivated by the following observation: for the optimal security designs with an $$\alpha$$-divergence cost function (Proposition 3), the slope of the security design depends on the gains from trade $$\kappa$$ and the parameter $$\alpha$$. In many applications, the percentage gains from trade might be quite small. For example, in the context of collateralized loan obligations, Nadauld and Weisbach (2012) estimate the cost of capital advantage due to securitization at $$17$$ basis points per year. Assuming a five-year maturity, this would imply that the buyer’s valuation of the security is roughly 1% higher than the seller’s valuation of the security. 
This finding accords with intuition: there are many economic forces (the availability of substitute securities that both the buyer and seller can trade, entry into the securitization business) that act to diminish differences in valuations. For example, suppose the cost function is the $$\chi^{2}$$-divergence ($$\alpha=-3$$), and the gains from trade are 1% ($$\kappa=0.01$$). In this case, the slope of the “equity portion” of the optimal security is \[ -\frac{\kappa(1+\alpha)}{2+\kappa(1-\alpha)}=-\frac{0.01\cdot(-2)}{2+0.01\cdot 4}=\frac{0.02}{2+0.04}\approx 1\%. \] The optimal contract is debt plus a roughly 1% equity claim for the buyer; intuitively, a standard debt contract cannot be substantially worse, from a welfare perspective. This argument used a specific cost function, but the point holds generally: unless the curvature of the cost function changes rapidly ($$\alpha$$ is very large or small), the optimal security designs will resemble debt. This argument leads to a second observation: that in models with small gains from trade, which nevertheless result in a large quantity of trade, the moral hazard must also, in some sense, be small. Recall that we normalized the problem so that the expected value of the assets, if the seller retains everything, is one. Suppose that the moral hazard is large (e.g. that the expected value of the assets, if the seller retains nothing, is one-half). If the gains from trade are 1%, then no trade is much better than selling everything. Inevitably, the optimal security design in this case will be close to selling nothing. In the example of securitization, this is counterfactual; a substantial portion of the value of the underlying assets is sold in most securitizations. This leads us to the conclusion that the moral hazard must also be small, in the sense that the difference in the seller’s effort between when she sells everything and when she sells nothing must be of similar magnitude to the gains from trade. 
In the case of securitization, this is consistent with empirical estimates (see the Appendix, Section C). The smallness of the moral hazard means that poorly designed contracts cannot destroy entirely the value of the assets; however, they can destroy entirely the gains from trade. It does not mean that moral hazard is unimportant. In the calibration for mortgage securitization in the Appendix, Section C, I find that using the “right” security design can substantially increase the profitability of securitization. In this section, I will show that, depending on the relative size of the moral hazard and gains from trade, no trade, trading everything, and many securities in between are consistent with both the moral hazard and gains from trade being small. In other words, the moral hazard can be small relative to the notional (asset) value being traded, but large relative to the profitability of trade, and the latter comparison will determine whether moral hazard impedes trade. Formally, the approximations I consider are first- and second-order expansions of the utility function in the security design problem. I approximate the utility of using an arbitrary security design $$s$$, relative to selling nothing, to first or second order in $$\theta^{-1}$$ and $$\kappa$$. When $$\theta^{-1}$$ is small, and therefore $$\theta$$ is large, it is difficult for the seller to change $$p$$. When $$\kappa$$ is small, the gains from trade are low. I take this approximation around the limit point $$\theta^{-1}=\kappa=0$$. This approximation applies when $$\theta^{-1}$$ and $$\kappa$$ are small but positive, consistent with the arguments above. The limit point itself is degenerate; because there is no moral hazard and no gains from trade, the security design does not matter. However, near the limit point (where the approximation applies), this is not the case; some security designs are better than other security designs. 
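To make the role of $$\theta$$ concrete, the following Python sketch works through a small discrete example. It assumes, purely for illustration, that the cost is $$\theta$$ times the KL divergence, in which case the seller's optimal distribution is the standard exponential tilt $$p^{i}\propto q^{i}\exp(\theta^{-1}\eta_{i})$$. It compares that distribution with the first-order expansion $$q^{i}+\theta^{-1}q^{i}(\eta_{i}-\sum_{j}q^{j}\eta_{j})$$ that appears in Proposition 4. The four-state grid, the uniform $$q$$, the face value, the choice $$\beta_{s}=1$$, and all function names are hypothetical and are not taken from the article's calibration.

```python
import math

def exact_tilt(q, eta, theta):
    """Maximizer of sum_i p_i*eta_i - theta*KL(p||q): p_i proportional to q_i*exp(eta_i/theta)."""
    weights = [qi * math.exp(ei / theta) for qi, ei in zip(q, eta)]
    total = sum(weights)
    return [w / total for w in weights]

def first_order_p(q, eta, theta):
    """First-order expansion in 1/theta: q_i + q_i*(eta_i - E^q[eta])/theta."""
    mean_eta = sum(qi * ei for qi, ei in zip(q, eta))
    return [qi + qi * (ei - mean_eta) / theta for qi, ei in zip(q, eta)]

def max_gap(a, b):
    """Largest absolute difference between two probability vectors."""
    return max(abs(x - y) for x, y in zip(a, b))

# Hypothetical example: four equally likely asset values with mean one.
values = [0.25, 0.75, 1.25, 1.75]
q = [0.25, 0.25, 0.25, 0.25]
face_value = 1.0  # assumed debt level sold to the buyer
# Retained tranche with beta_s = 1 (an assumption): asset value minus debt payoff.
eta = [v - min(v, face_value) for v in values]

for theta in (5.0, 10.0, 20.0, 40.0):
    p = exact_tilt(q, eta, theta)
    # Columns: theta, distance of p from q, error of the first-order expansion.
    print(theta, max_gap(p, q), max_gap(p, first_order_p(q, eta, theta)))
```

In this example the distance between $$p$$ and $$q$$ shrinks roughly in proportion to $$\theta^{-1}$$, while the error of the first-order expansion shrinks roughly in proportion to $$\theta^{-2}$$, which is the sense in which a large $$\theta$$ makes it difficult for the seller to change $$p$$.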
The relevance of the approximation will depend on whether $$\theta^{-1}$$ and $$\kappa$$ are small enough, relative to the higher order terms of the utility function, for those terms to be negligible. This is a question that can only be answered in the context of a particular application. In the Appendix, Section C, I discuss a calibration of the model relevant to mortgage origination, for which the approximation is accurate. The results of this section apply to all invariant divergences, a class which includes all of the $$f$$-divergences, and therefore the KL divergence and the $$\alpha$$-divergences. This class also includes divergences, such as the Chernoff and Bhattacharyya distances, that are not additively separable. Using the approximation described above, I show that debt securities achieve, up to first order, the same utility as the optimal security design, for any invariant divergence cost function. Moreover, only debt contracts have this property, and it arises through the mean-variance intuition discussed in the previous section. I also show that the optimal contracts corresponding to the $$\alpha$$-divergences achieve, up to second order, the same utility as the optimal security design, for any invariant divergence cost function. This also follows from the mean-variance intuition discussed previously. To further develop the intuition behind this result, consider the $$f$$-divergences. For any $$f$$-divergence, we can approximate the divergence to third order around $$p=q$$ as \[ \sum_{i\in\Omega}q^{i}f(\frac{p^{i}}{q^{i}})\approx\sum_{i\in\Omega}q^{i}(\frac{1}{2}(\frac{p^{i}}{q^{i}}-1)^{2}-\frac{1}{12}(\alpha+3)(\frac{p^{i}}{q^{i}}-1)^{3}), \] where we have defined $$\alpha$$ to satisfy \[ f'''(1)=-\frac{1}{2}(\alpha+3). 
\] This definition of $$\alpha$$ extends the relationship between the third derivative of the $$f$$ functions and the parameter $$\alpha$$ of the $$\alpha$$-divergences (equation 4.2) to a definition of the parameter $$\alpha$$ for all $$f$$-divergences. The Taylor expansion shows that, up to third order, any $$f$$-divergence can be approximated by an $$\alpha$$-divergence. Additionally, up to second order, the $$\alpha$$ parameter plays no role, and all $$f$$-divergences, including the KL divergence, are identical. I will show that, for every $$f$$-divergence, the optimal contract associated with that $$f$$-divergence and the optimal contract for the KL divergence (debt) achieve, up to second order, the same utility in the security design problem. Moreover, the optimal contract associated with that $$f$$-divergence and the optimal contract for an $$\alpha$$-divergence (with $$\alpha$$ defined as above) achieve the same utility up to third order. A different way to view the same results is through the lens of the perturbation argument employed in the previous section. The indirect effect of the perturbation is governed by \[ \frac{p^{j}(\eta^{*})}{q^{j}}f''(\frac{p^{j}(\eta^{*})}{q^{j}})-\frac{p^{i}(\eta^{*})}{q^{i}}f''(\frac{p^{i}(\eta^{*})}{q^{i}}). \] Using the first-order condition in the moral hazard problem, one can observe that as $$\theta$$ becomes large ($$\theta^{-1}$$ small), holding the retained tranche $$\eta$$ fixed, $$p(\eta)$$ converges to $$q$$. Intuitively, as it becomes increasingly costly for the seller to keep $$p$$ away from $$q$$, she responds by moving $$p$$ closer to $$q$$. In the limit, $$p$$ reaches $$q$$, and the indirect effect of the utility perturbation is zero. With the KL divergence, the indirect effect is always zero. When the indirect effect is zero, the perturbation argument described in Section 2 applies, and the optimal contract is debt. The proposition and corollary below make these arguments formal. 
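The third-order expansion of an $$f$$-divergence discussed above can be checked numerically. The Python sketch below is an illustration under stated assumptions: it uses the generator $$f(u)=u\ln u-u+1$$ for the KL divergence and $$f(u)=\frac{1}{2}(u-1)^{2}$$ for the $$\chi^{2}$$-divergence (both normalized so that $$f''(1)=1$$), recovers $$\alpha$$ from $$f'''(1)=-\frac{1}{2}(\alpha+3)$$ by a finite difference, and compares the exact KL divergence with its second-order and third-order approximations. The three-state $$q$$, the perturbation direction, and the function names are hypothetical.

```python
import math

def f_kl(u):
    """KL generator with f(1) = f'(1) = 0 and f''(1) = 1."""
    return u * math.log(u) - u + 1.0

def f_chi2(u):
    """Chi-squared generator with f''(1) = 1 and f'''(1) = 0."""
    return 0.5 * (u - 1.0) ** 2

def alpha_from_f(f, h=1e-2):
    """Solve f'''(1) = -(alpha + 3)/2 for alpha, using a central difference for f'''(1)."""
    f3 = (f(1 + 2 * h) - 2 * f(1 + h) + 2 * f(1 - h) - f(1 - 2 * h)) / (2 * h ** 3)
    return -2.0 * f3 - 3.0

def f_divergence(p, q, f):
    """Exact f-divergence sum_i q_i * f(p_i / q_i)."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def taylor_divergence(p, q, alpha, order=3):
    """Second- or third-order expansion of the divergence around p = q."""
    total = 0.0
    for pi, qi in zip(p, q):
        x = pi / qi - 1.0
        term = 0.5 * x ** 2
        if order >= 3:
            term -= (alpha + 3.0) / 12.0 * x ** 3
        total += qi * term
    return total

q = [0.2, 0.3, 0.5]
direction = [0.5, -1.0, 0.5]  # sums to zero, so q + t*direction stays a probability vector

def kl_errors(t):
    """Approximation errors (second order, third order) for the KL divergence, alpha = -1."""
    p = [qi + t * di for qi, di in zip(q, direction)]
    exact = f_divergence(p, q, f_kl)
    return (abs(exact - taylor_divergence(p, q, -1.0, order=2)),
            abs(exact - taylor_divergence(p, q, -1.0, order=3)))

print("alpha for KL:", alpha_from_f(f_kl))
print("alpha for chi-squared:", alpha_from_f(f_chi2))
print("KL errors at t = 0.01:", kl_errors(0.01))
```

The recovered values are close to $$\alpha=-1$$ for the KL divergence and $$\alpha=-3$$ for the $$\chi^{2}$$-divergence, and the third-order error is much smaller than the second-order error, in which $$\alpha$$ plays no role.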
The argument above, including a definition of the parameter $$\alpha$$, can be extended to all invariant divergences, not just additively separable ones, using the results of Čencov (2000) (see Online Appendix, Lemma 3). Up to third order, all invariant divergences with continuous third derivatives are equivalent to an $$\alpha$$-divergence. The proposition below formalizes the approximation results. I consider a third-order asymptotic expansion of the security design problem utility, $$U(s;\theta^{-1},\kappa)$$, around the point $$\theta^{-1}=\kappa=0$$, holding $$\beta_{s}$$ fixed as $$\kappa$$ changes. As in previous sections, the proposition applies to small perturbations of the security design, in the neighbourhood of the optimal security design.15 Proposition 4. In the non-parametric model, with a smooth, convex, invariant divergence cost function, the effects of any security design perturbation (equation 2.3) are, up to second order, \begin{eqnarray*} &&\frac{\partial U(\eta(\epsilon))}{\partial\epsilon}|_{\epsilon=0^{+}}\nonumber\\ &&\qquad= \underbrace{\kappa\frac{\partial}{\partial\epsilon}E^{p(\eta^{*})}[\beta_{s}s(\epsilon)]|_{\epsilon=0^{+}}}_{\textit{direct effect}}-\underbrace{\frac{1}{2}(1+\kappa)\theta^{-1}\frac{\partial}{\partial\epsilon}V^{\tilde{p}(p(\eta^{*}))}[\beta_{s}s(\epsilon)]|_{\epsilon=0^{+}}}_{\textit{indirect effect}}+O(\theta^{-3}+\kappa\theta^{-2}), \end{eqnarray*} where \[ p^{i}(\eta)=q^{i}+\theta^{-1}q^{i}\cdot(\eta_{i}-\sum_{j\in\Omega}q^{j}\eta_{j})+O(\theta^{-2}) \] and \[ \tilde{p}^{i}(p(\eta))=q^{i}+(\frac{3+\alpha}{2})(p^{i}(\eta)-q^{i}). \] To first order, \[ \frac{\partial U(\eta(\epsilon))}{\partial\epsilon}|_{\epsilon=0^{+}}=\underbrace{\kappa\frac{\partial}{\partial\epsilon}E^{q}[\beta_{s}s(\epsilon)]|_{\epsilon=0^{+}}}_{\textit{direct effect}}-\underbrace{\frac{1}{2}\theta^{-1}\frac{\partial}{\partial\epsilon}V^{q}[\beta_{s}s(\epsilon)]|_{\epsilon=0^{+}}}_{\textit{indirect effect}}+O(\theta^{-2}+\kappa\theta^{-1}). 
\] Proof. See Online Appendix Section 3.9. ǁ The accuracy of the approximation that both the moral hazard and gains from trade are small will vary by application. The generality of Proposition 4, which holds for all sample spaces, zero-cost distributions, and invariant divergences, suggests that as long as the moral hazard is not too large, the agents can neglect the details of the cost function. The first-order and second-order results of Proposition 4 are reminiscent of the perturbation results (Corollary 3) described in the previous sections. In both cases, the direct effect is the change in the expected value under a fixed, possibly endogenous, probability distribution, and the indirect effect is the change in the variance under another fixed, endogenous probability distribution. To first order, and to second order when $$\alpha=-1$$, the two probability distributions are the same. This was also the case under the KL divergence, and as a result, debt securities are always first-order optimal, and second-order optimal when $$\alpha=-1$$. When $$\alpha\neq-1$$, the probability distributions are different, as in the general case of $$\alpha$$-divergences. In this case, the optimal security design for that $$\alpha$$-divergence will be the second-order optimal security design. Corollary 4. Under the assumptions of Proposition 3, there exists a debt security, $$s_{debt}$$, for which the difference between the utility achieved by $$s_{debt}$$ and the optimal security $$s^{*}$$ is second order: \[ U(s^{*};\theta^{-1},\kappa)-U(s_{debt};\theta^{-1},\kappa)=O(\theta^{-2}+\kappa\theta^{-1}). \] Under those same assumptions, there exists a security, $$s_{\alpha}$$, that is the optimal security design for an $$\alpha$$-divergence cost function (Proposition 3), for which the difference between the utility achieved by $$s_{\alpha}$$ and $$s^{*}$$ is third order: \[ U(s^{*};\theta^{-1},\kappa)-U(s_{\alpha};\theta^{-1},\kappa)=O(\theta^{-3}+\kappa\theta^{-2}). \] Proof. 
See Online Appendix Section 3.10. ǁ The results for first-order and second-order optimal security designs can be summarized as a type of “pecking order” theory (when $$\alpha\geq-1$$). When the moral hazard and gains from trade are small, the agents can use debt contracts. As the stakes grow larger, so that both the moral hazard and gains from trade are bigger concerns, the agents can use a mix of debt and equity. For very large stakes, the security design will depend on the precise nature of the moral hazard problem. The result of Corollary 4 shows that when the gains from trade and moral hazard are small, but not zero, debt is approximately optimal in a way that other security designs are not. In the Appendix, Figure A.3, I illustrate this idea. I assume an $$\alpha$$-divergence cost function, with $$\alpha=-7$$, which results in an optimal contract that is a mixture of debt and equity. I plot the utility of this optimal contract, as well as the best debt contract and best equity contract, relative to selling everything, for different values of $$\theta$$, with $$\kappa=\bar{\kappa}\theta^{-1}$$. As $$\theta$$ becomes large, all security designs converge to the same utility. For intermediate values of $$\theta$$, the best debt contract achieves nearly the same utility as the optimal contract, which is what the first-order approximation results show. For low values of $$\theta$$, the gap between the optimal debt contract and optimal contract grows. It is important to emphasize that the securities described in Corollary 4 are not degenerate; the debt security that is first-order optimal will not, in general, be selling everything or selling nothing. The level of the debt will be determined by the probability distribution $$q$$ and the product of $$\kappa$$ and $$\theta$$, as described in Proposition 1. The approximation I have employed assumes that $$\kappa$$ is small and $$\theta$$ is large, but makes no assumption about their product. 
If the gains from trade are large relative to the moral hazard ($$\kappa\theta$$ large), the level of the debt will be high. If the moral hazard is large relative to the gains from trade ($$\kappa\theta$$ small), the level of the debt will be small. As in the previous sections, we can decompose the “indirect effects” of changing the security design, which are captured by the variance term in the mean-variance tradeoff described in Proposition 4, into effort and risk-shifting components, as described by Lemma 1. Corollary 5. Under the assumptions of Proposition 3, the indirect effect can be decomposed into an effort-only effect and a risk shifting effect, \begin{align*} \frac{\beta_{b}}{\beta_{s}}(1-\gamma(\eta^{*}))\frac{de(\eta(\epsilon))}{d\epsilon}|_{\epsilon=0^{+}} & =\theta^{-1}(1+\kappa)(1-\gamma(\eta^{*}))\frac{\partial}{\partial\epsilon}Cov^{\tilde{p}(p(\eta^{*}))}(\eta(\epsilon),\beta_{s}v)|_{\epsilon=0^{+}}\\ & +O(\theta^{-3}+\kappa\theta^{-2}), \end{align*} \begin{align*} -\frac{\beta_{b}}{\beta_{s}}\sum_{j\in\Omega}\frac{dp^{j}(\eta(\epsilon))}{d\epsilon}|_{\epsilon=0^{+}}(\eta_{j}^{*}-\gamma(\eta^{*})\beta_{s}v_{j}) & =-\frac{1}{2}\theta^{-1}(1+\kappa)\frac{\partial}{\partial\epsilon}V^{\tilde{p}(p(\eta^{*}))}[\eta(\epsilon)-\gamma(\eta^{*})\beta_{s}v]|_{\epsilon=0^{+}}\\ & +O(\theta^{-3}+\kappa\theta^{-2}). \end{align*} Proof. The corollary follows from Proposition 4 and the proof of Corollary 3. ǁ The intuition discussed in the previous section holds. To first order, the effort and risk-shifting effects are the covariance and variance under the probability distribution $$q$$. To second order, the relevant probability distribution is distorted, in a direction that depends on whether $$\alpha$$ is greater than or less than negative one. The exact and approximate results of the last two sections apply to non-parametric models, in which the seller can choose any distribution. 
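The dependence of the debt level on the product $$\kappa\theta$$ can be illustrated with a short Python sketch. As an assumption made for this illustration, the sketch integrates the first-order perturbation formula of Proposition 4 from the no-trade design and scores a security $$s$$ by $$\kappa E^{q}[\beta_{s}s]-\frac{1}{2}\theta^{-1}V^{q}[\beta_{s}s]$$, with $$\beta_{s}=1$$. The uniform grid of asset values on $$[0,2]$$, the parameter values, and the function names are hypothetical.

```python
def mean_var(payoffs, probs):
    """Mean and variance of a discrete payoff vector under the distribution probs."""
    mean = sum(p * x for p, x in zip(probs, payoffs))
    var = sum(p * (x - mean) ** 2 for p, x in zip(probs, payoffs))
    return mean, var

def first_order_utility(security, probs, kappa, theta):
    """Assumed first-order mean-variance score: kappa * mean - variance / (2 * theta)."""
    mean, var = mean_var(security, probs)
    return kappa * mean - 0.5 * var / theta

def debt(values, face):
    """Debt payoff min(v, face) in every state."""
    return [min(v, face) for v in values]

def best_debt_level(values, probs, kappa, theta):
    """Grid search over face values for the debt contract with the highest score."""
    return max(values, key=lambda face: first_order_utility(
        debt(values, face), probs, kappa, theta))

# Hypothetical zero-cost distribution q: uniform on a grid over [0, 2], mean one.
n = 200
values = [2.0 * i / n for i in range(n + 1)]
probs = [1.0 / (n + 1)] * (n + 1)

for kappa, theta in ((0.01, 10.0), (0.01, 25.0), (0.01, 60.0)):
    print("kappa*theta =", kappa * theta,
          "best face value =", best_debt_level(values, probs, kappa, theta))

# Debt versus an equity share with the same expected value under q.
debt_payoff = debt(values, 1.0)
debt_mean, debt_var = mean_var(debt_payoff, probs)
asset_mean, _ = mean_var(values, probs)
equity_payoff = [debt_mean / asset_mean * v for v in values]
_, equity_var = mean_var(equity_payoff, probs)
print("variance of debt:", debt_var, "variance of equal-mean equity:", equity_var)
```

For a uniform distribution on $$[0,2]$$, the first-order condition of this score gives a face value of approximately $$2\sqrt{\kappa\theta}$$, so the chosen debt level rises with $$\kappa\theta$$. The last lines show that, in this example, debt has lower variance than an equity share with the same expected value.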
In the Online Appendix, Sections 1 and 2, I analyse parametric models using similar methods. In the next two sections of the article, I will discuss continuous time models of effort. I will show that these models are essentially equivalent to the non-parametric models analysed thus far. As a result, the optimality of debt and the intuitions about mean-variance tradeoffs apply to these models as well. These sections can also be thought of as providing a micro-foundation for the static models discussed thus far. 6. Dynamic Moral Hazard In this section, I will analyse a continuous time effort problem. This problem is closely connected to the static models discussed previously. The role of this section is to explain how an agent could “choose a distribution”, and show that the mean-variance intuition and optimality of debt discussed previously apply in dynamic models. I will study models in which the seller controls the drift of a Brownian motion. The contracting models I discuss are similar to those found in Holmström and Milgrom (1987), Schaettler and Sung (1993), and DeMarzo and Sannikov (2006), among others. The models can be thought of as the continuous time limit of repeated effort models,16 in which the seller has an opportunity each period to improve the value of the asset. Two recent papers are particularly relevant. The models I discuss are a special case of Cvitanić et al. (2009). I build on the results of Bierkens and Kappen (2014), who study a single-agent control problem (e.g. the seller’s moral hazard problem) with quadratic effort costs, and show that it is equivalent to a relative entropy minimization problem. Relative to these papers, I make two contributions. First, I show that the entire class of models studied by Cvitanić et al. (2009) can be rewritten as a static, non-parametric security design problem. 
That is, the dynamic models discussed in this section can be thought of as providing a micro-foundation for the static problems discussed in the previous sections. For the particular case of quadratic costs and a risk-neutral seller, the results of Cvitanić et al. (2009) imply that debt is optimal. Combining my result with the results of Bierkens and Kappen (2014), dynamic models with quadratic effort costs are equivalent to static problems with a KL divergence cost function, which provides a different perspective on why debt is optimal in this setting. Second, I show that for convex, but not necessarily quadratic, cost functions, debt contracts are approximately optimal, and relate this to the intuitions discussed above. This can be viewed as a micro-foundation for the approximation results discussed in the previous section. The result is also useful because the optimal contracts in this case are quite complex; Cvitanić et al. (2009) study contracts without the limited liability constraint, and show that they depend on the entire path, not just the final value, of the state variables. There are, to my knowledge, no known results with limited liability. My results can be viewed as showing that, when the approximation is applicable, simple, non-path-dependent contracts are close to optimal. I will begin by describing the structure of the dynamic model. The timing follows the standard principal-agent convention. At time zero, the seller and buyer trade a security. Between times zero and one, the seller will apply effort (or not) to change the value of the asset. At time one, the asset value is determined and the security payoffs occur. Between times zero and one, the seller controls the drift of a Brownian motion. Define $$W$$ as a Brownian motion on the canonical probability space, $$(\Omega,\mathcal{F},\tilde{P})$$, and let $$\mathcal{F}_{t}^{W}$$ be the standard augmented filtration generated by $$W$$. 
Denote the asset value at time $$t$$ as $$V_{t}$$, and let $$\mathcal{F}_{t}^{V}$$ be the filtration generated by $$V$$. The seller observes the history of both $$W_{t}$$ and $$V_{t}$$ at each time, whereas the buyer observes (or can contract on) only the history of $$V_{t}$$. This information asymmetry creates the moral hazard problem. The initial value, $$V_{0}>0$$, is known to both the buyer and the seller. The asset value evolves as \[ dV_{t}=b(V_{t},t)dt+u_{t}\sigma(V_{t},t)dt+\sigma(V_{t},t)dW_{t}, \] where $$b(V_{t},t)$$ and $$\sigma(V_{t},t)>0$$ satisfy standard conditions to ensure that, conditional on $$u_{t}=0$$ for all $$t$$, there is a unique, everywhere-positive solution to this SDE.17 The seller’s control, $$u_{t}$$, should be thought of as instantaneous effort (and not “effort” in the sense of the effort/risk-shifting decomposition discussed earlier). There is a flow cost of instantaneous effort, a general form of which is $$g(t,V_{t},u_{t})$$. The function $$g(\cdot)$$ is weakly positive, twice-differentiable, and strictly convex in instantaneous effort. For all $$t$$ and $$V_{t}$$, $$g(t,V_{t},0)=0$$. Instantaneous effort always improves the expected value of the asset, holding future effort constant; that is, for all $$t$$ and $$V_{t}$$, $$E_{t}[V_{s}]$$ is increasing in $$u_{t}$$, for all $$s>t$$. In the most general formulation, the seller’s information set at each time $$t$$ consists of the current time, the histories of the Brownian motion $$W$$ and asset value $$V$$, the history of her past actions, and any public or private randomization devices she chooses to employ. Using this information, the seller could pursue pure or mixed strategies over instantaneous effort levels. However, for the models that I will discuss, it is without loss of generality to restrict the seller to strategies that are a function of the history of the asset values and time (see Cvitanić et al., 2009). 
Intuitively, the convexity of the cost of instantaneous effort makes mixed strategies sub-optimal. Moreover, the security is a function of the history of the asset values only. As a result, at any time $$t$$, if the seller intends to pursue an instantaneous effort strategy that is $$\mathcal{F}_{s}^{V}$$-measurable for all $$s>t$$, the optimal effort at time $$t$$ will be $$\mathcal{F}_{t}^{V}$$-measurable. Formally, I define the set of admissible strategies $$\mathscr{U}$$ as the set of $$\mathcal{F}_{t}^{V}$$-adapted, square-integrable controls such that $$E[\exp(4\int_{0}^{1}u_{s}dB_{s}-2\int_{0}^{1}u_{s}^{2}ds)]<\infty$$. The retained tranche, $$\eta(V)$$, is an $$\mathcal{F}_{1}^{V}$$-measurable random variable, meaning that it can depend on the entire path of the asset value. I continue to assume limited liability, meaning that $$\eta(V)\in[0,\beta_{s}V_{1}]$$ for all paths $$V$$. The seller’s indirect utility function can be written as \begin{equation} \phi_{CT}(\eta)=\sup_{\{u_{t}\}\in\mathscr{U}}\phi_{CT}(\eta;\{u_{t}\})=\sup_{\{u_{t}\}\in\mathscr{U}}\lbrace E^{\tilde{P}}[\eta(V)]-E^{\tilde{P}}[\int_{0}^{1}g(t,V_{t},u_{t})dt]\rbrace,\label{eq:ct-mh-eq} \end{equation} (6.1) where $$E^{\tilde{P}}$$ denotes the expectation at time zero under the physical probability measure.18 In summary, given the retained tranche, the seller chooses a time-consistent instantaneous effort strategy to control the drift of the asset value. The security design problem is similar to the security design problem in the previous sections. 
The seller internalizes the effects of the security design on the price that the buyer is willing to pay: \begin{align} U_{CT}(s^{*}) & =\sup_{s\in S}U_{CT}(s)\nonumber \\ & =\sup_{s\in S}\lbrace\beta_{b}E^{\tilde{P}}[s(V)]+\phi_{CT}(\eta)\rbrace,\label{eq:ct-sec-util-eq} \end{align} (6.2) where $$S$$ is the set of $$\mathcal{F}_{1}^{V}$$-measurable limited liability security designs and $$\eta(V)=\beta_{s}(V_{1}-s(V)).$$ In the proposition below, I show that this problem is equivalent to a static, non-parametric security design problem. Equivalent, in this context, means that the utility achieved by the seller in the continuous time problem, for any admissible security design, is equal to the utility achieved by that security in the static, non-parametric security design problem. Proposition 5. There exists a probability space $$(\Omega,\mathcal{F},Q)$$, Brownian motion $$B$$ defined on that probability space, and stochastic process \[ dX_{t}=b(X_{t},t)dt+\sigma(X_{t},t)dB_{t}, \] such that: (1) For all strategies $$u\in\mathscr{U}$$, there exists a measure $$P$$ under which the law of $$X$$ is equal to the law of $$V$$ under measure $$\tilde{P}$$. (2) For all securities $$s\in S$$, the indirect utility function satisfies \[ \phi_{CT}(\eta)=\sup_{P\in M}E^{P}[\eta(X)]-D_{g}(P||Q), \] where $$D_{g}$$ is a divergence and $$M$$ is the set of measures on the probability space that are absolutely continuous with respect to $$Q$$ and for which $$E^{Q}[(\frac{dP}{dQ})^{4}]<\infty.$$ (3) For all securities $$s\in S$$, if there is a unique maximizer $$P(\eta)=\arg\max_{P\in M}E^{P}[\eta(X)]-D_{g}(P||Q)$$, then the security design utility function satisfies \[ U(s)=\beta_{b}E^{P(\eta)}[s(X)]+E^{P(\eta)}[\eta(X)]-D_{g}(P(\eta)||Q). \] Proof. See Online Appendix Section 3.14. The proposition relies on Girsanov’s theorem and the “weak formulation” results of Schaettler and Sung (1993) and Cvitanić et al. (2009). 
ǁ This proposition connects the dynamic problem introduced in this section to the static problems described in the previous sections. The intuition is that instantaneous effort strategies can be used to create any probability measure over outcomes, where an outcome is a path of the asset value. Given any point in time and history of the asset value, if the seller would like to make paths that move upward at this point more likely than paths that move downward, she can exert instantaneous effort. By doing this at each possible time and history, the seller can use her control to pick the relative likelihood of every possible path. Formally, this idea is captured by Girsanov’s theorem. These results also show that the decomposition of the seller’s actions into “effort” and “risk-shifting”, as described by Lemma 1, applies to these dynamic models as well. To prevent confusion, I will refer to the sort of effort described by Lemma 1 as “cumulative effort”, and continue to use the term “instantaneous effort” to refer to the control the seller uses. The distinction between cumulative effort and instantaneous effort is related to another important point: even though the agent does not control the instantaneous variance of the asset value process, she can “spread out” the probability measure over asset value paths, creating risk-shifting effects. The proof of the proposition shows that, for any measure $$P$$, there is a (stochastically) unique effort strategy that will create that measure. The divergence $$D_{g}(P||Q)$$ is the expected cumulative flow cost $$g(\cdot)$$ of this effort strategy. It satisfies the properties of a divergence: it is zero if $$P$$ is identical to $$Q$$, and positive otherwise. The measure $$Q$$ is the measure that corresponds to zero effort; if the agent exerts zero effort for all possible histories, the law of $$X$$ under measure $$Q$$ will be equal to the law of $$V$$ under measure $$\tilde{P}$$. 
One technical caveat is included in the third part of the proposition. Thus far, I have not made enough assumptions about the asset value process to ensure that there is a unique optimal measure, $$P(\eta)$$, or that the seller’s utility is finite. When I discuss specific cost functions $$g$$ below, I will introduce additional assumptions about the asset value process to ensure utility is finite and that there is a unique measure that solves the moral hazard problem. I have rewritten the continuous time moral hazard problem as a static problem, in which the seller chooses a probability measure subject to a cost that is described by a divergence. In light of the results for static models, two questions immediately arise. First, is there a $$g(\cdot)$$ function such that $$D_{g}(P||Q)$$ is the Kullback-Leibler divergence, in which case a debt security will be optimal? Second, are there $$g(\cdot)$$ functions such that $$D_{g}(P||Q)$$ is an invariant divergence, in which case a debt security will be approximately optimal? The answer to the first question comes from the work of Bierkens and Kappen (2014) and the sources cited therein, who show that quadratic cost functions, $$g(t,X_{t},u_{t})=\frac{\theta}{2}u_{t}^{2}$$, lead to the KL divergence.19 Intuitively, it follows that the optimal security design is a debt security. This intuition is confirmed by specializing the results of Cvitanić et al. (2009) to the case of a risk-neutral agent. For completeness, I present this result below, and include a proof in the Appendix. The proof also demonstrates that the decomposition of a perturbation’s effects into direct and indirect effects, and the further decomposition of the indirect effect into cumulative effort and risk-shifting effects, discussed in previous sections, apply to these models as well. 
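The link between quadratic flow costs and the KL divergence can be checked by simulation. The Python sketch below is a simplified illustration, not the construction used in the proofs: it assumes $$b=0$$ and $$\sigma=1$$, discretizes $$dX_{t}=u_{t}dt+dW_{t}$$ with an Euler scheme, and compares a Monte Carlo estimate of $$E^{P}[\ln\frac{dP}{dQ}]$$, where the log-likelihood ratio comes from Girsanov's theorem, with $$E^{P}[\frac{1}{2}\int_{0}^{1}u_{t}^{2}dt]$$. Multiplying the latter by $$\theta$$ gives the expected cost under $$g=\frac{\theta}{2}u_{t}^{2}$$. The feedback rule, step counts, and function names are hypothetical.

```python
import math
import random

def simulate_kl_and_effort(control, n_paths=10000, n_steps=50, seed=0):
    """Monte Carlo estimates of E^P[log dP/dQ] and E^P[0.5 * integral of u_t^2 dt].

    Assumes b = 0 and sigma = 1, so X follows dX = u dt + dW under the effort
    measure P and is a driftless Brownian motion under the zero-effort measure Q.
    """
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    sqrt_dt = math.sqrt(dt)
    total_log_lr = 0.0
    total_half_sq = 0.0
    for _ in range(n_paths):
        x = 0.0
        for k in range(n_steps):
            u = control(x, k * dt)        # feedback control: depends on asset history via x
            dw = rng.gauss(0.0, sqrt_dt)  # Brownian increment under P
            dx = u * dt + dw              # increment of X (a Brownian increment under Q)
            # Girsanov log-likelihood ratio increment: u dX - 0.5 u^2 dt.
            total_log_lr += u * dx - 0.5 * u * u * dt
            total_half_sq += 0.5 * u * u * dt
            x += dx
    return total_log_lr / n_paths, total_half_sq / n_paths

def feedback_effort(x, t):
    """Hypothetical bounded rule: more effort when the asset value has fallen."""
    return 0.5 if x < 0.0 else 0.1

kl_estimate, half_sq_effort = simulate_kl_and_effort(feedback_effort)
theta = 10.0  # hypothetical cost scale
print("estimated KL(P||Q):", kl_estimate)
print("E^P[0.5 * int u^2 dt]:", half_sq_effort)
print("expected quadratic flow cost:", theta * half_sq_effort)
```

The two estimates agree up to Monte Carlo noise, because the stochastic integral $$\int_{0}^{1}u_{t}dW_{t}$$ has zero mean under $$P$$. For a constant control $$u$$, both equal $$\frac{1}{2}u^{2}$$.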
For the quadratic flow cost function, it is sufficient to assume that the asset value, in the absence of effort by the agent, satisfies $$E^{Q}[\exp(4\theta^{-1}X_{1})]<\infty$$, which ensures that utility is finite and that there is a unique optimal policy for the seller. Proposition 6. In the continuous time model, with the quadratic cost function, if $$E^{Q}[\exp(4\theta^{-1}X_{1})]<\infty$$, the optimal security design is a debt contract, \[ s(X)=\min(X,\bar{v}), \] for some $$\bar{v}>0$$. The decomposition of perturbations into direct and indirect effects applies: \[ \frac{\partial U_{CT}(\eta(X,\epsilon))}{\partial\epsilon}|_{\epsilon=0}=\underbrace{\kappa\frac{\partial}{\partial\epsilon}E^{P^{*}(\eta^{*})}[\beta_{s}s(X,\epsilon)]}_{\textit{direct effect}}-\underbrace{(1+\kappa)\frac{1}{2}\theta^{-1}\frac{\partial}{\partial\epsilon}V^{P^{*}(\eta^{*})}[\beta_{s}s(X,\epsilon)]}_{\textit{indirect effect}}. \] The effort/risk-shifting decomposition also applies: \[ \frac{\beta_{b}}{\beta_{s}}(1-\gamma(\eta^{*}))\frac{de(\eta(\epsilon))}{d\epsilon}|_{\epsilon=0^{+}}=\theta^{-1}\frac{\beta_{b}}{\beta_{s}}(1-\gamma(\eta^{*}))\frac{\partial}{\partial\epsilon}Cov^{P(\eta^{*})}[\eta(\epsilon),\beta_{s}X]|_{\epsilon=0^{+}}, \] \[ \frac{\beta_{b}}{\beta_{s}}\frac{d}{d\epsilon}E^{P(\eta(\epsilon))}[\eta^{*}(X)-\gamma(\eta^{*})\beta_{s}X]=\frac{1}{2}\theta^{-1}\frac{\beta_{b}}{\beta_{s}}\frac{\partial}{\partial\epsilon}V^{P(\eta^{*})}[\eta(\epsilon)-\gamma(\eta^{*})\beta_{s}X]|_{\epsilon=0^{+}}. \] Proof. See Online Appendix Section 3.15. The optimality of debt specializes Cvitanić et al. (2009). ǁ Debt is the optimal security design in the continuous time model for the same reasons it is optimal in the non-parametric model. The perturbation used in Section 3 applies with minor modifications. The intersection of these results with Holmström and Milgrom (1987) is intuitive. 
In the principal-agent framework, when the asset value Ito process is an arithmetic Brownian motion and the flow cost function is quadratic, without limited liability, a constant security for the principal is optimal. With limited liability, in the security design framework, the optimal security simply reduces the constant payoff where necessary, and debt is optimal. The debt security design may or may not be renegotiation-proof. Suppose that at some point, say time $$t=\frac{1}{2}$$, the seller can offer the buyer a restructured security. Assume that at this time, there are no gains from trade (otherwise, if the asset value has increased, the seller will “lever up” and sell more debt to the buyer). If the current asset value is low enough, the debt security provides little incentive for the seller to continue putting in effort in the future. In this state, the buyer might agree to “write down” the debt security, even though he cannot receive any additional payments from the seller, because the buyer’s gains from increased effort by the seller could more than offset the loss of potential cash flows. In this model, write-downs can be Pareto-efficient if the time-zero expected value of the debt, $$E^{P}[s(X)]$$, is greater than $$\theta$$.20 Write-downs will never be Pareto-efficient when $$\kappa$$ and $$\theta^{-1}$$ are both small, but could occur if both the gains from trade at time zero and the moral hazard were large. In the next section, I turn to the second question: are there cost functions $$g(\cdot)$$ for which debt securities are approximately optimal?

7. A Mean-Variance Approximation for Continuous Time Models

For the static models discussed earlier, invariant divergence cost functions lead to models in which debt is approximately optimal. In this section, I will not directly answer the question of whether there are functions $$g(\cdot)$$ such that $$D_{g}(P||Q)$$ is invariant.
Instead, I will show that for all $$g(t,X_{t},u_{t})=\theta\psi(u_{t})$$, where $$\psi(u_{t})$$ is a convex function, debt is approximately optimal.21 The approximations used in this section are identical to the ones discussed previously, in Section 5. I consider problems in which both the moral hazard and gains from trade are small, relative to the scale of the assets. I show that the utility of arbitrary security designs can be characterized, to first-order, by a mean-variance tradeoff. The approximate optimality of debt is a surprising result in this setting. Without limited liability, Cvitanić et al. (2009) are able to characterize some properties of optimal security designs, making an analogy to the results of Holmström and Milgrom (1987). However, there is no explicit solution or implementation available, and in general the optimal securities will depend on the entire path of asset values, not just the final value, in a non-trivial way. There are no results, to my knowledge, about the model with limited liability. I modify the models introduced in the previous section in several small ways. I will assume that the control is bounded, $$|u_{t}|\leq\bar{u}$$ (this is a restriction on the set $$\mathscr{U}$$). This assumption simplifies the discussion of conditions to ensure finite utility. I assume $$\psi$$ satisfies the conditions required for $$g$$ in the previous section, and in addition that for all $$|u|\leq\bar{u}$$, $$\psi''(u)\in[K_{1},K_{2}]$$ for some positive constants $$0<K_{1}<1<K_{2}$$. That is, $$\psi$$ is “strongly convex” over its domain. I also normalize $$\psi''(0)=1$$. I assume that, for bounded control strategies $$|u_{t}|\leq\bar{u}$$, the asset value has a finite fourth moment. That is, $$E^{\tilde{P}}[(V_{1})^{4}]<\infty$$ under these bounded control strategies.
There is a sense in which any twice-differentiable, convex cost function $$\psi(u_{t})$$ resembles the quadratic cost function, as $$u_{t}$$ becomes close to zero, because their second derivatives are the same. Similarly, in static models, all invariant divergences resemble the KL divergence. I apply this idea to the divergences $$D_{\psi}$$ induced by the convex cost functions $$\psi$$ (as defined in Proposition 5). I consider the same approximation discussed earlier, in which both $$\theta^{-1}$$ and $$\kappa$$ are small. In the context of continuous time models, Sannikov (2014) discusses a related “large firm limit”. As the cost of effort rises, the seller will choose to respond less and less to the incentives provided by the retained tranche. Regardless of the cost function $$\psi$$, the divergence $$D_{\psi}(P||Q)$$ will approach $$D_{KL}(P||Q)$$, and debt will be approximately optimal. Moreover, the distinction between the effort and risk-shifting components of utility that applied in the static approximations will apply to these models as well. To make this argument rigorous, I use Malliavin calculus in a manner similar to Monoyios (2013) to prove the following theorem: Proposition 7. For any limited liability security design $$s$$, the difference in utilities achieved by an arbitrary security $$s$$ and the sell-nothing security is \[ U(s;\theta^{-1},\kappa)-U(0;\theta^{-1},\kappa)=\kappa E^{Q}[\beta_{s}s]-\theta^{-1}\frac{1}{2}V^{Q}[\beta_{s}s]+O(\theta^{-2}+\theta^{-1}\kappa). \] The direct and indirect effects of a perturbation, to first order, are the ones described in Proposition 6, under measure $$Q$$. The decomposition of the indirect effect into effort-only and risk-shifting effects is also, to first order, identical to the one described in Proposition 6, under measure $$Q$$. Proof. See Online Appendix Section 3.16. ǁ In the continuous time effort problem with an arbitrary convex cost function, debt securities are first-order optimal. 
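The first-order expression in Proposition 7 can be evaluated directly for simple securities. The sketch below is an illustration under stated assumptions, not the computation behind the article's figures: it uses a discrete stand-in for $$Q$$ (a discretized, truncated gamma distribution with mean 2 and standard deviation 0.3 on 401 evenly spaced points from zero to 8, with $$\beta_{s}=0.5$$ and $$\kappa=0.0171\,\theta^{-1}$$, borrowed from the description of Figure A.3), a hypothetical value $$\theta^{-1}=0.1$$, and a hypothetical 50% equity share. It compares an equity security with the debt security that has the same expected value.

```python
import math


def gamma_pdf(x, shape, scale):
    """Density of a gamma distribution, computed in logs for numerical stability."""
    if x <= 0.0:
        return 0.0
    log_density = ((shape - 1.0) * math.log(x) - x / scale
                   - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_density)


def expectation(payoff, q):
    return sum(qi * si for qi, si in zip(q, payoff))


def variance(payoff, q):
    mean = expectation(payoff, q)
    return sum(qi * (si - mean) ** 2 for qi, si in zip(q, payoff))


def first_order_gain(payoff, q, kappa, theta_inv, beta_s):
    """kappa * E[beta_s * s] - (theta_inv / 2) * V[beta_s * s], dropping higher-order terms."""
    scaled = [beta_s * s for s in payoff]
    return kappa * expectation(scaled, q) - 0.5 * theta_inv * variance(scaled, q)


def debt_payoff(grid, face):
    return [min(v, face) for v in grid]


def face_value_matching_mean(target, grid, q):
    """Bisection on the face value so that E[min(v, face)] equals the target mean."""
    low, high = 0.0, max(grid)
    for _ in range(100):
        mid = 0.5 * (low + high)
        if expectation(debt_payoff(grid, mid), q) < target:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


# Discrete stand-in for Q: truncated gamma with mean 2 and standard deviation 0.3.
grid = [8.0 * i / 400 for i in range(401)]
mean, sd = 2.0, 0.3
shape, scale = (mean / sd) ** 2, sd ** 2 / mean
weights = [gamma_pdf(v, shape, scale) for v in grid]
q = [w / sum(weights) for w in weights]

beta_s = 0.5
theta_inv = 0.1             # hypothetical value, chosen for illustration
kappa = 0.0171 * theta_inv  # kappa = kappa_bar / theta

equity = [0.5 * v for v in grid]  # hypothetical 50% equity share
face = face_value_matching_mean(expectation(equity, q), grid, q)
debt = debt_payoff(grid, face)

gain_equity = first_order_gain(equity, q, kappa, theta_inv, beta_s)
gain_debt = first_order_gain(debt, q, kappa, theta_inv, beta_s)

print("variance of equity payoff:", variance(equity, q))
print("variance of debt payoff:  ", variance(debt, q))
print("first-order gain, equity: ", gain_equity)
print("first-order gain, debt:   ", gain_debt)
```

Because the two securities are constructed to have the same expected value, the difference in their first-order gains comes entirely from the variance term, which is the sense in which variance acts as a summary statistic here.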
The same mean-variance intuition that I discussed in static models applies to continuous time models. The variance of the security payoff is again a summary statistic for the problems of reduced effort and risk shifting associated with the moral hazard problem.

8. Extensions and Conclusion

The Appendix includes several extensions and applications of the model. In Appendix Section B, I show that the main results of the article continue to hold under alternative assumptions about timing and bargaining. The results hold if the seller first chooses a probability distribution, and then offers a security to the buyer. They also hold as long as the seller has some bargaining power. They hold if the buyer and seller share a common discount factor, but the seller is required to raise a positive amount of financing from the buyer. In Appendix Section D, I show that allowing for free disposal of output would not change any of the results of the article. However, allowing for free risk shifting by the seller would cause equity to be the optimal contract. In Online Appendix Section 1, I apply the approximations of Section 5 to parametric models of moral hazard (when the seller chooses from a family of probability distributions). I show that debt guarantees the highest “worst case scenario” utility, where the worst case refers to the set of actions available to the seller. I also show that as the flexibility of the seller’s actions grows, this bound becomes increasingly tight. I apply these results to provide a second micro-foundation for the benchmark model, based on a rational inattention problem, in Online Appendix Section 2. I also argue that the approximations I employ are appropriate in the context of mortgage origination, through a calibration exercise described in Appendix Section C. In this article, I have analysed a flexible form of moral hazard, which allows for both effort and risk-shifting.
In my benchmark model, with the KL divergence cost function, debt securities are exactly optimal. I provide a micro-foundation for this model in terms of a dynamic contracting problem with quadratic costs of effort. Other security designs (in some cases, a mix of debt and equity) are exactly optimal with the $$\alpha$$-divergence cost functions, and approximately optimal for the larger class of invariant divergence cost functions. In all of these models, debt is optimal or approximately optimal because it minimizes the variance of the security payout, balancing the need to provide incentives for effort, minimize risk-shifting, and maximize trade.

The editor in charge of this paper was Dimitri Vayanos.

A. Additional Figures

Figure A.1 Possible Security Designs. This figure illustrates several possible security designs: a debt security, an equity security, and the “live-or-die” security of Innes (1990). The $$x$$-axis, labelled $$\beta_{s}v_{i}$$, is the discounted value of the asset, and the $$y$$-axis, labelled $$\beta_{s}s_{i}$$, is the discounted value of the security. The level of debt, the cutoff point for the live-or-die, and the fraction of equity are chosen for illustrative purposes. The discount factor for the seller is $$\beta_{s}=0.5$$. The outcome space $$v_{i}$$ is a set of 401 evenly-spaced values ranging from zero to 8. The $$x$$-axis is truncated to make the chart clearer.

Figure A.2 Second-order optimal security designs. This figure shows the second-order optimal security designs, for various values of the curvature parameter $$\alpha$$. The $$x$$-axis, labelled $$\beta_{s}v_{i}$$, is the discounted value of the asset, and the $$y$$-axis, labelled $$\beta_{s}s_{i}$$, is the discounted value of the security. These securities are plotted with the same $$\bar{v}$$ for each $$\alpha$$ (not an optimal $$\bar{v}$$). The value of $$\kappa$$ used to generate this figure is one-third, which was chosen to ensure that the slopes of the contracts would be visually distinct (and not because it is economically reasonable). The outcome space $$v$$ is a set of 401 evenly-spaced values ranging from zero to 8.

Figure A.3 The utility of various security designs. This figure compares the utility of several security designs (debt, equity, and the optimal security design) relative to the utility of selling everything, for different values of $$\theta$$. The bottom $$x$$-axis is the value of $$\ln(\theta)$$, the top $$x$$-axis is the value of $$\kappa$$, and the $$y$$-axis is the difference in security design utility between the security (debt, equity, etc.) and selling everything. For each $$\theta$$ and corresponding $$\kappa$$, the optimal debt security, equity security, and the optimal security are determined. Then, the utility of using each of the four security designs, given $$\theta$$ and $$\kappa$$, is computed. The cost function is an $$\alpha$$-divergence, with $$\alpha=-7$$, implying that a mix of debt and equity is optimal (see Proposition 3). The gains from trade, $$\kappa$$, vary as $$\theta$$ changes, with $$\kappa=\bar{\kappa}\theta^{-1}$$, $$\bar{\kappa}=0.0171$$. This parameter was chosen to be consistent with the calibration in the Appendix, Section C. The discounting parameter for the seller is $$\beta_{s}=0.5$$. The zero-cost distribution $$q$$ is a discretized, truncated gamma distribution with mean 2, standard deviation 0.3, and an upper bound of 8. The outcome space $$v$$ is a set of 401 evenly-spaced values ranging from zero to 8. The utilities are plotted for nine different values of $$\theta$$, ranging from $$2\exp(-7)$$ to $$2\exp(1)$$, and linearly interpolated between those values.

B. Timing Conventions and Bargaining

In this Appendix Section, I will discuss several possible timing conventions for the sequence of decisions by the seller during the first period. In that period, the seller designs the security, sells it to the buyer (assuming the buyer accepts), and takes actions that will create or modify the assets backing the security. The timing convention refers to the order in which these three steps occur. In the first timing convention, the “shelf registration” convention (using the terminology of DeMarzo and Duffie, 1999), the security is designed before the assets are created, but sold afterward. In the second timing convention, the “origination” convention, the security is designed and sold after the assets are created. In the third timing convention, the “principal-agent” convention, the security is designed and sold before the seller takes her actions.
In this last convention, it is natural to assume that the asset exists before the security is designed, but its payoffs are modified by the seller’s actions after the security is traded. For the “principal-agent” timing convention, I will also discuss the effects of Nash-bargaining of the security price, and over both the security design and the security price. Finally, I point out that a requirement for the seller to raise a certain amount of funds from the buyer, as in standard corporate finance models, would also generate “gains from trade”, even if the buyer and seller shared a common discount rate. There are asset securitization examples for each of these timing conventions. For some asset classes, such as first-lien mortgages, the security design is standardized, and the “shelf registration” timing convention is appropriate. For more unusual assets, the security design varies deal-by-deal, and the “origination” timing convention is appropriate. In some cases, such as the “Bowie bonds” (securitizations of music royalties), maintaining incentives post-securitization is important, and the principal-agent timing convention applies. 
Table B.1 Timing conventions during the first period

| Principal-agent timing | Origination timing | Shelf registration timing |
| --- | --- | --- |
| Security designed | Actions taken | Security designed |
| Security traded | Security designed | Actions taken |
| Actions taken | Security traded | Security traded |

The principal-agent timing convention is the simplest convention to analyse. In any sub-game perfect equilibrium, the seller takes actions that maximize the value of her retained tranche, because the price that she receives for the security has already been set. The buyer anticipates this, forming beliefs about the distribution of outcomes based on the design of the security. The buyer’s beliefs affect the price that he is willing to pay for the security, and the seller internalizes this when designing the security. Multiple equilibria are possible if the seller’s optimal actions for a particular retained tranche are not unique, or if there are multiple security designs that maximize the seller’s utility. The moral hazard, in this timing convention, can occur either because the buyer is unaware of the seller’s actions, or because he can observe those actions but is powerless to enforce any consequences based on them.
Under the other two timing conventions, I use equilibrium refinements to argue that the optimal security design and actions associated with the principal-agent timing convention describe the most appealing equilibria of the game with those alternative timing conventions. I have drawn extensive-form game trees for these two timings in Figure B.1 and Figure B.2. The results I present are related to the findings of Matthews (1995) and Matthews (2001). Matthews (1995) shows, in a closely related model in which contracts are renegotiable, and there is no limited liability, that all equilibria are “second best efficient”, which is related to my result that the timing is irrelevant. Matthews (2001) extends the results of Matthews (1995) to a model with limited liability, but with only one choice (effort) for the agent.

Figure B.1 Origination timing game tree. This figure shows the extensive form game tree associated with the origination timing convention. The tree is stylized, in the sense that it shows only two possible actions $$p$$, and two security/price combinations $$s$$ and $$k$$. The symbols $$A$$ and $$R$$ denote acceptance or rejection of the offer.

Figure B.2 Shelf registration timing game tree. This figure shows the extensive form game tree associated with the shelf registration timing convention. The tree is stylized, in the sense that it shows only two possible actions $$p$$, two security designs $$s$$, and two possible prices $$k$$. The symbols $$A$$ and $$R$$ denote acceptance or rejection of the offer.

I assume that the actions of the seller are not observed by the buyer, ensuring there is still a moral hazard. I will discuss the benchmark, non-parametric model described in Section 2; the set of feasible actions by the seller, $$M$$, is the entire probability simplex. I use the notion of proper equilibrium defined by Myerson (1978), and developed for infinite action spaces by Simon and Stinchcombe (1995). I show that, if the principal-agent timing convention has a unique equilibrium security design, price, and set of actions taken by the seller, which involve acceptance with certainty by the buyer, then this security design, price, set of actions, and acceptance with certainty also characterize all strong proper equilibria22 of the game with the origination and shelf registration timing conventions, subject to a technical assumption. The key intuitions behind this result are the notions of “forward induction” (Kohlberg and Mertens, 1986) and “incredible beliefs” (Cho, 1987). Suppose that there is an equilibrium in the origination timing in which the buyer is always offered a particular security, $$\bar{s}$$. Now imagine that the seller plays an off-equilibrium strategy, and offers the buyer a different security, $$\hat{s}$$. What should the buyer believe about the unobservable actions taken by the seller? The notion of forward induction recognizes the seller controls both the security design and her actions, and infers from the seller’s offer of security $$\hat{s}$$ that the seller has taken actions consistent with the buyer accepting or rejecting that offer.
That is, the seller might have taken actions that anticipated a lower or higher probability of the buyer accepting her offer, but the seller did not take actions that are not best responses to some acceptance strategy of the buyer, conditional on having offered the security $$\hat{s}$$ to the buyer. As a result, the buyer should accept or reject the security $$\hat{s}$$ based on the belief that the seller has acted in this way, and not rely on “incredible beliefs”. In particular, these notions rule out the idea that the buyer, when offered security $$\hat{s}$$ instead of the security $$\bar{s}$$, can believe the seller is “out to get him”, in the sense that the seller took actions that reduced her own utility to harm the buyer.23 These beliefs are not credible; the buyer cannot pretend to hold these beliefs in order to force the seller to offer him $$\bar{s}$$ instead of $$\hat{s}$$. The notions of forward induction and incredible beliefs, and their associated refinements, are not generally equivalent to the proper equilibrium concept. The proper equilibrium concept imposes the constraint that, in the sequence of mixed strategies whose limit is the equilibrium, actions that result in greater utility for the seller must be more likely than actions resulting in lower utility for the seller. The buyer’s beliefs, which are governed by Bayes’ rule, must place relatively high weight on the seller playing best-response actions. As a result, in the game I study, proper equilibrium, forward induction, and restrictions against incredible beliefs end up implementing the same idea: that, off the equilibrium path, the buyer cannot believe the seller has played an action that is not a best response, conditional on her observable choice of security design. The game is structured so that, if the buyer rejects the seller’s take-it-or-leave-it offer, the seller retains the entire asset (both the security and the retained tranche). 
For each security design, there is a one-dimensional manifold of best-response actions, each corresponding to a probability that the seller assigns to the buyer’s likelihood of acceptance. The worst case action in this one-dimensional manifold, from the perspective of increasing the security’s value, is the action that corresponds to the seller believing the buyer will accept the security with certainty. In that case, the seller has no incentive to raise the value of the security. The other actions in this one-dimensional manifold correspond to best-responses in which the seller believes she might retain the security, and therefore acts to increase its value.24 Now consider the optimal security design and price from the principal-agent timing. If the buyer is offered this security design and price, he must be weakly willing to accept, because regardless of the probabilities he assigns to the seller’s actions over this one-dimensional manifold, the price is at least fair. The seller, recognizing that the buyer will accept this security and price25, must offer it, because it maximizes her utility. This is a heuristic argument that outlines the proof of Proposition 8, as it applies to the origination timing. I will now discuss the shelf registration timing, and then discuss the technical assumptions required by the proof. In the shelf registration timing, the security is designed before the actions are taken. As a result, one might appeal to a notion of sequential rationality to capture the idea that the seller would not play non-best-response actions, conditional on the security design that has already been decided. However, the concept of sequential equilibrium is difficult to extend to games with infinite action spaces (see Myerson and Reny, 2015). I will instead use the proper equilibrium concept, recognizing that the results for the shelf registration timing might hold under a weaker equilibrium refinement.
There is also a significant technical assumption required for the proof of Proposition 8. The technicality concerns the compactness of the action spaces available to the agents. The proof of Proposition 8 relies on the proof of existence of strong proper equilibria (Theorem 3.1) in Simon and Stinchcombe (1995), which itself requires that the action spaces of the agents be compact. This is problematic, because the buyer’s action space is the set of functions $$A_{b}:\:S\times\mathbb{R}\rightarrow\{0,1\}$$, where $$S$$ is the set of limited liability securities, 0 represents rejection, and 1 represents acceptance of the offered security and price. This is not a compact space; the buyer could (in theory) accept some particular security and price, while rejecting every offer of the same security with a price arbitrarily close to the price the buyer would have accepted. The potential for this type of strategy leads Simon and Stinchcombe (1995) to require compact action spaces. To circumvent these issues, I will require that the seller choose a security and price from a finite action space.26 That is, I will define the set $$S$$ of feasible security designs to be a finite set of possible security designs, all of which satisfy the limited liability constraints. I will define the set $$K$$ to be a finite set of feasible prices. First, consider the principal-agent timing. Let $$a(s,k)$$ be the buyer’s acceptance strategy. The buyer must accept if the price, $$k$$, is less than the buyer’s valuation, $$\beta_{b}\sum_{i>0}p^{i}(\eta(s))s_{i}$$, reject if the price is greater, and is indifferent if the price is equal to the buyer’s valuation. The seller’s payoff, given a particular acceptance strategy, is \begin{eqnarray*} U(s,k;a) & = & (1-a(s,k))\phi(\beta_{s}v)+a(s,k)(k+\phi(\eta(s))). \end{eqnarray*} I assume that there is a unique sub-game perfect equilibrium in the principal-agent timing, and that this equilibrium involves acceptance by the buyer with certainty. 
Let $$s^{*}$$ and $$k^{*}$$ denote the security design and price in this equilibrium, and let $$p^{*}=p(\eta(s^{*}))$$ denote the corresponding optimal actions. I also assume that the security $$s^{*}$$ is not sell-nothing. I show that, under these assumptions, all strong proper equilibria of the games with the origination and shelf registration timing conventions are also characterized by the security design $$s^{*}$$, the price $$k^{*}$$, the action $$p^{*}$$, and acceptance with certainty.

Proposition 8. In the non-parametric benchmark model described in Section 2, if there is a unique sub-game perfect equilibrium for the game with the principal-agent timing convention, characterized by security design $$s^{*}\in S$$, price $$k^{*}\in K$$, actions $$p^{*}\in M$$, and acceptance by the buyer, with $$s_{i}^{*}>0$$ for some $$i\in\Omega$$, then all strong proper equilibria (in the terminology of Simon and Stinchcombe (1995)) of the origination timing and shelf registration timing are characterized by that security design, price, and action, and the buyer accepting the seller’s offer with certainty.

Proof. See Online Appendix Section 3.18. ǁ

The proposition argues that the timing of the game is, in essence, irrelevant. The analysis in the main body of the article, regarding when debt contracts are optimal or nearly optimal, applies regardless of the timing. The proposition, as stated, relies on the strong proper equilibrium concept defined by Simon and Stinchcombe (1995), but also applies to those authors’ weak proper equilibrium concept. Next, I will discuss, under the principal-agent timing convention, alternatives to giving all of the bargaining power to the seller. I will discuss two alternatives: first, that the seller designs the security, but then Nash-bargains with the buyer over the price, and second, that the seller and buyer bargain jointly over both the security design and price.
First, suppose that the seller and buyer bargain over the price $$K(\eta)$$. Let $$1-\rho>0$$ and $$\rho>0$$ be their respective bargaining weights. The outside option is no trade: the seller retains everything, and the buyer pays and receives nothing. The price, as a function of the retained tranche (or, equivalently, of the security design), solves \[ K^{*}(\eta)\in\arg\max_{K}(\beta_{b}E^{p(\eta)}[s(\eta)]-K)^{\rho}(\phi(\eta)+K-\phi(\beta_{s}v))^{1-\rho}. \] Using the first-order conditions to solve for $$K^{*}(\eta)$$, \[ K^{*}(\eta)=(1-\rho)\beta_{b}E^{p(\eta)}[s]+\rho(\phi(\beta_{s}v)-\phi(\eta)). \tag{B.1} \] The utility in the security design problem is \[ U(\eta)=(1-\rho)(\beta_{b}E^{p(\eta)}[s(\eta)]+\phi(\eta))+\rho\phi(\beta_{s}v). \] This is simply an affine transformation of the security design utility function described in the text (equation 2.2), and it follows that the same security design will be optimal. The bargaining power, in this case, changes only the price at which the agents trade the security. Note also that, if the buyer (instead of the seller) designs the security, and then the agents bargain over the price, a similar result follows. Now suppose that the agents bargain jointly over the security design and price. The agents maximize \[ U(s^{*})=\max_{K,s\in S}(\beta_{b}E^{p(\eta(s))}[s]-K)^{\rho}(\phi(\eta(s))+K-\phi(\beta_{s}v))^{1-\rho}. \] The optimal price, as a function of the optimal security design, is still described by equation B.1. Substituting this in, \[ U(s^{*})=\max_{s\in S}(1-\rho)^{1-\rho}\rho^{\rho}(\beta_{b}E^{p(\eta(s))}[s]+\phi(\eta(s))-\phi(\beta_{s}v)), \] which is also an affine transformation of the models described in the main text. It again follows that, if the agents bargain jointly over both the security design and price, the same security designs would be optimal.
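The closed-form bargaining price can be checked numerically. The sketch below is a minimal illustration with hypothetical numbers: `buyer_value` stands in for $$\beta_{b}E^{p(\eta)}[s(\eta)]$$, `phi_eta` for $$\phi(\eta)$$, and `phi_outside` for the seller's no-trade utility $$\phi(\beta_{s}v)$$. None of these values comes from the article. It compares the price from the first-order condition with the maximizer of the Nash product found by a grid search over individually rational prices.

```python
def nash_product(price, buyer_value, phi_eta, phi_outside, rho):
    """Nash product: buyer surplus to the power rho, seller surplus to the power 1 - rho."""
    buyer_surplus = buyer_value - price
    seller_surplus = phi_eta + price - phi_outside
    if buyer_surplus <= 0.0 or seller_surplus <= 0.0:
        return 0.0  # outside the individually rational range
    return buyer_surplus ** rho * seller_surplus ** (1.0 - rho)


def closed_form_price(buyer_value, phi_eta, phi_outside, rho):
    """Price implied by the first-order condition of the Nash bargaining problem."""
    return (1.0 - rho) * buyer_value + rho * (phi_outside - phi_eta)


# Hypothetical inputs (assumptions for illustration only).
buyer_value = 1.0   # buyer's valuation of the security
phi_eta = 0.4       # seller's utility from the retained tranche
phi_outside = 1.2   # seller's utility from retaining the entire asset
rho = 0.3           # buyer's bargaining weight; the seller's weight is 1 - rho

# Grid search over prices between the seller's and the buyer's reservation prices.
lower, upper = phi_outside - phi_eta, buyer_value
steps = 20000
candidates = [lower + (upper - lower) * i / steps for i in range(steps + 1)]
grid_price = max(
    candidates,
    key=lambda k: nash_product(k, buyer_value, phi_eta, phi_outside, rho),
)

formula_price = closed_form_price(buyer_value, phi_eta, phi_outside, rho)

# Seller's utility at the bargained price, and the affine form stated in the text.
seller_utility = phi_eta + formula_price
affine_form = (1.0 - rho) * (buyer_value + phi_eta) + rho * phi_outside

print("grid-search price: ", grid_price)
print("closed-form price: ", formula_price)
print("seller utility:    ", seller_utility)
print("affine expression: ", affine_form)
```

With these inputs the total surplus from trade is 0.2, and the closed-form price splits it so that the buyer receives the share $$\rho$$ and the seller the share $$1-\rho$$, which is why the bargaining weights affect the price but not the ranking of security designs.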
Finally, suppose that the seller and buyer share a common discount rate, $$\beta$$, but that the seller is required to raise a certain amount of funds, $$I>0$$, from the buyer. Using the principal-agent timing, in the security stage, the seller solves \[ \max_{\eta}\phi(\eta) \] subject to the limited liability constraints ($$\eta_{i}\in[0,\beta v_{i}]$$) and the fund raising constraint, \[ \beta E^{p(\eta)}[s(\eta)]\geq I. \] Let $$\lambda\geq0$$ denote the multiplier on the fundraising constraint. For any perturbation $$\eta(\epsilon)$$ satisfying the limited liability constraints, the first-order condition for the Lagrangian of this problem is \[ -(\lambda-1)\sum_{i\in\Omega}p^{i}(\eta^{*})\frac{\partial\eta_{i}}{\partial\epsilon}|_{\epsilon=0^{+}}+\lambda\beta\sum_{i,j\in\Omega}s_{j}^{*}\frac{\partial p^{j}(\eta)}{\partial\eta_{i}}|_{\eta=\eta^{*}}\frac{\partial\eta_{i}}{\partial\epsilon}|_{\epsilon=0^{+}}\leq0. \] If $$\lambda=1+\kappa>1$$, this expression is identical to equation 2.3, and it follows that the optimal security design in this case will be identical to the case studied in the main text. I will prove that $$\lambda>1$$ under the assumption that the solution to the moral hazard problem is always interior (as in the KL divergence case). Observe that it is always feasible to set \[ \frac{\partial\eta_{i}}{\partial\epsilon}|_{\epsilon=0^{+}}=s_{i}^{*}, \] a perturbation that gives some share of the security to the seller instead of the buyer. For this security, we must have $$\lambda>0$$, as $$\lambda=0$$ would imply $$E^{p(\eta)}[s(\eta)]\leq0<I$$. It follows that the constraint binds, and therefore \[ (\lambda-1)\beta^{-1}I\geq\lambda\beta\sum_{i,j\in\Omega}s_{j}^{*}\frac{\partial p^{j}(\eta)}{\partial\eta_{i}}|_{\eta=\eta^{*}}s_{i}^{*}. 
\] Noting that $$\phi(\eta)$$ is the convex conjugate of $$\psi(p)$$, and therefore strictly convex, and that $$\frac{\partial p^{j}(\eta)}{\partial\eta_{i}}=\partial^{i}\partial^{j}\phi(\eta)$$, the right-hand side of the above expression is strictly positive, and therefore $$\lambda>1$$, completing the proof. C. Calibration In this section of the Appendix, I will discuss possible calibration strategies for the static, non-parametric model of moral hazard discussed in the main text. I will focus on the context of mortgage securitization, and how to calibrate the key parameters $$\kappa$$ and $$\theta$$, under the assumption that the cost function is the KL divergence, or that the cost function is an invariant divergence and the first-order approximation discussed in the text is accurate. In both of these cases, a debt security design is optimal. In the context of mortgage origination, there is empirical evidence for lax screening by originators who intended to securitize their mortgage loans, which suggests that moral hazard is a relevant issue (see Demiroglu and James, 2012; Elul, 2016; Jiang et al., 2013; Keys et al., 2010; Krainer and Laderman, 2014; Mian and Sufi, 2009; Nadauld and Sherlund, 2013; Purnanandam, 2010; Rajan et al., 2015, although some of this evidence is disputed by Bubb and Kaufman (2014)). However, some of this evidence is consistent with information asymmetries but cannot distinguish between moral hazard and adverse selection. There are also mechanisms to mitigate adverse selection by the seller, such as the inability to retain loans and random selection of loans into securitization (Keys et al., 2010). I will discuss an “experimental” approach to calibration first. This approach is consistent in spirit with the empirical literature on moral hazard in mortgage lending (Keys et al., 2010; Purnanandam, 2010, others). In that literature, the quasi-experiment compares no securitization ($$\eta_{i}=\beta_{s}v_{i}$$) with securitization. 
If we assume securitization uses the optimal security design $$\eta^{*}$$, then $$\theta$$ can be approximated (for any invariant divergence cost function, see Section 5) as \[ \theta^{-1}\approx E^{p(\beta_{s}v)}[v_{i}]\cdot\frac{E^{p(\beta_{s}v)}[v_{i}]-E^{p^{*}}[v_{i}]}{\mathrm{Cov}^{p^{*}}(v_{i},s_{i}^{*})}. \] This formula illustrates the difficulties of calibrating the model using the empirical work on moral hazard in mortgage lending. For the purposes of the model, what matters is the loss in expected value due to securitization, relative to the risk taken on by the buyers, ex-ante. The empirical literature estimates ex-post differences, and the magnitude of these differences varies substantially, depending on whether the data sample is from before or during the recent crash in home prices. Converting this into an ex-ante difference would require assigning beliefs to the buyer and seller about the likelihood of a crash. Estimating the ex-ante covariance, which can be understood as a measure of the quantity of “skin in the game”, is even more fraught. For these reasons, I have not pursued this calibration strategy further. The second calibration strategy, which is somewhat more promising, is to use the design of mortgage securities to infer $$\theta$$. Essentially, by (crudely) estimating the other terms in the “put option value” equation (equation 3.2), and assuming the model is correct, we can infer what the security designers thought the moral hazard was. Rearranging that equation, \[ \underbrace{\frac{\beta_{b}\bar{v}-\beta_{b}E^{p^{*}}[s_{i}]}{\beta_{b}E^{p^{*}}[s_{i}]}}_{\mathrm{Spread}}\underbrace{\frac{E^{p^{*}}[s_{i}]}{E^{p^{*}}[v_{i}]}}_{\mathrm{Share}}\left(1-\underbrace{\frac{E^{p(\beta_{s}v)}[v_{i}]-E^{p^{*}}[v_{i}]}{E^{p(\beta_{s}v)}[v_{i}]}}_{\mathrm{Moral\:Hazard}}\right)\kappa^{-1}=\theta.
\] The spread term should be thought of as reflecting the initial spread between the assets purchased by the buyer and the discount rate, under the assumption that the bonds will not default. Using a 90/10 weighting on the initial AAA and BBB 06-2 ABX coupons reported in Gorton (2008), I estimate this as 34 basis points per year. In a different setting (CLOs), the work of Nadauld and Weisbach (2012) estimates the cost of capital advantage (gains from trade) due to securitization at $$17$$ basis points per year. The “share” term is the ratio of the initial market value of the security to the initial market value of the assets. Begley and Purnanandam (2016) document that the value of the non-equity tranches was roughly 99% of the principal value in their sample of residential mortgage securitizations. Similarly, the moral hazard term is likely to be small. The estimates of Keys et al. (2010), whose interpretation is disputed by Bubb and Kaufman (2014), imply that pre-crisis, securitized mortgage loans defaulted at a rate roughly 3 percentage points higher than loans held in portfolio (see footnote 27). Assuming a 50% recovery rate, and using this as an estimate of the ex-ante expected difference in asset value, this suggests that the moral hazard term is roughly 1.5%, and therefore negligible in this calibration. Combining all of these estimates, I find that $$\theta=2$$ is consistent with the empirical literature on securitization. This calibration assumed that the security design problem with the KL divergence was being solved. However, this formula also holds (approximately) under invariant divergences, conditional on the assumption that $$\theta^{-1}$$ and $$\kappa$$ are small enough. The value of $$\theta=2$$ can be compared with the results of Figure A.3.
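The arithmetic behind $$\theta\approx2$$ can be reproduced from the rearranged put-option-value equation, $$\theta=\mathrm{Spread}\times\mathrm{Share}\times(1-\mathrm{Moral\ Hazard})\times\kappa^{-1}$$. The sketch below is a rough check rather than the author's own computation. It reads the 17 basis points per year as the gains from trade $$\kappa$$, and scales both per-year figures by the same five-year horizon, which is an assumption made here for illustration; the horizon cancels in the ratio, so the result does not depend on it.

```python
# Back-of-the-envelope calibration of theta from the estimates quoted above.
# HORIZON_YEARS is an illustrative assumption; it cancels in spread / kappa.

BASIS_POINT = 1e-4
HORIZON_YEARS = 5

spread = 34 * BASIS_POINT * HORIZON_YEARS   # ABX-based spread, 34 bp per year
kappa = 17 * BASIS_POINT * HORIZON_YEARS    # gains from trade, 17 bp per year
share = 0.99                                # non-equity tranches / principal

default_gap = 0.11 - 0.08                   # securitized vs. portfolio default rates
loss_given_default = 1.0 - 0.50             # 50% recovery rate
moral_hazard = default_gap * loss_given_default


def implied_theta(spread, share, moral_hazard, kappa):
    """theta = Spread * Share * (1 - Moral Hazard) / kappa."""
    return spread * share * (1.0 - moral_hazard) / kappa


theta = implied_theta(spread, share, moral_hazard, kappa)
print(f"moral hazard term: {moral_hazard:.3f}, implied theta: {theta:.2f}")
```

Under these inputs the implied value is slightly below 2. The share and moral hazard terms each shave only a percent or so off the spread-to-$$\kappa$$ ratio of 2, which is why the text treats them as negligible.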
Under the assumptions used to generate Figure A.3, which are described in its caption, I find that with $$\theta=2$$ and $$\kappa=0.85\%$$ (17 basis points per year times 5 years), debt would achieve 99.96% of the gains achieved by the optimal contract, relative to selling everything (and an even larger fraction of the gains relative to selling nothing). Under these parameters, the utility difference between the best debt security and selling nothing would be roughly 0.73% of the total asset value. While that might seem like an economically small gain, for a single deal described in Gorton (2008), SAIL 2005-6, the private gains of securitization would be roughly \$16.4 million. In contrast, the utility difference between the best equity security and selling nothing is about 0.56% of the total asset value. The private cost of using the optimal equity contract, instead of the optimal debt contract, would be roughly \$4 million for this particular securitization deal. The numbers discussed in this calculation depend on the assumptions used in Figure A.3, some of which are ad hoc. Nevertheless, they illustrate the general point that it is simultaneously possible for debt to be approximately optimal, and for the private gains of securitization to be large. D. Free Disposal and Free Risk-Shifting In this section, I will discuss the impact that free disposal of output by the seller and free risk-shifting would have on the models discussed in the main text. For the static, non-parametric moral hazard problems discussed in Sections 3, 4, and 5, the optimal security designs feature monotone retained tranches. In the proofs, in Online Appendix Lemma 1, I show this is true for any static, non-parametric security design problem with an invariant divergence cost function whose gradient (in $$p$$) is continuous in $$q$$. Intuitively, because the optimal retained tranche is monotone even without free disposal, allowing for free disposal does not change the optimal security design.
To see this formally, I will show that, with free disposal, it is without loss of generality to consider monotone retained tranches and ignore the disposal option. Imagine that there is free disposal. We can write the agent’s moral hazard problem as \[ \phi(\eta)=\sup_{p\in F(r),r\in M}\left\lbrace\sum_{i>0}\eta_{i}p^{i}-\psi(r)\right\rbrace, \] where $$F(r)$$ is the set of probability distributions first-order stochastically dominated by $$r$$, under the ordering given by $$\Omega$$. The agent, in effect, makes two choices: first choosing $$r$$ using the technology discussed in the text, then following a (possibly random) output destruction strategy to create $$p$$. The buyer still receives payoff $$\beta_{b}E^{p}[s]$$, and therefore the security design utility described in equation 2.2 is still valid. Define, for any retained tranche $$\eta$$, the “monotone version” \[ \bar{\eta}_{i}(\eta)=\max_{j\in\{0,\ldots,i\}}\eta_{j}. \] Note that, because $$v_{i}$$ is weakly increasing in $$i$$, such a design does not violate the limited liability constraints. Note also that, because of the monotonicity of $$\bar{\eta}_{i}(\eta)$$, \[ \sum_{i>0}[p^{i}(\eta)-r^{i}(\eta)]\bar{\eta}_{i}(\eta)=0. \] We can rewrite the moral hazard problem as \[ \phi(\eta)=\sup_{p\in F(r),r\in M}\left\lbrace\sum_{i>0}(\eta_{i}-\bar{\eta}_{i}(\eta))p^{i}+\sum_{i>0}\bar{\eta}_{i}(\eta)r^{i}-\psi(r)\right\rbrace. \] It immediately follows that the behavior without output destruction is the same for the two securities: $$r(\eta)=r(\bar{\eta}(\eta))$$. By the definition of the retained tranche, if $$\eta_{i}<\eta_{j}$$ for some $$i>j$$, then $$s_{i}>s_{j}$$. As a result, output destruction hurts the value of the buyer’s security: \[ \sum_{i>0}[p^{i}(\eta)-r^{i}(\eta)]s_{i}(\eta)\leq0 \] for all $$\eta$$. Therefore, utility in the security design problem is weakly higher under $$\bar{\eta}(\eta)$$ than under $$\eta$$, and it is without loss of generality to consider monotone security designs.
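The “monotone version” of a retained tranche in the static problem is simply a running maximum over states ordered by asset value. A minimal sketch follows, using hypothetical payoff vectors (the numbers and function names are illustrative and not taken from the article). It also checks the claim that the monotone version inherits limited liability whenever the discounted asset values are weakly increasing.

```python
from itertools import accumulate


def monotone_version(eta):
    """Running maximum: eta_bar[i] = max(eta[0], ..., eta[i])."""
    return list(accumulate(eta, max))


def satisfies_limited_liability(eta, discounted_values):
    """Check 0 <= eta[i] <= beta_s * v[i] in every state i."""
    return all(0.0 <= payoff <= bound
               for payoff, bound in zip(eta, discounted_values))


# Hypothetical discounted asset values beta_s * v_i, weakly increasing in i.
discounted_values = [0.0, 0.2, 0.4, 0.6, 0.8]

# A hypothetical non-monotone retained tranche that respects limited liability.
eta = [0.0, 0.2, 0.1, 0.5, 0.3]

eta_bar = monotone_version(eta)
print(eta_bar)  # [0.0, 0.2, 0.2, 0.5, 0.5]
print(satisfies_limited_liability(eta_bar, discounted_values))  # True
```

The dips at states 2 and 4 of `eta` are exactly where the seller would prefer to destroy output; `eta_bar` removes them by paying her what she could have secured through disposal anyway.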
I have shown that free disposal does not affect the static problems discussed in the text: it is equivalent to a restriction to monotone security designs in the absence of free disposal, and the optimal security designs were monotone even without such a restriction. Conveniently, essentially the same proof applies to the dynamic security design problems. Suppose we modify the stochastic process for the asset value described in Section 6 to allow for output destruction: \[ dV_{t}=b(V_{t},t)dt+u_{t}\sigma(V_{t},t)dt-dY_{t}+\sigma(V_{t},t)dW_{t}, \] where $$dY_{t}\geq0$$ is the seller’s destruction of asset value at time $$t$$. To allow such a modification, we need to use, as the space of asset value processes, the space of RCLL (right-continuous with left limits) functions on $$[0,1]$$, which I will denote $$\bar{\Omega}$$, instead of the space of continuous functions, which I will continue to denote $$\Omega$$. We also need to allow the security design to be a function on $$\bar{\Omega}$$. I will say that a retained tranche is monotonic in asset value if, for all $$t\in[0,1]$$, and all $$V\in\bar{\Omega}$$, $$\eta(V)$$ is weakly increasing in $$V_{t}$$. Using this definition, debt contracts are monotonic in asset value. It follows immediately that if the seller is given a retained tranche that is monotonic in asset value, she will not destroy asset value. We can define the “monotone version” of the retained tranche in the following way. Let $$F(V)$$ be the set of all RCLL functions on $$[0,1]$$ for which, for all $$f\in F(V)$$ and $$t\in[0,1]$$, \[ f_{t}\leq V_{t}. \] The monotone version of $$\eta(V)$$ is \[ \bar{\eta}(\eta,V)=\sup_{f\in F(V)}\eta(f). \] Note that, because $$f_{1}\leq V_{1}$$, this retained tranche satisfies the limited liability constraints. The “weak formulation” approach, based on Girsanov’s theorem and described in Proposition 5, can be applied.
We can define an alternative probability space, with measure $$Q$$, on which \[ dX_{t}=b(X_{t},t)dt-dY_{t}+\sigma(X_{t},t)dB_{t}, \] and a measure $$P$$, absolutely continuous with respect to $$Q$$, such that, under $$P$$, $$X$$ has the same law as $$V$$ under measure $$\tilde{P}$$. Suppose that the retained tranche is not monotonic in asset value. Then there is some $$t$$ and some $$X$$ such that, if the seller reaches state $$(t,X_{t})$$, she will wish to destroy output. If such a state is never reached with positive probability under measure $$P(\eta)$$ (and hence $$Q$$), the retained tranche and its monotone version achieve the same utility in the security design problem, holding the measure $$P(\eta)$$ constant. Such a state can never be reached under any measure that is absolutely continuous with respect to $$Q$$, and therefore the monotone version of the retained tranche will not affect the agent’s choice of $$P$$. It follows, in this case, that it is without loss of generality to assume monotonicity. Assume, going forward, that if a non-monotonicity exists, it is reached with positive probability. I will show that, for any retained tranche that induces the seller to destroy some asset value, there is another retained tranche that does not induce the seller to destroy asset value and achieves higher utility in the security design problem. As a result, the optimal security design is monotone. Define a modified version of the retained tranche in the following way: for each $$B\in\Omega$$, let $$X^{Y}(\eta,B)\in\bar{\Omega}$$ denote the asset value path that occurs under the seller’s optimal output destruction plan, given retained tranche $$\eta$$ and Brownian motion $$B$$, and let $$X(B)$$ be the asset value path that would occur in the absence of output destruction. Note that $$X(B)$$ is not affected by the design of the retained tranche, and that there is a one-to-one mapping between $$X$$ and $$B$$.
We can define a modified version of the retained tranche, for $$X\in\Omega$$, as \[ \tilde{\eta}(X,\eta)=\eta(X^{Y}(\eta,B(X))), \] where $$B(X)$$ is the Brownian motion that induces $$X$$ in the absence of asset value destruction. For discontinuous $$X$$, let $$\tilde{\eta}(X,\eta)=0$$. Note that, because asset value destruction decreases $$X_{1}$$, this modified retained tranche satisfies the limited liability constraints. By revealed preference, $$\tilde{\eta}(X)$$ does not induce output destruction; if it did, the seller’s output destruction given $$\eta$$ would not have been optimal. Moreover, $$\tilde{\eta}$$ must also induce the same choice of $$P$$; again, if some different choice of $$P$$ were preferable, it would also be preferable under the contract $$\eta$$. It follows that the seller receives the same utility from $$\eta$$ and $$\tilde{\eta}$$. The buyer, however, receives weakly higher utility from $$\tilde{\eta}$$. By the assumptions discussed in Section 6, for any realization of the Brownian motion $$\omega\in\Omega$$, destruction of output at time $$t$$ lowers the value of the asset for all times $$s\geq t$$, relative to the asset values that would have been generated in the absence of destruction. In particular, $$E_{t}[V_{1}]$$ is always decreased by destruction (by assumption), and $$s(X)=X_{1}-\beta_{s}^{-1}\eta(X)$$. As a result, $$\tilde{\eta}$$ delivers weakly higher utility than $$\eta$$, and it is without loss of generality to study monotone security designs and assume no output destruction. Finally, I will discuss “free risk-shifting”. As discussed in the text, the strict convexity assumption on the divergences I study rules out risk-shifting that is completely free. One implication of free risk-shifting is that there is not necessarily a unique optimal probability distribution for the seller to choose in the moral hazard problem.
For the purposes of discussion, suppose that there is some convention by which a single $$p(\eta)$$ is determined for each $$\eta$$. The utility of any security design can be decomposed (along the lines of Lemma 1), with free risk-shifting, as \begin{align*} U(\eta) & =\beta_{b}\sum_{i\in\Omega}q^{i}v_{i}-\kappa\sum_{i\in\Omega}p^{i}(\eta)[\eta_{i}-\gamma(\eta)\beta_{s}v_{i}]\\ & +\frac{\beta_{b}}{\beta_{s}}e(\eta)-c(e(\eta))-\kappa\sum_{i\in\Omega}p^{i}(\gamma(\eta)\beta_{s}v)\gamma(\eta)\beta_{s}v_{i}, \end{align*} where $$M(e)\subset M$$ is the set of probability distributions associated with effort level $$e$$ and \[ c(e)=\min_{p\in M(e)}\psi(p). \] The moral hazard problem can be written as \begin{align*} \phi(\eta) & =\max_{e,p\in M(e)}\sum_{i\in\Omega}p^{i}[\eta_{i}-\gamma(\eta)\beta_{s}v_{i}]\\ & +\sum_{i\in\Omega}p_{e}^{i}(e)\gamma(\eta)\beta_{s}v_{i}-c(e), \end{align*} where \[ p_{e}(e)=\arg\min_{p\in M(e)}\psi(p). \] By the seller’s optimal choice of $$p$$ in the moral hazard problem, it must be the case that \[ \sum_{i\in\Omega}p^{i}(\eta)[\eta_{i}-\gamma(\eta)\beta_{s}v_{i}]\geq0. \] It follows immediately that the equivalent equity tranche delivers higher utility in the security design problem, if it is feasible. If $$\gamma(\eta)>1$$, then \begin{align*} U(\eta) & \leq(1+\kappa)\beta_{s}\sum_{i\in\Omega}q^{i}v_{i}+(1+\kappa)e(\eta)\\ & -c(e(\eta))-\kappa(e(\eta)+\beta_{s}\sum_{i\in\Omega}q^{i}v_{i})\\ & \leq\beta_{s}\sum_{i\in\Omega}q^{i}v_{i}+e(\eta)-c(e(\eta))\\ & \leq\beta_{s}\sum_{i\in\Omega}q^{i}v_{i}+e(\beta_{s}v)-c(e(\beta_{s}v)), \end{align*} implying that $$\eta_{i}=\beta_{s}v_{i}$$ is preferable, and hence $$\gamma(\eta)>1$$ is never optimal. Negative effort is also sub-optimal. Hence the optimal design features $$\gamma(\eta)\in[0,1]$$, and the equivalent equity tranche is therefore feasible. It follows that, with free risk-shifting, an equity security is always an optimal security design.
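The bound for $$\gamma(\eta)>1$$ can be unpacked term by term. The following is a sketch under two assumptions that this appendix does not restate and that are inferred from the form of the decomposition: that $$\beta_{b}=(1+\kappa)\beta_{s}$$, and that effort is measured so that $$\beta_{s}\sum_{i\in\Omega}p^{i}(\gamma(\eta)\beta_{s}v)v_{i}=e(\eta)+\beta_{s}\sum_{i\in\Omega}q^{i}v_{i}$$. Write $$X=\beta_{s}\sum_{i\in\Omega}q^{i}v_{i}$$ and $$e=e(\eta)$$. Dropping the non-positive term $$-\kappa\sum_{i\in\Omega}p^{i}(\eta)[\eta_{i}-\gamma(\eta)\beta_{s}v_{i}]$$, and using $$\gamma(\eta)>1$$ together with $$e+X\geq0$$, \begin{align*} U(\eta) & \leq(1+\kappa)X+(1+\kappa)e-c(e)-\kappa\gamma(\eta)(e+X)\\ & \leq(1+\kappa)(X+e)-c(e)-\kappa(e+X)\\ & =X+e-c(e). \end{align*} Under these assumptions the $$\kappa$$ terms cancel exactly, so the second inequality in the chain above would hold with equality. The final step uses only that, when the seller retains the entire asset, she chooses effort to maximize $$e-c(e)$$.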
Acknowledgements The author would like to thank, in no particular order, Emmanuel Farhi, Philippe Aghion, Alp Simsek, David Laibson, Alex Edmans, Luis Viceira, Jeremy Stein, Yao Zeng, John Campbell, Ming Yang, Oliver Hart, David Scharfstein, Sam Hanson, Adi Sunderam, Guillaume Pouliot, Yuliy Sannikov, Zhiguo He, Lars Hansen, Roger Myerson, Michael Woodford, Gabriel Carroll, Drew Fudenberg, Scott Kominers, Eric Maskin, Mikkel Plagborg-Møller, Bengt Holmstrom, Arvind Krishnamurthy, Peter DeMarzo, Sebastian Di Tella, and many seminar participants for helpful feedback. The author would also like to thank Dimitri Vayanos (the editor) and three anonymous referees for comments that helped improve the article. A portion of this research was conducted while visiting the Becker Friedman Institute. All remaining errors are solely the author’s. Footnotes 1. Throughout the article, I will use she/her to refer to the seller and he/his to the buyer of the security. No association of the agents to particular genders is intended. 2. A similar result, derived from a robust contracting framework, appears in Antic (2015). 3. This article also builds on some of the methods of Yang (2015) (see the Appendix, Section 3.15). 4. For brevity, I have omitted the result for log-normal distributions from the article. It is available upon request. 5. Using a discrete outcome space simplifies the exposition, but is not necessary for the main results. 6. The gains from trade could also be motivated by a requirement that the seller raise a certain amount of funds from the buyer (see Appendix Section B). 7. Because the sample space $$\Omega$$ is a finite set of outcomes, even in the “non-parametric” case, the choice of $$p$$ can be expressed as a choice over a finite number of parameters. I am using the terms non-parametric and parametric to denote whether the set $$M$$ of feasible probability distributions is the entire simplex, or a restricted set. 8.
A “divergence” is similar to a distance, except that there is no requirement that it be symmetric between $$p$$ and $$q$$, or that it satisfy the triangle inequality. 9. Other authors use different sign conventions or scaling for the $$\alpha$$ parameter. 10. Under this convention, the KL divergence corresponds to $$f(u)=u\ln u-u+1$$. 11. This equity share is not necessarily feasible: if $$\eta$$ induces a very high or very low level of effort, the equivalent equity share might be more than 100% or less than 0% of the asset value. The probability distribution associated with the equivalent equity contract, $$p(\gamma(\eta)\beta_{s}v)$$, has the lowest cost among all probability distributions with the same effort level. 12. This perturbation argument builds on the suggestions of an anonymous referee. 13. Shavell (1979) mentions that flat contracts minimize variance, in a context without limited liability. A related result with limited liability can be found in Plantin (2015). 14. The model has ambiguous comparative statics for the zero-effort distribution $$q$$. A mean-preserving spread perturbation to $$q$$ can decrease the optimal debt level, because higher volatility increases the value of the put option, or increase it, because it can increase the mean of $$p^{*}$$, decreasing the value of the put option. 15. It is possible to characterize the utility in the security design problem up to second order, for all security designs, not just those that are close to the optimal security design. In fact, to first order, the utility in the security design problem is exactly a mean-variance tradeoff (Online Appendix Proposition 3). 16. See Biais et al. (2007); Hellwig and Schmidt (2002); Sadzik and Stacchetti (2015) for analysis of the relationship between discrete and continuous time models. 17. The following conditions are sufficient.
For all $$V\in\mathbb{R}^{+}$$ and $$t\in[0,1]$$, $$\sigma(V,t)>0$$ and $$|b(V,t)|+|\sigma(V,t)|\leq C(1+|V|)$$ for some positive constant $$C$$. For all $$t\in[0,1]$$, $$V,V'\in\mathbb{R}^{+}$$, $$|b(V,t)-b(V',t)|+|\sigma(V,t)-\sigma(V',t)|\leq D|V-V'|$$, for some positive constant $$D$$. For all $$t\in[0,1]$$, $$\lim_{v\rightarrow0^{+}}\sigma(v,t)=0$$, and $$\lim_{v\rightarrow0^{+}}b(v,t)\geq0$$. 18. If the buyer and seller were risk-averse, but shared a common risk-neutral measure $$\tilde{P}$$, the problem would be identical. The key assumption in that case would be that the problem is small, in the sense that the outcome of this particular asset and security does not alter the common risk-neutral measure. 19. Note that this formulation rules out time discounting of the effort costs. One way to motivate this assumption is to suppose that neither agent discounts the future, but the seller is required to raise $$I$$ dollars to initiate the project. In this case, the gains from trade arise from the multiplier on this constraint (see Appendix Section B). 20. This condition is sufficient, not necessary. I have omitted the proof for brevity. 21. These flow cost functions will generate divergences $$D_{g}(P||Q)$$ that, like the KL divergence, have the property that their “second variations” are proportional to the Fisher information. This is the infinite-dimensional analogue of the mathematical property of invariant divergences that leads to the approximate optimality of debt. 22. I have not shown that there is a unique strong proper equilibrium; in theory, there could be multiple equilibria with different acceptance strategies by the buyer for security/price combinations that never occur in equilibrium. However, results in the proof of Proposition 8 lead me to believe this is not the case. 23. This argument uses the strategic form of the game, not the agent-strategic form (see Fudenberg and Tirole (1991), chapter 8.4).
That is, off-equilibrium security designs are assumed to be correlated with off-equilibrium actions by the seller. 24. This result depends on the convexity of the set $$M$$. 25. Actually, some price slightly lower but arbitrarily close to this price. 26. There are at least three possible alternative strategies. I could have required that the buyer’s strategy satisfy enough conditions to ensure compactness. Alternatively, I could have pursued the “limit-of-finite” approach described in Simon and Stinchcombe (1995). Finally, I could have attempted to explicitly construct the sequence of mixed strategies that generates the proper equilibrium. Each of these seemed to require significant technical work that is beyond the scope of this article. 27. After about one year, ~11% of securitized loans were in default, compared to ~8% of loans held in portfolio. REFERENCES ACHARYA V., MEHRAN H. and THAKOR A. V. (2016), “Caught Between Scylla and Charybdis? Regulating Bank Leverage When There is Rent Seeking and Risk Shifting”, The Review of Corporate Finance Studies, 5, 36–75. AGHION P. and BOLTON P. (1992), “An Incomplete Contracts Approach to Financial Contracting”, The Review of Economic Studies, 59, 473–494. ALI S. and SILVEY S. (1966), “A General Class of Coefficients of Divergence of One Distribution from Another”, Journal of the Royal Statistical Society. Series B (Methodological), 28, 131–142. AMARI S. and NAGAOKA H. (2007), Methods of Information Geometry, Vol. 191 (Providence, Rhode Island: American Mathematical Society). ANTIC N. (2015), “Contracting with Unknown Technologies” (Unpublished Paper, Princeton University). BARRON D., GEORGIADIS G. and SWINKELS J. (2017), “Optimal Contracts with a Risk-Taking Agent” (Unpublished manuscript). BEGLEY T. A. and PURNANANDAM A. K.
(2016), “Design of Financial Securities: Empirical Evidence from Private-label RMBS Deals”, The Review of Financial Studies, 30, 120–161. BIAIS B. and CASAMATTA C. (1999), “Optimal Leverage and Aggregate Investment”, The Journal of Finance, 54, 1291–1323. BIAIS B., MARIOTTI T., PLANTIN G. and ROCHET J. (2007), “Dynamic Security Design: Convergence to Continuous Time and Asset Pricing Implications”, The Review of Economic Studies, 74, 345–390. BIERKENS J. and KAPPEN H. J. (2014), “Explicit Solution of Relative Entropy Weighted Control”, Systems & Control Letters, 72, 36–43. BUBB R. and KAUFMAN A. (2014), “Securitization and Moral Hazard: Evidence from Credit Score Cutoff Rules”, Journal of Monetary Economics, 63, 1–18. CARROLL G. (2015), “Robustness and Linear Contracts”, The American Economic Review, 105, 536–563. ČENCOV N. N. (2000), Statistical Decision Rules and Optimal Inference, Vol. 53 (Providence, Rhode Island: American Mathematical Society). CHO I. K. (1987), “A Refinement of Sequential Equilibrium”, Econometrica: Journal of the Econometric Society, 1367–1389. CSISZÁR I. (1967), “Information-Type Measures of Difference of Probability Distributions and Indirect Observations”, Studia Sci. Math. Hungar., 2, 299–318. CVITANIĆ J., WAN X. and ZHANG J. (2009), “Optimal Compensation with Hidden Action and Lump-Sum Payment in a Continuous-time Model”, Applied Mathematics and Optimization, 59, 99–146. DANG T., GORTON G. and HOLMSTRÖM B. (2011), “Ignorance and the Optimality of Debt for Liquidity Provision” (Technical report, Working Paper, Yale University). DEMARZO P. and DUFFIE D.
(1999), “A Liquidity-Based Model of Security Design”, Econometrica, 67, 65–99. DEMARZO P. M. and SANNIKOV Y. (2006), “Optimal Security Design and Dynamic Capital Structure in a Continuous-Time Agency Model”, The Journal of Finance, 61, 2681–2724. DEMIROGLU C. and JAMES C. (2012), “How Important is Having Skin in the Game? Originator-Sponsor Affiliation and Losses on Mortgage-Backed Securities”, Review of Financial Studies, 25, 3217–3258. EDMANS A. and LIU Q. (2010), “Inside Debt”, Review of Finance, 15, 75–102. ELUL R. (2016), “Securitization and Mortgage Default”, Journal of Financial Services Research, 49, 281–309. FENDER I. and MITCHELL J. (2009), “Incentives and Tranche Retention in Securitisation: a Screening Model” (CEPR Discussion Paper No. DP7483). FUDENBERG D. and TIROLE J. (1991), “Game Theory”. GALE D. and HELLWIG M. (1985), “Incentive-Compatible Debt Contracts: The One-Period Problem”, The Review of Economic Studies, 52, 647–663. GORTON G. (2008), “The Panic of 2007” (Technical report, National Bureau of Economic Research). GROSSMAN S. J. and HART O. D. (1983), “An Analysis of the Principal-Agent Problem”, Econometrica: Journal of the Econometric Society, 51, 7–45. HANSEN L. and SARGENT T. (2008), Robustness (Princeton, New Jersey: Princeton University Press). HART O. and MOORE J. (1994), “A Theory of Debt Based on the Inalienability of Human Capital”, The Quarterly Journal of Economics, 109, 841–879. HELLWIG M. F. and SCHMIDT K. M. (2002), “Discrete–Time Approximations of the Holmström–Milgrom Brownian–Motion Model of Intertemporal Incentive Provision”, Econometrica, 70, 2225–2264.
HELLWIG M. (2009), “A Reconsideration of the Jensen-Meckling Model of Outside Finance”, Journal of Financial Intermediation, 18, 495–525. HOLMSTRÖM B. and MILGROM P. (1987), “Aggregation and Linearity in the Provision of Intertemporal Incentives”, Econometrica, 55, 303–328. INNES R. (1990), “Limited Liability and Incentive Contracting with Ex-ante Action Choices”, Journal of Economic Theory, 52, 45–67. JENSEN M. (1986), “Agency Costs of Free Cash Flow, Corporate Finance, and Takeovers”, The American Economic Review, 76, 323–329. JENSEN M. and MECKLING W. (1976), “Theory of the Firm: Managerial Behavior, Agency Costs and Ownership Structure”, Journal of Financial Economics, 3, 305–360. JIANG W., NELSON A. A. and VYTLACIL E. (2013), “Securitization and Loan Performance: Ex Ante and Ex Post Relations in the Mortgage Market”, Review of Financial Studies, 27, 454–483. KEYS B., MUKHERJEE T., SERU A., et al. (2010), “Did Securitization Lead to Lax Screening? Evidence from Subprime Loans”, The Quarterly Journal of Economics, 125, 307–362. KOHLBERG E. and MERTENS J. F. (1986), “On the Strategic Stability of Equilibria”, Econometrica: Journal of the Econometric Society, 1003–1037. KRAINER J. and LADERMAN E. (2014), “Mortgage Loan Securitization and Relative Loan Performance”, Journal of Financial Services Research, 45, 39–66. MATTHEWS S. A. (1995), “Renegotiation of Sales Contracts”, Econometrica: Journal of the Econometric Society, 567–589. MATTHEWS S. A. (2001), “Renegotiating Moral Hazard Contracts under Limited Liability and Monotonicity”, Journal of Economic Theory, 97, 1–29.
MIAN A. and SUFI A. (2009), “The Consequences of Mortgage Credit Expansion: Evidence from the U.S. Mortgage Default Crisis”, The Quarterly Journal of Economics, 124, 1449–1496. MONOYIOS M. (2013), “Malliavin Calculus Method for Asymptotic Expansion of Dual Control Problems”, SIAM Journal on Financial Mathematics, 4, 884–915. MYERSON R. B. (1978), “Refinements of the Nash Equilibrium Concept”, International Journal of Game Theory, 7, 73–80. MYERSON R. B. and RENY P. J. (2015), “Sequential Equilibria of Multi-stage Games with Infinite Sets of Types and Actions” (Manuscript, University of Chicago). NACHMAN D. C. and NOE T. H. (1994), “Optimal Design of Securities under Asymmetric Information”, Review of Financial Studies, 7, 1–44. NADAULD T. D. and SHERLUND S. M. (2013), “The Impact of Securitization on the Expansion of Subprime Credit”, Journal of Financial Economics, 107, 454–476. NADAULD T. D. and WEISBACH M. S. (2012), “Did Securitization Affect the Cost of Corporate Debt?”, Journal of Financial Economics, 105, 332–352. PLANTIN G. (2015), “Shadow Banking and Bank Capital Regulation”, Review of Financial Studies, 28, 146–175. PURNANANDAM A. (2010), “Originate-to-distribute Model and the Subprime Mortgage Crisis”, Review of Financial Studies, 24, 1881–1915. RAJAN U., SERU A. and VIG V. (2015), “The Failure of Models that Predict Failure: Distance, Incentives and Defaults”, Journal of Financial Economics, 115, 237–260. RAVID S. A. and SPIEGEL M.
( 1997 ), “Optimal Financial Contracts for a Start-up with Unlimited Operating Discretion” , Journal of Financial and Quantitative Analysis , 32 , 269 – 286 . Google Scholar Crossref Search ADS SADZIK T. and STACCHETTI E. ( 2015 ), “Agency Models With Frequent Actions” , Econometrica , 83 , 193 – 237 . Google Scholar Crossref Search ADS SANNIKOV Y. ( 2014 ), “Moral Hazard and Long-Run Incentives” (Working Paper No. 3430, Stanford Graduate School of Business) . SCHAETTLER H. and SUNG J. ( 1993 ), “The First-Order Approach to the Continuous-Time Principal–Agent Problem with Exponential Utility” , Journal of Economic Theory , 61 , 331 – 371 . Google Scholar Crossref Search ADS SHAVELL S. ( 1979 ), “Risk Sharing and Incentives in the Principal and Agent Relationship” , The Bell Journal of Economics , 55 – 73 . SIMS C. ( 2003 ), “Implications of Rational Inattention” , Journal of Monetary Economics , 50 , 665 – 690 . Google Scholar Crossref Search ADS SIMON L. K. and STINCHCOMBE M. B. ( 1995 ), “Equilibrium Refinement for Infinite Normal-Form Games” , Econometrica: Journal of the Econometric Society , 1421 – 1443 . TOWNSEND R. ( 1979 ), “Optimal Contracts and Competitive Markets with Costly State Verification” , Journal of Economic Theory , 21 , 265 – 93 . Google Scholar Crossref Search ADS VANASCO V. ( 2017 ), “The Downside of Asset Screening for Market Liquidity” , The Journal of Finance , 72 , 1937 – 1982 . Google Scholar Crossref Search ADS YANG M. ( 2015 ), “Optimality of Debt under Flexible Information Acquisition” (Available at SSRN 2103971) . © The Author(s) 2017. Published by Oxford University Press on behalf of The Review of Economic Studies Limited. 
This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model).
The Review of Economic Studies, Oxford University Press

Moral Hazard and the Optimality of Debt

Publisher
Oxford University Press
Copyright
© The Author(s) 2017. Published by Oxford University Press on behalf of The Review of Economic Studies Limited.
ISSN
0034-6527
eISSN
1467-937X
D.O.I.
10.1093/restud/rdx080

Abstract

I show that, in a benchmark model, debt securities minimize the welfare losses associated with the moral hazards of excessive risk-taking and lax effort. For any security design, the variance of the security payoff is a statistic that summarizes these welfare losses. Debt securities have the least variance, among all limited liability securities with the same expected value. In other models, mixtures of debt and equity are exactly optimal, and pure debt securities are approximately optimal. I study both static and dynamic security design problems, and show that these two types of problems are equivalent. I use moral hazard in mortgage lending as a recurring example, but my results apply to other corporate finance and principal-agent problems.

1. Introduction

Debt contracts are widespread, even though debt encourages excessive risk taking. In this article, I show that debt is the optimal security design in a model in which both reduced effort and excessive risk-taking are possible, even though debt leads to excessive risk taking. In the model, the seller of the security can alter the probability distribution of outcomes in arbitrary ways. This allows the seller to both alter the mean value of the outcome (‘effort’) and change the other moments of the distribution of outcomes (‘risk-shifting’). To minimize the welfare losses arising from this moral hazard, the security’s payout must be designed to minimize variance. Debt securities are optimal because, among all limited-liability securities with the same expected value, they have the least variance. The model is motivated by settings in which debt contracts are prevalent and both reduced effort and risk-shifting are possible. For example, in residential mortgage origination, lenders might be able to both underwrite loans more or less diligently (effort) and use private information to choose more or less risky borrowers (risk-shifting).
Prior to the 2008 financial crisis, mortgage lenders sold debt securities, backed by mortgage loans, to outside investors. The issuance of these securities may have weakened the incentives of mortgage lenders to lend prudently. Despite this effect, I argue that debt can be optimal, because debt securities balance the need to encourage effort with the need to avoid risk-shifting. Many elements of the model are standard in the security design literature. The security is the portion of the asset value received by the outside investors, and is subject to limited liability constraints. If the seller retains a levered equity claim, she1 has sold a debt security. There are gains from trade, meaning that the outside investors value the security more than the seller does, holding the distribution of outcomes fixed. Both the outside investors and the seller are risk-neutral. The key non-standard element of the model is a flexible form of moral hazard, which builds on the work of Holmström and Milgrom (1987). The seller, through her actions, can create a “zero-cost” distribution of outcomes, which she will do if she has no stake in the outcome. If the seller creates any other probability distribution, she incurs a cost. In my benchmark model, the cost to the seller of choosing a probability distribution $$p$$ is proportional to the Kullback-Leibler divergence (or “relative entropy”) of $$p$$ from the zero-cost distribution. Under this assumption, the combined effects of reduced effort and risk-shifting can be summarized by one statistic, the variance of the security payoff. The gains from trade are proportional to another statistic, the mean security payoff. Debt securities maximize mean-variance trade-offs over the set of limited liability securities, and are therefore optimal in this benchmark model. Minimizing the variance of the security payoff is equivalent to making the security “as flat as possible”. 
Intuitively, if the security pays the buyer more in state $$i$$ than in state $$j$$, the seller will inefficiently act to ensure that state $$i$$ is less likely than state $$j$$. Reducing the security payoff in state $$i$$ and increasing it in state $$j$$ would cause the seller to increase the likelihood of state $$i$$ relative to state $$j$$, benefitting the buyer. Completely flat securities would be best, but because of the limited liability constraints, the security can only be completely flat if it pays nothing at all and forgoes all of the gains from trade. Debt securities are the optimal compromise: they have positive expected value, capturing some gains from trade, but are flat wherever possible, minimizing inefficient actions by the seller. I also analyse larger classes of cost functions. When the cost function is not the KL divergence, but instead another $$\alpha$$-divergence, the optimal security designs exist on a continuum, with the “live-or-die” security of Innes (1990) at one end (see the Appendix, Figure A.1), equity at the other, and debt in the middle. In some cases, the security design is upward sloping, and can be thought of as a mix of equity and debt. In other cases, the optimal security design is downward sloping. In these cases, restricting security designs to be monotone for the buyer restores the optimality of debt. Both the KL divergence and the other $$\alpha$$-divergences are part of a broader class of divergences, the invariant divergences. For this class of divergences, I show that debt securities, and mixtures of debt and equity, are approximately optimal. The approximation I use applies when the moral hazard and gains from trade are small relative to the scale of the assets. It is appropriate in settings in which the difference, in utility terms, between a well-designed contract and a poorly designed contract is comparable to the seller’s “value added”.
I describe the approximation in more detail, and discuss when it is and is not appropriate, in Section 5. Under this approximation, debt is first-order optimal, meaning that debt securities are a detail-free way to achieve nearly the same utility as the optimal security design. Mixtures of debt and equity, which correspond to the optimal contracts for $$\alpha$$-divergences, are second-order optimal for all invariant divergences. This can be interpreted as a “pecking order”, in which the security design grows more complex as the size of both the moral hazard problem and gains from trade grow, relative to the scale of the assets. Finally, I provide a micro-foundation for the security design problem with the KL divergence cost function, using a dynamic model. I show that a continuous-time moral hazard problem, similar to Holmström and Milgrom (1987), is equivalent to the static moral hazard problem. The equivalence of the static and dynamic problems provides an intuitive explanation for how the seller can create any probability distribution of outcomes. The key distinction between the dynamic models I discuss and the principal-agent models of Holmström and Milgrom (1987) is limited liability. In Holmström and Milgrom (1987), linear contracts for the seller (agent) are optimal, because they induce the seller to take the same (efficient) action each period. In my model, because of limited liability, the only way to implement the efficient action at every state and time is to offer the seller a very large share of the asset value. However, offering the seller a large share of the asset value limits the gains from trade. It is preferable to pay the seller nothing in the worst states of the world, and then at some point offer a linear payoff. Even though this design does not induce the seller to take the efficient action at every state and time, it achieves more gains from trade. 
The design for the retained tranche that I have just described, levered equity, corresponds to selling a debt security. This optimality of debt in my benchmark model illustrates a key distinction between my model and the existing security design literature. The classic paper of Jensen and Meckling (1976) argues that debt securities are good at providing incentives for effort, but create incentives for risk-shifting, while equity securities avoid risk-shifting problems, but provide weak incentives for effort. A natural conjecture, based on these intuitions, is that when both risk-shifting problems and effort incentives are important, the optimal security will be “in between” debt and equity. In my benchmark model, contrary to this intuition, a debt security is optimal. The argument of Jensen and Meckling (1976) that debt is best for inducing effort relies on a restriction to monotone security designs. The “live-or-die” result of Innes (1990) shows that when the seller can supply effort to improve the distribution of outcomes (in a monotone likelihood ratio property sense), it is efficient to give the seller all of the asset value when the asset value is high, and nothing otherwise. A revised intuition, which I formalize in Section 4, is that the securities (including debt) that optimally balance encouraging effort and avoiding risk-shifting are “in between” the live-or-die security and equity.2 The benchmark model in this article takes the idea of flexibility in moral hazard problems to an extreme, allowing the seller to create any probability distribution of outcomes, subject to a cost. This approach to moral hazard problems was introduced by Holmström and Milgrom (1987). 
It is conceptually similar to the notion of flexible information acquisition, emphasized in Yang (2015).3 However, in this article, the cost of choosing a probability distribution should be interpreted as a cost associated with the actions required to cause that distribution to occur (e.g., underwriting or not underwriting mortgage loans). In the rational inattention literature, which Yang (2015) builds on, gathering or processing information (as opposed to taking actions) is costly. This distinction is blurred in the rational inattention micro-foundation in the Online Appendix, Section 2. In contrast, much of the literature on security design with moral hazard allows the seller to control only one or two parameters of the probability distribution. These papers do not find that debt is optimal. In Acharya et al. (2016), bank managers can both shift risk and pursue private benefits, but do this by choosing among three possible investments. Edmans and Liu (2010), who argue that it is efficient for the agent (not the principal) to hold debt claims, also have a binary project choice. Closer to this article is Biais and Casamatta (1999), in which there are three possible states and two levels of effort and risk-shifting. Biais and Casamatta (1999) interpret the optimal contracts over those three states as mixtures of debt and equity. Hellwig (2009) has a two-parameter model with continuous choices for risk-shifting and effort, and finds that a mix of debt and equity is optimal. In his model, risk-shifting is costless for the agent. Fender and Mitchell (2009) have a model of screening and tranche retention, which is a single-parameter model. This article differs from this literature by allowing for arbitrary outcome spaces, arbitrary probability distributions, and continuous moral hazard choices, which makes deriving general results difficult (Grossman and Hart, 1983), and by considering flexible models of moral hazard.
In the Online Appendix, Section 1, I discuss how to extend my results to parametric models, relating the framework I develop to this literature. Innes (1990) advocates a moral-hazard theory of debt, but debt is optimal only when the seller controls a single parameter, and the security is constrained to be monotone. If the security does not need to be monotone, or if the seller controls both the mean and variance of a log-normal distribution, the optimal contract is not debt.4 In the corporate finance setting, one argument for monotonicity is that a manager can borrow from a third party, claim higher profits, and then repay the borrowed money from the extra contract payments. In addition to the accounting and legal barriers to this kind of “secret borrowing”, the third party might find it difficult to force repayment. In the context of asset-backed securities, where cash flows are more easily verified, secret borrowing is even less plausible. Another argument in favour of monotonicity concerns the possibility of the buyer (principal, outside shareholders) sabotaging the project. In the context of securitization, the buyer exerts minimal control over the securitization trust and sabotage is not a significant concern. There is a large literature that justifies debt for reasons other than moral hazard. Papers invoking adverse selection include Nachman and Noe (1994), DeMarzo and Duffie (1999), Dang et al. (2011), Vanasco (2017), and Yang (2015). In unreported results, I find that the benchmark model of this article and of Yang (2015) can be combined to produce debt as the optimal contract, whereas other parametric models of moral hazard, when combined with Yang (2015), would not generally result in debt. Other theories of debt include costly state verification (Townsend, 1979; Gale and Hellwig, 1985) and explanations based on control or limiting investment (Jensen, 1986; Aghion and Bolton, 1992; Hart and Moore, 1994). 
I begin in Section 2 by explaining the benchmark security design problem, whose structure is used throughout the article. I then show in Section 3 that for a particular cost function, debt is optimal, and explain how this relates to a mean-variance trade-off. Next, I analyse other cost functions in Section 4, describing the optimal contracts and showing that a related mean-variance trade-off applies. I will then introduce an approximation in Section 5, and show that for an even larger class of cost functions, the same trade-offs hold in an approximate sense. In Section 6 and Section 7, I provide micro-foundations for the non-parametric models, from a continuous-time model. In the Appendix, Section C, I discuss a calibration for the example of residential mortgage lending. In the Online Appendix, Section 1, I discuss parametric models, and apply the results in Online Appendix, Section 2 to a model of rational inattention in mortgage lending.

2. Model Framework

In this section, I introduce the security design framework that I will discuss throughout the article. The problem is close to Innes (1990) and other papers in the security design literature. There is a risk-neutral agent, called the “seller”, who owns an asset in the first period. In the second period, one of $$N+1$$ possible states, indexed by $$i\in\Omega=\{0,1,\ldots,N\}$$, occurs.5 In each of these states, the seller’s asset has an undiscounted value of $$v_{i}$$. I assume that $$v_{0}=0$$, $$v_{i}$$ is non-decreasing in $$i$$, and that $$v_{N}>v_{0}$$. The seller discounts second period payoffs to the first period with a discount factor $$\beta_{s}$$. There is a second risk-neutral agent, the “buyer”, who discounts second period payoffs to the first period with a larger discount factor, $$\beta_{b}>\beta_{s}$$. Because the buyer values second period cash flows more than the seller, there are “gains from trade” if the seller gives the buyer a second period claim in exchange for a first period payment.
I will refer to the parameter $$\kappa=\frac{\beta_{b}-\beta_{s}}{\beta_{s}}$$ as the gains from trade.6 I assume there is limited liability, so that in each state the seller can credibly promise to pay at most the value of the asset. I also assume that the seller must offer the buyer a security, meaning that the second period payment to the buyer must be weakly positive. In this sense, the seller must offer the buyer an “asset-backed security”. When the asset takes on value $$v_{i}$$ in the second period, the security pays $$s_{i}\in[0,v_{i}]$$ to the buyer. Following the conventions of the literature, I will say that the security is a debt security if $$s_{i}=\min(v_{i},\bar{v})$$ for some $$\bar{v}\in(0,v_{N})$$. To simplify the exposition, I make particular assumptions about the timing of the events and the bargaining power of the agents. I will assume that, during the first period, the seller first designs the security, and then makes a “take-it-or-leave-it” offer to the buyer at price $$K$$. If the buyer rejects the offer, the seller retains the entire asset. After the buyer accepts or rejects the offer, the seller takes actions that modify the value of the assets (the moral hazard). The first period ends, uncertainty is resolved, and then in the second period payoffs are determined. This timing convention, which is standard in principal-agent models, is not appropriate for some applications. For example, in mortgage origination, much of the lender’s moral hazard occurs when the loans are being underwritten, before they are sold to outside investors. In the Appendix, Section B, I show that this timing of events is not necessary for the main results. This robustness to the timing of events contrasts with models based on adverse selection by the seller, such as DeMarzo and Duffie (1999), in which the timing of events is crucial. 
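The debt definition $$s_{i}=\min(v_{i},\bar{v})$$ and the limited liability constraint $$s_{i}\in[0,v_{i}]$$ can be illustrated with a small numerical sketch. The five-state asset, the uniform probabilities, and the face value below are hypothetical choices made only for illustration, and the variance is computed under a fixed distribution rather than one chosen by the seller. Under those assumptions, the sketch compares a debt security with an equity share that has the same expected payoff.

```python
# Hypothetical five-state example; all numbers are illustrative assumptions.
v = [0.0, 1.0, 2.0, 3.0, 4.0]   # asset values v_i, non-decreasing, with v_0 = 0
q = [0.2, 0.2, 0.2, 0.2, 0.2]   # an assumed, fixed probability distribution


def mean(x, prob):
    """Expected value of payoff vector x under distribution prob."""
    return sum(p_i * x_i for p_i, x_i in zip(prob, x))


def variance(x, prob):
    """Variance of payoff vector x under distribution prob."""
    m = mean(x, prob)
    return sum(p_i * (x_i - m) ** 2 for p_i, x_i in zip(prob, x))


def debt(values, face):
    """Debt security: pays min(v_i, face) in each state."""
    return [min(v_i, face) for v_i in values]


face = 2.0                                  # hypothetical face value v-bar
s_debt = debt(v, face)

# Equity share with the same expected payoff as the debt security.
share = mean(s_debt, q) / mean(v, q)
s_equity = [share * v_i for v_i in v]

# Both securities satisfy limited liability: 0 <= s_i <= v_i.
assert all(0.0 <= s_i <= v_i for s_i, v_i in zip(s_debt, v))
assert all(0.0 <= s_i <= v_i for s_i, v_i in zip(s_equity, v))

print("mean (debt, equity):    ", mean(s_debt, q), mean(s_equity, q))
print("variance (debt, equity):", variance(s_debt, q), variance(s_equity, q))
```

In this example the two securities have the same mean, and the debt payoff has the smaller variance because it is flat above the face value.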
In the same Appendix section, I also show that allowing the buyer and seller to Nash-bargain over the price, or over both the price and security design, does not alter the main results. The moral hazard problem occurs when the seller creates or modifies the asset. During this process, the seller will take a variety of actions, and these actions will alter the probability distribution of second period asset values. Following Holmström and Milgrom (1987), I model the seller as directly choosing a probability distribution, $$p$$, over the sample space $$\Omega$$, subject to a cost $$\psi(p)$$. I will focus on models in which any probability distribution $$p$$ can be chosen, which I will call “non-parametric”. In the Online Appendix, Section 1, I discuss models in which $$p$$ must belong to a parametric family of distributions.7 I will make several assumptions about the cost function $$\psi(p)$$. First, I assume that there is a unique probability distribution, $$q$$, with full support over $$\Omega$$, that minimizes the cost. Second, because I will not consider participation constraints for the seller, I assume without loss of generality that $$\psi(q)=0$$. I also assume that $$\psi(p)$$ is strictly convex and at least twice differentiable. Below, I will impose additional assumptions on the cost function, but first will describe the moral hazard and security design problems. The moral hazard occurs because the seller cares only about maximizing the value of her payoff. When the value of the asset is $$v_{i}$$, the discounted value of the seller’s retained tranche is
\[ \eta_{i}=\beta_{s}(v_{i}-s_{i}). \]
Because of the assumption that $$v_{0}=0$$, and limited liability, it is always the case that $$\eta_{0}=s_{0}=0$$. Let $$p^{i}$$ denote the probability that state $$i\in\Omega$$ occurs, under probability distribution $$p$$.
The moral hazard sub-problem of the seller can be written as
\begin{equation}
\phi(\eta)=\sup_{p\in M}\left\lbrace\sum_{i>0}\eta_{i}p^{i}-\psi(p)\right\rbrace,\tag{2.1}\label{eq:MH-general}
\end{equation}
where $$M$$ is the set of feasible probability distributions and $$\phi(\eta)$$ is the indirect utility function. In the non-parametric case, when $$M$$ is the set of all probability distributions on the sample space, the moral hazard problem has a unique optimal $$p$$ for each $$\eta$$. Moreover, the smoothness and convexity of $$\psi(p)$$ guarantee that this optimal policy, $$p(\eta)$$, is itself differentiable with respect to $$\eta$$. In contrast, for the parametric case (Online Appendix, Section 1), there may be multiple $$p\in M$$ that achieve the same optimal utility for the seller. The buyer cannot observe $$p$$ directly, but can infer the seller’s choice of $$p$$ from the design of the retained tranche $$\eta$$. At the security design stage, the buyer’s valuation of a security $$s$$ is determined by both the structure of the security and the buyer’s inference about which probability distribution the seller will choose, $$p(\eta)$$. Without loss of generality, I will define the units of the seller’s and buyer’s payoffs so that $$\beta_{s}\sum_{i}v_{i}p^{i}(\beta_{s}v_{i})=1$$. That is, if the seller retains the entire asset, and takes actions in the moral hazard problem accordingly, the discounted asset value is one. I use this convention to ensure that the units correspond to a quantity that is at least potentially observable: the value of the assets, if those assets are retained by the seller. This convention is useful in the calibration of the model in the Appendix, Section C. Let $$s_{i}(\eta)$$ be the security corresponding to retained tranche $$\eta$$.
The security design problem is
\begin{equation}
U(\eta^{*})=\max_{\eta}\left\lbrace\beta_{b}\sum_{i>0}p^{i}(\eta)s_{i}(\eta)+\phi(\eta)\right\rbrace,\tag{2.2}\label{eq:sec-util-eq}
\end{equation}
subject to the limited liability constraint that $$\eta_{i}\in[0,\beta_{s}v_{i}]$$. From the seller’s perspective, when she is designing the security, she internalizes the effect that her subsequent choice of $$p$$ will have on the buyer’s valuation, because that valuation determines the price at which she can sell the security. The security serves as a commitment device for the seller, providing an incentive for her to choose a favorable $$p$$. This commitment is costly, because allocating more of the available asset value to the retained tranche necessarily reduces the payout of the security, reducing the gains from trade. Many of the results in this article are discussed using perturbation arguments. Any infinitesimal perturbation to the security design (and therefore retained tranche) has two effects on the seller’s utility in the security design problem. The first effect is the “direct” effect, which changes the seller’s utility by transferring more or less expected value from the seller to the buyer. In general, the size of this effect is controlled by the gains from trade parameter, $$\kappa$$. The second effect is the “indirect” effect, which changes the buyer’s valuation of the security, through the change in the seller’s behaviour in the moral hazard problem. There is no “indirect” effect on the seller’s utility in the moral hazard problem, because the seller is maximizing her utility in the moral hazard problem when she chooses the probability distribution (the envelope theorem). Consider a differentiable perturbation around the optimal security design, $$\eta(\epsilon)$$, with $$\eta(0)=\eta^{*}$$, that is feasible for some $$\epsilon>0$$. As mentioned above, in the non-parametric models that I study, $$p(\eta)$$ is differentiable.
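The objective in the security design problem (2.2) can be evaluated numerically once a cost function is fixed. The sketch below assumes the Kullback-Leibler cost $$\psi(p)=\theta D_{KL}(p||q)$$ introduced later in this section, for which the seller's best response has the standard closed form $$p^{i}\propto q^{i}\exp(\eta_{i}/\theta)$$ with $$\phi(\eta)=\theta\ln\sum_{i}q^{i}\exp(\eta_{i}/\theta)$$. The state space, parameter values, and function names are hypothetical, the unit normalization used in the article is not imposed, and only debt securities on a coarse grid of face values are searched, so this is an illustration of the objective rather than a solution of the full problem.

```python
import math


def seller_best_response(eta, q, theta):
    """Closed-form best response under an assumed KL cost theta * D_KL(p||q).

    Returns the chosen distribution p and the indirect utility phi(eta).
    """
    weights = [q_i * math.exp(e_i / theta) for q_i, e_i in zip(q, eta)]
    z = sum(weights)
    p = [w / z for w in weights]
    return p, theta * math.log(z)


def design_utility(face, v, q, theta, beta_s, beta_b):
    """Objective of (2.2) for a debt security with face value `face`."""
    s = [min(v_i, face) for v_i in v]                      # security payoff
    eta = [beta_s * (v_i - s_i) for v_i, s_i in zip(v, s)] # retained tranche
    p, phi = seller_best_response(eta, q, theta)
    return beta_b * sum(p_i * s_i for p_i, s_i in zip(p, s)) + phi


# Hypothetical parameters, chosen only for illustration.
v = [0.0, 1.0, 2.0, 3.0, 4.0]
q = [0.2, 0.2, 0.2, 0.2, 0.2]
theta, beta_s, beta_b = 1.0, 0.9, 0.95

grid = [k / 10 for k in range(41)]          # face values from 0.0 to 4.0
utilities = [design_utility(f, v, q, theta, beta_s, beta_b) for f in grid]
best_index = max(range(len(grid)), key=lambda k: utilities[k])

print("best face value on the grid:", grid[best_index])
print("utility at that face value: ", utilities[best_index])
```

A face value of zero corresponds to the seller retaining the entire asset, and a face value of $$v_{N}$$ corresponds to selling the entire asset, so the grid spans both extremes.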
In this case, the two effects of a perturbation can be summarized by the following first-order optimality condition with respect to $$\epsilon$$, the size of the perturbation:
\begin{equation}
\frac{\partial U(\eta(\epsilon))}{\partial\epsilon}|_{\epsilon=0^{+}}=-\underbrace{\kappa\sum_{i\in\Omega}p^{i}(\eta^{*})\frac{\partial\eta_{i}}{\partial\epsilon}|_{\epsilon=0^{+}}}_{\text{direct effect}}+\underbrace{\beta_{b}\sum_{i,j\in\Omega}s_{j}^{*}\frac{\partial p^{j}(\eta)}{\partial\eta_{i}}|_{\eta=\eta^{*}}\frac{\partial\eta_{i}}{\partial\epsilon}|_{\epsilon=0^{+}}}_{\text{indirect effect}}\leq0.\tag{2.3}\label{eq:lagrangian-foc}
\end{equation}
Below, I will further decompose the indirect effect into an indirect effect due to a change in effort and an indirect effect due to a change in risk-shifting. First, however, I will describe the cost functions that I will be studying in more detail. As discussed earlier, the cost function $$\psi(p)$$ is convex and minimized at $$\psi(q)=0$$. It follows that the cost function is proportional to a divergence8 between $$p$$ and the zero-cost distribution, $$q$$, defined for all $$p,q\in M$$:
\[ \psi(p)=\theta D(p||q). \]
Here, the scalar parameter $$\theta>0$$ controls how costly it is for the seller to change the probability distribution in the moral hazard problem. I introduce this parameter for the purpose of taking comparative statics. There are many divergences that have been defined in the information theory literature (e.g. Ali and Silvey, 1966; Csiszár, 1967; Amari and Nagaoka, 2007). In Section 3, I begin the article by focusing on a particular divergence, the Kullback-Leibler divergence. The KL divergence, also called relative entropy, is defined as
\[ D_{KL}(p||q)=\sum_{i\in\Omega}p^{i}\ln\left(\frac{p^{i}}{q^{i}}\right). \]
The KL divergence has the assumed convexity and differentiability properties, and also guarantees that the $$p$$ chosen by the seller will be mutually absolutely continuous with respect to $$q$$.
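With the KL cost $$\psi(p)=\theta D_{KL}(p||q)$$, the moral hazard sub-problem (2.1) has a well-known closed-form solution (the Gibbs variational formula): the seller tilts the zero-cost distribution exponentially, $$p^{i}\propto q^{i}\exp(\eta_{i}/\theta)$$, and the indirect utility is $$\phi(\eta)=\theta\ln\sum_{i}q^{i}\exp(\eta_{i}/\theta)$$. The sketch below checks this numerically for one hypothetical retained tranche; the numbers and function names are assumptions made for illustration and are not taken from the article.

```python
import math


def kl_divergence(p, q):
    """D_KL(p||q) = sum_i p_i * ln(p_i / q_i), with 0 * ln(0) treated as 0."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)


def seller_objective(p, eta, q, theta):
    """Seller's payoff in (2.1): expected retained value minus the KL cost."""
    expected = sum(p_i * e_i for p_i, e_i in zip(p, eta))
    return expected - theta * kl_divergence(p, q)


def seller_best_response(eta, q, theta):
    """Exponential tilting of q: p_i proportional to q_i * exp(eta_i / theta)."""
    weights = [q_i * math.exp(e_i / theta) for q_i, e_i in zip(q, eta)]
    z = sum(weights)
    return [w / z for w in weights], theta * math.log(z)


# Hypothetical four-state example.
q = [0.4, 0.3, 0.2, 0.1]        # zero-cost distribution (full support)
eta = [0.0, 0.0, 0.5, 1.0]      # discounted retained tranche, with eta_0 = 0
theta = 0.5

p_star, phi = seller_best_response(eta, q, theta)

print("chosen distribution:", p_star)
print("indirect utility:   ", phi)
print("objective at p_star:", seller_objective(p_star, eta, q, theta))
print("objective at q:     ", seller_objective(q, eta, q, theta))
```

When the retained tranche is flat (for example, identically zero), the tilting weights are constant and the seller chooses $$q$$ itself, which matches the description of $$q$$ as the zero-cost distribution.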
The KL divergence has been used in a variety of economic models, notably Hansen and Sargent (2008), who use it to describe the set of models a robust decision maker considers. It also has many applications in econometrics, statistics, and information theory, and the connection between the security design problem and these topics will be discussed later in the article. I will show that when the cost function is proportional to the KL divergence, debt is the optimal security design. The KL divergence is a member of the family of $$\alpha$$-divergences. These divergences are parametrized by a real number, $$\alpha$$, which controls how the curvature of the divergence changes as $$p$$ moves away from $$q$$. The $$\alpha$$-divergences can be written, whenever $$|\alpha|\neq1$$, as
\[ D_{\alpha}(p||q)=\sum_{i\in\Omega}\frac{4}{1-\alpha^{2}}q^{i}\left(1-\left(\frac{p^{i}}{q^{i}}\right)^{\frac{1}{2}(1-\alpha)}+\frac{1}{2}(1-\alpha)\left(\frac{p^{i}}{q^{i}}-1\right)\right). \]
The limits of $$\alpha\rightarrow-1$$ and $$\alpha\rightarrow1$$ correspond to the KL divergence and the “reversed” KL divergence, respectively.9 For this class of divergences, in Section 4 I will show that, for $$\alpha\leq-1$$, the optimal contracts are mixtures of debt and equity. Commonly discussed $$\alpha$$-divergences include the Hellinger distance ($$\alpha=0$$) and the $$\chi^{2}$$-divergence ($$\alpha=-3$$). I will also discuss a more general class of divergences, that contains the $$\alpha$$-divergences, known as the “$$f$$-divergences”. This class of divergences can be written as
\begin{equation}
D_{f}(p||q)=\sum_{i\in\Omega}q^{i}f\left(\frac{p^{i}}{q^{i}}\right),\tag{2.4}\label{eq:F-Div-Def}
\end{equation}
where $$f(u)$$ is a convex function on $$\mathbb{R}^{+}$$ with $$f(1)=0$$.
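The $$\alpha$$-divergence formula above can be checked numerically against the special cases mentioned in the text. The sketch below implements the formula as an $$f$$-divergence of the form (2.4) and compares it with three direct computations: the KL divergence (approached as $$\alpha\rightarrow-1$$), a $$\chi^{2}$$-type expression at $$\alpha=-3$$, and a Hellinger-type expression at $$\alpha=0$$. The scaling constants in the last two comparisons follow from the formula itself; the distributions and function names are hypothetical.

```python
import math


def alpha_divergence(p, q, alpha):
    """alpha-divergence of p from q, using the formula in the text (|alpha| != 1)."""
    if abs(abs(alpha) - 1.0) < 1e-12:
        raise ValueError("the formula is only defined for |alpha| != 1")
    k = 0.5 * (1.0 - alpha)
    scale = 4.0 / (1.0 - alpha ** 2)
    total = 0.0
    for p_i, q_i in zip(p, q):
        u = p_i / q_i
        total += scale * q_i * (1.0 - u ** k + k * (u - 1.0))
    return total


def kl_divergence(p, q):
    """D_KL(p||q) for strictly positive p and q."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q))


# Hypothetical distributions on three states.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

# alpha = -3: the formula reduces to (1/2) * sum_i (p_i - q_i)^2 / q_i.
half_chi_square = 0.5 * sum((p_i - q_i) ** 2 / q_i for p_i, q_i in zip(p, q))

# alpha = 0: the formula reduces to 2 * sum_i (sqrt(p_i) - sqrt(q_i))^2.
hellinger_type = 2.0 * sum(
    (math.sqrt(p_i) - math.sqrt(q_i)) ** 2 for p_i, q_i in zip(p, q)
)

print("alpha = -3 :", alpha_divergence(p, q, -3.0), "vs", half_chi_square)
print("alpha =  0 :", alpha_divergence(p, q, 0.0), "vs", hellinger_type)
print("alpha -> -1:", alpha_divergence(p, q, -1.0 + 1e-4), "vs", kl_divergence(p, q))
```

At $$\alpha=-3$$ the implied $$f$$-function is $$f(u)=\frac{1}{2}(u-1)^{2}$$, which satisfies $$f(1)=0$$ and has second derivative equal to one at $$u=1$$.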
I adopt the convention (without loss of generality) that $$f(u)\geq0$$.10 I will limit my discussion to sufficiently differentiable $$f$$-functions, for mathematical convenience, and use the normalization that $$f''(1)=1$$. The $$f$$-divergences are analytically convenient because they are additively separable (or “decomposable”) across states. The most general class of divergences that I will discuss are the “invariant divergences”, which contain the $$f$$-divergences, along with other divergences that are not additively separable, such as the Chernoff and Bhattacharyya distances. Invariant divergences are defined by their invariance with respect to sufficient statistics (Čencov, 2000; Amari and Nagaoka, 2007). The exact definition of an invariant divergence is rather technical; for our purposes, what is special about these divergences is that, up to second order, they resemble the KL divergence, and up to third order, they resemble the $$\alpha$$-divergences. In Section 5, I will define this “resemblance” more precisely, and define how a security design can be “approximately optimal”. I will then show that debt, or mixtures of debt and equity, are approximately optimal as a result. To summarize, the divergences I discuss are related in the following way:
\[ KL\in\alpha\text{-divergences}\subset f\text{-divergences}\subset\text{Invariant Divergences}\subset\text{All Divergences}. \]
The KL divergence, and the broader class of invariant divergences, are interesting because they are closely related to ideas from information theory. In the Online Appendix, Section 2, I illustrate this in a model based on rational inattention (Sims, 2003), in which the cost function is related to the KL divergence. The KL divergence cost function can also be micro-founded from a dynamic moral hazard problem.
In Section 6, I show that a large class of continuous time problems are equivalent to the static moral hazard problem with a divergence cost function, and show that in a particular case, that divergence is the KL divergence. In Section 7, I extend this analysis to a more general class of continuous time problems and show that they are related, in a certain sense, to static moral hazard problems with invariant divergence cost functions. I will refer throughout the article to “effort” and “risk-shifting” as separate components of the moral hazard problem. Next, I will define “effort” and “risk-shifting” formally, and clarify the connection between this framework and more conventional models of moral hazard. I define “effort” as the change in the discounted expected value of the assets:
\[ e=\beta_{s}\sum_{i\in\Omega}(p^{i}-q^{i})v_{i}. \]
Given a retained tranche $$\eta$$, define the effort it induces as $$e(\eta)$$. For any $$\eta$$, there is an “equivalent equity share”, $$\gamma(\eta)$$, for the seller that would induce the same amount of effort: $$e(\eta)=e(\gamma(\eta)\beta_{s}v)$$.11 In the model of Innes (1990), the seller is restricted to choosing from a family of probability distributions that satisfy a monotone likelihood ratio property. As a result, effort, defined in this way, is one-to-one with the choice variable in Innes (1990). In models with more flexible moral hazard, effort is not one-to-one with the choices of the agent. In these models, we can define “risk-shifting” as the actions that the agent takes which change the probability distribution of outcomes without changing the expected value of the asset. This includes actions that change the higher moments of the asset distribution, and also actions that keep the distribution of asset values constant, but move probability between states with the same asset value ($$i,j\in\Omega$$ with $$v_{i}=v_{j}$$).
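The definitions of effort and of the equivalent equity share can be made concrete under additional assumptions. The sketch below assumes the KL cost function, so that the seller's choice is $$p^{i}\propto q^{i}\exp(\eta_{i}/\theta)$$, and computes $$e(\eta)$$ for a levered-equity retained tranche (the seller has sold debt). It then finds $$\gamma(\eta)$$ by bisection on the retained equity share, which relies on effort being increasing in that share under the KL cost. All parameter values and function names are hypothetical.

```python
import math


def best_response(eta, q, theta):
    """Seller's chosen distribution under an assumed KL cost: p_i ~ q_i * exp(eta_i / theta)."""
    weights = [q_i * math.exp(e_i / theta) for q_i, e_i in zip(q, eta)]
    z = sum(weights)
    return [w / z for w in weights]


def effort(eta, v, q, theta, beta_s):
    """e(eta) = beta_s * sum_i (p_i(eta) - q_i) * v_i."""
    p = best_response(eta, q, theta)
    return beta_s * sum((p_i - q_i) * v_i for p_i, q_i, v_i in zip(p, q, v))


def equivalent_equity_share(eta, v, q, theta, beta_s, iterations=200):
    """Bisection for gamma in [0, 1] with e(gamma * beta_s * v) = e(eta)."""
    target = effort(eta, v, q, theta, beta_s)
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        equity_tranche = [mid * beta_s * v_i for v_i in v]
        if effort(equity_tranche, v, q, theta, beta_s) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical parameters.
v = [0.0, 1.0, 2.0, 3.0, 4.0]
q = [0.2, 0.2, 0.2, 0.2, 0.2]
theta, beta_s = 1.0, 0.9
face = 2.0

# Retained levered equity when a debt security with face value 2 is sold.
eta_levered = [beta_s * max(v_i - face, 0.0) for v_i in v]

gamma = equivalent_equity_share(eta_levered, v, q, theta, beta_s)
print("effort induced by levered equity:", effort(eta_levered, v, q, theta, beta_s))
print("equivalent equity share gamma:   ", gamma)
```

The difference between the levered-equity tranche and the tranche $$\gamma\beta_{s}v$$ is what the text attributes to risk-shifting: both induce the same effort, but they induce different distributions over states.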
Using these definitions of effort and risk-shifting, I decompose the indirect effect of any security design perturbation (equation 2.3) into effort and risk-shifting components. Lemma 1. The indirect effect of any security design perturbation can be decomposed into an effect due to the change in effort, and an effect due to the change in risk shifting: \begin{align*} \underbrace{\beta_{b}\sum_{j\in\Omega}s_{j}^{*}\frac{dp^{j}(\eta(\epsilon))}{d\epsilon}|_{\epsilon=0^{+}}}_{\textit{indirect effect}} & =\underbrace{\frac{\beta_{b}}{\beta_{s}}(1-\gamma(\eta^{*}))\frac{de(\eta(\epsilon))}{d\epsilon}|_{\epsilon=0^{+}}}_{\textit{indirect effect on effort}}-\\ & \underbrace{\frac{\beta_{b}}{\beta_{s}}\sum_{j\in\Omega}\frac{dp^{j}(\eta(\epsilon))}{d\epsilon}|_{\epsilon=0^{+}}(\eta_{j}^{*}-\gamma(\eta^{*})\beta_{s}v_{j})}_{\textit{indirect effect on risk shifting}}. \end{align*} Proof. See Online Appendix, Section 3.1. ǁ This decomposition is not unique; there are many other ways of decomposing the indirect effects into different components. This particular decomposition connects the flexible moral hazard framework used in this article to other models of moral hazard. Using this definition of effort and risk-shifting, an equity contract causes no utility loss due to risk-shifting, because an equity contract is identical to its “equivalent equity” contract, consistent with the argument of Jensen and Meckling (1976). However, equity contracts might not be a very efficient way to induce effort by the seller. If the effort level is one-to-one with the seller’s choices (as in Innes (1990)), there is no possibility of risk-shifting, and this framework reduces to the classic model of moral hazard. Moral hazard models with two choice parameters, such as Hellwig (2009), allow the seller to risk-shift in one dimension, while also incorporating an effort choice. 
The non-parametric model of moral hazard emphasized in this article extends these models by allowing more dimensions of risk-shifting. In models with only one dimension of risk-shifting, if there are many possible outcomes ($$i.e.$$ more than the three in Biais and Casamatta, 1999), there will in general be contracts other than equity contracts that also induce no risk-shifting. In contrast, in the non-parametric model of moral hazard, equity contracts are the only contracts that avoid risk-shifting entirely. The decomposition also illustrates the externalities associated with the seller’s choices in the moral hazard problem. The buyer benefits from an increase in the seller’s effort, assuming that the seller’s equivalent equity share is less than 100%. At the same time, the buyer can benefit or be harmed by the change in the seller’s risk shifting behaviour, depending on whether the change in the security design induces more or less risk shifting. I will show in the following sections that the effect of a perturbation to the security design on risk shifting depends on whether the security becomes more or less equity-like. The models described in the article use divergences to create cost functions, which rules out two interesting cases: free disposal of output by the seller and free risk-shifting. Free disposal of output by the seller is a common assumption in security design problems, and is used to justify restricting the set of securities to designs for which the seller’s payoff is weakly increasing in the asset value. Free disposal of output does not change any of the results in the article—all of the optimal security designs without free disposal have monotone payoffs for the seller, and are therefore still optimal among the set of monotone security designs. I discuss this more in the Appendix, Section D. Free risk-shifting is the assumption that only effort, and not risk-shifting, is costly for the agent. 
Formally, this would require that $$D(p||q)=D(p'||q)$$ for all $$p,p'$$ with the same expected value. Technically, the assumptions of strict convexity for $$D(p||q)$$ and that $$D(p||q)=0$$ only if $$p=q$$ both rule out this case. However, the analysis in this case is straightforward. As risk-shifting becomes free, concerns about risk-shifting dominate concerns about effort, and equity contracts are optimal. This result is closely related to Ravid and Spiegel (1997), Carroll (2015), and Barron et al. (2017), and is also shown in the Appendix, Section D. In this section, I have introduced the framework that I will use throughout the article. In the next section, I analyse the benchmark model, in which the cost function is the KL divergence. 3. The Benchmark Model In this section, I discuss the non-parametric version of the model, in which the set $$M$$ of feasible probability distributions is the set of all probability distributions on $$\Omega$$. I assume that the cost function is proportional to the KL divergence between $$p$$ and $$q$$, \[ \psi(p)=\theta D_{KL}(p||q). \] I will show that the optimal security design is a debt contract. In the text, I will outline the proof, using a perturbation argument; a complete proof can be found in the Appendix, Section 3.5.12 I will start by discussing the first-order condition of the moral hazard problem. The KL divergence cost function becomes infinitely sloped at the boundaries of the simplex, and therefore guarantees an interior solution to the moral hazard problem, equation 2.1, for all $$\eta$$. The KL divergence is also convex, consistent with the assumptions described in the previous section. As a result, the first-order condition in the moral hazard problem must hold. For any $$i>0$$, we have \[ \eta_{i}=\theta\left(\ln\left(\frac{p^{i}}{q^{i}}\right)-\ln\left(\frac{p^{0}}{q^{0}}\right)\right). 
\] Intuitively, if the seller receives a high payoff in state $$i$$, she will increase the probability of state $$i$$ relative to state $$0$$, in which she receives zero payoff. From this first-order condition, we can observe that the semi-elasticities of the relative probabilities $$p^{i}(\eta)$$ and $$p^{0}(\eta)$$ to the payoff $$\eta_{i}$$ satisfy \begin{equation} \frac{\partial\ln(p^{i}(\eta))}{\partial\eta_{i}}-\frac{\partial\ln(p^{0}(\eta))}{\partial\eta_{i}}=\theta^{-1}.\label{eq:elasticity-KL} \end{equation} (3.1) This constant difference of semi-elasticities property is part of what is special about the KL divergence. It is constant in two respects: first, the difference of the elasticities does not depend on how far $$p(\eta)$$ is from $$q$$, and second, it is symmetric across the states $$i\in\Omega$$. The $$\alpha$$-divergences that will be discussed in the next section relax the first of these properties: the elasticity will depend on how far the endogenous probability distribution is from the zero-cost distribution. The entire class of invariant divergences, which are used throughout the article, share the second property, imposing a sort of symmetry across states of the world (this is essentially the meaning of “invariant”). Using this property, we can construct perturbations of the retained tranche (and therefore the security design) that change the probabilities of two different states, $$p^{i}$$ and $$p^{j}$$, with $$i>0$$ and $$j>0$$, while leaving all other probabilities unchanged. Let $$\eta^{*}$$ be the optimal design for the retained tranche. Suppose that, starting from $$\eta^{*}$$, we increase $$\eta_{i}$$ by an amount $$\frac{\epsilon}{p^{i}(\eta^{*})}$$, while decreasing $$\eta_{j}$$ by an amount $$\frac{\epsilon}{p^{j}(\eta^{*})}$$. 
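The first-order condition and the two-state perturbation just described can be checked numerically. The sketch below is only an illustration under stated assumptions: it assumes the seller solves $$\max_{p}\sum_{i}p^{i}\eta_{i}-\theta D_{KL}(p||q)$$ with $$\eta_{0}=0$$ (which is consistent with the first-order condition above and implies $$p^{i}\propto q^{i}e^{\eta_{i}/\theta}$$), and the function name `seller_choice_kl` and all numerical values are hypothetical.

```python
import numpy as np

def seller_choice_kl(eta, q, theta):
    """Hypothetical helper: seller's optimal distribution under a KL cost.

    Assumes the seller maximizes sum_i p^i * eta_i - theta * KL(p || q),
    whose solution is p^i proportional to q^i * exp(eta_i / theta).
    """
    w = q * np.exp((eta - eta.max()) / theta)  # subtract the max for numerical stability
    return w / w.sum()

# Illustrative (made-up) inputs: four states, state 0 pays the seller nothing.
q = np.array([0.1, 0.2, 0.4, 0.3])
eta = np.array([0.0, 0.05, 0.2, 0.5])
theta = 0.8

p = seller_choice_kl(eta, q, theta)

# First-order condition: eta_i = theta * (ln(p^i/q^i) - ln(p^0/q^0)).
foc_rhs = theta * (np.log(p / q) - np.log(p[0] / q[0]))

# Perturbation: raise eta_i by eps/p^i and lower eta_j by eps/p^j.
i, j, eps = 2, 3, 1e-6
eta_pert = eta.copy()
eta_pert[i] += eps / p[i]
eta_pert[j] -= eps / p[j]

# Finite-difference response of the probabilities, per unit of eps.
dp = (seller_choice_kl(eta_pert, q, theta) - p) / eps

# To first order, only states i and j move, by +1/theta and -1/theta.
expected = np.zeros_like(p)
expected[i] = 1.0 / theta
expected[j] = -1.0 / theta
print(np.round(dp, 4), expected)
```

Under these assumptions, the finite-difference response is concentrated on the two perturbed states, with the probability of state $$0$$ and of every other state unchanged to first order.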
Conjecture that this perturbation, for infinitesimal values of $$\epsilon$$, increases $$p^{i}$$ and decreases $$p^{j}$$ by $$\theta^{-1}\epsilon$$, while leaving all other probabilities, and in particular $$p^{0}$$, unchanged. We can verify this conjecture by observing that equation 3.1 above is satisfied for all states, and that the sum of the probabilities across states remains equal to one. Having constructed this perturbation, I now turn to the security design problem. Consider the following property of debt: for a security $$s$$ to be a debt, there must be no pairs $$s_{i}$$ and $$s_{j}$$, with $$i\neq j$$, such that $$s_{j}<v_{j}$$ and $$s_{j}<s_{i}$$. This property requires that if the limited liability constraint does not bind in either state $$i$$ or state $$j$$, the security values must be equal, and if the constraint binds only in one of the two states, the payoff in that state must be smaller than in the “flat” part of the debt contract. It is essentially the definition of a debt contract, subject to the caveat that “selling everything” and “selling nothing” also have this property. Suppose that the optimal security design $$s^{*}$$ does not have this property (and therefore is not debt). For this to be true, there must be no perturbation of the security design that is feasible and can improve the seller’s utility in the security design problem. Using the perturbation described above, I will show that such a perturbation does exist, and therefore that the optimal contract is a debt (or selling everything/nothing, which are ruled out in the proof in the Appendix). We have supposed that, for the optimal security design $$s^{*}$$, there is a pair of states $$i,j\in\Omega$$, $$i\neq j$$, with $$s_{j}^{*}<v_{j}$$ and $$s_{j}^{*}<s_{i}^{*}$$. Now imagine that we increase $$s_{j}$$ by $$\beta_{s}^{-1}\frac{\epsilon}{p^{j}(\eta^{*})}$$ while decreasing $$s_{i}$$ by $$\beta_{s}^{-1}\frac{\epsilon}{p^{i}(\eta^{*})}$$. 
The values of the retained tranche in those states, $$\eta_{i}$$ and $$\eta_{j}$$, move opposite the security design and are perturbed in exactly the manner discussed above. Note that, because $$s_{j}^{*}<v_{j}$$ and $$s_{i}^{*}>s_{j}^{*}\geq0$$, this perturbation does not violate the limited liability constraints. The effect of this perturbation on the utility in the security design problem is described by equation 2.3 in the previous section. We can see that there is no “direct effect” of this perturbation; holding the probability distribution the seller chooses fixed, the perturbation does not affect the expected value of the security design. The perturbation does increase the probability of state $$i$$ by $$\theta^{-1}\epsilon$$, and it decreases the probability of state $$j$$ by $$\theta^{-1}\epsilon$$, leaving the probability of all other states the same. Therefore, the “indirect” effect is $$\theta^{-1}(s_{i}^{*}-s_{j}^{*})$$, which was assumed to be greater than zero. It follows that this perturbation improves the seller’s utility, and therefore the optimal contract must be a debt, selling everything, or selling nothing. This argument can be summarized as showing that the security design should be “flat wherever possible.” After introducing the formal result, I will apply the decomposition between effort and risk-shifting introduced in the previous section. The following proposition summarizes this perturbation argument, rules out selling everything and selling nothing, and also establishes a result about the face value of the debt contract. Proposition 1. In the non-parametric model, with the cost function proportional to the Kullback-Leibler divergence, the optimal security design is a debt contract, \[ s_{j}^{*}=\min(v_{j},\bar{v}), \] for some $$\bar{v}>0$$. The face value of the debt satisfies \[ \beta_{b}\bar{v}-\beta_{b}\sum_{i\in\Omega}p^{i}(\eta^{*})s_{i}^{*}=\kappa\theta. 
\] If the highest possible asset value is sufficiently large ($$v_{N}>\sum_{i}q^{i}v_{i}+\frac{\kappa}{\beta_{b}}\theta$$), then $$\bar{v}<v_{N}$$. Proof. The results are proven in the proof of Proposition 3. ǁ The result in Proposition 1 shows that debt is optimal, for any full-support zero-cost distribution $$q$$. The condition that $$v_{N}$$ be “high enough” is weak. If it were not satisfied for some sample space $$\Omega$$ and zero-cost distribution $$q$$, one could include a new highest value $$v_{N+1}$$ in $$\Omega$$, occurring with vanishingly small probability under $$q$$, such that the condition was satisfied. Intuitively, the sample space must contain high enough values to observe the “flat” part of the debt security. The perturbation argument described above led to the conclusion that the security design should be flat wherever possible. A different way to view the same idea, which is mathematically equivalent, can be derived by analysing the indirect effect described above. The following corollary describes the direct and indirect effects of any perturbation in the security design problem, and decomposes the “indirect effect” into effort-only and risk-shifting components. Corollary 1. Under the conditions of Proposition 1, the effect of any perturbation is \[ \frac{\partial U(\eta(\epsilon))}{\partial\epsilon}|_{\epsilon=0^{+}}=\underbrace{\kappa\frac{\partial}{\partial\epsilon} E^{p(\eta^{*})}[\beta_{s}s(\epsilon)]|_{\epsilon=0^{+}}}_{\textit{direct effect}}-\underbrace{\frac{1}{2}\frac{\beta_{b}}{\beta_{s}}\theta^{-1}\frac{\partial}{\partial\epsilon}V^{p(\eta^{*})}[\beta_{s}s(\epsilon)]|_{\epsilon=0^{+}}}_{\textit{indirect effect}}. 
\] The indirect effect can be decomposed into an effort-only effect \[ \frac{\beta_{b}}{\beta_{s}}(1-\gamma(\eta^{*}))\frac{de(\eta(\epsilon))}{d\epsilon}|_{\epsilon=0^{+}}=\theta^{-1}\frac{\beta_{b}}{\beta_{s}}(1-\gamma(\eta^{*}))\frac{\partial}{\partial\epsilon}Cov^{p(\eta^{*})}[\eta(\epsilon),\beta_{s}v]|_{\epsilon=0^{+}}, \] where $$Cov^{p(\eta^{*})}$$ denotes covariance, and a risk shifting effect \[ -\frac{\beta_{b}}{\beta_{s}}\sum_{j\in\Omega}\frac{dp^{j}(\eta(\epsilon))}{d\epsilon}|_{\epsilon=0^{+}}(\eta_{j}^{*}-\gamma(\eta^{*})\beta_{s}v_{j})=-\frac{1}{2}\theta^{-1}\frac{\beta_{b}}{\beta_{s}}\frac{\partial}{\partial\epsilon}V^{p(\eta^{*})}[\eta(\epsilon)-\gamma(\eta^{*})\beta_{s}v]|_{\epsilon=0^{+}}. \] Proof. The results are proven in the proof of Corollary 3. ǁ This corollary offers a different perspective on why the KL divergence cost function leads to debt contracts as the optimal security design. The perturbation argument discussed earlier led to the conclusion that the optimal security should be flat wherever possible. The perturbation was designed to have zero direct effect, and therefore, by Corollary 1, would only change utility to the extent that it changed the variance of the security payoff. Examining the equity, live-or-die, and debt securities shown in the Appendix, Figure A.1, it is clear why the debt security minimizes the variance of the payout, among all limited-liability securities with the same expected value: it is as flat as possible.13 The proof of Proposition 1 shows both that the variance-minimizing security is a debt contract, and that debt is optimal in the security design problem. The corollary also discusses the role of effort and risk-shifting in the problem. Intuitively, if we perturb the security design to align the seller’s retained tranche with the value of the underlying assets, this induces the seller to exert more effort. 
This extra effort benefits the buyer, assuming that the seller is not the full residual claimant. The special property of the KL divergence is that the correct notion of “alignment” is covariance. Similarly, if we perturb the security design to cause the seller’s retained tranche to vary more, relative to the equity tranche that induces the same effort, we create more opportunities for risk-shifting, reducing the value of the buyer’s security. Again, the special property of the KL divergence is that the variance summarizes this effect. Several of the assumptions in the benchmark model can be relaxed without altering the debt security result of Proposition 1. The lowest possible value, $$v_{0}$$, can be greater than zero. The buyer can be risk-averse, with any increasing, differentiable utility function. As discussed in the Appendix, Section B, the timing of the events and the bargaining power of the agents can be altered without changing the result that debt is optimal. The optimal security described in Proposition 1 has an interesting comparative static. Define the “put option value” of a debt contract as the discounted difference between its maximum payoff $$\bar{v}$$ and its expected value. Proposition 1 states that \begin{equation} P.O.V.=\beta_{b}\bar{v}-\beta_{b}E^{p(\eta^{*})}[s^{*}]=\kappa\theta.\label{eq:POV} \end{equation} (3.2) When the constant $$\theta$$ is large, meaning that it is costly for the seller to change the distribution, the put option will have a high value. Similarly, when the gains from trade, $$\kappa$$, are high, the put option will have a high value. For all distributions $$q$$, a higher put option value translates into a higher “strike” of the option, $$\bar{v}$$, although the exact mapping depends on the distribution $$q$$ and the sample space $$\Omega$$. 
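The mapping from the put option value in equation 3.2 to the face value $$\bar{v}$$ can be sketched numerically. The code below is an illustration only, not the procedure used in the article. It assumes that the retained tranche is $$\eta_{i}=\beta_{s}(v_{i}-s_{i})$$, that the seller's response to a KL cost is $$p^{i}\propto q^{i}e^{\eta_{i}/\theta}$$, and that $$\kappa=\beta_{b}/\beta_{s}-1$$ (an assumption inferred from the surrounding formulas rather than stated in this section). All function names and numbers are hypothetical.

```python
import numpy as np

def seller_choice_kl(eta, q, theta):
    """Assumed seller response under a KL cost: p^i proportional to q^i * exp(eta_i / theta)."""
    w = q * np.exp((eta - eta.max()) / theta)
    return w / w.sum()

def put_option_value(v_bar, v, q, theta, beta_s, beta_b):
    """beta_b * (v_bar - E^p[s]) for the debt contract s = min(v, v_bar)."""
    s = np.minimum(v, v_bar)           # debt payoff to the buyer
    eta = beta_s * (v - s)             # retained tranche (assumed form)
    p = seller_choice_kl(eta, q, theta)
    return beta_b * (v_bar - p @ s)

def solve_face_value(v, q, theta, kappa, beta_s, tol=1e-12):
    """Bisection on v_bar so that the put option value equals kappa * theta."""
    beta_b = beta_s * (1.0 + kappa)    # assumption: kappa = beta_b / beta_s - 1
    target = kappa * theta
    lo, hi = 0.0, float(v.max())
    # Mirrors the "v_N sufficiently large" condition in Proposition 1.
    if put_option_value(hi, v, q, theta, beta_s, beta_b) <= target:
        raise ValueError("highest asset value too small for an interior face value")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if put_option_value(mid, v, q, theta, beta_s, beta_b) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative (made-up) sample space and zero-cost distribution.
v = np.array([0.0, 0.5, 1.0, 1.5, 2.5])
q = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
theta, beta_s = 0.5, 0.95

v_bar_low = solve_face_value(v, q, theta, kappa=0.05, beta_s=beta_s)
v_bar_high = solve_face_value(v, q, theta, kappa=0.10, beta_s=beta_s)
print(v_bar_low, v_bar_high)
```

In this sketch, holding $$\theta$$ and $$\beta_{s}$$ fixed, the larger value of $$\kappa$$ produces the larger face value. Bisection is used because, under the assumptions above, the put option value is continuous and increasing in $$\bar{v}$$.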
Restated, when the agents know that the moral hazard is small, or that the gains from trade are large, they will use a large amount of debt, resulting in a riskier debt security.14 In this section, I have shown that using the KL divergence cost function leads to debt securities as the optimal contract. In the next section, I consider alternative cost functions, applying the intuitions developed in this section. 4. The Non-Parametric Model with Invariant Divergences In this section, I analyse more general classes of divergences as cost functions. First, I will show that among the $$f$$-divergences, the Kullback-Leibler divergence is the only divergence that always results in debt as the optimal security design, allowing for non-monotone security designs, but there are many $$f$$-divergences for which the optimal monotone security design is always a debt security. Second, in the particular case of the $$\alpha$$-divergences, which are a subset of the $$f$$-divergences, I show that the optimal contract is, for some parameter values, a mix of debt and equity. I assume that the cost function is proportional to an $$f$$-divergence (equation 2.4): \[ \psi(p)=\theta D_{f}(p||q), \] with an associated $$f$$ function that is continuous on $$[0,\infty)$$ and twice-differentiable on $$(0,\infty)$$. These divergences are analytically tractable because they are additively separable. That is, the cost of choosing some $$p^{i}$$ is not affected by the value of $$p^{j},\:j\neq i$$, except through the constraint that probability distributions must add up to one. In some cases, such as the Hellinger distance or KL divergence, the seller’s choice of $$p$$ is guaranteed to be interior, but this is not true for all $$f$$-divergences. Among this family of divergences, the KL divergence is special. Proposition 2. 
In the non-parametric model, with an $$f$$-divergence cost function, if the optimal security design is debt for all sample spaces $$\Omega$$ and zero-cost probability distributions $$q$$, then that $$f$$-divergence is the Kullback-Leibler divergence. Proof. See Online Appendix Section 3.2. ǁ The statement of Proposition 2 shows that the KL divergence is special, in the sense that it is the only continuous and twice-differentiable $$f$$-divergence that always results in debt as the optimal security design. The proof uses a perturbation argument, similar to the one in the previous section. Suppose that the solution to the moral hazard problem is interior. The first-order condition in the moral hazard problem, for an arbitrary $$f$$-divergence and some $$i>0$$, is \[ \eta_{i}=\theta(f'(\frac{p^{i}(\eta)}{q^{i}})-f'(\frac{p^{0}(\eta)}{q^{0}})). \] The analogue of the difference of elasticities equation used in the previous section (equation 3.1) is \[ f''(\frac{p^{i}(\eta)}{q^{i}})\frac{p^{i}(\eta)}{q^{i}}\frac{\partial\ln(p^{i}(\eta))}{\partial\eta_{i}}-f'' (\frac{p^{0}(\eta)}{q^{0}})\frac{p^{0}(\eta)}{q^{0}}\frac{\partial\ln(p^{0}(\eta))}{\partial\eta_{i}}=\theta^{-1}. \] For the KL divergence, with $$f(u)=u\ln u-u+1$$, we have $$uf''(u)=1$$, and this equation reduces to the one introduced previously. For any other $$f$$-divergence, these terms are not constant. There is still a perturbation to the retained tranche that changes the probabilities $$p^{i}$$ and $$p^{j}$$, leaving all other probabilities unchanged. Suppose that we increase $$\eta_{i}$$ by $$\frac{\epsilon}{q^{i}}f''(\frac{p^{i}(\eta^{*})}{q^{i}})$$, and decrease $$\eta_{j}$$ by $$\frac{\epsilon}{q^{j}}f''(\frac{p^{j}(\eta^{*})}{q^{j}})$$. Using the same logic described in the previous section, this perturbation increases $$p^{i}$$ by $$\theta^{-1}\epsilon$$ and decreases $$p^{j}$$ by the same amount, leaving all other probabilities unchanged. 
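The $$f$$-divergence perturbation can be illustrated in a case where the seller's problem has a closed form. The sketch below uses the $$\chi^{2}$$-divergence, taking $$f(u)=\frac{1}{2}(u-1)^{2}$$ so that $$f'(u)=u-1$$ and $$f''(u)=1$$. It assumes the seller maximizes $$\sum_{i}p^{i}\eta_{i}-\theta D_{f}(p||q)$$ and that the solution is interior, which gives $$p^{i}=q^{i}(1+\theta^{-1}(\eta_{i}-E^{q}[\eta]))$$. The function name and the numbers are hypothetical.

```python
import numpy as np

def seller_choice_chi2(eta, q, theta):
    """Assumed interior solution for the chi-squared cost f(u) = (u - 1)^2 / 2.

    The first-order condition theta * f'(p^i / q^i) = eta_i + const, together
    with sum_i p^i = 1, gives p^i = q^i * (1 + (eta_i - E^q[eta]) / theta).
    """
    p = q * (1.0 + (eta - q @ eta) / theta)
    if np.any(p <= 0.0):
        raise ValueError("solution is not interior for these inputs")
    return p

# Illustrative (made-up) inputs; state 0 pays the seller nothing.
q = np.array([0.1, 0.2, 0.4, 0.3])
eta = np.array([0.0, 0.05, 0.2, 0.5])
theta = 0.8

p = seller_choice_chi2(eta, q, theta)
u = p / q

# First-order condition: eta_i = theta * (f'(u_i) - f'(u_0)) with f'(u) = u - 1.
foc_rhs = theta * ((u - 1.0) - (u[0] - 1.0))

# Perturbation from the text: raise eta_i by (eps / q^i) * f''(u_i) and
# lower eta_j by (eps / q^j) * f''(u_j); here f'' is identically 1.
i, j, eps = 1, 3, 1e-3
eta_pert = eta.copy()
eta_pert[i] += eps / q[i]
eta_pert[j] -= eps / q[j]

dp = (seller_choice_chi2(eta_pert, q, theta) - p) / eps

expected = np.zeros_like(p)
expected[i] = 1.0 / theta
expected[j] = -1.0 / theta
print(dp, expected)
```

Because the $$\chi^{2}$$ response is linear in $$\eta$$, the change in probabilities is exactly $$\pm\theta^{-1}\epsilon$$ in the two perturbed states in this special case, rather than only to first order.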
Now suppose that a debt contract is the optimal security design, for an arbitrary $$f$$-divergence, and that there are two states associated with the flat part of the debt contract, $$i$$ and $$j$$. Consider, as before, a perturbation that decreases the value of the security in state $$i$$, while increasing the value of the security in state $$j$$, so that the values of the retained tranche, $$\eta_{i}$$ and $$\eta_{j}$$, change as described in the previous paragraph. Note that, because we have assumed that the states $$i$$ and $$j$$ are associated with the flat part of the debt contract, this perturbation is feasible. By construction, the “indirect effect” (see equation 2.3) of this perturbation is zero. The probability of state $$i$$ increases by $$\theta^{-1}\epsilon$$, while the probability of state $$j$$ decreases by the same amount, and we have assumed that $$s_{i}=s_{j}$$. However, the “direct effect” is not necessarily zero. We have \begin{equation} \frac{\partial U(\eta(\epsilon))}{\partial\epsilon}=\kappa(\frac{p^{j}(\eta^{*})}{q^{j}}f'' (\frac{p^{j}(\eta^{*})}{q^{j}})-\frac{p^{i}(\eta^{*})}{q^{i}}f''(\frac{p^{i}(\eta^{*})}{q^{i}})).\label{eq:f-perturb} \end{equation} (4.1) Of course, if $$uf''(u)$$ is constant, then this effect is also zero (the KL divergence case). However, in general this will not be the case, and either this perturbation or the “reverse” perturbation (with respect to the states $$i$$ and $$j$$) can improve the seller’s utility. The proof of Proposition 2 finishes the argument by constructing sample spaces $$\Omega$$ and zero-cost distributions $$q$$ such that, for debt to always be optimal, $$uf''(u)$$ must be constant for all $$u\in[0,\infty)$$. This result depends crucially on the possibility of non-monotone security designs. I have argued in the introduction that, in the context of securitization, there is no particular reason to think that security designs must be monotone. 
However, in other contexts, following many papers in the security design literature, it may be appropriate to require that security designs result in payoffs that are weakly increasing for both the buyer and the seller. If we impose this assumption, the perturbation logic described above leads to a very different conclusion—that debt securities are optimal as long as $$uf''(u)$$ is weakly decreasing in $$u$$. I will say that a security design is weakly monotone for the buyer if $$v_{j}\geq v_{i}$$ implies that $$s_{j}\geq s_{i}$$. Suppose that $$v_{j}\geq v_{i}$$ and $$s_{j}=s_{i}$$. In this case, $$\eta_{j}\geq\eta_{i}$$, and therefore, by the seller’s first-order condition and the convexity of the $$f$$ function, $$\frac{p^{j}(\eta)}{q^{j}}\geq\frac{p^{i}(\eta)}{q^{i}}.$$ That is, because the seller’s payoff is higher in state $$j$$ than in state $$i$$, she acts to increase the likelihood of state $$j$$ relative to state $$i$$. If $$uf''(u)$$ is weakly decreasing in $$u$$, the perturbation analysed in equation 4.1 (increasing $$s_{j}$$ and decreasing $$s_{i}$$), starting from a debt security design, reduces the seller’s welfare. Because of the requirement that security designs be monotone, the reverse perturbation (decreasing $$s_{j}$$ and increasing $$s_{i}$$) is not feasible. As a result, there is no feasible perturbation that can increase welfare, and debt is optimal. The corollary below summarizes the result: Corollary 2. In the non-parametric model, with an $$f$$-divergence cost function such that $$uf''(u)$$ is weakly decreasing in $$u$$, if security designs are required to be monotone for the buyer, then the optimal security design is debt, selling nothing, or selling everything, for all sample spaces $$\Omega$$ and zero-cost probability distributions $$q$$. Proof. See Online Appendix Section 3.3. ǁ The result of Proposition 2 raises another question: absent monotonicity constraints, what are the optimal security designs with this class of cost functions? 
The logic of the perturbation argument above leads us to conclude that the function $$uf''(u)$$ plays a critical role in determining the shape of the contract. For a particular sub-class of $$f$$-divergences, the $$\alpha$$-divergences, the resulting optimal contracts are easy to characterize. Recall that, for the $$\alpha$$-divergences, \[ f(u)=\frac{4}{1-\alpha^{2}}(1-u^{\frac{1}{2}(1-\alpha)}+\frac{1}{2}(1-\alpha)(u-1)). \] For these divergences, when $$\alpha<-1$$, it is possible that the seller will set $$p^{i}=0$$ for some $$i\in\Omega$$. The proof of Proposition 3 deals with this possibility; in the main text, I will assume that $$p(\eta)$$ is interior in the neighbourhood of the optimal security design. It follows from the iso-elastic nature of these $$f$$-functions that \[ uf''(u)=1-\frac{1+\alpha}{2}f'(u). \] The first-order condition of the moral hazard problem implies that, for any retained tranche $$\eta$$, \[ \frac{p^{j}(\eta)}{q^{j}}f''(\frac{p^{j}(\eta)}{q^{j}})-\frac{p^{i}(\eta)}{q^{i}}f'' (\frac{p^{i}(\eta)}{q^{i}})=\frac{1+\alpha}{2}\theta^{-1}(\eta_{i}-\eta_{j}). \] Consider the same perturbation discussed above: increasing $$\eta_{i}$$ by $$\frac{\epsilon}{q^{i}}f''(\frac{p^{i}(\eta^{*})}{q^{i}})$$, and decreasing $$\eta_{j}$$ by $$\frac{\epsilon}{q^{j}}f''(\frac{p^{j}(\eta^{*})}{q^{j}})$$. Suppose that this is feasible. As discussed above, this will increase $$p^{i}$$ by $$\theta^{-1}\epsilon$$ and decrease $$p^{j}$$ by the same amount. If the security is not flat, the “indirect effect” is non-zero: \[ \beta_{b}\sum_{i,j\in\Omega}s_{j}^{*}\frac{\partial p^{j}(\eta)}{\partial\eta_{i}}|_{\eta=\eta^{*}}\frac{\partial\eta_{i}} {\partial\epsilon}|_{\epsilon=0^{+}}=\theta^{-1}\beta_{b}(s_{i}-s_{j}). 
\] Similarly, as argued above, the “direct effect” is non-zero: \begin{align*} -\kappa\sum_{i\in\Omega}p^{i}(\eta^{*})\frac{\partial\eta_{i}}{\partial\epsilon}|_{\epsilon=0^{+}} & =\kappa(\frac{p^{j}(\eta^{*})}{q^{j}}f''(\frac{p^{j}(\eta^{*})}{q^{j}})-\frac{p^{i}(\eta^{*})}{q^{i}}f''(\frac{p^{i}(\eta^{*})}{q^{i}}))\\ & =\kappa\frac{1+\alpha}{2}\theta^{-1}(\eta_{i}-\eta_{j}). \end{align*} It follows that if \[ \frac{\beta_{s}(s_{i}-s_{j})}{\eta_{i}-\eta_{j}}=-\frac{\kappa}{1+\kappa}\frac{1+\alpha}{2}, \] the indirect and direct effects will cancel, and this perturbation will not change the utility in the security design problem. For the optimal security, for all $$i,j\in\Omega$$ such that the limited liability constraints do not bind, the relative slopes of the security and retained tranche are the same. For the $$\alpha$$-divergences, the optimal contracts will be straight lines wherever the limited liability constraints do not bind. When $$\alpha=-1$$ (the KL divergence case), we recover the result that the optimal contract is flat when the constraints do not bind. For $$\alpha<-1$$, the required constant is positive, which implies that both the security design and the retained tranche are upward sloping (in the region where the limited liability constraints do not bind). When $$\alpha>-1$$, the required constant is negative, implying a downward sloping (and therefore non-monotone) security design. These are the $$\alpha$$-divergences for which $$uf''(u)$$ is decreasing in $$u$$. If the security design was required to be monotone, Corollary 2 would apply, and debt (or selling everything/nothing) would be optimal. The proposition below summarizes these ideas, describing the optimal contract for all $$\alpha$$. Proposition 3. Define $$s_{\alpha,i}$$ as the optimal security design for the problem with an $$\alpha$$-divergence cost function. 
If $$\alpha<1+\frac{2}{\kappa}$$, there exists a constant $$\bar{v}\geq0$$ such that \[ s_{\alpha,i}=\begin{cases} v_{i} & \text{if}\;v_{i}<\bar{v}\\ \max[-\frac{\kappa(1+\alpha)}{2+\kappa(1-\alpha)}(v_{i}-\bar{v})+\bar{v},0] & \text{if}\;v_{i}\geq\bar{v}. \end{cases} \] If $$\alpha\geq1+\frac{2}{\kappa}$$, the optimal security design is the “live-or-die” contract, \[ s_{\alpha,i}=\begin{cases} v_{i} & \text{if}\;v_{i}<\bar{v}\\ 0 & \text{if}\;v_{i}>\bar{v}. \end{cases} \] When $$\alpha<-3$$, $$\bar{v}$$ is strictly greater than zero. In all of these cases, if the highest possible asset value is sufficiently large ($$v_{N}>\sum_{i}q^{i}v_{i}+\frac{\kappa}{\beta_{b}}\theta$$), then $$\bar{v}<v_{N}$$. Proof. See Online Appendix Section 3.10. ǁ The optimal security design can be thought of as a mixture of debt and equity (at least when $$\alpha\leq-1$$), whose slope is determined by the gains from trade parameter $$\kappa$$ and the parameter $$\alpha$$. For any $$\alpha>-1$$, the optimal contract is non-monotonic, first increasing up to $$\bar{v}$$, then decreasing, and finally paying the buyer zero for the highest asset values. In Figure A.2, in the Appendix, I illustrate the different optimal security designs associated with varying values of $$\alpha$$, holding $$\bar{v}$$ fixed. In Corollary 3 below, I decompose the effects of any perturbation into direct and indirect effects, and then further decompose the indirect effects into effort and risk-shifting components. As in the KL divergence case (Corollary 1), expectations, variances, and covariances appear in these expressions. However, the variances and covariances are taken under a probability distribution $$\hat{p}$$, which is a sort of weighted average of the probability distributions $$p^{*}(\eta)$$ and $$q$$, for which the weights depend on the parameter $$\alpha$$. 
Because the optimal security designs are monotone for the seller, when $$\alpha>-1$$, $$\hat{p}(p(\eta^{*}))$$ places more mass on the best states of the world than $$p(\eta^{*})$$. In this case, the indirect effect is larger relative to the direct effect, when compared with $$\alpha=-1$$ (the KL divergence case). Put another way, the moral hazard concerns are larger relative to the gains from trade. As a result, the optimal security design gives less to the buyer than a debt contract in the best states of the world. When $$\alpha<-1$$, the reverse is true—in the best states of the world, $$p(\eta^{*})>\hat{p}(p(\eta^{*}))$$, and the direct effect is larger relative to the indirect effect, when compared with $$\alpha=-1$$. In this case, the gains from trade are larger relative to the moral hazard concerns in the best states, and the optimal security design gives more cashflows to the buyer than a debt contract. That is, the parameter $$\alpha$$ influences the balance of concern about gains from trade and moral hazard across the various states. This effect occurs because the parameter $$\alpha$$ controls the way the curvature of the cost function changes as the seller moves $$p(\eta)$$ away from $$q$$. Recall that, for all $$f$$-divergences, including the $$\alpha$$-divergences, we normalized the $$f$$ function so that $$f''(1)=1$$. For the $$\alpha$$-divergences, we have \begin{equation} f_{\alpha}'''(1)=-\frac{1}{2}(\alpha+3).\label{eq:alpha-div-third-order} \end{equation} (4.2) When $$\alpha$$ is large, the cost function becomes less curved as $$p^{i}$$ becomes large relative to $$q^{i}$$, and more curved as $$p^{i}$$ becomes small relative to $$q^{i}$$. In the best states of the world, the seller increases $$p^{i}$$ relative to $$q^{i}$$ under the optimal contract. Therefore, if a perturbation increased the variance of the security design in the best states of the world, the seller would easily be able to alter her actions in response. 
In contrast, when $$\alpha$$ is small, the increasing curvature of the cost function in the best states of the world prevents the seller from responding to perturbations that affect those states. Corollary 3. Define \[ \hat{\theta}(p)=\theta(\sum_{j\in\Omega}(p^{j})^{\frac{1}{2}(\alpha+3)}(q^{j})^{-\frac{1}{2}(\alpha+1)})^{-1}, \] \[ \hat{p}^{i}(p)=\frac{\hat{\theta}(p)}{\theta}(p^{i})^{\frac{1}{2}(\alpha+3)}(q^{i})^{-\frac{1}{2}(\alpha+1)}. \] With an $$\alpha$$-divergence cost function, the effect of any perturbation can be written as \[ \frac{\partial U(\eta(\epsilon))}{\partial\epsilon}|_{\epsilon=0^{+}}=\underbrace{\kappa\frac{\partial}{\partial\epsilon}E^{p(\eta^{*})}[\beta_{s}s(\epsilon)]|_{\epsilon=0^{+}}}_{\textit{direct effect}}-\underbrace{\frac{1}{2}\frac{\beta_{b}}{\beta_{s}}\hat{\theta}(p(\eta^{*}))^{-1}\frac{\partial}{\partial\epsilon}V^{\hat{p}(p(\eta^{*}))}[\beta_{s}s(\epsilon)]|_{\epsilon=0^{+}}}_{\textit{indirect effect}}. \] If the solution to the seller’s moral hazard problem is interior, the indirect effect can be decomposed into an effort-only effect and a risk shifting effect, \begin{align*} &\frac{\beta_{b}}{\beta_{s}}(1-\gamma(\eta^{*}))\frac{de(\eta(\epsilon))}{d\epsilon}|_{\epsilon=0^{+}}=\hat{\theta}(p^{*}(\eta))^{-1}\frac{\beta_{b}}{\beta_{s}}(1-\gamma(\eta^{*}))\frac{\partial}{\partial\epsilon}Cov^{\hat{p}(p(\eta^{*}))}(\eta(\epsilon),\beta_{s}v)|_{\epsilon=0^{+}},&\\ &-\!\frac{\beta_{b}}{\beta_{s}}\sum_{j\in\Omega}\frac{dp^{j}(\eta(\epsilon))}{d\epsilon}|_{\epsilon=0^{+}}(\eta_{j}^{*}\!-\!\gamma(\eta^{*})\beta_{s}v_{j}) \!=\!-\frac{1}{2}\hat{\theta}(p^{*}(\eta))^{-1}\!\frac{\beta_{b}}{\beta_{s}}\frac{\partial}{\partial\epsilon}\!V^{\hat{p}(p(\eta^{*}))}[\eta(\epsilon)\!-\!\gamma(\eta^{*})\beta_{s}v]|_{\epsilon=0^{+}}\!.& \end{align*} Proof. See Online Appendix Section 3.6. ǁ This decomposition provides an additional perspective on why contracts with low values of $$\alpha$$ end up “equity-like” in the best states of the world. 
For these cost functions, $$\hat{p}(p(\eta^{*}))$$ places low weight on the best states of the world. As a result, the increased effort that results from an alignment of the seller’s incentives and the asset value in those states (the covariance term in Corollary 3) is small. The risk-shifting that occurs because the seller’s retained tranche does not resemble an equity claim (the variance term in Corollary 3) in those states is also small. It is therefore efficient to give more of the cashflows to the buyer in the best states than it would be under the KL divergence cost function, because the gains from trade effects are larger than the moral hazard effects, and this results in an increasing, equity-like security design. In the next section, I will show that these results (the optimality of a mixture of equity and debt for the $$\alpha$$-divergences, and the notion of a mean-variance tradeoff for the security design problem) apply in an approximate sense to a much larger class of cost functions. 5. Approximations In this section, I will discuss “approximately optimal” security designs. The approximation is motivated by the following observation: for the optimal security designs with an $$\alpha$$-divergence cost function (Proposition 3), the slope of the security design depends on the gains from trade $$\kappa$$ and the parameter $$\alpha$$. In many applications, the percentage gains from trade might be quite small. For example, in the context of collateralized loan obligations, Nadauld and Weisbach (2012) estimate the cost of capital advantage due to securitization at $$17$$ basis points per year. Assuming a five-year maturity, this would imply that the buyer’s valuation of the security is roughly 1% higher than the seller’s valuation of the security. 
This finding accords with intuition: there are many economic forces (the availability of substitute securities that both the buyer and seller can trade, entry into the securitization business) that act to diminish differences in valuations. For example, suppose the cost function is the $$\chi^{2}$$-divergence ($$\alpha=-3$$), and the gains from trade are 1%. In this case, the slope of the “equity portion” of the optimal security is \[ -\frac{\kappa(1+\alpha)}{2+\kappa(1-\alpha)}=\frac{0.02}{2+0.04}\approx 1\%. \] The optimal contract is debt plus a roughly 1% equity claim for the buyer; intuitively, a standard debt contract cannot be substantially worse, from a welfare perspective. This argument used a specific cost function, but the point holds generally: unless the curvature of the cost function changes rapidly ($$\alpha$$ is very large or small), the optimal security designs will resemble debt. This argument leads to a second observation: that in models with small gains from trade, which nevertheless result in a large quantity of trade, the moral hazard must also, in some sense, be small. Recall that we normalized the problem so that the expected value of the assets, if the seller retains everything, is one. Suppose that the moral hazard is large (e.g. that the expected value of the assets, if the seller retains nothing, is one-half). If the gains from trade are 1%, then no trade is much better than selling everything. Inevitably, the optimal security design in this case will be close to selling nothing. In the example of securitization, this is counterfactual; a substantial portion of the value of the underlying assets is sold in most securitizations. This leads us to the conclusion that the moral hazard must also be small, in the sense that the difference in the seller’s effort between when she sells everything and when she sells nothing must be of similar magnitude to the gains from trade. 
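The arithmetic behind the $$\chi^{2}$$ example above can be reproduced with a short Python sketch. The function name `equity_slope` is a hypothetical label for the slope expression quoted in the text, and the conversion of a 17 basis point annual advantage into a five-year valuation gap ignores discounting, which is a simplifying assumption made here for illustration.

```python
def equity_slope(kappa, alpha):
    """Slope of the equity portion, -kappa(1+alpha) / (2 + kappa(1-alpha))."""
    return kappa * (-1.0 - alpha) / (2.0 + kappa * (1.0 - alpha))

# Rough gains from trade: 17 basis points per year over five years,
# with no discounting (an illustrative simplification).
kappa_from_spread = 5 * 0.0017

# Chi-squared divergence (alpha = -3) with 1% gains from trade.
slope_chi2 = equity_slope(0.01, -3.0)

# KL divergence (alpha = -1): the equity portion vanishes, leaving pure debt.
slope_kl = equity_slope(0.01, -1.0)

print(f"gains from trade from spread: {kappa_from_spread:.4f}")
print(f"equity slope, chi-squared:    {slope_chi2:.4f}")
print(f"equity slope, KL:             {slope_kl:.4f}")
```

The slope is proportional to $$\kappa$$ when $$\kappa$$ is small, which is why the equity portion is negligible unless $$\alpha$$ is far from $$-1$$.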
In the case of securitization, this is consistent with empirical estimates (see the Appendix, Section C). The smallness of the moral hazard means that poorly designed contracts cannot destroy entirely the value of the assets; however, they can destroy entirely the gains from trade. It does not mean that moral hazard is unimportant. In the calibration for mortgage securitization in the Appendix, Section C, I find that using the “right” security design can substantially increase the profitability of securitization. In this section, I will show that, depending on the relative size of the moral hazard and gains from trade, no trade, trading everything, and many securities in between are consistent with both the moral hazard and gains from trade being small. In other words, the moral hazard can be small relative to the notional (asset) value being traded, but large relative to the profitability of trade, and the latter comparison will determine whether moral hazard impedes trade. Formally, the approximations I consider are first- and second-order expansions of the utility function in the security design problem. I approximate the utility of using an arbitrary security design $$s$$, relative to selling nothing, to first or second order in $$\theta^{-1}$$ and $$\kappa$$. When $$\theta^{-1}$$ is small, and therefore $$\theta$$ is large, it is difficult for the seller to change $$p$$. When $$\kappa$$ is small, the gains from trade are low. I take this approximation around the limit point $$\theta^{-1}=\kappa=0$$. This approximation applies when $$\theta^{-1}$$ and $$\kappa$$ are small but positive, consistent with the arguments above. The limit point itself is degenerate; because there is no moral hazard and no gains from trade, the security design does not matter. However, near the limit point (where the approximation applies), this is not the case; some security designs are better than other security designs. 
The relevance of the approximation will depend on whether $$\theta^{-1}$$ and $$\kappa$$ are small enough, relative to the higher order terms of the utility function, for those terms to be negligible. This is a question that can only be answered in the context of a particular application. In the Appendix, Section C, I discuss a calibration of the model relevant to mortgage origination, for which the approximation is accurate. The results of this section apply to all invariant divergences, a class which includes all of the $$f$$-divergences, and therefore the KL divergence and the $$\alpha$$-divergences. This class also includes divergences, such as the Chernoff and Bhattacharyya distances, that are not additively separable. Using the approximation described above, I show that debt securities achieve, up to first order, the same utility as the optimal security design, for any invariant divergence cost function. Moreover, only debt contracts have this property, and it arises through the mean-variance intuition discussed in the previous section. I also show that the optimal contracts corresponding to the $$\alpha$$-divergences achieve, up to second order, the same utility as the optimal security design, for any invariant divergence cost function. This also follows from the mean-variance intuition discussed previously. To further develop the intuition behind this result, consider the $$f$$-divergences. For any $$f$$-divergence, we can approximate the divergence to third order around $$p=q$$ as \[ \sum_{i\in\Omega}q^{i}f(\frac{p^{i}}{q^{i}})\approx\sum_{i\in\Omega}q^{i}(\frac{1}{2}(\frac{p^{i}}{q^{i}}-1)^{2}-\frac{1}{12}(\alpha+3)(\frac{p^{i}}{q^{i}}-1)^{3}), \] where we have defined $$\alpha$$ to satisfy \[ f'''(1)=-\frac{1}{2}(\alpha+3). 
\] This definition of $$\alpha$$ extends the relationship between the third derivative of the $$f$$ functions and the parameter $$\alpha$$ of the $$\alpha$$-divergences (equation 4.2) to a definition of the parameter $$\alpha$$ for all $$f$$-divergences. The Taylor expansion shows that, up to third order, any $$f$$-divergence can be approximated by an $$\alpha$$-divergence. Additionally, up to second order, the $$\alpha$$ parameter plays no role, and all $$f$$-divergences, including the KL divergence, are identical. I will show that, for every $$f$$-divergence, the optimal contract associated with that $$f$$-divergence and the optimal contract for the KL divergence (debt) achieve, up to second order, the same utility in the security design problem. Moreover, the optimal contract associated with that $$f$$-divergence and the optimal contract for an $$\alpha$$-divergence (with $$\alpha$$ defined as above) achieve the same utility up to third order. A different way to view the same results is through the lens of the perturbation argument employed in the previous section. The indirect effect of the perturbation is governed by \[ \frac{p^{j}(\eta^{*})}{q^{j}}f''(\frac{p^{j}(\eta^{*})}{q^{j}})-\frac{p^{i}(\eta^{*})}{q^{i}}f''(\frac{p^{i}(\eta^{*})}{q^{i}}). \] Using the first-order condition in the moral hazard problem, one can observe that as $$\theta$$ becomes large ($$\theta^{-1}$$ small), holding the retained tranche $$\eta$$ fixed, $$p(\eta)$$ converges to $$q$$. Intuitively, as it becomes increasingly costly for the seller to keep $$p$$ away from $$q$$, she responds by moving $$p$$ closer to $$q$$. In the limit, $$p$$ reaches $$q$$, and the indirect effect of the utility perturbation is zero. With the KL divergence, the indirect effect is always zero. When the indirect effect is zero, the perturbation argument described in Section 2 applies, and the optimal contract is debt. The proposition and corollary below make these arguments formally. 
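The third-order expansion of the $$f$$-divergence discussed above can be checked numerically. The Python sketch below assumes a family of $$f$$ functions pinned down by $$f(1)=f'(1)=0$$ and $$f''(u)=u^{-\frac{1}{2}(\alpha+3)}$$, which is consistent with the normalization $$f''(1)=1$$ and with equation 4.2 but is not necessarily the exact parametrization used elsewhere in the article. The function names and the four-state example are hypothetical.

```python
import numpy as np

def f_alpha(u, alpha):
    """Assumed f with f(1) = f'(1) = 0 and f''(u) = u**(-(alpha + 3) / 2)."""
    u = np.asarray(u, dtype=float)
    c = 0.5 * (alpha + 3.0)
    if np.isclose(c, 1.0):   # alpha = -1, the KL divergence
        return u * np.log(u) - u + 1.0
    if np.isclose(c, 2.0):   # alpha = 1
        return u - 1.0 - np.log(u)
    return (u ** (2.0 - c) - 1.0) / ((1.0 - c) * (2.0 - c)) - (u - 1.0) / (1.0 - c)

def divergence(p, q, alpha):
    """Exact f-divergence, sum_i q_i f(p_i / q_i)."""
    return float(np.sum(q * f_alpha(p / q, alpha)))

def third_order(p, q, alpha):
    """Third-order Taylor approximation of the divergence around p = q."""
    x = p / q - 1.0
    return float(np.sum(q * (0.5 * x ** 2 - (alpha + 3.0) / 12.0 * x ** 3)))

# Hypothetical example: p is a 10% perturbation of a uniform q.
q = np.full(4, 0.25)
p = q * (1.0 + np.array([-0.1, 0.1, -0.1, 0.1]))

for alpha in (-7.0, -3.0, -1.0, 1.0):
    exact = divergence(p, q, alpha)
    approx = third_order(p, q, alpha)
    print(alpha, exact, approx, exact - approx)
```

Under these assumptions the approximation error is of fourth order in the perturbation, and for $$\alpha=-3$$ (the $$\chi^{2}$$-divergence) the expansion is exact.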
The argument above, including a definition of the parameter $$\alpha$$, can be extended to all invariant divergences, not just additively separable ones, using the results of Čencov (2000) (see Online Appendix Lemma 3). Up to third order, all invariant divergences with continuous third derivatives are equivalent to an $$\alpha$$-divergence. The proposition below formalizes the approximation results. I consider a third-order asymptotic expansion of the security design problem utility, $$U(s;\theta^{-1},\kappa)$$, around the point $$\theta^{-1}=\kappa=0$$, holding $$\beta_{s}$$ fixed as $$\kappa$$ changes. As in previous sections, the proposition applies to small perturbations of the security design, in the neighbourhood of the optimal security design.15 Proposition 4. In the non-parametric model, with a smooth, convex, invariant divergence cost function, the effects of any security design perturbation (equation 2.3) are, up to second order, \begin{eqnarray*} &&\frac{\partial U(\eta(\epsilon))}{\partial\epsilon}|_{\epsilon=0^{+}}\nonumber\\ &&\qquad= \underbrace{\kappa\frac{\partial}{\partial\epsilon}E^{p(\eta^{*})}[\beta_{s}s(\epsilon)]|_{\epsilon=0^{+}}}_{\textit{direct effect}}-\underbrace{\frac{1}{2}(1+\kappa)\theta^{-1}\frac{\partial}{\partial\epsilon}V^{\tilde{p}(p(\eta^{*}))}[\beta_{s}s(\epsilon)]|_{\epsilon=0^{+}}}_{\textit{indirect effect}}+O(\theta^{-3}+\kappa\theta^{-2}), \end{eqnarray*} where \[ p^{i}(\eta)=q^{i}+\theta^{-1}q^{i}\cdot(\eta_{i}-\sum_{j\in\Omega}q^{j}\eta_{j})+O(\theta^{-2}) \] and \[ \tilde{p}^{i}(p(\eta))=q^{i}+(\frac{3+\alpha}{2})(p^{i}(\eta)-q^{i}). \] To first order, \[ \frac{\partial U(\eta(\epsilon))}{\partial\epsilon}|_{\epsilon=0^{+}}=\underbrace{\kappa\frac{\partial}{\partial\epsilon}E^{q}[\beta_{s}s(\epsilon)]|_{\epsilon=0^{+}}}_{\textit{direct effect}}-\underbrace{\frac{1}{2}\theta^{-1}\frac{\partial}{\partial\epsilon}V^{q}[\beta_{s}s(\epsilon)]|_{\epsilon=0^{+}}}_{\textit{indirect effect}}+O(\theta^{-2}+\kappa\theta^{-1}). 
\] Proof. See Online Appendix Section 3.9. ǁ The accuracy of the approximation that both the moral hazard and gains from trade are small will vary by application. The generality of Proposition 4, which holds for all sample spaces, zero-cost distributions, and invariant divergences, suggests that as long as the moral hazard is not too large, the agents can neglect the details of the cost function. The first-order and second-order results of Proposition 4 are reminiscent of the perturbation results (Corollary 3) described in the previous sections. In both cases, the direct effect is the change in the expected value under a fixed, possibly endogenous probability distribution, and the indirect effect is the change in the variance under another fixed, endogenous probability distribution. To first order, and to second order when $$\alpha=-1$$, the two probability distributions are the same. This was also the case under the KL divergence, and as a result, debt securities are always first-order optimal, and second-order optimal when $$\alpha=-1$$. When $$\alpha\neq-1$$, the probability distributions are different, as in the general case of $$\alpha$$-divergences. In this case, the optimal security design for that $$\alpha$$-divergence will be the second-order optimal security design. Corollary 4. Under the assumptions of Proposition 3, there exists a debt security, $$s_{debt}$$, for which the difference between the utility achieved by $$s_{debt}$$ and the optimal security $$s^{*}$$ is second order: \[ U(s^{*};\theta^{-1},\kappa)-U(s_{debt};\theta^{-1},\kappa)=O(\theta^{-2}+\kappa\theta^{-1}). \] Under those same assumptions, there exists a security, $$s_{\alpha}$$, that is the optimal security design for an $$\alpha$$-divergence cost function (Proposition 3), for which the difference between the utility achieved by $$s_{\alpha}$$ and $$s^{*}$$ is third order: \[ U(s^{*};\theta^{-1},\kappa)-U(s_{\alpha};\theta^{-1},\kappa)=O(\theta^{-3}+\kappa\theta^{-2}). \] Proof. 
See Online Appendix Section 3.10. ǁ The results for first-order and second-order optimal security designs can be summarized as a type of “pecking order” theory (when $$\alpha\geq-1$$). When the moral hazard and gains from trade are small, the agents can use debt contracts. As the stakes grow larger, so that both the moral hazard and gains from trade are bigger concerns, the agents can use a mix of debt and equity. For very large stakes, the security design will depend on the precise nature of the moral hazard problem. The result of Corollary 4 shows that when the gains from trade and moral hazard are small, but not zero, debt is approximately optimal in a way that other security designs are not. In the Appendix, Figure A.3, I illustrate this idea. I assume an $$\alpha$$-divergence cost function, with $$\alpha=-7$$, which results in an optimal contract that is a mixture of debt and equity. I plot the utility of this optimal contract, as well as the best debt contract and best equity contract, relative to selling everything, for different values of $$\theta$$, with $$\kappa=\bar{\kappa}\theta^{-1}$$. As $$\theta$$ becomes large, all security designs converge to the same utility. For intermediate values of $$\theta$$, the best debt contract achieves nearly the same utility as the optimal contract, which is what the first-order approximation results show. For low values of $$\theta$$, the gap between the optimal debt contract and optimal contract grows. It is important to emphasize that the securities described in Corollary 4 are not degenerate; the debt security that is first-order optimal will not, in general, be selling everything or selling nothing. The level of the debt will be determined by the probability distribution $$q$$ and the product of $$\kappa$$ and $$\theta$$, as described in Proposition 1. The approximation I have employed assumes that $$\kappa$$ is small and $$\theta$$ is large, but makes no assumption about their product. 
If the gains from trade are large relative to the moral hazard ($$\kappa\theta$$ large), the level of the debt will be high. If the moral hazard is large relative to the gains from trade ($$\kappa\theta$$ small), the level of the debt will be small. As in the previous sections, we can decompose the “indirect effects” of changing the security design, which are captured by the variance term in the mean-variance tradeoff described in Proposition 4, into effort and risk-shifting components, as described by Lemma 1. Corollary 5. Under the assumptions of Proposition 3, the indirect effect can be decomposed into an effort-only effect and a risk shifting effect, \begin{align*} \frac{\beta_{b}}{\beta_{s}}(1-\gamma(\eta^{*}))\frac{de(\eta(\epsilon))}{d\epsilon}|_{\epsilon=0^{+}} & =\theta^{-1}(1+\kappa)(1-\gamma(\eta^{*}))\frac{\partial}{\partial\epsilon}Cov^{\tilde{p}(p(\eta^{*}))}(\eta(\epsilon),\beta_{s}v)|_{\epsilon=0^{+}}\\ & +O(\theta^{-3}+\kappa\theta^{-2}), \end{align*} \begin{align*} -\frac{\beta_{b}}{\beta_{s}}\sum_{j\in\Omega}\frac{dp^{j}(\eta(\epsilon))}{d\epsilon}|_{\epsilon=0^{+}}(\eta_{j}^{*}-\gamma(\eta^{*})\beta_{s}v_{j}) & =-\frac{1}{2}\theta^{-1}(1+\kappa)\frac{\partial}{\partial\epsilon}V^{\tilde{p}(p(\eta^{*}))}[\eta(\epsilon)-\gamma(\eta^{*})\beta_{s}v]|_{\epsilon=0^{+}}\\ & +O(\theta^{-3}+\kappa\theta^{-2}). \end{align*} Proof. The corollary follows from Proposition 4 and the proof of Corollary 3. ǁ The intuition discussed in the previous section holds. To first order, the effort and risk-shifting effects are the covariance and variance under the probability distribution $$q$$. To second order, the relevant probability distribution is distorted, in a direction that depends on whether $$\alpha$$ is greater than or less than negative one. The exact and approximate results of the last two sections apply to non-parametric models, in which the seller can choose any distribution. 
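The first-order mean-variance tradeoff in Proposition 4 can be illustrated with a small numerical sketch. The Python code below assumes that the first-order gain from selling a security $$s$$, relative to selling nothing, is $$\kappa E^{q}[\beta_{s}s]-\frac{1}{2}\theta^{-1}V^{q}[\beta_{s}s]$$, which is the objective suggested by integrating the first-order perturbation formula. The four-state asset values, the uniform $$q$$, the normalization $$\beta_{s}=1$$, the parameter values, and the function names are hypothetical choices made for illustration.

```python
import numpy as np

def first_order_gain(s, q, kappa, theta, beta_s=1.0):
    """Assumed first-order gain: kappa * E^q[beta_s s] - V^q[beta_s s] / (2 theta)."""
    payoff = beta_s * np.asarray(s, dtype=float)
    mean = np.dot(q, payoff)
    var = np.dot(q, (payoff - mean) ** 2)
    return kappa * mean - 0.5 * var / theta

def best_debt_level(v, q, kappa, theta, grid_size=201):
    """Grid search over face values d for the debt security min(v, d)."""
    levels = np.linspace(0.0, v.max(), grid_size)
    gains = [first_order_gain(np.minimum(v, d), q, kappa, theta) for d in levels]
    return levels[int(np.argmax(gains))]

# Hypothetical asset values and zero-cost distribution.
v = np.array([0.5, 1.0, 1.5, 2.0])
q = np.full(4, 0.25)
theta = 20.0

low = best_debt_level(v, q, kappa=0.05 / theta, theta=theta)   # kappa * theta = 0.05
high = best_debt_level(v, q, kappa=1.0 / theta, theta=theta)   # kappa * theta = 1

# Compare debt with an equity share that has the same expected payoff.
debt = np.minimum(v, low)
equity = (np.dot(q, debt) / np.dot(q, v)) * v
kappa_low = 0.05 / theta
print(low, high)
print(first_order_gain(debt, q, kappa_low, theta),
      first_order_gain(equity, q, kappa_low, theta))
```

In this example the debt level rises with $$\kappa\theta$$, and the debt security yields a higher first-order gain than the equity share with the same expected value because its payoff has lower variance under $$q$$.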
In the Online Appendix, Sections 1 and 2, I analyse parametric models using similar methods. In the next two sections of the article, I will discuss continuous time models of effort. I will show that these models are essentially equivalent to the non-parametric models analysed thus far. As a result, the optimality of debt and the intuitions about mean-variance tradeoffs apply to these models as well. These sections can also be thought of as providing a micro-foundation for the static models discussed thus far. 6. Dynamic Moral Hazard In this section, I will analyse a continuous time effort problem. This problem is closely connected to the static models discussed previously. The role of this section is to explain how an agent could “choose a distribution”, and show that the mean-variance intuition and optimality of debt discussed previously apply in dynamic models. I will study models in which the seller controls the drift of a Brownian motion. The contracting models I discuss are similar to those found in Holmström and Milgrom (1987), Schaettler and Sung (1993), and DeMarzo and Sannikov (2006), among others. The models can be thought of as the continuous time limit of repeated effort models,16 in which the seller has an opportunity each period to improve the value of the asset. Two recent papers are particularly relevant. The models I discuss are a special case of Cvitanić et al. (2009). I build on the results of Bierkens and Kappen (2014), who study a single-agent control problem (e.g. the seller’s moral hazard problem) with quadratic effort costs, and show that it is equivalent to a relative entropy minimization problem. Relative to these papers, I make two contributions. First, I show that the entire class of models studied by Cvitanić et al. (2009) can be rewritten as a static, non-parametric security design problem. 
That is, the dynamic models discussed in this section can be thought of as providing a micro-foundation for the static problems discussed in the previous sections. For the particular case of quadratic costs and a risk-neutral seller, the results of Cvitanić et al. (2009) imply that debt is optimal. Combining my result with the results of Bierkens and Kappen (2014), dynamic models with quadratic effort costs are equivalent to static problems with a KL divergence cost function, which provides a different perspective on why debt is optimal in this setting. Second, I show that for convex, but not necessarily quadratic, cost functions, debt contracts are approximately optimal, and relate this to the intuitions discussed above. This can be viewed as a micro-foundation for the approximation results discussed in the previous section. The result is also useful because the optimal contracts in this case are quite complex; Cvitanić et al. (2009) study contracts without the limited liability constraint, and show that they depend on the entire path, not just the final value, of the state variables. There are, to my knowledge, no known results with limited liability. My results can be viewed as showing that, when the approximation is applicable, simple, non-path-dependent contracts are close to optimal. I will begin by describing the structure of the dynamic model. The timing follows the standard principal-agent convention. At time zero, the seller and buyer trade a security. Between times zero and one, the seller will apply effort (or not) to change the value of the asset. At time one, the asset value is determined and the security payoffs occur. Between times zero and one, the seller controls the drift of a Brownian motion. Define $$W$$ as a Brownian motion on the canonical probability space, $$(\Omega,\mathcal{F},\tilde{P})$$, and let $$\mathcal{F}_{t}^{W}$$ be the standard augmented filtration generated by $$W$$. 
Denote the asset value at time $$t$$ as $$V_{t}$$, and let $$\mathcal{F}_{t}^{V}$$ be the filtration generated by $$V$$. The seller observes the history of both $$W_{t}$$ and $$V_{t}$$ at each time, whereas the buyer observes (or can contract on) only the history of $$V_{t}$$. This information asymmetry creates the moral hazard problem. The initial value, $$V_{0}>0$$, is known to both the buyer and the seller. The asset value evolves as \[ dV_{t}=b(V_{t},t)dt+u_{t}\sigma(V_{t},t)dt+\sigma(V_{t},t)dW_{t}, \] where $$b(V_{t},t)$$ and $$\sigma(V_{t},t)>0$$ satisfy standard conditions to ensure that, conditional on $$u_{t}=0$$ for all $$t$$, there is a unique, everywhere-positive solution to this SDE.17 The seller’s control, $$u_{t}$$, should be thought of as instantaneous effort (and not “effort” in the sense of the effort/risk-shifting decomposition discussed earlier). There is a flow cost of instantaneous effort, a general form of which is $$g(t,V_{t},u_{t})$$. The function $$g(\cdot)$$ is weakly positive, twice-differentiable, and strictly convex in instantaneous effort. For all $$t$$ and $$V_{t}$$, $$g(t,V_{t},0)=0$$. Instantaneous effort always improves the expected value of the asset, holding future effort constant; that is, for all $$t$$ and $$V_{t}$$, $$E_{t}[V_{s}]$$ is increasing in $$u_{t}$$, for all $$s>t$$. In the most general formulation, the seller’s information set at each time $$t$$ consists of the current time, the histories of the Brownian motion $$W$$ and asset value $$V$$, the history of her past actions, and any public or private randomization devices she chooses to employ. Using this information, the seller could pursue pure or mixed strategies over instantaneous effort levels. However, for the models that I will discuss, it is without loss of generality to restrict the seller to strategies that are a function of the history of the asset values and time (see Cvitanić et al., 2009). 
Intuitively, the convexity of the cost of instantaneous effort makes mixed strategies sub-optimal. Moreover, the security is a function of the history of the asset values only. As a result, at any time $$t$$, if the seller intends to pursue an instantaneous effort strategy that is $$\mathcal{F}_{s}^{V}$$-measurable for all $$s>t$$, the optimal effort at time $$t$$ will be $$\mathcal{F}_{t}^{V}$$-measurable. Formally, I define the set of admissible strategies $$\mathscr{U}$$ as the set of $$\mathcal{F}_{t}^{V}$$-adapted, square-integrable controls such that $$E[\exp(4\int_{0}^{1}u_{s}dB_{s}-2\int_{0}^{1}u_{s}^{2}ds)]<\infty$$. The retained tranche, $$\eta(V)$$, is an $$\mathcal{F}_{1}^{V}$$-measurable random variable, meaning that it can depend on the entire path of the asset value. I continue to assume limited liability, meaning that $$\eta(V)\in[0,\beta_{s}V_{1}]$$ for all paths $$V$$. The seller’s indirect utility function can be written as \begin{equation} \phi_{CT}(\eta)=\sup_{\{u_{t}\}\in\mathscr{U}}\phi_{CT}(\eta;\{u_{t}\})=\sup_{\{u_{t}\}\in\mathscr{U}}\lbrace E^{\tilde{P}}[\eta(V)]-E^{\tilde{P}}[\int_{0}^{1}g(t,V_{t},u_{t})dt]\rbrace,\label{eq:ct-mh-eq} \end{equation} (6.1) where $$E^{\tilde{P}}$$ denotes the expectation at time zero under the physical probability measure.18 In summary, given the retained tranche, the seller chooses a time-consistent instantaneous effort strategy to control the drift of the asset value. The security design problem is similar to the security design problem in the previous sections. 
The seller internalizes the effects of the security design on the price that the buyer is willing to pay: \begin{align} U_{CT}(s^{*}) & =\sup_{s\in S}U_{CT}(s)\nonumber \\ & =\sup_{s\in S}\lbrace\beta_{b}E^{\tilde{P}}[s(V)]+\phi_{CT}(\eta)\rbrace,\label{eq:ct-sec-util-eq} \end{align} (6.2) where $$S$$ is the set of $$\mathcal{F}_{1}^{V}$$-measurable limited liability security designs and $$\eta(V)=\beta_{s}(V_{1}-s(V)).$$ In the proposition below, I show that this problem is equivalent to a static, non-parametric security design problem. Equivalent, in this context, means that the utility achieved by the seller in the continuous time problem, for any admissible security design, is equal to the utility achieved by that security in the static, non-parametric security design problem. Proposition 5. There exists a probability space $$(\Omega,\mathcal{F},Q)$$, Brownian motion $$B$$ defined on that probability space, and stochastic process \[ dX_{t}=b(X_{t},t)dt+\sigma(X_{t},t)dB_{t}, \] such that: (1) For all strategies $$u\in\mathscr{U}$$, there exists a measure $$P$$ under which the law of $$X$$ is equal to the law of $$V$$ under measure $$\tilde{P}$$. (2) For all securities $$s\in S$$, the indirect utility function satisfies \[ \phi_{CT}(\eta)=\sup_{P\in M}E^{P}[\eta(X)]-D_{g}(P||Q), \] where $$D_{g}$$ is a divergence and $$M$$ is the set of measures on the probability space that are absolutely continuous with respect to $$Q$$ and for which $$E^{Q}[(\frac{dP}{dQ})^{4}]<\infty.$$ (3) For all securities $$s\in S$$, if there is a unique maximizer $$P(\eta)=\arg\max_{P\in M}E^{P}[\eta(X)]-D_{g}(P||Q)$$, then the security design utility function satisfies \[ U(s)=\beta_{b}E^{P(\eta)}[s(X)]+E^{P(\eta)}[\eta(X)]-D_{g}(P(\eta)||Q). \] Proof. See Online Appendix Section 3.14. The proposition relies on Girsanov’s theorem and the “weak formulation” results of Schaettler and Sung (1993) and Cvitanić et al. (2009). 
ǁ This proposition connects the dynamic problem introduced in this section to the static problems described in the previous sections. The intuition is that instantaneous effort strategies can be used to create any probability measure over outcomes, where an outcome is a path of the asset value. Given any point in time and history of the asset value, if the seller would like to make paths that move upward at this point more likely than paths that move downward, she can exert instantaneous effort. By doing this at each possible time and history, the seller can use her control to pick the relative likelihood of every possible path. Formally, this idea is captured by Girsanov’s theorem. These results also show that the decomposition of the seller’s actions into “effort” and “risk-shifting”, as described by Lemma 1, applies to these dynamic models as well. To prevent confusion, I will refer to the sort of effort described by Lemma 1 as “cumulative effort”, and continue to use the term “instantaneous effort” to refer to the control the seller uses. The distinction between cumulative effort and instantaneous effort is related to another important point: even though the agent does not control the instantaneous variance of the asset value process, she can “spread out” the probability measure over asset value paths, creating risk-shifting effects. The proof of the proposition shows that, for any measure $$P$$, there is a (stochastically) unique effort strategy that will create that measure. The divergence $$D_{g}(P||Q)$$ is the expected cumulative flow cost $$g(\cdot)$$ of this effort strategy. It satisfies the properties of a divergence: it is zero if $$P$$ is identical to $$Q$$, and positive otherwise. The measure $$Q$$ is the measure that corresponds to zero effort; if the agent exerts zero effort for all possible histories, the law of $$X$$ under measure $$Q$$ will be equal to the law of $$V$$ under measure $$\tilde{P}$$. 
One technical caveat is included in the third part of the proposition. Thus far, I have not made enough assumptions about the asset value process to ensure that there is a unique optimal measure, $$P(\eta)$$, or that the seller’s utility is finite. When I discuss specific cost functions $$g$$ below, I will introduce additional assumptions about the asset value process to ensure utility is finite and that there is a unique measure that solves the moral hazard problem. I have rewritten the continuous time moral hazard problem as a static problem, in which the seller chooses a probability measure subject to a cost that is described by a divergence. In light of the results for static models, two questions immediately arise. First, is there a $$g(\cdot)$$ function such that $$D_{g}(P||Q)$$ is the Kullback-Leibler divergence, in which case a debt security will be optimal? Second, are there $$g(\cdot)$$ functions such that $$D_{g}(P||Q)$$ is an invariant divergence, in which case a debt security will be approximately optimal? The answer to the first question comes from the work of Bierkens and Kappen (2014) and the sources cited therein, who show that quadratic cost functions, $$g(t,X_{t},u_{t})=\frac{\theta}{2}u_{t}^{2}$$, lead to the KL divergence.19 Intuitively, it follows that the optimal security design is a debt security. This intuition is confirmed by specializing the results of Cvitanić et al. (2009) to the case of a risk-neutral agent. For completeness, I present this result below, and include a proof in the Appendix. The proof also demonstrates that the decomposition of a perturbation’s effects into direct and indirect effects, and the further decomposition of the indirect effect into cumulative effort and risk-shifting effects, discussed in previous sections, apply to these models as well. 
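The link between quadratic flow costs and the KL divergence can be illustrated by simulation. The Python sketch below discretizes the asset value process with an Euler scheme, applies a hypothetical feedback effort rule (effort `u_bar` whenever the asset value is below `threshold`), and compares the expected cumulative cost $$E^{P}[\int_{0}^{1}\frac{\theta}{2}u_{t}^{2}dt]$$ with $$\theta$$ times a Monte Carlo estimate of $$E^{P}[\log\frac{dP}{dQ}]$$, using the Girsanov likelihood ratio. The drift $$b=0$$, constant $$\sigma$$, the effort rule, the parameter values, and the function name are all illustrative assumptions.

```python
import numpy as np

def simulate_cost_and_kl(u_bar=0.5, threshold=1.05, theta=2.0, sigma=0.2,
                         v0=1.0, n_steps=50, n_paths=20000, seed=0):
    """Return (expected quadratic effort cost, theta * estimated KL(P||Q))."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    v = np.full(n_paths, v0)
    log_lr = np.zeros(n_paths)   # log dP/dQ accumulated along each path
    cost = np.zeros(n_paths)     # cumulative flow cost (theta / 2) u^2 dt
    for _ in range(n_steps):
        # Hypothetical feedback rule: exert effort only when the value is low.
        u = np.where(v < threshold, u_bar, 0.0)
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)  # increments under P
        # Girsanov: log dP/dQ = int u dW + (1/2) int u^2 dt, with W a P-Brownian motion.
        log_lr += u * dw + 0.5 * u ** 2 * dt
        cost += 0.5 * theta * u ** 2 * dt
        # Euler step for dV = u sigma dt + sigma dW (drift b = 0 by assumption).
        v = v + u * sigma * dt + sigma * dw
    return cost.mean(), theta * log_lr.mean()

expected_cost, theta_times_kl = simulate_cost_and_kl()
print(expected_cost, theta_times_kl)
```

The two estimates differ only by the sample mean of the stochastic integral $$\int_{0}^{1}u_{t}dW_{t}$$, which has expectation zero under $$P$$, so they coincide up to Monte Carlo error.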
For the quadratic flow cost function, it is sufficient to assume that the asset value, in the absence of effort by the agent, satisfies $$E^{Q}[\exp(4\theta^{-1}X_{1})]<\infty$$, which ensures that utility is finite and that there is a unique optimal policy for the seller. Proposition 6. In the continuous time model, with the quadratic cost function, if $$E^{Q}[\exp(4\theta^{-1}X_{1})]<\infty$$, the optimal security design is a debt contract, \[ s(X)=\min(X,\bar{v}), \] for some $$\bar{v}>0$$. The decomposition of perturbations into direct and indirect effects applies: \[ \frac{\partial U_{CT}(\eta(X,\epsilon))}{\partial\epsilon}|_{\epsilon=0}=\underbrace{\kappa\frac{\partial}{\partial\epsilon}E^{P^{*}(\eta^{*})}[\beta_{s}s(X,\epsilon)]}_{\textit{direct effect}}-\underbrace{(1+\kappa)\frac{1}{2}\theta^{-1}\frac{\partial}{\partial\epsilon}V^{P^{*}(\eta^{*})}[\beta_{s}s(X,\epsilon)]}_{\textit{indirect effect}}. \] The effort/risk-shifting decomposition also applies: \[ \frac{\beta_{b}}{\beta_{s}}(1-\gamma(\eta^{*}))\frac{de(\eta(\epsilon))}{d\epsilon}|_{\epsilon=0^{+}}=\theta^{-1}\frac{\beta_{b}}{\beta_{s}}(1-\gamma(\eta^{*}))\frac{\partial}{\partial\epsilon}Cov^{P(\eta^{*})}[\eta(\epsilon),\beta_{s}X]|_{\epsilon=0^{+}}, \] \[ \frac{\beta_{b}}{\beta_{s}}\frac{d}{d\epsilon}E^{P(\eta(\epsilon))}[\eta^{*}(X)-\gamma(\eta^{*})\beta_{s}X]=\frac{1}{2}\theta^{-1}\frac{\beta_{b}}{\beta_{s}}\frac{\partial}{\partial\epsilon}V^{P(\eta^{*})}[\eta(\epsilon)-\gamma(\eta^{*})\beta_{s}X]|_{\epsilon=0^{+}}. \] Proof. See Online Appendix Section 3.15. The optimality of debt specializes Cvitanić et al. (2009). ǁ Debt is the optimal security design in the continuous time model for the same reasons it is optimal in the non-parametric model. The perturbation used in Section 3 applies with minor modifications. The intersection of these results with Holmström and Milgrom (1987) is intuitive. 
In the principal-agent framework, when the asset value Ito process is an arithmetic Brownian motion and the flow cost function is quadratic, without limited liability, a constant security for the principal is optimal. With limited liability, in the security design framework, the optimal security simply reduces the constant payoff where necessary, and debt is optimal.

The debt security design may or may not be renegotiation-proof. Suppose that at some point, say time $$t=\frac{1}{2}$$, the seller can offer the buyer a restructured security. Assume that at this time, there are no gains from trade (otherwise, if the asset value has increased, the seller will “lever up” and sell more debt to the buyer). If the current asset value is low enough, the debt security provides little incentive for the seller to continue putting in effort in the future. In this state, the buyer might agree to “write down” the debt security, even though he cannot receive any additional payments from the seller, because the buyer’s gains from increased effort by the seller could more than offset the loss of potential cash flows. In this model, write-downs can be Pareto-efficient if the time-zero expected value of the debt, $$E^{P}[s(X)]$$, is greater than $$\theta$$.20 Write-downs will never be Pareto-efficient when $$\kappa$$ and $$\theta^{-1}$$ are both small, but could occur if both the gains from trade at time zero and the moral hazard were large.

In the next section, I turn to the second question: are there cost functions $$g(\cdot)$$ for which debt securities are approximately optimal?

7. A Mean-Variance Approximation for Continuous Time Models

For the static models discussed earlier, invariant divergence cost functions led to models in which debt was approximately optimal. In this section, I will not directly answer the question of whether there are functions $$g(\cdot)$$ such that $$D_{g}(P||Q)$$ is invariant.
Instead, I will show that for all $$g(t,X_{t},u_{t})=\theta\psi(u_{t})$$, where $$\psi(u_{t})$$ is a convex function, debt is approximately optimal.21 The approximations used in this section are identical to the ones discussed previously, in Section 5. I consider problems in which both the moral hazard and gains from trade are small, relative to the scale of the assets. I show that the utility of arbitrary security designs can be characterized, to first order, by a mean-variance tradeoff.

The approximate optimality of debt is a surprising result in this setting. Without limited liability, Cvitanić et al. (2009) are able to characterize some properties of optimal security designs, making an analogy to the results of Holmström and Milgrom (1987). However, there is no explicit solution or implementation available, and in general the optimal securities will depend on the entire path of asset values, not just the final value, in a non-trivial way. There are no results, to my knowledge, about the model with limited liability.

I modify the models introduced in the previous section in several small ways. I will assume that the control is bounded, $$|u_{t}|\leq\bar{u}$$ (this is a restriction on the set $$\mathscr{U}$$). This assumption simplifies the discussion of conditions to ensure finite utility. I assume $$\psi$$ satisfies the conditions required for $$g$$ in the previous section, and in addition that for all $$|u|\leq\bar{u}$$, $$\psi''(u)\in[K_{1},K_{2}]$$ for some positive constants $$0<K_{1}<1<K_{2}$$. That is, $$\psi$$ is “strongly convex” over its domain. I also normalize $$\psi''(0)=1$$. I assume that, for bounded control strategies $$|u_{t}|\leq\bar{u}$$, the asset value has a finite fourth moment. That is, $$E^{\tilde{P}}[(V_{1})^{4}]<\infty$$ under these bounded control strategies.
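To make the curvature assumptions on $$\psi$$ concrete, the following sketch checks them for one candidate cost function. The choice $$\psi(u)=\cosh(u)-1$$ and the bound $$\bar{u}=1$$ are illustrative assumptions, not taken from the article; the function names are hypothetical. The sketch verifies that $$\psi''(0)=1$$, that $$\psi''$$ stays inside a positive interval on $$[-\bar{u},\bar{u}]$$, and that $$\psi$$ differs from the quadratic cost $$u^{2}/2$$ only at fourth order near zero.

```python
import math

def psi(u):
    """Hypothetical convex flow cost: cosh(u) - 1 (an illustrative assumption)."""
    return math.cosh(u) - 1.0

def psi_second(u):
    """Second derivative of psi; for cosh(u) - 1 this is cosh(u)."""
    return math.cosh(u)

def curvature_bounds(u_bar, n=1001):
    """Smallest and largest value of psi'' on an even grid over [-u_bar, u_bar]."""
    grid = [-u_bar + 2.0 * u_bar * i / (n - 1) for i in range(n)]
    values = [psi_second(u) for u in grid]
    return min(values), max(values)

def quadratic_gap(u):
    """Difference between psi(u) and the quadratic benchmark u**2 / 2."""
    return psi(u) - 0.5 * u * u

u_bar = 1.0  # assumed bound on the control
lowest, highest = curvature_bounds(u_bar)
print("psi''(0) =", psi_second(0.0))
print("psi'' range on [-u_bar, u_bar]:", lowest, highest)
# Near zero the gap behaves like u**4 / 24, so gap / u**4 is close to 1/24.
print("gap / u^4 at u = 0.1:", quadratic_gap(0.1) / 0.1 ** 4)
```

For this candidate, any $$K_{1}<1$$ and any $$K_{2}>\cosh(\bar{u})$$ satisfy the stated bounds. Whether the candidate also meets the other conditions the article imposes on $$g$$ is not checked here.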
There is a sense in which any twice-differentiable, convex cost function $$\psi(u_{t})$$ resembles the quadratic cost function, as $$u_{t}$$ becomes close to zero, because their second derivatives are the same. Similarly, in static models, all invariant divergences resemble the KL divergence. I apply this idea to the divergences $$D_{\psi}$$ induced by the convex cost functions $$\psi$$ (as defined in Proposition 5). I consider the same approximation discussed earlier, in which both $$\theta^{-1}$$ and $$\kappa$$ are small. In the context of continuous time models, Sannikov (2014) discusses a related “large firm limit”. As the cost of effort rises, the seller will choose to respond less and less to the incentives provided by the retained tranche. Regardless of the cost function $$\psi$$, the divergence $$D_{\psi}(P||Q)$$ will approach $$D_{KL}(P||Q)$$, and debt will be approximately optimal. Moreover, the distinction between the effort and risk-shifting components of utility that applied in the static approximations will apply to these models as well. To make this argument rigorous, I use Malliavin calculus in a manner similar to Monoyios (2013) to prove the following theorem: Proposition 7. For any limited liability security design $$s$$, the difference in utilities achieved by an arbitrary security $$s$$ and the sell-nothing security is \[ U(s;\theta^{-1},\kappa)-U(0;\theta^{-1},\kappa)=\kappa E^{Q}[\beta_{s}s]-\theta^{-1}\frac{1}{2}V^{Q}[\beta_{s}s]+O(\theta^{-2}+\theta^{-1}\kappa). \] The direct and indirect effects of a perturbation, to first order, are the ones described in Proposition 6, under measure $$Q$$. The decomposition of the indirect effect into effort-only and risk-shifting effects is also, to first order, identical to the one described in Proposition 6, under measure $$Q$$. Proof. See Online Appendix Section 3.16. ǁ In the continuous time effort problem with an arbitrary convex cost function, debt securities are first-order optimal. 
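The first-order expression in Proposition 7 can be illustrated numerically. The sketch below evaluates $$\kappa E^{Q}[\beta_{s}s]-\frac{1}{2}\theta^{-1}V^{Q}[\beta_{s}s]$$ for a debt security and for an equity security with the same expected value. The grid of 401 outcomes on $$[0,8]$$, the gamma shape of $$Q$$ with mean 2 and standard deviation 0.3, and $$\beta_{s}=0.5$$ follow the figure descriptions in the Appendix; the particular discretization, the debt level $$\bar{v}=1.8$$, and the values of $$\kappa$$ and $$\theta^{-1}$$ are assumptions chosen for illustration, and all function names are hypothetical.

```python
import math

def truncated_gamma_pmf(grid, mean=2.0, sd=0.3):
    """Discretized gamma weights on the grid, normalized to sum to one."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    # Work in logs to avoid overflow; the point v = 0 gets zero weight.
    log_w = [(shape - 1.0) * math.log(v) - v / scale if v > 0 else -math.inf
             for v in grid]
    top = max(log_w)
    w = [math.exp(x - top) for x in log_w]
    total = sum(w)
    return [x / total for x in w]

def expectation(p, x):
    return sum(pi * xi for pi, xi in zip(p, x))

def variance(p, x):
    m = expectation(p, x)
    return sum(pi * (xi - m) ** 2 for pi, xi in zip(p, x))

def mean_variance_objective(p, s, beta_s, kappa, theta_inv):
    """First-order utility gain over selling nothing: kappa*E - 0.5*theta_inv*V."""
    discounted = [beta_s * si for si in s]
    return kappa * expectation(p, discounted) - 0.5 * theta_inv * variance(p, discounted)

grid = [8.0 * i / 400 for i in range(401)]
q = truncated_gamma_pmf(grid)

v_bar = 1.8                                       # assumed face value of the debt
debt = [min(v, v_bar) for v in grid]
share = expectation(q, debt) / expectation(q, grid)
equity = [share * v for v in grid]                # equity with the same mean as debt

beta_s, kappa, theta_inv = 0.5, 0.01, 0.1         # kappa, theta_inv are assumed
print("variance of debt:  ", variance(q, debt))
print("variance of equity:", variance(q, equity))
print("objective, debt:   ", mean_variance_objective(q, debt, beta_s, kappa, theta_inv))
print("objective, equity: ", mean_variance_objective(q, equity, beta_s, kappa, theta_inv))
```

Because the two securities have the same mean under $$Q$$, the comparison of the objectives reduces to a comparison of variances, which is the sense in which variance summarizes the welfare loss in the approximation.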
The same mean-variance intuition that I discussed in static models applies to continuous time models. The variance of the security payoff is again a summary statistic for the problems of reduced effort and risk shifting associated with the moral hazard problem.

8. Extensions and Conclusion

The Appendix includes several extensions and applications of the model. In Appendix Section B, I show that the main results of the article continue to hold under alternative assumptions about timing and bargaining. The results hold if the seller first chooses a probability distribution, and then offers a security to the buyer. They also hold as long as the seller has some bargaining power. They hold if the buyer and seller share a common discount factor, but the seller is required to raise a positive amount of financing from the buyer. In Section D, I show that allowing for free disposal of output would not change any of the results of the article. However, allowing for free risk shifting by the seller would cause equity to be the optimal contract.

In Online Appendix Section 1, I apply the approximations of Section 5 to parametric models of moral hazard (when the seller chooses from a family of probability distributions). I show that debt guarantees the highest “worst case scenario” utility, where the worst case refers to the set of actions available to the seller. I also show that as the flexibility of the seller’s actions grows, this bound becomes increasingly tight. I apply these results to provide a second micro-foundation for the benchmark model, based on a rational inattention problem, in Online Appendix Section 2. I also argue that the approximations I employ are appropriate in the context of mortgage origination, through a calibration exercise described in Appendix Section C.

In this article, I have analysed a flexible form of moral hazard, which allows for both effort and risk-shifting.
In my benchmark model, with the KL divergence cost function, debt securities are exactly optimal. I provide a micro-foundation for this model in terms of a dynamic contracting problem with quadratic costs of effort. Other security designs (in some cases, a mix of debt and equity) are exactly optimal with the $$\alpha$$-divergence cost functions, and approximately optimal for the larger class of invariant divergence cost functions. In all of these models, debt is optimal or approximately optimal because it minimizes the variance of the security payout, balancing the need to provide incentives for effort, minimize risk-shifting, and maximize trade.

The editor in charge of this paper was Dimitri Vayanos.

A. Additional Figures

Figure A.1. Possible Security Designs. This figure illustrates several possible security designs: a debt security, an equity security, and the “live-or-die” security of Innes (1990). The $$x$$-axis, labelled $$\beta_{s}v_{i}$$, is the discounted value of the asset, and the $$y$$-axis, labelled $$\beta_{s}s_{i}$$, is the discounted value of the security. The level of debt, the cutoff point for the live-or-die, and the fraction of equity are chosen for illustrative purposes. The discount factor for the seller is $$\beta_{s}=0.5$$. The outcome space $$v_{i}$$ is a set of 401 evenly-spaced values ranging from zero to 8. The $$x$$-axis is truncated to make the chart clearer.

Figure A.2. Second-order optimal security designs. This figure shows the second-order optimal security designs, for various values of the curvature parameter $$\alpha$$. The $$x$$-axis, labelled $$\beta_{s}v_{i}$$, is the discounted value of the asset, and the $$y$$-axis, labelled $$\beta_{s}s_{i}$$, is the discounted value of the security. These securities are plotted with the same $$\bar{v}$$ for each $$\alpha$$ (not an optimal $$\bar{v}$$). The value of $$\kappa$$ used to generate this figure is one-third, which was chosen to ensure that the slopes of the contracts would be visually distinct (and not because it is economically reasonable). The outcome space $$v$$ is a set of 401 evenly-spaced values ranging from zero to 8.

Figure A.3. The utility of various security designs. This figure compares the utility of several security designs (debt, equity, and the optimal security design) relative to the utility of selling everything, for different values of $$\theta$$. The bottom $$x$$-axis is the value of $$\ln(\theta)$$, the top $$x$$-axis is the value of $$\kappa$$, and the $$y$$-axis is the difference in security design utility between the security (debt, equity, etc.) and selling everything. For each $$\theta$$ and corresponding $$\kappa$$, the optimal debt security, equity security, and the optimal security are determined. Then, the utility of using each of the four security designs, given $$\theta$$ and $$\kappa$$, is computed. The cost function is an $$\alpha$$-divergence, with $$\alpha=-7$$, implying that a mix of debt and equity is optimal (see Proposition 3). The gains from trade, $$\kappa$$, vary as $$\theta$$ changes, with $$\kappa=\bar{\kappa}\theta^{-1}$$, $$\bar{\kappa}=0.0171$$. This parameter was chosen to be consistent with the calibration in the Appendix, Section C. The discounting parameter for the seller is $$\beta_{s}=0.5$$. The zero-cost distribution $$q$$ is a discretized, truncated gamma distribution with mean 2, standard deviation 0.3, and an upper bound of 8. The outcome space $$v$$ is a set of 401 evenly-spaced values ranging from zero to 8. The utilities are plotted for nine different values of $$\theta$$, ranging from $$2\exp(-7)$$ to $$2\exp(1)$$, and linearly interpolated between those values.

B. Timing Conventions and Bargaining

In this Appendix Section, I will discuss several possible timing conventions for the sequence of decisions by the seller during the first period. In that period, the seller designs the security, sells it to the buyer (assuming the buyer accepts), and takes actions that will create or modify the assets backing the security. The timing convention refers to the order in which these three steps occur. In the first timing convention, the “shelf registration” convention (using the terminology of DeMarzo and Duffie, 1999), the security is designed before the assets are created, but sold afterward. In the second timing convention, the “origination” convention, the security is designed and sold after the assets are created. In the third timing convention, the “principal-agent” convention, the security is designed and sold before the seller takes her actions.
In this last convention, it is natural to assume that the asset exists before the security is designed, but its payoffs are modified by the seller’s actions after the security is traded. For the “principal-agent” timing convention, I will also discuss the effects of Nash-bargaining of the security price, and over both the security design and the security price. Finally, I point out that a requirement for the seller to raise a certain amount of funds from the buyer, as in standard corporate finance models, would also generate “gains from trade”, even if the buyer and seller shared a common discount rate. There are asset securitization examples for each of these timing conventions. For some asset classes, such as first-lien mortgages, the security design is standardized, and the “shelf registration” timing convention is appropriate. For more unusual assets, the security design varies deal-by-deal, and the “origination” timing convention is appropriate. In some cases, such as the “Bowie bonds” (securitizations of music royalties), maintaining incentives post-securitization is important, and the principal-agent timing convention applies. 
Table B.1. Timing conventions during the first period

| Principal-agent timing | Origination timing | Shelf registration timing |
| --- | --- | --- |
| Security designed | Actions taken | Security designed |
| Security traded | Security designed | Actions taken |
| Actions taken | Security traded | Security traded |

The principal-agent timing convention is the simplest convention to analyse. In any sub-game perfect equilibrium, the seller takes actions that maximize the value of her retained tranche, because the price that she receives for the security has already been set. The buyer anticipates this, forming beliefs about the distribution of outcomes based on the design of the security. The buyer’s beliefs affect the price that he is willing to pay for the security, and the seller internalizes this when designing the security. Multiple equilibria are possible if the seller’s optimal actions for a particular retained tranche are not unique, or if there are multiple security designs that maximize the seller’s utility. The moral hazard, in this timing convention, can occur either because the buyer is unaware of the seller’s actions, or because he can observe those actions but is powerless to enforce any consequences based on them.
Under the other two timing conventions, I use equilibrium refinements to argue that the optimal security design and actions associated with the principal-agent timing convention describe the most appealing equilibria of the game with those alternative timing conventions. I have drawn extensive-form game trees for these two timings in Figure B.1 and Figure B.2. The results I present are related to the findings of Matthews (1995) and Matthews (2001). Matthews (1995) shows, in a closely related model in which contracts are renegotiable, and there is no limited liability, that all equilibria are “second best efficient”, which is related to my result that the timing is irrelevant. Matthews (2001) extends the results of Matthews (1995) to a model with limited liability, but with only one choice (effort) for the agent.

Figure B.1. Origination timing game tree. This figure shows the extensive form game tree associated with the origination timing convention. The tree is stylized, in the sense that it shows only two possible actions $$p$$, and two security/price combinations $$s$$ and $$k$$. The symbols $$A$$ and $$R$$ denote acceptance or rejection of the offer.

Figure B.2. Shelf registration timing game tree. This figure shows the extensive form game tree associated with the shelf registration timing convention. The tree is stylized, in the sense that it shows only two possible actions $$p$$, two security designs $$s$$, and two possible prices $$k$$. The symbols $$A$$ and $$R$$ denote acceptance or rejection of the offer.

I assume that the actions of the seller are not observed by the buyer, ensuring there is still a moral hazard. I will discuss the benchmark, non-parametric model described in Section 2; the set of feasible actions by the seller, $$M$$, is the entire probability simplex. I use the notion of proper equilibrium defined by Myerson (1978), and developed for infinite action spaces by Simon and Stinchcombe (1995). I show that, if the principal-agent timing convention has a unique equilibrium security design, price, and set of actions taken by the seller, which involve acceptance with certainty by the buyer, then this security design, price, set of actions, and acceptance with certainty also characterize all strong proper equilibria22 of the game with the origination and shelf registration timing conventions, subject to a technical assumption.

The key intuitions behind this result are the notions of “forward induction” (Kohlberg and Mertens, 1986) and “incredible beliefs” (Cho, 1987). Suppose that there is an equilibrium in the origination timing in which the buyer is always offered a particular security, $$\bar{s}$$. Now imagine that the seller plays an off-equilibrium strategy, and offers the buyer a different security, $$\hat{s}$$. What should the buyer believe about the unobservable actions taken by the seller? The notion of forward induction recognizes the seller controls both the security design and her actions, and infers from the seller’s offer of security $$\hat{s}$$ that the seller has taken actions consistent with the buyer accepting or rejecting that offer.
That is, the seller might have taken actions that anticipated a lower or higher probability of the buyer accepting her offer, but the seller did not take actions that are not best responses to some acceptance strategy of the buyer, conditional on having offered the security $$\hat{s}$$ to the buyer. As a result, the buyer should accept or reject the security $$\hat{s}$$ based on the belief that the seller has acted in this way, and not rely on “incredible beliefs”. In particular, these notions rule out the idea that the buyer, when offered security $$\hat{s}$$ instead of the security $$\bar{s}$$, can believe the seller is “out to get him”, in the sense that the seller took actions that reduced her own utility to harm the buyer.23 These beliefs are not credible; the buyer cannot pretend to hold these beliefs in order to force the seller to offer him $$\bar{s}$$ instead of $$\hat{s}$$. The notions of forward induction and incredible beliefs, and their associated refinements, are not generally equivalent to the proper equilibrium concept. The proper equilibrium concept imposes the constraint that, in the sequence of mixed strategies whose limit is the equilibrium, actions that result in greater utility for the seller must be more likely than actions resulting in lower utility for the seller. The buyer’s beliefs, which are governed by Bayes’ rule, must place relatively high weight on the seller playing best-response actions. As a result, in the game I study, proper equilibrium, forward induction, and restrictions against incredible beliefs end up implementing the same idea: that, off the equilibrium path, the buyer cannot believe the seller has played an action that is not a best response, conditional on her observable choice of security design. The game is structured so that, if the buyer rejects the seller’s take-it-or-leave-it offer, the seller retains the entire asset (both the security and the retained tranche). 
For each security design, there is a one-dimensional manifold of best-response actions, each corresponding to a probability that the seller assigns to the buyer’s likelihood of acceptance. The worst case action in this one-dimensional manifold, from the perspective of increasing the security’s value, is the action that corresponds to the seller believing the buyer will accept the security with certainty. In that case, the seller has no incentive to raise the value of the security. The other actions in this one-dimensional manifold correspond to best-responses in which the seller believes she might retain the security, and therefore acts to increase its value.24 Now consider the optimal security design and price from the principal-agent timing. If the buyer is offered this security design and price, he must be weakly willing to accept, because regardless of the probabilities he assigns to the seller’s actions over this one-dimensional manifold, the price is at least fair. The seller, recognizing that the buyer will accept this security and price25, must offer it, because it maximizes her utility. This is a heuristic argument that outlines the proof of Proposition 8, as it applies to the origination timing. I will now discuss the shelf registration timing, and then discuss the technical assumptions required by the proof.

In the shelf registration timing, the security is designed before the actions are taken. As a result, one might appeal to a notion of sequential rationality to capture the idea that the seller would not play non-best-response actions, conditional on the security design that has already been decided. However, the concept of sequential equilibrium is difficult to extend to games with infinite action spaces (see Myerson and Reny, 2015). I will instead use the proper equilibrium concept, recognizing that the results for the shelf registration timing might hold under a weaker equilibrium refinement.
There is also a significant technical assumption required for the proof of Proposition 8. The technicality concerns the compactness of the action spaces available to the agents. The proof of Proposition 8 relies on the proof of existence of strong proper equilibria (Theorem 3.1) in Simon and Stinchcombe (1995), which itself requires that the action spaces of the agents be compact. This is problematic, because the buyer’s action space is the set of functions $$A_{b}:\:S\times\mathbb{R}\rightarrow\{0,1\}$$, where $$S$$ is the set of limited liability securities, 0 represents rejection, and 1 represents acceptance of the offered security and price. This is not a compact space; the buyer could (in theory) accept some particular security and price, while rejecting every offer of the same security with a price arbitrarily close to the price the buyer would have accepted. The potential for this type of strategy leads Simon and Stinchcombe (1995) to require compact action spaces. To circumvent these issues, I will require that the seller choose a security and price from a finite action space.26 That is, I will define the set $$S$$ of feasible security designs to be a finite set of possible security designs, all of which satisfy the limited liability constraints. I will define the set $$K$$ to be a finite set of feasible prices. First, consider the principal-agent timing. Let $$a(s,k)$$ be the buyer’s acceptance strategy. The buyer must accept if the price, $$k$$, is less than the buyer’s valuation, $$\beta_{b}\sum_{i>0}p^{i}(\eta(s))s_{i}$$, reject if the price is greater, and is indifferent if the price is equal to the buyer’s valuation. The seller’s payoff, given a particular acceptance strategy, is \begin{eqnarray*} U(s,k;a) & = & (1-a(s,k))\phi(\beta_{s}v)+a(s,k)(k+\phi(\eta(s))). \end{eqnarray*} I assume that there is a unique sub-game perfect equilibrium in the principal-agent timing, and that this equilibrium involves acceptance by the buyer with certainty. 
Let $$s^{*}$$ and $$k^{*}$$ denote the security design and price in this equilibrium, and let $$p^{*}=p(\eta(s^{*}))$$ denote the corresponding optimal actions. I also assume that the security $$s^{*}$$ is not sell-nothing. I show that, under these assumptions, all strong proper equilibria of the games with the origination and shelf registration timing conventions are also characterized by the security design $$s^{*}$$, the price $$k^{*}$$, the action $$p^{*}$$, and acceptance with certainty.

Proposition 8. In the non-parametric benchmark model described in Section 2, if there is a unique sub-game perfect equilibrium for the game with the principal-agent timing convention, characterized by security design $$s^{*}\in S$$, price $$k^{*}\in K$$, actions $$p^{*}\in M$$, and acceptance by the buyer, with $$s_{i}^{*}>0$$ for some $$i\in\Omega$$, then all strong proper equilibria (in the terminology of Simon and Stinchcombe (1995)) of the origination timing and shelf registration timing are characterized by that security design, price, and action, and the buyer accepting the seller’s offer with certainty.

Proof. See Online Appendix Section 3.18. ǁ

The proposition argues that the timing of the game is, in essence, irrelevant. The analysis in the main body of the article, regarding when debt contracts are optimal or nearly optimal, applies regardless of the timing. The proposition, as stated, relies on the strong proper equilibrium concept defined by Simon and Stinchcombe (1995), but also applies to those authors’ weak proper equilibrium concept.

Next, I will discuss, under the principal-agent timing convention, alternatives to giving all of the bargaining power to the seller. I will discuss two alternatives: first, that the seller designs the security, but then Nash-bargains with the buyer over the price, and second, that the seller and buyer bargain jointly over both the security design and price.
First, suppose that the seller and buyer bargain over the price $$K(\eta)$$. Let $$1-\rho>0$$ and $$\rho>0$$ be their respective bargaining weights. The outside option is no trade: the seller retains everything, and the buyer pays and receives nothing. The price, as a function of the retained tranche (or, equivalently, of the security design), solves \[ K^{*}(\eta)\in\arg\max_{K}(\beta_{b}E^{p(\eta)}[s(\eta)]-K)^{\rho}(\phi(\eta)+K-\phi(\beta_{s}v))^{1-\rho}. \] Using the first-order conditions to solve for $$K^{*}(\eta)$$, \begin{equation} K^{*}(\eta)=(1-\rho)\beta_{b}E^{p(\eta)}[s]+\rho(\phi(\beta_{s}v)-\phi(\eta)).\label{eq:price-solution-bargaining} \end{equation} (B.1) The utility in the security design problem is \[ U(\eta)=(1-\rho)(\beta_{b}E^{p(\eta)}[s(\eta)]+\phi(\eta))+\rho\phi(\beta_{s}v). \] This is simply an affine transformation of the security design utility function described in the text (equation 2.2), and it follows that the same security design will be optimal. The bargaining power, in this case, changes only the price at which the agents trade the security. Note also that, if the buyer (instead of the seller) designs the security, and then the agents bargain over the price, a similar result follows. Now suppose that the agents bargain jointly over the security design and price. The agents maximize \[ U(s^{*})=\max_{K,s\in S}(\beta_{b}E^{p(\eta(s))}[s]-K)^{\rho}(\phi(\eta(s))+K-\phi(\beta_{s}v))^{1-\rho}. \] The optimal price, as a function of the optimal security design, is still described by equation B.1. Substituting this in, \[ U(s^{*})=\max_{s\in S}(1-\rho)^{1-\rho}\rho^{\rho}(\beta_{b}E^{p(\eta(s))}[s]+\phi(\eta(s))-\phi(\beta_{s}v)), \] which is also an affine transformation of the models described in the main text. It again follows that, if the agents bargain jointly over both the security design and price, the same security designs would be optimal. 
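The closed-form price in equation B.1 can be checked against a direct maximization of the Nash product. The sketch below is only an illustration: the numerical inputs (the buyer's valuation $$\beta_{b}E^{p(\eta)}[s]$$, the values $$\phi(\eta)$$ and $$\phi(\beta_{s}v)$$, and the weight $$\rho$$) are hypothetical, and the function names are not from the article.

```python
def nash_price(buyer_value, phi_eta, phi_outside, rho):
    """Closed-form bargained price, following equation B.1."""
    return (1.0 - rho) * buyer_value + rho * (phi_outside - phi_eta)

def nash_product(price, buyer_value, phi_eta, phi_outside, rho):
    """Nash product: buyer surplus to the power rho, seller surplus to 1 - rho."""
    buyer_surplus = buyer_value - price
    seller_surplus = phi_eta + price - phi_outside
    if buyer_surplus <= 0.0 or seller_surplus <= 0.0:
        return 0.0
    return buyer_surplus ** rho * seller_surplus ** (1.0 - rho)

def grid_argmax_price(buyer_value, phi_eta, phi_outside, rho, n=20001):
    """Brute-force maximizer over prices between the two indifference points."""
    low = phi_outside - phi_eta   # seller is indifferent to trading at this price
    high = buyer_value            # buyer is indifferent to trading at this price
    candidates = [low + (high - low) * i / (n - 1) for i in range(n)]
    return max(candidates,
               key=lambda k: nash_product(k, buyer_value, phi_eta, phi_outside, rho))

# Hypothetical inputs chosen so that the total surplus from trade is positive.
buyer_value, phi_eta, phi_outside, rho = 1.0, 0.3, 1.1, 0.4

k_closed = nash_price(buyer_value, phi_eta, phi_outside, rho)
k_grid = grid_argmax_price(buyer_value, phi_eta, phi_outside, rho)
seller_utility = phi_eta + k_closed
affine_form = (1.0 - rho) * (buyer_value + phi_eta) + rho * phi_outside

print("closed-form price:", k_closed)
print("grid-search price:", k_grid)
print("seller utility:", seller_utility, "affine form:", affine_form)
```

The last two printed values agree, which is the affine-transformation property used in the text: the bargaining weight rescales and shifts the seller's objective without changing which security design maximizes it.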
Finally, suppose that the seller and buyer share a common discount rate, $$\beta$$, but that the seller is required to raise a certain amount of funds, $$I>0$$, from the buyer. Using the principal-agent timing, in the security stage, the seller solves \[ \max_{\eta}\phi(\eta) \] subject to the limited liability constraints ($$\eta_{i}\in[0,\beta v_{i}]$$) and the fund raising constraint, \[ \beta E^{p(\eta)}[s(\eta)]\geq I. \] Let $$\lambda\geq0$$ denote the multiplier on the fundraising constraint. For any perturbation $$\eta(\epsilon)$$ satisfying the limited liability constraints, the first-order condition for the Lagrangian of this problem is \[ -(\lambda-1)\sum_{i\in\Omega}p^{i}(\eta^{*})\frac{\partial\eta_{i}}{\partial\epsilon}|_{\epsilon=0^{+}}+\lambda\beta\sum_{i,j\in\Omega}s_{j}^{*}\frac{\partial p^{j}(\eta)}{\partial\eta_{i}}|_{\eta=\eta^{*}}\frac{\partial\eta_{i}}{\partial\epsilon}|_{\epsilon=0^{+}}\leq0. \] If $$\lambda=1+\kappa>1$$, this expression is identical to equation 2.3, and it follows that the optimal security design in this case will be identical to the case studied in the main text. I will prove that $$\lambda>1$$ under the assumption that the solution to the moral hazard problem is always interior (as in the KL divergence case). Observe that it is always feasible to set \[ \frac{\partial\eta_{i}}{\partial\epsilon}|_{\epsilon=0^{+}}=s_{i}^{*}, \] a perturbation that gives some share of the security to the seller instead of the buyer. For this security, we must have $$\lambda>0$$, as $$\lambda=0$$ would imply $$E^{p(\eta)}[s(\eta)]\leq0<I$$. It follows that the constraint binds, and therefore \[ (\lambda-1)\beta^{-1}I\geq\lambda\beta\sum_{i,j\in\Omega}s_{j}^{*}\frac{\partial p^{j}(\eta)}{\partial\eta_{i}}|_{\eta=\eta^{*}}s_{i}^{*}. 
\] Noting that $$\phi(\eta)$$ is the convex conjugate of $$\psi(p)$$, and therefore strictly convex, and that $$\frac{\partial p^{j}(\eta)}{\partial\eta_{i}}=\partial^{i}\partial^{j}\phi(\eta)$$, the right-hand side of the above expression is strictly positive, and therefore $$\lambda>1$$, completing the proof.

C. Calibration

In this section of the Appendix, I will discuss possible calibration strategies for the static, non-parametric model of moral hazard discussed in the main text. I will focus on the context of mortgage securitization, and how to calibrate the key parameters $$\kappa$$ and $$\theta$$, under the assumption that the cost function is the KL divergence, or that the cost function is an invariant divergence and the first-order approximation discussed in the text is accurate. In both of these cases, a debt security design is optimal.

In the context of mortgage origination, there is empirical evidence for lax screening by originators who intended to securitize their mortgage loans, which suggests that moral hazard is a relevant issue (see Demiroglu and James, 2012; Elul, 2016; Jiang et al., 2013; Keys et al., 2010; Krainer and Laderman, 2014; Mian and Sufi, 2009; Nadauld and Sherlund, 2013; Purnanandam, 2010; Rajan et al., 2015, although some of this evidence is disputed by Bubb and Kaufman (2014)). However, some of this evidence is consistent with information asymmetries but cannot distinguish between moral hazard and adverse selection. There are also mechanisms to mitigate adverse selection by the seller, such as the inability to retain loans and random selection of loans into securitization (Keys et al., 2010).

I will discuss an “experimental” approach to calibration first. This approach is consistent in spirit with the empirical literature on moral hazard in mortgage lending (Keys et al., 2010; Purnanandam, 2010, among others). In that literature, the quasi-experiment compares no securitization ($$\eta_{i}=\beta_{s}v_{i}$$) with securitization.
If we assume securitization uses the optimal security design $$\eta^{*}$$, then $$\theta$$ can be approximated (for any invariant divergence cost function, see Section 5) as \[ \theta^{-1}\approx E^{p(\beta_{s}v)}[v_{i}]\cdot\frac{E^{p(\beta_{s}v)}[v_{i}]-E^{p^{*}}[v_{i}]}{Cov^{p^{*}}(v_{i},s_{i}^{*})}. \] This formula illustrates the difficulties of calibrating the model using the empirical work on moral hazard in mortgage lending. For the purposes of the model, what matters is the loss in expected value due to securitization, relative to the risk taken on by the buyers, ex-ante. The empirical literature estimates ex-post differences, and the magnitude of these differences varies substantially, depending on whether the data sample is from before or during the recent crash in home prices. Converting this into an ex-ante difference would require assigning beliefs to the buyer and seller about the likelihood of a crash. Estimating the ex-ante covariance, which can be understood as a measure of the quantity of “skin in the game”, is even more fraught. For these reasons, I have not pursued this calibration strategy further.

The second calibration strategy, which is somewhat more promising, is to use the design of mortgage securities to infer $$\theta$$. Essentially, by (crudely) estimating the other terms in the “put option value” equation (equation 3.2), and assuming the model is correct, we can infer what the security designers thought the moral hazard was. Rearranging that equation, \[ \underbrace{\frac{\beta_{b}\bar{v}-\beta_{b}E^{p^{*}}[s_{i}]}{\beta_{b}E^{p^{*}}[s_{i}]}}_{\mathrm{Spread}}\underbrace{\frac{E^{p^{*}}[s_{i}]}{E^{p^{*}}[v_{i}]}}_{\mathrm{Share}}\left(1-\underbrace{\frac{E^{p(\beta_{s}v)}[v_{i}]-E^{p^{*}}[v_{i}]}{E^{p(\beta_{s}v)}[v_{i}]}}_{\mathrm{Moral\:Hazard}}\right)\kappa^{-1}=\theta.
\] The spread term should be thought of as reflecting the initial spread between the assets purchased by the buyer and the discount rate, under the assumption that the bonds will not default. Using a 90/10 weighting on the initial AAA and BBB 06-2 ABX coupons reported in Gorton (2008), I estimate this as 34 basis points per year. In a different setting (CLOs), the work of Nadauld and Weisbach (2012) estimates the cost of capital advantage (gains from trade) due to securitization at $$17$$ basis points per year. The “share” term is the ratio of the initial market value of the security to the initial market value of the assets. Begley and Purnanandam (2016) document that the value of the non-equity tranches was roughly 99% of the principal value in their sample of residential mortgage securitizations. Similarly, the moral hazard term is likely to be small. The estimates of Keys et al. (2010), whose interpretation is disputed by Bubb and Kaufman (2014), imply that pre-crisis, securitized mortgage loans defaulted at a rate roughly 3 percentage points higher than loans held in portfolio (see footnote 27). Assuming a 50% recovery rate, and using this as an estimate of the ex-ante expected difference in asset value, this suggests that the moral hazard term is roughly 1.5%, and therefore negligible in this calibration. Combining all of these estimates, I find that a value of $$\theta$$ of roughly 2 is consistent with the empirical literature on securitization.

This calibration assumed that the security design problem with the KL divergence was being solved. However, this formula also holds (approximately) under invariant divergences, conditional on the assumption that $$\theta^{-1}$$ and $$\kappa$$ are small enough. The value of $$\theta=2$$ can be compared with the results of Figure A.3.
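The arithmetic behind the value of $$\theta$$ can be sketched as follows. This is only an illustration of the rearranged “put option value” equation, not the author's own computation: it assumes that both the 34 basis point spread and the 17 basis point gains from trade are cumulated over the same five-year horizon that the text uses when it converts 17 basis points per year into $$\kappa$$, and the function name `implied_theta` is hypothetical.

```python
def implied_theta(spread_per_year, kappa_per_year, years, share, moral_hazard):
    """theta = Spread * Share * (1 - MoralHazard) / kappa, all as decimal fractions."""
    spread = spread_per_year * years  # assumption: annual spread cumulated over the horizon
    kappa = kappa_per_year * years    # 17 bp per year over five years = 0.85%
    return spread * share * (1.0 - moral_hazard) / kappa


theta = implied_theta(
    spread_per_year=0.0034,  # 34 basis points per year (ABX coupons, Gorton 2008)
    kappa_per_year=0.0017,   # 17 basis points per year (Nadauld and Weisbach 2012)
    years=5,                 # horizon assumed for illustration
    share=0.99,              # non-equity tranches as a share of principal value
    moral_hazard=0.015,      # 3 percentage points extra default times 50% loss
)
print(round(theta, 2))
```

Under these assumptions the implied value is a little below 2. Because the horizon multiplies both the spread and $$\kappa$$, it cancels in the ratio, so the result depends mainly on the relative size of the two annual figures.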
Under the assumptions used to generate that figure, which are described in its caption, I find that with $$\theta=2$$ and $$\kappa=0.85\%$$ (17 basis points per year times 5 years), debt would achieve 99.96% of the gains achieved by the optimal contract, relative to selling everything (and an even larger fraction of the gains relative to selling nothing). Under these parameters, the utility difference between the best debt security and selling nothing would be roughly 0.73% of the total asset value. While that might seem like an economically small gain, for a single deal described in Gorton (2008), SAIL 2005-6, the private gains of securitization would be roughly $$\$16.4$$ million. In contrast, the utility difference between the best equity security and selling nothing is about 0.56% of the total asset value. The private cost of using the optimal equity contract, instead of the optimal debt contract, would be roughly $$\$4$$ million for this particular securitization deal. The numbers discussed in this calculation depend on the assumptions used in Figure A.3, some of which are ad hoc. Nevertheless, they illustrate the general point that it is simultaneously possible for debt to be approximately optimal, and for the private gains of securitization to be large.

D. Free Disposal and Free Risk-Shifting

In this section, I will discuss the impact that free disposal of output by the seller and free risk-shifting would have on the models discussed in the main text. For the static, non-parametric moral hazard problems discussed in Sections 3, 4, and 5, the optimal security designs feature monotone retained tranches. In the proofs, in Online Appendix Lemma 1, I show this is true for any static, non-parametric security design problem with an invariant divergence cost function whose gradient (in $$p$$) is continuous in $$q$$. Intuitively, because the optimal retained tranche is monotone even without free disposal, allowing for free disposal does not change the optimal security design.
To see this formally, I will show that, with free disposal, it is without loss of generality to consider monotone retained tranches and ignore the disposal option. Imagine that there is free disposal. We can write the agent’s moral hazard problem as \[ \phi(\eta)=\sup_{p\in F(r),r\in M}\left\lbrace\sum_{i>0}\eta_{i}p^{i}-\psi(r)\right\rbrace, \] where $$F(r)$$ is the set of probability distributions first-order stochastically dominated by $$r$$, under the ordering given by $$\Omega$$. The agent, in effect, makes two choices: first choosing $$r$$ using the technology discussed in the text, then following a (possibly random) output destruction strategy to create $$p$$. The buyer still receives payoff $$\beta_{b}E^{p}[s]$$, and therefore the security design utility described in equation 2.2 is still valid. Define, for any retained tranche $$\eta$$, the “monotone version” \[ \bar{\eta}_{i}(\eta)=\max_{j\in\{0,\ldots,i\}}\eta_{j}. \] Note that, because $$v_{i}$$ is weakly increasing in $$i$$, such a design does not violate the limited liability constraints. Note also that, because of the monotonicity of $$\bar{\eta}_{i}(\eta)$$, \[ \sum_{i>0}[p^{i}(\eta)-r^{i}(\eta)]\bar{\eta}_{i}(\eta)=0. \] We can rewrite the moral hazard problem as \[ \phi(\eta)=\sup_{p\in F(r),r\in M}\left\lbrace\sum_{i>0}(\eta_{i}-\bar{\eta}_{i}(\eta))p^{i}+\sum_{i>0}\bar{\eta}_{i}(\eta)r^{i}-\psi(r)\right\rbrace. \] It immediately follows that the behavior without output destruction is the same for the two securities: $$r(\eta)=r(\bar{\eta}(\eta))$$. By the definition of the retained tranche, if $$\eta_{i}<\eta_{j}$$ for some $$i>j$$, then $$s_{i}>s_{j}$$. As a result, output destruction hurts the value of the buyer’s security: \[ \sum_{i>0}[p^{i}(\eta)-r^{i}(\eta)]s_{i}(\eta)\leq0 \] for all $$\eta$$. Therefore, utility in the security design problem is weakly higher under $$\bar{\eta}(\eta)$$ than under $$\eta$$, and it is without loss of generality to consider monotone security designs.
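The “monotone version” defined above is a running maximum of the retained tranche over the ordered outcomes. A minimal sketch, assuming outcomes are indexed so that $$v_{i}$$ is weakly increasing and limited liability means $$0\leq\eta_{i}\leq\beta_{s}v_{i}$$; the function names and the example numbers are hypothetical and serve only to illustrate the definition.

```python
from itertools import accumulate


def monotone_version(eta):
    """Running maximum: eta_bar[i] = max(eta[0], ..., eta[i])."""
    return list(accumulate(eta, max))


def satisfies_limited_liability(eta, v, beta_s):
    """Check 0 <= eta[i] <= beta_s * v[i] for every outcome i."""
    return all(0.0 <= e <= beta_s * vi for e, vi in zip(eta, v))


# Hypothetical example with weakly increasing asset values.
v = [0.0, 1.0, 2.0, 3.0, 4.0]
beta_s = 0.9
eta = [0.0, 0.8, 0.5, 2.0, 1.5]  # non-monotone retained tranche

eta_bar = monotone_version(eta)
print(eta_bar)
print(satisfies_limited_liability(eta, v, beta_s),
      satisfies_limited_liability(eta_bar, v, beta_s))
```

Because each entry of the running maximum equals some earlier $$\eta_{j}\leq\beta_{s}v_{j}\leq\beta_{s}v_{i}$$, the monotone version inherits limited liability from the original design, as the text notes.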
I have shown that free disposal does not affect the static problems discussed in the text: it is equivalent to a restriction to monotone security designs in the absence of free disposal, and the optimal security designs were monotone even without such a restriction.

Conveniently, essentially the same proof applies to the dynamic security design problems. Suppose we modify the stochastic process for the asset value described in Section 6 to allow for output destruction: \[ dV_{t}=b(V_{t},t)dt+u_{t}\sigma(V_{t},t)dt-dY_{t}+\sigma(V_{t},t)dW_{t}, \] where $$dY_{t}\geq0$$ is the seller’s destruction of asset value at time $$t$$. To allow such a modification, we need to use as the space of asset value processes the space of RCLL functions on $$[0,1]$$, which I will denote $$\bar{\Omega}$$, instead of the space of continuous functions, which I will continue to denote $$\Omega$$. We also need to allow the security design to be a function on $$\bar{\Omega}$$. I will say that a retained tranche is monotonic in asset value if, for all $$t\in[0,1]$$, and all $$V\in\bar{\Omega}$$, $$\eta(V)$$ is weakly increasing in $$V_{t}$$. Using this definition, debt contracts are monotonic in asset value. It follows immediately that if the seller is given a retained tranche that is monotonic in asset value, she will not destroy asset value. We can define the “monotone version” of the retained tranche in the following way. Let $$F(V)$$ be the set of all RCLL functions on $$[0,1]$$ for which, for all $$f\in F(V)$$ and $$t\in[0,1]$$, \[ f_{t}\leq V_{t}. \] The monotone version of $$\eta(V)$$ is \[ \bar{\eta}(\eta,V)=\sup_{f\in F(V)}\eta(f). \] Note that, because $$f_{1}\leq V_{1}$$, this retained tranche satisfies the limited liability constraints. The “weak formulation” approach, based on Girsanov’s theorem and described in Proposition 5, can be applied.
We can define an alternative probability space, with measure $$Q$$, on which \[ dX_{t}=b(X_{t},t)dt-dY_{t}+\sigma(X_{t},t)dB_{t}, \] and a measure $$P$$, absolutely continuous with respect to $$Q$$, such that, under $$P$$, $$X$$ has the same law as $$V$$ under measure $$\tilde{P}$$. Suppose that the retained tranche is not monotonic in asset value. There is some $$t$$ and some $$X$$ such that, if the seller reaches state $$(t,X_{t})$$, she will wish to destroy output. If such a state is never reached with positive probability under measure $$P(\eta)$$ (and hence $$Q$$), the retained tranche and its monotone version achieve the same utility in the security design problem, holding the measure $$P(\eta)$$ constant. Such a state can never be reached under any measure that is absolutely continuous with respect to $$Q$$, and therefore the monotone version of the retained tranche will not affect the agent’s choice of $$P$$. It follows, in this case, that it is without loss of generality to assume monotonicity.

Assume, going forward, that if a non-monotonicity exists, it is reached with positive probability. I will show that, for any retained tranche that induces the seller to destroy some asset value, there is another retained tranche that does not induce the seller to destroy asset value and achieves higher utility in the security design problem. As a result, the optimal security design is monotone. Define a modified version of the retained tranche in the following way: for each $$B\in\Omega$$, let $$X^{Y}(\eta,B)\in\bar{\Omega}$$ denote the asset value path that occurs under the seller’s optimal output destruction plan, given retained tranche $$\eta$$ and Brownian motion $$B$$, and let $$X(B)$$ be the asset value path that would occur in the absence of output destruction. Note that $$X(B)$$ is not affected by the design of the retained tranche, and that there is a one-to-one mapping between $$X$$ and $$B$$.
We can define a modified version of the retained tranche, for $$X\in\Omega$$, as \[ \tilde{\eta}(X,\eta)=\eta(X^{Y}(\eta,B(X))), \] where $$B(X)$$ is the Brownian motion that induces $$X$$ in the absence of asset value destruction. For discontinuous $$X$$, let $$\tilde{\eta}(X,\eta)=0$$. Note that, because asset value destruction decreases $$X_{1}$$, this modified retained tranche satisfies the limited liability constraints. By revealed preference, $$\tilde{\eta}(X)$$ does not induce output destruction; if it did, the seller’s output destruction given $$\eta$$ would not have been optimal. Moreover, $$\tilde{\eta}$$ must also induce the same choice of $$P$$; again, if some different choice of $$P$$ were preferable, it would also be preferable under the contract $$\eta$$. It follows that the seller receives the same utility from $$\eta$$ and $$\tilde{\eta}$$. The buyer, however, receives weakly higher utility from $$\tilde{\eta}$$. By the assumptions discussed in Section 6, for any realization of the Brownian motion $$\omega\in\Omega$$, destruction of output at time $$t$$ lowers the value of the asset for all times $$s\geq t$$, relative to the asset values that would have been generated in the absence of destruction. $$E_{t}[V_{1}]$$ is always decreased by destruction (by assumption), and $$s(X)=X_{1}-\beta_{s}^{-1}\eta(X).$$ As a result, $$\tilde{\eta}$$ delivers weakly higher utility than $$\eta$$, and it is without loss of generality to study monotone security designs and assume no output destruction.

Finally, I will discuss “free risk-shifting”. As discussed in the text, the strict convexity assumption on the divergences I study rules out risk-shifting that is completely free. One implication of free risk-shifting is that there is not necessarily a unique optimal probability distribution for the seller to choose in the moral hazard problem.
For the purposes of discussion, suppose that there is some convention by which a single $$p(\eta)$$ is determined for each $$\eta$$. The utility of any security design can be decomposed (along the lines of Lemma 1), with free risk-shifting, as \begin{align*} U(\eta) & =\beta_{b}\sum_{i\in\Omega}q^{i}v_{i}-\kappa\sum_{i\in\Omega}p^{i}(\eta)[\eta_{i}-\gamma(\eta)\beta_{s}v_{i}]+\\ & \frac{\beta_{b}}{\beta_{s}}e(\eta)-c(e(\eta))-\kappa\sum_{i\in\Omega}p^{i}(\gamma(\eta)\beta_{s}v)\gamma(\eta)\beta_{s}v_{i}, \end{align*} where $$M(e)\subset M$$ is the set of probability distributions associated with effort level $$e$$ and \[ c(e)=\min_{p\in M(e)}\psi(p). \] The moral hazard problem can be written as \begin{align*} \phi(\eta) & =\max_{e,p\in M(e)}\sum_{i\in\Omega}p^{i}[\eta_{i}-\gamma(\eta)\beta_{s}v_{i}]\\ & +\sum_{i\in\Omega}p_{e}^{i}(e)\gamma(\eta)\beta_{s}v_{i}-c(e), \end{align*} where \[ p_{e}(e)=\arg\min_{p\in M(e)}\psi(p). \] By the seller’s optimal choice of $$p$$ in the moral hazard problem, it must be the case that \[ \sum_{i\in\Omega}p^{i}(\eta)[\eta_{i}-\gamma(\eta)\beta_{s}v_{i}]\geq0. \] It follows immediately that the equivalent equity tranche delivers higher utility in the security design problem, if it is feasible. If $$\gamma(\eta)>1$$, then \begin{align*} U(\eta) & \leq(1+\kappa)\beta_{s}\sum_{i\in\Omega}q^{i}v_{i}+(1+\kappa)e(\eta)\\ & -c(e(\eta))-\kappa(e+\beta_{s}\sum_{i\in\Omega}q^{i}v_{i})\\ & \leq\beta_{s}\sum_{i\in\Omega}q^{i}v_{i}+e(\eta)-c(e(\eta))\\ & \leq\beta_{s}\sum_{i\in\Omega}q^{i}v_{i}+e(\beta_{s}v)-c(e(\beta_{s}v)), \end{align*} implying that $$\eta_{i}=\beta_{s}v_{i}$$ is preferable, and hence $$\gamma(\eta)>1$$ is never optimal. Negative effort is also sub-optimal, and hence the optimal design features $$\gamma(\eta)\in[0,1]$$, and therefore the equivalent equity tranche is feasible. It follows that, with free risk-shifting, an equity security is always an optimal security design.
Acknowledgements

The author would like to thank, in no particular order, Emmanuel Farhi, Philippe Aghion, Alp Simsek, David Laibson, Alex Edmans, Luis Viceira, Jeremy Stein, Yao Zeng, John Campbell, Ming Yang, Oliver Hart, David Scharfstein, Sam Hanson, Adi Sunderam, Guillaume Pouliot, Yuliy Sannikov, Zhiguo He, Lars Hansen, Roger Myerson, Michael Woodford, Gabriel Carroll, Drew Fudenberg, Scott Kominers, Eric Maskin, Mikkel Plagborg-Møller, Bengt Holmstrom, Arvind Krishnamurthy, Peter DeMarzo, Sebastian Di Tella, and many seminar participants for helpful feedback. The author would also like to thank Dimitri Vayanos (the editor) and three anonymous referees for comments that helped improve the article. A portion of this research was conducted while visiting the Becker Friedman Institute. All remaining errors are solely the author's.

Footnotes

1. Throughout the article, I will use she/her to refer to the seller and he/his to the buyer of the security. No association of the agents to particular genders is intended.
2. A similar result, derived from a robust contracting framework, appears in Antic (2015).
3. This article also builds on some of the methods of Yang (2015) (see the Appendix, Section 3.15).
4. For brevity, I have omitted the result for log-normal distributions from the article. It is available upon request.
5. Using a discrete outcome space simplifies the exposition, but is not necessary for the main results.
6. The gains from trade could also be motivated by a requirement that the seller raise a certain amount of funds from the buyer (see Appendix Section B).
7. Because the sample space $$\Omega$$ is a finite set of outcomes, even in the “non-parametric” case, the choice of $$p$$ can be expressed as a choice over a finite number of parameters. I am using the terms non-parametric and parametric to denote whether the set $$M$$ of feasible probability distributions is the entire simplex, or a restricted set.
8.
A “divergence” is similar to a distance, except that there is no requirement that it be symmetric between $$p$$ and $$q$$, or that it satisfy the triangle inequality.
9. Other authors use different sign conventions or scaling for the $$\alpha$$ parameter.
10. Under this convention, the KL divergence corresponds to $$f(u)=u\ln u-u+1$$.
11. This equity share is not necessarily feasible: if $$\eta$$ induces a very high or very low level of effort, the equivalent equity share might be more than 100% or less than 0% of the asset value. The probability distribution associated with the equivalent equity contract, $$p(\gamma(\eta)\beta_{s}v)$$, has the lowest cost among all probability distributions with the same effort level.
12. This perturbation argument builds on the suggestions of an anonymous referee.
13. Shavell (1979) mentions that flat contracts minimize variance, in a context without limited liability. A related result with limited liability can be found in Plantin (2015).
14. The model has ambiguous comparative statics for the zero-effort distribution $$q$$. A mean-preserving spread perturbation to $$q$$ can decrease the optimal debt level, because higher volatility increases the value of the put option, or increase it, because it can increase the mean of $$p^{*}$$, decreasing the value of the put option.
15. It is possible to characterize the utility in the security design problem up to second order, for all security designs, not just those that are close to the optimal security design. In fact, to first order, the utility in the security design problem is exactly a mean-variance tradeoff (Online Appendix Proposition 3).
16. See Biais et al. (2007); Hellwig and Schmidt (2002); Sadzik and Stacchetti (2015) for analysis of the relationship between discrete and continuous time models.
17. The following conditions are sufficient.
For all $$V\in\mathbb{R}^{+}$$ and $$t\in[0,1]$$, $$\sigma(V,t)>0$$ and $$|b(V,t)|+|\sigma(V,t)|\leq C(1+|V|)$$ for some positive constant $$C$$. For all $$t\in[0,1]$$, $$V,V'\in\mathbb{R}^{+}$$, $$|b(V,t)-b(V',t)|+|\sigma(V,t)-\sigma(V',t)|\leq D|V-V'|$$, for some positive constant $$D$$. For all $$t\in[0,1]$$, $$\lim_{v\rightarrow0^{+}}\sigma(v,t)=0$$, and $$\lim_{v\rightarrow0^{+}}b(v,t)\geq0$$.
18. If the buyer and seller were risk-averse, but shared a common risk-neutral measure $$\tilde{P}$$, the problem would be identical. The key assumption in that case would be that the problem is small, in the sense that the outcome of this particular asset and security does not alter the common risk-neutral measure.
19. Note that this formulation rules out time discounting of the effort costs. One way to motivate this assumption is to suppose that neither agent discounts the future, but the seller is required to raise $$I$$ dollars to initiate the project. In this case, the gains from trade are given by the multiplier on this constraint (see Appendix Section B).
20. This condition is sufficient, not necessary. I have omitted the proof for brevity.
21. These flow cost functions will generate divergences $$D_{g}(P||Q)$$ that, like the KL divergence, have the property that their “second variations” are proportional to the Fisher information. This is the infinite-dimensional analogue of the mathematical property of invariant divergences that leads to the approximate optimality of debt.
22. I have not shown that there is a unique strong proper equilibrium; in theory, there could be multiple equilibria with different acceptance strategies by the buyer for security/price combinations that never occur in equilibrium. However, results in the proof of Proposition 8 lead me to believe this is not the case.
23. This argument uses the strategic form of the game, not the agent-strategic form (see Fudenberg and Tirole (1991), chapter 8.4).
That is, off-equilibrium security designs are assumed to be correlated with off-equilibrium actions by the seller.
24. This result depends on the convexity of the set $$M$$.
25. Actually, some price slightly lower but arbitrarily close to this price.
26. There are at least three possible alternative strategies. I could have required that the buyer’s strategy satisfy enough conditions to ensure compactness. Alternatively, I could have pursued the “limit-of-finite” approach described in Simon and Stinchcombe (1995). Finally, I could have attempted to explicitly construct the sequence of mixed strategies that generate the proper equilibrium. Each of these seemed to require significant technical work that is beyond the scope of this article.
27. After about one year, ~11% of securitized loans were in default, compared to ~8% of loans held in portfolio.

REFERENCES

ACHARYA V., MEHRAN H. and THAKOR A. V. (2016), “Caught Between Scylla and Charybdis? Regulating Bank Leverage When There is Rent Seeking and Risk Shifting”, The Review of Corporate Finance Studies, 5, 36–75.

AGHION P. and BOLTON P. (1992), “An Incomplete Contracts Approach to Financial Contracting”, The Review of Economic Studies, 59, 473–494.

ALI S. and SILVEY S. (1966), “A General Class of Coefficients of Divergence of One Distribution from Another”, Journal of the Royal Statistical Society. Series B (Methodological), 28, 131–142.

AMARI S. and NAGAOKA H. (2007), Methods of Information Geometry, Vol. 191 (Providence, Rhode Island: American Mathematical Society).

ANTIC N. (2015), “Contracting with Unknown Technologies” (Unpublished Paper, Princeton University).

BARRON D., GEORGIADIS G. and SWINKELS J. (2017), “Optimal Contracts with a Risk-Taking Agent” (Unpublished manuscript).

BEGLEY T. A. and PURNANANDAM A. K. (2016), “Design of Financial Securities: Empirical Evidence from Private-label RMBS Deals”, The Review of Financial Studies, 30, 120–161.

BIAIS B. and CASAMATTA C. (1999), “Optimal Leverage and Aggregate Investment”, The Journal of Finance, 54, 1291–1323.

BIAIS B., MARIOTTI T., PLANTIN G. and ROCHET J. (2007), “Dynamic Security Design: Convergence to Continuous Time and Asset Pricing Implications”, The Review of Economic Studies, 74, 345–390.

BIERKENS J. and KAPPEN H. J. (2014), “Explicit Solution of Relative Entropy Weighted Control”, Systems & Control Letters, 72, 36–43.

BUBB R. and KAUFMAN A. (2014), “Securitization and Moral Hazard: Evidence from Credit Score Cutoff Rules”, Journal of Monetary Economics, 63, 1–18.

CARROLL G. (2015), “Robustness and Linear Contracts”, The American Economic Review, 105, 536–563.

ČENCOV N. N. (2000), Statistical Decision Rules and Optimal Inference, Vol. 53 (Providence, Rhode Island: American Mathematical Society).

CHO I. K. (1987), “A Refinement of Sequential Equilibrium”, Econometrica: Journal of the Econometric Society, 1367–1389.

CSISZÁR I. (1967), “Information-Type Measures of Difference of Probability Distributions and Indirect Observations”, Studia Sci. Math. Hungar., 2, 299–318.

CVITANIĆ J., WAN X. and ZHANG J. (2009), “Optimal Compensation with Hidden Action and Lump-Sum Payment in a Continuous-time Model”, Applied Mathematics and Optimization, 59, 99–146.

DANG T., GORTON G. and HOLMSTRÖM B. (2011), “Ignorance and the Optimality of Debt for Liquidity Provision” (Technical report, Working Paper, Yale University).

DEMARZO P. and DUFFIE D. (1999), “A Liquidity-Based Model of Security Design”, Econometrica, 67, 65–99.

DEMARZO P. M. and SANNIKOV Y. (2006), “Optimal Security Design and Dynamic Capital Structure in a Continuous-Time Agency Model”, The Journal of Finance, 61, 2681–2724.

DEMIROGLU C. and JAMES C. (2012), “How Important is Having Skin in the Game? Originator-Sponsor Affiliation and Losses on Mortgage-Backed Securities”, Review of Financial Studies, 25, 3217–3258.

EDMANS A. and LIU Q. (2010), “Inside Debt”, Review of Finance, 15, 75–102.

ELUL R. (2016), “Securitization and Mortgage Default”, Journal of Financial Services Research, 49, 281–309.

FENDER I. and MITCHELL J. (2009), “Incentives and Tranche Retention in Securitisation: a Screening Model” (CEPR Discussion Paper No. DP7483).

FUDENBERG D. and TIROLE J. (1991), “Game Theory”.

GALE D. and HELLWIG M. (1985), “Incentive-Compatible Debt Contracts: The One-Period Problem”, The Review of Economic Studies, 52, 647–663.

GORTON G. (2008), “The Panic of 2007” (Technical report, National Bureau of Economic Research).

GROSSMAN S. J. and HART O. D. (1983), “An Analysis of the Principal-Agent Problem”, Econometrica: Journal of the Econometric Society, 51, 7–45.

HANSEN L. and SARGENT T. (2008), Robustness (Princeton, New Jersey: Princeton University Press).

HART O. and MOORE J. (1994), “A Theory of Debt Based on the Inalienability of Human Capital”, The Quarterly Journal of Economics, 109, 841–879.

HELLWIG M. F. and SCHMIDT K. M. (2002), “Discrete–Time Approximations of the Holmström–Milgrom Brownian–Motion Model of Intertemporal Incentive Provision”, Econometrica, 70, 2225–2264.

HELLWIG M. (2009), “A Reconsideration of the Jensen-Meckling Model of Outside Finance”, Journal of Financial Intermediation, 18, 495–525.

HOLMSTRÖM B. and MILGROM P. (1987), “Aggregation and Linearity in the Provision of Intertemporal Incentives”, Econometrica, 55, 303–328.

INNES R. (1990), “Limited Liability and Incentive Contracting with Ex-ante Action Choices”, Journal of Economic Theory, 52, 45–67.

JENSEN M. (1986), “Agency Costs of Free Cash Flow, Corporate Finance, and Takeovers”, The American Economic Review, 76, 323–329.

JENSEN M. and MECKLING W. (1976), “Theory of the Firm: Managerial Behavior, Agency Costs and Ownership Structure”, Journal of Financial Economics, 3, 305–360.

JIANG W., NELSON A. A. and VYTLACIL E. (2013), “Securitization and Loan Performance: Ex Ante and Ex Post Relations in the Mortgage Market”, Review of Financial Studies, 27, 454–483.

KEYS B., MUKHERJEE T., SERU A., et al. (2010), “Did Securitization Lead to Lax Screening? Evidence from Subprime Loans”, The Quarterly Journal of Economics, 125, 307–362.

KOHLBERG E. and MERTENS J. F. (1986), “On the Strategic Stability of Equilibria”, Econometrica: Journal of the Econometric Society, 1003–1037.

KRAINER J. and LADERMAN E. (2014), “Mortgage Loan Securitization and Relative Loan Performance”, Journal of Financial Services Research, 45, 39–66.

MATTHEWS S. A. (1995), “Renegotiation of Sales Contracts”, Econometrica: Journal of the Econometric Society, 567–589.

MATTHEWS S. A. (2001), “Renegotiating Moral Hazard Contracts under Limited Liability and Monotonicity”, Journal of Economic Theory, 97, 1–29.

MIAN A. and SUFI A. (2009), “The Consequences of Mortgage Credit Expansion: Evidence from the U.S. Mortgage Default Crisis”, The Quarterly Journal of Economics, 124, 1449–1496.

MONOYIOS M. (2013), “Malliavin Calculus Method for Asymptotic Expansion of Dual Control Problems”, SIAM Journal on Financial Mathematics, 4, 884–915.

MYERSON R. B. (1978), “Refinements of the Nash Equilibrium Concept”, International Journal of Game Theory, 7, 73–80.

MYERSON R. B. and RENY P. J. (2015), “Sequential Equilibria of Multi-stage Games with Infinite Sets of Types and Actions” (Manuscript, University of Chicago).

NACHMAN D. C. and NOE T. H. (1994), “Optimal Design of Securities under Asymmetric Information”, Review of Financial Studies, 7, 1–44.

NADAULD T. D. and SHERLUND S. M. (2013), “The Impact of Securitization on the Expansion of Subprime Credit”, Journal of Financial Economics, 107, 454–476.

NADAULD T. D. and WEISBACH M. S. (2012), “Did Securitization Affect the Cost of Corporate Debt?”, Journal of Financial Economics, 105, 332–352.

PLANTIN G. (2015), “Shadow Banking and Bank Capital Regulation”, Review of Financial Studies, 28, 146–175.

PURNANANDAM A. (2010), “Originate-to-distribute Model and the Subprime Mortgage Crisis”, Review of Financial Studies, 24, 1881–1915.

RAJAN U., SERU A. and VIG V. (2015), “The Failure of Models that Predict Failure: Distance, Incentives and Defaults”, Journal of Financial Economics, 115, 237–260.

RAVID S. A. and SPIEGEL M. (1997), “Optimal Financial Contracts for a Start-up with Unlimited Operating Discretion”, Journal of Financial and Quantitative Analysis, 32, 269–286.

SADZIK T. and STACCHETTI E. (2015), “Agency Models With Frequent Actions”, Econometrica, 83, 193–237.

SANNIKOV Y. (2014), “Moral Hazard and Long-Run Incentives” (Working Paper No. 3430, Stanford Graduate School of Business).

SCHAETTLER H. and SUNG J. (1993), “The First-Order Approach to the Continuous-Time Principal–Agent Problem with Exponential Utility”, Journal of Economic Theory, 61, 331–371.

SHAVELL S. (1979), “Risk Sharing and Incentives in the Principal and Agent Relationship”, The Bell Journal of Economics, 55–73.

SIMS C. (2003), “Implications of Rational Inattention”, Journal of Monetary Economics, 50, 665–690.

SIMON L. K. and STINCHCOMBE M. B. (1995), “Equilibrium Refinement for Infinite Normal-Form Games”, Econometrica: Journal of the Econometric Society, 1421–1443.

TOWNSEND R. (1979), “Optimal Contracts and Competitive Markets with Costly State Verification”, Journal of Economic Theory, 21, 265–93.

VANASCO V. (2017), “The Downside of Asset Screening for Market Liquidity”, The Journal of Finance, 72, 1937–1982.

YANG M. (2015), “Optimality of Debt under Flexible Information Acquisition” (Available at SSRN 2103971).

© The Author(s) 2017. Published by Oxford University Press on behalf of The Review of Economic Studies Limited. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

Journal

The Review of Economic Studies, Oxford University Press

Published: Oct 1, 2018

References