# Bipodal Structure in Oversaturated Random Graphs

## Abstract

We study the asymptotics of large simple graphs directly constrained by the limiting subgraph densities of edges and of an arbitrary fixed graph $$H$$. We prove that, for all but finitely many values of the edge density, if the density of $$H$$ is constrained to be slightly higher than that for the corresponding Erdős–Rényi graph, the typical large graph is bipodal with parameters varying analytically with the densities. Asymptotically, the parameters depend only on the degree sequence of $$H$$.

## 1 Introduction

We study the asymptotics of large, simple, labeled graphs directly constrained to have subgraph densities $$\epsilon$$ of edges, and $$\tau$$ of some fixed subgraph $$H$$ with $$\ell \ge 2$$ edges. To study the asymptotics we use the graphon formalism of Borgs et al. [2, 3], Lovász et al. [7–9] and the large deviations theorem of Chatterjee and Varadhan [5], from which one can reduce the analysis to the study of the graphons which maximize the entropy subject to the density constraints [6, 14–16]. See definitions in Section 2.

The phase space (parameter space) is the subset of $$[0,1]^2$$ consisting of accumulation points of all pairs of densities $${\bar\tau}=(\epsilon,\tau)$$ achievable by finite graphs. (See Figure 1 for the model where $$H$$ is a triangle.) Within the phase space is the “Erdős–Rényi curve” (ER curve) $$\{(\epsilon,\tau)~|~\tau=\epsilon^\ell\}$$, attained when edges are chosen independently. In this paper, we study the typical behavior of large graphs for $$\bar \tau$$ just above the ER curve. We will show that the qualitative behavior of such graphs is the same for all choices of $$H$$ and for all but finitely many choices of $$\epsilon$$ depending on $$H$$.

Fig. 1. Boundary of the phase space for the edge/triangle model in solid lines, see [10]. On the right, the ER curve is shown with dashes.

To be precise, we show that for fixed $$H$$, for $$\epsilon$$ outside a finite set, and for $$\tau$$ close enough to $$\epsilon^\ell$$, there is a unique entropy-maximizing graphon (up to measure-preserving transformations of the unit interval); furthermore it is bipodal and depends analytically on $$(\epsilon,\tau)$$, implying that the entropy is an analytic function of $$(\epsilon,\tau)$$. In particular we prove the existence of one or more well-defined phases just above the ER curve. This is the first proof, as far as we know, of the existence of a phase in any constrained-density graphon model, where by phase we mean a (maximal) open set in the phase space at each point of which the entropy has a unique graphon maximizer, which varies analytically with the constraint parameters. (Conjecturally, the phase space is made up of a union of phases and a subset of lower dimension, the latter providing boundaries for the phases [14].)

The unique maximizers provide an embedding of each phase into the metric space of reduced graphons. Variation of constraint values in the phase space is therefore mirrored by this embedding into variation in the space of graphons. This has the consequence that smoothness or singularity under variation can be interpreted among the graphons, which are thought of as the emergent states of the large graphs. In contrast, in exponential random graph models (see, e.g., [1, 4, 11, 13]) the parameters, which are associated with graphons by optimization of free energy rather than entropy, play a fundamentally different role; different parameter values can be associated with the same optimal graphon. For an extreme example, the whole two-dimensional parameter space for edge/2-star constraints is mapped in this way into the one-dimensional set of Erdős–Rényi graphons [4].
Clearly, smoothness or singularity under variation of parameter values in such models is more naturally interpreted as a feature of the model, as in [4], rather than as a feature of states of large constrained graphs. For further analysis of this see the discussion in the Conclusion in [6].

The study of constrained graphs in the sense we are considering was initiated by Turán in 1941 [18], addressing in particular the case of edge and triangle constraints. The extremal graph theory of these constraints was recently completed by Razborov et al., in [12, 17], which also contain a good history of this problem. Partial results describing the entropy maximizing graphons in the interior of that phase space were then obtained in [14–16]. For the edge-$$k$$-star model, we proved multipodality of all entropy optimizers in [6]. This will be an important tool in this paper. It is also important for the heuristics as it provides a simple interpretation of the emergence of the large scale state of the constrained graphs, through partitioning of nodes. We should also mention that the region below the ER curve in the edge/triangle model seems to be more mysterious; no proof of multipodality is known, for example, except on a line segment [16], though there is good simulation evidence of it [14].

A bipodal graphon is a function $$g: [0,1]^2 \to [0,1]$$ of the form

$$g(x,y)=\begin{cases} p_{11} & x,y<c,\\ p_{12} & x<c<y,\\ p_{12} & y<c<x,\\ p_{22} & x,y>c. \end{cases} \tag{1}$$

Bipodal graphons generalize bipartite graphons, which are the special case $$p_{11}=p_{22}=0$$. Here $$c,p_{11}, p_{12}$$, and $$p_{22}$$ are constants taking values between 0 and 1. We prove that as $$\tau\searrow \epsilon^\ell$$, the parameters $$c \to 0$$, $$p_{22} \to \epsilon$$, and $$p_{11}$$ and $$p_{12}$$ approach the solutions of a problem in single-variable calculus. The inputs to that calculus problem depend only on the degrees of the vertices of $$H$$.
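
As a small worked illustration of the bipodal form (1), the degree function and the edge and $$k$$-star densities can be written out by integrating $$g$$ over the four blocks. This is only a sketch: the symbols $$d_1$$ and $$d_2$$ (the values of the degree function for $$x<c$$ and $$x>c$$) are the notation used later in the proof of Theorem 4.1, and the formulas use the definitions of degree function and $$k$$-star density given in Section 2.

$$d_1 = c\,p_{11} + (1-c)\,p_{12}, \qquad d_2 = c\,p_{12} + (1-c)\,p_{22},$$

$$\epsilon(g) = c\,d_1 + (1-c)\,d_2 = c^2 p_{11} + 2c(1-c)\,p_{12} + (1-c)^2 p_{22}, \qquad \tau_k(g) = c\,d_1^k + (1-c)\,d_2^k.$$

In the bipartite special case $$p_{11}=p_{22}=0$$ the edge density reduces to $$\epsilon(g) = 2c(1-c)\,p_{12}$$, and in the limit $$c\to 0$$ both densities depend only on $$p_{22}$$, consistent with $$p_{22}\to\epsilon$$ on the ER curve.
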
We say that a finite graph $$H$$ is $$k$$-starlike if all the vertices of $$H$$ have degree $$k$$ or 1, where $$k >1$$ is a fixed integer. $$k$$-starlike graphs include $$k$$-stars (where one vertex has degree $$k$$ and $$k$$ vertices have degree 1), and the complete graph on $$k+1$$ vertices. For fixed $$k$$, all $$k$$-starlike graphs behave essentially the same for our asymptotics. We prove our results first for $$k$$-stars, and then apply perturbation theory to show that the differences between different $$k$$-starlike graphs are irrelevant, and then prove the general case.

To state our results more precisely, we need some notation. Let

$$S_0(w) = -\frac{1}{2}\left[w\log w + (1-w)\log(1-w)\right], \tag{2}$$

and define the graphon entropy (or entropy for short) of a graphon $$g$$ to be

$$s(g) = \int_0^1\!\!\int_0^1 S_0(g(x,y))\,{\rm d}x\,{\rm d}y. \tag{3}$$

Let

$$\psi_k(\epsilon,\tilde\epsilon) = \frac{2\left[S_0(\tilde\epsilon) - S_0(\epsilon) - S_0'(\epsilon)(\tilde\epsilon-\epsilon)\right]}{\tilde\epsilon^k - \epsilon^k - k\epsilon^{k-1}(\tilde\epsilon-\epsilon)}. \tag{4}$$

$$\psi_k(\epsilon,\tilde \epsilon)$$, viewed as a function of $$\tilde\epsilon$$, has a removable singularity at $$\tilde \epsilon=\epsilon$$, which we fill by defining

$$\psi_k(\epsilon,\epsilon) = \frac{2S_0''(\epsilon)}{k(k-1)\epsilon^{k-2}}. \tag{5}$$

For fixed $$\epsilon$$, let $$\zeta_k(\epsilon)$$ be the value of $$\tilde \epsilon$$ that maximizes $$\psi_k(\epsilon,\tilde \epsilon)$$. (We will prove in Theorem 3.3 below that this maximizer is unique and depends continuously on $$\epsilon$$.)

Theorem 1.1. Let $$H$$ be a $$k$$-starlike graph with $$\ell\ge 2$$ edges. Let $$\epsilon \in (0,1)$$ be any point other than $$(k-1)/k$$. Then there is a number $$\tau_0> \epsilon^\ell$$ (depending on $$\epsilon$$) such that for all $$\tau \in (\epsilon^\ell, \tau_0)$$, the entropy-maximizing graphon at $$(\epsilon,\tau)$$ is unique (up to measure-preserving transformations of $$[0,1]$$) and bipodal. Its parameters $$(c, p_{11}, p_{12}, p_{22})$$ are analytic functions of $$\epsilon$$ and $$\tau$$ on the region $$\epsilon \ne (k-1)/k$$, $$\tau \in (\epsilon^\ell, \tau_0(\epsilon))$$.
Furthermore, as $$\tau\searrow\epsilon^\ell$$ we have that $$p_{22} \to \epsilon$$, $$p_{12} \to \zeta_k(\epsilon)$$, $$p_{11}$$ satisfies $$S_0'(p_{11}) = 2S_0'(p_{12}) - S_0'(p_{22})$$, and $$c=O(\tau-\epsilon^\ell)$$. □

Theorem 1.1 proves that there is part of a phase just above the ER curve for $$\epsilon < (k-1)/k$$ and also for $$\epsilon > (k-1)/k$$; numerical evidence suggests these are in fact parts of a single phase; the only “singular” behavior is the manner in which the graphon approaches the constant graphon associated with the ER curve. We will see in Theorem 1.2 that this behavior is only slightly more complicated for general $$H$$ than it is for $$k$$-starlike $$H$$.

When $$H$$ has vertices with different degrees $$>1$$, the problem resembles that of a formal positive linear combination of $$k$$-stars. As in the $$k$$-starlike case, we first solve the problem for the linear combination of $$k$$-stars and then use perturbation theory to extend the results to arbitrary $$H$$.

Theorem 1.2. Let $$H$$ be an arbitrary graph with $$\ell$$ edges and at least one vertex of degree $$2$$ or greater. Then there exists a finite set $$B_H \subset (0,1)$$ such that if $$\epsilon \notin B_H$$, then there is a number $$\tau_0> \epsilon^\ell$$ (depending on $$\epsilon$$) such that for all $$\tau \in (\epsilon^\ell, \tau_0)$$, the entropy-maximizing graphon at $$(\epsilon,\tau)$$ is unique (up to measure-preserving transformations of $$[0,1]$$) and bipodal. Its parameters $$(c, p_{11}, p_{12}, p_{22})$$ are analytic functions of $$\epsilon$$ and $$\tau$$ on the region $$\epsilon \not \in B_H$$, $$\tau \in (\epsilon^\ell, \tau_0(\epsilon))$$. Furthermore, as $$\tau\searrow \epsilon^\ell$$ we have that $$p_{22} \to \epsilon$$, $$p_{12}$$ approaches the maximizer of an explicit function whose data depend on $$\epsilon$$, $$p_{11}$$ satisfies $$S_0'(p_{11}) = 2S_0'(p_{12}) - S_0'(p_{22})$$, and $$c=O(\tau-\epsilon^\ell)$$.
□

The key differences between Theorems 1.1 and 1.2 are:

- For $$k$$-starlike graphs, the set $$B_H$$ of bad values of $$\epsilon$$ consists of a single point, and this point is explicitly known: $$\epsilon = (k-1)/k$$.
- For $$k$$-starlike graphs, the behavior of $$\zeta_k$$ is explicit. It is a continuous and strictly decreasing function of $$\epsilon$$, and gives an involution of $$(0,1)$$. (That is, $$\zeta_k(\zeta_k(\epsilon))=\epsilon$$.) For $$k=2$$ it is given by $$\zeta_2(\epsilon)=1-\epsilon$$. In the general case, the limiting value of $$p_{12}$$, and its dependence on $$\epsilon$$, appear to be much more complicated. We do not know whether this limiting value is always continuous across the bad set $$B_H$$.

The organization of this paper is as follows. In Section 2 we review the formalism of graphons and establish basic notation. In Section 3 we establish a number of technical results for $$k$$-star models. Using these results, in Section 4 we prove Theorem 1.1 for the case that $$H$$ is a $$k$$-star. In Section 5 we show that just above the ER curve a model with an arbitrary $$k$$-starlike $$H$$ can be approximated by a $$k$$-star model. By bounding the error terms, we prove Theorem 1.1 in full generality. In Section 6 we consider formal positive linear combinations of $$k$$-stars, and prove a theorem much like Theorem 1.2 for those models. Finally, in Section 7 we show that the model for an arbitrary $$H$$ can be approximated by a formal linear combination of $$k$$-stars, thus completing the proof of Theorem 1.2.

## 2 Notation and background

We consider a simple graph $$G$$ (undirected, with no multiple edges or loops) with a vertex set $$V(G)$$ of labeled vertices. For a subgraph $$H$$ of $$G$$, let $$T_H(G)$$ be the set of maps from $$V(H)$$ into $$V(G)$$ which send edges to edges. The density $$\tau_H(G)$$ of $$H$$ in $$G$$ is then defined to be

$$\tau_H(G) := \frac{|T_H(G)|}{n^{|V(H)|}}, \tag{6}$$

where $$n = |V(G)|$$.
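
The density (6) can be evaluated by brute force on small graphs, which may help make the definition concrete. The following is a minimal sketch, not taken from the paper: the function names `hom_density` and `star_edges`, the edge-list encoding of $$H$$, and the choice of the triangle $$K_3$$ as the example graph $$G$$ are all illustrative assumptions.

```python
from itertools import product


def hom_density(h_edges, h_order, g_adj):
    """Density tau_H(G) of (6): fraction of maps V(H) -> V(G) sending edges to edges.

    h_edges : list of pairs (u, v) with 0 <= u, v < h_order (hypothetical encoding of H)
    g_adj   : symmetric 0/1 adjacency matrix of G with zero diagonal
    """
    n = len(g_adj)
    hits = 0
    # Enumerate all n**h_order maps from V(H) = {0, ..., h_order - 1} into V(G).
    for phi in product(range(n), repeat=h_order):
        if all(g_adj[phi[u]][phi[v]] for u, v in h_edges):
            hits += 1
    return hits / n ** h_order


def star_edges(k):
    """Edge list of the k-star: vertex 0 is the centre, vertices 1..k are the leaves."""
    return [(0, j) for j in range(1, k + 1)]


# Example graph G: the triangle K3, given by its adjacency matrix.
K3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

edge_density = hom_density(star_edges(1), 2, K3)       # 1-star density = edge density
two_star_density = hom_density(star_edges(2), 3, K3)   # 2-star density
triangle_density = hom_density([(0, 1), (1, 2), (0, 2)], 3, K3)
```

Because the count runs over all maps rather than injective ones, the enumeration costs $$n^{|V(H)|}$$ steps and is only practical for very small examples; it is meant as a check on the definition, not as an efficient algorithm.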
An important special case is where $$H$$ is a “$$k$$-star,” a graph with $$k$$ edges, all with a common vertex, for which we use the notation $$\tau_k(G)$$. In particular $$\tau_1(G)$$, which we also denote by $$\epsilon(G)$$, is the edge density of $$G$$.

For $$\alpha > 0$$ and $${\bar\tau}=(\epsilon,\tau_H)$$ define $$\displaystyle Z^{n,\alpha}_{{\bar\tau}}$$ to be the number of graphs $$G$$ on $$n$$ vertices with densities satisfying

$$\epsilon(G)\in(\epsilon-\alpha,\epsilon+\alpha), \qquad \tau_H(G)\in(\tau_H-\alpha,\tau_H+\alpha). \tag{7}$$

Define the (constrained) entropy $$s_{{\bar\tau}}$$ to be the exponential rate of growth of $$Z^{n,\alpha}_{{\bar\tau}}$$ as a function of $$n$$:

$$s_{\bar\tau} = \lim_{\alpha\searrow 0}\,\lim_{n\to\infty} \frac{\ln\left(Z^{n,\alpha}_{\bar\tau}\right)}{n^2}. \tag{8}$$

The double limit defining the entropy $$s_{{\bar\tau}}$$ is known to exist [15]. To analyze it we make use of a variational characterization of $$s_{{\bar\tau}}$$, and for this we need further notation to analyze limits of graphs as $$n\to \infty$$. (This work was recently developed in [2, 3, 7–9]; see also the recent book [10].) The (symmetric) adjacency matrices of graphs on $$n$$ vertices are replaced, in this formalism, by symmetric, measurable functions $$g:[0,1]^2\to[0,1]$$; the former are recovered by using a partition of $$[0,1]$$ into $$n$$ consecutive subintervals. The functions $$g$$ are called graphons.

For a graphon $$g$$ define the degree function $$d(x)$$ to be $$d(x)=\int^1_0 g(x,y){\rm d}y$$. The $$k$$-star density of $$g$$, $$\tau_k(g)$$, then takes the simple form

$$\tau_k(g) = \int_0^1 d(x)^k\,{\rm d}x. \tag{9}$$

For any fixed graph $$H$$, the $$H$$-density $$\tau_H$$ of $$g$$ can be similarly expressed as an integral of a product of factors $$g(x_i,x_j)$$.

The following is Theorem 4.1 in [16]:

Theorem 2.1 (The Variational Principle). For any values $${\bar\tau}={\bar\tau}(g) := (\epsilon, \tau_H)$$ in the phase space we have $$s_{{\bar\tau}} = \max [s(g)]$$, where the entropy is maximized over all graphons $$g$$ with $${\bar\tau}(g)={\bar\tau}$$.
□

(Instead of using $$s(g)$$, some authors use the rate function $$I(g):= -s(g)$$, and then minimize $$I$$.) The existence of a maximizing graphon $$g=g_{{\bar\tau}}$$ for any constraint $${\bar\tau}(g)={\bar\tau}$$ was proven in [15], again adapting a proof in [5]. If the densities are those of edges and $$k$$-star subgraphs we refer to this maximization problem as a star model, though we emphasize that the result applies much more generally [15, 16].

We consider two graphs equivalent if they are obtained from one another by relabeling the vertices. For graphons, the analogous operation is applying a measure-preserving map $$\psi$$ of $$[0,1]$$ into itself, replacing $$g(x,y)$$ with $$g(\psi(x),\psi(y))$$, see [10]. The equivalence classes of graphons under relabeling are called reduced graphons, and graphons are equivalent if and only if they have the same subgraph densities for all possible finite subgraphs [10]. In the remaining sections of the paper, whenever we claim that a graphon has a property (e.g., monotonicity in $$x$$ and $$y$$, or uniqueness as an entropy maximizer), the caveat “up to relabeling” is implied.

The graphons which maximize the constrained entropy can tell us what “most” or “typical” large constrained graphs are like: if $$g_{{\bar\tau}}$$ is the only reduced graphon maximizing $$s(g)$$ with $${\bar\tau}(g)={\bar\tau}$$, then as the number $$n$$ of vertices diverges and $$\alpha_n\to 0$$, exponentially most graphs with densities $${\bar\tau}_i(G)\in (\tau_i-\alpha_n,\tau_i+\alpha_n)$$ will have reduced graphon close to $$g_{{\bar\tau}}$$ [15]. This is based on large deviations from [5]. We emphasize that this interpretation requires that the maximizer be unique; this has been difficult to prove in most cases of interest and is an important focus of this work.

A graphon $$g$$ is called $$M$$-podal if there is a decomposition of $$[0,1]$$ into $$M$$ intervals (“vertex clusters”) $$C_j,\ j=1,2,\ldots,M$$, and $$M(M+1)/2$$ constants $$p_{ij}$$ such that $$g(x,y)=p_{ij}$$ if $$(x,y)\in C_i\times C_j$$ (and $$p_{ji}=p_{ij}$$). We denote the length of $$C_j$$ by $$c_j$$.

## 3 Technical properties of star models

For each star model, all entropy-maximizing graphons are multipodal with a fixed upper bound on the number of clusters, also called the podality [6]. (The term multi/bipartite is sometimes used instead of multipodal in the literature.) For any fixed podality $$M$$, an $$M$$-podal graphon is described by $$N=M(M+3)/2$$ parameters, namely the values $$p_{ij}$$ ($$1\le i\le j\le M$$) and the widths $$c_i$$ ($$1\le i\le M$$) of the clusters. When it does not cause confusion, we will use $$g$$ to denote the vector

$$(p_{11},\ldots,p_{1M},\,p_{22},\ldots,p_{2M},\,\ldots,\,p_{M-1,M-1},\,p_{M-1,M},\,p_{MM},\,c_1,\ldots,c_M), \tag{10}$$

which contains all these parameters. The problem of optimizing the graphon then reduces to a finite-dimensional calculus problem. To be precise, let us recall that for an $$M$$-podal graphon, we have

$$\epsilon(g)=\sum_{1\le i,j\le M} c_i c_j p_{ij}, \qquad \tau_k(g)=\sum_{1\le i\le M} c_i d_i^k, \qquad s(g)=\sum_{1\le i,j\le M} c_i c_j S_0(p_{ij}), \tag{11}$$

where $$d_i = \sum_{1\le j\le M} c_j p_{ij}$$ is the value of the degree function on the $$i$$th cluster. The problem of searching for entropy-maximizing graphons with fixed edge density $$\epsilon$$ and $$k$$-star density $$\tau_k$$ can now be formulated as

$$\max_{g\in[0,1]^N} s(g), \quad \text{subject to:}\quad \epsilon(g)-\epsilon=0,\quad \tau_k(g)-\tau=0,\quad C(g)=1, \tag{12}$$

where $$C(g) = \sum_{1\le j\le M} c_j$$.

The following result says that the maximization problem (12) can be solved using the method of Lagrange multipliers. The existence of finite Lagrange multipliers was previously established in [6], treating the space of graphons as a linear space of functions $$[0,1]^2 \to [0,1]$$, intuitively considering perturbations of graphons localized about points in $$[0,1]^2$$.
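
The finite sums in (11) are straightforward to evaluate numerically. The sketch below is an illustration only: the function name `multipodal_functionals`, the nested-list representation of $$p_{ij}$$, and the sample bipodal parameters are assumptions made here and are not values from the paper.

```python
import math


def S0(w):
    """S_0(w) = -(1/2)[w log w + (1-w) log(1-w)], extended by S_0(0) = S_0(1) = 0."""
    if w <= 0.0 or w >= 1.0:
        return 0.0
    return -0.5 * (w * math.log(w) + (1.0 - w) * math.log(1.0 - w))


def multipodal_functionals(p, c, k):
    """Evaluate (11) for an M-podal graphon.

    p : symmetric M x M nested list of values p_ij in [0, 1]
    c : list of the M cluster widths c_i (expected to sum to 1)
    k : number of edges of the star
    Returns (epsilon, tau_k, s, d), where d lists the cluster degrees d_i.
    """
    M = len(c)
    d = [sum(c[j] * p[i][j] for j in range(M)) for i in range(M)]
    eps = sum(c[i] * c[j] * p[i][j] for i in range(M) for j in range(M))
    tau_k = sum(c[i] * d[i] ** k for i in range(M))
    s = sum(c[i] * c[j] * S0(p[i][j]) for i in range(M) for j in range(M))
    return eps, tau_k, s, d


# Hypothetical bipodal example: a small cluster of width 0.1 and a large one of width 0.9.
p_example = [[0.2, 0.7], [0.7, 0.3]]
c_example = [0.1, 0.9]
eps_b, tau2_b, s_b, d_b = multipodal_functionals(p_example, c_example, 2)

# Constant graphon (a point on the ER curve): tau_k equals eps**k and s equals S0(eps).
eps_c, tau3_c, s_c, _ = multipodal_functionals([[0.3, 0.3], [0.3, 0.3]], [0.4, 0.6], 3)
```

For a constant graphon the second call reproduces the ER relation $$\tau_k=\epsilon^k$$, while any non-constant degree function gives $$\tau_k>\epsilon^k$$ by Jensen's inequality, which is the regime above the ER curve studied here.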
For star models we may restrict to $$M$$-podal graphons, as noted above, and thus consider perturbations in the relevant parameters $$p_{ij}$$ and $$c_j$$.

Lemma 3.1. Let $$g$$ be a local maximizer in (12). Then for constraints $$\epsilon,\tau$$ off the ER curve, there exist unique $$\alpha,\beta,\gamma\in\mathbb R$$ such that

$$\nabla s(g)-\alpha\nabla\epsilon(g)-\beta\nabla\tau_k(g)-\gamma\nabla C(g)=0. \tag{13}$$

□

We do not include the proof, which follows easily from that of Lemma 3.5 in [6]. We also note that one can remove the variable $$c_M$$ and the constraint $$C(g) =1$$, eliminating the multiplier $$\gamma$$.

For convenience later, we now write down the exact form of the Euler–Lagrange equation (13). We first verify that

$$\frac{\partial\epsilon}{\partial p_{ij}}=A_{ij}, \qquad \frac{\partial\epsilon}{\partial c_i}=2\sum_{j=1}^M c_j p_{ij}=2d_i, \tag{14}$$

$$\frac{\partial\tau_k}{\partial p_{ij}}=\frac{k}{2}\left(d_i^{k-1}+d_j^{k-1}\right)A_{ij}, \qquad \frac{\partial\tau_k}{\partial c_i}=d_i^k+k\sum_{j=1}^M c_j d_j^{k-1}p_{ij}, \tag{15}$$

$$\frac{\partial C}{\partial p_{ij}}=0, \qquad \frac{\partial C}{\partial c_i}=1, \tag{16}$$

$$\frac{\partial s}{\partial p_{ij}}=S_0'(p_{ij})A_{ij}, \qquad \frac{\partial s}{\partial c_i}=2\sum_{j=1}^M c_j S_0(p_{ij}), \tag{17}$$

where $$A_{ij}= 2 c_i c_j$$ if $$i\neq j$$ and $$A_{ij}= c_i^2$$ if $$i=j$$. We can then write down (13) explicitly as

$$S_0'(p_{ij})=\alpha+\beta\frac{k}{2}\left(d_i^{k-1}+d_j^{k-1}\right), \qquad 1\le i\le j\le M, \tag{18}$$

$$2\sum_{j=1}^M c_j S_0(p_{ij})=2\alpha d_i+\beta\left(d_i^k+k\sum_{j=1}^M c_j d_j^{k-1}p_{ij}\right)+\gamma, \qquad 1\le i\le M. \tag{19}$$

These Euler–Lagrange equations, together with the constraints,

$$\epsilon(g)-\epsilon=0, \qquad \tau_k(g)-\tau=0, \qquad C(g)-1=0, \tag{20}$$

are the optimality conditions for the maximization problem (12). In principle, we can solve this system to find the maximizer $$g$$.

Next we consider the significance of the Lagrange multipliers $$\alpha$$ and $$\beta$$. Suppose that $$g_0$$ is the unique entropy maximizer for $$\epsilon=\epsilon_0$$ and $$\tau=\tau_0$$. Then any sequence of graphons that maximize entropy for $$(\epsilon,\tau)$$ approaching $$(\epsilon_0,\tau_0)$$ must approach $$g_0$$: this follows from continuity of the entropy on the space of $$M$$-podal graphons and the fact that we can perturb $$g_0$$ to any nearby $$(\epsilon,\tau)$$ by changing some $$p_{ij}$$’s (as follows easily from (11)).
But if $$g= g_0 + \delta g$$, then

$$s(g)=s(g_0)+\nabla s(g_0)\cdot\delta g+O(\|\delta g\|^2), \tag{21}$$

where $$\|\delta g\|$$ denotes the norm of $$\delta g$$ as a vector in $$\mathbb R^N$$. However, from (13) we have

$$s(g)=s(g_0)+\alpha\nabla\epsilon(g_0)\cdot\delta g+\beta\nabla\tau(g_0)\cdot\delta g+O(\|\delta g\|^2)=s(g_0)+\alpha(\epsilon-\epsilon_0)+\beta(\tau-\tau_0)+O(\|\delta g\|^2). \tag{22}$$

Thus $$\partial s_{(\epsilon,\tau)}/\partial \epsilon = \alpha$$ and $$\partial s_{(\epsilon,\tau)}/\partial \tau = \beta$$. If $$g_0$$ is not a unique entropy maximizer, then a similar argument shows that we have 1-sided (directional) derivatives:

Lemma 3.2. The function $$s_{(\epsilon,\tau)}$$ admits directional derivatives in all directions at all points $$(\epsilon,\tau)$$ in the interior of the phase space. □

Proof. Suppose that there are multiple entropy-maximizing graphons at a particular $$(\epsilon_0,\tau_0)$$. Given a vector $$v = (v_\epsilon, v_\tau) \in \mathbb R^2$$, we wish to compute $$s_{(\epsilon_0 + t v_\epsilon, \tau_0 + t v_\tau)}-s_{(\epsilon_0, \tau_0)}$$ to first order in $$t$$ for $$t$$ small and positive. As $$t \to 0$$, the optimizing graphon $$g$$ must approach an entropy-maximizing graphon $$g_0$$ with $$\epsilon(g_0)=\epsilon_0$$ and $$\tau(g_0)=\tau_0$$. But then, by (21), $$s(g)-s(g_0) = t(\alpha v_\epsilon + \beta v_\tau)+O(t^2)$$, where $$\alpha$$ and $$\beta$$ depend on the choice $$g_0$$. Among the choices for $$g_0$$, there is one (or more) that maximizes $$\alpha v_\epsilon + \beta v_\tau$$, and our directional derivative is that maximal value of $$\alpha v_\epsilon + \beta v_\tau$$. ■

Existence of directional derivatives implies the fundamental theorem of calculus, so for fixed $$\epsilon$$ we can write

$$s_{(\epsilon,\tau)}=s_{(\epsilon,\epsilon^k)}+\int_{\epsilon^k}^{\tau}\beta\left(g_{\rm max}(\epsilon,\tau)\right){\rm d}\tau, \tag{23}$$

where $$g_{\rm max}(\epsilon,\tau)$$ is the entropy-maximizing graphon at $$(\epsilon,\tau)$$ that maximizes its right derivative (with respect to $$\tau$$).

Before proving Theorem 1.1 for $$k$$-stars, we record some properties of the function $$\psi_k(\epsilon, \tilde \epsilon)$$ of (4) and its critical points.

Theorem 3.3.
For fixed $$k$$ and $$\epsilon$$, there is a unique solution to $$\partial \psi_k(\epsilon,\tilde \epsilon)/\partial \tilde \epsilon=0$$, which we denote $$\tilde \epsilon=\zeta_k(\epsilon)$$. The function $$\zeta_k$$ is strictly decreasing, with nowhere-vanishing derivative and with fixed point at $$\epsilon=(k-1)/k$$. Furthermore, $$\zeta_k$$ is an involution: $$\tilde \epsilon = \zeta_k(\epsilon)$$ if and only if $$\epsilon = \zeta_k(\tilde \epsilon)$$. Moreover, if $$\zeta_k(\epsilon) \ne \epsilon$$, then $$\psi_k(\epsilon,\epsilon) < \psi_k(\epsilon, \zeta_k(\epsilon))$$ and $$\psi_k(\zeta_k(\epsilon), \zeta_k(\epsilon)) < \psi_k(\epsilon, \zeta_k(\epsilon)).$$ □

Even though the proof is elementary we will need some parts of it later, so we give it in the Appendix.

## 4 Theorem 1.1 for $$k$$-stars

Theorem 4.1. Let $$H$$ be the $$k$$-star graph and suppose that $$\epsilon \ne (k-1)/k$$. Then there exists a number $$\tau_0 > \epsilon^k$$ such that for all $$\tau \in (\epsilon^k ,\tau_0)$$, the entropy-optimizing graphon at $$(\epsilon,\tau)$$ is unique and bipodal. The parameters $$(p_{11}, p_{12}, p_{22},c)$$ are analytic functions of $$\epsilon$$ and $$\tau$$. As $$\tau$$ approaches $$\epsilon^k$$ from above, $$p_{22} \to \epsilon$$, $$p_{12} \to \zeta_k(\epsilon)$$, $$p_{11}$$ satisfies $$S_0'(p_{11}) = 2S_0'(p_{12}) - S_0'(p_{22}),$$ and $$c=O(\tau-\epsilon^k)$$. □

Proof. The entropy-maximizing graphon for each $$(\epsilon,\tau)$$ is multipodal [6], and the parameters $$\{c_j\}$$ and $$\{p_{ij}\}$$ must satisfy the optimality conditions (18) and (19). The first step of the proof is to estimate the terms in the optimality equations to within $$o(1)$$. This will determine the solutions to within $$o(1)$$ and demonstrate that our optimizing graphon is close to bipodal of the desired form. The second step, based on a separate argument, will show that the optimizer is exactly bipodal. The third step shows that the optimizer is in fact unique.

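
Since the proof relies on the maximizer $$\zeta_k(\epsilon)$$ of $$\psi_k(\epsilon,\cdot)$$ described in Theorem 3.3, it can be helpful to compute it numerically. The sketch below is an assumption-laden illustration, not the paper's method: it applies a golden-section search, which is justified only because Theorem 3.3 gives a unique critical point, and it switches to the limiting value (5) within a hypothetical cutoff `tol` of the removable singularity. The names `psi` and `zeta` and all numerical tolerances are choices made here.

```python
import math


def S0(w):
    """S_0(w) = -(1/2)[w log w + (1-w) log(1-w)] for 0 < w < 1."""
    return -0.5 * (w * math.log(w) + (1.0 - w) * math.log(1.0 - w))


def dS0(w):
    """First derivative S_0'(w) = (1/2) log((1-w)/w)."""
    return 0.5 * math.log((1.0 - w) / w)


def d2S0(w):
    """Second derivative S_0''(w) = -1 / (2 w (1-w))."""
    return -0.5 / (w * (1.0 - w))


def psi(k, eps, x, tol=1e-4):
    """psi_k(eps, x) from (4); uses the limiting value (5) when x is within tol of eps."""
    if abs(x - eps) < tol:
        return 2.0 * d2S0(eps) / (k * (k - 1) * eps ** (k - 2))
    num = 2.0 * (S0(x) - S0(eps) - dS0(eps) * (x - eps))
    den = x ** k - eps ** k - k * eps ** (k - 1) * (x - eps)
    return num / den


def zeta(k, eps, lo=1e-9, hi=1.0 - 1e-9, iters=200):
    """Approximate the maximizer of x -> psi_k(eps, x) on (0, 1) by golden-section search."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1, x2 = b - r * (b - a), a + r * (b - a)
    f1, f2 = psi(k, eps, x1), psi(k, eps, x2)
    for _ in range(iters):
        if f1 < f2:
            # The maximum lies in [x1, b]; the old x2 becomes the new x1.
            a, x1, f1 = x1, x2, f2
            x2 = a + r * (b - a)
            f2 = psi(k, eps, x2)
        else:
            # The maximum lies in [a, x2]; the old x1 becomes the new x2.
            b, x2, f2 = x2, x1, f1
            x1 = b - r * (b - a)
            f1 = psi(k, eps, x1)
    return 0.5 * (a + b)


z2 = zeta(2, 0.3)            # Section 1 states zeta_2(eps) = 1 - eps
z3 = zeta(3, 0.4)
z3_back = zeta(3, z3)        # involution: zeta_3(zeta_3(eps)) should return eps
```

The search is a numerical convenience only; close to the fixed point $$\epsilon=(k-1)/k$$ the maximizer approaches the removable singularity, so the cutoff `tol` would limit the accuracy there.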
In doing our asymptotic analysis, our small parameter is $$\Delta \tau := \tau - \epsilon^k$$. However, we claim that

$$\Delta\tau \asymp \|\Delta g\|^2 \asymp |\Delta s|, \tag{24}$$

(the notation $$A\asymp B$$ means $$A=O(B)$$ and $$B=O(A)$$) where $$\Delta s := s(g) - S_0(\epsilon)$$ and $$\|\Delta g\|^2$$ is the squared $$L^2$$ norm of $$\Delta g := g - g_0$$, where $$g_0(x,y) = \epsilon$$ (here $$g$$ denotes the graphon as a function $$[0,1]^2 \to [0,1]$$, not a vector of multipodal parameters). Indeed, $$\Delta \tau=O(\|\Delta g\|^2)$$ (adapting the argument of [16], Theorem 3.1 to arbitrary graphs), and $$\|\Delta g \|^2=O(|\Delta s|)$$ (by equation (16) of [16]). By considering a bipodal graphon with $$p_{11}=p_{12}=\zeta_k(\epsilon)$$ and $$p_{22}$$ close to $$\epsilon$$, we see that $$|\Delta s|=O(\Delta \tau)$$. This shows (24). In the rest of the proof, unless otherwise specified, by terms such as “close to” and “small” we mean within $$o(1)$$ as $$\Delta\tau\to 0$$.

Order the $$M$$ vertex clusters so that the largest cluster is the last cluster (of length $$c_M$$). By subtracting the equation (19) for $$c_M$$ from the equations for $$c_j$$, we eliminate $$\gamma$$ from our equations:

$$\begin{aligned} S_0'(p_{ij}) &= \alpha+\frac{k}{2}\beta\left(d_i^{k-1}+d_j^{k-1}\right),\\ 2\sum_{j=1}^M c_j\left(S_0(p_{ij})-S_0(p_{Mj})\right) &= 2\alpha(d_i-d_M)+\beta\left(d_i^k-d_M^k+k\sum_{j=1}^M c_j d_j^{k-1}(p_{ij}-p_{Mj})\right). \end{aligned} \tag{25}$$

Step 1. Since

$$\|\Delta g\|^2=\iint (g(x,y)-\epsilon)^2\,{\rm d}x\,{\rm d}y=\sum_{i,j}c_i c_j(p_{ij}-\epsilon)^2,$$

the $$i$$th cluster must either have $$d_i=\sum_j c_jp_{ij}$$ close to $$\epsilon$$ (i.e., within $$o(1)$$), or $$c_i$$ close to zero, or both. We call a cluster Type I if $$c_i$$ is close to 0 and Type II if $$d_i$$ is close to $$\epsilon$$. (If a cluster meets both conditions, we arbitrarily throw it into one camp or the other.) The first equation in (25) implies that, for fixed $$i$$, the values of $$p_{ij}$$ are nearly constant for all $$j$$ of Type II. Since the $$c_j$$’s are small for $$j$$ of Type I, this common value must be close to $$d_i$$.
Our equations then simplify to

$$\begin{aligned} S_0'(d_i) &= \alpha+\frac{k}{2}\beta\left(d_i^{k-1}+\epsilon^{k-1}\right)+o(1),\\ 2\left(S_0(d_i)-S_0(\epsilon)\right) &= 2\alpha(d_i-\epsilon)+\beta\left[d_i^k-\epsilon^k+k\epsilon^{k-1}(d_i-\epsilon)\right]+o(1). \end{aligned} \tag{26}$$

Since $$d_M = \epsilon + o(1)$$, the first of those equations applied to $$d_M$$ implies that

$$\alpha+k\epsilon^{k-1}\beta=S_0'(\epsilon)+o(1). \tag{27}$$

We can thus replace $$\alpha$$ with $$S_0'(\epsilon) - k\epsilon^{k-1} \beta + o(1)$$ throughout. This gives the equations:

$$\begin{aligned} 2\left(S_0'(d_i)-S_0'(\epsilon)\right) &= k\beta\left(d_i^{k-1}-\epsilon^{k-1}\right)+o(1),\\ 2\left[S_0(d_i)-S_0(\epsilon)-S_0'(\epsilon)(d_i-\epsilon)\right] &= \beta\left[d_i^k-\epsilon^k-k\epsilon^{k-1}(d_i-\epsilon)\right]+o(1). \end{aligned} \tag{28}$$

There are two solutions to these equations. One is simply to have $$d_i=\epsilon+o(1)$$, in which case both sides of both equations are $$o(1)$$. Indeed, we already know that there must be clusters with $$d_i$$ close to $$\epsilon$$.

In looking for solutions with $$d_i$$ not close to $$\epsilon$$, the second equation says that $$\beta = \psi_k(\epsilon,d_i)+o(1)$$. In this case we can divide the first equation by the second to eliminate $$\beta$$. This gives an equation that is algebraically equivalent to $$\partial \psi_k(\epsilon,d_i)/ \partial d_i=o(1)$$. In other words, $$d_i$$ must be tending to the unique critical point $$\zeta_k(\epsilon)$$ of $$\psi_k$$, and $$\beta$$ must be tending to the critical value.

In fact, the critical point is a maximum of $$\psi_k$$. Remember that $$s_{(\epsilon,\tau)} = s_{(\epsilon,\epsilon^k)} + \int_{\epsilon^k}^{\tau} \beta$$ from (23). Since the computation of $$\beta$$ is independent of $$\Delta \tau$$ (to lowest order), we have $$s_{(\epsilon,\tau)}-s_{(\epsilon,\epsilon^k)} = \beta \Delta \tau + o(\Delta \tau)$$, so maximizing $$\beta$$ is tantamount to maximizing $$s$$.

Step 2. We have shown so far that the optimizing graphon is multipodal, with all of the clusters either having $$d_i$$ close to $$\zeta_k(\epsilon)$$ or close to $$\epsilon$$. Furthermore, the clusters with $$d_i$$ close to $$\zeta_k(\epsilon)$$ have total size $$\sum c_i = o(1)$$.
We refine our definitions of Types I and II so that all the clusters with $$d_i$$ close to $$\zeta_k(\epsilon)$$ are Type I and all the clusters with $$d_i$$ close to $$\epsilon$$ are Type II. We order the clusters so that the Type I clusters come before Type II, thereby dividing $$[0,1]^2$$ into $$I\times I$$, $$I \times II$$, $$II \times I$$, and $$II \times II$$ quadrants. Since the value of $$g(x,y)$$ is determined by $$d(x)$$ and $$d(y)$$ (and $$\alpha$$ and $$\beta$$), this means that the optimizing graphon is nearly constant (i.e., with pointwise small fluctuations) on each quadrant.

Let $$g_b$$ be the bipodal graphon obtained by averaging $$g$$ over each quadrant. That is, $$c$$ is the total size of all the Type I clusters, and the parameters $$p_{11}$$, $$p_{12}$$, and $$p_{22}$$ are chosen such that $$0=\iint_{I\times I}(g(x,y)-p_{11}){\rm d}x\, {\rm d}y= \iint_{I\times II}(g(x,y)-p_{12}) {\rm d}x\, {\rm d}y =\iint_{II\times II} (g(x,y)-p_{22}){\rm d}x\, {\rm d}y$$. Let $$\Delta g_f = g-g_b$$. (The $$f$$ stands for “further.”) We will show that having $$\Delta g_f$$ non-zero is an inefficient way to increase $$\tau$$, that is, $$(s(g)-s(g_b))/(\tau(g)-\tau(g_b))$$ is less than $$\beta$$.

By the first equation in (25), $$S_0'(g(x,y))$$ is the sum of a function of $$x$$ and the same function of $$y$$. This means that there is a function $$F(x)$$ on $$[0,1]$$, with $$\int_I F(x) {\rm d}x = \int_{II} F(x) {\rm d}x =0$$, such that on each quadrant

$$S_0'(g(x,y))=\text{constant}+F(x)+F(y). \tag{29}$$

Furthermore, $$F(x)$$ is pointwise small (meaning it approaches 0 pointwise as $$\tau \to \epsilon^k$$), so we can write the Taylor series

$$S_0'(g(x,y))=S_0'\left(g_b(x,y)+\Delta g_f(x,y)\right)=S_0'(g_b(x,y))+S_0''(g_b(x,y))\,\Delta g_f(x,y)+O\left(\Delta g_f(x,y)^2\right). \tag{30}$$

Since $$S_0'(g(x,y))$$ is not a linear function of $$g(x,y)$$, the constant in (29) is not exactly $$S_0'(g_b(x,y))$$.
The correction to $$S_0'(g_b(x,y))$$ is obtained by integrating higher-order terms in the Taylor series (30) over the quadrant, and so is controlled by the squared $$L^2$$ norm of $$F$$. Using (29), on each quadrant we can solve (30) for $$\Delta g_f(x,y)$$ as

$$\Delta g_f(x,y)=\begin{cases} \dfrac{F(x)+F(y)}{S_0''(p_{11})}+O(F^2) & \text{on } I\times I,\\[2mm] \dfrac{F(x)+F(y)}{S_0''(p_{12})}+O(F^2) & \text{on } I\times II \text{ and } II\times I,\\[2mm] \dfrac{F(x)+F(y)}{S_0''(p_{22})}+O(F^2) & \text{on } II\times II, \end{cases} \tag{31}$$

where $$O(F^2)$$ is shorthand for terms that are bounded by quadratic functions of $$F(x)$$ and $$F(y)$$ and a quadratic function of the $$L^2$$ norm of $$F$$. Corrections involving $$F(x)$$ and $$F(y)$$ come from higher terms in the Taylor series of $$S_0'(g(x,y))$$, while corrections involving the $$L^2$$ norm come from the average value of $$S_0'(g(x,y))$$ on a quadrant being slightly different from $$S_0'(p_{ij})$$. The resulting changes $$\Delta d_f$$ in the degree function $$d(x)$$ from $$g_b$$ to $$g_b + \Delta g_f$$ are then:

$$\Delta d_f(x)=\begin{cases} F(x)\left(\dfrac{c}{S_0''(p_{11})}+\dfrac{1-c}{S_0''(p_{12})}\right)+O(F^2) & x\in I,\\[2mm] F(x)\left(\dfrac{c}{S_0''(p_{12})}+\dfrac{1-c}{S_0''(p_{22})}\right)+O(F^2) & x\in II. \end{cases} \tag{32}$$

Next we compute $$\Delta \tau_f:= \tau(g)-\tau(g_b)$$ and $$\Delta s_f := s(g)-s(g_b)$$ to lowest order in $$F$$. If we expand $$\Delta \tau_f$$ and $$\Delta s_f$$ in powers of $$\Delta g_f$$, the linear terms vanish exactly, because $$\iint \Delta g_f$$ is exactly zero on each quadrant. For the quadratic term, we approximate $$\Delta g_f$$ using (31).
The resulting errors in the quadratic term, and all of the neglected higher-order terms, are then bounded by the sup norm of $$F$$ times the squared $$L^2$$ norm, which we denote $$O(F^3)$$:

$$\begin{aligned} \Delta s_f &= \iint\left[\frac{1}{2}S_0''(g_b(x,y))\,\Delta g_f(x,y)^2+O(\Delta g_f^3)\right]{\rm d}x\,{\rm d}y\\ &= \iint_{I\times I}\frac{F(x)^2+F(y)^2}{2S_0''(p_{11})}+2\iint_{I\times II}\frac{F(x)^2+F(y)^2}{2S_0''(p_{12})}+\iint_{II\times II}\frac{F(x)^2+F(y)^2}{2S_0''(p_{22})}+O(F^3)\\ &= \left(\frac{c}{S_0''(p_{11})}+\frac{1-c}{S_0''(p_{12})}\right)\int_I F(x)^2\,{\rm d}x+\left(\frac{c}{S_0''(p_{12})}+\frac{1-c}{S_0''(p_{22})}\right)\int_{II}F(x)^2\,{\rm d}x+O(F^3),\\ \Delta\tau_f &= \int_0^1\left[\frac{k(k-1)d(x)^{k-2}}{2}\left(\Delta d(x)\right)^2+O\left(\Delta d(x)^3\right)\right]{\rm d}x\\ &= \frac{k(k-1)d_1^{k-2}}{2}\left(\frac{c}{S_0''(p_{11})}+\frac{1-c}{S_0''(p_{12})}\right)^2\int_I F(x)^2\,{\rm d}x+\frac{k(k-1)d_2^{k-2}}{2}\left(\frac{c}{S_0''(p_{12})}+\frac{1-c}{S_0''(p_{22})}\right)^2\int_{II}F(x)^2\,{\rm d}x+O(F^3), \end{aligned} \tag{33}$$

where $$d_1 = cp_{11} + (1-c)p_{12}$$ and $$d_2 = c p_{12} + (1-c)p_{22}$$ are the values of the degree function for the bipodal graphon $$g_b$$. The ratio $$\Delta s_f/\Delta \tau_f$$ is then a weighted average of

$$\frac{2}{k(k-1)d_1^{k-2}}\left(\frac{c}{S_0''(p_{11})}+\frac{1-c}{S_0''(p_{12})}\right)^{-1} \tag{34}$$

and

$$\frac{2}{k(k-1)d_2^{k-2}}\left(\frac{c}{S_0''(p_{12})}+\frac{1-c}{S_0''(p_{22})}\right)^{-1} \tag{35}$$

with relative weights

$$d_1^{k-2}\left(\frac{c}{S_0''(p_{11})}+\frac{1-c}{S_0''(p_{12})}\right)^2\int_I F(x)^2\,{\rm d}x \quad\text{and}\quad d_2^{k-2}\left(\frac{c}{S_0''(p_{12})}+\frac{1-c}{S_0''(p_{22})}\right)^2\int_{II}F(x)^2\,{\rm d}x. \tag{36}$$

As $$\tau \to \epsilon^k$$ (and $$c \to 0$$ and $$F \to 0$$), the first ratio being averaged approaches $$\psi_k(\zeta_k(\epsilon), \zeta_k(\epsilon))$$ and the second approaches $$\psi_k(\epsilon, \epsilon)$$. However, both of these numbers are smaller than $$\beta = \psi_k(\epsilon, \zeta_k(\epsilon))$$. We have already established that $${\rm d}s/{\rm d}\tau = \beta + o(1)$$ for changes in $$c$$ that preserve the bipodal structure. This means that, for sufficiently small $$c$$, if we perturb a bipodal graphon to maximize $$s$$ for fixed additional change $$\Delta \tau_f$$, it is better to perturb $$c$$ than to make $$F$$ non-zero. Thus $$F(x)$$ is identically zero, implying that the optimizing graphon is exactly bipodal.

Step 3. We have established that the optimizing graphon is bipodal, with $$p_{22} = \epsilon + o(1)$$ and $$p_{12} = \zeta_k(\epsilon) + o(1)$$. We now show that the form of this graphon is unique. Since the graphon is bipodal, we consider the exact optimality equations for bipodal graphons.
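The argument then reduces to showing that a certain four-dimensional Jacobian determinant is non-zero. After eliminating $$\gamma$$, we have $$\begin{aligned} S_0'(p_{11}) &= \alpha + k\beta d_1^{k-1}, \\ S_0'(p_{12}) &= \alpha + \frac{k}{2}\beta\left(d_1^{k-1} + d_2^{k-1}\right), \\ S_0'(p_{22}) &= \alpha + k\beta d_2^{k-1}, \\ \frac{\partial S}{\partial c} &= \alpha \frac{\partial \epsilon}{\partial c} + \beta \frac{\partial \tau}{\partial c}, \\ \epsilon &= \epsilon_0, \qquad \tau = \tau_0. \end{aligned}$$ (37) We use the second and third equations to solve for $$\alpha$$ and $$\beta$$: $$\alpha = \frac{-S_0'(p_{22})\left(d_2^{k-1} + d_1^{k-1}\right) + 2d_2^{k-1}S_0'(p_{12})}{d_2^{k-1} - d_1^{k-1}}, \qquad \beta = \frac{2}{k}\,\frac{S_0'(p_{22}) - S_0'(p_{12})}{d_2^{k-1} - d_1^{k-1}}.$$ (38) Plugging this into the first equation then gives $$S_0'(p_{11}) - 2S_0'(p_{12}) + S_0'(p_{22}) = 0.$$ (39) This leaves four equations in four unknowns, which we write as $$(f_1,f_2,f_3,f_4) = (0,0,\epsilon_0,\tau_0),$$ (40) where $$\begin{aligned} f_1(p_{11},p_{12},p_{22},c) &= S_0'(p_{11}) - 2S_0'(p_{12}) + S_0'(p_{22}), \\ f_2(p_{11},p_{12},p_{22},c) &= \frac{\partial s}{\partial c} - \alpha\frac{\partial \epsilon}{\partial c} - \beta\frac{\partial \tau}{\partial c}, \\ f_3(p_{11},p_{12},p_{22},c) &= c^2p_{11} + 2c(1-c)p_{12} + (1-c)^2p_{22}, \\ f_4(p_{11},p_{12},p_{22},c) &= c\,d_1^k + (1-c)\,d_2^k, \end{aligned}$$ (41) and where $$\alpha$$ and $$\beta$$ are given by (38). We know a solution when $$\tau_0 = \epsilon_0^k$$, namely $$p_{22}=\epsilon_0$$, $$p_{12} = \zeta_k(\epsilon_0)$$, $$c=0,$$ and $$p_{11} = S_0'{}^{-1}(2S_0'[\zeta_k(\epsilon_0)] - S_0'(\epsilon_0))$$. We will show that $$d f$$ has non-zero determinant at this point. By the inverse function theorem, this implies that, when $$\tau_0$$ is close to $$\epsilon_0^k$$, there is only one value of $$(p_{11},p_{12}, p_{22}, c)$$ close to this point for which $$f(p_{11},p_{12}, p_{22}, c) = (0,0,\epsilon_0, \tau_0)$$. Moreover, the parameters $$(p_{11}, p_{12}, p_{22}, c)$$ depend analytically on $$\epsilon_0$$ and $$\tau_0$$. This will complete the proof. The derivatives of $$f_1$$, $$f_3$$, and $$f_4$$ are: $$\begin{aligned} df_1(p_{11},p_{12},p_{22},c) &= \left(S_0''(p_{11}),\, -2S_0''(p_{12}),\, S_0''(p_{22}),\, 0\right), \\ df_3(p_{11},p_{12},p_{22},c) &= \left(c^2,\, 2c(1-c),\, (1-c)^2,\, 2cp_{11} + 2(1-2c)p_{12} - 2(1-c)p_{22}\right), \\ df_4(p_{11},p_{12},p_{22},c) &= \left(kc^2d_1^{k-1},\, kc(1-c)\left(d_1^{k-1}+d_2^{k-1}\right),\, k(1-c)^2d_2^{k-1},\, d_1^k - d_2^k + kcd_1^{k-1}(p_{11}-p_{12}) + k(1-c)d_2^{k-1}(p_{12}-p_{22})\right). \end{aligned}$$ (42) Evaluating at $$c=0$$ gives $$\begin{aligned} df_1(p_{11},p_{12},p_{22},0) &= \left(S_0''(p_{11}),\, -2S_0''(p_{12}),\, S_0''(p_{22}),\, 0\right), \\ df_3(p_{11},p_{12},p_{22},0) &= \left(0,\, 0,\, 1,\, 2p_{12} - 2p_{22}\right), \\ df_4(p_{11},p_{12},p_{22},0) &= \left(0,\, 0,\, kp_{22}^{k-1},\, p_{12}^k - p_{22}^k + kp_{22}^{k-1}(p_{12}-p_{22})\right). \end{aligned}$$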
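(43) $$df$$ is block triangular, with $$2 \times 2$$ blocks. The lower right block has determinant $$p_{12}^k -p_{22}^k - kp_{22}^{k-1}(p_{12}-p_{22}) = D(p_{22},p_{12})$$, which is non-zero when $$p_{12} \ne p_{22}$$, that is, when $$\epsilon_0 \ne (k-1)/k$$. When $$c=0$$, $$d_1$$ and $$d_2$$ are independent of $$p_{11}$$, as are $$\frac{\partial \epsilon}{\partial c}$$ and $$\frac{\partial \tau}{\partial c}$$, so $$\frac{\partial f_2}{\partial p_{11}} = 0$$. As a result, $$\det(df) = S_0''(p_{11})\, D(p_{22},p_{12})\, \frac{\partial f_2}{\partial p_{12}}.$$ (44) Since $$S_0''(p_{11})$$ is never zero, and since $$D(p_{22},p_{12})$$ only vanishes when $$p_{12}=p_{22}$$ (i.e., at $$\epsilon_0=(k-1)/k$$), we need only show that $$\frac{\partial f_2}{\partial p_{12}} \ne 0$$. We compute $$\begin{aligned} \frac{\partial \beta}{\partial p_{12}} &= \frac{2}{k}\,\frac{\left(p_{22}^{k-1} - p_{12}^{k-1}\right)\left(-S_0''(p_{12})\right) - \left(S_0'(p_{22}) - S_0'(p_{12})\right)\left(-(k-1)p_{12}^{k-2}\right)}{\left(p_{22}^{k-1} - p_{12}^{k-1}\right)^2} \\ &= \frac{2}{k}\,\frac{(k-1)p_{12}^{k-2}\left(S_0'(p_{22}) - S_0'(p_{12})\right) - \left(p_{22}^{k-1} - p_{12}^{k-1}\right)S_0''(p_{12})}{\left(p_{22}^{k-1} - p_{12}^{k-1}\right)^2} \end{aligned}$$ (45) at $$c = 0$$. Since $$\alpha = S_0'(p_{22}) - k \beta d_2^{k-1}$$, $$\frac{\partial \alpha}{\partial p_{12}} = -kd_2^{k-1}\frac{\partial \beta}{\partial p_{12}} - k(k-1)\beta d_2^{k-2}\frac{\partial d_2}{\partial p_{12}} = -kd_2^{k-1}\frac{\partial \beta}{\partial p_{12}} - k(k-1)d_2^{k-2}c\beta \;\Rightarrow\; -kp_{22}^{k-1}\frac{\partial \beta}{\partial p_{12}},$$ (46) where $$\Rightarrow$$ denotes a limit as $$c \to 0$$. We also compute $$\begin{aligned} \frac{\partial^2 S}{\partial c\,\partial p_{12}} &= 2(1-2c)S_0'(p_{12}) \;\Rightarrow\; 2S_0'(p_{12}), \\ \frac{\partial^2 \epsilon}{\partial c\,\partial p_{12}} &= 2(1-2c) \;\Rightarrow\; 2, \\ \frac{\partial^2 \tau}{\partial c\,\partial p_{12}} &= k(1-2c)\left(d_1^{k-1} + d_2^{k-1}\right) \;\Rightarrow\; k\left(p_{12}^{k-1} + p_{22}^{k-1}\right). \end{aligned}$$ (47) Finally we combine everything: $$\begin{aligned} \left.\frac{\partial f_2}{\partial p_{12}}\right|_{c=0} &= \frac{\partial^2 S}{\partial c\,\partial p_{12}} - \frac{\partial \alpha}{\partial p_{12}}\frac{\partial \epsilon}{\partial c} - \alpha\frac{\partial^2 \epsilon}{\partial c\,\partial p_{12}} - \frac{\partial \beta}{\partial p_{12}}\frac{\partial \tau}{\partial c} - \beta\frac{\partial^2 \tau}{\partial c\,\partial p_{12}} \\ &= 2S_0'(p_{12}) - 2\alpha - \beta k\left(p_{12}^{k-1} + p_{22}^{k-1}\right) + \left(kp_{22}^{k-1}(2p_{12} - 2p_{22}) - \left(p_{12}^k - p_{22}^k + kp_{22}^{k-1}(p_{12}-p_{22})\right)\right)\frac{\partial \beta}{\partial p_{12}}. \end{aligned}$$ (48) The terms not involving $$\partial \beta/\partial p_{12}$$ all cancel, by the second equation of (37), and we are left with $$\frac{\partial f_2}{\partial p_{12}} = -D(p_{22},p_{12})\frac{\partial \beta}{\partial p_{12}}.$$ (49) Finally, we need to show that $$\partial \beta/\partial p_{12} \ne 0$$. Since $$p_{12}$$ maximizes $$\psi_k(p_{22},p_{12})$$ for fixed $$p_{22}$$, we must have (referring to the notation of the proof of Theorem 3.3) $$(N/D)'=0$$, or equivalently $$N'/D' = N/D$$, where we write $$\psi_k = N/D$$, as above. But $$\beta = N'/D'$$.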
If $$\partial \beta/\partial p_{12}$$ were equal to zero, then we would have $$N''/D'' = N'/D'$$. But we have previously shown that it is impossible to simultaneously have $$N/D = N'/D' = N''/D''$$, except at $$p_{12} = p_{22} = (k-1)/k$$, so $$\partial \beta/\partial p_{12}$$ must be non-zero whenever $$\epsilon_0 \ne (k-1)/k$$. This makes $$\det(df)$$ non-zero at $$(p_{11},\zeta_k(\epsilon_0), \epsilon_0,0)$$, so the solutions near this point are unique and analytic in $$(\epsilon,\tau)$$. ■ 5 Theorem 1.1 for $$k$$-starlike graphs Now suppose that $$H$$ is a $$k$$-starlike graph with $$\ell$$ edges, and with $$n_k$$ vertices of degree $$k$$, and let $$\tau$$ be the density of $$H$$ and $$\tau_k$$ be the density of $$k$$-stars. Our first result relates $$\Delta \tau := \tau - \epsilon^\ell$$ to $$\Delta \tau_k := \tau_k-\epsilon^k$$. Lemma 5.1. If $$g$$ is an entropy-maximizing graphon for $$(\epsilon,\tau)$$ with $$\tau > \epsilon^\ell$$, then $$\Delta \tau = n_k \epsilon^{\ell-k} \Delta \tau_k + O(\Delta \tau_k^{3/2})$$. □ Proof Writing $$g(x,y) = \epsilon + \Delta g(x,y)$$, we have $$\tau = \int {\rm d}x \prod g(x_i,x_j) = \int {\rm d}x \prod \left(\epsilon + \Delta g(x_i,x_j)\right),$$ (50) where there is a variable $$x_i$$ for each vertex of $$H$$ and the product is over all edges in $$H$$. Expanding the product in the integrand, we get a sum of terms: The leading order term $$\epsilon^\ell$$. Terms with one factor of $$\Delta g$$. These integrate to zero, since $$\iint \Delta g(x,y){\rm d}x\, {\rm d}y = \Delta \epsilon = 0$$. Terms with two or more factors of $$\Delta g$$, all coming from edges that share a fixed vertex of degree $$k$$. Up to an overall power of $$\epsilon^{\ell-k}$$, these are identical to the terms of order 2 and higher in $$\Delta g$$ in the expansion of $$\Delta \tau_k$$. As such, they add up to $$\epsilon^{\ell-k} \Delta \tau_k$$. Summing over the vertices of $$H$$ of degree $$k$$ then gives $$n_k \epsilon^{\ell-k} \Delta \tau_k$$.
Terms with two or more factors of $$\Delta g$$, corresponding to edges that do not all share a vertex. For each such term, let $$\{ e_i \}$$ denote the edges corresponding to factors of $$\Delta g$$. We classify these further into three sub-cases: If one of the $$e_i$$’s is disconnected from the rest, then the term is identically zero, since $$\iint \Delta g(x,y){\rm d}x\,{\rm d}y=0$$. If $$\{ e_i \}$$ consists of two or more connected components (each with at least two edges), then the term is a power of $$\epsilon$$ times the product of integrals, one for each connected component. However, each such integral is $$O(\|\Delta g\|^2)$$, so the term is $$O(\|\Delta g\|^4)$$. If there is a single connected component whose edges do not all share a vertex, then $$\{e_i\}$$ must contain three edges that either form a chain or a triangle. We bound such a term by taking absolute values of the $$\Delta g$$’s for the three edges and replacing all other factors of $$\Delta g$$ by 1. The resulting bound is a power of $$\epsilon$$ times either $$\iiiint |\Delta g(w,x)| |\Delta g(x,y)| |\Delta g(y,z)| {\rm d}w\, {\rm d}x \, {\rm d}y \, {\rm d}z$$ for a chain or $$\iiint |\Delta g(x,y)| |\Delta g(y,z)| |\Delta g(z,x)|{\rm d}x \, {\rm d}y \, {\rm d}z$$ for a triangle, either of which is bounded by that power of $$\epsilon$$ times $$\| \Delta g \|^3$$. Thus $$\Delta \tau = n_k \epsilon^{\ell-k} \Delta \tau_k + O(\|\Delta g\|^3)$$. Since $$g$$ is entropy maximizing, $$\Delta \tau_k$$ goes as $$\|\Delta g\|^2$$, so the error is $$O(\Delta \tau_k^{3/2})$$. ■ 5.1 Proof of Theorem 1.1 Since $$\Delta \tau$$ is proportional to $$\Delta \tau_k$$ (plus small errors), the problem of optimizing $$\Delta s/\Delta \tau$$ is a small perturbation of the problem of optimizing $$\Delta s/ \Delta \tau_k$$, or equivalently optimizing $$\Delta s$$ for fixed $$\Delta \tau_k$$, which we solved in Theorem 4.1. 
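Lemma 5.1 can be illustrated in the simplest case, where $$H$$ is a triangle ($$\ell = 3$$, $$k = 2$$, $$n_2 = 3$$), so that the lemma predicts $$\Delta \tau \approx 3\epsilon\, \Delta \tau_2$$. The following sketch evaluates both sides exactly for a bipodal graphon with a small cluster of size $$c$$. The graphon values used are arbitrary test inputs rather than entropy maximizers, and the function name is hypothetical; $$p_{22}$$ is chosen so that the edge density equals $$\epsilon$$ exactly.

```python
def bipodal_density_shifts(eps, p11, p12, c):
    """Return (tau_triangle - eps**3, tau_2star - eps**2) for a bipodal graphon.

    The graphon equals p11 on I x I, p12 on I x II and II x I, and p22 on
    II x II, where cluster I has size c. p22 is solved from the edge-density
    constraint c^2 p11 + 2c(1-c) p12 + (1-c)^2 p22 = eps.
    """
    p22 = (eps - c * c * p11 - 2.0 * c * (1.0 - c) * p12) / (1.0 - c) ** 2
    # Degree function on each cluster.
    d1 = c * p11 + (1.0 - c) * p12
    d2 = c * p12 + (1.0 - c) * p22
    # 2-star density: integral of d(x)^2.
    tau_2star = c * d1 ** 2 + (1.0 - c) * d2 ** 2
    # Triangle density: sum over the cluster assignments of the three vertices.
    tau_triangle = (
        c ** 3 * p11 ** 3
        + 3.0 * c ** 2 * (1.0 - c) * p11 * p12 ** 2
        + 3.0 * c * (1.0 - c) ** 2 * p12 ** 2 * p22
        + (1.0 - c) ** 3 * p22 ** 3
    )
    return tau_triangle - eps ** 3, tau_2star - eps ** 2
```

With $$\epsilon = 0.4$$, $$p_{11} = 0.9$$, $$p_{12} = 0.8$$ and $$c = 10^{-3}$$, the two sides agree to well within one percent; the discrepancy is the triangle term $${\rm Tr}(\Delta g^3)$$, which is of higher order in $$c$$.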
Since that problem has a unique optimizer, any optimizer for $$\Delta s/\Delta \tau$$ must come close to optimizing $$\Delta s/\Delta \tau_k$$, and so must be close to the bipodal graphon derived in Theorem 4.1. We can thus write $$g = g_b + \Delta g_f$$, as in the last steps of the proof of Theorem 4.1, where $$g_b = \epsilon + \Delta g_b$$ is a bipodal graphon with $$p_{22} = \epsilon +o(1)$$ and $$p_{12} = \zeta_k(\epsilon) +o(1)$$ and where $$\Delta g_f$$ is a function that averages to zero on each quadrant of $$g_b$$. We again use the convention that words like “small” and “close to” and “negligible” refer to quantities which tend to zero as $$\Delta\tau:=\tau-\epsilon^{\ell}$$ tends to zero. A quantity is “nearly constant” if it is constant up to an $$o(1)$$ correction. Lemma 5.2. The function $$\Delta g_f$$ is pointwise small. That is, as $$\tau \to \epsilon^\ell$$, $$\Delta g_f$$ goes to zero in sup-norm. □ Proof of Lemma. Since we are no longer in the setting where the entropy maximizer is proven to be multipodal, we cannot use the equations (25) directly. However, we can still apply the method of Lagrange multipliers to pointwise variations of the graphon. (See [6] for a rigorous justification.) These variational equations are $$\frac12 \ln\left(\frac{1}{g(x,y)} - 1\right) = \frac{\delta s}{\delta g(x,y)} = \alpha + \beta\frac{\delta \tau}{\delta g(x,y)}.$$ (51) We need to compute $$\delta \tau/\delta g$$ and show that it is nearly constant on each quadrant. Since $$\alpha$$ and $$\beta$$ are constants, (51) would then imply that $$g(x,y)$$ is nearly constant on each quadrant, and hence that $$\Delta g_f$$ is pointwise small. Let $$g_0(x,y)\equiv\epsilon$$. Since $$\| \Delta g \|$$ is small (where $$\Delta g =g-g_0 = \Delta g_b + \Delta g_f$$), we can find a small constant $$a=o(1)$$ such that, for all $$x$$ outside a set $$U\subset[0,1]$$ of measure $$a$$, $$\int_0^1 |\Delta g(x,y)|{\rm d}y < a$$.
(This set $$U$$ is essentially what we previously called the Type I clusters, but at this stage of the argument we are not assuming a multipodal structure. Rather, we are just using the fact that $$\tau - \epsilon^\ell = O(\| \Delta g\|^2)$$.) The functional derivative $$\delta \tau/\delta g(x,y)$$ has a diagrammatic expansion similar to the expansion of $$\tau$$ in (50). For each edge of $$H$$, we get a contribution by deleting the edge, assigning the values $$x$$ and $$y$$ to the endpoints of the edge, and integrating over the values of all other vertices. Since $$U$$ is small, we can estimate $$\delta \tau/\delta g$$ to within $$o(1)$$ by restricting the integral to $$(U^c)^{v-2}$$, where $$v$$ is the number of vertices in $$H$$ and $$U^c$$ is the complement of $$U$$. This implies that terms involving $$\Delta g$$ can only contribute non-negligibly on edges connected to $$x$$ or to $$y$$. Furthermore, they can only contribute non-negligibly when attached to $$x$$ if $$x \in U$$, and can only contribute non-negligibly when attached to $$y$$ if $$y \in U$$. We now begin a bootstrap argument. We will show that $$\delta \tau/\delta g$$ is nearly constant on each of the quadrants $$U^c\times U^c$$, $$U\times U^c$$, and $$U\times U$$ in turn. This will show that $$g$$ is nearly constant on that quadrant, which will help us prove that $$\delta \tau/\delta g$$ is nearly constant on the next quadrant. The simplest case is when $$x$$ and $$y$$ are both in $$U^c$$. Then the contributions of the terms involving $$\Delta g$$ are negligible, so $$\delta \tau/\delta g(x,y)$$ can be computed, to within a small error, using the approximation $$g \approx g_0$$. But when $$\Delta g$$ is negligible, $$\delta \tau/\delta g(x,y)$$ is nearly independent of $$x$$ and $$y$$. Since $$\delta \tau/\delta g(x,y)$$ is nearly constant on $$U^c \times U^c$$, equation (51) implies that $$g$$ is nearly constant on $$U^c \times U^c$$. Next suppose that $$y \in U^c$$ and $$x \in U$$.
Then all contributions from factors of $$\Delta g(z,y)$$ are negligible, so $$\delta \tau/\delta g(x,y)$$ is nearly independent of $$y$$. But then $$g(x,y)$$ is nearly independent of $$y$$, and is nearly equal to $$d(x)$$. The integrals involved in computing $$\delta \tau/\delta g(x,y)$$ are then easily approximated to within $$o(1)$$, using $$g_0 + \Delta g$$ on the edges connected to $$x$$, $$g_0$$ on all other edges, and only integrating over $$(U^c)^{v-2}$$. If the vertex of $$H$$ that is assigned the value $$x$$ has degree $$k$$, then the edges connected to $$x$$ contribute $$d(x)^{k-1} \epsilon^{\ell-k}$$. Summing over edges, and symmetrizing over the assignment of $$x$$ and $$y$$ to the two endpoints, we obtain the approximation $$\frac{\delta \tau}{\delta g(x,y)} = \frac{k n_k \epsilon^{\ell-k}}{2}\left(d(x)^{k-1} + d(y)^{k-1}\right) + o(1).$$ (52) Up to an overall factor of $$n_k \epsilon^{\ell-k}$$, this is the same functional derivative as for a $$k$$-star. This also applies if $$x$$ and $$y$$ are both in $$U^c$$, except that in that case $$d(x) \approx \epsilon$$, and, by symmetry, it also applies if $$x \in U^c$$ and $$y \in U$$. In other words, we can use the approximation (52) in (51) whenever either $$x$$ or $$y$$ (or both) is in $$U^c$$. This implies that the integrated equations (26) apply for all $$x$$ (with $$d_i$$ replaced by $$d(x)$$, and with $$\beta$$ scaled up by $$n_k \epsilon^{\ell-k}$$). Following the exact same argument as in the proof of Theorem 4.1, we obtain that $$d(x)$$ only takes on two possible values (up to $$o(1)$$ errors), namely $$\epsilon$$ and $$\zeta_k(\epsilon)$$. We then define Types I and II points, depending on whether the degree function is close to $$\zeta_k(\epsilon)$$ or $$\epsilon$$, respectively, and can take $$U$$ to be precisely the set of Type I points. Our graphon is then nearly constant on $$U \times U^c$$ and $$U^c \times U$$, as well as on $$U^c \times U^c$$. We still need to show that the graphon is nearly constant on $$U \times U$$. Suppose that $$x$$ and $$y$$ are in $$U$$.
Since $$g(x,z)$$ is nearly independent of $$x$$ for $$z$$ in $$U^c$$, and since $$\delta \tau/\delta g(x,y)$$ is computed to within $$o(1)$$ by integrating over $$(U^c)^{v-2}$$, $$\delta \tau/\delta g(x,y)$$ is nearly independent of $$x \in U$$, and likewise nearly independent of $$y \in U$$. But then $$g(x,y)$$ is nearly constant on $$U \times U$$. Note, by the way, that the approximation (52) does not apply on $$U \times U$$; in that case $$\delta \tau/\delta g$$ contains terms with powers of both $$d(x)$$ and $$d(y)$$. However, that approximation is not needed for our proof, since $$U \times U$$ (aka the $$I$$-$$I$$ quadrant) only contributes $$O(c)$$ to the integrated equations (26). ■ Returning to the proof of Theorem 1.1, we need to compare $$s(g_b + \Delta g_f) - s(g_b)$$ to $$\tau(g_b+\Delta g_f)-\tau(g_b)$$. As before, we expand $$\tau(g)$$ as the integral of a polynomial in $$g$$, obtained by assigning $$g_0 + \Delta g_b + \Delta g_f$$ to each edge of $$H$$ and integrating. The difference between $$\tau(g_b + \Delta g_f)$$ and $$\tau(g_b)$$ consists of terms with at least one $$\Delta g_f$$. However, the terms with exactly one $$\Delta g_f$$ are identically zero, since $$g_b$$ is constant on quadrants, and $$\Delta g_f$$ averages to zero on each quadrant. Furthermore, terms for which all of the $$\Delta g_b$$’s and $$\Delta g_f$$’s share a vertex are exactly what we would get from the approximation $$\Delta \tau \approx n_k \epsilon^{\ell-k}\Delta \tau_k$$. Any term that distinguishes between $$\Delta \tau$$ and $$n_k \epsilon^{\ell-k} \Delta \tau_k$$ must have at least two $$\Delta g_f$$’s and either a third $$\Delta g_f$$ or a $$\Delta g_b$$, forming either a 3-chain, a triangle, or two connected $$\Delta g_f$$’s and a disconnected $$\Delta g_b$$. Let $$\Delta g_f'(x,y) = |\Delta g_f(x,y)|$$, and let $$\Delta g_b'(x,y) = \begin{cases} 2c & x,y \in II, \\ 1 & \text{otherwise.} \end{cases}$$ (53) This is conveniently expressed in terms of outer products.
Let $$| 1 \rangle \in L^2([0,1])$$ be the constant function 1, and let $$|\omega \rangle$$ be the function $$\omega(x) = \begin{cases} 0 & x<c, \\ 1 & x>c. \end{cases}$$ (54) Then $$\Delta g_b' = |1\rangle\langle 1| - |\omega\rangle\langle \omega| + 2c\,|\omega\rangle\langle \omega| = |1\rangle\langle 1-\omega| + |1-\omega\rangle\langle \omega| + 2c\,|\omega\rangle\langle \omega|.$$ (55) Note that $$|\Delta g_b(x,y)| \le \Delta g_b'(x,y)$$ for all $$x,y \in (0,1)$$. To see this, the only issue is what happens when $$(x,y)$$ is in the $$II$$-$$II$$ quadrant, since otherwise we trivially have $$|\Delta g_b| \le 1$$. Since the edge density $$\epsilon$$ is fixed, $$(1-c)^2$$ times $$\Delta g_b(x,y)$$ for $$x,y > c$$ equals minus the integral of $$\Delta g_b$$ over the other three quadrants. But the area of those three quadrants is $$2c-c^2 < 2c$$, and the biggest possible value of $$|\Delta g_b|$$ is $$\max(\epsilon,1-\epsilon)<1$$, so $$\frac{1}{(1-c)^2} \int |\Delta g_b|$$ (integrated over the $$I$$-$$I$$, $$I$$-$$II$$, and $$II$$-$$I$$ quadrants) is strictly less than $$2c+O(c^2)$$, and so is bounded by $$2c$$ for small $$c$$ (note that $$O(c^2)$$ errors are negligible). We obtain upper bounds on the contributions of the relevant terms in the expansion of $$\tau$$ by replacing three $$\Delta g_f(x,y)$$’s and $$\Delta g_b(x,y)$$’s with $$\Delta g_f'(x,y)$$ and $$\Delta g_b'(x,y)$$, respectively, and replacing all other terms with $$1$$. Since all graphons are symmetric, hence Hermitian, their operator norms are bounded by their $$L^2$$ norms, so for any 3-chain $$\langle 1|\Delta g_1'\,\Delta g_2'\,\Delta g_3'|1\rangle \le \|\Delta g_1'\|\,\|\Delta g_2'\|\,\|\Delta g_3'\|.$$ (56) Since $$\| \Delta g_b' \|$$ and $$\| \Delta g_f'\|$$ are both $$o(1)$$ (more precisely, $$O(\sqrt{\tau-\epsilon^\ell})$$), the contribution of any 3-chain is bounded by an $$o(1)$$ constant times $$\| \Delta g_f \|^2$$. As for triangles, $${\rm Tr}(\Delta g_f'^3) \le \| \Delta g_f' \|^{3} = \| \Delta g_f \|^3$$. Finally, we must estimate the trace of $$\Delta g_f' \Delta g_f' \Delta g_b'$$. But this trace is $$\langle 1-\omega|\Delta g_f'\,\Delta g_f'|1\rangle + \langle \omega|\Delta g_f'\,\Delta g_f'|1-\omega\rangle + 2c\,\langle \omega|\Delta g_f'\,\Delta g_f'|\omega\rangle.$$ (57) Since $$\| 1 - \omega\| =\sqrt{c}$$ and $$\|\omega\| \le \|1\| = 1$$, the total is bounded by $$(2\sqrt{c} + 2c) \| \Delta g_f\|^2$$.
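The decomposition (55) and the term-by-term Cauchy–Schwarz estimate of the trace (57) can be checked on a discretized kernel. In the sketch below, the interval $$[0,1]$$ is replaced by $$n$$ equal cells, integral operators become $$n \times n$$ matrices divided by $$n$$, and a random symmetric non-negative matrix stands in for $$\Delta g_f'$$. The discretization, the random test kernel, and the function name are assumptions of the illustration. Bounding each of the three terms in (57) separately, using $$\|1-\omega\| = \sqrt{c}$$ and $$\|\omega\| \le 1$$, gives the constant $$2\sqrt{c} + 2c$$ that is tested here.

```python
import math
import random


def trace_bound_check(n=30, m=3, seed=0):
    """Compare Tr(A A B) with (2 sqrt(c) + 2c) ||A||^2 on an n-cell grid.

    A is a random symmetric non-negative kernel (a stand-in for |Delta g_f|),
    B equals 2c on II x II and 1 elsewhere, and cluster I is the first m cells.
    """
    rng = random.Random(seed)
    c = m / n
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            a[i][j] = a[j][i] = rng.random()
    omega = [0.0 if i < m else 1.0 for i in range(n)]
    b = [[2.0 * c if omega[i] * omega[j] == 1.0 else 1.0 for j in range(n)]
         for i in range(n)]
    # Check the outer-product decomposition (55) entry by entry.
    for i in range(n):
        for j in range(n):
            rhs = ((1.0 - omega[j]) + (1.0 - omega[i]) * omega[j]
                   + 2.0 * c * omega[i] * omega[j])
            assert abs(b[i][j] - rhs) < 1e-12
    # Kernel of the composed operator A A (each integral becomes a mean).
    a2 = [[sum(a[i][k] * a[k][j] for k in range(n)) / n for j in range(n)]
          for i in range(n)]
    trace = sum(a2[i][j] * b[j][i] for i in range(n) for j in range(n)) / n ** 2
    norm_sq = sum(v * v for row in a for v in row) / n ** 2  # squared L^2 norm
    bound = (2.0 * math.sqrt(c) + 2.0 * c) * norm_sq
    return trace, bound
```

Because the operator norm of a matrix is bounded by its Frobenius norm, the discrete inequality holds exactly, not just approximately, for every choice of the test kernel.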
The upshot is that the ratio of $$s(g_b + \Delta g_f) - s(g_b)$$ and $$\tau(g_b+\Delta g_f)-\tau(g_b)$$ is the same as that computed for $$k$$-stars (up to an overall factor of $$n_k \epsilon^{\ell-k}$$), plus an $$o(1)$$ correction. But that ratio was bounded by a constant $$\beta_0 < \beta$$. Restricting attention to values of $$\tau$$ for which the correction is smaller than $$(\beta-\beta_0)/2$$, we still obtain the result that having a non-zero $$\Delta g_f$$ is a less efficient way of generating additional $$\tau$$ than simply changing $$c$$. Thus the optimizing graphon is exactly bipodal. Once bipodality is established, uniqueness follows exactly as in the proof of Theorem 4.1. The difference between $$\Delta \tau$$ and $$n_k \epsilon^{\ell-k} \Delta \tau_k$$ is of order $$c^{3/2}$$, and so does not affect the linearization of the optimality equations at $$c=0$$. 6 Linear combinations of $$k$$-stars We proved Theorem 1.1 by first showing that $$k$$-star models have the desired behavior, and then showing that, for an arbitrary $$k$$-starlike graph $$H$$, $$\Delta \tau$$ is well-approximated by a multiple of $$\Delta \tau_k$$, so the model with densities of edges and $$H$$ behaves essentially the same as a model with densities of edges and $$k$$-stars. To prove Theorem 1.2, we consider in this section a family of models in which we can prove bipodality and uniqueness of entropy maximizers directly, as we did for $$k$$-stars. In the next section, we will show how to approximate a model with an arbitrary $$H$$ with a model in this family. Let $$h(x) = \sum_{k\ge 1} a_k x^k$$ be a polynomial with non-negative coefficients and degree $$\ge 2$$. Let $$\tau = \sum a_k \tau_k$$, and consider graphs with fixed edge density $$\epsilon$$ and fixed $$\tau$$. In [6] it was proved that the entropy-maximizing graphons in such models are always multipodal. 
Most of the analysis of $$k$$-star models carries over to positive linear combinations, and so will only be sketched briefly. We will provide complete details where the arguments differ. In analogy to the notation of the proof of Theorem 3.3, let $$\psi(\epsilon, \tilde \epsilon) = N/D$$, where $$N(\epsilon,\tilde\epsilon) = 2\left[S_0(\tilde\epsilon) - S_0(\epsilon) - (\tilde\epsilon-\epsilon)S_0'(\epsilon)\right], \qquad D(\epsilon,\tilde\epsilon) = h(\tilde\epsilon) - h(\epsilon) - (\tilde\epsilon-\epsilon)h'(\epsilon).$$ (58) Since $$h''(x)$$ is positive for $$x>0$$, $$D$$ is only zero when $$\tilde \epsilon=\epsilon$$, and we fill in that removable singularity in $$\psi$$ by defining $$\psi(\epsilon,\epsilon) = 2 S_0''(\epsilon)/h''(\epsilon)$$. Theorem 6.1. For all but finitely many values of $$\epsilon$$, there is a $$\tau_0 > h(\epsilon)$$ such that, for $$\tau \in (h(\epsilon), \tau_0)$$, the entropy-optimizing graphon is bipodal and unique, with data varying analytically with $$\epsilon$$ and $$\tau$$. As $$\tau$$ approaches $$h(\epsilon)$$ from above, $$p_{22} \to \epsilon$$, $$p_{12}$$ approaches a point $$\tilde \epsilon$$ where $$\psi'(\epsilon,\tilde \epsilon)=0$$, $$p_{11}$$ satisfies $$S_0'(p_{11})=2S_0'(p_{12}) - S_0'(p_{22})$$, and $$c \to 0$$ as $$O(\Delta \tau)$$. □ Proof For a multipodal graphon, $$\tau(g) = \sum c_i h(d_i)$$. After eliminating $$\gamma$$, the optimality equations become $$S_0'(p_{ij}) = \alpha + \beta\left(h'(d_i) + h'(d_j)\right)/2,$$ (59) $$2\sum_{j=1}^{M} c_j\left(S_0(p_{ij}) - S_0(p_{Mj})\right) = 2\alpha(d_i - d_M) + \beta\left[h(d_i) - h(d_M) + \sum_{j=1}^{M} c_j h'(d_j)(p_{ij} - p_{Mj})\right].$$ (60) As before, we distinguish between Type I clusters that are small and Type II clusters that have $$d_i \approx \epsilon$$. Summing the optimality equations over $$j$$ of Type II, and approximating $$d_j$$ by $$\epsilon$$, we obtain the equations $$S_0'(d_i) = \alpha + \beta\left(h'(d_i) + h'(\epsilon)\right)/2 + o(1),$$ (61) $$2\left[S_0(d_i) - S_0(\epsilon)\right] = 2\alpha(d_i - \epsilon) + \beta\left[h(d_i) - h(\epsilon) + h'(\epsilon)(d_i - \epsilon)\right] + o(1).$$ (62) We use the first equation, with $$i=M$$ (a Type II cluster), to solve for $$\alpha$$, and plug it into the equations for $$i<M$$ to get $$2\left(S_0'(d_i) - S_0'(\epsilon)\right) = \beta\left(h'(d_i) - h'(\epsilon)\right) + o(1),$$ (63) $$2\left[S_0(d_i) - S_0(\epsilon) - S_0'(\epsilon)(d_i - \epsilon)\right] = \beta\left[h(d_i) - h(\epsilon) - h'(\epsilon)(d_i - \epsilon)\right] + o(1).$$
(64) As before in the proof of Theorem 4.1, this implies that either $$d_i \approx \epsilon$$ or that $$\psi(\epsilon, d_i)$$ is maximized with respect to $$d_i$$. Unlike in the $$k$$-star case, it is not true that $$\psi'(\epsilon,\tilde \epsilon)=0$$ has a unique solution for each $$\epsilon$$. However, it remains true that $$\psi(\epsilon,\tilde \epsilon)$$ has a unique global maximizer (w.r.t. $$\tilde \epsilon$$) for all but finitely many values of $$\epsilon$$. Since the equations defining multiple maxima are analytic, they must be satisfied either for all $$\epsilon$$ or for only finitely many $$\epsilon$$. But it is straightforward to check that there is only one maximizer when $$\epsilon$$ is sufficiently small, since then $$h(\epsilon)$$ and $$h'(\epsilon)$$ are dominated by the lowest order term in the polynomial. Thus, for all but finitely many values of $$\epsilon$$, the values of $$d_i$$ must all either approximate $$\epsilon$$ or the unique value of $$\tilde \epsilon$$ that maximizes $$\psi(\epsilon, \tilde \epsilon)$$. This allows for a re-segregation of the clusters into Type I (with $$d_i$$ close to $$\tilde \epsilon$$) and Type II (with $$d_i$$ close to $$\epsilon$$) and yields a graphon that is approximately bipodal. Step 2 of the proof of Theorem 4.1, proving that the optimizing graphon is exactly bipodal with data of the desired form, then proceeds exactly as before. What remains is showing that the optimizing graphon is unique by linearizing the exact optimality equations for bipodal graphons near $$c=0$$. These equations are: $$\begin{aligned} S_0'(p_{11}) &= \alpha + \beta h'(d_1), \\ S_0'(p_{12}) &= \alpha + \beta\left(h'(d_1) + h'(d_2)\right)/2, \\ S_0'(p_{22}) &= \alpha + \beta h'(d_2), \\ \frac{\partial S}{\partial c} &= \alpha\frac{\partial \epsilon}{\partial c} + \beta\frac{\partial \tau}{\partial c}, \\ \epsilon &= \epsilon_0, \qquad \tau = \tau_0. \end{aligned}$$ (65) Using the second and third equations to eliminate $$\alpha$$ and $$\beta$$ gives: $$\alpha = \frac{2h'(d_2)S_0'(p_{12}) - S_0'(p_{22})\left(h'(d_2) + h'(d_1)\right)}{h'(d_2) - h'(d_1)}, \qquad \beta = \frac{2\left(S_0'(p_{22}) - S_0'(p_{12})\right)}{h'(d_2) - h'(d_1)}.$$ (66) We also have $$\alpha = S_0'(p_{22})-\beta h'(d_2)$$ and $$S_0'(p_{11}) = 2S_0'(p_{12})-S_0'(p_{22})$$.
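Note that $$\frac{\partial \alpha}{\partial p_{12}} = -\beta c\, h''(d_2) - h'(d_2)\frac{\partial \beta}{\partial p_{12}} \;\Rightarrow\; -h'(p_{22})\frac{\partial \beta}{\partial p_{12}}$$ (67) as $$c \searrow 0$$. We define $$f(p_{11},p_{12},p_{22},c)=(f_1,f_2,f_3,f_4)$$ as before, with $$f_3=\epsilon$$ and $$f_4=\tau$$, and compute $$\begin{aligned} df_3 &= \left(c^2,\, 2c(1-c),\, (1-c)^2,\, 2cp_{11} + 2(1-2c)p_{12} - 2(1-c)p_{22}\right) \;\Rightarrow\; \left(0,\, 0,\, 1,\, 2(p_{12}-p_{22})\right), \\ df_4 &= \left(c^2h'(d_1),\, c(1-c)\left(h'(d_1)+h'(d_2)\right),\, (1-c)^2h'(d_2),\, h(d_1) - h(d_2) + c\,h'(d_1)(p_{11}-p_{12}) + (1-c)\,h'(d_2)(p_{12}-p_{22})\right) \\ &\Rightarrow\; \left(0,\, 0,\, h'(p_{22}),\, h(p_{12}) - h(p_{22}) + h'(p_{22})(p_{12}-p_{22})\right). \end{aligned}$$ (68) The lower right block of $$df$$ then gives a contribution of $$h(p_{12})-h(p_{22}) + h'(p_{22}) (p_{12}-p_{22}) - 2h'(p_{22})(p_{12}-p_{22}) = h(p_{12})-h(p_{22}) - h'(p_{22})(p_{12}-p_{22})=D(p_{22},p_{12})$$. As before, $$\frac{\partial f_2}{\partial p_{11}} = 0$$ when $$c=0$$, so $$\det(df) = S_0''(p_{11})(h(p_{12})-h(p_{22}) - h'(p_{22}) (p_{12}-p_{22})) \frac{\partial f_2}{\partial p_{12}}.$$ Now $$\frac{\partial f_2}{\partial p_{12}} = \frac{\partial^2 S}{\partial c\,\partial p_{12}} - \alpha\frac{\partial^2 \epsilon}{\partial c\,\partial p_{12}} - \beta\frac{\partial^2 \tau}{\partial c\,\partial p_{12}} - \frac{\partial \alpha}{\partial p_{12}}\frac{\partial \epsilon}{\partial c} - \frac{\partial \beta}{\partial p_{12}}\frac{\partial \tau}{\partial c}.$$ (69) Since $$\alpha$$ and $$\beta$$ are independent of $$c$$, the first three terms are $$\frac{\partial}{\partial c}\left(\frac{\partial S}{\partial p_{12}} - \alpha\frac{\partial \epsilon}{\partial p_{12}} - \beta\frac{\partial \tau}{\partial p_{12}}\right) = \frac{\partial}{\partial c}(0) = 0,$$ (70) by the second equation of (65). This leaves $$\frac{\partial f_2}{\partial p_{12}} = \left(h'(p_{22})(2p_{12} - 2p_{22}) - \left(h(p_{12}) - h(p_{22}) + h'(p_{22})(p_{12}-p_{22})\right)\right)\frac{\partial \beta}{\partial p_{12}}.$$ (71) Combining with our earlier results, we have: $$\det(df) = -S_0''(p_{11})\, D(p_{22},p_{12})^2\, \frac{\partial \beta}{\partial p_{12}}.$$ (72) The expression $$D(p_{22},p_{12}) = h(p_{12})-h(p_{22}) - h'(p_{22}) (p_{12}-p_{22})$$ has a double root at $$p_{12}=p_{22}$$ and is non-zero elsewhere, thanks to the monotonicity of $$h'$$. As a last step, we consider when $$\frac{\partial \beta}{\partial p_{12}}$$ can be zero. Since $$\beta = N'/D'$$, we are interested in when $$(N'/D')'=0$$. But that is equivalent to having $$N''/D'' = N'/D'$$. Since we already have $$N/D=N'/D'$$, this means that $$\psi''=(N/D)''=0$$. Since we are looking at the value of $$\tilde \epsilon$$ that maximizes $$\psi$$, having $$\psi'=\psi''=0$$ would imply $$\psi'''=0$$ (or else $$\tilde \epsilon$$ would only be a point of inflection, and not a local maximum).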
But if $$(N/D)'=(N/D)''=(N/D)'''=0$$, then $$N/D=N'/D' = N''/D'' = N'''/D'''$$. Note that $$N''$$, $$N'''$$, $$D''$$, and $$D'''$$ are functions of $$\tilde \epsilon$$ only, and are rational functions: $$N'' = 2S_0''(\tilde\epsilon) = -\frac{1}{\tilde\epsilon} - \frac{1}{1-\tilde\epsilon}, \qquad N''' = 2S_0'''(\tilde\epsilon) = \frac{1}{\tilde\epsilon^2} - \frac{1}{(1-\tilde\epsilon)^2}, \qquad D'' = h''(\tilde\epsilon), \qquad D''' = h'''(\tilde\epsilon).$$ (73) Setting $$D''N'''=D'''N''$$ gives a polynomial equation for $$\tilde \epsilon$$, which has only finitely many roots. Since the equation $$\psi'=0$$ is symmetric in $$\epsilon$$ and $$\tilde \epsilon$$, $$\tilde \epsilon$$ determines $$\epsilon$$, so there are only finitely many values of $$\epsilon$$ for which $$\frac{\partial \beta}{\partial p_{12}}$$ is zero. In summary, we exclude the finitely many values of $$\epsilon$$ for which $$\psi$$ achieves its maximum more than once, and the finitely many values of $$\epsilon$$ for which $$\frac{\partial \beta}{\partial p_{12}}=0$$. For all other values of $$\epsilon$$, the optimizing graphon is bipodal of the prescribed form and unique. ■ 7 Proof of Theorem 1.2 The proof has three steps. Step 1. Showing that, for fixed $$\epsilon$$, $$\Delta \tau$$ can be approximated by the change in a positive linear combination of $$\tau_k$$’s. Step 2. Defining a set $$B_H \subset (0,1)$$ of “bad values,” determined by analytic equations, such that for all $$\epsilon \not \in B_H$$ and for $$\tau$$ close enough to $$\epsilon^\ell$$, the optimizing graphon is unique and bipodal and of the desired form. Step 3. Showing that $$B_H$$ is finite. Step 1. This is a repetition of the proof of Lemma 5.1. In the expansion of $$\Delta \tau$$, we get a contribution $$n_k \epsilon^{\ell-k} \Delta \tau_k$$ from diagrams where all the edges associated with $$\Delta g$$ are connected to a vertex of degree $$k$$, where $$n_k$$ is the number of vertices of $$H$$ of degree $$k$$. Summing over $$k$$, and bounding the remaining terms by $$O(\| \Delta g\|^3)$$, as before, we have $$\Delta \tau = \sum_k n_k \epsilon^{\ell-k}\Delta \tau_k + O(\Delta \tau^{3/2}).$$ (74) Step 2.
For fixed $$\epsilon$$, we consider a model whose density is $$\sum_k n_k \epsilon^{\ell-k} \tau_k$$. As long as $$\psi(\epsilon,\tilde \epsilon)$$ for this model achieves its maximum at a unique value of $$\tilde \epsilon$$, and as long as $$\partial \beta/\partial p_{12} \ne 0$$ when $$p_{12}$$ equals this value of $$\tilde\epsilon$$, the proofs of Theorems 1.1 and 6.1 carry over almost verbatim. That is, the model problem has a unique bipodal maximizer by the reasoning of Theorem 6.1. The entropy maximizer for the actual problem involving $$H$$ must approximate the entropy maximizer for the model problem, and in particular must be approximately bipodal, and so can be written as $$g_b + \Delta g_f$$, where $$\Delta g_f$$ averages to zero on each quadrant. The same arguments as in the proof of Theorem 1.1 show that $$\Delta g_f$$ is pointwise small. By a power series expansion, $$\frac{s(g_b + \Delta g_f)-s(g_b)}{\tau(g_b + \Delta g_f)-\tau(g_b)}<\beta$$, so for small $$c$$ we can increase the entropy by setting $$\Delta g_f$$ to zero and varying the bipodal data to achieve the correct value of $$\tau$$. Step 3. For any fixed choice of the coefficients $$a_k$$, the model problem has only a finite number of bad values of $$\epsilon$$, but this is not enough to prove that $$B_H$$ is finite, because here the coefficients themselves depend on $$\epsilon$$. Rather $$B_H = \{\epsilon \mid \epsilon \text{ is one of the bad points for the model with } a_k = n_k \epsilon^{\ell-k}\},$$ (75) where a value of $$\epsilon$$ is bad for a model if either $$\psi$$ has multiple maxima or if $$\partial \beta/\partial p_{12}=0$$. Since the bad points for any linear combination of $$k$$-stars depend analytically on the coefficients of that linear combination, and since these coefficients are powers of $$\epsilon$$, the set $$B_H$$ is cut out by analytic equations in $$\epsilon$$. As such, $$B_H$$ is either the entire interval $$(0,1)$$, or a finite set, or a countable set with limit points only at 0 and/or 1. We will show that neither $$0$$ nor $$1$$ is a limit point of $$B_H$$, implying that $$B_H$$ is finite.
Let $$k_{\rm max}$$ be the largest degree of any vertex in $$H$$, and consider the model problem with $$h(x) = \sum_{k=2}^{k_{\rm max}} a_k x^k$$, where $$a_k = n_k \epsilon^{\ell - k}$$. We begin with some constraints on the values of $$\tilde \epsilon$$ for which $$\psi'=0$$. Lemma 7.1. Suppose that $$\psi'(\epsilon,\tilde \epsilon)=0$$. If $$\tilde \epsilon=\epsilon$$, or if $$\partial \beta/\partial p_{12}=0$$ when $$p_{22}=\epsilon$$ and $$p_{12}=\tilde \epsilon$$, then $$({1}/{2}) \le \tilde \epsilon \le ({k_{\rm max}-1})/{k_{\rm max}}$$. □ Proof of Lemma In both cases we are looking for solutions to $$N'' D'''=N''' D''$$. Since $$N'' = 2 S_0''(\tilde \epsilon)$$, $$N''' = 2 S_0'''(\tilde \epsilon)$$, $$D'' = h''(\tilde \epsilon)$$, and $$D'''=h'''(\tilde \epsilon)$$, this equation does not involve $$\epsilon$$ (except insofar as the coefficients of $$h$$ depend on $$\epsilon$$). We have $$\begin{aligned} \frac{2S_0'''(\tilde\epsilon)}{2S_0''(\tilde\epsilon)} &= \frac{h'''(\tilde\epsilon)}{h''(\tilde\epsilon)}, \\ \frac{1}{1-\tilde\epsilon} - \frac{1}{\tilde\epsilon} &= \frac{h'''(\tilde\epsilon)}{h''(\tilde\epsilon)}, \\ \frac{2\tilde\epsilon - 1}{1-\tilde\epsilon} &= \frac{\tilde\epsilon\, h'''(\tilde\epsilon)}{h''(\tilde\epsilon)}, \\ \frac{1}{1-\tilde\epsilon} - 2 &= \frac{\sum k(k-1)(k-2)\,a_k \tilde\epsilon^{k-2}}{\sum k(k-1)\,a_k \tilde\epsilon^{k-2}}. \end{aligned}$$ (76) The right-hand side of the last line is a weighted average of $$k-2$$ with weights $$k(k-1) a_k \tilde \epsilon^{k-2}$$, and so is at least zero and at most $$k_{\max}-2$$. Thus $$(1-\tilde \epsilon)^{-1}$$ is between 2 and $$k_{\max}$$ and $$\tilde \epsilon$$ is between $$1/2$$ and $$({k_{\max}-1})/{k_{\max}}$$. ■ Lemma 7.2. If $$\psi'(\epsilon, \tilde \epsilon)=0$$, and if $$\epsilon$$ is sufficiently close to 1, then $$\tilde \epsilon$$ is uniquely defined and approaches 0 as $$\epsilon \to 1$$. Likewise, if $$\epsilon$$ is sufficiently close to 0, then $$\tilde \epsilon$$ is uniquely defined and approaches 1 as $$\epsilon \to 0$$. □ Proof When $$\epsilon < 1/2$$, or when $$\epsilon > ({k_{\max}-1})/{k_{\max}}$$, we cannot have $$\tilde \epsilon = \epsilon$$, so the equation $$\psi'=0$$ is equivalent to $$ND'=DN'$$ and $$\tilde \epsilon \ne \epsilon$$.
Writing $$DN'-ND'=0$$ explicitly, and doing some simple algebra, yields the equation $$S_0'(\epsilon)\left[h(\tilde\epsilon) - h(\epsilon) - (\tilde\epsilon-\epsilon)h'(\tilde\epsilon)\right] - S_0'(\tilde\epsilon)\left[h(\tilde\epsilon) - h(\epsilon) - (\tilde\epsilon-\epsilon)h'(\epsilon)\right] + \left(S_0(\tilde\epsilon) - S_0(\epsilon)\right)\left(h'(\tilde\epsilon) - h'(\epsilon)\right) = 0.$$ (77) If $$\epsilon$$ approaches 0 or 1 and $$\tilde \epsilon$$ does not, then the first term diverges, while the other terms do not, insofar as $$S_0'$$ has singularities at 0 and 1 but $$S_0$$, $$h$$, and $$h'$$ do not. Thus $$\tilde \epsilon$$ must go to 0 or 1 as $$\epsilon$$ goes to 0 or 1. We next rule out the possibility that both $$\epsilon$$ and $$\tilde \epsilon$$ approach 1. Suppose that $$\epsilon$$ is close to 1. We expand both $$N$$ and $$D$$ in powers of $$(\tilde \epsilon - \epsilon)$$: $$N = \sum_{m=2}^{\infty} \frac{2S_0^{(m)}(\epsilon)}{m!}(\tilde\epsilon-\epsilon)^m = -\sum_{m=2}^{\infty}\left(\frac{1}{(1-\epsilon)^{m-1}} + \frac{(-1)^m}{\epsilon^{m-1}}\right)\frac{(\tilde\epsilon-\epsilon)^m}{m(m-1)}, \qquad D = \sum_{m=2}^{k_{\max}} \frac{h^{(m)}(\epsilon)}{m!}(\tilde\epsilon-\epsilon)^m,$$ (78) where $$S_0^{(m)}$$ and $$h^{(m)}$$ denote $$m$$th derivatives. The coefficients of the numerator grow rapidly with $$m$$, while the growth of the coefficients of the denominator depends only on the degree of $$h$$. For $$\tilde \epsilon > \epsilon > (k_{\max} - 1)/{k_{\max}}$$, $$\psi = N/D$$ is a decreasing function of $$\tilde \epsilon$$ (i.e., negative and increasing in magnitude), so we cannot have $$\psi'=0$$. Since the equation $$\psi'=0$$ is symmetric in $$\epsilon$$ and $$\tilde \epsilon$$ (apart from the dependence of the coefficients of $$h$$ on $$\epsilon$$), we also cannot have $$\epsilon > \tilde \epsilon > (k_{\max}-1)/{k_{\max}}$$. When $$\epsilon$$ is close to 1, we must thus have $$\tilde \epsilon$$ close to 0. But then $$N \approx 2 S_0'(\epsilon)$$, $$D \approx h'(\epsilon)-h(\epsilon)$$, $$D' \approx - h'(\epsilon)$$, and the equation $$2S_0'(\tilde\epsilon) = N' + 2S_0'(\epsilon) = 2S_0'(\epsilon) + ND'/D$$ (79) determines $$S_0'(\tilde \epsilon)$$, and therefore $$\tilde \epsilon$$, uniquely as a function of $$\epsilon$$. Next we consider $$\epsilon \to 0$$. If $$H$$ is 2-starlike, then $$\psi$$ is a multiple of $$\psi_2$$, and the result is already known.
Otherwise, it is convenient to define a new polynomial $$\bar h(z) = \sum n_k z^k$$, so that $$h(x) = \epsilon^\ell \bar h(x/\epsilon)$$. Then

$$D = h(\tilde \epsilon)-h(\epsilon)-h'(\epsilon)(\tilde \epsilon-\epsilon) = \epsilon^\ell\left[\bar h(r)-\bar h(1)-\bar h'(1)(r-1)\right], \tag{80}$$

where $$r := \tilde \epsilon/\epsilon$$. Likewise,

$$N = -\left[\tilde \epsilon\ln(\tilde \epsilon)-\epsilon\ln(\epsilon)+(1-\tilde \epsilon)\ln(1-\tilde \epsilon)-(1-\epsilon)\ln(1-\epsilon)-(\tilde \epsilon-\epsilon)\left(\ln(\epsilon)-\ln(1-\epsilon)\right)\right]. \tag{81}$$

Since $$\epsilon$$ and $$\tilde \epsilon$$ are small, we can approximate $$\ln(1-\epsilon)$$ and $$\ln(1-\tilde \epsilon)$$ as $$-\epsilon$$ and $$-\tilde \epsilon$$, respectively, giving

$$N \approx -\epsilon\left[r\ln r-r+1\right]+\epsilon^2(r-r^2). \tag{82}$$

The ratio $$\psi = N/D$$ is negative. Since $$\bar h$$ is a polynomial of degree at least 3, $$D$$ grows faster than $$N$$ as $$r \to \infty$$, so we can always increase $$\psi$$ by taking larger and larger values of $$r = \tilde \epsilon/\epsilon$$. This argument only breaks down when the approximation $$\ln(1-\tilde \epsilon) \approx -\tilde \epsilon$$ breaks down, that is, at values of $$\tilde \epsilon$$ that are no longer close to 0. Thus we cannot have $$\tilde \epsilon$$ and $$\epsilon$$ both close to zero.

Finally, if $$\epsilon$$ is close to 0 and $$\tilde \epsilon$$ is close to 1, then $$h(\epsilon)$$ and $$h'(\epsilon)$$ are close to zero, while $$h(\tilde \epsilon)$$ is close to a multiple of $$\tilde \epsilon^{k_{\max}}$$, since the coefficient of $$x^{k_{\max}}$$ is $$O(1/\epsilon)$$ larger than any other coefficient. Thus $$\psi$$ behaves like $$\psi_{k_{\max}}$$, and has a unique maximizer. ■

We have shown that when $$\epsilon$$ is close to 0 or 1, $$\psi$$ has a unique maximizer. Furthermore, $$\tilde \epsilon$$ is not between $$1/2$$ and $$({k_{\max}-1})/{k_{\max}}$$, so $$\partial \beta/\partial p_{12} \ne 0$$. So $$\epsilon \not \in B_H$$, completing Step 3 and the proof of Theorem 1.2.
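The function $$\psi = N/D$$ of this section is easy to explore numerically. The following is a minimal sketch, not the authors' method: it assumes that $$n_k$$ denotes the number of vertices of $$H$$ of degree $$k$$ (the definition precedes this section), and the helper names `make_h`, `psi`, and `argmax_psi` are hypothetical. It locates the maximizer over $$\tilde \epsilon$$ by a plain grid search.

```python
import math


def S0(w):
    """S_0(w) = -(1/2)[w ln w + (1 - w) ln(1 - w)] for 0 < w < 1."""
    return -0.5 * (w * math.log(w) + (1.0 - w) * math.log(1.0 - w))


def S0_prime(w):
    """First derivative of S_0."""
    return 0.5 * math.log((1.0 - w) / w)


def S0_second(w):
    """Second derivative of S_0."""
    return -0.5 / (w * (1.0 - w))


def make_h(eps, ell, n):
    """Return (h, h', h'') for h(x) = sum_k n_k eps^(ell-k) x^k.

    n maps each degree k >= 2 to n_k (assumed: number of vertices of degree k).
    """
    coeffs = {k: nk * eps ** (ell - k) for k, nk in n.items()}

    def h(x):
        return sum(a * x ** k for k, a in coeffs.items())

    def dh(x):
        return sum(a * k * x ** (k - 1) for k, a in coeffs.items())

    def d2h(x):
        return sum(a * k * (k - 1) * x ** (k - 2) for k, a in coeffs.items())

    return h, dh, d2h


def psi(eps, eps_t, ell, n):
    """psi = N / D, with the removable singularity at eps_t = eps filled in."""
    h, dh, d2h = make_h(eps, ell, n)
    if abs(eps_t - eps) < 1e-9:
        return 2.0 * S0_second(eps) / d2h(eps)
    N = 2.0 * (S0(eps_t) - S0(eps) - S0_prime(eps) * (eps_t - eps))
    D = h(eps_t) - h(eps) - dh(eps) * (eps_t - eps)
    return N / D


def argmax_psi(eps, ell, n, steps=9999):
    """Grid-search the value of eps_t in (0, 1) that maximizes psi."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda t: psi(eps, t, ell, n))


# Triangle: three vertices of degree 2 and ell = 3 edges, so H is 2-starlike
# and psi is a positive multiple of psi_2.
eps_tilde = argmax_psi(0.3, 3, {2: 3})
```

For a 2-starlike $$H$$ such as the triangle, the search should return a value close to $$1-\epsilon$$, in line with $$\zeta_2(\epsilon)=1-\epsilon$$. For graphs with mixed degrees the same routine can be used to look at how the maximizer moves with $$\epsilon$$, although a grid search says nothing about uniqueness.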
8 Conclusions We have shown that just above the ER curve, entropy maximizing graphons, constrained by the densities of edges and any one other subgraph $$H$$, exhibit the same qualitative behavior for all $$H$$ and for (almost) all values of $$\epsilon$$. The optimizing graphon is unique and bipodal. These results were proven by perturbation theory, using the fact that the optimizing graphon has to be $$L^2$$-close to a constant (Erdős–Rényi) graphon. Surprisingly, the optimizing graphon is not pointwise close to constant. Rather, it is bipodal, with a small cluster of size $$O(\Delta \tau)$$. As $$\Delta \tau$$ approaches 0, the size of the small cluster shrinks, but the values of the graphon on each quadrant do not approach one another. Rather, $$p_{22}$$ approaches $$\epsilon$$, $$p_{12}$$ approaches the value of $$\tilde \epsilon$$ that maximizes a specific function $$\psi(\epsilon, \tilde \epsilon)$$, and $$p_{11}$$ satisfies $$S_0'(p_{11}) - 2 S_0'(p_{12}) + S_0'(p_{22})=0$$. Finally, the asymptotic behavior of these graphons as $$\tau \to \epsilon^\ell$$ depends only on the degree sequence of $$H$$. In particular, the cases where $$H$$ is a triangle and when $$H$$ is a 2-star are asymptotically the same. This is illustrated in Figure 2. Since $$\Delta \tau_{\text{triangle}} \approx 3 \epsilon \Delta \tau_2$$, the optimizing graphon for the 2-star model with $$\epsilon = 0.4$$ and $$\Delta \tau_2=0.002$$ should resemble the optimizing graphon for the triangle model with $$\epsilon=0.4$$ and $$\Delta \tau_{\text{triangle}}=0.0024$$. These optimizing graphons are obtained using the algorithms we developed in [14] without assuming bipodality. Numerical estimates indicate that the optimizing graphons are not exactly the same, thanks to $$O(\Delta \tau_2^{3/2})$$ corrections to $$\Delta \tau_{\text{triangle}}$$, but are still qualitatively similar. Fig. 2. 
Numerical estimates of the optimizing graphon for the 2-star model with $$\epsilon=0.4$$ and $$\tau_2=0.1620$$ (left) and the optimizing graphon for the triangle model with $$\epsilon=0.4$$ and $$\tau_{\text{triangle}}=0.0664$$ (right). (Although theoretically we have not tried to prove that $$\Delta\tau_2 = 0.002$$ is small enough to fit into the interval provided by Theorem 1.1, numerically it appears to be the case.)

Funding

This work was supported by the Simons Foundation [grant 327929 to R.K.] and National Science Foundation (NSF) [grants DMS-1208191, DMS-1509088, DMS-1321018, and DMS-1101326].

Acknowledgments

The computational results shown in this work were obtained on the computational facilities in the Texas Super Computing Center (TACC). We gratefully acknowledge this computational support.

Conflict of Interest

I (Richard Kenyon) am an associate editor of IMRN.

Appendix: Proof of Theorem 3.3

Proof. Fix $$k \ge 2$$ and let

$$\begin{aligned} N(\epsilon,\tilde \epsilon) &= 2\left[S_0(\tilde \epsilon)-S_0(\epsilon)-S_0'(\epsilon)(\tilde \epsilon-\epsilon)\right], \\ D(\epsilon,\tilde \epsilon) &= \tilde \epsilon^k-\epsilon^k-k\epsilon^{k-1}(\tilde \epsilon-\epsilon) \end{aligned} \tag{A1}$$

be the numerator and denominator of the function $$\psi_k(\epsilon, \tilde \epsilon) = N/D$$. These definitions make sense for all real values of $$k$$, not just for integers. When taking derivatives of $$N$$, $$D$$, and $$\psi$$, we will denote a derivative with respect to the first variable by a dot, and a derivative with respect to the second variable by $${}'$$.
That is, $$D'(\epsilon,\tilde \epsilon) =\partial D/\partial \tilde \epsilon$$ and $$\dot D(\epsilon, \tilde \epsilon) = \partial D/\partial \epsilon$$. As noted earlier, this definition of $$\psi_k$$ has a removable singularity at $$\tilde \epsilon = \epsilon$$, which we fill in by defining

$$\psi_k(\epsilon,\epsilon) = N''(\epsilon,\epsilon)/D''(\epsilon,\epsilon) = 2S_0''(\epsilon)/\left[k(k-1)\epsilon^{k-2}\right]. \tag{A2}$$

The denominator $$D$$ vanishes only at $$\tilde \epsilon = \epsilon$$. Some useful explicit derivatives are:

$$\begin{aligned} N' &= 2\left[S_0'(\tilde \epsilon)-S_0'(\epsilon)\right], & N'' &= 2S_0''(\tilde \epsilon) = -\frac{1}{\tilde \epsilon(1-\tilde \epsilon)}, \\ \dot N &= -2S_0''(\epsilon)(\tilde \epsilon-\epsilon), & \dot N' &= -2S_0''(\epsilon), \\ D' &= k\left[\tilde \epsilon^{k-1}-\epsilon^{k-1}\right], & D'' &= k(k-1)\tilde \epsilon^{k-2}, \\ \dot D &= -k(k-1)\epsilon^{k-2}(\tilde \epsilon-\epsilon), & \dot D' &= -k(k-1)\epsilon^{k-2}. \end{aligned} \tag{A3}$$

Note that $$D$$ and $$N$$ both vanish when $$\tilde \epsilon = \epsilon$$, so we can write

$$N(\epsilon,\tilde \epsilon) = \int_{\epsilon}^{\tilde \epsilon} N'(\epsilon,x)\,{\rm d}x = \int_{\tilde \epsilon}^{\epsilon} \dot N(x,\tilde \epsilon)\,{\rm d}x, \tag{A4}$$

and similarly for $$D(\epsilon, \tilde \epsilon)$$. We proceed in steps:

Step 1. Analyzing $$\psi$$ near $$\tilde \epsilon = \epsilon$$ to see that $$\psi_k'(\epsilon,\epsilon) = 0$$ only when $$\epsilon = (k-1)/k$$.

Step 2. Showing that we can never have $$\psi_k'=\psi_k''=0$$.

Step 3. Showing that the equation $$\psi_k'(\epsilon, \tilde \epsilon)=0$$ is symmetric in $$\epsilon$$ and $$\tilde \epsilon$$, implying that $$\zeta_k$$ is an involution.

Step 4. Showing that $$\psi_k$$ has a unique critical point.

Step 5. Showing that $${\rm d} \zeta_k/{\rm d}\epsilon$$ is never zero.

Step 6. Showing that $$\psi_k(\epsilon, \zeta_k(\epsilon)) > \max(\psi_k(\epsilon,\epsilon), \psi_k(\zeta_k(\epsilon), \zeta_k(\epsilon)))$$.

The following calculus fact will be used repeatedly. When $$D \ne 0$$, $$\psi_k'=0$$ is equivalent to $${N}/{D} = {N'}/{D'}$$, and $$\psi_k'=\psi_k''=0$$ is equivalent to $${N}/{D} = {N'}/{D'} ={N''}/{D''}$$. This follows from the quotient rule:

$$\psi' = \frac{DN'-ND'}{D^2}, \qquad \psi'' = \frac{DN''-ND''}{D^2}-\frac{2D'(DN'-ND')}{D^3}. \tag{A5}$$

Step 1.
Since $$N$$ and $$D$$ have double roots at $$\tilde \epsilon = \epsilon$$, we can do a Taylor series for both of them near $$\tilde \epsilon = \epsilon$$:

$$\psi_k(\epsilon,\tilde \epsilon) = \frac{N''(\epsilon,\epsilon)(\tilde \epsilon-\epsilon)^2/2+N'''(\epsilon,\epsilon)(\tilde \epsilon-\epsilon)^3/6+\cdots}{D''(\epsilon,\epsilon)(\tilde \epsilon-\epsilon)^2/2+D'''(\epsilon,\epsilon)(\tilde \epsilon-\epsilon)^3/6+\cdots} = \frac{N''(\epsilon,\epsilon)+N'''(\epsilon,\epsilon)(\tilde \epsilon-\epsilon)/3+\cdots}{D''(\epsilon,\epsilon)+D'''(\epsilon,\epsilon)(\tilde \epsilon-\epsilon)/3+\cdots}. \tag{A6}$$

$$\psi_k'(\epsilon,\epsilon)=0$$ is then equivalent to

$$\begin{aligned} N''(\epsilon,\epsilon)D'''(\epsilon,\epsilon) &= N'''(\epsilon,\epsilon)D''(\epsilon,\epsilon), \\ -\frac{k(k-1)(k-2)\epsilon^{k-3}}{\epsilon(1-\epsilon)} &= \frac{k(k-1)\epsilon^{k-2}(1-2\epsilon)}{\epsilon^2(1-\epsilon)^2}, \\ (k-2)(1-\epsilon) &= 2\epsilon-1, \\ k\epsilon &= k-1. \end{aligned} \tag{A7}$$

Step 2. If $$\psi_k'=\psi_k''=0$$, then we must have $$N'D''=D'N''$$ and $$ND''=DN''$$. We will explore these in turn. We write

$$0 = N'D''-D'N'' = \int_{\tilde \epsilon}^{\epsilon} \left[D''(\epsilon,\tilde \epsilon)\dot N'(x,\tilde \epsilon)-N''(\epsilon,\tilde \epsilon)\dot D'(x,\tilde \epsilon)\right]{\rm d}x. \tag{A8}$$

Explicitly, this becomes

$$0 = \int_{\tilde \epsilon}^{\epsilon} \frac{k(k-1)}{\tilde \epsilon(1-\tilde \epsilon)x(1-x)}\left[\tilde \epsilon^{k-1}(1-\tilde \epsilon)-x^{k-1}(1-x)\right]{\rm d}x. \tag{A9}$$

The function $$x^{k-1}(1-x)$$ has a single maximum at $$x=(k-1)/k$$. If both $$\epsilon$$ and $$\tilde \epsilon$$ are on the same side of this maximum, then the integrand will have the same sign for all $$x$$ between $$\tilde \epsilon$$ and $$\epsilon$$, and the integral will not be zero. Thus we must have $$\epsilon < (k-1)/k < \tilde \epsilon$$, or vice-versa, and we must have $$\epsilon^{k-1}(1-\epsilon) < \tilde \epsilon^{k-1}(1-\tilde \epsilon)$$. In this case the integrand changes sign exactly once.

Now we apply the same sort of analysis to the other equation:

$$0 = ND''-DN'' = \int_{\tilde \epsilon}^{\epsilon} \left[D''(x,\tilde \epsilon)\dot N(x,\tilde \epsilon)-N''(x,\tilde \epsilon)\dot D(x,\tilde \epsilon)\right]{\rm d}x. \tag{A10}$$

Explicitly, this becomes

$$0 = \int_{\tilde \epsilon}^{\epsilon} \frac{k(k-1)}{\tilde \epsilon(1-\tilde \epsilon)x(1-x)}\left[\tilde \epsilon^{k-1}(1-\tilde \epsilon)-x^{k-1}(1-x)\right](\tilde \epsilon-x)\,{\rm d}x. \tag{A11}$$

This is the same integral as before, only with an extra factor of $$(\tilde \epsilon - x)$$. If we view the first integral (A9) as a mass distribution (with total mass zero), then the second integral is (minus) the first moment of this mass distribution relative to the endpoint $$\tilde \epsilon$$. But we have already seen that the distribution changes sign exactly once, and so must have a non-zero first moment. This is a contradiction.

Step 3. If $$ND'=DN'$$, then $$N/D = N'/D'$$. Call this common ratio $$r$$. Then

$$N = rD \quad \text{and} \quad N' = rD'. \tag{A12}$$

Note that $$N'$$ and $$D'$$ are odd under interchange of $$\epsilon$$ and $$\tilde \epsilon$$, so the second equation is invariant under this interchange. Furthermore, we have $$(\tilde \epsilon-\epsilon)N' -N = r [ (\tilde \epsilon - \epsilon)D' - D]$$. However, $$(\tilde \epsilon - \epsilon)N' - N$$ is the same as $$N$$ with the roles of $$\epsilon$$ and $$\tilde \epsilon$$ reversed, while $$(\tilde \epsilon - \epsilon)D' - D$$ is the same as $$D$$ with the roles of $$\epsilon$$ and $$\tilde \epsilon$$ reversed. Thus the two equations are satisfied for $$(\epsilon, \tilde \epsilon)$$ if and only if they are satisfied for $$(\tilde \epsilon, \epsilon)$$.

Step 4. For $$k=2$$ we explicitly compute that $$\psi_2'=0$$ only at $$\tilde \epsilon = 1-\epsilon$$. If $$k_{\rm min}$$ is the infimum of all values of $$k$$ for which $$\psi_k$$ has multiple critical points, then at a critical point of $$\psi_{k_{\rm min}}$$ we must have $$\psi_k'=\psi_k''=0$$, which is a contradiction. Thus $$k_{\rm min}$$ does not exist, and $$\psi_k$$ has a unique critical point for all $$k \ge 2$$. In particular, $$\zeta_k$$ is a well-defined function.

Step 5. The function $$\zeta_k$$ is defined by the condition that $$D N' - N D' = 0$$ (and $$\tilde \epsilon \ne \epsilon$$, except when $$\epsilon = (k-1)/k$$). Let $$f(\epsilon, \tilde \epsilon) = DN' - ND' = D^2 \psi'$$. Moving along the curve $$\tilde \epsilon = \zeta_k(\epsilon)$$ (i.e., $$f=0$$), we differentiate implicitly:

$$0 = {\rm d}f = \dot f\,{\rm d}\epsilon + f'\,{\rm d}\tilde \epsilon, \tag{A13}$$

so

$$\frac{{\rm d}\tilde \epsilon}{{\rm d}\epsilon} = -\frac{\dot f}{f'}. \tag{A14}$$

We compute $$f' = D N'' - N D''.$$ This is non-zero by Step 2. We also have

$$\begin{aligned} \dot f &= D\dot N'-\dot N D'+\dot D N'-N\dot D' \\ &= -2S_0''(\epsilon)\left(D-(\tilde \epsilon-\epsilon)D'\right)+k(k-1)\epsilon^{k-2}\left(N-(\tilde \epsilon-\epsilon)N'\right) \\ &= 2S_0''(\epsilon)\left[\epsilon^k-\tilde \epsilon^k+k(\tilde \epsilon-\epsilon)\tilde \epsilon^{k-1}\right]-2k(k-1)\epsilon^{k-2}\left[S_0(\epsilon)-S_0(\tilde \epsilon)+(\tilde \epsilon-\epsilon)S_0'(\tilde \epsilon)\right] \\ &= D(\tilde \epsilon,\epsilon)N''(\tilde \epsilon,\epsilon)-N(\tilde \epsilon,\epsilon)D''(\tilde \epsilon,\epsilon). \end{aligned} \tag{A15}$$

That is, $$\dot f$$ is the same as $$f'$$, only with the roles of $$\epsilon$$ and $$\tilde \epsilon$$ reversed.
Since the equation $$f=0$$ is symmetric in $$\epsilon$$ and $$\tilde \epsilon$$, the argument of Step 2 can be repeated to show that $$\dot f \ne 0$$. Since $$d\tilde \epsilon/d\epsilon$$ is never zero, and since $$d \tilde \epsilon/d\epsilon=-1$$ at the fixed point (by symmetry), $$\zeta_k'(\epsilon) = d\tilde \epsilon/d\epsilon$$ must always be negative.

Step 6. Since $$\psi_k(\epsilon, \tilde \epsilon)$$ has a single critical point (with respect to $$\tilde \epsilon$$, for fixed $$k$$ and $$\epsilon$$), this critical point must either always be a local maximum or a local minimum, and hence a global maximum or minimum, and the answer must be the same for all $$k$$ and all $$\epsilon$$. By checking a single case (e.g., $$k=2$$ and $$\epsilon$$ approaching 0) it is easy to see that it is a maximum. Thus $$\psi_k(\epsilon, \zeta_k(\epsilon))> \psi_k(\epsilon, \epsilon)$$ for all $$\epsilon \ne (k-1)/k$$. Since the equations for a critical point are symmetric with respect to interchange of $$\epsilon$$ and $$\tilde \epsilon$$, $$\epsilon = \zeta_k(\tilde \epsilon)$$ also gives the unique critical point of $$\psi_k(\epsilon, \tilde \epsilon)$$ with respect to $$\epsilon$$. By considering the limit of $$\psi_k(\epsilon, \tilde \epsilon)$$ as $$\epsilon \to 0$$ or $$\epsilon \to 1$$, it is clear that this critical point is a maximum. Since $$\zeta_k(\zeta_k(\epsilon))=\epsilon$$, this implies that $$\psi_k(\epsilon, \zeta_k(\epsilon))> \psi_k(\zeta_k(\epsilon), \zeta_k(\epsilon))$$. ■

References

[1] Aristoff D. and Radin C. “Emergent structures in large networks.” Journal of Applied Probability 50 (2013): 883– 4.

[2] Borgs C., Chayes J., and Lovász L. “Moments of two-variable functions and the uniqueness of graph limits.” Geometry and Functional Analysis 19 (2010): 1597– 4.

[3] Borgs C., Chayes J., Lovász L., Sós V.T., and Vesztergombi K. “Convergent graph sequences I: subgraph frequencies, metric properties, and testing.” Advances in Mathematics 219 (2008): 1801– 4.

[4] Chatterjee S. and Diaconis P. “Estimating and understanding exponential random graph models.” Annals of Statistics 41 (2013): 2428– 4.

[5] Chatterjee S. and Varadhan S. R. S. “The large deviation principle for the Erdős-Rényi random graph.” European Journal of Combinatorics 32 (2011): 1000– 4.

[6] Kenyon R., Radin C., Ren K., and Sadun L. “Multipodal structures and phase transitions in large constrained graphs.” arXiv:1405.0599, (2014).

[7] Lovász L. and Szegedy B. “Limits of dense graph sequences.” Journal of Combinatorial Theory Series B 98 (2006): 933– 4.

[8] Lovász L. and Szegedy B. “Szemerédi’s lemma for the analyst.” Geometry and Functional Analysis 17 (2007): 252– 4.

[9] Lovász L. and Szegedy B. “Finitely forcible graphons.” Journal of Combinatorial Theory Series B 101 (2011): 269– 4.

[10] Lovász L. Large Networks and Graph Limits. Providence: American Mathematical Society, 2012.

[11] Lubetzky E. and Zhao Y. “On replica symmetry of large deviations in random graphs.” Random Structures and Algorithms 47 (2015): 109– 4.

[12] Pikhurko O. and Razborov A. “Asymptotic structure of graphs with the minimum number of triangles.” Combinatorics, Probability and Computing (2016): 1– 23. ISSN 0963-5483 (In Press).

[13] Radin C. and Yin M. “Phase transitions in exponential random graphs.” Annals of Applied Probability 23 (2013): 2458– 4.

[14] Radin C., Ren K., and Sadun L. “The asymptotics of large constrained graphs.” Journal of Physics A: Mathematical and Theoretical 47 (2014): 175001.

[15] Radin C. and Sadun L. “Phase transitions in a complex network.” Journal of Physics A: Mathematical and Theoretical 46 (2013): 305002.

[16] Radin C. and Sadun L. “Singularities in the entropy of asymptotically large simple graphs.” Journal of Statistical Physics 158 (2015): 853– 4.

[17] Razborov A. “On the minimal density of triangles in graphs.” Combinatorics, Probability and Computing 17 (2008): 603– 4.

[18] Turán P. “On an extremal problem in graph theory, (in Hungarian).” Matematikai és Fizikai Lapok 48 (1941): 436– 4.

© The Author(s) 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permission@oup.com.

International Mathematics Research Notices, Oxford University Press

# Bipodal Structure in Oversaturated Random Graphs

International Mathematics Research Notices, Volume 2018 (4) – Feb 1, 2018
36 pages

Publisher
Oxford University Press
ISSN
1073-7928
eISSN
1687-0247
D.O.I.
10.1093/imrn/rnw261

### Abstract

We study the asymptotics of large simple graphs directly constrained by the limiting subgraph densities of edges and of an arbitrary fixed graph $$H$$. We prove that, for all but finitely many values of the edge density, if the density of $$H$$ is constrained to be slightly higher than that for the corresponding Erdős–Rényi graph, the typical large graph is bipodal with parameters varying analytically with the densities. Asymptotically, the parameters depend only on the degree sequence of $$H$$.

1 Introduction

We study the asymptotics of large, simple, labeled graphs directly constrained to have subgraph densities $$\epsilon$$ of edges, and $$\tau$$ of some fixed subgraph $$H$$ with $$\ell \ge 2$$ edges. To study the asymptotics we use the graphon formalism of Borgs et al. [2, 3], Lovász et al. [7–9] and the large deviations theorem of Chatterjee and Varadhan [5], from which one can reduce the analysis to the study of the graphons which maximize the entropy subject to the density constraints [6, 14–16]. See definitions in Section 2.

The phase space (parameter space) is the subset of $$[0,1]^2$$ consisting of accumulation points of all pairs of densities $${\bar\tau}=(\epsilon,\tau)$$ achievable by finite graphs. (See Figure 1 for the model where $$H$$ is a triangle.) Within the phase space is the “Erdős–Rényi curve” (ER curve) $$\{(\epsilon,\tau)~|~\tau=\epsilon^\ell\}$$, attained when edges are chosen independently. In this paper, we study the typical behavior of large graphs for $$\bar \tau$$ just above the ER curve. We will show that the qualitative behavior of such graphs is the same for all choices of $$H$$ and for all but finitely many choices of $$\epsilon$$ depending on $$H$$.

Fig. 1. Boundary of the phase space for the edge/triangle model in solid lines, see [10]. On the right, the ER curve is shown with dashes.

To be precise, we show that for fixed $$H$$, for $$\epsilon$$ outside a finite set, and for $$\tau$$ close enough to $$\epsilon^\ell$$, there is a unique entropy-maximizing graphon (up to measure-preserving transformations of the unit interval); furthermore it is bipodal and depends analytically on $$(\epsilon,\tau)$$, implying that the entropy is an analytic function of $$(\epsilon,\tau)$$. In particular we prove the existence of one or more well-defined phases just above the ER curve. This is the first proof, as far as we know, of the existence of a phase in any constrained-density graphon model, where by phase we mean a (maximal) open set in the phase space at each point of which the entropy has a unique graphon maximizer, which varies analytically with the constraint parameters. (Conjecturally, the phase space is made up of a union of phases and a subset of lower dimension, the latter providing boundaries for the phases [14].)

The unique maximizers provide an embedding of each phase into the metric space of reduced graphons. Variation of constraint values in the phase space is therefore mirrored by this embedding into variation in the space of graphons. This has the consequence that smoothness or singularity under variation can be interpreted among the graphons, which are thought of as the emergent states of the large graphs. In contrast, in exponential random graph models (see, e.g., [1, 4, 11, 13]) the parameters, which are associated with graphons by optimization of free energy rather than entropy, play a fundamentally different role; different parameter values can be associated with the same optimal graphon. For an extreme example, the whole two-dimensional parameter space for edge/2-star constraints is mapped in this way into the one-dimensional set of Erdős–Rényi graphons [4].
Clearly, smoothness or singularity under variation of parameter values in such models is more naturally interpreted as a feature of the model, as in [4], rather than as a feature of states of large constrained graphs. For further analysis of this see the discussion in the Conclusion in [6].

The study of constrained graphs in the sense we are considering was initiated by Turán in 1941 [18], addressing in particular the case of edge and triangle constraints. The extremal graph theory of these constraints was recently completed by Razborov et al., in [12, 17], which also contain a good history of this problem. Partial results describing the entropy maximizing graphons in the interior of that phase space were then obtained in [14–16].

For the edge-$$k$$-star model, we proved multipodality of all entropy optimizers in [6]. This will be an important tool in this paper. It is also important for the heuristics as it provides a simple interpretation of the emergence of the large scale state of the constrained graphs, through partitioning of nodes. We should also mention that the region below the ER curve in the edge/triangle model seems to be more mysterious; no proof of multipodality is known, for example, except on a line segment [16], though there is good simulation evidence of it [14].

A bipodal graphon is a function $$g: [0,1]^2 \to [0,1]$$ of the form:

$$g(x,y) = \begin{cases} p_{11} & x,y<c, \\ p_{12} & x<c<y, \\ p_{12} & y<c<x, \\ p_{22} & x,y>c. \end{cases} \tag{1}$$

Bipodal graphons are generalizations of bipartite graphons, in which $$p_{11}=p_{22}=0$$. Here $$c,p_{11}, p_{12}$$, and $$p_{22}$$ are constants taking values between 0 and 1. We prove that as $$\tau\searrow \epsilon^\ell$$, the parameters $$c \to 0$$, $$p_{22} \to \epsilon$$, and $$p_{11}$$ and $$p_{12}$$ approach the solutions of a problem in single-variable calculus. The inputs to that calculus problem depend only on the degrees of the vertices of $$H$$.
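To make definition (1) concrete, here is a minimal sketch of a bipodal graphon as a Python function, together with a midpoint-rule estimate of its edge density. The names `bipodal` and `edge_density` and the parameter values are illustrative assumptions, not anything taken from the paper. The closed form used for comparison, $$c^2 p_{11} + 2c(1-c)p_{12} + (1-c)^2 p_{22}$$, is just the integral of $$g$$ over the four blocks.

```python
def bipodal(c, p11, p12, p22):
    """Return the bipodal graphon g(x, y) of equation (1).

    Values on the measure-zero lines x = c or y = c are assigned to the
    second cluster; this choice does not affect any integral of g.
    """
    def g(x, y):
        if x < c and y < c:
            return p11
        if x >= c and y >= c:
            return p22
        return p12  # exactly one coordinate lies below c
    return g


def edge_density(g, n=400):
    """Midpoint-rule estimate of the double integral of g over [0,1]^2."""
    step = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * step
        for j in range(n):
            total += g(x, (j + 0.5) * step)
    return total * step * step


# Illustrative parameters (assumed for this sketch only).
c, p11, p12, p22 = 0.25, 0.9, 0.6, 0.4
g = bipodal(c, p11, p12, p22)
closed_form = c * c * p11 + 2 * c * (1 - c) * p12 + (1 - c) ** 2 * p22
numeric = edge_density(g)
```

Because the cluster boundary $$c = 0.25$$ falls on a cell edge of the 400-point grid, the midpoint rule reproduces the closed form up to rounding.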
We say that a finite graph $$H$$ is $$k$$-starlike if all the vertices of $$H$$ have degree $$k$$ or 1, where $$k >1$$ is a fixed integer. $$k$$-starlike graphs include $$k$$-stars (where one vertex has degree $$k$$ and $$k$$ vertices have degree 1), and the complete graph on $$k+1$$ vertices. For fixed $$k$$, all $$k$$-starlike graphs behave essentially the same for our asymptotics. We prove our results first for $$k$$-stars, and then apply perturbation theory to show that the differences between different $$k$$-starlike graphs are irrelevant, and then prove the general case.

To state our results more precisely, we need some notation. Let

$$S_0(w) = -\frac{1}{2}\left[w\log w+(1-w)\log(1-w)\right], \tag{2}$$

and define the graphon entropy (or entropy for short) of a graphon $$g$$ to be

$$s(g) = \int_0^1\!\!\int_0^1 S_0(g(x,y))\,{\rm d}x\,{\rm d}y. \tag{3}$$

Let

$$\psi_k(\epsilon,\tilde \epsilon) = \frac{2\left[S_0(\tilde \epsilon)-S_0(\epsilon)-S_0'(\epsilon)(\tilde \epsilon-\epsilon)\right]}{\tilde \epsilon^k-\epsilon^k-k\epsilon^{k-1}(\tilde \epsilon-\epsilon)}. \tag{4}$$

$$\psi_k(\epsilon,\tilde \epsilon)$$, viewed as a function of $$\tilde\epsilon$$, has a removable singularity at $$\tilde \epsilon=\epsilon$$, which we fill by defining

$$\psi_k(\epsilon,\epsilon) = \frac{2S_0''(\epsilon)}{k(k-1)\epsilon^{k-2}}. \tag{5}$$

For fixed $$\epsilon$$, let $$\zeta_k(\epsilon)$$ be the value of $$\tilde \epsilon$$ that maximizes $$\psi_k(\epsilon,\tilde \epsilon)$$. (We will prove in Theorem 3.3 below that this maximizer is unique and depends continuously on $$\epsilon$$.)

Theorem 1.1. Let $$H$$ be a $$k$$-starlike graph with $$\ell\ge 2$$ edges. Let $$\epsilon \in (0,1)$$ be any point other than $$(k-1)/k$$. Then there is a number $$\tau_0> \epsilon^\ell$$ (depending on $$\epsilon$$) such that for all $$\tau \in (\epsilon^\ell, \tau_0)$$, the entropy-maximizing graphon at $$(\epsilon,\tau)$$ is unique (up to measure-preserving transformations of $$[0,1]$$) and bipodal. Its parameters $$(c, p_{11}, p_{12}, p_{22})$$ are analytic functions of $$\epsilon$$ and $$\tau$$ on the region $$\epsilon \ne (k-1)/k$$, $$\tau \in (\epsilon^\ell, \tau_0(\epsilon))$$.
Furthermore, as $$\tau\searrow\epsilon^\ell$$ we have that $$p_{22} \to \epsilon$$, $$p_{12} \to \zeta_k(\epsilon)$$, $$p_{11}$$ satisfies $$S_0'(p_{11}) = 2S_0'(p_{12}) - S_0'(p_{22})$$, and $$c=O(\tau-\epsilon^\ell)$$. □

Theorem 1.1 proves that there is part of a phase just above the ER curve for $$\epsilon < (k-1)/k$$ and also for $$\epsilon > (k-1)/k$$; numerical evidence suggests these are in fact parts of a single phase; the only “singular” behavior is the manner in which the graphon approaches the constant graphon associated with the ER curve.

We will see in Theorem 1.2 that this behavior is only slightly more complicated for general $$H$$ than it is for $$k$$-starlike $$H$$. When $$H$$ has vertices with different degrees $$>1$$, the problem resembles that of a formal positive linear combination of $$k$$-stars. As in the $$k$$-starlike case, we first solve the problem for the linear combination of $$k$$-stars and then use perturbation theory to extend the results to arbitrary $$H$$.

Theorem 1.2. Let $$H$$ be an arbitrary graph with $$\ell$$ edges and at least one vertex of degree $$2$$ or greater. Then there exists a finite set $$B_H \subset (0,1)$$ such that if $$\epsilon \notin B_H$$, then there is a number $$\tau_0> \epsilon^\ell$$ (depending on $$\epsilon$$) such that for all $$\tau \in (\epsilon^\ell, \tau_0)$$, the entropy-maximizing graphon at $$(\epsilon,\tau)$$ is unique (up to measure-preserving transformations of $$[0,1]$$) and bipodal. Its parameters $$(c, p_{11}, p_{12}, p_{22})$$ are analytic functions of $$\epsilon$$ and $$\tau$$ on the region $$\epsilon \not \in B_H$$, $$\tau \in (\epsilon^\ell, \tau_0(\epsilon))$$. Furthermore, as $$\tau\searrow \epsilon^\ell$$ we have that $$p_{22} \to \epsilon$$, $$p_{12}$$ approaches the maximizer of an explicit function whose data depends on $$\epsilon$$, $$p_{11}$$ satisfies $$S_0'(p_{11}) = 2S_0'(p_{12}) - S_0'(p_{22})$$, and $$c=O(\tau-\epsilon^\ell)$$.
□

The key differences between Theorems 1.1 and 1.2 are:

For $$k$$-starlike graphs, the set $$B_H$$ of bad values of $$\epsilon$$ consists of a single point, and this point is explicitly known: $$\epsilon = (k-1)/k$$.

For $$k$$-starlike graphs, the behavior of $$\zeta_k$$ is explicit. It is a continuous and strictly decreasing function of $$\epsilon$$, and gives an involution of $$(0,1)$$. (That is, $$\zeta_k(\zeta_k(\epsilon))=\epsilon$$.) For $$k=2$$ it is given by $$\zeta_2(\epsilon)=1-\epsilon$$. In the general case, the limiting value of $$p_{12}$$, and its dependence on $$\epsilon$$, appear to be much more complicated. We do not know whether this limiting value is always continuous across the bad set $$B_H$$.

The organization of this paper is as follows. In Section 2 we review the formalism of graphons and establish basic notation. In Section 3 we establish a number of technical results for $$k$$-star models. Using these results, in Section 4 we prove Theorem 1.1 for the case that $$H$$ is a $$k$$-star. In Section 5 we show that just above the ER curve a model with an arbitrary $$k$$-starlike $$H$$ can be approximated by a $$k$$-star model. By bounding the error terms, we prove Theorem 1.1 in full generality. In Section 6 we consider formal positive linear combinations of $$k$$-stars, and prove a theorem much like Theorem 1.2 for those models. Finally, in Section 7 we show that the model for an arbitrary $$H$$ can be approximated by a formal linear combination of $$k$$-stars, thus completing the proof of Theorem 1.2.

2 Notation and background

We consider a simple graph $$G$$ (undirected, with no multiple edges or loops) with a vertex set $$V(G)$$ of labeled vertices. For a subgraph $$H$$ of $$G$$, let $$T_H(G)$$ be the number of maps from $$V(H)$$ into $$V(G)$$ which send edges to edges. The density $$\tau_H(G)$$ of $$H$$ in $$G$$ is then defined to be

$$\tau_H(G) := \frac{|T_H(G)|}{n^{|V(H)|}}, \tag{6}$$

where $$n = |V(G)|$$.
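Definition (6) can be checked by brute force on small graphs. The sketch below enumerates every map from $$V(H)$$ to $$V(G)$$ and counts those that send edges to edges; the function name `hom_density` and the edge-list representation are assumptions made for illustration, and the enumeration is only practical for very small $$H$$ and $$G$$.

```python
from itertools import product


def hom_density(h_edges, h_n, g_edges, g_n):
    """Density of H in G as in (6): edge-preserving maps divided by n^|V(H)|.

    Vertices of H are 0..h_n-1 and vertices of G are 0..g_n-1; both graphs
    are given as lists of undirected edges (u, v).
    """
    # Store each undirected edge of G in both orientations for fast lookup.
    adjacent = set()
    for u, v in g_edges:
        adjacent.add((u, v))
        adjacent.add((v, u))
    count = 0
    for phi in product(range(g_n), repeat=h_n):
        if all((phi[a], phi[b]) in adjacent for a, b in h_edges):
            count += 1
    return count / g_n ** h_n


# G is the complete graph on 3 vertices.
K3 = [(0, 1), (1, 2), (0, 2)]
edge = hom_density([(0, 1)], 2, K3, 3)               # edge density
two_star = hom_density([(0, 1), (0, 2)], 3, K3, 3)   # 2-star density
triangle = hom_density(K3, 3, K3, 3)                 # triangle density
```

For this regular graph the 2-star density equals the square of the edge density, which is the kind of relation that holds exactly on the ER curve in the limit.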
An important special case is where $$H$$ is a “$$k$$-star,” a graph with $$k$$ edges, all with a common vertex, for which we use the notation $$\tau_k(G)$$. In particular $$\tau_1(G)$$, which we also denote by $$\epsilon(G)$$, is the edge density of $$G$$.

For $$\alpha > 0$$ and $${\bar\tau}=(\epsilon,\tau_H)$$ define $$\displaystyle Z^{n,\alpha}_{{\bar\tau}}$$ to be the number of graphs $$G$$ on $$n$$ vertices with densities satisfying

$$\epsilon(G)\in(\epsilon-\alpha,\epsilon+\alpha), \qquad \tau_H(G)\in(\tau_H-\alpha,\tau_H+\alpha). \tag{7}$$

Define the (constrained) entropy $$s_{{\bar\tau}}$$ to be the exponential rate of growth of $$Z^{n,\alpha}_{{\bar\tau}}$$ as a function of $$n$$:

$$s_{\bar\tau} = \lim_{\alpha\searrow 0}\lim_{n\to\infty}\frac{\ln\left(Z^{n,\alpha}_{\bar\tau}\right)}{n^2}. \tag{8}$$

The double limit defining the entropy $$s_{{\bar\tau}}$$ is known to exist [15]. To analyze it we make use of a variational characterization of $$s_{{\bar\tau}}$$, and for this we need further notation to analyze limits of graphs as $$n\to \infty$$. (This work was recently developed in [2, 3, 7–9]; see also the recent book [10].) The (symmetric) adjacency matrices of graphs on $$n$$ vertices are replaced, in this formalism, by symmetric, measurable functions $$g:[0,1]^2\to[0,1]$$; the former are recovered by using a partition of $$[0,1]$$ into $$n$$ consecutive subintervals. The functions $$g$$ are called graphons.

For a graphon $$g$$ define the degree function $$d(x)$$ to be $$d(x)=\int^1_0 g(x,y){\rm d}y$$. The $$k$$-star density of $$g$$, $$\tau_k(g)$$, then takes the simple form

$$\tau_k(g) = \int_0^1 d(x)^k\,{\rm d}x. \tag{9}$$

For any fixed graph $$H$$, the $$H$$-density $$\tau_H$$ of $$g$$ can be similarly expressed as an integral of a product of factors $$g(x_i,x_j)$$. The following is Theorem 4.1 in [16]:

Theorem 2.1 (The Variational Principle). For any values $${\bar\tau}={\bar\tau}(g) := (\epsilon, \tau_H)$$ in the phase space we have $$s_{{\bar\tau}} = \max [s(g)]$$, where the entropy is maximized over all graphons $$g$$ with $${\bar\tau}(g)={\bar\tau}$$.
□ (Instead of using $$s(g)$$, some authors use the rate function$$I(g):= -s(g)$$, and then minimize $$I$$.) The existence of a maximizing graphon $$g=g_{{\bar\tau}}$$ for any constraint $${\bar\tau}(g)={\bar\tau}$$ was proven in [15], again adapting a proof in [5]. If the densities are that of edges and $$k$$-star subgraphs we refer to this maximization problem as a star model, though we emphasize that the result applies much more generally [15, 16]. We consider two graphs equivalent if they are obtained from one another by relabeling the vertices. For graphons, the analogous operation is applying a measure-preserving map $$\psi$$ of $$[0,1]$$ into itself, replacing $$g(x,y)$$ with $$g(\psi(x),\psi(y))$$, see [10]. The equivalence classes of graphons under relabeling are called reduced graphons, and graphons are equivalent if and only if they have the same subgraph densities for all possible finite subgraphs [10]. In the remaining sections of the paper, whenever we claim that a graphon has a property (e.g., monotonicity in $$x$$ and $$y$$, or uniqueness as an entropy maximizer), the caveat “up to relabeling” is implied. The graphons which maximize the constrained entropy can tell us what “most” or “typical” large constrained graphs are like: if $$g_{{\bar\tau}}$$ is the only reduced graphon maximizing $$s(g)$$ with $${\bar\tau}(g)={\bar\tau}$$, then as the number $$n$$ of vertices diverges and $$\alpha_n\to 0$$, exponentially most graphs with densities $${\bar\tau}_i(G)\in (\tau_i-\alpha_n,\tau_i+\alpha_n)$$ will have reduced graphon close to $$g_{{\bar\tau}}$$ [15]. This is based on large deviations from [5]. We emphasize that this interpretation requires that the maximizer be unique; this has been difficult to prove in most cases of interest and is an important focus of this work. 
A graphon $$g$$ is called $$M$$-podal if there is a decomposition of $$[0,1]$$ into $$M$$ intervals (“vertex clusters”) $$C_j,\ j=1,2,\ldots,M$$, and $$M(M+1)/2$$ constants $$p_{ij}$$ such that $$g(x,y)=p_{ij}$$ if $$(x,y)\in C_i\times C_j$$ (and $$p_{ji}=p_{ij}$$). We denote the length of $$C_j$$ by $$c_j$$.

3 Technical properties of star models

For each star model, all entropy-maximizing graphons are multipodal with a fixed upper bound on the number of clusters, also called the podality [6]. (The term multi/bipartite is sometimes used instead of multipodal in the literature.) For any fixed podality $$M$$, an $$M$$-podal graphon is described by $$N=M(M+3)/2$$ parameters, namely the values $$p_{ij}$$ ($$1\le i\le j\le M$$) and the widths $$c_i$$ ($$1\le i\le M$$) of the clusters. When it does not cause confusion, we will use $$g$$ to denote the vector

$$(p_{11},\ldots,p_{1M},p_{22},\ldots,p_{2M},\ldots,p_{M-1\,M-1},p_{M-1\,M},p_{MM},c_1,\ldots,c_M), \tag{10}$$

which contains all these parameters. The problem of optimizing the graphon then reduces to a finite-dimensional calculus problem. To be precise, let us recall that for an $$M$$-podal graphon, we have

$$\epsilon(g) = \sum_{1\le i,j\le M} c_i c_j p_{ij}, \qquad \tau_k(g) = \sum_{1\le i\le M} c_i d_i^k, \qquad s(g) = \sum_{1\le i,j\le M} c_i c_j S_0(p_{ij}), \tag{11}$$

where $$d_i = \sum_{1\le j\le M} c_j p_{ij}$$ is the value of the degree function on the $$i$$th cluster. The problem of searching for entropy-maximizing graphons with fixed edge density $$\epsilon$$ and $$k$$-star density $$\tau_k$$ can now be formulated as

$$\max_{g\in[0,1]^N} s(g), \quad \text{subject to:} \quad \epsilon(g)-\epsilon=0, \quad \tau_k(g)-\tau=0, \quad C(g)=1, \tag{12}$$

where $$C(g) = \sum_{1\le j\le M} c_j$$.

The following result says that the maximization problem (12) can be solved using the method of Lagrange multipliers. The existence of finite Lagrange multipliers was previously established in [6], treating the space of graphons as a linear space of functions $$[0,1]^2 \to [0,1]$$, intuitively considering perturbations of graphons localized about points in $$[0,1]^2$$.
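The finite sums in (11) translate directly into code. The following is a minimal sketch under stated assumptions: the function name `multipodal_stats`, the list-of-lists representation of $$p_{ij}$$, and the numerical values are all illustrative, and $$S_0$$ is extended by its limiting value 0 at $$w=0$$ and $$w=1$$.

```python
import math


def S0(w):
    """S_0(w) = -(1/2)[w ln w + (1 - w) ln(1 - w)], with S_0(0) = S_0(1) = 0."""
    if w <= 0.0 or w >= 1.0:
        return 0.0
    return -0.5 * (w * math.log(w) + (1.0 - w) * math.log(1.0 - w))


def multipodal_stats(c, p, k):
    """Evaluate (11) for an M-podal graphon.

    c: cluster widths c_1..c_M (should sum to 1).
    p: symmetric M x M matrix of values p_ij.
    Returns (edge density, k-star density, entropy).
    """
    M = len(c)
    # d[i] is the value of the degree function on cluster i.
    d = [sum(c[j] * p[i][j] for j in range(M)) for i in range(M)]
    eps = sum(c[i] * c[j] * p[i][j] for i in range(M) for j in range(M))
    tau_k = sum(c[i] * d[i] ** k for i in range(M))
    s = sum(c[i] * c[j] * S0(p[i][j]) for i in range(M) for j in range(M))
    return eps, tau_k, s


# A bipodal example (values assumed for illustration) and a constant graphon.
eps, tau2, s = multipodal_stats([0.25, 0.75], [[0.9, 0.6], [0.6, 0.4]], k=2)
eps_er, tau2_er, s_er = multipodal_stats([1.0], [[0.4]], k=2)
```

For the constant graphon the 2-star density equals $$\epsilon^2$$, so the point lies on the ER curve; for the bipodal example the degree function is not constant and the 2-star density exceeds $$\epsilon^2$$, as Jensen's inequality requires.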
For star models we may restrict to $$M$$-podal graphons, as noted above, and thus consider perturbations in the relevant parameters $$p_{ij}$$ and $$c_j$$.

Lemma 3.1. Let $$g$$ be a local maximizer in (12). Then for constraints $$\epsilon,\tau$$ off the ER curve, there exist unique $$\alpha,\beta,\gamma\in\mathbb R$$ such that

$$\nabla s(g)-\alpha\nabla\epsilon(g)-\beta\nabla\tau_k(g)-\gamma\nabla C(g)=0.$$ (13)

□ We do not include the proof, which follows easily from that of Lemma 3.5 in [6]. We also note that one can remove the variable $$c_M$$ and the constraint $$C(g) =1$$, eliminating the multiplier $$\gamma$$. For convenience later, we now write down the exact form of the Euler–Lagrange equation (13). We first verify that

$$\frac{\partial\epsilon}{\partial p_{ij}}=A_{ij},\qquad \frac{\partial\epsilon}{\partial c_i}=2\sum_{j=1}^M c_j p_{ij}=2d_i,$$ (14)

$$\frac{\partial\tau_k}{\partial p_{ij}}=\frac{k}{2}\left(d_i^{k-1}+d_j^{k-1}\right)A_{ij},\qquad \frac{\partial\tau_k}{\partial c_i}=d_i^k+k\sum_{j=1}^M c_j d_j^{k-1}p_{ij},$$ (15)

$$\frac{\partial C}{\partial p_{ij}}=0,\qquad \frac{\partial C}{\partial c_i}=1,$$ (16)

$$\frac{\partial s}{\partial p_{ij}}=S_0'(p_{ij})A_{ij},\qquad \frac{\partial s}{\partial c_i}=2\sum_{j=1}^M c_j S_0(p_{ij}),$$ (17)

where $$A_{ij}= 2 c_i c_j$$ if $$i\neq j$$ and $$A_{ij}= c_i^2$$ if $$i=j$$. We can then write down (13) explicitly as

$$S_0'(p_{ij})=\alpha+\beta\frac{k}{2}\left(d_i^{k-1}+d_j^{k-1}\right),\qquad 1\le i\le j\le M,$$ (18)

$$2\sum_{j=1}^M c_j S_0(p_{ij})=2\alpha d_i+\beta\left(d_i^k+k\sum_{j=1}^M c_j d_j^{k-1}p_{ij}\right)+\gamma,\qquad 1\le i\le M.$$ (19)

These Euler–Lagrange equations, together with the constraints,

$$\epsilon(g)-\epsilon=0,\qquad \tau_k(g)-\tau=0,\qquad C(g)-1=0,$$ (20)

are the optimality conditions for the maximization problem (12). In principle, we can solve this system to find the maximizer $$g$$.

Next we consider the significance of the Lagrange multipliers $$\alpha$$ and $$\beta$$. Suppose that $$g_0$$ is the unique entropy maximizer for $$\epsilon=\epsilon_0$$ and $$\tau=\tau_0$$. Then any sequence of graphons that maximize entropy for $$(\epsilon,\tau)$$ approaching $$(\epsilon_0,\tau_0)$$ must approach $$g_0$$: this follows from continuity of the entropy on the space of $$M$$-podal graphons and the fact that we can perturb $$g_0$$ to any nearby $$(\epsilon,\tau)$$ by changing some $$p_{ij}$$’s (as follows easily from (11)).
But if $$g= g_0 + \delta g$$, then

$$s(g)=s(g_0)+\nabla s(g_0)\cdot\delta g+O(\|\delta g\|^2),$$ (21)

where $$\|\delta g\|$$ denotes the norm of $$\delta g$$ as a vector in $$\mathbb R^N$$. Moreover, by (13) (the $$\gamma$$ term drops out because $$C(g)=1$$ is held fixed) we have

$$s(g)=s(g_0)+\alpha\nabla\epsilon(g_0)\cdot\delta g+\beta\nabla\tau(g_0)\cdot\delta g+O(\|\delta g\|^2)=s(g_0)+\alpha(\epsilon-\epsilon_0)+\beta(\tau-\tau_0)+O(\|\delta g\|^2).$$ (22)

Thus $$\partial s_{(\epsilon,\tau)}/\partial \epsilon = \alpha$$ and $$\partial s_{(\epsilon,\tau)}/\partial \tau = \beta$$. If $$g_0$$ is not a unique entropy maximizer, then a similar argument shows that we have 1-sided (directional) derivatives:

Lemma 3.2. The function $$s_{(\epsilon,\tau)}$$ admits directional derivatives in all directions at all points $$(\epsilon,\tau)$$ in the interior of the phase space. □

Proof Suppose that there are multiple entropy-maximizing graphons at a particular $$(\epsilon_0,\tau_0)$$. Given a vector $$v = (v_\epsilon, v_\tau) \in \mathbb R^2$$, we wish to compute $$s_{(\epsilon_0 + t v_\epsilon, \tau_0 + t v_\tau)}-s_{(\epsilon_0, \tau_0)}$$ to first order in $$t$$ for $$t$$ small and positive. As $$t \to 0$$, the optimizing graphon $$g$$ must approach an entropy-maximizing graphon $$g_0$$ with $$\epsilon(g_0)=\epsilon_0$$ and $$\tau(g_0)=\tau_0$$. But then, by (22), $$s(g)-s(g_0) = t(\alpha v_\epsilon + \beta v_\tau)+O(t^2)$$, where $$\alpha$$ and $$\beta$$ depend on the choice of $$g_0$$. Among the choices for $$g_0$$, there is one (or more) that maximizes $$\alpha v_\epsilon + \beta v_\tau$$, and our directional derivative is that maximal value of $$\alpha v_\epsilon + \beta v_\tau$$. ■

Existence of directional derivatives implies the fundamental theorem of calculus, so for fixed $$\epsilon$$ we can write

$$s_{(\epsilon,\tau)}=s_{(\epsilon,\epsilon^k)}+\int_{\epsilon^k}^{\tau}\beta\big(g_{\rm max}(\epsilon,\tau)\big)\,{\rm d}\tau,$$ (23)

where $$g_{\rm max}(\epsilon,\tau)$$ is the entropy-maximizing graphon at $$(\epsilon,\tau)$$ that maximizes its right derivative (with respect to $$\tau$$). Before proving Theorem 1.1 for $$k$$-stars, we record some properties of the function $$\psi_k(\epsilon, \tilde \epsilon)$$ of (4) and its critical points.

Theorem 3.3.
For fixed $$k$$ and $$\epsilon$$, there is a unique solution to $$\partial \psi_k(\epsilon,\tilde \epsilon)/\partial \tilde \epsilon=0$$, which we denote $$\tilde \epsilon=\zeta_k(\epsilon)$$. The function $$\zeta_k$$ is strictly decreasing, with nowhere-vanishing derivative and with fixed point at $$\epsilon=(k-1)/k$$. Furthermore, $$\zeta_k$$ is an involution: $$\tilde \epsilon = \zeta_k(\epsilon)$$ if and only if $$\epsilon = \zeta_k(\tilde \epsilon)$$. Moreover, if $$\zeta_k(\epsilon) \ne \epsilon$$, then $$\psi_k(\epsilon,\epsilon) < \psi_k(\epsilon, \zeta_k(\epsilon))$$ and $$\psi_k(\zeta_k(\epsilon), \zeta_k(\epsilon)) < \psi_k(\epsilon, \zeta_k(\epsilon)).$$ □

Even though the proof is elementary we will need some parts of it later, so we give it in the Appendix.

4 Theorem 1.1 for $$k$$-stars

Theorem 4.1. Let $$H$$ be the $$k$$-star graph and suppose that $$\epsilon \ne (k-1)/k$$. Then there exists a number $$\tau_0 > \epsilon^k$$ such that for all $$\tau \in (\epsilon^k ,\tau_0)$$, the entropy-optimizing graphon at $$(\epsilon,\tau)$$ is unique and bipodal. The parameters $$(p_{11}, p_{12}, p_{22},c)$$ are analytic functions of $$\epsilon$$ and $$\tau$$. As $$\tau$$ approaches $$\epsilon^k$$ from above, $$p_{22} \to \epsilon$$, $$p_{12} \to \zeta_k(\epsilon)$$, $$p_{11}$$ satisfies $$S_0'(p_{11}) = 2S_0'(p_{12}) - S_0'(p_{22}),$$ and $$c=O(\tau-\epsilon^k)$$. □

Proof The entropy-maximizing graphon for each $$(\epsilon,\tau)$$ is multipodal [6], and the parameters $$\{c_j\}$$ and $$\{p_{ij}\}$$ must satisfy the optimality conditions (18) and (19). The first step of the proof is to estimate the terms in the optimality equations to within $$o(1)$$. This will determine the solutions to within $$o(1)$$ and demonstrate that our optimizing graphon is close to bipodal of the desired form. The second step, based on a separate argument, will show that the optimizer is exactly bipodal. The third step shows that the optimizer is in fact unique.
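The critical point $$\zeta_k(\epsilon)$$ of Theorem 3.3 has no closed form in general, but it can be explored numerically. The sketch below is illustrative only. It assumes $$\psi_k=N/D$$ as in (58) with $$h(x)=x^k$$ and the entropy density $$S_0(u)=-\tfrac12[u\ln u+(1-u)\ln(1-u)]$$ (consistent with (51)); the golden-section search and all function names are hypothetical choices, not the method of the paper.

```python
import math

def S0(u):
    # Assumed entropy density S0(u) = -(1/2)[u ln u + (1-u) ln(1-u)], 0 < u < 1.
    return -0.5 * (u * math.log(u) + (1.0 - u) * math.log(1.0 - u))

def S0_prime(u):
    # S0'(u) = (1/2) ln(1/u - 1), matching the left side of (51).
    return 0.5 * math.log(1.0 / u - 1.0)

def psi(eps, x, k):
    # psi_k(eps, x) = N/D with
    #   N = 2[S0(x) - S0(eps) - (x - eps) S0'(eps)],
    #   D = x^k - eps^k - k eps^(k-1) (x - eps).
    if abs(x - eps) < 1e-5:
        # Removable singularity: the limit is 2 S0''(eps) / h''(eps).
        S0_second = -0.5 / (eps * (1.0 - eps))
        return 2.0 * S0_second / (k * (k - 1) * eps ** (k - 2))
    N = 2.0 * (S0(x) - S0(eps) - (x - eps) * S0_prime(eps))
    D = x ** k - eps ** k - k * eps ** (k - 1) * (x - eps)
    return N / D

def zeta(eps, k, tol=1e-9):
    # Golden-section search for the maximizer of x -> psi_k(eps, x) on (0, 1).
    # Relies on the unique critical point being a maximum (Theorem 3.3 and Step 1).
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 1e-9, 1.0 - 1e-9
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1, f2 = psi(eps, x1, k), psi(eps, x2, k)
    while hi - lo > tol:
        if f1 < f2:
            # The maximum lies in [x1, hi]; reuse x2 as the new left probe.
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = psi(eps, x2, k)
        else:
            # The maximum lies in [lo, x2]; reuse x1 as the new right probe.
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = psi(eps, x1, k)
    return 0.5 * (lo + hi)
```

For $$k=2$$ the symmetry $$S_0(u)=S_0(1-u)$$ gives $$\zeta_2(\epsilon)=1-\epsilon$$, which offers a simple check of the search; for other $$k$$ the involution property $$\zeta_k(\zeta_k(\epsilon))=\epsilon$$ plays the same role.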
In doing our asymptotic analysis, our small parameter is $$\Delta \tau := \tau - \epsilon^k$$. However, we claim that

$$\Delta\tau\asymp\|\Delta g\|^2\asymp|\Delta s|,$$ (24)

(the notation $$A\asymp B$$ means $$A=O(B)$$ and $$B=O(A)$$) where $$\Delta s := s(g) - S_0(\epsilon)$$ and $$\|\Delta g\|^2$$ is the squared $$L^2$$ norm of $$\Delta g := g - g_0$$, where $$g_0(x,y) = \epsilon$$ (here $$g$$ denotes the graphon as a function $$[0,1]^2 \to [0,1]$$, not a vector of multipodal parameters). Indeed, $$\Delta \tau=O(\|\Delta g\|^2)$$ (adapting the argument of [16], Theorem 3.1 to arbitrary graphs), and $$\|\Delta g \|^2=O(|\Delta s|)$$ (by equation (16) of [16]). By considering a bipodal graphon with $$p_{11}=p_{12}=\zeta_k(\epsilon)$$ and $$p_{22}$$ close to $$\epsilon$$, we see that $$|\Delta s|=O(\Delta \tau)$$. This shows (24). In the rest of the proof, unless otherwise specified, by terms such as “close to” and “small” we mean within $$o(1)$$ as $$\Delta\tau\to 0$$.

Order the $$M$$ vertex clusters so that the largest cluster is the last cluster (of length $$c_M$$). By subtracting the equation (19) for $$c_M$$ from the equations for $$c_j$$, we eliminate $$\gamma$$ from our equations:

$$S_0'(p_{ij})=\alpha+\frac{k}{2}\beta\left(d_i^{k-1}+d_j^{k-1}\right),$$

$$2\sum_{j=1}^M c_j\big(S_0(p_{ij})-S_0(p_{Mj})\big)=2\alpha(d_i-d_M)+\beta\left(d_i^k-d_M^k+k\sum_{j=1}^M c_j d_j^{k-1}(p_{ij}-p_{Mj})\right).$$ (25)

Step 1. Since

$$\|\Delta g\|^2=\iint (g(x,y)-\epsilon)^2\,{\rm d}x\,{\rm d}y=\sum_{i,j}c_ic_j(p_{ij}-\epsilon)^2$$

is $$o(1)$$ by (24), the $$i$$th cluster must either have $$d_i=\sum_j c_jp_{ij}$$ close to $$\epsilon$$ (i.e., within $$o(1)$$), or $$c_i$$ close to zero, or both. We call a cluster Type I if $$c_i$$ is close to 0 and Type II if $$d_i$$ is close to $$\epsilon$$. (If a cluster meets both conditions, we arbitrarily throw it into one camp or the other.) The first equation in (25) implies that, for fixed $$i$$, the values of $$p_{ij}$$ are nearly constant for all $$j$$ of Type II. Since the $$c_j$$’s are small for $$j$$ of Type I, this common value must be close to $$d_i$$.
Our equations then simplify to

$$S_0'(d_i)=\alpha+\frac{k}{2}\beta\left(d_i^{k-1}+\epsilon^{k-1}\right)+o(1),\qquad 2\big[S_0(d_i)-S_0(\epsilon)\big]=2\alpha(d_i-\epsilon)+\beta\left[d_i^k-\epsilon^k+k\epsilon^{k-1}(d_i-\epsilon)\right]+o(1).$$ (26)

Since $$d_M = \epsilon + o(1)$$, the first of those equations applied to $$d_M$$ implies that

$$\alpha+k\epsilon^{k-1}\beta=S_0'(\epsilon)+o(1).$$ (27)

We can thus replace $$\alpha$$ with $$S_0'(\epsilon) - k\epsilon^{k-1} \beta + o(1)$$ throughout. This gives the equations:

$$2\big(S_0'(d_i)-S_0'(\epsilon)\big)=k\beta\left(d_i^{k-1}-\epsilon^{k-1}\right)+o(1),\qquad 2\big[S_0(d_i)-S_0(\epsilon)-S_0'(\epsilon)(d_i-\epsilon)\big]=\beta\left[d_i^k-\epsilon^k-k\epsilon^{k-1}(d_i-\epsilon)\right]+o(1).$$ (28)

There are two solutions to these equations. One is simply to have $$d_i=\epsilon+o(1)$$, in which case both sides of both equations are $$o(1)$$. Indeed, we already know that there must be clusters with $$d_i$$ close to $$\epsilon$$. In looking for solutions with $$d_i$$ not close to $$\epsilon$$, the second equation says that $$\beta = \psi_k(\epsilon,d_i)+o(1)$$. In this case we can divide the first equation by the second to eliminate $$\beta$$. This gives an equation that is algebraically equivalent to $$\partial \psi_k(\epsilon,d_i)/ \partial d_i=o(1)$$. In other words, $$d_i$$ must be tending to the unique critical point $$\zeta_k(\epsilon)$$ of $$\psi_k$$, and $$\beta$$ must be tending to the critical value.

In fact, the critical point is a maximum of $$\psi_k$$. Remember that $$s_{(\epsilon,\tau)} = s_{(\epsilon,\epsilon^k)} + \int_{\epsilon^k}^{\tau} \beta$$ from (23). Since the computation of $$\beta$$ is independent of $$\Delta \tau$$ (to lowest order), we have $$s_{(\epsilon,\tau)}-s_{(\epsilon,\epsilon^k)} = \beta \Delta \tau + o(\Delta \tau)$$, so maximizing $$\beta$$ is tantamount to maximizing $$s$$.

Step 2. We have shown so far that the optimizing graphon is multipodal, with all of the clusters either having $$d_i$$ close to $$\zeta_k(\epsilon)$$ or close to $$\epsilon$$. Furthermore, the clusters with $$d_i$$ close to $$\zeta_k(\epsilon)$$ have total size $$\sum c_i = o(1)$$.
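The elimination of $$\beta$$ in Step 1 can be spelled out as follows. This is only a sketch: it assumes, consistently with (58) specialized to $$h(x)=x^k$$, that $$\psi_k=N/D$$ with the numerator and denominator written below, and that $$d=d_i$$ stays away from $$\epsilon$$, so that $$D$$ is bounded away from zero.

$$N(\epsilon,d)=2\big[S_0(d)-S_0(\epsilon)-S_0'(\epsilon)(d-\epsilon)\big],\qquad D(\epsilon,d)=d^k-\epsilon^k-k\epsilon^{k-1}(d-\epsilon),$$

$$\frac{\partial N}{\partial d}=2\big(S_0'(d)-S_0'(\epsilon)\big),\qquad \frac{\partial D}{\partial d}=k\big(d^{k-1}-\epsilon^{k-1}\big).$$

In this notation the two equations of (28) read $$\partial N/\partial d=\beta\,\partial D/\partial d+o(1)$$ and $$N=\beta D+o(1)$$. The second gives $$\beta=N/D+o(1)=\psi_k(\epsilon,d)+o(1)$$, and substituting both into the quotient rule gives

$$\frac{\partial\psi_k}{\partial d}=\frac{1}{D^2}\left(\frac{\partial N}{\partial d}\,D-N\,\frac{\partial D}{\partial d}\right)=\frac{1}{D^2}\left(\beta\,\frac{\partial D}{\partial d}\,D-\beta D\,\frac{\partial D}{\partial d}\right)+o(1)=o(1).$$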
We refine our definitions of Types I and II so that all the clusters with $$d_i$$ close to $$\zeta_k(\epsilon)$$ are Type I and all the clusters with $$d_i$$ close to $$\epsilon$$ are Type II. We order the clusters so that the Type I clusters come before Type II, thereby dividing $$[0,1]^2$$ into $$I\times I$$, $$I \times II$$, $$II \times I$$, and $$II \times II$$ quadrants. Since the value of $$g(x,y)$$ is determined by $$d(x)$$ and $$d(y)$$ (and $$\alpha$$ and $$\beta$$), this means that the optimizing graphon is nearly constant (i.e., with pointwise small fluctuations) on each quadrant.

Let $$g_b$$ be the bipodal graphon obtained by averaging $$g$$ over each quadrant. That is, $$c$$ is the total size of all the Type I clusters, and the parameters $$p_{11}$$, $$p_{12}$$, and $$p_{22}$$ are chosen such that $$0=\iint_{I\times I}(g(x,y)-p_{11}){\rm d}x\, {\rm d}y= \iint_{I\times II}(g(x,y)-p_{12}) {\rm d}x\, {\rm d}y =\iint_{II\times II} (g(x,y)-p_{22}){\rm d}x\, {\rm d}y$$. Let $$\Delta g_f = g-g_b$$. (The $$f$$ stands for “further.”) We will show that having $$\Delta g_f$$ non-zero is an inefficient way to increase $$\tau$$, that is, $$(s(g)-s(g_b))/(\tau(g)-\tau(g_b))$$ is less than $$\beta$$.

By the first equation in (25), $$S_0'(g(x,y))$$ is the sum of a function of $$x$$ and the same function of $$y$$. This means that there is a function $$F(x)$$ on $$[0,1]$$, with $$\int_I F(x) {\rm d}x = \int_{II} F(x) {\rm d}x =0$$, such that on each quadrant

$$S_0'(g(x,y))=\text{constant}+F(x)+F(y).$$ (29)

Furthermore, $$F(x)$$ is pointwise small (meaning it approaches 0 pointwise as $$\tau \to \epsilon^k$$), so we can write the Taylor series

$$S_0'(g(x,y))=S_0'\big(g_b(x,y)+\Delta g_f(x,y)\big)=S_0'(g_b(x,y))+S_0''(g_b(x,y))\,\Delta g_f(x,y)+O\big(\Delta g_f(x,y)^2\big).$$ (30)

Since $$S_0'(g(x,y))$$ is not a linear function of $$g(x,y)$$, the constant in (29) is not exactly $$S_0'(g_b(x,y))$$.
The correction to $$S_0'(g_b(x,y))$$ is obtained by integrating higher-order terms in the Taylor series (30) over the quadrant, and so is controlled by the squared $$L^2$$ norm of $$F$$. Using (29), on each quadrant we can solve (30) for $$\Delta g_f(x,y)$$ as

$$\Delta g_f(x,y)=\begin{cases}\dfrac{F(x)+F(y)}{S_0''(p_{11})}+O(F^2) & \text{on } I\times I,\\[2ex] \dfrac{F(x)+F(y)}{S_0''(p_{12})}+O(F^2) & \text{on } I\times II \text{ and } II\times I,\\[2ex] \dfrac{F(x)+F(y)}{S_0''(p_{22})}+O(F^2) & \text{on } II\times II,\end{cases}$$ (31)

where $$O(F^2)$$ is shorthand for terms that are bounded by quadratic functions of $$F(x)$$ and $$F(y)$$ and a quadratic function of the $$L^2$$ norm of $$F$$. Corrections involving $$F(x)$$ and $$F(y)$$ come from higher terms in the Taylor series of $$S_0'(g(x,y))$$, while corrections involving the $$L^2$$ norm come from the average value of $$S_0'(g(x,y))$$ on a quadrant being slightly different from $$S_0'(p_{ij})$$. The resulting changes $$\Delta d_f$$ in the degree function $$d(x)$$ from $$g_b$$ to $$g_b + \Delta g_f$$ are then:

$$\Delta d_f(x)=\begin{cases}F(x)\left(\dfrac{c}{S_0''(p_{11})}+\dfrac{1-c}{S_0''(p_{12})}\right)+O(F^2) & x\in I,\\[2ex] F(x)\left(\dfrac{c}{S_0''(p_{12})}+\dfrac{1-c}{S_0''(p_{22})}\right)+O(F^2) & x\in II.\end{cases}$$ (32)

Next we compute $$\Delta \tau_f:= \tau(g)-\tau(g_b)$$ and $$\Delta s_f := s(g)-s(g_b)$$ to lowest order in $$F$$. If we expand $$\Delta \tau_f$$ and $$\Delta s_f$$ in powers of $$\Delta g_f$$, the linear terms vanish exactly, because $$\iint \Delta g_f$$ is exactly zero on each quadrant. For the quadratic term, we approximate $$\Delta g_f$$ using (31).
The resulting errors in the quadratic term, and all of the neglected higher-order terms, are then bounded by the sup norm of $$F$$ times the squared $$L^2$$ norm, which we denote $$O(F^3)$$:

$$\begin{aligned}\Delta s_f&=\iint\left[\tfrac12 S_0''(g_b(x,y))\,\Delta g_f(x,y)^2+O(\Delta g_f^3)\right]{\rm d}x\,{\rm d}y\\ &=\iint_{I\times I}\frac{F(x)^2+F(y)^2}{2S_0''(p_{11})}+2\iint_{I\times II}\frac{F(x)^2+F(y)^2}{2S_0''(p_{12})}+\iint_{II\times II}\frac{F(x)^2+F(y)^2}{2S_0''(p_{22})}+O(F^3)\\ &=\left(\frac{c}{S_0''(p_{11})}+\frac{1-c}{S_0''(p_{12})}\right)\int_I F(x)^2\,{\rm d}x+\left(\frac{c}{S_0''(p_{12})}+\frac{1-c}{S_0''(p_{22})}\right)\int_{II}F(x)^2\,{\rm d}x+O(F^3),\\ \Delta\tau_f&=\int_0^1\left[\frac{k(k-1)d(x)^{k-2}}{2}(\Delta d(x))^2+O(\Delta d(x)^3)\right]{\rm d}x\\ &=\frac{k(k-1)d_1^{k-2}}{2}\left(\frac{c}{S_0''(p_{11})}+\frac{1-c}{S_0''(p_{12})}\right)^2\int_I F(x)^2\,{\rm d}x+\frac{k(k-1)d_2^{k-2}}{2}\left(\frac{c}{S_0''(p_{12})}+\frac{1-c}{S_0''(p_{22})}\right)^2\int_{II}F(x)^2\,{\rm d}x+O(F^3),\end{aligned}$$ (33)

where $$d_1 = cp_{11} + (1-c)p_{12}$$ and $$d_2 = c p_{12} + (1-c)p_{22}$$ are the values of the degree function for the bipodal graphon $$g_b$$. (The cross terms $$2F(x)F(y)$$ integrate to zero on each quadrant because $$F$$ has zero mean on $$I$$ and on $$II$$.) The ratio $$\Delta s_f/\Delta \tau_f$$ is then a weighted average of

$$\frac{2}{k(k-1)d_1^{k-2}}\left(\frac{c}{S_0''(p_{11})}+\frac{1-c}{S_0''(p_{12})}\right)^{-1}$$ (34)

and

$$\frac{2}{k(k-1)d_2^{k-2}}\left(\frac{c}{S_0''(p_{12})}+\frac{1-c}{S_0''(p_{22})}\right)^{-1}$$ (35)

with relative weights

$$d_1^{k-2}\left(\frac{c}{S_0''(p_{11})}+\frac{1-c}{S_0''(p_{12})}\right)^2\int_I F(x)^2\,{\rm d}x\quad\text{and}\quad d_2^{k-2}\left(\frac{c}{S_0''(p_{12})}+\frac{1-c}{S_0''(p_{22})}\right)^2\int_{II}F(x)^2\,{\rm d}x.$$ (36)

As $$\tau \to \epsilon^k$$ (and $$c \to 0$$ and $$F \to 0$$), the first ratio being averaged approaches $$\psi_k(\zeta_k(\epsilon), \zeta_k(\epsilon))$$ and the second approaches $$\psi_k(\epsilon, \epsilon)$$. However, both of these numbers are smaller than $$\beta = \psi_k(\epsilon, \zeta_k(\epsilon))$$. We have already established that $${\rm d}s/{\rm d}\tau = \beta + o(1)$$ for changes in $$c$$ that preserve the bipodal structure. This means that, for sufficiently small $$c$$, if we perturb a bipodal graphon to maximize $$s$$ for fixed additional change $$\Delta \tau_f$$, it is better to perturb $$c$$ than to make $$F$$ non-zero. Thus $$F(x)$$ is identically zero, implying that the optimizing graphon is exactly bipodal.

Step 3. We have established that the maximizing graphon is bipodal, with $$p_{22} = \epsilon + o(1)$$ and $$p_{12} = \zeta_k(\epsilon) + o(1)$$. We now show that the form of this graphon is unique. Since the graphon is bipodal, we consider the exact optimality equations for bipodal graphons.
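The argument then reduces to showing that a certain four-dimensional Jacobian determinant is non-zero. After eliminating $$\gamma$$, we have

$$S_0'(p_{11})=\alpha+k\beta d_1^{k-1},\qquad S_0'(p_{12})=\alpha+\frac{k}{2}\beta\left(d_1^{k-1}+d_2^{k-1}\right),\qquad S_0'(p_{22})=\alpha+k\beta d_2^{k-1},$$

$$\frac{\partial s}{\partial c}=\alpha\frac{\partial\epsilon}{\partial c}+\beta\frac{\partial\tau}{\partial c},\qquad \epsilon=\epsilon_0,\qquad \tau=\tau_0.$$ (37)

We use the second and third equations to solve for $$\alpha$$ and $$\beta$$:

$$\alpha=\frac{-S_0'(p_{22})\left(d_2^{k-1}+d_1^{k-1}\right)+2d_2^{k-1}S_0'(p_{12})}{d_2^{k-1}-d_1^{k-1}},\qquad \beta=\frac{2}{k}\,\frac{S_0'(p_{22})-S_0'(p_{12})}{d_2^{k-1}-d_1^{k-1}}.$$ (38)

Plugging this into the first equation then gives

$$S_0'(p_{11})-2S_0'(p_{12})+S_0'(p_{22})=0.$$ (39)

This leaves four equations in four unknowns, which we write as

$$(f_1,f_2,f_3,f_4)=(0,0,\epsilon_0,\tau_0),$$ (40)

where

$$\begin{aligned}f_1(p_{11},p_{12},p_{22},c)&=S_0'(p_{11})-2S_0'(p_{12})+S_0'(p_{22}),\\ f_2(p_{11},p_{12},p_{22},c)&=\frac{\partial s}{\partial c}-\alpha\frac{\partial\epsilon}{\partial c}-\beta\frac{\partial\tau}{\partial c},\\ f_3(p_{11},p_{12},p_{22},c)&=c^2p_{11}+2c(1-c)p_{12}+(1-c)^2p_{22},\\ f_4(p_{11},p_{12},p_{22},c)&=c\,d_1^k+(1-c)\,d_2^k,\end{aligned}$$ (41)

and where $$\alpha$$ and $$\beta$$ are given by (38). We know a solution when $$\tau_0 = \epsilon_0^k$$, namely $$p_{22}=\epsilon_0$$, $$p_{12} = \zeta_k(\epsilon_0)$$, $$c=0,$$ and $$p_{11} = S_0'{}^{-1}(2S_0'[\zeta_k(\epsilon_0)] - S_0'(\epsilon_0))$$. We will show that $$d f$$ has non-zero determinant at this point. By the inverse function theorem, this implies that, when $$\tau_0$$ is close to $$\epsilon_0^k$$, there is only one value of $$(p_{11},p_{12}, p_{22}, c)$$ close to this point for which $$f(p_{11},p_{12}, p_{22}, c) = (0,0,\epsilon_0, \tau_0)$$. Moreover, the parameters $$(p_{11}, p_{12}, p_{22}, c)$$ depend analytically on $$\epsilon_0$$ and $$\tau_0$$. This will complete the proof.

The derivatives of $$f_1$$, $$f_3$$, and $$f_4$$ are:

$$\begin{aligned}df_1(p_{11},p_{12},p_{22},c)&=\big(S_0''(p_{11}),\,-2S_0''(p_{12}),\,S_0''(p_{22}),\,0\big),\\ df_3(p_{11},p_{12},p_{22},c)&=\big(c^2,\,2c(1-c),\,(1-c)^2,\,2cp_{11}+2(1-2c)p_{12}-2(1-c)p_{22}\big),\\ df_4(p_{11},p_{12},p_{22},c)&=\big(kc^2d_1^{k-1},\,kc(1-c)(d_1^{k-1}+d_2^{k-1}),\,k(1-c)^2d_2^{k-1},\,d_1^k-d_2^k+kc\,d_1^{k-1}(p_{11}-p_{12})+k(1-c)d_2^{k-1}(p_{12}-p_{22})\big).\end{aligned}$$ (42)

Evaluating at $$c=0$$ gives

$$\begin{aligned}df_1(p_{11},p_{12},p_{22},0)&=\big(S_0''(p_{11}),\,-2S_0''(p_{12}),\,S_0''(p_{22}),\,0\big),\\ df_3(p_{11},p_{12},p_{22},0)&=\big(0,\,0,\,1,\,2p_{12}-2p_{22}\big),\\ df_4(p_{11},p_{12},p_{22},0)&=\big(0,\,0,\,kp_{22}^{k-1},\,p_{12}^k-p_{22}^k+kp_{22}^{k-1}(p_{12}-p_{22})\big).\end{aligned}$$ (43)

$$df$$ is block triangular, with $$2 \times 2$$ blocks. The lower right block has determinant $$p_{12}^k -p_{22}^k - kp_{22}^{k-1}(p_{12}-p_{22}) = D(p_{22},p_{12})$$, which is non-zero when $$p_{12} \ne p_{22}$$, that is, when $$\epsilon_0 \ne (k-1)/k$$. When $$c=0$$, $$d_1$$ and $$d_2$$ are independent of $$p_{11}$$, as are $$\frac{\partial \epsilon}{\partial c}$$ and $$\frac{\partial \tau}{\partial c}$$, so $$\frac{\partial f_2}{\partial p_{11}} = 0$$. As a result,

$$\det(df)=S_0''(p_{11})\,D(p_{22},p_{12})\,\frac{\partial f_2}{\partial p_{12}}.$$ (44)

Since $$S_0''(p_{11})$$ is never zero, and since $$D(p_{22},p_{12})$$ only vanishes when $$p_{12}=p_{22}$$ (i.e., at $$\epsilon_0=(k-1)/k$$), we need only show that $$\frac{\partial f_2}{\partial p_{12}} \ne 0$$. We compute

$$\begin{aligned}\frac{\partial\beta}{\partial p_{12}}&=\frac{2}{k}\,\frac{\left(p_{22}^{k-1}-p_{12}^{k-1}\right)\left(-S_0''(p_{12})\right)-\left(S_0'(p_{22})-S_0'(p_{12})\right)\left(-(k-1)p_{12}^{k-2}\right)}{\left(p_{22}^{k-1}-p_{12}^{k-1}\right)^2}\\ &=\frac{2}{k}\,\frac{(k-1)p_{12}^{k-2}\left(S_0'(p_{22})-S_0'(p_{12})\right)-\left(p_{22}^{k-1}-p_{12}^{k-1}\right)S_0''(p_{12})}{\left(p_{22}^{k-1}-p_{12}^{k-1}\right)^2}\end{aligned}$$ (45)

at $$c = 0$$. Since $$\alpha = S_0'(p_{22}) - k \beta d_2^{k-1}$$,

$$\frac{\partial\alpha}{\partial p_{12}}=-kd_2^{k-1}\frac{\partial\beta}{\partial p_{12}}-k(k-1)\beta d_2^{k-2}\frac{\partial d_2}{\partial p_{12}}=-kd_2^{k-1}\frac{\partial\beta}{\partial p_{12}}-k(k-1)d_2^{k-2}c\beta\;\Rightarrow\;-kp_{22}^{k-1}\frac{\partial\beta}{\partial p_{12}},$$ (46)

where $$\Rightarrow$$ denotes a limit as $$c \to 0$$. We also compute

$$\begin{aligned}\frac{\partial^2 s}{\partial c\,\partial p_{12}}&=2(1-2c)S_0'(p_{12})\;\Rightarrow\;2S_0'(p_{12}),\\ \frac{\partial^2\epsilon}{\partial c\,\partial p_{12}}&=2(1-2c)\;\Rightarrow\;2,\\ \frac{\partial^2\tau}{\partial c\,\partial p_{12}}&=k(1-2c)\left(d_1^{k-1}+d_2^{k-1}\right)\;\Rightarrow\;k\left(p_{12}^{k-1}+p_{22}^{k-1}\right).\end{aligned}$$ (47)

Finally we combine everything:

$$\begin{aligned}\left.\frac{\partial f_2}{\partial p_{12}}\right|_{c=0}&=\frac{\partial^2 s}{\partial c\,\partial p_{12}}-\frac{\partial\alpha}{\partial p_{12}}\frac{\partial\epsilon}{\partial c}-\alpha\frac{\partial^2\epsilon}{\partial c\,\partial p_{12}}-\frac{\partial\beta}{\partial p_{12}}\frac{\partial\tau}{\partial c}-\beta\frac{\partial^2\tau}{\partial c\,\partial p_{12}}\\ &=2S_0'(p_{12})-2\alpha-\beta k\left(p_{12}^{k-1}+p_{22}^{k-1}\right)+\Big(kp_{22}^{k-1}(2p_{12}-2p_{22})-\big(p_{12}^k-p_{22}^k+kp_{22}^{k-1}(p_{12}-p_{22})\big)\Big)\frac{\partial\beta}{\partial p_{12}}.\end{aligned}$$ (48)

The terms not involving $$\partial \beta/\partial p_{12}$$ all cancel, by the second equation of (37), and we are left with

$$\frac{\partial f_2}{\partial p_{12}}=-D(p_{22},p_{12})\frac{\partial\beta}{\partial p_{12}}.$$ (49)

Finally, we need to show that $$\partial \beta/\partial p_{12} \ne 0$$. Since $$p_{12}$$ maximizes $$\psi_k(p_{22},p_{12})$$ for fixed $$p_{22}$$, we must have (referring to the notation of the proof of Theorem 3.3) $$(N/D)'=0$$, or equivalently $$N'/D' = N/D$$, where we write $$\psi_k = N/D$$, as above. But $$\beta = N'/D'$$.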
If $$\partial \beta/\partial p_{12}$$ were equal to zero, then we would have $$N''/D'' = N'/D'$$. But we have previously shown that it is impossible to simultaneously have $$N/D = N'/D' = N''/D''$$, except at $$p_{12} = p_{22} = (k-1)/k$$, so $$\partial \beta/\partial p_{12}$$ must be non-zero whenever $$\epsilon_0 \ne (k-1)/k$$. This makes $$\det(df)$$ non-zero at $$(p_{11},\zeta_k(\epsilon_0), \epsilon_0,0)$$, so the solutions near this point are unique and analytic in $$(\epsilon,\tau)$$. ■

5 Theorem 1.1 for $$k$$-starlike graphs

Now suppose that $$H$$ is a $$k$$-starlike graph with $$\ell$$ edges, and with $$n_k$$ vertices of degree $$k$$, and let $$\tau$$ be the density of $$H$$ and $$\tau_k$$ be the density of $$k$$-stars. Our first result relates $$\Delta \tau := \tau - \epsilon^\ell$$ to $$\Delta \tau_k := \tau_k-\epsilon^k$$.

Lemma 5.1. If $$g$$ is an entropy-maximizing graphon for $$(\epsilon,\tau)$$ with $$\tau > \epsilon^\ell$$, then $$\Delta \tau = n_k \epsilon^{\ell-k} \Delta \tau_k + O(\Delta \tau_k^{3/2})$$. □

Proof Writing $$g(x,y) = \epsilon + \Delta g(x,y)$$, we have

$$\tau=\int {\rm d}x\,\prod g(x_i,x_j)=\int {\rm d}x\,\prod\big(\epsilon+\Delta g(x_i,x_j)\big),$$ (50)

where there is a variable $$x_i$$ for each vertex of $$H$$ and the product is over all edges in $$H$$. Expanding the product in the integrand, we get a sum of terms:

(i) The leading order term $$\epsilon^\ell$$.

(ii) Terms with one factor of $$\Delta g$$. These integrate to zero, since $$\iint \Delta g(x,y){\rm d}x\, {\rm d}y = \Delta \epsilon = 0$$.

(iii) Terms with two or more factors of $$\Delta g$$, all coming from edges that share a fixed vertex of degree $$k$$. Up to an overall power of $$\epsilon^{\ell-k}$$, these are identical to the terms of order 2 and higher in $$\Delta g$$ in the expansion of $$\Delta \tau_k$$. As such, they add up to $$\epsilon^{\ell-k} \Delta \tau_k$$. Summing over the degree-$$k$$ vertices of $$H$$ then gives $$n_k \epsilon^{\ell-k} \Delta \tau_k$$.

(iv) Terms with two or more factors of $$\Delta g$$, corresponding to edges that do not all share a vertex. For each such term, let $$\{ e_i \}$$ denote the edges corresponding to factors of $$\Delta g$$. We classify these further into three sub-cases:

(a) If one of the $$e_i$$’s is disconnected from the rest, then the term is identically zero, since $$\iint \Delta g(x,y){\rm d}x\,{\rm d}y=0$$.

(b) If $$\{ e_i \}$$ consists of two or more connected components (each with at least two edges), then the term is a power of $$\epsilon$$ times the product of integrals, one for each connected component. However, each such integral is $$O(\|\Delta g\|^2)$$, so the term is $$O(\|\Delta g\|^4)$$.

(c) If there is a single connected component whose edges do not all share a vertex, then $$\{e_i\}$$ must contain three edges that either form a chain or a triangle. We bound such a term by taking absolute values of the $$\Delta g$$’s for the three edges and replacing all other factors of $$\Delta g$$ by 1. The resulting bound is a power of $$\epsilon$$ times either $$\iiiint |\Delta g(w,x)| |\Delta g(x,y)| |\Delta g(y,z)| {\rm d}w\, {\rm d}x \, {\rm d}y \, {\rm d}z$$ for a chain or $$\iiint |\Delta g(x,y)| |\Delta g(y,z)| |\Delta g(z,x)|{\rm d}x \, {\rm d}y \, {\rm d}z$$ for a triangle, either of which is bounded by that power of $$\epsilon$$ times $$\| \Delta g \|^3$$.

Thus $$\Delta \tau = n_k \epsilon^{\ell-k} \Delta \tau_k + O(\|\Delta g\|^3)$$. Since $$g$$ is entropy maximizing, $$\Delta \tau_k$$ goes as $$\|\Delta g\|^2$$, so the error is $$O(\Delta \tau_k^{3/2})$$. ■

5.1 Proof of Theorem 1.1

Since $$\Delta \tau$$ is proportional to $$\Delta \tau_k$$ (plus small errors), the problem of optimizing $$\Delta s/\Delta \tau$$ is a small perturbation of the problem of optimizing $$\Delta s/ \Delta \tau_k$$, or equivalently optimizing $$\Delta s$$ for fixed $$\Delta \tau_k$$, which we solved in Theorem 4.1.
Since that problem has a unique optimizer, any optimizer for $$\Delta s/\Delta \tau$$ must come close to optimizing $$\Delta s/\Delta \tau_k$$, and so must be close to the bipodal graphon derived in Theorem 4.1. We can thus write $$g = g_b + \Delta g_f$$, as in the last steps of the proof of Theorem 4.1, where $$g_b = \epsilon + \Delta g_b$$ is a bipodal graphon with $$p_{22} = \epsilon +o(1)$$ and $$p_{12} = \zeta_k(\epsilon) +o(1)$$ and where $$\Delta g_f$$ is a function that averages to zero on each quadrant of $$g_b$$. We again use the convention that words like “small” and “close to” and “negligible” refer to quantities which tend to zero as $$\Delta\tau:=\tau-\epsilon^{\ell}$$ tends to zero. A quantity is “nearly constant” if it is constant up to an $$o(1)$$ correction.

Lemma 5.2. The function $$\Delta g_f$$ is pointwise small. That is, as $$\tau \to \epsilon^\ell$$, $$\Delta g_f$$ goes to zero in sup-norm. □

Proof of Lemma. Since we are no longer in the setting where the entropy maximizer is proven to be multipodal, we cannot use the equations (25) directly. However, we can still apply the method of Lagrange multipliers to pointwise variations of the graphon. (See [6] for a rigorous justification.) These variational equations are

$$\frac12\ln\left(\frac{1}{g(x,y)}-1\right)=\frac{\delta s}{\delta g(x,y)}=\alpha+\beta\frac{\delta\tau}{\delta g(x,y)}.$$ (51)

We need to compute $$\delta \tau/\delta g$$ and show that it is nearly constant on each quadrant. Since $$\alpha$$ and $$\beta$$ are constants, (51) would then imply that $$g(x,y)$$ is nearly constant on each quadrant, and hence that $$\Delta g_f$$ is pointwise small.

Let $$g_0(x,y)\equiv\epsilon$$. Since $$\| \Delta g \|$$ is small (where $$\Delta g =g-g_0 = \Delta g_b + \Delta g_f$$), we can find a small constant $$a=o(1)$$ such that, for all $$x$$ outside a set $$U\subset[0,1]$$ of measure $$a$$, $$\int_0^1 |\Delta g(x,y)|{\rm d}y < a$$.
(This set $$U$$ is essentially what we previously called the Type I clusters, but at this stage of the argument we are not assuming a multipodal structure. Rather, we are just using the fact that $$\tau - \epsilon^\ell = O(\| \Delta g\|^2)$$.)

The functional derivative $$\delta \tau/\delta g(x,y)$$ has a diagrammatic expansion similar to the expansion of $$\tau$$ in (50). For each edge of $$H$$, we get a contribution by deleting the edge, assigning the values $$x$$ and $$y$$ to the endpoints of the edge, and integrating over the values of all other vertices. Since $$U$$ is small, we can estimate $$\delta \tau/\delta g$$ to within $$o(1)$$ by restricting the integral to $$(U^c)^{v-2}$$, where $$v$$ is the number of vertices in $$H$$ and $$U^c$$ is the complement of $$U$$. This implies that terms involving $$\Delta g$$ can only contribute non-negligibly on edges connected to $$x$$ or to $$y$$. Furthermore, they can only contribute non-negligibly when attached to $$x$$ if $$x \in U$$, and can only contribute non-negligibly when attached to $$y$$ if $$y \in U$$.

We now begin a bootstrap argument. We will show that $$\delta \tau/\delta g$$ is nearly constant on each quadrant $$U^c\times U^c$$, $$U\times U^c$$, $$U\times U$$ in turn. This will show that $$g$$ is nearly constant on that quadrant, which will help us prove that $$\delta \tau/\delta g$$ is nearly constant on the next quadrant.

The simplest case is when $$x$$ and $$y$$ are both in $$U^c$$. Then the contributions of the terms involving $$\Delta g$$ are negligible, so $$\delta \tau/\delta g(x,y)$$ can be computed, to within a small error, using the approximation $$g \approx g_0$$. But when $$\Delta g$$ is negligible, $$\delta \tau/\delta g(x,y)$$ is nearly independent of $$x$$ and $$y$$. Since $$\delta \tau/\delta g(x,y)$$ is nearly constant on $$U^c \times U^c$$, equation (51) implies that $$g$$ is nearly constant on $$U^c \times U^c$$.

Next suppose that $$y \in U^c$$ and $$x \in U$$.
Then all contributions from factors of $$\Delta g(z,y)$$ are negligible, so $$\delta \tau/\delta g(x,y)$$ is nearly independent of $$y$$. But then $$g(x,y)$$ is nearly independent of $$y$$, and is nearly equal to $$d(x)$$. The integrals involved in computing $$\delta \tau/\delta g(x,y)$$ are then easily approximated to within $$o(1)$$, using $$g_0 + \Delta g$$ on the edges connected to $$x$$, $$g_0$$ on all other edges, and only integrating over $$(U^c)^{v-2}$$. If the vertex of $$H$$ assigned to $$x$$ has degree $$k$$, then the edges connected to $$x$$ contribute $$d(x)^{k-1} \epsilon^{\ell-k}$$. Summing over edges, and symmetrizing over the assignment of $$x$$ and $$y$$ to the two endpoints, we obtain the approximation

$$\frac{\delta\tau}{\delta g(x,y)}=\frac{k\,n_k\,\epsilon^{\ell-k}}{2}\left(d(x)^{k-1}+d(y)^{k-1}\right)+o(1).$$ (52)

Up to an overall factor of $$n_k \epsilon^{\ell-k}$$, this is the same functional derivative as for a $$k$$-star. The approximation also holds if $$x$$ and $$y$$ are both in $$U^c$$, in which case $$d(x) \approx \epsilon$$, and, by symmetry, if $$x \in U^c$$ and $$y \in U$$. In other words, we can use the approximation (52) in (51) whenever either $$x$$ or $$y$$ (or both) is in $$U^c$$.

This implies that the integrated equations (26) apply for all $$x$$ (with $$d_i$$ replaced by $$d(x)$$, and with $$\beta$$ scaled up by $$n_k \epsilon^{\ell-k}$$). Following the exact same argument as in the proof of Theorem 4.1, we obtain that $$d(x)$$ only takes on two possible values (up to $$o(1)$$ errors), namely $$\epsilon$$ and $$\zeta_k(\epsilon)$$. We then define Types I and II points, depending on whether the degree function is close to $$\zeta_k(\epsilon)$$ or $$\epsilon$$, respectively, and can take $$U$$ to be precisely the set of Type I points. Our graphon is then nearly constant on $$U \times U^c$$ and $$U^c \times U$$, as well as on $$U^c \times U^c$$.

We still need to show that the graphon is nearly constant on $$U \times U$$. Suppose that $$x$$ and $$y$$ are in $$U$$.
Since $$g(x,z)$$ is nearly independent of $$x$$ for $$z$$ in $$U^c$$, and since $$\delta \tau/\delta g(x,y)$$ is computed to within $$o(1)$$ by integrating over $$(U^c)^{v-2}$$, $$\delta \tau/\delta g(x,y)$$ is nearly independent of $$x \in U$$, and likewise nearly independent of $$y \in U$$. But then $$g(x,y)$$ is nearly constant on $$U \times U$$. Note, by the way, that the approximation (52) does not apply on $$U \times U$$; in that case $$\delta \tau/\delta g$$ contains terms with powers of both $$d(x)$$ and $$d(y)$$. However, that approximation is not needed for our proof, since $$U \times U$$ (aka the $$I$$-$$I$$ quadrant) only contributes $$O(c)$$ to the integrated equations (26). ■

Returning to the proof of Theorem 1.1, we need to compare $$s(g_b + \Delta g_f) - s(g_b)$$ to $$\tau(g_b+\Delta g_f)-\tau(g_b)$$. As before, we expand $$\tau(g)$$ as the integral of a polynomial in $$g$$, obtained by assigning $$g_0 + \Delta g_b + \Delta g_f$$ to each edge of $$H$$ and integrating. The difference between $$\tau(g_b + \Delta g_f)$$ and $$\tau(g_b)$$ consists of terms with at least one $$\Delta g_f$$. However, the terms with exactly one $$\Delta g_f$$ are identically zero, since $$g_b$$ is constant on quadrants, and $$\Delta g_f$$ averages to zero on each quadrant. Furthermore, terms for which all of the $$\Delta g_b$$’s and $$\Delta g_f$$’s share a vertex are exactly what we would get from the approximation $$\Delta \tau \approx n_k \epsilon^{\ell-k}\Delta\tau_k$$. Any term that distinguishes between $$\Delta \tau$$ and $$n_k \epsilon^{\ell-k} \Delta \tau_k$$ must have at least two $$\Delta g_f$$’s and either a third $$\Delta g_f$$ or a $$\Delta g_b$$, forming either a 3-chain, a triangle, or two connected $$\Delta g_f$$’s and a disconnected $$\Delta g_b$$.

Let $$\Delta g_f'(x,y) = |\Delta g_f(x,y)|$$, and let

$$\Delta g_b'(x,y)=\begin{cases}2c & x,y\in II,\\ 1 & \text{otherwise}.\end{cases}$$ (53)

This is conveniently expressed in terms of outer products.
Let $$| 1 \rangle \in L^2([0,1])$$ be the constant function 1, and let $$|\omega \rangle$$ be the function

$$\omega(x)=\begin{cases}0 & x<c,\\ 1 & x>c.\end{cases}$$ (54)

Then

$$\Delta g_b'=|1\rangle\langle 1|-|\omega\rangle\langle\omega|+2c\,|\omega\rangle\langle\omega|=|1\rangle\langle 1-\omega|+|1-\omega\rangle\langle\omega|+2c\,|\omega\rangle\langle\omega|.$$ (55)

Note that $$|\Delta g_b(x,y)| \le \Delta g_b'(x,y)$$ for all $$x,y \in (0,1)$$. To see this, the only issue is what happens when $$(x,y)$$ is in the $$II$$-$$II$$ quadrant, since otherwise we trivially have $$|\Delta g_b| \le 1$$. Since $$\epsilon(g)$$ is fixed, $$(1-c)^2$$ times $$\Delta g_b(x,y)$$ for $$x,y > c$$ equals minus the integral of $$\Delta g_b$$ over the other three quadrants. But the area of those three quadrants is $$2c-c^2 < 2c$$, and the biggest possible value of $$|\Delta g_b|$$ is $$\max(\epsilon,1-\epsilon)<1$$, so $$\frac{1}{(1-c)^2} \int |\Delta g_b|$$ (integrated over the $$I$$-$$I$$, $$I$$-$$II$$, and $$II$$-$$I$$ quadrants) is strictly less than $$2c+O(c^2)$$, and so is bounded by $$2c$$ for small $$c$$ (note that $$O(c^2)$$ errors are negligible).

We obtain upper bounds on the contributions of the relevant terms in the expansion of $$\tau$$ by replacing three $$\Delta g_f(x,y)$$’s and $$\Delta g_b(x,y)$$’s with $$\Delta g_f'(x,y)$$ and $$\Delta g_b'(x,y)$$, respectively, and replacing all other terms with $$1$$. Since all graphons are symmetric, hence Hermitian, their operator norms are bounded by their $$L^2$$ norms, so for any 3-chain

$$\langle 1|\Delta g_1'\,\Delta g_2'\,\Delta g_3'|1\rangle\le\|\Delta g_1'\|\,\|\Delta g_2'\|\,\|\Delta g_3'\|.$$ (56)

Since $$\| \Delta g_b' \|$$ and $$\| \Delta g_f'\|$$ are both $$o(1)$$ (more precisely, $$O(\sqrt{\tau-\epsilon^\ell})$$), the contribution of any 3-chain is bounded by an $$o(1)$$ constant times $$\| \Delta g_f \|^2$$. As for triangles, $${\rm Tr}(\Delta g_f'^3) \le \| \Delta g_f' \|^{3} = \| \Delta g_f \|^3$$. Finally, we must estimate the trace of $$\Delta g_f' \Delta g_f' \Delta g_b'$$. But this trace is

$$\langle 1-\omega|\Delta g_f'\,\Delta g_f'|1\rangle+\langle\omega|\Delta g_f'\,\Delta g_f'|1-\omega\rangle+2c\,\langle\omega|\Delta g_f'\,\Delta g_f'|\omega\rangle.$$ (57)

Since $$\| 1 - \omega\| =\sqrt{c}$$ and $$\|\omega\|\le\|1\|=1$$, the total is bounded by $$(2\sqrt{c} + 2c) \| \Delta g_f\|^2$$.
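The estimate of the trace (57) uses only the Cauchy–Schwarz inequality and the operator-norm bound behind (56). As a sketch, writing $$\|\cdot\|_{\rm op}$$ for the operator norm (notation introduced here for illustration), each term has the form

$$\langle u|\Delta g_f'\,\Delta g_f'|v\rangle\le\|u\|\,\|\Delta g_f'\|_{\rm op}^2\,\|v\|\le\|u\|\,\|v\|\,\|\Delta g_f\|^2,$$

with the norms of the vectors given by

$$\|1\|=1,\qquad \|\omega\|=\sqrt{1-c}\le 1,\qquad \|1-\omega\|=\left(\int_0^c 1\,{\rm d}x\right)^{1/2}=\sqrt{c}.$$

The first two terms of (57) are therefore each at most $$\sqrt{c}\,\|\Delta g_f\|^2$$, and the third is at most $$2c\,\|\Delta g_f\|^2$$. This crude count gives a total of at most $$(2\sqrt{c}+2c)\|\Delta g_f\|^2$$, which is $$o(1)\,\|\Delta g_f\|^2$$ as $$c\to 0$$; that is all the subsequent argument requires.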
The upshot is that the ratio of $$s(g_b + \Delta g_f) - s(g_b)$$ and $$\tau(g_b+\Delta g_f)-\tau(g_b)$$ is the same as that computed for $$k$$-stars (up to an overall factor of $$n_k \epsilon^{\ell-k}$$), plus an $$o(1)$$ correction. But that ratio was bounded by a constant $$\beta_0 < \beta$$. Restricting attention to values of $$\tau$$ for which the correction is smaller than $$(\beta-\beta_0)/2$$, we still obtain the result that having a non-zero $$\Delta g_f$$ is a less efficient way of generating additional $$\tau$$ than simply changing $$c$$. Thus the optimizing graphon is exactly bipodal.

Once bipodality is established, uniqueness follows exactly as in the proof of Theorem 4.1. The difference between $$\Delta \tau$$ and $$n_k \epsilon^{\ell-k} \Delta \tau_k$$ is of order $$c^{3/2}$$, and so does not affect the linearization of the optimality equations at $$c=0$$.

6 Linear combinations of $$k$$-stars

We proved Theorem 1.1 by first showing that $$k$$-star models have the desired behavior, and then showing that, for an arbitrary $$k$$-starlike graph $$H$$, $$\Delta \tau$$ is well-approximated by a multiple of $$\Delta \tau_k$$, so the model with densities of edges and $$H$$ behaves essentially the same as a model with densities of edges and $$k$$-stars. To prove Theorem 1.2, we consider in this section a family of models in which we can prove bipodality and uniqueness of entropy maximizers directly, as we did for $$k$$-stars. In the next section, we will show how to approximate a model with an arbitrary $$H$$ with a model in this family.

Let $$h(x) = \sum_{k\ge 1} a_k x^k$$ be a polynomial with non-negative coefficients and degree $$\ge 2$$. Let $$\tau = \sum a_k \tau_k$$, and consider graphs with fixed edge density $$\epsilon$$ and fixed $$\tau$$. In [6] it was proved that the entropy-maximizing graphons in such models are always multipodal.
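To make the constraint $$\tau = \sum a_k \tau_k$$ concrete, here is a minimal sketch that evaluates the edge density and $$\tau$$ for a bipodal graphon with cluster sizes $$c$$ and $$1-c$$. It assumes that the $$k$$-star density of a graphon is the integral of the $$k$$-th power of its degree function; the function name and the dictionary representation of $$h$$ are illustrative choices, not notation from the text.

```python
def bipodal_densities(p11, p12, p22, c, coeffs):
    """Return (edge density, tau, h(edge density)) for a bipodal graphon.

    coeffs maps k -> a_k, so that h(x) = sum_k a_k * x**k.
    The graphon equals p11 on the small-small quadrant, p12 on the
    mixed quadrants, and p22 on the large-large quadrant.
    """

    def h(x):
        return sum(a * x ** k for k, a in coeffs.items())

    # Degree function on each cluster.
    d1 = c * p11 + (1.0 - c) * p12
    d2 = c * p12 + (1.0 - c) * p22
    # Edge density is the average degree.
    eps = c * d1 + (1.0 - c) * d2
    # Assumed k-star density: integral of d(x)**k, summed with weights a_k.
    tau = c * h(d1) + (1.0 - c) * h(d2)
    return eps, tau, h(eps)
```

Because $$h$$ is convex on $$[0,1]$$ when its coefficients are non-negative, Jensen's inequality gives $$\tau \ge h(\epsilon)$$ for every choice of bipodal data, with equality for a constant graphon. That is the sense in which these graphons sit on or above the curve $$\tau = h(\epsilon)$$.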
Most of the analysis of $$k$$-star models carries over to positive linear combinations, and so will only be sketched briefly. We will provide complete details where the arguments differ. In analogy to the notation of the proof of Theorem 3.3, let $$\psi(\epsilon, \tilde \epsilon) = N/D$$, where

$$N(\epsilon,\tilde\epsilon) = 2\left[S_0(\tilde\epsilon) - S_0(\epsilon) - (\tilde\epsilon-\epsilon)S_0'(\epsilon)\right], \qquad D(\epsilon,\tilde\epsilon) = h(\tilde\epsilon) - h(\epsilon) - (\tilde\epsilon-\epsilon)h'(\epsilon). \tag{58}$$

Since $$h''(x)$$ is positive for $$x>0$$, $$D$$ is only zero when $$\tilde \epsilon=\epsilon$$, and we fill in that removable singularity in $$\psi$$ by defining $$\psi(\epsilon,\epsilon) = 2 S_0''(\epsilon)/h''(\epsilon)$$.

Theorem 6.1. For all but finitely many values of $$\epsilon$$, there is a $$\tau_0 > h(\epsilon)$$ such that, for $$\tau \in (h(\epsilon), \tau_0)$$, the entropy-optimizing graphon is bipodal and unique, with data varying analytically with $$\epsilon$$ and $$\tau$$. As $$\tau$$ approaches $$h(\epsilon)$$ from above, $$p_{22} \to \epsilon$$, $$p_{12}$$ approaches a point $$\tilde \epsilon$$ where $$\psi'(\epsilon,\tilde \epsilon)=0$$, $$p_{11}$$ satisfies $$S_0'(p_{11})=2S_0'(p_{12}) - S_0'(p_{22})$$, and $$c \to 0$$ as $$O(\Delta \tau)$$. □

Proof. For a multipodal graphon, $$\tau(g) = \sum c_i h(d_i)$$. After eliminating $$\gamma$$, the optimality equations become

$$S_0'(p_{ij}) = \alpha + \beta\left(h'(d_i) + h'(d_j)\right)/2, \tag{59}$$

$$2\sum_{j=1}^{M} c_j\left(S_0(p_{ij}) - S_0(p_{Mj})\right) = 2\alpha(d_i - d_M) + \beta\Big[h(d_i) - h(d_M) + \sum_{j=1}^{M} c_j h'(d_j)(p_{ij} - p_{Mj})\Big]. \tag{60}$$

As before, we distinguish between Type I clusters that are small and Type II clusters that have $$d_i \approx \epsilon$$. Summing the optimality equations over $$j$$ of Type II, and approximating $$d_j$$ by $$\epsilon$$, we obtain the equations

$$S_0'(d_i) = \alpha + \beta\left(h'(d_i) + h'(\epsilon)\right)/2 + o(1), \tag{61}$$

$$2\left[S_0(d_i) - S_0(\epsilon)\right] = 2\alpha(d_i - \epsilon) + \beta\left[h(d_i) - h(\epsilon) + h'(\epsilon)(d_i - \epsilon)\right] + o(1). \tag{62}$$

We use the first equation, with $$i=M$$ (a Type II cluster), to solve for $$\alpha$$, giving $$\alpha = S_0'(\epsilon) - \beta h'(\epsilon) + o(1)$$, and plug it into the equations for $$i<M$$ to get

$$2\left(S_0'(d_i) - S_0'(\epsilon)\right) = \beta\left(h'(d_i) - h'(\epsilon)\right) + o(1), \tag{63}$$

$$2\left[S_0(d_i) - S_0(\epsilon) - S_0'(\epsilon)(d_i - \epsilon)\right] = \beta\left[h(d_i) - h(\epsilon) - h'(\epsilon)(d_i - \epsilon)\right] + o(1). \tag{64}$$

In the notation of (58), equations (63) and (64) say that $$N' = \beta D'$$ and $$N = \beta D$$ at $$\tilde \epsilon = d_i$$, up to $$o(1)$$. As before in the proof of Theorem 4.1, this implies that either $$d_i \approx \epsilon$$ or that $$\psi(\epsilon, d_i)$$ is maximized with respect to $$d_i$$. Unlike in the $$k$$-star case, it is not true that $$\psi'(\epsilon,\tilde \epsilon)=0$$ has a unique solution for each $$\epsilon$$. However, it remains true that $$\psi(\epsilon,\tilde \epsilon)$$ has a unique global maximizer (w.r.t. $$\tilde \epsilon$$) for all but finitely many values of $$\epsilon$$. Since the equations defining multiple maxima are analytic, they must be satisfied either for all $$\epsilon$$ or for only finitely many $$\epsilon$$. But it is straightforward to check that there is only one maximizer when $$\epsilon$$ is sufficiently small, since then $$h(\epsilon)$$ and $$h'(\epsilon)$$ are dominated by the lowest order term in the polynomial.

Thus, for all but finitely many values of $$\epsilon$$, the values of $$d_i$$ must all either approximate $$\epsilon$$ or the unique value of $$\tilde \epsilon$$ that maximizes $$\psi(\epsilon, \tilde \epsilon)$$. This allows for a re-segregation of the clusters into Type I (with $$d_i$$ close to $$\tilde \epsilon$$) and Type II (with $$d_i$$ close to $$\epsilon$$) and yields a graphon that is approximately bipodal. Step 2 of the proof of Theorem 4.1, proving that the optimizing graphon is exactly bipodal with data of the desired form, then proceeds exactly as before.

What remains is showing that the optimizing graphon is unique by linearizing the exact optimality equations for bipodal graphons near $$c=0$$. These equations are:

$$\begin{aligned} S_0'(p_{11}) &= \alpha + \beta h'(d_1), \\ S_0'(p_{12}) &= \alpha + \beta\left(h'(d_1) + h'(d_2)\right)/2, \\ S_0'(p_{22}) &= \alpha + \beta h'(d_2), \\ \frac{\partial S}{\partial c} &= \alpha\frac{\partial \epsilon}{\partial c} + \beta\frac{\partial \tau}{\partial c}, \\ \epsilon &= \epsilon_0, \qquad \tau = \tau_0. \end{aligned} \tag{65}$$

Using the second and third equations to solve for $$\alpha$$ and $$\beta$$ gives:

$$\alpha = \frac{2h'(d_2)S_0'(p_{12}) - S_0'(p_{22})\left(h'(d_2) + h'(d_1)\right)}{h'(d_2) - h'(d_1)}, \qquad \beta = \frac{2\left(S_0'(p_{22}) - S_0'(p_{12})\right)}{h'(d_2) - h'(d_1)}. \tag{66}$$

We also have $$\alpha = S_0'(p_{22})-\beta h'(d_2)$$ and $$S_0'(p_{11}) = 2S_0'(p_{12})-S_0'(p_{22})$$.
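The function $$\psi(\epsilon,\tilde\epsilon) = N/D$$ of (58) is easy to explore numerically. The sketch below is illustrative only: it assumes $$S_0(x) = -\tfrac12[x\ln x + (1-x)\ln(1-x)]$$ (consistent with the second derivative quoted in (73)), represents $$h$$ by a hypothetical dictionary of coefficients, and locates the maximizer by a plain grid search rather than by solving $$\psi'=0$$.

```python
import math


def s0(x):
    """Assumed entropy density S_0(x) = -(1/2)[x ln x + (1-x) ln(1-x)]."""
    return -0.5 * (x * math.log(x) + (1.0 - x) * math.log(1.0 - x))


def s0_prime(x):
    return 0.5 * math.log((1.0 - x) / x)


def s0_second(x):
    return -0.5 / (x * (1.0 - x))


def psi(eps, eps_t, coeffs):
    """psi(eps, eps_t) = N / D as in (58); coeffs maps k -> a_k."""

    def h(x):
        return sum(a * x ** k for k, a in coeffs.items())

    def h1(x):
        return sum(k * a * x ** (k - 1) for k, a in coeffs.items())

    def h2(x):
        return sum(k * (k - 1) * a * x ** (k - 2) for k, a in coeffs.items() if k >= 2)

    if abs(eps_t - eps) < 1e-9:
        # Removable singularity filled in as in the text.
        return 2.0 * s0_second(eps) / h2(eps)
    n = 2.0 * (s0(eps_t) - s0(eps) - (eps_t - eps) * s0_prime(eps))
    d = h(eps_t) - h(eps) - (eps_t - eps) * h1(eps)
    return n / d


def argmax_psi(eps, coeffs, steps=2000):
    """Grid search for the maximizer of psi(eps, .) on (0, 1)."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda t: psi(eps, t, coeffs))
```

For the 2-star model, `argmax_psi(0.3, {2: 1.0})` lands near $$0.7 = 1-\epsilon$$, which is the critical point of $$\psi_2$$ identified in the appendix. A grid search cannot by itself detect the exceptional values of $$\epsilon$$ where the global maximum is attained twice.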
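Note that

$$\frac{\partial \alpha}{\partial p_{12}} = -\beta c\, h''(d_2) - h'(d_2)\frac{\partial \beta}{\partial p_{12}} \;\to\; -h'(p_{22})\frac{\partial \beta}{\partial p_{12}} \tag{67}$$

as $$c \searrow 0$$. We define $$f(p_{11},p_{12},p_{22},c)=(f_1,f_2,f_3,f_4)$$ as before, with $$f_3=\epsilon$$ and $$f_4=\tau$$, and compute

$$\begin{aligned} df_3 &= \left(c^2,\; 2c(1-c),\; (1-c)^2,\; 2cp_{11} + 2(1-2c)p_{12} - 2(1-c)p_{22}\right) \to \left(0,\,0,\,1,\,2(p_{12}-p_{22})\right), \\ df_4 &= \left(c^2h'(d_1),\; c(1-c)(h'(d_1)+h'(d_2)),\; (1-c)^2h'(d_2),\; h(d_1)-h(d_2)+ch'(d_1)(p_{11}-p_{12})+(1-c)h'(d_2)(p_{12}-p_{22})\right) \\ &\to \left(0,\,0,\,h'(p_{22}),\,h(p_{12})-h(p_{22})+h'(p_{22})(p_{12}-p_{22})\right). \end{aligned} \tag{68}$$

The lower right block of $$df$$ then gives a contribution of $$h(p_{12})-h(p_{22}) + h'(p_{22}) (p_{12}-p_{22}) - 2h'(p_{22})(p_{12}-p_{22}) = h(p_{12})-h(p_{22}) - h'(p_{22})(p_{12}-p_{22})=D(p_{22},p_{12})$$. As before, $$\frac{\partial f_2}{\partial p_{11}} = 0$$ when $$c=0$$, so $$\det(df) = S_0''(p_{11})\left(h(p_{12})-h(p_{22}) - h'(p_{22}) (p_{12}-p_{22})\right) \frac{\partial f_2}{\partial p_{12}}.$$ Now

$$\frac{\partial f_2}{\partial p_{12}} = \frac{\partial^2 S}{\partial c\,\partial p_{12}} - \alpha\frac{\partial^2 \epsilon}{\partial c\,\partial p_{12}} - \beta\frac{\partial^2 \tau}{\partial c\,\partial p_{12}} - \frac{\partial \alpha}{\partial p_{12}}\frac{\partial \epsilon}{\partial c} - \frac{\partial \beta}{\partial p_{12}}\frac{\partial \tau}{\partial c}. \tag{69}$$

Since $$\alpha$$ and $$\beta$$ are independent of $$c$$, the first three terms are

$$\frac{\partial}{\partial c}\left(\frac{\partial S}{\partial p_{12}} - \alpha\frac{\partial \epsilon}{\partial p_{12}} - \beta\frac{\partial \tau}{\partial p_{12}}\right) = \frac{\partial}{\partial c}(0) = 0, \tag{70}$$

by the second equation of (65). This leaves

$$\frac{\partial f_2}{\partial p_{12}} = \Big(h'(p_{22})(2p_{12}-2p_{22}) - \left(h(p_{12})-h(p_{22})+h'(p_{22})(p_{12}-p_{22})\right)\Big)\frac{\partial \beta}{\partial p_{12}} = -D(p_{22},p_{12})\frac{\partial \beta}{\partial p_{12}}. \tag{71}$$

Combining with our earlier results, we have:

$$\det(df) = -S_0''(p_{11})\,D(p_{22},p_{12})^2\,\frac{\partial \beta}{\partial p_{12}}. \tag{72}$$

The expression $$D(p_{22},p_{12}) = h(p_{12})-h(p_{22}) - h'(p_{22}) (p_{12}-p_{22})$$ has a double root at $$p_{12}=p_{22}$$ and is non-zero elsewhere, thanks to the monotonicity of $$h'$$.

As a last step, we consider when $$\frac{\partial \beta}{\partial p_{12}}$$ can be zero. Since $$\beta = N'/D'$$, we are interested in when $$(N'/D')'=0$$. But that is equivalent to having $$N''/D'' = N'/D'$$. Since we already have $$N/D=N'/D'$$, this means that $$\psi''=(N/D)''=0$$. Since we are looking at the value of $$\tilde \epsilon$$ that maximizes $$\psi$$, having $$\psi'=\psi''=0$$ would imply $$\psi'''=0$$ (or else $$\tilde \epsilon$$ would only be a point of inflection, and not a local maximum).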
But if $$(N/D)'=(N/D)''=(N/D)'''=0$$, then $$N/D=N'/D' = N''/D'' = N'''/D'''$$. Note that $$N''$$, $$N'''$$, $$D''$$, and $$D'''$$ are functions of $$\tilde \epsilon$$ only, and are rational functions:

$$N'' = 2S_0''(\tilde\epsilon) = -\frac{1}{\tilde\epsilon} - \frac{1}{1-\tilde\epsilon}, \qquad N''' = 2S_0'''(\tilde\epsilon) = \frac{1}{\tilde\epsilon^2} - \frac{1}{(1-\tilde\epsilon)^2}, \qquad D'' = h''(\tilde\epsilon), \qquad D''' = h'''(\tilde\epsilon). \tag{73}$$

Setting $$D''N'''=D'''N''$$ gives a polynomial equation for $$\tilde \epsilon$$, which has only finitely many roots. Since the equation $$\psi'=0$$ is symmetric in $$\epsilon$$ and $$\tilde \epsilon$$, $$\tilde \epsilon$$ determines $$\epsilon$$, so there are only finitely many values of $$\epsilon$$ for which $$\frac{\partial \beta}{\partial p_{12}}$$ is zero.

In summary, we exclude the finitely many values of $$\epsilon$$ for which $$\psi$$ achieves its maximum more than once, and the finitely many values of $$\epsilon$$ for which $$\frac{\partial \beta}{\partial p_{12}}=0$$. For all other values of $$\epsilon$$, the optimizing graphon is bipodal of the prescribed form and unique. ■

7 Proof of Theorem 1.2

The proof has three steps.

Step 1. Showing that, for fixed $$\epsilon$$, $$\Delta \tau$$ can be approximated by the change in a positive linear combination of $$\tau_k$$’s.

Step 2. Defining a set $$B_H \subset (0,1)$$ of “bad values,” determined by analytic equations, such that for all $$\epsilon \not \in B_H$$ and for $$\tau$$ close enough to $$\epsilon^\ell$$, the optimizing graphon is unique and bipodal and of the desired form.

Step 3. Showing that $$B_H$$ is finite.

Step 1. This is a repetition of the proof of Lemma 5.1. In the expansion of $$\Delta \tau$$, we get a contribution $$n_k \epsilon^{\ell-k} \Delta \tau_k$$ from diagrams where all the edges associated with $$\Delta g$$ are connected to a vertex of degree $$k$$, where $$n_k$$ is the number of vertices of $$H$$ of degree $$k$$. Summing over $$k$$, and bounding the remaining terms by $$O(\| \Delta g\|^3)$$, as before, we have

$$\Delta \tau = \sum_k n_k \epsilon^{\ell-k} \Delta \tau_k + O(\Delta \tau^{3/2}). \tag{74}$$

Step 2.
For fixed $$\epsilon$$, we consider a model whose density is $$\sum_k n_k \epsilon^{\ell-k} \tau_k$$. As long as $$\psi(\epsilon,\tilde \epsilon)$$ for this model achieves its maximum at a unique value of $$\tilde \epsilon$$, and as long as $$\partial \beta/\partial p_{12} \ne 0$$ when $$p_{12}$$ equals this value of $$\tilde\epsilon$$, the proofs of Theorems 1.1 and 6.1 carry over almost verbatim. That is, the model problem has a unique bipodal maximizer by the reasoning of Theorem 6.1. The entropy maximizer for the actual problem involving $$H$$ must approximate the entropy maximizer for the model problem, and in particular must be approximately bipodal, and so can be written as $$g_b + \Delta g_f$$, where $$\Delta g_f$$ averages to zero on each quadrant. The same arguments as in the proof of Theorem 1.1 show that $$\Delta g_f$$ is pointwise small. By a power series expansion, $$\frac{s(g_b + \Delta g_f)-s(g_b)}{\tau(g_b + \Delta g_f)-\tau(g_b)}<\beta$$, so for small $$c$$ we can increase the entropy by setting $$\Delta g_f$$ to zero and varying the bipodal data to achieve the correct value of $$\tau$$.

Step 3. For any fixed choice of the coefficients $$a_k$$, the model problem has only a finite number of bad values of $$\epsilon$$, but this is not enough to prove that $$B_H$$ is finite, because the coefficients themselves vary with $$\epsilon$$. Rather

$$B_H = \{\epsilon \mid \epsilon \text{ is one of the bad points for the model with } a_k = n_k \epsilon^{\ell-k}\}, \tag{75}$$

where a value of $$\epsilon$$ is bad for a model if either $$\psi$$ has multiple maxima or if $$\partial \beta/\partial p_{12}=0$$. Since the bad points for any linear combination of $$k$$-stars depend analytically on the coefficients of that linear combination, and since these coefficients are powers of $$\epsilon$$, the set $$B_H$$ is cut out by analytic equations in $$\epsilon$$. As such, $$B_H$$ is either the entire interval $$(0,1)$$, or a finite set, or a countable set with limit points only at 0 and/or 1. We will show that neither $$0$$ nor $$1$$ is a limit point of $$B_H$$, implying that $$B_H$$ is finite.
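The coefficients $$a_k = n_k \epsilon^{\ell-k}$$ of the model problem depend only on the degree sequence of $$H$$ and on $$\epsilon$$. A minimal sketch of that bookkeeping follows; the function name and the list-of-degrees input format are assumptions made for illustration, and degree-1 vertices are dropped to match the sum from $$k=2$$ used for the model polynomial $$h$$.

```python
from collections import Counter


def star_coefficients(degrees, eps):
    """Return {k: n_k * eps**(ell - k)} for the vertices of degree k >= 2.

    degrees: the degree of each vertex of H (hypothetical input format).
    ell:     the number of edges, i.e. half the sum of the degrees.
    """
    total = sum(degrees)
    if total % 2 != 0:
        raise ValueError("the degrees of a graph must sum to an even number")
    ell = total // 2
    counts = Counter(degrees)  # n_k = number of vertices of degree k
    return {k: n_k * eps ** (ell - k) for k, n_k in sorted(counts.items()) if k >= 2}
```

For a triangle the degree sequence is `[2, 2, 2]`, so $$\ell = 3$$ and the only coefficient is $$a_2 = 3\epsilon$$. At $$\epsilon = 0.4$$ this gives $$a_2 = 1.2$$, so a 2-star excess of $$0.002$$ corresponds to a triangle excess of about $$0.0024$$ at leading order.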
Let $$k_{\rm max}$$ be the largest degree of any vertex in $$H$$, and consider the model problem with $$h(x) = \sum_{k=2}^{k_{\rm max}} a_k x^k$$, where $$a_k = n_k \epsilon^{\ell - k}$$. We begin with some constraints on the values of $$\tilde \epsilon$$ for which $$\psi'=0$$.

Lemma 7.1. Suppose that $$\psi'(\epsilon,\tilde \epsilon)=0$$. If $$\tilde \epsilon=\epsilon$$, or if $$\partial \beta/\partial p_{12}=0$$ when $$p_{22}=\epsilon$$ and $$p_{12}=\tilde \epsilon$$, then $$({1}/{2}) \le \tilde \epsilon \le ({k_{\rm max}-1})/{k_{\rm max}}$$. □

Proof of Lemma. In both cases we are looking for solutions to $$N'' D'''=N''' D''$$. Since $$N'' = 2 S_0''(\tilde \epsilon)$$, $$N''' = 2 S_0'''(\tilde \epsilon)$$, $$D'' = h''(\tilde \epsilon)$$, and $$D'''=h'''(\tilde \epsilon)$$, this equation does not involve $$\epsilon$$ (except insofar as the coefficients of $$h$$ depend on $$\epsilon$$). We have

$$\begin{aligned} \frac{2S_0'''(\tilde\epsilon)}{2S_0''(\tilde\epsilon)} &= \frac{h'''(\tilde\epsilon)}{h''(\tilde\epsilon)}, \\ \frac{1}{1-\tilde\epsilon} - \frac{1}{\tilde\epsilon} &= \frac{h'''(\tilde\epsilon)}{h''(\tilde\epsilon)}, \\ \frac{2\tilde\epsilon - 1}{1-\tilde\epsilon} &= \frac{\tilde\epsilon\, h'''(\tilde\epsilon)}{h''(\tilde\epsilon)}, \\ \frac{1}{1-\tilde\epsilon} - 2 &= \frac{\sum_k k(k-1)(k-2)\,a_k \tilde\epsilon^{k-2}}{\sum_k k(k-1)\,a_k \tilde\epsilon^{k-2}}. \end{aligned} \tag{76}$$

The right-hand side of the last line is a weighted average of $$k-2$$ with weights $$k(k-1) a_k \tilde \epsilon^{k-2}$$, and so is at least zero and at most $$k_{\max}-2$$. Thus $$(1-\tilde \epsilon)^{-1}$$ is between 2 and $$k_{\max}$$ and $$\tilde \epsilon$$ is between $$1/2$$ and $$({k_{\max}-1})/{k_{\max}}$$. ■

Lemma 7.2. If $$\psi'(\epsilon, \tilde \epsilon)=0$$, and if $$\epsilon$$ is sufficiently close to 1, then $$\tilde \epsilon$$ is uniquely defined and approaches 0 as $$\epsilon \to 1$$. Likewise, if $$\epsilon$$ is sufficiently close to 0, then $$\tilde \epsilon$$ is uniquely defined and approaches 1 as $$\epsilon \to 0$$. □

Proof. When $$\epsilon < 1/2$$, or when $$\epsilon > ({k_{\max}-1})/{k_{\max}}$$, we cannot have $$\tilde \epsilon = \epsilon$$, so the equation $$\psi'=0$$ is equivalent to $$ND'=DN'$$ and $$\tilde \epsilon \ne \epsilon$$.
Writing $$DN'-ND'=0$$ explicitly, and doing some simple algebra, yields the equation

$$S_0'(\epsilon)\left[h(\tilde\epsilon) - h(\epsilon) - (\tilde\epsilon-\epsilon)h'(\tilde\epsilon)\right] - S_0'(\tilde\epsilon)\left[h(\tilde\epsilon) - h(\epsilon) - (\tilde\epsilon-\epsilon)h'(\epsilon)\right] + \left(S_0(\tilde\epsilon) - S_0(\epsilon)\right)\left(h'(\tilde\epsilon) - h'(\epsilon)\right) = 0. \tag{77}$$

If $$\epsilon$$ approaches 0 or 1 and $$\tilde \epsilon$$ does not, then the first term diverges, while the other terms do not, insofar as $$S_0'$$ has singularities at 0 and 1 but $$S_0$$, $$h$$, and $$h'$$ do not. Thus $$\tilde \epsilon$$ must go to 0 or 1 as $$\epsilon$$ goes to 0 or 1.

We next rule out the possibility that both $$\epsilon$$ and $$\tilde \epsilon$$ approach 1. Suppose that $$\epsilon$$ is close to 1. We expand both $$N$$ and $$D$$ in powers of $$(\tilde \epsilon - \epsilon)$$:

$$N = \sum_{m=2}^{\infty} \frac{2S_0^{(m)}(\epsilon)}{m!}(\tilde\epsilon-\epsilon)^m = -\sum_{m=2}^{\infty}\left(\frac{1}{(1-\epsilon)^{m-1}} + \frac{(-1)^m}{\epsilon^{m-1}}\right)\frac{(\tilde\epsilon-\epsilon)^m}{m(m-1)}, \qquad D = \sum_{m=2}^{k_{\max}} \frac{h^{(m)}(\epsilon)}{m!}(\tilde\epsilon-\epsilon)^m, \tag{78}$$

where $$S_0^{(m)}$$ and $$h^{(m)}$$ denote $$m$$th derivatives. The coefficients of the numerator grow rapidly with $$m$$, while the growth of the coefficients of the denominator depends only on the degree of $$h$$. For $$\tilde \epsilon > \epsilon > (k_{\max} - 1)/{k_{\max}}$$, $$\psi = N/D$$ is a decreasing function of $$\tilde \epsilon$$ (i.e., negative and increasing in magnitude), so we cannot have $$\psi'=0$$. Since the equation $$\psi'=0$$ is symmetric in $$\epsilon$$ and $$\tilde \epsilon$$ (apart from the dependence of the coefficients of $$h$$ on $$\epsilon$$), we also cannot have $$\epsilon > \tilde \epsilon > (k_{\max}-1)/{k_{\max}}$$.

When $$\epsilon$$ is close to 1, we must thus have $$\tilde \epsilon$$ close to 0. But then $$N \approx 2 S_0'(\epsilon)$$, $$D \approx h'(\epsilon)-h(\epsilon)$$, $$D' \approx - h'(\epsilon)$$, and the equation

$$2S_0'(\tilde\epsilon) = N' + 2S_0'(\epsilon) = 2S_0'(\epsilon) + ND'/D \tag{79}$$

determines $$S_0'(\tilde \epsilon)$$, and therefore $$\tilde \epsilon$$, uniquely as a function of $$\epsilon$$.

Next we consider $$\epsilon \to 0$$. If $$H$$ is 2-starlike, then $$\psi$$ is a multiple of $$\psi_2$$, and the result is already known.
Otherwise, it is convenient to define a new polynomial $$\bar h(z) = \sum n_k z^k$$, so that $$h(x) = \epsilon^\ell \bar h(x/\epsilon)$$. Then

$$D = h(\tilde\epsilon) - h(\epsilon) - h'(\epsilon)(\tilde\epsilon-\epsilon) = \epsilon^\ell\left[\bar h(r) - \bar h(1) - \bar h'(1)(r-1)\right], \tag{80}$$

where $$r := \tilde \epsilon/\epsilon$$. Likewise,

$$N = -\left[\tilde\epsilon\ln(\tilde\epsilon) - \epsilon\ln(\epsilon) + (1-\tilde\epsilon)\ln(1-\tilde\epsilon) - (1-\epsilon)\ln(1-\epsilon) - (\tilde\epsilon-\epsilon)\left(\ln(\epsilon) - \ln(1-\epsilon)\right)\right]. \tag{81}$$

Since $$\epsilon$$ and $$\tilde \epsilon$$ are small, we can approximate $$\ln(1-\epsilon)$$ and $$\ln(1-\tilde \epsilon)$$ as $$-\epsilon$$ and $$-\tilde \epsilon$$, respectively, giving

$$N \approx -\epsilon\left[r\ln r - r + 1\right] + \epsilon^2(r - r^2). \tag{82}$$

The ratio $$\psi = N/D$$ is negative. Since $$\bar h$$ is a polynomial of degree at least 3, $$D$$ grows faster than $$N$$ as $$r \to \infty$$, so we can always increase $$\psi$$ by taking larger and larger values of $$r = \tilde \epsilon/\epsilon$$. This argument only breaks down when the approximation $$\ln(1-\tilde \epsilon) \approx -\tilde \epsilon$$ breaks down, that is, at values of $$\tilde \epsilon$$ that are no longer close to 0. Thus we cannot have $$\tilde \epsilon$$ and $$\epsilon$$ both close to zero.

Finally, if $$\epsilon$$ is close to 0 and $$\tilde \epsilon$$ is close to 1, then $$h(\epsilon)$$ and $$h'(\epsilon)$$ are close to zero, while $$h(\tilde \epsilon)$$ is close to a multiple of $$\tilde \epsilon^{k_{\max}}$$, since the coefficient of $$x^{k_{\max}}$$ in $$h$$ is $$O(1/\epsilon)$$ larger than any other coefficient. Thus $$\psi$$ behaves like $$\psi_{k_{\max}}$$, and has a unique maximizer. ■

We have shown that when $$\epsilon$$ is close to 0 or 1, $$\psi$$ has a unique maximizer. Furthermore, $$\tilde \epsilon$$ is not between $$1/2$$ and $$({k_{\max}-1})/{k_{\max}}$$, so $$\partial \beta/\partial p_{12} \ne 0$$. So $$\epsilon \not \in B_H$$, completing Step 3 and the proof of Theorem 1.2.
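The window of Lemma 7.1 can be illustrated numerically by solving the last line of (76) for a specific model polynomial. This is a sketch under stated assumptions: the coefficient dictionary, the bisection bracket, and the function names are illustrative, and bisection returns one root of the equation without any claim that it is the only one.

```python
def weighted_average(x, coeffs):
    """Right-hand side of the last line of (76): an average of k - 2."""
    num = sum(k * (k - 1) * (k - 2) * a * x ** (k - 2) for k, a in coeffs.items())
    den = sum(k * (k - 1) * a * x ** (k - 2) for k, a in coeffs.items())
    return num / den


def lemma_root(coeffs, tol=1e-12):
    """Bisection for a root of 1/(1-x) - 2 - weighted_average(x) on [1/2, 1).

    coeffs maps k -> a_k with k >= 2 and a_k > 0 (hypothetical format).
    The left side is <= 0 at x = 1/2 and tends to +infinity as x -> 1.
    """

    def g(x):
        return 1.0 / (1.0 - x) - 2.0 - weighted_average(x, coeffs)

    lo, hi = 0.5, 1.0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a pure 3-star model, `lemma_root({3: 1.0})` converges to $$2/3 = (k-1)/k$$. For a graph with two vertices of degree 2 and two of degree 3 at $$\epsilon = 0.4$$, the coefficients are $$a_2 = 2\epsilon^3$$ and $$a_3 = 2\epsilon^2$$, and the computed root lies between $$1/2$$ and $$2/3$$, as the lemma requires.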
8 Conclusions We have shown that just above the ER curve, entropy maximizing graphons, constrained by the densities of edges and any one other subgraph $$H$$, exhibit the same qualitative behavior for all $$H$$ and for (almost) all values of $$\epsilon$$. The optimizing graphon is unique and bipodal. These results were proven by perturbation theory, using the fact that the optimizing graphon has to be $$L^2$$-close to a constant (Erdős–Rényi) graphon. Surprisingly, the optimizing graphon is not pointwise close to constant. Rather, it is bipodal, with a small cluster of size $$O(\Delta \tau)$$. As $$\Delta \tau$$ approaches 0, the size of the small cluster shrinks, but the values of the graphon on each quadrant do not approach one another. Rather, $$p_{22}$$ approaches $$\epsilon$$, $$p_{12}$$ approaches the value of $$\tilde \epsilon$$ that maximizes a specific function $$\psi(\epsilon, \tilde \epsilon)$$, and $$p_{11}$$ satisfies $$S_0'(p_{11}) - 2 S_0'(p_{12}) + S_0'(p_{22})=0$$. Finally, the asymptotic behavior of these graphons as $$\tau \to \epsilon^\ell$$ depends only on the degree sequence of $$H$$. In particular, the cases where $$H$$ is a triangle and when $$H$$ is a 2-star are asymptotically the same. This is illustrated in Figure 2. Since $$\Delta \tau_{\text{triangle}} \approx 3 \epsilon \Delta \tau_2$$, the optimizing graphon for the 2-star model with $$\epsilon = 0.4$$ and $$\Delta \tau_2=0.002$$ should resemble the optimizing graphon for the triangle model with $$\epsilon=0.4$$ and $$\Delta \tau_{\text{triangle}}=0.0024$$. These optimizing graphons are obtained using the algorithms we developed in [14] without assuming bipodality. Numerical estimates indicate that the optimizing graphons are not exactly the same, thanks to $$O(\Delta \tau_2^{3/2})$$ corrections to $$\Delta \tau_{\text{triangle}}$$, but are still qualitatively similar. Fig. 2. 
Numerical estimates of the optimizing graphon for the 2-star model with $$\epsilon=0.4$$ and $$\tau_2=0.1620$$ (left) and the optimizing graphon for the triangle model with $$\epsilon=0.4$$ and $$\tau_{\text{triangle}}=0.0664$$ (right). (Although we have not tried to prove that $$\Delta\tau_2 = 0.002$$ is small enough to fit into the interval provided by Theorem 1.1, numerically it appears to be the case.)

Funding

This work was supported by the Simons Foundation [grant 327929 to R.K.] and National Science Foundation (NSF) [grants DMS-1208191, DMS-1509088, DMS-1321018, and DMS-1101326].

Acknowledgments

The computational results shown in this work were obtained on the computational facilities in the Texas Advanced Computing Center (TACC). We gratefully acknowledge this computational support.

Conflict of Interest

I (Richard Kenyon) am an associate editor of IMRN.

Appendix: Proof of Theorem 3.3

Proof. Fix $$k \ge 2$$ and let

$$N(\epsilon,\tilde\epsilon) = 2\left[S_0(\tilde\epsilon) - S_0(\epsilon) - S_0'(\epsilon)(\tilde\epsilon-\epsilon)\right], \qquad D(\epsilon,\tilde\epsilon) = \tilde\epsilon^k - \epsilon^k - k\epsilon^{k-1}(\tilde\epsilon-\epsilon) \tag{A1}$$

be the numerator and denominator of the function $$\psi_k(\epsilon, \tilde \epsilon) = N/D$$. These definitions make sense for all real values of $$k$$, not just for integers. When taking derivatives of $$N$$, $$D,$$ and $$\psi$$, we will denote a derivative with respect to the first variable by a dot, and a derivative with respect to the second variable by $${}'$$.
That is, $$D'(\epsilon,\tilde \epsilon) =\partial D/\partial \tilde \epsilon$$ and $$\dot D(\epsilon, \tilde \epsilon) = \partial D/\partial \epsilon$$. As noted earlier, this definition of $$\psi_k$$ has a removable singularity at $$\tilde \epsilon = \epsilon$$, which we fill in by defining

$$\psi_k(\epsilon,\epsilon) = N''(\epsilon,\epsilon)/D''(\epsilon,\epsilon) = 2S_0''(\epsilon)/[k(k-1)\epsilon^{k-2}]. \tag{A2}$$

The denominator $$D$$ vanishes only at $$\tilde \epsilon = \epsilon$$. Some useful explicit derivatives are:

$$\begin{aligned} N' &= 2\left[S_0'(\tilde\epsilon) - S_0'(\epsilon)\right], & N'' &= 2S_0''(\tilde\epsilon) = -\frac{1}{\tilde\epsilon(1-\tilde\epsilon)}, \\ \dot N &= -2S_0''(\epsilon)(\tilde\epsilon-\epsilon), & \dot N' &= -2S_0''(\epsilon), \\ D' &= k\left[\tilde\epsilon^{k-1} - \epsilon^{k-1}\right], & D'' &= k(k-1)\tilde\epsilon^{k-2}, \\ \dot D &= -k(k-1)\epsilon^{k-2}(\tilde\epsilon-\epsilon), & \dot D' &= -k(k-1)\epsilon^{k-2}. \end{aligned} \tag{A3}$$

Note that $$D$$ and $$N$$ both vanish when $$\tilde \epsilon = \epsilon$$, so we can write

$$N(\epsilon,\tilde\epsilon) = \int_{\epsilon}^{\tilde\epsilon} N'(\epsilon,x)\,dx = \int_{\tilde\epsilon}^{\epsilon} \dot N(x,\tilde\epsilon)\,dx, \tag{A4}$$

and similarly for $$D(\epsilon, \tilde \epsilon)$$. We proceed in steps:

Step 1. Analyzing $$\psi$$ near $$\tilde \epsilon = \epsilon$$ to see that $$\psi_k'(\epsilon,\epsilon) = 0$$ only when $$\epsilon = (k-1)/k$$.

Step 2. Showing that we can never have $$\psi_k'=\psi_k''=0$$.

Step 3. Showing that the equation $$\psi_k'(\epsilon, \tilde \epsilon)=0$$ is symmetric in $$\epsilon$$ and $$\tilde \epsilon$$, implying that $$\zeta_k$$ is an involution.

Step 4. Showing that $$\psi_k$$ has a unique critical point.

Step 5. Showing that $${\rm d} \zeta_k/{\rm d}\epsilon$$ is never zero.

Step 6. Showing that $$\psi_k(\epsilon, \zeta_k(\epsilon)) > \max(\psi_k(\epsilon,\epsilon), \psi_k(\zeta_k(\epsilon), \zeta_k(\epsilon)))$$.

The following calculus fact will be used repeatedly. When $$D \ne 0$$, $$\psi_k'=0$$ is equivalent to $${N}/{D} = {N'}/{D'}$$, and $$\psi_k'=\psi_k''=0$$ is equivalent to $${N}/{D} = {N'}/{D'} ={N''}/{D''}$$. This follows from the quotient rule:

$$\psi' = \frac{DN' - ND'}{D^2}, \qquad \psi'' = \frac{DN'' - ND''}{D^2} - \frac{2D'(DN' - ND')}{D^3}. \tag{A5}$$

Step 1.
Since $$N$$ and $$D$$ have double roots at $$\tilde \epsilon = \epsilon$$, we can expand both of them in Taylor series near $$\tilde \epsilon = \epsilon$$:

$$\psi_k(\epsilon,\tilde\epsilon) = \frac{N''(\epsilon,\epsilon)(\tilde\epsilon-\epsilon)^2/2 + N'''(\epsilon,\epsilon)(\tilde\epsilon-\epsilon)^3/6 + \cdots}{D''(\epsilon,\epsilon)(\tilde\epsilon-\epsilon)^2/2 + D'''(\epsilon,\epsilon)(\tilde\epsilon-\epsilon)^3/6 + \cdots} = \frac{N''(\epsilon,\epsilon) + N'''(\epsilon,\epsilon)(\tilde\epsilon-\epsilon)/3 + \cdots}{D''(\epsilon,\epsilon) + D'''(\epsilon,\epsilon)(\tilde\epsilon-\epsilon)/3 + \cdots}. \tag{A6}$$

$$\psi_k'(\epsilon,\epsilon)=0$$ is then equivalent to

$$\begin{aligned} N''(\epsilon,\epsilon)D'''(\epsilon,\epsilon) &= N'''(\epsilon,\epsilon)D''(\epsilon,\epsilon), \\ -\frac{k(k-1)(k-2)\epsilon^{k-3}}{\epsilon(1-\epsilon)} &= \frac{k(k-1)\epsilon^{k-2}(1-2\epsilon)}{\epsilon^2(1-\epsilon)^2}, \\ (k-2)(1-\epsilon) &= 2\epsilon - 1, \\ k\epsilon &= k-1. \end{aligned} \tag{A7}$$

Step 2. If $$\psi_k'=\psi_k''=0$$, then we must have $$N'D''=D'N''$$ and $$ND''=DN''$$. We will explore these in turn. We write

$$0 = N'D'' - D'N'' = \int_{\tilde\epsilon}^{\epsilon}\left[D''(\epsilon,\tilde\epsilon)\dot N'(x,\tilde\epsilon) - N''(\epsilon,\tilde\epsilon)\dot D'(x,\tilde\epsilon)\right]dx. \tag{A8}$$

Explicitly, this becomes

$$0 = \int_{\tilde\epsilon}^{\epsilon} \frac{k(k-1)}{\tilde\epsilon(1-\tilde\epsilon)\,x(1-x)}\left[\tilde\epsilon^{k-1}(1-\tilde\epsilon) - x^{k-1}(1-x)\right]dx. \tag{A9}$$

The function $$x^{k-1}(1-x)$$ has a single maximum at $$x=(k-1)/k$$. If both $$\epsilon$$ and $$\tilde \epsilon$$ are on the same side of this maximum, then the integrand will have the same sign for all $$x$$ between $$\tilde \epsilon$$ and $$\epsilon$$, and the integral will not be zero. Thus we must have $$\epsilon < (k-1)/k < \tilde \epsilon$$, or vice-versa, and we must have $$\epsilon^{k-1}(1-\epsilon) < \tilde \epsilon^{k-1}(1-\tilde \epsilon)$$. In this case the integrand changes sign exactly once.

Now we apply the same sort of analysis to the other equation:

$$0 = ND'' - DN'' = \int_{\tilde\epsilon}^{\epsilon}\left[D''(x,\tilde\epsilon)\dot N(x,\tilde\epsilon) - N''(x,\tilde\epsilon)\dot D(x,\tilde\epsilon)\right]dx. \tag{A10}$$

Explicitly, this becomes

$$0 = \int_{\tilde\epsilon}^{\epsilon} \frac{k(k-1)}{\tilde\epsilon(1-\tilde\epsilon)\,x(1-x)}\left[\tilde\epsilon^{k-1}(1-\tilde\epsilon) - x^{k-1}(1-x)\right](\tilde\epsilon - x)\,dx. \tag{A11}$$

This is the same integral as before, only with an extra factor of $$(\tilde \epsilon - x)$$. If we view the first integral (A9) as a mass distribution (with total mass zero), then the second integral is (minus) the first moment of this mass distribution relative to the endpoint $$\tilde \epsilon$$. But we have already seen that the distribution changes sign exactly once, and so must have a non-zero first moment. This is a contradiction.

Step 3. If $$ND'=DN'$$, then $$N/D = N'/D'$$. Call this common ratio $$r$$. Then

$$N = rD \quad\text{and}\quad N' = rD'. \tag{A12}$$

Note that $$N'$$ and $$D'$$ are odd under interchange of $$\epsilon$$ and $$\tilde \epsilon$$, so the second equation is invariant under this interchange. Furthermore, we have $$(\tilde \epsilon-\epsilon)N' -N = r [ (\tilde \epsilon - \epsilon)D' - D]$$. However, $$(\tilde \epsilon - \epsilon)N' - N$$ is the same as $$N$$ with the roles of $$\epsilon$$ and $$\tilde \epsilon$$ reversed, while $$(\tilde \epsilon - \epsilon)D' - D$$ is the same as $$D$$ with the roles of $$\epsilon$$ and $$\tilde \epsilon$$ reversed. Thus the two equations are satisfied for $$(\epsilon, \tilde \epsilon)$$ if and only if they are satisfied for $$(\tilde \epsilon, \epsilon)$$.

Step 4. For $$k=2$$ we explicitly compute that $$\psi_2'=0$$ only at $$\tilde \epsilon = 1-\epsilon$$. If $$k_{\rm min}$$ is the infimum of all values of $$k$$ for which $$\psi_k$$ has multiple critical points, then at a critical point of $$\psi_{k_{\rm min}}$$ we must have $$\psi_k'=\psi_k''=0$$, which is a contradiction. Thus $$k_{\rm min}$$ does not exist, and $$\psi_k$$ has a unique critical point for all $$k \ge 2$$. In particular, $$\zeta_k$$ is a well-defined function.

Step 5. The function $$\zeta_k$$ is defined by the condition that $$D N' - N D' = 0$$ (and $$\tilde \epsilon \ne \epsilon$$, except when $$\epsilon = (k-1)/k$$). Let $$f(\epsilon, \tilde \epsilon) = DN' - ND' = D^2 \psi'$$. Moving along the curve $$\tilde \epsilon = \zeta_k(\epsilon)$$ (i.e., $$f=0$$), we differentiate implicitly:

$$0 = df = \dot f\,d\epsilon + f'\,d\tilde\epsilon, \tag{A13}$$

so

$$\frac{d\tilde\epsilon}{d\epsilon} = -\frac{\dot f}{f'}. \tag{A14}$$

We compute $$f' = D N'' - N D''.$$ This is non-zero by Step 2. We also have

$$\begin{aligned} \dot f &= D\dot N' - \dot N D' + \dot D N' - N\dot D' \\ &= -2S_0''(\epsilon)\left(D - (\tilde\epsilon-\epsilon)D'\right) + k(k-1)\epsilon^{k-2}\left(N - (\tilde\epsilon-\epsilon)N'\right) \\ &= 2S_0''(\epsilon)\left[\epsilon^k - \tilde\epsilon^k + k(\tilde\epsilon-\epsilon)\tilde\epsilon^{k-1}\right] - 2k(k-1)\epsilon^{k-2}\left[S_0(\epsilon) - S_0(\tilde\epsilon) + (\tilde\epsilon-\epsilon)S_0'(\tilde\epsilon)\right] \\ &= D(\tilde\epsilon,\epsilon)N''(\tilde\epsilon,\epsilon) - N(\tilde\epsilon,\epsilon)D''(\tilde\epsilon,\epsilon). \end{aligned} \tag{A15}$$

That is, $$\dot f$$ is the same as $$f'$$, only with the roles of $$\epsilon$$ and $$\tilde \epsilon$$ reversed.
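The identity (A15) can be spot-checked by comparing a finite-difference approximation of $$\dot f$$ with $$f'$$ evaluated at the swapped arguments. The following is a minimal numerical sketch, assuming $$S_0(x) = -\tfrac12[x\ln x + (1-x)\ln(1-x)]$$ as implied by (A3); the function names and the step size are illustrative choices.

```python
import math


def s0(x):
    """Assumed entropy density, consistent with N'' in (A3)."""
    return -0.5 * (x * math.log(x) + (1.0 - x) * math.log(1.0 - x))


def s0_prime(x):
    return 0.5 * math.log((1.0 - x) / x)


def s0_second(x):
    return -0.5 / (x * (1.0 - x))


def big_n(e, t):
    """N(eps, eps~) from (A1)."""
    return 2.0 * (s0(t) - s0(e) - s0_prime(e) * (t - e))


def big_d(e, t, k):
    """D(eps, eps~) from (A1)."""
    return t ** k - e ** k - k * e ** (k - 1) * (t - e)


def f(e, t, k):
    """f = D N' - N D', with primes denoting d/d(eps~)."""
    n_prime = 2.0 * (s0_prime(t) - s0_prime(e))
    d_prime = k * (t ** (k - 1) - e ** (k - 1))
    return big_d(e, t, k) * n_prime - big_n(e, t) * d_prime


def f_prime(e, t, k):
    """Analytic derivative of f in its second argument: D N'' - N D''."""
    return big_d(e, t, k) * 2.0 * s0_second(t) - big_n(e, t) * k * (k - 1) * t ** (k - 2)


def f_dot_numeric(e, t, k, step=1e-6):
    """Central-difference derivative of f in its first argument."""
    return (f(e + step, t, k) - f(e - step, t, k)) / (2.0 * step)
```

With, say, $$\epsilon = 0.3$$, $$\tilde\epsilon = 0.8$$ and $$k = 3$$, the value `f_dot_numeric(0.3, 0.8, 3)` agrees with `f_prime(0.8, 0.3, 3)` up to discretization error, which is the content of the last line of (A15).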
Since the equation $$f=0$$ is symmetric in $$\epsilon$$ and $$\tilde \epsilon$$, the argument of Step 2 can be repeated to show that $$\dot f \ne 0$$. Since $$d\tilde \epsilon/d\epsilon$$ is never zero, and since $$d \tilde \epsilon/d\epsilon=-1$$ at the fixed point (by symmetry), $$\zeta_k'(\epsilon) = d\tilde \epsilon/d\epsilon$$ must always be negative.

Step 6. Since $$\psi_k(\epsilon, \tilde \epsilon)$$ has a single critical point (with respect to $$\tilde \epsilon$$, for fixed $$k$$ and $$\epsilon$$), this critical point must either always be a local maximum or a local minimum, and hence a global maximum or minimum, and the answer must be the same for all $$k$$ and all $$\epsilon$$. By checking a single case (e.g., $$k=2$$ and $$\epsilon$$ approaching 0) it is easy to see that it is a maximum. Thus $$\psi_k(\epsilon, \zeta_k(\epsilon))> \psi_k(\epsilon, \epsilon)$$ for all $$\epsilon \ne (k-1)/k$$. Since the equations for a critical point are symmetric with respect to interchange of $$\epsilon$$ and $$\tilde \epsilon$$, $$\epsilon = \zeta_k(\tilde \epsilon)$$ also gives the unique critical point of $$\psi_k(\epsilon, \tilde \epsilon)$$ with respect to $$\epsilon$$. By considering the limit of $$\psi_k(\epsilon, \tilde \epsilon)$$ as $$\epsilon \to 0$$ or $$\epsilon \to 1$$, it is clear that this critical point is a maximum. Since $$\zeta_k(\zeta_k(\epsilon))=\epsilon$$, this implies that $$\psi_k(\epsilon, \zeta_k(\epsilon))> \psi_k(\zeta_k(\epsilon), \zeta_k(\epsilon))$$. ■

References

[1] Aristoff, D. and C. Radin. “Emergent structures in large networks.” Journal of Applied Probability 50 (2013): 883–.

[2] Borgs, C., J. Chayes, and L. Lovász. “Moments of two-variable functions and the uniqueness of graph limits.” Geometric and Functional Analysis 19 (2010): 1597–.

[3] Borgs, C., J. Chayes, L. Lovász, V. T. Sós, and K. Vesztergombi. “Convergent graph sequences I: subgraph frequencies, metric properties, and testing.” Advances in Mathematics 219 (2008): 1801–.

[4] Chatterjee, S. and P. Diaconis. “Estimating and understanding exponential random graph models.” Annals of Statistics 41 (2013): 2428–.

[5] Chatterjee, S. and S. R. S. Varadhan. “The large deviation principle for the Erdős–Rényi random graph.” European Journal of Combinatorics 32 (2011): 1000–.

[6] Kenyon, R., C. Radin, K. Ren, and L. Sadun. “Multipodal structures and phase transitions in large constrained graphs.” arXiv:1405.0599 (2014).

[7] Lovász, L. and B. Szegedy. “Limits of dense graph sequences.” Journal of Combinatorial Theory Series B 98 (2006): 933–.

[8] Lovász, L. and B. Szegedy. “Szemerédi’s lemma for the analyst.” Geometric and Functional Analysis 17 (2007): 252–.

[9] Lovász, L. and B. Szegedy. “Finitely forcible graphons.” Journal of Combinatorial Theory Series B 101 (2011): 269–.

[10] Lovász, L. Large Networks and Graph Limits. Providence: American Mathematical Society, 2012.

[11] Lubetzky, E. and Y. Zhao. “On replica symmetry of large deviations in random graphs.” Random Structures and Algorithms 47 (2015): 109–.

[12] Pikhurko, O. and A. Razborov. “Asymptotic structure of graphs with the minimum number of triangles.” Combinatorics, Probability and Computing (2016): 1–23. ISSN 0963-5483 (In Press).

[13] Radin, C. and M. Yin. “Phase transitions in exponential random graphs.” Annals of Applied Probability 23 (2013): 2458–.

[14] Radin, C., K. Ren, and L. Sadun. “The asymptotics of large constrained graphs.” Journal of Physics A: Mathematical and Theoretical 47 (2014): 175001.

[15] Radin, C. and L. Sadun. “Phase transitions in a complex network.” Journal of Physics A: Mathematical and Theoretical 46 (2013): 305002.

[16] Radin, C. and L. Sadun. “Singularities in the entropy of asymptotically large simple graphs.” Journal of Statistical Physics 158 (2015): 853–.

[17] Razborov, A. “On the minimal density of triangles in graphs.” Combinatorics, Probability and Computing 17 (2008): 603–.

[18] Turán, P. “On an extremal problem in graph theory (in Hungarian).” Matematikai és Fizikai Lapok 48 (1941): 436–.

© The Author(s) 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permission@oup.com.

### Journal

International Mathematics Research Notices, Oxford University Press

Published: Feb 1, 2018
