Abstract We study a spectral initialization method that serves a key role in recent work on estimating signals in non-convex settings. Previous analysis of this method focuses on the phase retrieval problem and provides only performance bounds. In this paper, we consider arbitrary generalized linear sensing models and present a precise asymptotic characterization of the performance of the method in the high-dimensional limit. Our analysis also reveals a phase transition phenomenon that depends on the ratio between the number of samples and the signal dimension. When the ratio is below a minimum threshold, the estimates given by the spectral method are no better than random guesses drawn from a uniform distribution on the hypersphere, thus carrying no information; above a maximum threshold, the estimates become increasingly aligned with the target signal. The computational complexity of the method, as measured by the spectral gap, is also markedly different in the two phases. Worked examples and numerical results are provided to illustrate and verify the analytical predictions. In particular, simulations show that our asymptotic formulas provide accurate predictions for the actual performance of the spectral method even at moderate signal dimensions. 1. Introduction We consider the problem of estimating an |$n$|-dimensional vector |$\boldsymbol{\xi }$| from a number of generalized linear measurements. Let |$\left \{\boldsymbol{a}_i\right \}_{1 \le i \le m}$| be a set of sensing vectors in |$\mathbb{R}^n$|. Given |$\left \{\boldsymbol{a}_i^\top \boldsymbol{\xi }\right \}$|, the measurements are drawn independently from $$\begin{equation} y_i \sim f(y \, \vert \, \boldsymbol{a}_i^\top \boldsymbol{\xi}), \end{equation}$$ (1.1) where |$f(\cdot \, \vert \, \cdot )$| is a conditional density function modeling the acquisition process. This model arises in many problems in signal processing and statistical learning. Examples include photon-limited imaging [45,50], phase retrieval [17], signal recovery from quantized measurements [39] and various single-index and generalized linear regression problems [16,20]. The standard method for recovering |$\boldsymbol{\xi }$| is to use the estimator $$\begin{equation} \widehat{\boldsymbol{\xi}} = \underset{\boldsymbol{x}}{\textrm{arg}\,\min} \sum_{i=1}^m \ell(y_i, \boldsymbol{a}_i^\top \boldsymbol{x}), \end{equation}$$ (1.2) where |$\ell : \mathbb{R}^2 \rightarrow \mathbb{R}$| is some loss function (e.g., the negative log-likelihood of the observation model as used in maximum likelihood estimation). In many applications, however, the natural loss function is not convex with respect to |$\boldsymbol{x}$|. There is often no effective way to convexify (1.2). In those cases for which convex relaxations do exist, the resulting algorithms can be computationally expensive. The problem of phase retrieval, where |$y_i = (\boldsymbol{a}_i^\top \boldsymbol{\xi })^2 + \varepsilon _i$| for some noise terms |$ \{\varepsilon _i \}$|, is an example in the latter scenario. Convex relaxation schemes such as those based on lifting and semidefinite programming (e.g., [8,10,22,47]) have been successfully developed for solving the phase retrieval problem, but the challenges facing these schemes lie in their actual implementation. In practice, the computational complexity and memory requirements associated with these convex-relaxation methods are prohibitive for signal dimensions that are encountered in real-world applications such as imaging.
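To make the acquisition model (1.1) concrete, the following minimal sketch (in Python; the function name and parameter choices are ours, for illustration only) draws i.i.d. Gaussian sensing vectors and generates measurements under the noisy phase retrieval model |$y_i = (\boldsymbol{a}_i^\top \boldsymbol{\xi })^2 + \varepsilon _i$| mentioned above. Any other conditional density |$f(\cdot \,\vert \,\cdot )$| could be substituted in the same way.

```python
import numpy as np

def phase_retrieval_measurements(xi, m, sigma=0.0, seed=0):
    """Draw m measurements y_i ~ f(y | a_i^T xi) for the noisy phase
    retrieval model y = (a^T xi)^2 + sigma * eps, with Gaussian a_i."""
    rng = np.random.default_rng(seed)
    n = xi.size
    A = rng.standard_normal((m, n))      # rows are the sensing vectors a_i
    eps = rng.standard_normal(m)
    y = (A @ xi) ** 2 + sigma * eps
    return A, y
```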
In light of these issues, there is strong recent interest in developing and analyzing efficient iterative methods that directly solve non-convex forms of (1.2). Examples include the alternating minimization scheme for phase retrieval [36], the Wirtinger Flow algorithm and its variants [9,12,42,48,51], iterative projection methods [13,28] and recent schemes for phase retrieval using linear programming [2,14,15,18]. A common ingredient that contributes to the success of these algorithms for non-convex estimation is that they all use some carefully designed spectral method as an initialization step, which is then followed by further (iterative) refinement. Beyond the signal estimation problem considered in this paper, related spectral methods have also been successfully applied to initialize algorithms for solving other non-convex problems such as matrix completion [25], low-rank matrix recovery [23], blind deconvolution [26,30], sparse coding [1] and joint alignment from pairwise differences [11]. In this paper, we present an exact high-dimensional analysis of a widely used spectral method [9,12,36] for estimating |$\boldsymbol{\xi }$|. The method consists of only two steps: first, construct a data matrix from the sensing vectors and measurements as $$\begin{equation} \boldsymbol{D}_m \overset{\textrm{def}}{=} \frac{1}{m} \sum_{i=1}^m T(y_i) \boldsymbol{a}_i \boldsymbol{a}_i^\top, \end{equation}$$ (1.3) where |$T: \mathbb{R} \rightarrow \mathbb{R}$| is a preprocessing function (e.g., a trimming or truncation step). Second, compute a normalized eigenvector, denoted by |$\boldsymbol{x}_1$|, that corresponds to the largest eigenvalue of |$\boldsymbol{D}_m$|. The vector |$\boldsymbol{x}_1$| is then our estimate of |$\boldsymbol{\xi }$| (up to an unknown scalar). It is notable that this method is model-free in that the algorithm does not require knowledge of the exact acquisition process [i.e., the conditional density |$f(\cdot \, \vert \, \cdot )$| in (1.1)]. The idea of this spectral method can be traced back to the early work of Li [29], under the name of Principal Hessian Directions, for general multi-index models. Similar spectral techniques were also proposed in [23,25] for initializing algorithms for matrix completion. In [36], Netrapalli et al. used this method to address the problem of phase retrieval. Under the assumption that the sensing vectors consist of i.i.d. Gaussian random variables, these authors show that the leading eigenvector |$\boldsymbol{x}_1$| is aligned with the target vector |$\boldsymbol{\xi }$| in direction when there are sufficiently many measurements. More specifically, they show that the squared cosine similarity $$\begin{equation} \rho(\boldsymbol{\xi}, \boldsymbol{x}_1) \overset{\textrm{def}}{=} \frac{({\boldsymbol{\xi}^\top \boldsymbol{x}_1})^2}{\lVert\boldsymbol{\xi}\rVert^2\lVert\boldsymbol{x}_1\rVert^2}, \end{equation}$$ (1.4) which measures the degree of alignment between the two vectors, approaches |$1$| with high probability when the number of samples |$m \ge c_1 n \log ^3 n$|. This sufficient condition on sample complexity was later improved to |$m \ge c_2 n \log n$| in [10], and further improved to |$m \ge c_3 n$| in [12] with an additional trimming step on the measurements. In these expressions, |$c_1, c_2, c_3$| stand for some unspecified numerical constants. In this paper, we provide a precise asymptotic characterization of the performance of the spectral method under Gaussian measurements.
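The two-step method itself is straightforward to implement. The sketch below (a minimal illustration, not an optimized implementation; the helper names are our own) forms |$\boldsymbol{D}_m$| from the sensing vectors and preprocessed measurements and returns a unit-norm leading eigenvector together with the spectral gap. The preprocessing function |$T$| is passed in as an argument.

```python
import numpy as np

def spectral_initialization(A, y, T):
    """Spectral method: form D_m = (1/m) * sum_i T(y_i) a_i a_i^T and
    return a unit-norm leading eigenvector x_1 and the spectral gap."""
    m = len(y)
    z = T(y)                              # preprocessed measurements z_i = T(y_i)
    D = (A * z[:, None]).T @ A / m        # D_m, assembled without an explicit loop
    eigvals, eigvecs = np.linalg.eigh(D)  # eigenvalues in ascending order
    x1 = eigvecs[:, -1]
    return x1, eigvals[-1] - eigvals[-2]

def cosine_similarity_sq(xi, x1):
    """Squared cosine similarity rho(xi, x_1) defined in (1.4)."""
    return (xi @ x1) ** 2 / ((xi @ xi) * (x1 @ x1))
```

Combined with the measurement sketch above, a call such as `spectral_initialization(A, y, lambda y: y * (np.abs(y) <= t**2))`, for some threshold |$t > 0$|, reproduces, for example, the trimmed construction used for phase retrieval; the resulting |$\rho (\boldsymbol{\xi }, \boldsymbol{x}_1)$| is the quantity whose high-dimensional limit is characterized below.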
Our analysis considers general acquisition models under arbitrary conditional distributions |$f(y \,\vert \, \boldsymbol{a}_i^\top \boldsymbol{\xi })$|, of which the phase retrieval problem is a special case. Unlike the previous work, which only provides bounds for |$\rho (\boldsymbol{\xi }, \boldsymbol{x}_1)$|, we derive the exact high-dimensional limit of this value. In particular, we show that, as |$n$| and |$m$| both tend to infinity with the sampling ratio |$\alpha \overset{\textrm{def}}{=} m/n$| kept fixed, the squared cosine similarity |$\rho $| converges in probability to a limit value |$\rho (\alpha )$|. Explicit formulas are provided for computing |$\rho (\alpha )$|. Geometrically, the squared cosine similarity |$\rho (\boldsymbol{\xi }, \boldsymbol{x}_1)$| as defined in (1.4) specifies the angle |$\theta $| between |$\boldsymbol{\xi }$| and |$\boldsymbol{x}_1$|. The values of |$\rho $| vary from |$0$| to |$1$|: |$\rho = 1$| means perfect alignment, i.e., |$\theta = 0$| or |$\pi $|, and |$\rho = 0$| is the opposite case, meaning |$\boldsymbol{x}_1$| is orthogonal to (i.e., uncorrelated with) |$\boldsymbol{\xi }$|. That the spectral method can yield an estimate |$\boldsymbol{x}_1$| with a positive |$\rho $| in high-dimensional settings is a non-trivial property. To see this, assume that |$\boldsymbol{\xi }$| is pointing toward the ‘north pole’ in the unit |$(n-1)$|-sphere |$S^{n-1}$|, as illustrated in Fig. 1. If we choose |$\boldsymbol{x}_1$| uniformly at random from |$S^{n-1}$|, then with high probability, the resulting correlation |$\sqrt{\rho (\boldsymbol{\xi }, \boldsymbol{x}_1)}$| will be of order |$\mathcal{O}(1/\sqrt{n})$|. In other words, for large |$n$|, most of the uniform measure on |$S^{n-1}$| is concentrated within a very thin band of width |$\mathcal{O}(1/\sqrt{n})$| near the ‘equator’ of the sphere (see Fig. 1). Fig. 1. Illustrations of the phase transitions of the spectral method. Depending on the sampling ratio |$\alpha $|, the asymptotic performance of the spectral method can be in one of two very different phases: in the uncorrelated phase, its estimate |$\boldsymbol{x}_1$| is asymptotically orthogonal to |$\boldsymbol{\xi }$|. The performance in this case is no better than an arbitrary guess drawn uniformly at random from the hypersphere |$S^{n-1}$|. In the correlated phase, the estimate |$\boldsymbol{x}_1$| (or its negative version |$-\boldsymbol{x}_1$|) will be concentrated on the surface of a right circular cone, making an angle |$\theta = \arccos \big (\sqrt{\rho (\alpha )}\big )$| with the target vector |$\boldsymbol{\xi }$|. Our analysis reveals a phase transition phenomenon that occurs at certain critical values of the sampling ratio.
In particular, there exist a lower threshold and an upper threshold, denoted by |$\alpha _{c,\min }$| and |$\alpha _{c,\max }$|, respectively, that mark the transitions between two very different phases. (a) An uncorrelated phase takes place when the sampling ratio |$\alpha < \alpha _{c,\min }$|. Within this phase, the limiting value |$\rho (\alpha ) = 0$|, meaning that the estimate from the spectral method is asymptotically uncorrelated with the target vector |$\boldsymbol{\xi }$|. In this case, the spectral method is not effective, as its estimate |$\boldsymbol{x}_1$| is no better than a random guess drawn uniformly from the hypersphere |$S^{n-1}$|. (b) A correlated phase takes place when |$\alpha> \alpha _{c,\max }$|. Within this phase, the limiting value |$\rho (\alpha )> 0$|. Geometrically, the estimate |$\boldsymbol{x}_1$| (or its negative version |$-\boldsymbol{x}_1$|) will be concentrated on the surface of a right circular cone (see Fig. 1) whose generating lines make an angle |$\theta = \arccos \big (\sqrt{\rho (\alpha )}\big )$| with the target vector |$\boldsymbol{\xi }$|. Moreover, |$\rho (\alpha )$| tends to 1 as |$\alpha \rightarrow \infty $|. In many signal estimation models that we have studied so far, the two thresholds coincide, i.e., |$\alpha _{c,\min } = \alpha _{c,\max }$|, meaning that the phase transition happens at a single critical value of the sampling ratio. However, it is indeed possible that |$\alpha _{c,\min } < \alpha _{c,\max }$|, in which case a finite number of correlated and uncorrelated phases alternate as |$\alpha $| varies within the interval |$(\alpha _{c,\min }, \alpha _{c,\max })$|. A concrete example demonstrating this more complicated situation can be found in Section 4.3. The above phase transition phenomenon also has implications in terms of the computational complexity of the spectral method. In a correlated phase, there is a non-zero gap between the largest and the second largest eigenvalues of |$\boldsymbol{D}_m$|. As a result, the leading eigenvector |$\boldsymbol{x}_1$| can be efficiently computed by using power iterations on |$\boldsymbol{D}_m$|. In contrast, within an uncorrelated phase, the gap between these eigenvalues converges to zero, making power iterations inefficient. The rest of the paper is organized as follows. After precisely laying out the various technical assumptions, we present in Section 2 the main results of this work, stated as Theorem 2.1 and Proposition 2.2. Examples and numerical simulations are also provided there to demonstrate and verify these analytical results. In particular, as a worked example, we derive a universal closed-form expression for the limiting values |$\rho (\alpha )$| for all acquisition models that generate one-bit |$\left \{0, 1\right \}$| measurements. We prove Theorem 2.1 in Section 3. Key to our proof is a deterministic, fixed-point characterization of the squared cosine similarity |$\rho (\boldsymbol{\xi }, \boldsymbol{x}_1)$|, which is valid for any finite dimension |$n$| and for any deterministic sensing vectors |$\left \{\boldsymbol{a}_i\right \}$|. When specialized to Gaussian measurements, this fixed-point characterization allows us to connect our problem to a generalized version of the spiked population model (see, e.g., [5,6,24]) studied in random matrix theory. In Section 4, we look more closely at the phase transition phenomenon predicted by our asymptotic results and prove Proposition 2.2.
Section 5 concludes the paper with discussions on possible generalizations and improvements of our results as well as their connections to related work in the literature. Notations: To study the high-dimensional limit of the spectral initialization method, we shall consider a sequence of problem instances, indexed by the ambient dimension |$n$|⁠. For each |$n$|⁠, we seek to estimate an underlying signal denoted by |$\boldsymbol{\xi }_n \in \mathbb{R}^n$|⁠. Formally, we should use |$D_{m(n)}$| to denote the data matrix, where |$m(n)$| is the number of measurements as a function of the dimension |$n$|⁠. However, to lighten the notation, we will simply write it as |$\boldsymbol{D}_m$|⁠, keeping the dependence of |$m$| on |$n$| implicit. |$\boldsymbol{x}_1^n$| stands for a leading eigenvector of |$\boldsymbol{D}_m$|⁠. We use |$\overset{{{\mathcal{P}}}}{\longrightarrow }$| and |$\overset{\textrm{a.s.}}{\longrightarrow }$| to denote convergence in probability and almost sure convergence, respectively. Let |$\boldsymbol{M}$| be a symmetric matrix. Its eigenvalues in descending order are written as |$\lambda _1^{\boldsymbol{M}} \ge \lambda _2^{\boldsymbol{M}} \ge \ldots \ge \lambda _n^{\boldsymbol{M}}$|⁠. In particular, |$\lambda _1^{\boldsymbol{M}}$|⁠, sometimes also written as |$\lambda _1(\boldsymbol{M})$|⁠, denotes the largest eigenvalue of |$\boldsymbol{M}$|⁠. Throughout the paper, |$\boldsymbol{A}^{1/2}$| stands for the principal square root of a positive semi-definite symmetric matrix |$\boldsymbol{A}$|⁠. For any |$a, b \in \mathbb{R}$|⁠, we write |$\max \left \{a, b\right \}$| as |$a \vee b$|⁠. Finally, |$\mathbb{1}_{x \in \mathcal{I}}$| stands for the indicator function of a set |$\mathcal{I}$|⁠. 2. Main Results 2.1 Technical assumptions In what follows, we first state the basic assumptions under which our results are proved. (A.1) The sensing vectors are independent Gaussian random vectors. Specifically, let |$(a_{ij})$|⁠, for |$i, j \ge 1$|⁠, be a doubly infinite array of i.i.d. standard normal random variables. Then the |$i$|th sensing vector |$\boldsymbol{a}_i = [a_{i1}, a_{i2}, \ldots , a_{in}]^\top $|⁠. (A.2) |$m = m(n)$| with |$\alpha _n = m(n) / n \rightarrow \alpha> 0$| as |$n \rightarrow \infty $|⁠. (A.3) |$\lVert \boldsymbol{\xi }_n\rVert = \kappa> 0$|⁠. (A.4) Let |$s, y$| and |$z$| be three random variables such that $$\begin{equation} s \sim \mathcal{N}\,(0, 1),\ \mathbb{P}(y \,\vert\, s) = f(y \, \vert \, \kappa s) \ \textrm{and}\ z = T(y), \end{equation}$$ (2.1) where |$f(\cdot \,\vert \, \cdot )$| is the conditional density function (1.1) associated with the observation model and |$T(\cdot )$| is the preprocessing step used in the construction of |$\boldsymbol{D}_m$| in (1.3). We shall assume that the probability measure of the random variable |$z$| is supported within a finite interval |$[0, \tau ]$|⁠. Throughout the paper, we always take |$\tau $| to be the tightest such upper bound. (A.5) As |$\lambda $| approaches |$\tau $| from the right, $$\begin{equation} \lim_{\lambda \rightarrow \tau^+} \mathbb{E}\frac{z}{(\lambda - z)^2} = \lim_{\lambda \rightarrow \tau^+} \mathbb{E}\frac{z s^2}{\lambda - z} = \infty. \end{equation}$$ (2.2) (A.6) The random variables |$z$| and |$s^2$| are positively correlated: |$\textrm{cov}(z, s^2) = \mathbb{E}\,z s^2 - \mathbb{E}z \, \mathbb{E}s^2> 0$|⁠, which is equivalent to $$\begin{equation} \mathbb{E}\,z s^2> \mathbb{E}z. \end{equation}$$ (2.3) The last three assumptions require some explanations. 
First, we note that assumption (A.4) requires that |$z$| should take values within a finite interval on the positive axis. This can be enforced by choosing a suitable function |$T(\cdot )$|⁠. For example, in the problem of phase retrieval, the measurement model (⁠|$y = s^2$|⁠) leads to unbounded |$ \{y_i \}$|⁠. We can set $$\begin{equation} z = T(y) = y \, \mathbb{1}_{\lvert y\rvert \le t^2}, \end{equation}$$ (2.4) where |$t> 0$| is some parameter and |$\mathbb{1}_{\lvert y\rvert \le t^2}$| denotes the indicator function for the condition |$\lvert y\rvert \le t^2$|⁠. This is indeed the trimming strategy proposed in [12]. As shown there, this boundedness condition on the support of |$z$| is an essential ingredient in achieving linear sample complexities. The assumption that |$z$| be non-negative is largely made to simplify our analysis, but this restriction can be removed. In a recent work, Mondelli and Montanari [35] extended our results by showing that the same asymptotic predictions presented in this paper still hold under cases where |$z$| can take negative values. See also Remark 2.2 in Section 2.2. In assumption (A.5), the expressions in (2.2) essentially require that the random variable |$z$| should have sufficient probability mass near the upper bound |$\tau $|⁠. Let |$h(z) = \mathbb{E}_{s\, \vert\, z}(s^2 \,\vert\, z)$|⁠. We show in Appendix A.1 that (2.2) holds when there exist some positive constants |$c_0$| and |$\varepsilon $| such that the probability density function |$p_Z(z)$| of |$z$| and the conditional moment |$h(z)$| are both bounded below by |$c_0$| for all |$z \in [\tau - \varepsilon , \tau ]$|⁠. The model in (2.4) represents one such case. Another sufficient condition for (2.2) to hold is when the law of |$z$| has a point mass at |$\tau $|⁠. The acquisition models described in (2.23) and (2.24) in later sections are examples for which this condition is applicable. The inequality in (A.6) is also a natural requirement. To see this, we note that the data matrix |$\boldsymbol{D}_m$| in (1.3) is the sample average of |$m$| i.i.d. random rank-one matrices |$\left \{T(y_i) \boldsymbol{a}_i \boldsymbol{a}_i^\top \right \}_{i \le m}$|⁠. When the number of samples |$m$| is large, this sample average should be ‘close’ to the statistical expectation, i.e., $$\begin{equation} \boldsymbol{D}_m \approx \mathbb{E} (z_i \, \boldsymbol{a}_i \boldsymbol{a}_i^\top), \end{equation}$$ (2.5) where |$z_i \overset{\textrm{def}}{=} T(y_i)$|⁠. To compute the above expectation, it will be convenient to assume that the underlying signal |$\boldsymbol{\xi } = \kappa \boldsymbol{e}_1$|⁠, where |$\boldsymbol{e}_1$| is the first vector of the canonical basis of |$\mathbb{R}^n$|⁠. (This assumption can be made without loss of generality, due to the rotational invariance of the multivariate normal distribution.) Correspondingly, we can partition each sensing vector into two parts, as $$\begin{equation} \boldsymbol{a}_i^\top = \begin{bmatrix} s_i & \boldsymbol{u}_i^\top \end{bmatrix}, \end{equation}$$ (2.6) so that |$\boldsymbol{a}_i^\top \boldsymbol{\xi } = \kappa s_i$| and the conditional density of |$y_i$| given |$s_i$| is |$f(y \,\vert \, \kappa s_i)$|⁠. 
Since |$s_i, y_i$| and |$z_i$| are all independent of |$\boldsymbol{u}_i$|, $$\begin{align} \mathbb{E} (z_i \, \boldsymbol{a}_i \boldsymbol{a}_i^\top) &= \mathbb{E} \begin{bmatrix} z_i s_i^2 & z_i s_i \boldsymbol{u}_i^\top\\ z_i s_i \boldsymbol{u}_i & z_i \boldsymbol{u}_i \boldsymbol{u}_i^\top \end{bmatrix} \nonumber\\[2pt] &= \begin{bmatrix} \mathbb{E}\,z s^2 & \boldsymbol{0}\\ \boldsymbol{0} & \mathbb{E} z \,\boldsymbol{I}_{n-1} \end{bmatrix}, \end{align}$$ (2.7) where |$\boldsymbol{I}_{n-1}$| is the identity matrix of size |$(n-1)$|. If the inequality |$\mathbb{E}\,z s^2> \mathbb{E}z$|, as required in (A.6), indeed holds, the leading eigenvector of the expected matrix will be |$\boldsymbol{e}_1$|, which is perfectly aligned with the target vector |$\boldsymbol{\xi }$|. Now since the data matrix |$\boldsymbol{D}_m$| is an approximation of the expectation, the sample eigenvector should also be an approximation of |$\boldsymbol{\xi }$|. The above argument provides an intuitive, but non-rigorous, explanation for why the spectral initialization method would work. The approximation in (2.5) can be made exact if the signal dimension |$n$| is kept fixed and the number of measurements |$m$| goes to infinity. However, we consider the case when |$m$| and |$n$| both tend to infinity, at a constant ratio |$\alpha = m/n$| bounded away from |$0$| and |$\infty $|. In this regime, the approximation in (2.5) will not become an equality even as |$m \rightarrow \infty $|. As we will show, the correlation |$\rho (\boldsymbol{\xi }_n, \boldsymbol{x}_1^n)$| between the target vector |$\boldsymbol{\xi }_n$| and the sample eigenvector |$\boldsymbol{x}_1^n$| will converge to a deterministic value |$\rho (\alpha )$| that depends on the sampling ratio |$\alpha $|. A notable exception to (2.3) is when $$\begin{equation} g(s) \overset{\textrm{def}}{=} \mathbb{E}_{z\, \vert\, s}(z\, \vert\, s) \end{equation}$$ (2.8) is an odd function plus some arbitrary constant |$C$|. In this case, |$\mathbb{E} zs^2 = \mathbb{E}[ g(s) s^2] = C$| and |$\mathbb{E} z = \mathbb{E} g(s) = C$|, and thus (2.3) does not hold. In practice, this means that the spectral method will not be effective for acquisition models such as |$z = {\operatorname{sign}}(s) + C$|. We will revisit this point in Section 5, where we describe an alternative initialization scheme that can handle such cases. A final remark before we present our main results: since the eigenvector |$\boldsymbol{x}_1^n$| is always normalized, the spectral method cannot provide any information about the norm of |$\boldsymbol{\xi }_n$|. However, in many cases where the sensing vectors are drawn from certain random ensembles, there are simple methods to accurately estimate |$\kappa = \lVert \boldsymbol{\xi }_n\rVert $|. We provide some discussions on how to do this in Appendix A.2. 2.2 Main results: asymptotic characterizations In this section, we summarize the main results of our work on an asymptotic characterization of the spectral method with Gaussian measurements. To state our results, we first need to introduce several helper functions. Let |$s, z$| be the random variables defined in (2.1).
We consider two functions $$\begin{equation} \phi(\lambda) \overset{\textrm{def}}{=} \lambda \, \mathbb{E}\frac{ z s^2}{\lambda - z} \end{equation}$$ (2.9) and $$\begin{equation} \psi_\alpha(\lambda) \overset{\textrm{def}}{=} \lambda \, \Big(1/\alpha + \mathbb{E} \frac{z}{\lambda-z}\Big), \end{equation}$$ (2.10) both defined on the open interval |$(\tau , \infty )$|, where |$\tau $| is the bound in assumption (A.4). Within their domains, it is easy to check that both functions are convex. In particular, |$\psi _\alpha (\lambda )$| achieves its minimum at a unique point denoted by $$\begin{equation} \overline{\lambda}_\alpha \overset{\textrm{def}}{=} \underset{\lambda> \tau}{\arg\,\min} \ \psi_\alpha(\lambda). \end{equation}$$ (2.11) Finally, let $$\begin{equation} \zeta_\alpha(\lambda) \overset{\textrm{def}}{=} \psi_\alpha\big(\lambda \vee \overline{\lambda}_\alpha\big) \end{equation}$$ (2.12) be a modification of |$\psi _\alpha (\lambda )$|. This new function is again defined for |$\lambda \in (\tau , \infty )$|. Theorem 2.1 Under (A.1)–(A.6), the following hold: (i) There is a unique solution, denoted by |$\lambda ^\ast _\alpha $|, to the equation $$\begin{equation} \zeta_\alpha(\lambda) = \phi(\lambda), \qquad \lambda> \tau. \end{equation}$$ (2.13) (ii) As |$n \rightarrow \infty $|, $$\begin{equation} \rho(\boldsymbol{\xi}_n, \boldsymbol{x}_1^n) \overset{{{\mathcal{P}}}}{\longrightarrow} \begin{cases} 0, &\textrm{if}\ \psi^{\prime}_\alpha(\lambda^\ast_\alpha) < 0, \\ \frac{\psi^{\prime}_\alpha(\lambda^\ast_\alpha)}{\psi^{\prime}_\alpha(\lambda^\ast_\alpha) - \phi^{\prime}(\lambda^\ast_\alpha)}, &\textrm{if}\ \psi^{\prime}_\alpha(\lambda^\ast_\alpha)> 0, \end{cases} \end{equation}$$ (2.14) where |$\psi ^{\prime}_\alpha (\cdot )$| and |$\phi ^{\prime}(\cdot )$| denote the derivatives of the two functions. (iii) Let |$\lambda _1^{\boldsymbol{D}_m} \ge \lambda _2^{\boldsymbol{D}_m}$| be the top two eigenvalues of |$\boldsymbol{D}_m$|. Then $$\begin{equation} \lambda_1^{\boldsymbol{D}_m} \overset{{{\mathcal{P}}}}{\longrightarrow} \zeta_\alpha(\lambda^\ast_\alpha) \quad \textrm{and} \quad \lambda_2^{\boldsymbol{D}_m} \overset{{{\mathcal{P}}}}{\longrightarrow} \zeta_\alpha(\overline{\lambda}_\alpha) \end{equation}$$ (2.15) as |$n \rightarrow \infty $|. Moreover, |$\zeta _\alpha (\lambda ^\ast _\alpha ) \ge \zeta _\alpha (\overline{\lambda }_\alpha )$|, with the inequality becoming strict if and only if |$\psi ^{\prime}_\alpha (\lambda ^\ast _\alpha )> 0$|. Remark 2.1 The above theorem, whose proof is given in Section 3, provides a complete asymptotic characterization of the performance of the spectral method. In particular, the theorem shows that the squared cosine similarity |$\rho (\boldsymbol{\xi }_n, \boldsymbol{x}_1^n)$| converges in probability to a deterministic value in the high-dimensional limit. Moreover, there exists a generic phase transition phenomenon: depending on the sign of the derivative |$\psi ^{\prime}_\alpha (\cdot )$| at |$\lambda ^\ast _\alpha $|, the limiting value can be either zero (i.e., the uncorrelated phase) or strictly positive (i.e., the correlated phase). The computational complexity of the spectral method is also very different in the two phases. Within the uncorrelated phase, the gap between the top two leading eigenvalues, |$\lambda _1^{\boldsymbol{D}_m}$| and |$\lambda _2^{\boldsymbol{D}_m}$|, diminishes to zero, making it increasingly difficult for iterative algorithms such as power iterations to converge.
In contrast, within the correlated phase, the spectral gap converges to a positive value. Remark 2.2 The results of this work were first reported in [31,32]. When this paper was under review, the characterizations given in Theorem 2.1 were further extended by Mondelli and Montanari in [35]. In particular, these authors extended our asymptotic predictions from the real-valued case to the complex-valued case, and, more importantly, they showed that the same predictions still hold under cases where the variable |$z$| defined in assumption (A.4) can take negative values. See [35, Lemma 2] for details. It will be more convenient to characterize the phase transitions predicted by Theorem 2.1 in terms of the sampling ratio |$\alpha $|. To do so, we first introduce a set |$\varLambda $|, containing all the zero-crossings of the function |$\varDelta (\lambda ) = \lambda \, \mathbb{E} \frac{z}{(\lambda -z)^2} - \mathbb{E}\frac{zs^2}{\lambda - z}$| on the open interval |$(\tau , \infty )$|. We can show that |$\varLambda $| is always non-empty and that it contains a finite number of points (see Lemma 4.1 in Section 4.1). Let $$\begin{equation*} \lambda_{c, \min} \overset{\textrm{def}}{=} \underset{\lambda \in \varLambda}{\min} \, \lambda \quad \textrm{and} \quad \lambda_{c, \max} \overset{\textrm{def}}{=} \underset{\lambda \in \varLambda}{\max} \, \lambda \end{equation*}$$ denote the smallest and the largest elements in |$\varLambda $|, respectively. Proposition 2.2 Under (A.1)–(A.6), and as |$n \rightarrow \infty $|, $$\begin{equation*} \rho(\boldsymbol{\xi}_n, \boldsymbol{x}_1^n) \overset{{{\mathcal{P}}}}{\longrightarrow} \begin{cases} 0, &\textrm{if}\ \alpha < \alpha_{c,\min}, \\ \rho(\alpha), &\textrm{if}\ \alpha> \alpha_{c,\max}, \end{cases} \end{equation*}$$ where $$\begin{equation} \alpha_{c,\min}^{-1} = \mathbb{E} \frac{z^2}{(\lambda_{c, \min} - z)^2}, \ \alpha_{c,\max}^{-1} = \mathbb{E} \frac{z^2}{(\lambda_{c, \max} - z)^2}, \end{equation}$$ (2.16) and |$\rho (\alpha )$| is a function with the following parametric representation in terms of a parameter |$\lambda $|: $$\begin{align} 1/\alpha = \mathbb{E}\frac{zs^2 - z}{\lambda - z} \end{align}$$ (2.17) $$\begin{align} 1/\rho = {1 + \Big(\mathbb{E}\frac{zs^2 - z}{\lambda - z} - \mathbb{E}\frac{z^2}{(\lambda - z)^2}\Big)^{-1}\mathbb{E}\frac{z^2 s^2}{(\lambda - z)^2}}, \end{align}$$ (2.18) for all |$\lambda> \lambda _{c,\max }$|. Moreover, |$\rho (\alpha ) \rightarrow 1$| as |$\alpha \rightarrow \infty $|. Remark 2.3 In many of the signal acquisition models we have studied, the set |$\varLambda $| contains exactly one element. In this case, |$\lambda _{c, \min } = \lambda _{c, \max }$| and hence |$\alpha _{c,\min } = \alpha _{c,\max }$|. Consequently, the phase transition of the spectral method takes place at a single threshold value |$\alpha _c$|, which separates the uncorrelated phase from the correlated one. However, it is indeed possible to find cases for which |$\alpha _{c,\min } < \alpha _{c,\max }$|. This leads to a more complicated scenario, where a finite number of correlated and uncorrelated phases can alternately take place within the interval |$(\alpha _{c,\min }, \alpha _{c,\max })$|. One such example is given in Section 4.3. 2.3 Worked example: binary models To illustrate the results presented above, we consider here a special case where |$z_i$| takes only binary values |$\left \{0,1\right \}$|.
This situation naturally appears in problems such as logistic regression and one-bit quantized sensing, where the measurements |$y_i \in \left \{0, 1\right \}$| and we can set |$z_i = y_i$|. For cases where the measurements |$\left \{y_i\right \}$| are not necessarily binary, this type of one-bit model is still relevant whenever the preprocessing function |$z = T(y)$| generates binary outputs. The simplicity of this setting allows us to obtain closed-form expressions for the various quantities in Theorem 2.1 and Proposition 2.2. To proceed, we first explicitly compute the functions |$\phi (\lambda )$| and |$\psi _\alpha (\lambda )$| defined in Section 2.2 as $$\begin{equation*} \phi(\lambda) = \frac{c \lambda}{\lambda - 1} \quad\textrm{and}\quad \psi_\alpha(\lambda) = \lambda\Big(1/\alpha + \frac{d}{\lambda - 1}\Big), \end{equation*}$$ where $$\begin{equation} c \overset{\textrm{def}}{=} \mathbb{E}\,{z s^2} \quad \textrm{and} \quad d \overset{\textrm{def}}{=} \mathbb{E}\,{z} \end{equation}$$ (2.19) and both functions are defined on the interval |$\lambda> 1$|. The minimum of |$\psi _\alpha (\lambda )$| is achieved at |$\overline{\lambda }_\alpha = 1 + \sqrt{\alpha d}$|, and thus $$\begin{align*} \zeta_\alpha(\lambda) = \begin{cases} \lambda/\alpha + \lambda d/(\lambda-1), &\textrm{for}\ \lambda \ge 1 + \sqrt{\alpha d}\\ (\sqrt{d} + 1/\sqrt{\alpha})^2, &\textrm{for}\ 1 < \lambda < 1 + \sqrt{\alpha d}. \end{cases} \end{align*}$$ Solving equation (2.13) and using (2.14), we get $$\begin{equation} \rho(\boldsymbol{\xi}_n, \boldsymbol{x}_1^n) \overset{{{\mathcal{P}}}}{\longrightarrow} \begin{cases} 0, &\textrm{for}\ \alpha < \alpha_c, \\ \frac{\alpha - d/(c-d)^2}{\alpha + 1/(c-d)}, &\textrm{for}\ \alpha> \alpha_c, \end{cases} \end{equation}$$ (2.20) where |$\alpha _c = \frac{d}{(c-d)^2}$|. (Note that this result can also be obtained by invoking the parametric characterization of |$\rho (\alpha )$| given in Proposition 2.2.) Finally, the asymptotic predictions (2.15) for the top two eigenvalues can be computed as $$\begin{equation} \lambda_1^{\boldsymbol{D}_m} \overset{{{\mathcal{P}}}}{\longrightarrow} \begin{cases} (\sqrt{d} + 1/\sqrt{\alpha})^2, &\textrm{for}\ \alpha < \alpha_c, \\ c + \frac{c}{\alpha(c-d)}, &\textrm{for}\ \alpha> \alpha_c, \end{cases} \end{equation}$$ (2.21) and $$\begin{equation} \lambda_2^{\boldsymbol{D}_m} \overset{{{\mathcal{P}}}}{\longrightarrow} (\sqrt{d} + 1/\sqrt{\alpha})^2 \end{equation}$$ (2.22) for all |$\alpha $|. Remark 2.4 It is interesting to note that the asymptotic characterizations given in (2.20), (2.21) and (2.22) are universal, in the sense that they only depend on the two constants |$c$| and |$d$| defined in (2.19), but not on the exact details of the joint probability distributions of |$s, y$| and |$z$|. Thus, for one-bit models, it suffices to compute the constants in (2.19), which then completely determine the asymptotic performance of the spectral method. 2.4 Numerical simulations Example 2.3 (Logistic regression). Consider the case where |$\left \{y_i\right \}$| are binary random variables generated according to the following conditional distribution: $$\begin{equation} f(y \,\vert\, \boldsymbol{a}^\top \boldsymbol{\xi}_n) \sim \textrm{Bernoulli}\left(\frac{1}{1 + \exp\left\{-\boldsymbol{a}^\top \boldsymbol{\xi}_n + \beta\right\}}\right), \end{equation}$$ (2.23) where |$\beta $| is some constant. Let |$z_i = T(y_i) = y_i$|.
Since |$z_i \in \left \{0, 1\right \}$|, we just need to compute the constants |$c$| and |$d$| in (2.19), after which we can use the closed-form expressions (2.20), (2.21) and (2.22) to obtain the asymptotic predictions. In Fig. 2(a) we compare the analytical prediction (2.20) of the squared cosine similarity with results of numerical simulations. In our experiment, we set the signal dimension to |$n = 4096$|. The norm of |$\boldsymbol{\xi }_n$| is |$\kappa = 3$|, and |$\beta = 6$|. The sample averages and error bars (corresponding to one standard deviation) shown in the figure are calculated over 16 independent trials. We can see that the analytical predictions match numerical results very well. Figure 2(b) shows the top two eigenvalues. When |$\alpha < \alpha _c$|, the two eigenvalues are asymptotically equal, but they start to diverge as |$\alpha $| becomes larger than |$\alpha _c$|. To clearly illustrate this phenomenon, we plot in the inset the eigengap |$\lambda _1 - \lambda _2$| as a function of |$\alpha $|. Fig. 2. Analytical predictions vs. numerical simulations for the binary logistic model in (2.23). Numerical results are averaged over 16 independent trials. Fig. 3. (a) Analytical predictions vs. numerical simulations for two different algorithms for phase retrieval in the noiseless setting (i.e., |$\sigma = 0$|). (b) The critical sampling ratio |$\alpha _c$| of the two algorithms as functions of the threshold value |$t$|, at two different noise levels |$\sigma = 0$| and |$\sigma = 2$|. Example 2.4 (Phase retrieval). In the second example, we consider the problem of phase retrieval, where $$\begin{equation*} y_i = (\boldsymbol{a}_i^\top \boldsymbol{\xi}_n)^2 + \sigma \omega_i. \end{equation*}$$ Here, |$\omega _i \sim _{\text{i.i.d.}} \mathcal{N}(0, 1)$| and |$\sigma \ge 0$| is the standard deviation of the noise. In [12], the authors show that it is important to omit large values of |$\left \{y_i\right \}$|, and they propose to use the scheme in (2.4) when constructing the data matrix |$\boldsymbol{D}_m$|. A different strategy can be found in [48], where the authors propose to use $$\begin{equation} z_i = \mathbb{1}_{\lvert y_i\rvert> t^2}. \end{equation}$$ (2.24) In what follows, we shall refer to (2.4) and (2.24) as the trimming algorithm and the subset algorithm, respectively. Figure 3(a) shows the asymptotic performance of these two algorithms and compares them with numerical results (|$n = 4096$| and 16 independent trials). The performance of the subset algorithm (for which we choose the parameter |$t = 1.5$|) can be characterized by the closed-form formula (2.20). The trimming algorithm (for which we use |$t = 3$|) is more complicated as |$z_i$| is no longer binary. We use the parametric characterization in Proposition 2.2 to obtain its asymptotic performance. Again, our analytical predictions match numerical results.
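As a concrete illustration of the universal binary-model formulas, the following sketch (function and parameter names are our own; the values |$\kappa = 3$| and |$\beta = 6$| match the logistic experiment above) estimates the constants |$c$| and |$d$| of (2.19) by Monte Carlo and evaluates the closed-form predictions (2.20), (2.21) and (2.22).

```python
import numpy as np

def binary_model_predictions(kappa=3.0, beta=6.0, n_mc=10**7, seed=0):
    """Monte Carlo estimates of c = E[z s^2] and d = E[z] for the logistic
    model (2.23) with z = y, followed by the closed-form binary-model
    predictions (2.20)-(2.22)."""
    rng = np.random.default_rng(seed)
    s = rng.standard_normal(n_mc)
    p = 1.0 / (1.0 + np.exp(-(kappa * s - beta)))   # P(y = 1 | s)
    c = np.mean(p * s ** 2)                          # E[z s^2] = E[P(y = 1 | s) s^2]
    d = np.mean(p)                                   # E[z]
    alpha_c = d / (c - d) ** 2                       # critical sampling ratio
    rho = lambda a: np.maximum(0.0, (a - d / (c - d) ** 2) / (a + 1.0 / (c - d)))
    lam1 = lambda a: np.where(a < alpha_c,
                              (np.sqrt(d) + 1.0 / np.sqrt(a)) ** 2,
                              c + c / (a * (c - d)))
    lam2 = lambda a: (np.sqrt(d) + 1.0 / np.sqrt(a)) ** 2
    return alpha_c, rho, lam1, lam2
```

The same routine applies to the subset algorithm (2.24), since only |$c$| and |$d$| enter the formulas; for non-binary preprocessing such as the trimming scheme (2.4), the parametric representation (2.17)–(2.18) of Proposition 2.2 can be evaluated in the same Monte Carlo fashion.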
The performance of both algorithms clearly depends on the choice of the thresholding parameter |$t$|⁠. To show this, we plot in Fig. 3(b) the critical phase transition points |$\alpha _c$| of both algorithms as functions of |$t$|⁠, at two different noise levels |$\sigma = 0$| and |$\sigma = 2$|⁠. This points to the possibility of using our analytical prediction to optimally tune the algorithmic parameters and, more generally, to optimize the functional form of the preprocessing function |$T(\cdot )$|⁠. Indeed, the optimal design of |$T(\cdot )$| was obtained in a recent work [33], which leverages the asymptotic characterizations given here. Interestingly, under a mild technical condition, it is shown that there exists a simple fixed design that is uniformly optimal over all sampling ratios; see [33, Theorem 1]. 3. Proof of the Main Results In this section, we prove Theorem 2.1, which provides an exact characterization of the asymptotic performance of the spectral method for signal estimation. 3.1 Overview We first rewrite the data matrix |$\boldsymbol{D}_m$| in (1.3) as $$\begin{equation} \boldsymbol{D}_m = \tfrac{1}{m}\boldsymbol{A} \boldsymbol{Z} \boldsymbol{A}^\top, \end{equation}$$ (3.1) where |$\boldsymbol{A} = [\boldsymbol{a}_1, \boldsymbol{a}_2, \ldots , \boldsymbol{a}_m]$| is an |$n \times m$| matrix of i.i.d. normal random variables and $$\begin{equation} \boldsymbol{Z} \overset{\textrm{def}}{=} {\operatorname{diag}}\left\{z_1, z_2, \ldots, z_m\right\} \end{equation}$$ (3.2) is a diagonal matrix with entries |$z_i = T(y_i)$|⁠. Our goal boils down to studying the largest eigenvalue of |$\boldsymbol{D}_m$| and the associated eigenvector |$\boldsymbol{x}_1^n$|⁠. To simplify notation, we shall first assume that |$\boldsymbol{\xi }_n = \kappa \boldsymbol{e}_1$|⁠, with |$\boldsymbol{e}_1$| being the first vector in the canonical basis. Remark 3.1 The non-null eigenvalues of |$\boldsymbol{D}_m$| are equal to those of a companion matrix $$\begin{equation*} \widetilde{\boldsymbol{D}}_m = \tfrac{1}{m} \boldsymbol{Z}^{1/2} \boldsymbol{A}^\top \boldsymbol{A} \boldsymbol{Z}^{1/2}, \end{equation*}$$ which bears strong resemblance to a sample covariance matrix. Limiting spectral distributions (LSDs) of sample covariance matrices have been extensively studied in random matrix theory; see for instance [3] and the references given there. As a special case, when |$\boldsymbol{Z}$| is the identity matrix, the LSD of |$\widetilde{\boldsymbol{D}}_m$| is given by the classical Marčenko-Pastur law [34]. Results for more general diagonal matrices |$\boldsymbol{Z}$| are also available [40]. However, in these studies, |$\boldsymbol{Z}$| and |$\boldsymbol{A}$| need to be independent. A challenge in our problem is that |$\boldsymbol{Z}$| and |$\boldsymbol{A}$| are correlated. To see this, we partition each sensing vector |$\boldsymbol{a}_i$| into two parts as in (2.6). We can then write $$\begin{equation} \boldsymbol{A} = \begin{bmatrix} \boldsymbol{s}^\top\\ \boldsymbol{U} \end{bmatrix}, \end{equation}$$ (3.3) where |$\boldsymbol{s} \overset{\textrm{def}}{=} [s_1, s_2, \ldots , s_m]^\top $| is an |$m$|-dimensional Gaussian random vector, and |$\boldsymbol{U}$| is an |$(n-1) \times m$| matrix consisting of i.i.d. standard normal random variables. Since |$\boldsymbol{\xi }_n = \kappa \boldsymbol{e}_1$|⁠, the diagonal elements of |$\boldsymbol{Z}$| are independent of |$\boldsymbol{U}$|⁠, but they do depend on |$\boldsymbol{s}$| through |$y_i \sim f(y \, \vert \, \kappa s_i)$|⁠. 
Consequently, we cannot apply existing results on the LSD of sample covariance matrices to our case. Our proof of Theorem 2.1 consists of two main ingredients. First, we will show in Proposition 3.1 that |$\lambda _1^{\boldsymbol{D}_m}$| and |$\rho (\boldsymbol{\xi }_n, \boldsymbol{x}_1^n)$| can be obtained from a fixed-point equation involving a function |$L_m(\mu )$|, to be defined in (3.8), where |$\mu> 0$| is an auxiliary variable. The main benefit of introducing the variable |$\mu $| and the function |$L_m(\mu )$| is that, for each |$\mu> 0$|, the above-mentioned correlation between |$\boldsymbol{A}$| and |$\boldsymbol{Z}$| can be effectively decoupled. This then allows us to obtain the second ingredient of our proof: using results from random matrix theory [5,7], we show in Section 3.3 that |$L_m(\mu )$|, under the assumption of Gaussian sensing vectors, will converge almost surely to a deterministic limit function as the dimension |$n \rightarrow \infty $| (see Proposition 3.3). 3.2 A fixed-point characterization By substituting (3.3) into (3.1), we can write |$\boldsymbol{D}_m$| in a more compact block-partitioned form as $$\begin{equation} \boldsymbol{D}_m = \begin{bmatrix} a_m & \boldsymbol{q}_m^\top\\ \boldsymbol{q}_m & \boldsymbol{P}_m \end{bmatrix}, \end{equation}$$ (3.4) where $$\begin{equation} a_m \overset{\textrm{def}}{=} \tfrac{1}{m}\sum_{i = 1}^m z_i s_i^2 \end{equation}$$ (3.5) is a scalar that converges to |$\mathbb{E}zs^2$| as |$m \rightarrow \infty $|, $$\begin{equation} \boldsymbol{P}_m \overset{\textrm{def}}{=} \tfrac{1}{m}\boldsymbol{U} \boldsymbol{Z} \boldsymbol{U}^\top \end{equation}$$ (3.6) is a symmetric matrix, and $$\begin{equation} \boldsymbol{q}_m \overset{\textrm{def}}{=} \tfrac{1}{m}\boldsymbol{U} \boldsymbol{v}\, \ \textrm{with}\, \ \boldsymbol{v} \overset{\textrm{def}}{=} [z_1 s_1, z_2 s_2, \ldots, z_m s_m]^\top. \end{equation}$$ (3.7) Next, we consider a parametric family of matrices |$\left \{\boldsymbol{P}_m + \mu \hspace{0.5pt} \boldsymbol{q}_m \boldsymbol{q}_m^\top : {\mu> 0}\right \}$|, and let |$L_m(\mu )$| denote their largest eigenvalues, i.e., $$\begin{equation} L_m(\mu) \overset{\textrm{def}}{=} \lambda_1(\boldsymbol{P}_m + \mu \boldsymbol{q}_m \boldsymbol{q}_m^\top). \end{equation}$$ (3.8) In what follows, we show how to compute |$\lambda _1^{\boldsymbol{D}_m}$| and |$\rho (\boldsymbol{\xi }_n, \boldsymbol{x}_1^n)$| via a fixed-point equation involving |$L_m(\mu )$|. Since we assume that |$\boldsymbol{\xi } = \kappa \, \boldsymbol{e}_1$| and that the leading eigenvector |$\boldsymbol{x}_1^n$| is normalized, the quantity |$\rho (\boldsymbol{\xi }_n, \boldsymbol{x}_1^n)$| is equal to |$(\boldsymbol{e}_1^\top \boldsymbol{x}_1^n)^2$|, the squared magnitude of the first element of the eigenvector. Our discussions below are general and they apply to any block-partitioned matrix in the form $$\begin{equation} \boldsymbol{D} = \begin{bmatrix} a & \boldsymbol{q}^\top\\ \boldsymbol{q} & \boldsymbol{P} \end{bmatrix}. \end{equation}$$ (3.9) Its components |$a \in \mathbb{R}$|, |$\boldsymbol{P} \in \mathbb{R}^{(n-1)\times (n-1)}$| and |$\boldsymbol{q} \in \mathbb{R}^{n-1}$| can be arbitrarily chosen, not necessarily defined as in (3.5), (3.6) and (3.7). Our only requirements are that |$\boldsymbol{P}$| is a symmetric matrix and that |$\lVert \boldsymbol{q}\rVert \neq 0$|. Let |$\lambda _1^{\boldsymbol{P}} \ge \lambda _2^{\boldsymbol{P}} \ge \ldots \ge \lambda _{n-1}^{\boldsymbol{P}}$| be the set of eigenvalues of |$\boldsymbol{P}$|, and let |$\boldsymbol{w}_1, \boldsymbol{w}_2, \ldots , \boldsymbol{w}_{n-1}$| be a corresponding set of orthonormal eigenvectors.
Consider a function $$\begin{equation} R(\lambda) \overset{\textrm{def}}{=} \boldsymbol{q}^\top (\boldsymbol{P} - \lambda \boldsymbol{I})^{-1} \boldsymbol{q} = \sum_{i=1}^{n-1} \frac{(\boldsymbol{w}_i^\top \boldsymbol{q})^2}{\lambda_i^{\boldsymbol{P}} - \lambda}, \end{equation}$$ (3.10) which has poles on those eigenvalues for which |$\boldsymbol{w}_i^\top \boldsymbol{q} \neq 0$|⁠. In what follows, we restrict the domain of |$R(\lambda )$| to $$\begin{equation*} \lambda \in \Big(\max\left\{\lambda_i^{\boldsymbol{P}}: \boldsymbol{w}_i^\top \boldsymbol{q} \neq 0\right\}, \infty\Big). \end{equation*}$$ Within this open interval, |$R(\lambda )$| is a well-defined smooth function. It increases monotonically from |$-\infty $| to |$0$|⁠, and thus it admits a functional inverse, denoted by |$R^{-1}(x)$|⁠, for all |$x < 0$|⁠. Similar to (3.8), we define $$\begin{equation*} L(\mu) = \lambda_1(\boldsymbol{P} + \mu \boldsymbol{q} \boldsymbol{q}^\top) \end{equation*}$$ for all |$\mu> 0$|⁠. Lemma 3.1 Let |$\boldsymbol{P}$| be a symmetric matrix and |$\boldsymbol{q}$| a non-zero vector. Then, for each |$\mu> 0$|⁠, $$\begin{equation} L(\mu) = R^{-1}(-1/\mu) \vee \lambda_1^{\boldsymbol{P}}. \end{equation}$$ (3.11) Moreover, |$L(\mu )$| is a non-decreasing convex function with |$\lim _{\mu \rightarrow \infty }L(\mu ) = \infty $|⁠. It is differentiable everywhere on |$(0, \infty )$| except at (up to) one point. Proof. Since |$\boldsymbol{P}$| is diagonalizable by an orthonormal matrix, we can assume without loss of generality that |$\boldsymbol{P}$| is a diagonal matrix. In this case, we can simply write |$R(\lambda ) = \sum _i \frac{q_i^2}{\lambda _i^{\boldsymbol{P}} - \lambda }$|⁠, and this function is defined on the open interval |$(\max \left \{\lambda _i^{\boldsymbol{P}}: q_i \neq 0\right \}, \infty )$|⁠. Using the matrix determinant lemma [19], we can compute the characteristic polynomial of |$\boldsymbol{P} + \mu \hspace{0.5pt} \boldsymbol{q} \boldsymbol{q}^\top $| as $$\begin{align} c(\lambda) &= \det(\lambda \boldsymbol{I} - \boldsymbol{P} - \mu \boldsymbol{q} \boldsymbol{q}^\top) \nonumber\\ &= \det(\lambda \boldsymbol{I} - \boldsymbol{P}) - \mu \boldsymbol{q}^\top{\operatorname{adj}}(\lambda \boldsymbol{I} - \boldsymbol{P}) \boldsymbol{q} \end{align}$$ (3.12) $$\begin{align} &= \prod_i (\lambda - \lambda_i^{\boldsymbol{P}}) - \mu \sum_i q_i^2 \prod_{j \neq i} (\lambda - \lambda_j^{\boldsymbol{P}}). \end{align}$$ (3.13) In (3.12), |${\operatorname{adj}}(\cdot )$| stands for the adjugate of a matrix. To reach (3.13), we have used the fact that, for any diagonal matrix |$\boldsymbol{A} = {\operatorname{diag}}\left \{d_1, d_2, \ldots , d_{n-1}\right \}$|⁠, |${\operatorname{adj}}(\boldsymbol{A}) = {\operatorname{diag}}\left \{\prod _{j \neq 1} d_j, \prod _{j \neq 2} d_j, \ldots , \prod _{j \neq n-1} d_j\right \}$|⁠. Partition the set |$\left \{1, 2, \ldots , n-1\right \}$| into two subsets: $$\begin{equation} \mathcal{I}_1 = \left\{i: q_i \neq 0\right\} \quad \textrm{and} \quad \mathcal{I}_2 = \left\{i: q_i = 0\right\}. \end{equation}$$ (3.14) We observe that the characteristic polynomial can be factored into |$c(\lambda ) = c_1(\lambda ) c_2(\lambda )$|⁠, where |$c_2(\lambda ) = \prod _{i \in \mathcal{I}_2} (\lambda - \lambda _i^{\boldsymbol{P}})$| and $$\begin{equation*} c_1(\lambda) = \prod_{i \in \mathcal{I}_1} (\lambda - \lambda_i^{\boldsymbol{P}}) - \mu \sum_{i \in \mathcal{I}_1} q_i^2 \prod_{j \in \mathcal{I}_1 \setminus i} (\lambda - \lambda_j^{\boldsymbol{P}}). 
\end{equation*}$$ It is possible that the second subset |$\mathcal{I}_2$| is empty, in which case |$c_2(\lambda )$| is understood to be equal to |$1$|⁠, but |$\mathcal{I}_1$| is never empty, since |$\boldsymbol{q} \neq \boldsymbol{0}$|⁠. Next, we study the largest root of the polynomial |$c_1(\lambda )$|⁠. For any |$\lambda> \max \left \{\lambda _i^{\boldsymbol{P}}: i \in \mathcal{I}_1\right \}$|⁠, we can write $$\begin{align} c_1(\lambda) &= \Big(1 - \mu \sum_{i \in \mathcal{I}_1} \frac{q_i^2}{\lambda - \lambda_i^{\boldsymbol{P}}}\Big) \prod_{i \in \mathcal{I}_1} (\lambda - \lambda_i^{\boldsymbol{P}})\nonumber\\ &= \big(1 + \mu R(\lambda)\big) \prod_{i \in \mathcal{I}_1} (\lambda - \lambda_i^{\boldsymbol{P}}). \end{align}$$ (3.15) Recall that |$R(\lambda )$| is the function defined in (3.10) and |$R^{-1}(\cdot )$| is its functional inverse. It follows from (3.15) that |$R^{-1}(-1/\mu )$| is the only root of |$c_1(\lambda )$| in the interval |$\left(\max \left \{\lambda _i^{\boldsymbol{P}}: i \in \mathcal{I}_1\right \}, \infty \right)$|⁠, and therefore it is also the largest root. Due to the factorization |$c(\lambda ) = c_1(\lambda ) c_2(\lambda )$|⁠, we have $$\begin{equation*} L(\mu) = R^{-1}(-1/\mu) \vee \max\left\{\lambda_i: i \in \mathcal{I}_2\right\}. \end{equation*}$$ Finally, since |$R^{-1}(-1/\mu )> \max \left \{\lambda _i: i \in \mathcal{I}_1\right \}$|⁠, we reach the formula in (3.11). By construction, |$R^{-1}(-1/\mu )$| is strictly increasing and |$\lim _{\mu \rightarrow \infty } R^{-1}(-1/\mu ) = \infty $|⁠. It is also differentiable everywhere on |$(0, \infty )$|⁠. It follows that |$L(\mu )$| is non-decreasing with |$L(\infty ) = \infty $|⁠, and that the function is differentiable everywhere except for at most one point |$\mu _0$|⁠, which, if it exists, must satisfy the identity |$R^{-1}(-1/\mu _0) = \lambda _1^{\boldsymbol{P}}$|⁠. Finally, the convexity of |$L(\mu )$| follows from the fact that it is the maximum of a set of linear functions, as |$L(\mu ) = \lambda _1(\boldsymbol{P} + \mu \hspace{0.5pt} \boldsymbol{q} \boldsymbol{q}^\top ) = \max _{\boldsymbol{x}: \, \lVert \boldsymbol{x}\rVert = 1} \boldsymbol{x}^\top (\boldsymbol{P} + \mu \hspace{0.5pt} \boldsymbol{q} \boldsymbol{q}^\top ) \boldsymbol{x}$|⁠. Given a block-partitioned matrix |$\boldsymbol{D}$|⁠, the following proposition shows that its leading eigenvalue |$\lambda _1^{\boldsymbol{D}}$| and the squared cosine similarity |$(\boldsymbol{e}_1^\top \boldsymbol{x}_1)^2$| can be obtained from the function |$L(\mu )$|⁠. Proposition 3.1 Let |$\mu ^\ast> 0$| be the unique solution to the fixed-point equation $$\begin{equation} \mu = (L(\mu) - a)^{-1}. \end{equation}$$ (3.16) Then |$\lambda _1^{\boldsymbol{D}} = L(\mu ^\ast )$| and $$\begin{equation} (\boldsymbol{e}_1^\top \boldsymbol{x}_1)^2 \in \left[\frac{\partial_{-}L(\mu^\ast)}{\partial_{-}L(\mu^\ast) + (1 / \mu^\ast)^2}, \frac{\partial_{+}L(\mu^\ast)}{\partial_{+}L(\mu^\ast) + (1 / \mu^\ast)^2}\right], \end{equation}$$ (3.17) where |$\partial _{-}L(\mu )$| and |$\partial _{+}L(\mu )$| denote the left and right derivatives of |$L(\mu )$|⁠, respectively. In particular, if |$L(\mu )$| is differentiable at |$\mu ^\ast $|⁠, then $$\begin{equation} (\boldsymbol{e}_1^\top \boldsymbol{x}_1)^2 = \frac{L^{\prime}(\mu^\ast)}{L^{\prime}(\mu^\ast) + (1 / \mu^\ast)^2}. \end{equation}$$ (3.18) Remark 3.2 We prove this result in Appendix A.3. Note that (3.16) is equivalent to $$\begin{equation} L(\mu) = a + 1 / \mu. 
\end{equation}$$ (3.19) Since |$L(\mu )$| is non-decreasing with |$L(\infty ) = \infty $| whereas |$a + 1/\mu $| decreases monotonically from |$\infty $| to |$a$|, the equation (3.19), and thus (3.16), always admits one and only one solution. Moreover, by Lemma 3.1, |$L(\mu )$| is a convex function, and therefore its left and right derivatives always exist. 3.3 Asymptotic limit of |$L_m(\mu )$| The characterization given in Proposition 3.1 is valid for any block-partitioned matrix in the form of (3.9). When applied to the specific case of our data matrix in (3.4), with its components |$a_m$|, |$\boldsymbol{P}_m$| and |$\boldsymbol{q}_m$| defined as in (3.5), (3.6) and (3.7), this result provides a very general deterministic characterization of the performance of the spectral method that is valid for any finite dimension |$n$| and for any sensing vectors. Next, we specialize to the case of i.i.d. Gaussian sensing vectors and show that |$L_m(\mu )$| converges almost surely to a deterministic function as |$m,n \rightarrow \infty $|. To that end, we note that |$L_m(\mu )$| is the leading eigenvalue of $$\begin{equation} \boldsymbol{P}_m + \mu \boldsymbol{q}_m \boldsymbol{q}_m^\top = \tfrac{1}{m} \boldsymbol{U} \boldsymbol{M}_m \boldsymbol{U}^\top, \end{equation}$$ (3.20) where $$\begin{equation} \boldsymbol{M}_m \overset{\textrm{def}}{=} \boldsymbol{Z} + \tfrac{\mu}{m} \boldsymbol{v} \boldsymbol{v}^\top \end{equation}$$ (3.21) is a rank-one perturbation of the diagonal matrix |$\boldsymbol{Z}$| given in (3.2). Since |$\boldsymbol{U}$| and |$\boldsymbol{M}_m$| are independent, we first study the spectrum of |$\boldsymbol{M}_m$|. Let |$\lambda _1^{\boldsymbol{M}_m} \ge \lambda _2^{\boldsymbol{M}_m} \ge \ldots \ge \lambda _m^{\boldsymbol{M}_m}$| be the set of eigenvalues of |$\boldsymbol{M}_m$| in descending order. Let $$\begin{equation*} f^{\boldsymbol{M}_m}(\lambda) \overset{\textrm{def}}{=} \frac{1}{m-1} \sum_{i=2}^{m} \delta\big(\lambda - \lambda_i^{\boldsymbol{M}_m}\big) \end{equation*}$$ be the empirical spectral measure of the last |$m-1$| eigenvalues. Proposition 3.2 Fix |$\mu> 0$|. As |$m, n \rightarrow \infty $|, the empirical spectral measure |$f^{\boldsymbol{M}_m}(\lambda )$| converges almost surely to the probability law of the random variable |$z$|. Meanwhile, $$\begin{equation} \lambda_1^{\boldsymbol{M}_m} \overset{\textrm{a.s.}}{\longrightarrow} Q^{-1}(1/\mu), \end{equation}$$ (3.22) where |$Q^{-1}(\cdot )$| is the functional inverse of the function $$\begin{equation} Q(\lambda) = \mathbb{E}\frac{z^2 s^2}{\lambda-z}. \end{equation}$$ (3.23) The domain of |$Q(\lambda )$| is the open interval |$(\tau , \infty )$|, with |$\tau $| being the upper bound of the support of the probability law of |$z$|. Remark 3.3 By construction, |$Q(\lambda )$| is a continuous and strictly decreasing function with |$Q(\infty ) = 0$|. Assumption (A.5) further guarantees that |$\lim _{\lambda \rightarrow \tau ^{+}} Q(\lambda ) = \infty $|. Thus, |$Q(\lambda )$| admits a functional inverse, and |$Q^{-1}(1/\mu )$| is well defined for all |$\mu> 0$|. According to assumption (A.4) stated in Section 2.1, the law of |$z$| is supported within the interval |$[0, \tau ]$|. The above proposition, whose proof can be found in Appendix A.4, shows that the spectrum of |$\boldsymbol{M}_m$| consists of two parts: a ‘bulk spectrum’ of |$m-1$| eigenvalues supported within |$[0, \tau ]$| and a single spiked eigenvalue |$\lambda _1^{\boldsymbol{M}_m}$| well separated from the bulk.
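As a quick numerical sanity check of Proposition 3.2 (a sketch only; the model, parameter values and variable names are our own), one can build the rank-one perturbed matrix |$\boldsymbol{M}_m$| in (3.21) for a binary preprocessing rule and compare its top eigenvalue against the predicted spike location. For binary |$z \in \{0,1\}$| we have |$z^2 = z$|, so |$Q(\lambda ) = \mathbb{E}[zs^2]/(\lambda - 1)$| and hence |$Q^{-1}(1/\mu ) = 1 + \mu \, \mathbb{E}[zs^2]$|.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
m, mu, kappa, t = 4000, 2.0, 1.0, 1.5
s = rng.standard_normal(m)
y = (kappa * s) ** 2                          # noiseless phase retrieval measurements
z = (np.abs(y) > t ** 2).astype(float)        # subset preprocessing (2.24), z in {0, 1}
v = z * s
M = np.diag(z) + (mu / m) * np.outer(v, v)    # M_m = Z + (mu/m) v v^T, cf. (3.21)
lam1_empirical = np.linalg.eigvalsh(M)[-1]
# Predicted spike Q^{-1}(1/mu) = 1 + mu * E[z s^2]; for kappa = 1, E[z s^2] = 2*(t*phi(t) + 1 - Phi(t))
lam1_predicted = 1.0 + mu * 2.0 * (t * norm.pdf(t) + norm.sf(t))
print(lam1_empirical, lam1_predicted)         # the two values should be close for large m
```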
This setting is a generalization of the classical spiked population model [24]. Adapting the results given in [5] (see also [7] for related results under more general settings), we thus reach the second important ingredient of our proof of Theorem 2.1, characterizing the asymptotic limit of |$L_m(\mu )$|. Proposition 3.3 For each fixed |$\mu> 0$|, $$\begin{equation} L_m(\mu) \overset{\textrm{a.s.}}{\longrightarrow} \zeta_\alpha(Q^{-1}(1/\mu)), \end{equation}$$ (3.24) where |$\zeta _\alpha (\cdot )$| is the function defined in (2.12) and |$Q^{-1}(1/\mu )$| is the limit value in (3.22). Proof. Recall from (3.20) that |$L_m(\mu )$| is the leading eigenvalue of |$\tfrac{1}{m} \boldsymbol{U} \boldsymbol{M}_m \boldsymbol{U}^\top $|. Since |$\boldsymbol{U}$| and |$\boldsymbol{M}_m$| are independent, and since |$\boldsymbol{U}$| is a Gaussian random matrix with a rotationally invariant distribution, we can equivalently study the leading eigenvalue of the following matrix $$\begin{equation} \tfrac{1}{m} \boldsymbol{U} {\operatorname{diag}}\left\{\lambda_1^{\boldsymbol{M}_m}, \lambda_2^{\boldsymbol{M}_m}, \ldots, \lambda_m^{\boldsymbol{M}_m}\right\} \boldsymbol{U}^\top. \end{equation}$$ (3.25) Proposition 3.2 shows that |$\left \{\lambda _i^{\boldsymbol{M}_m}: i \ge 2\right \}$| form a bulk spectrum, which converges to the law of |$z$| as |$m \rightarrow \infty $|, whereas |$\lambda _1^{\boldsymbol{M}_m}$| converges to a ‘spike’ |$\lambda _\mu = Q^{-1}(1/\mu )> \tau $|, which is separated from the bulk. The asymptotic limits of extreme sample eigenvalues of matrices in the form of (3.25) have been studied in [5,7]. In our proof, we use the asymptotic characterization given in [5]. Key to this asymptotic analysis is the function |$\psi _\alpha (\lambda )$| defined in (2.10). The asymptotic behavior of the leading sample eigenvalue turns out to depend on the sign of |$\psi ^{\prime}_\alpha (\lambda )$| at the point |$\lambda _{\mu }$|. In particular, applying [5, Theorem 4.1], we have $$\begin{equation} L_m(\mu) \overset{\textrm{a.s.}}{\longrightarrow} \psi_\alpha(\lambda_\mu) \quad \textrm{if}\ \psi^{\prime}_\alpha(\lambda_\mu)> 0. \end{equation}$$ (3.26) The case when |$\psi ^{\prime}_\alpha (\lambda _\mu ) \le 0$| is covered in [5, Theorem 4.2]. Adapting that result to our specific setting, we have $$\begin{equation} L_m(\mu) \overset{\textrm{a.s.}}{\longrightarrow} \min_{\lambda> \tau} \psi_\alpha(\lambda) \quad \textrm{if}\ \psi^{\prime}_\alpha(\lambda_\mu) \le 0. \end{equation}$$ (3.27) As an equivalent form, we can write |$\psi _\alpha (\lambda ) = \mathbb{E} z + \frac{\lambda }{\alpha } + \mathbb{E}\frac{z^2}{\lambda - z}$|. From this, we can easily check that |$\psi _\alpha (\lambda )$| is a convex function and that it admits a unique minimum within its domain |$(\tau , \infty )$|. It follows that the two separate cases in (3.26) and (3.27) can be more compactly written as |$L_m(\mu ) \overset{\textrm{a.s.}}{\longrightarrow } \zeta _\alpha (\lambda _\mu )$|, where |$\zeta _\alpha (\cdot )$| is the modified function defined in (2.12). 3.4 Proof of Theorem 2.1 We are now ready to prove our asymptotic characterizations given in Theorem 2.1.
Since the sensing vectors |$\boldsymbol{a}_i$| are drawn from the rotationally invariant multivariate normal distribution, the quantity |$\rho (\boldsymbol{\xi }_n, \boldsymbol{x}_1^n)$| for a general vector |$\boldsymbol{\xi }_n$| (with |$\lVert \boldsymbol{\xi }_n\rVert = \kappa $|) and |$\rho (\kappa \boldsymbol{e}_1, \boldsymbol{x}_1^n)$| for the special case |$\boldsymbol{\xi }_n = \kappa \boldsymbol{e}_1$| have exactly the same probability distribution. In what follows, we will carry out the proof by assuming that the target vector |$\boldsymbol{\xi }_n = \kappa \boldsymbol{e}_1$|. Once we show that |$\rho (\kappa \boldsymbol{e}_1, \boldsymbol{x}_1^n)$| converges to the right-hand side of (2.14) almost surely, the convergence to the same limit in probability for a general |$\boldsymbol{\xi }_n$| follows as an immediate consequence. To start, we use the deterministic characterization given in Proposition 3.1. For each |$m \ge 1$|, let |$\mu _m$| be the unique fixed point of (3.16). Equivalently, |$\mu _m$| satisfies the identity $$\begin{equation*} L_m(\mu_m) - 1 / \mu_m = a_m. \end{equation*}$$ By Proposition 3.3, for every fixed |$\mu $|, $$\begin{equation} L_m(\mu) - 1/\mu \overset{\textrm{a.s.}}{\longrightarrow} \zeta_\alpha(Q^{-1}(1/\mu)) - 1/\mu \end{equation}$$ (3.28) as |$m \rightarrow \infty $|. Since |$L_m(\mu )$| and |$\zeta _\alpha (Q^{-1}(1/\mu))$| are non-decreasing in |$\mu$|, the functions on the two sides of (3.28) are strictly increasing. This condition, together with the fact that |$a_m \overset{\textrm{a.s.}}{\longrightarrow } \mathbb{E}{zs^2} $|, allows us to apply Lemma A.1 in Appendix A.5 to conclude |$\mu _m \overset{\textrm{a.s.}}{\longrightarrow } \mu ^\ast $|, where |$\mu ^\ast $| is the unique point such that $$\begin{equation} \zeta_\alpha(Q^{-1}(1/\mu^\ast)) = \mathbb{E}{zs^2} + 1 / \mu^\ast. \end{equation}$$ (3.29) To determine the asymptotic behavior of the leading eigenvector |$\boldsymbol{x}_1^n$|, we use the characterization given in (3.17). Since |$\left \{L_m(\mu )\right \}$| are convex functions, we apply Lemma A.2 in Appendix A.5. In particular, if |$\zeta _\alpha (Q^{-1}(1/\mu ))$| is differentiable at |$\mu = \mu ^\ast $|, that lemma gives us $$\begin{equation*} \partial_- L_m(\mu_m) \overset{\textrm{a.s.}}{\longrightarrow}\left. \frac{\textrm{d}\zeta_\alpha(Q^{-1}(1/\mu))}{\textrm{d}\mu}\right\vert_{\mu^{\ast}}=\frac{-\zeta^{\prime}_\alpha(Q^{-1}(1/\mu^\ast))}{Q^{\prime}(Q^{-1}(1/\mu^\ast))(\mu^\ast)^2}\end{equation*}$$ and similarly $$\begin{equation*} \partial_+ L_m(\mu_m) \overset{\textrm{a.s.}}{\longrightarrow} \frac{-\zeta^{\prime}_\alpha(Q^{-1}(1/\mu^\ast))}{Q^{\prime}(Q^{-1}(1/\mu^\ast))(\mu^\ast)^2}. \end{equation*}$$ Substituting these limits into (3.17), we get $$\begin{equation} (\boldsymbol{e}_1^\top \boldsymbol{x}_1^n)^2 \overset{\textrm{a.s.}}{\longrightarrow} \frac{\zeta^{\prime}_\alpha(Q^{-1}(1/\mu^\ast))}{\zeta^{\prime}_\alpha(Q^{-1}(1/\mu^\ast)) - Q^{\prime}(Q^{-1}(1/\mu^\ast))}. \end{equation}$$ (3.30) To simplify the above expression, we introduce a change of variable, writing |$\lambda = Q^{-1}(1/\mu )$|. In particular, |$\lambda ^\ast = Q^{-1}(1/\mu ^\ast )$|. Using the characterization (3.29) and recalling the definition of |$Q(\lambda )$| in (3.23), we get $$\begin{equation} \zeta_\alpha(\lambda^\ast) = \mathbb{E} zs^2 + Q(\lambda^\ast) = \phi(\lambda^\ast), \end{equation}$$ (3.31) where |$\phi (\cdot )$| is defined in (2.9).
By their constructions, it is easily checked that |$\zeta _\alpha (\lambda )$| is a non-decreasing continuous function on |$(\tau , \infty )$| whereas |$\phi (\lambda )$| is a strictly decreasing continuous function. Moreover, by assumption (A.5), |$\lim _{\lambda \rightarrow \tau ^+} \phi (\lambda ) = \infty $|⁠. Thus, the existence of |$\lambda ^\ast $| satisfying (3.31) and its uniqueness are guaranteed. Substituting |$\lambda ^\ast = Q^{-1}(1/\mu ^\ast )$| into (3.30) gives us $$\begin{equation*} (\boldsymbol{e}_1^\top \boldsymbol{x}_1^n)^2 \overset{\textrm{a.s.}}{\longrightarrow} \frac{\zeta^{\prime}_\alpha(\lambda^\ast)}{\zeta^{\prime}_\alpha(\lambda^\ast) - \phi^{\prime}(\lambda^\ast)}, \end{equation*}$$ where we have also used the fact that |$Q^{\prime}(\lambda ) = \phi ^{\prime}(\lambda )$|⁠. To reach the characterization (2.14) given in the theorem, we just need to note that, by its definition in (2.12), |$\zeta ^{\prime}_\alpha (\lambda ) = \psi ^{\prime}_\alpha (\lambda )$| if |$\psi ^{\prime}_\alpha (\lambda )> 0$| and |$\zeta ^{\prime}_\alpha (\lambda ) = 0$| if |$\psi ^{\prime}_\alpha (\lambda ) < 0$|⁠. Next, we characterize the first two eigenvalues |$\lambda _1^{\boldsymbol{D}_m}$| and |$\lambda _2^{\boldsymbol{D}_m}$|⁠. By Proposition 3.1, the leading eigenvalue |$\lambda _1^{\boldsymbol{D}_m} = L_m(\mu _m)$|⁠. Since |$\mu _m \overset{\textrm{a.s.}}{\longrightarrow } \mu ^\ast $|⁠, applying Lemma A.1 stated in Appendix A.5 leads to $$\begin{equation*} \lambda_1^{\boldsymbol{D}_m} \overset{\textrm{a.s.}}{\longrightarrow} \zeta_\alpha(Q^{-1}(1/\mu^\ast)) = \zeta_\alpha(\lambda^\ast). \end{equation*}$$ Recall from (3.4) that |$\boldsymbol{P}_m$| is a principal submatrix of |$\boldsymbol{D}_m$| obtained by deleting the first row and column of |$\boldsymbol{D}_m$|⁠. It follows from the standard Cauchy interlacing theorem (see, e.g., [21, Theorem 4.3.8]) that $$\begin{equation} \lambda_2^{\boldsymbol{P}_m} \le\lambda_2^{\boldsymbol{D}_m} \le \lambda_1^{\boldsymbol{P}_m}. \end{equation}$$ (3.32) Applying [5, Lemma 3.1] (which is due to [41]), the upper edge of the support of the limiting spectral density of |$\boldsymbol{P}_m$| is given by $$\begin{equation*} \min_{\lambda> \tau} \psi_\alpha(\lambda) = \zeta_\alpha(\overline{\lambda}_\alpha), \end{equation*}$$ where |$\overline{\lambda }_\alpha $| is the minimizing point defined in (2.11). It follows that |$\lambda _2^{\boldsymbol{P}_m} \overset{\textrm{a.s.}}{\longrightarrow } \zeta _\alpha (\overline{\lambda }_\alpha )$| and |$\lambda _1^{\boldsymbol{P}_m} \overset{\textrm{a.s.}}{\longrightarrow } \zeta _\alpha (\overline{\lambda }_\alpha )$|⁠, and thus $$\begin{equation*} \lambda_2^{\boldsymbol{D}_m} \overset{\textrm{a.s.}}{\longrightarrow} \zeta_\alpha(\overline{\lambda}_\alpha) \end{equation*}$$ by the interlacing inequalities in (3.32). Finally, by the constructions of |$\psi _\alpha (\lambda )$| and |$\zeta _\alpha (\lambda )$|⁠, we have |$\zeta _\alpha (\lambda )> \zeta _\alpha (\overline{\lambda }_\alpha )$| if and only if |$\psi ^{\prime}_\alpha (\lambda )> 0$|⁠, and the proof is complete. 4. Sampling Ratios and Phase Transitions In this section, we study the phase transition phenomena characterized in Theorem 2.1 in more detail. In particular, we prove Proposition 2.2 (as stated in Section 2.2), which specifies the phase transitions and the asymptotic limits of the cosine similarities in terms of the sampling ratio |$\alpha $|⁠. 
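As an illustration of how the characterization in Theorem 2.1 can be evaluated for a concrete model, the following sketch reuses the assumed model |$s \sim \mathcal{N}(0,1)$|, |$z = \mathbb{1}\{\lvert s\rvert > 1\}$| from the earlier sketch. It solves the scalar equation |$\zeta _\alpha (\lambda ^\ast ) = \phi (\lambda ^\ast )$| on |$(\tau , \infty )$| and evaluates the limiting squared cosine similarity, taking |$\zeta _\alpha (\lambda ) = \psi _\alpha (\max \{\lambda , \overline{\lambda }_\alpha \})$| in line with the description of (2.12) given above; the specific moments used below are closed-form consequences of the assumed model, not quantities taken from the paper.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

# Sketch: limiting squared cosine similarity for the illustrative (assumed)
# model s ~ N(0, 1), z = 1{|s| > 1}.  For this binary z, tau = 1 and
#   E z = E z^2 = 2 (1 - Phi(1)),   E z s^2 = E z^2 s^2 = 2 (phi(1) + 1 - Phi(1)).
Ez   = 2 * (1 - norm.cdf(1))
Ezs2 = 2 * (norm.pdf(1) + 1 - norm.cdf(1))
tau  = 1.0

def psi(lam, alpha):          # psi_alpha(lam) = E z + lam/alpha + E z^2/(lam - z)
    return Ez + lam / alpha + Ez / (lam - tau)

def dpsi(lam, alpha):         # psi'_alpha(lam)
    return 1.0 / alpha - Ez / (lam - tau) ** 2

def phi(lam):                 # phi(lam) = E z s^2 + E z^2 s^2 / (lam - z)
    return Ezs2 + Ezs2 / (lam - tau)

def dphi(lam):
    return -Ezs2 / (lam - tau) ** 2

def rho_limit(alpha):
    lam_bar = tau + np.sqrt(Ez * alpha)          # minimizer of psi_alpha, cf. (4.1)
    zeta = lambda lam: psi(max(lam, lam_bar), alpha)
    # unique crossing of the non-decreasing zeta_alpha and the decreasing phi
    lam_star = brentq(lambda lam: zeta(lam) - phi(lam), tau + 1e-9, 1e7)
    d = dpsi(lam_star, alpha)
    return d / (d - dphi(lam_star)) if d > 0 else 0.0

for alpha in [0.5, 1.0, 2.0, 5.0, 20.0]:
    print(f"alpha = {alpha:5.1f}   limiting rho = {rho_limit(alpha):.4f}")
```

For this illustrative model a single phase transition separates the two regimes, so the printed values are zero at the smaller sampling ratios and increase toward one as |$\alpha $| grows.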
4.1 Critical sampling ratios By Theorem 2.1, whether the leading eigenvector |$\boldsymbol{x}_1^n$| is asymptotically correlated or uncorrelated with the target vector |$\boldsymbol{\xi }_n$| depends on the sign of the derivative |$\psi ^{\prime}_\alpha (\lambda )$| evaluated at a point |$\lambda ^\ast _\alpha $|, which is uniquely defined through the equation |$\zeta _\alpha (\lambda ^\ast _\alpha ) = \phi (\lambda ^\ast _\alpha )$|. Let |$\overline{\lambda }_\alpha $|, defined in (2.11), be the point at which the strictly convex function |$\psi _\alpha (\lambda )$| achieves its minimum. Calculating the derivative of |$\psi _\alpha (\lambda )$| and setting it to zero, we get $$\begin{equation} {1}/{\alpha} = \mathbb{E} \frac{z^2}{(\overline{\lambda}_\alpha - z)^2}. \end{equation}$$ (4.1) By the construction of the function |$\zeta _\alpha (\lambda )$| in (2.12) and by the monotonicity of |$\phi (\lambda )$|, we can conclude that |$\psi ^{\prime}_\alpha (\lambda ^\ast _\alpha )> 0$| if and only if $$\begin{equation*} \psi_\alpha(\overline{\lambda}_\alpha) < \phi(\overline{\lambda}_\alpha). \end{equation*}$$ Substituting (4.1) into (2.10) gives us |$\psi _\alpha (\overline{\lambda }_\alpha ) = \overline{\lambda }_\alpha ^2 \, \mathbb{E} \frac{z}{(\overline{\lambda }_\alpha - z)^2}$|. Thus, transitions between the correlated and uncorrelated phases take place exactly at the zero-crossings of the function $$\begin{equation} \varDelta(\lambda) = \mathbb{E}\frac{\lambda z}{(\lambda-z)^2} - \mathbb{E}\frac{zs^2}{\lambda -z}, \end{equation}$$ (4.2) where |$\varDelta (\lambda )$| is obtained by removing a common factor |$\overline{\lambda }_\alpha $| from the difference |$\psi _\alpha (\overline{\lambda }_\alpha ) - \phi (\overline{\lambda }_\alpha )$| and by writing |$\overline{\lambda }_\alpha $| simply as |$\lambda $|. Let |$\varLambda $| be the set consisting of all the zero-crossings of |$\varDelta (\lambda )$| within the open interval |$(\tau , \infty )$|. Using (4.1), we can then establish a one-to-one mapping between points in |$\varLambda $| and a set of critical values of the sampling ratios. Lemma 4.1 The set |$\varLambda $| is non-empty. It contains a finite number of points, denoted by |$\lambda _{c,1} \le \lambda _{c,2} \le \ldots \le \lambda _{c, r}$| for some |$r \ge 1$|. Moreover, $$\begin{equation} \lambda_{c,r} \le \frac{\tau}{1-\sqrt{\mathbb{E} z / \mathbb{E} zs^2}}. \end{equation}$$ (4.3) Proof. We first show that |$\varLambda $| is non-empty. For |$\lambda> \tau $|, applying the Cauchy–Schwarz inequality gives us $$\begin{equation*} \begin{aligned} \varDelta(\lambda) &\ge \mathbb{E}\frac{\lambda z}{(\lambda-z)^2} - \Big(\mathbb{E}\frac{z^2}{(\lambda - z)^2}\Big)^{1/2} (\mathbb{E} \, s^4)^{1/2}\\ &\ge \mathbb{E}\frac{\tau z}{(\lambda-z)^2} - \Big(\mathbb{E}\frac{3 \tau z}{(\lambda-z)^2}\Big)^{1/2}. \end{aligned} \end{equation*}$$ By assumption (A.5), |$\mathbb{E}\frac{z}{(\lambda -z)^2} \rightarrow \infty $| as |$\lambda $| approaches |$\tau $| from the right. Thus, we have $$\begin{equation} \lim_{\lambda \rightarrow \tau^+} \varDelta(\lambda) = \infty. \end{equation}$$ (4.4) To study the function |$\varDelta (\lambda )$| as |$\lambda \rightarrow \infty $|, we note that $$\begin{equation} \varDelta(\lambda) \le \frac{1}{\lambda}\left(\frac{\lambda^2}{(\lambda - \tau)^2} \mathbb{E} z - \mathbb{E} zs^2\right). \end{equation}$$ (4.5) By assumption (A.6), |$\mathbb{E} z < \mathbb{E} zs^2$|.
We can then conclude from inequality (4.5) that $$\begin{equation} \varDelta(\lambda) < 0, \quad \textrm{for all sufficiently large}\ \lambda. \end{equation}$$ (4.6) Since |$\varDelta (\lambda )$| is a continuous function, (4.4) and (4.6) imply that there must exist at least one zero-crossing. Next, we show the upper bound given in (4.3). For any |$\lambda _c \in \varLambda $|, we have from (4.2) that $$\begin{equation} \lambda_c = \Big(\mathbb{E}\frac{zs^2}{\lambda_c-z}\Big) \Big(\mathbb{E}\frac{z}{(\lambda_c-z)^2}\Big)^{-1}. \end{equation}$$ (4.7) By assumption (A.4), |$z$| is bounded within |$[0, \tau ]$|. It follows that $$\begin{equation*} \mathbb{E}\frac{zs^2}{\lambda_c-z} \ge \frac{\mathbb{E}zs^2}{\lambda_c} \quad \textrm{and} \quad \mathbb{E}\frac{z}{(\lambda_c-z)^2} \le \frac{\mathbb{E} z}{(\lambda_c - \tau)^2}. \end{equation*}$$ Substituting the above inequalities into (4.7) gives us |$(\mathbb{E} z) \lambda _c^2 \ge (\mathbb{E}zs^2) (\lambda _c - \tau )^2$|, which, after some simple manipulations, leads to the upper bound given in (4.3). Finally, to show that |$\varLambda $| is a finite set, we extend |$\varDelta (\lambda )$| in (4.2) to the complex domain |$\left \{\lambda \in \mathbb{C}: \operatorname{Re}(\lambda )> \tau \right \}$|. Since |$\varDelta (\lambda )$| is analytic and not identically zero on this domain, its zeros are isolated. Moreover, (4.4) shows that the real zeros are bounded away from |$\tau $|, and (4.3) shows that they are bounded from above, so they are confined to a compact subinterval of |$\Big(\tau , \frac{\tau }{1-\sqrt{\mathbb{E} z / \mathbb{E} zs^2}}\Big]$|. It follows that |$\varLambda $| contains at most a finite number of points. 4.2 Proof of Proposition 2.2 Write |$\lambda _{c,\min } = \lambda _{c, 1}$| and |$\lambda _{c, \max } = \lambda _{c, r}$|. The corresponding critical sampling ratios |$\alpha _{c,\min }$| and |$\alpha _{c, \max }$|, as defined in (2.16), are obtained through the one-to-one mapping given in (4.1). Fix |$\alpha < \alpha _{c, \min }$|. By the monotonicity of the mapping (4.1), the corresponding |$\overline{\lambda }_\alpha $| is strictly less than the smallest zero-crossing point |$\lambda _{c, 1}$|. From the proof of Lemma 4.1, we conclude that |$\varDelta (\overline{\lambda }_\alpha )> 0$|, and thus |$\zeta _\alpha (\cdot )$| and |$\phi (\cdot )$| intersect at a point |$\lambda ^\ast _\alpha < \overline{\lambda }_\alpha $|. This implies that |$\psi ^{\prime}_\alpha (\lambda ^\ast _\alpha ) < 0$| and thus Theorem 2.1 gives us $$\begin{equation*} \rho(\boldsymbol{\xi}_n, \boldsymbol{x}_1^n) \overset{{{\mathcal{P}}}}{\longrightarrow} 0. \end{equation*}$$ Now fix |$\alpha> \alpha _{c, \max }$|, in which case |$\overline{\lambda }_\alpha> \lambda _{c, \max }$|. Since |$\varDelta (\overline{\lambda }_\alpha ) < 0$|, we must have |$\lambda ^\ast _\alpha> \overline{\lambda }_\alpha $| and thus |$\psi ^{\prime}_\alpha (\lambda ^\ast _\alpha )> 0$|. To derive the parametric form of |$\rho (\alpha )$| given in the statement of the proposition, we note that $$\begin{equation*} \zeta_\alpha(\lambda) = \psi_\alpha(\lambda) \quad \textrm{for all}\ \lambda> \overline{\lambda}_\alpha. \end{equation*}$$ Thus, the equation |$\zeta _\alpha (\lambda ^\ast _\alpha ) = \phi (\lambda ^\ast _\alpha )$| becomes |$\psi _\alpha (\lambda ^\ast _\alpha ) = \phi (\lambda ^\ast _\alpha )$|. Using the explicit definitions of these functions given in (2.9) and (2.10), we get $$\begin{equation} 1/\alpha = \mathbb{E}\frac{zs^2 - z}{\lambda^\ast_\alpha - z}.
\end{equation}$$ (4.8) We can also explicitly compute $$\begin{align} \psi^{\prime}_\alpha(\lambda^\ast_\alpha) = 1/\alpha - \mathbb{E}\frac{z^2}{(\lambda^\ast_\alpha -z)^2} \end{align}$$ (4.9) $$\begin{align} = \mathbb{E}\frac{zs^2 - z}{\lambda^\ast_\alpha - z} - \mathbb{E}\frac{z^2}{(\lambda^\ast_\alpha -z)^2}. \end{align}$$ (4.10) Similarly, we can write $$\begin{equation} \phi^{\prime}(\lambda^\ast_\alpha) = - \mathbb{E}\frac{z^2 s^2}{(\lambda^\ast_\alpha - z)^2}. \end{equation}$$ (4.11) Substituting (4.10) and (4.11) into the asymptotic characterization (2.14) gives us (2.18), which, together with (4.8), provides a parametric representation of the function |$\rho (\alpha )$|. Finally, we show that |$\rho (\alpha ) \rightarrow 1$| as |$\alpha \rightarrow \infty $|. From (4.8) and after some simple manipulations, we have $$\begin{equation*} {\lambda^\ast_\alpha}/{\alpha} = \mathbb{E}(zs^2 - z) + \mathbb{E}\frac{z^2s^2 - z^2}{\lambda^\ast_\alpha - z}. \end{equation*}$$ Since |$\lambda ^\ast _\alpha \rightarrow \infty $| as |$\alpha \rightarrow \infty $|, the above formula gives us $$\begin{equation*} \lambda^\ast_\alpha / \alpha \rightarrow \mathbb{E}(zs^2 - z), \end{equation*}$$ where the limit |$\mathbb{E}(zs^2 - z)$| is positive by assumption (A.6); in particular, |$\lambda ^\ast _\alpha $| grows linearly with |$\alpha $|. By the boundedness of |$z$|, $$\begin{equation*} \frac{\mathbb{E} z^2}{(\lambda^\ast_\alpha)^2} \le \mathbb{E}\frac{z^2}{(\lambda^\ast_\alpha - z)^2} \le \frac{\mathbb{E} z^2}{(\lambda^\ast_\alpha - \tau)^2}, \end{equation*}$$ and thus |$\mathbb{E}\frac{z^2}{(\lambda ^\ast _\alpha - z)^2} = \mathcal{O}(1/\alpha ^2)$|. It follows from (4.9) that |$\psi ^{\prime}_\alpha (\lambda ^\ast _\alpha ) = \mathcal{O}(1/\alpha )$|. Similarly, we conclude from (4.11) that |$\lvert \phi ^{\prime}(\lambda ^\ast _\alpha )\rvert = \mathcal{O}(1/\alpha ^2)$|. Substituting these limiting expressions into (2.14) then gives us |$\lim _{\alpha \rightarrow \infty } \rho (\alpha ) = 1$|, and this completes the proof.□ Remark 4.1 When the set |$\varLambda $| consists of a single element, which is the case for many signal acquisition models we have studied, |$\lambda _{c,\min } = \lambda _{c, \max }$| and thus |$\alpha _{c, \min } = \alpha _{c, \max }$|. There then exists a single critical sampling ratio |$\alpha _c$| separating the uncorrelated phase from the correlated one. For |$\alpha < \alpha _c$|, the estimates from the spectral method are asymptotically orthogonal to |$\boldsymbol{\xi }_n$|; for |$\alpha> \alpha _c$|, the estimates will be concentrated on the surface of a right-circular cone whose generating lines make an angle |$\theta = \arccos (\sqrt{\rho (\alpha )})$| to the target vector |$\boldsymbol{\xi }_n$|. The situation is more complicated when |$\varLambda $| contains multiple zero-crossings, in which case a finite number of correlated and uncorrelated phases can alternately take place between |$\alpha _{c, \min }$| and |$\alpha _{c, \max }$|. A concrete example demonstrating this situation is shown in the next subsection. 4.3 Multiple phase transitions: an example Consider the following model: $$\begin{equation*} z_i = y_i = \begin{cases} 1, &\textrm{if}\ \Big\lvert\boldsymbol{a}_i^\top \boldsymbol{\xi}_n\Big\rvert \in \mathcal{I}_1\\ \theta, &\textrm{if}\ \Big\lvert\boldsymbol{a}_i^\top \boldsymbol{\xi}_n\Big\rvert \in \mathcal{I}_2\\ 0, &\textrm{otherwise}, \end{cases} \end{equation*}$$ where |$0 < \theta < 1$|, and |$\mathcal{I}_1, \mathcal{I}_2$| are two non-overlapping intervals on the positive real axis.
We also set |$\kappa = \lVert \boldsymbol{\xi }_n\rVert = 1$|, and thus |$\boldsymbol{a}_i^\top \boldsymbol{\xi }_n$| has the same distribution as a standard normal random variable, denoted by |$s$|. Define $$\begin{equation*} \beta_\ell = 2 \, \mathbb{E} \, \mathbb{1}_{\mathcal{I}_\ell}(s) \quad \textrm{and} \quad \omega_\ell = 2 \, \mathbb{E} \, [s^2 \mathbb{1}_{\mathcal{I}_\ell}(s)], \end{equation*}$$ where |$\mathbb{1}_{\mathcal{I}_\ell }(\cdot )$| is the indicator function of |$\mathcal{I}_\ell $|, for |$\ell = 1, 2$|. As |$z$| takes only three different values, we can explicitly compute |$\varDelta (\lambda )$| in (4.2) as $$\begin{equation*} \varDelta(\lambda) = \frac{\beta_1 \lambda}{(\lambda - 1)^2} + \frac{\theta \beta_2 \lambda}{(\lambda - \theta)^2} - \frac{\omega_1}{\lambda - 1} - \frac{\theta \omega_2}{\lambda - \theta}. \end{equation*}$$ Choose |$\theta = 0.48$|, |$\mathcal{I}_1 = [4.7947, 4.9847]$| and |$\mathcal{I}_2 = [0.8995, 0.8998]$|. We then have |$\beta _1 = 1.0086\times 10^{-6}$|, |$\beta _2 = 1.5970\times 10^{-4}$|, |$\omega _1 = 2.3976\times 10^{-5}$| and |$\omega _2 = 1.2926\times 10^{-4}$|. In this case, |$\varDelta (\lambda )$| turns out to have three zero-crossings: $$\begin{equation*} \lambda_{c, 1} = 1.0765, \quad \lambda_{c, 2} = 1.1844, \quad \lambda_{c, 3} = 3.3127. \end{equation*}$$ By the mapping in (4.1), they correspond to three critical sampling ratios: $$\begin{equation*} \alpha_{c, 1} = 3.6279 \times 10^3, \quad \alpha_{c, 2} = 9.6302 \times 10^3, \quad \alpha_{c, 3} = 2.0947 \times 10^5. \end{equation*}$$ Using the characterization given in Theorem 2.1, we obtain the limiting values of the squared cosine similarity as a function of the sampling ratio |$\alpha $|. Figure 4 illustrates this function |$\rho (\alpha )$|. We can see that, when |$\alpha < \alpha _{c, 1}$|, the estimates given by the spectral method are asymptotically uncorrelated with |$\boldsymbol{\xi }_n$|. When |$\alpha $| is in the interval |$(\alpha _{c, 1}, \alpha _{c, 2})$|, however, the function |$\rho (\alpha )$| has a small ‘bump’ (see the insert for a zoomed-in view), meaning that the estimates become asymptotically correlated with |$\boldsymbol{\xi }_n$|. However, the correlation returns to zero as |$\alpha $| moves past the second phase transition point |$\alpha _{c, 2}$|. Finally, when |$\alpha> \alpha _{c, 3}$|, the estimates become correlated with |$\boldsymbol{\xi }_n$| again, and |$\rho (\alpha )$| tends to one as |$\alpha \rightarrow \infty $|. Fig. 4. An example of multiple phase transitions. Shown in the figure is the limiting squared cosine similarity |$\rho (\alpha )$| as a function of the sampling ratio |$\alpha $|. As |$\alpha $| increases, the function |$\rho (\alpha )$| alternates between the uncorrelated and correlated phases, with three phase transition points |$\left \{\alpha _{c, i}\right \}_{1 \le i \le 3}$|. The insert shows a zoomed-in view of the first correlated phase which takes place within the interval |$(\alpha _{c,1}, \alpha _{c, 2})$|.
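The numbers quoted above are straightforward to reproduce. The sketch below computes |$\beta _\ell $| and |$\omega _\ell $| in closed form from the Gaussian density, locates the zero-crossings of |$\varDelta (\lambda )$| on |$(\tau , \infty )$| and maps them to critical sampling ratios through (4.1); the grid-scan-plus-root-refinement used to find the crossings is a numerical convenience of ours, not something prescribed by the analysis.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

# Sketch of the multiple-phase-transition example: theta = 0.48,
# I1 = [4.7947, 4.9847], I2 = [0.8995, 0.8998].
theta = 0.48
I1, I2 = (4.7947, 4.9847), (0.8995, 0.8998)

def interval_moments(a, b):
    """beta = 2 E 1_I(s) and omega = 2 E[s^2 1_I(s)] for s ~ N(0,1), I = [a, b]."""
    beta = 2 * (norm.cdf(b) - norm.cdf(a))
    omega = 2 * (norm.cdf(b) - norm.cdf(a) + a * norm.pdf(a) - b * norm.pdf(b))
    return beta, omega

beta1, omega1 = interval_moments(*I1)
beta2, omega2 = interval_moments(*I2)

def Delta(lam):               # the function (4.2) for this three-valued z
    return (beta1 * lam / (lam - 1) ** 2 + theta * beta2 * lam / (lam - theta) ** 2
            - omega1 / (lam - 1) - theta * omega2 / (lam - theta))

# z is supported on {0, theta, 1}, so tau = 1.  By (4.3), all zero-crossings lie
# below tau / (1 - sqrt(E z / E z s^2)); scan that interval for sign changes and
# refine each bracket with a root finder.
Ez, Ezs2 = beta1 + theta * beta2, omega1 + theta * omega2
lam_hi = 1.0 / (1.0 - np.sqrt(Ez / Ezs2))
grid = np.linspace(1 + 1e-4, lam_hi, 500_000)
vals = Delta(grid)
brackets = np.where(np.diff(np.sign(vals)) != 0)[0]
lam_c = [brentq(Delta, grid[i], grid[i + 1]) for i in brackets]

def alpha_of(lam):            # the one-to-one mapping (4.1): 1/alpha = E z^2/(lam-z)^2
    return 1.0 / (beta1 / (lam - 1) ** 2 + theta ** 2 * beta2 / (lam - theta) ** 2)

for lam in lam_c:
    print(f"lambda_c = {lam:.4f}   alpha_c = {alpha_of(lam):.4e}")
```

Up to rounding, the printed pairs should match the three values of |$\lambda _{c,i}$| and |$\alpha _{c,i}$| listed above.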
Remark 4.2 It would be desirable to obtain a deeper understanding of the above phenomenon involving multiple phase transitions. The example provided here is purely theoretical, as its phase transitions take place at very large values of |$\alpha $|. It will be interesting to explore other possible examples of multiple phase transitions with more practical values of |$\alpha $|. Moreover, as most signal acquisition models we have studied seem to involve only a single phase transition point, it will be interesting to seek easy-to-verify conditions for the function |$\varDelta (\lambda )$| defined in (4.2) to have only one zero-crossing. We leave these as interesting open questions. 5. Discussion In this paper, we have presented a precise asymptotic characterization of the performance of a spectral method for estimating signals from generalized linear measurements with Gaussian sensing vectors. Our analysis also reveals a phase transition phenomenon that takes place at certain critical sampling ratios. Below a minimum threshold, estimates given by the method are nearly orthogonal to the true signal |$\boldsymbol{\xi }$|, thus carrying no information; above a maximum threshold, the estimates become increasingly aligned with |$\boldsymbol{\xi }$|. The computational complexity of the spectral method is also markedly different in the two phases. Within the uncorrelated phase, the gap between the top two leading eigenvalues diminishes to zero. In contrast, a non-zero spectral gap emerges within the correlated phase. In this section, we close the paper by discussing some possible directions for extending and improving our results as well as their connections to related work in the literature. The rate of convergence and more refined analysis. The performance of the spectral method was first studied in [36] for the problem of phase retrieval. In that paper, it is shown that, for each |$\delta \in (0, 1)$|, there is a constant |$c_1(\delta )$| such that |$\rho (\boldsymbol{\xi }_n, \boldsymbol{x}_1^n)> 1 - \delta $| with high probability when $$\begin{equation*} m> c_1(\delta) n \log^3 n. \end{equation*}$$ This estimate of the sample complexity was improved to |$m> c_2(\delta ) n \log n$| in [10] and to |$m> c_3(\delta ) n$| in [12]. The key technical tools underlying these previous estimates are matrix concentration inequalities (see, e.g., [46]), which guarantee that the spectral norm of the difference between the data matrix |$\boldsymbol{D}_m$| and its expectation |$\mathbb{E} \boldsymbol{D}_m$| will be small when the sampling ratio |$m/n$| is sufficiently large. The closeness of the corresponding leading eigenvectors of |$\boldsymbol{D}_m$| and |$\mathbb{E} \boldsymbol{D}_m$| then follows from standard perturbation arguments. (See also our discussions toward the end of Section 2.1.) Our work differs from and complements these finite-sample bounds in that we obtain sharp asymptotics to characterize the exact performance of the spectral method in the high-dimensional regime. A (theoretical) limitation of our analysis is that it is asymptotic in nature, requiring both |$m, n \rightarrow \infty $|. Although numerical simulations shown in Section 2.4 indicate that the asymptotic predictions are accurate even for moderate signal dimensions, it will be useful to quantify the rate of convergence toward the asymptotic limits in future work.
Another possible direction to further refine our analysis is to consider second-order asymptotics at the level of central limit theorems (CLTs). See for instance [4] for a related CLT analysis for the extreme eigenvalues of spiked covariance models. Alternative initialization schemes. The spectral method considered in this paper is certainly not the only choice for initialization purposes. For example, an interesting alternative is the following simple linear estimator studied in [38]: $$\begin{equation} \boldsymbol{x}^n_{\textrm{linear}} = \frac{1}{m} \sum_{i=1}^m T(y_i) \boldsymbol{a}_i. \end{equation}$$ (5.1) By using the moment calculations in [38, Proposition 1.1] and bounding high-order moments, one can easily obtain that $$\begin{equation} \rho(\boldsymbol{\xi}_n, \boldsymbol{x}^n_{\textrm{linear}}) \overset{{{\mathcal{P}}}}{\longrightarrow} \frac{(\mathbb{E} zs)^2}{(\mathbb{E} zs)^2 + \mathbb{E}z^2 / \alpha}, \end{equation}$$ (5.2) where |$s$| and |$z$| are the random variables defined in (2.1). Recall the function |$g(s)$| introduced in (2.8) and our discussions thereafter, where we point out that the spectral method is not suitable for acquisition models for which |$g(s)$| is an odd function plus a constant. Such cases will pose no problem for the linear estimator in (5.1). However, it is interesting to note that the linear estimator will be ineffective when |$g(s)$| is an even function, as is the case in phase retrieval. To see this, we note that |$\mathbb{E} zs = \mathbb{E} g(s)s = 0$| when |$g(s)$| is even. It then follows from (5.2) that the linear estimator will be asymptotically uncorrelated with the target signal |$\boldsymbol{\xi }_n$|⁠. For cases where the function |$g(s)$| is neither odd nor even, the choice between the spectral method and the linear estimator is not as clear-cut. The spectral method exhibits phase transition behaviors with its estimates in the uncorrelated phase at small values of |$\alpha $|⁠. In contrast, as shown in (5.2), the performance of the linear estimator increases as a monotonic function of |$\alpha $|⁠. As a result, in the regime of very small |$\alpha $|⁠, the linear estimator will be preferable. For (moderately) larger values of |$\alpha $|⁠, the comparison between the spectral method and the linear estimator cannot be easily made, as their performance also depends on the preprocessing function |$T(\cdot )$| used in (1.3) and (5.1). The incorporation of priors. In this work, we assume that the target signal |$\boldsymbol{\xi }_n$| is an arbitrary unknown (deterministic) signal. In many applications, the underlying signals satisfy additional constraints (such as sparsity). In [38], the authors considered a two-step scheme, where the initial linear estimate given in (5.1) is further projected onto a set which encapsulates one’s prior knowledge about |$\boldsymbol{\xi }$|⁠. It will be interesting to consider and analyze similar projection schemes for the estimates obtained by the spectral method. Universality and more realistic sensing vectors. Our asymptotic analysis assumes that the sensing vectors are real-valued i.i.d. Gaussian random vectors. Numerical simulations seem to suggest that the theoretical predictions given in Theorem 2.1 remain valid for more general random measurement ensembles and for complex-valued sensing vectors. To demonstrate this, we show in Fig. 
5 the results of applying the spectral method to estimate a |$64 \times 64$| cameraman image from phaseless measurements under Poisson noise $$\begin{equation} y_i \sim \textrm{Poisson}(\,\lvert\boldsymbol{a}_i^* \boldsymbol{\xi}\rvert^2) \quad \textrm{and}\ z_i = \min\left\{y_i, \tau\right\}, \end{equation}$$ (5.3) where the bound |$\tau $| is set to 5 and |$\lVert \boldsymbol{\xi }\rVert $| is normalized to 1 in our simulations. Fig. 5. Demonstrating the potential universality of our asymptotic characterizations. The figure shows the results of using the spectral method to estimate a |$64 \times 64$| cameraman image from phaseless measurements under Poisson noise. Theoretical predictions (the blue and red lines) are shown together with simulation results averaged over 16 independent trials. The error bars show one standard deviation. The blue line corresponds to real-valued sensing vectors drawn from the i.i.d. Rademacher ensemble, whereas the red line shows the results of using complex-valued sensing vectors drawn from the complex Gaussian distribution. Two measurement ensembles are considered: real-valued sensing vectors whose elements are independent Rademacher (|$\pm 1$|) random variables and complex-valued sensing vectors with elements drawn from the complex Gaussian distribution |$\mathcal{N}\, (0, \tfrac{1}{2}) + j \mathcal{N}\,(0, \tfrac{1}{2})$|. We see from the figure that the theoretical predictions (the solid lines) have excellent agreement with simulation results for this moderately sized problem, even though the sensing vectors can be non-Gaussian. Rigorously establishing the validity of our asymptotic predictions without the Gaussian assumption will be an important future work. Thanks to the deterministic characterization given in Proposition 3.1, this task boils down to showing that the result of Proposition 3.3 still holds when the sensing matrix consists of i.i.d. entries drawn from more general distributions. Low-rank matrix recovery. The spectral method studied in this paper belongs to a more general theme. Let |$\boldsymbol{X}^\star \in \mathbb{R}^{p \times n}$| be a rank-|$r$| matrix and |$ \{\boldsymbol{A}_i \}_{1 \le i \le m}$| a collection of sensing matrices of the same size as |$\boldsymbol{X}^\star $|.
To recover |$\boldsymbol{X}^\star $| from linear measurements of the form |$\left \{y_i = {\operatorname{Tr}}\, \boldsymbol{A}_i^\top \boldsymbol{X}^\star\right \}_i$|, we can consider the following rank-constrained least squares problem $$\begin{equation*} \underset{\boldsymbol{X} \in \mathbb{R}^{p \times n}: \, \textrm{rank}\, \boldsymbol{X}=r}{\textrm{arg}\,\min} \sum_{i = 1}^m (y_i - {\operatorname{Tr}} \boldsymbol{A}_i^\top \boldsymbol{X})^2 \end{equation*}$$ and try to solve it via projected gradient descent $$\begin{equation} \boldsymbol{X}_{k+1} = \mathcal{P}_r \Big(\boldsymbol{X}_k + \mu \sum_{i = 1}^m (y_i - {\operatorname{Tr}} \boldsymbol{A}_i^\top \boldsymbol{X}_k) \boldsymbol{A}_i\Big), \end{equation}$$ (5.4) where |$\mathcal{P}_r$| denotes projection onto the set of rank-|$r$| matrices, and |$\mu> 0$| is the step size. As pointed out in [44], the spectral method studied in this paper can be viewed as the very first iteration of (5.4), if we start the algorithm from |$\boldsymbol{X}_0 = \boldsymbol{0}_{n \times n}$| and consider the special case of recovering a symmetric rank-one matrix (i.e., |$r = 1$|, |$p = n$|) with |$\boldsymbol{A}_i = \boldsymbol{a}_i \boldsymbol{a}_i^\top $|. Thus, an interesting line of future research is to extend the results of this work, notably the key characterization given in Proposition 3.1, to more general settings with |$r> 1$| and |$p \neq n$| and to other sensing matrices. Such extensions will be useful in applications such as low-rank matrix recovery, covariance estimation and blind deconvolution. Funding U.S. Army Research Office (W911NF-16-1-0265); U.S. National Science Foundation (CCF-1319140 and CCF-1718698). Acknowledgements Part of this work was done during the first author’s visit to the Information Initiative at Duke (iiD) in Spring 2016. He thanks members of this interdisciplinary program for their hospitality. The second author contributed to this work when he was a summer visiting undergraduate student at the John A. Paulson School of Engineering and Applied Sciences, Harvard University. Footnotes 1 We have adapted the original definition of |$\psi _\alpha (\lambda )$| in [5, eq. (3.2)] because our matrix in (3.25) has a slightly different scaling from the one considered in [5]. A. Appendix A.1 Sufficient conditions for Assumption 2.1 to hold In this Appendix, we provide two sufficient conditions for Assumption (A.5) to hold. Case 1: Suppose that the probability law of the random variable |$z$| contains a point mass |$c \, \delta (z - \tau )$| at its upper boundary |$\tau $|, where |$c$| is some positive constant. This applies to the logistic regression model in Example 2.3, the subset algorithm (2.24) in Example 2.4, the noisy phase retrieval model in (5.3) and the quantization model described in Section 4.3. In this case, $$\begin{align*} \mathbb{E} \frac{z}{(\lambda - z)^2} \ge \frac{c \tau}{(\lambda - \tau)^2} \rightarrow \infty \end{align*}$$ as |$\lambda \rightarrow \tau ^+$|. To verify the second expression in (2.2), let |$h(z) = \mathbb{E}_{s \,\vert\, z}(s^2\, \vert\, z)$|. Since |$\mathbb{P}(z = \tau )> 0$|, we must have |$h(\tau )> 0$|. Thus, $$\begin{equation*} \mathbb{E} \frac{zs^2}{\lambda - z} = \mathbb{E} \frac{z h(z)}{\lambda - z} \ge \frac{c \tau h(\tau)}{\lambda - \tau}, \end{equation*}$$ which tends to |$\infty $| as |$\lambda $| approaches |$\tau $| from the right.
Case 2: Suppose that there exist some positive constants |$c$| and |$\varepsilon $| such that the probability density function |$p_Z(z)$| of |$z$| and the conditional moment |$h(z)$| are both bounded below by |$c$| for all |$z \in [\tau - \varepsilon , \tau ]$|⁠. The model in (2.4) represents one such case. Under this setting, $$\begin{equation*} \begin{aligned} \mathbb{E}\frac{zs^2}{\lambda - z} &\ge \int_{\tau - \varepsilon}^{\tau} \frac{z h(z)}{\lambda - z} p_Z(z) \textrm{d}\,z\\ &\ge (\tau - \varepsilon) c^2 \log\left(1 + \frac{\varepsilon}{\lambda - \tau}\right) \xrightarrow{\lambda \rightarrow \tau^+}\infty. \end{aligned} \end{equation*}$$ Similarly, we can verify that |$\mathbb{E} \frac{z}{(\lambda - z)^2} \rightarrow \infty $| as |$\lambda \rightarrow \tau ^+$|⁠. A.2 Norm estimation The spectral initialization method estimates the orientation of the vector |$\boldsymbol{\xi }_n$|⁠, but it provides no information about its norm, as the eigenvector |$\boldsymbol{x}_1^n$| is always normalized. In many cases where the sensing vectors come from certain random ensembles, the norm |$\lVert \boldsymbol{\xi }_n\rVert $| can be accurately estimated from the measurements. As a simple illustrative example, we can consider the (noiseless) phase retrieval problem: |$y_i = (\boldsymbol{a}_i^\top \boldsymbol{\xi }_n)^2$|⁠, where |$\boldsymbol{\xi }_n$| is a deterministic unknown vector with |$\kappa = \lVert \boldsymbol{\xi }_n\rVert $|⁠, and the sensing vectors |$\left \{\boldsymbol{a}_i\right \}$| are i.i.d. standard normal random vectors. Since |$\boldsymbol{a}_i^\top \boldsymbol{\xi }_n \sim \mathcal{N}\,(0, \kappa ^2)$|⁠, the measurement |$y_i$| can be represented as $$\begin{equation*} y_i \sim \kappa^2 s_i^2, \end{equation*}$$ where |$s_i$| (for |$1 \le i \le m$|⁠) are i.i.d. standard normal random variables. A simple estimator of the norm is then $$\begin{equation} \widehat{\kappa} = \sqrt{\frac{\sum_{i=1}^m y_i}{m}}, \end{equation}$$ (A.1) which is asymptotically consistent as |$m \rightarrow \infty $|⁠. More generally, consider an observation model |$y_i \sim f(y \,\vert \, \boldsymbol{a}_i^\top \boldsymbol{\xi }_n)$|⁠, where |$f(\cdot \,\vert \, \cdot )$| is a conditional probability density function and |$\boldsymbol{a}_i \overset{\text{i.i.d.}}{\sim } \mathcal{N}\,(0, \boldsymbol{I}_n)$|⁠. Again, writing |$\boldsymbol{a}_i^\top \boldsymbol{\xi }_n = \kappa s_i$| for i.i.d. normal random variables |$ \{s_i \}$|⁠, we can represent the probability distributions of the measurements |$ \{y_i \}$| as $$\begin{align*} y_i \overset{\text{i.i.d.}}{\sim} \int_{-\infty}^{\infty} f(y \,\vert\, \kappa s) \frac{1}{\sqrt{2\pi}} \textrm{e}^{-\frac{s^2}{2}} \, \textrm{d}s \overset{\textrm{def}}{=} p_{\kappa}(y). \end{align*}$$ Let |$w(\kappa ) \overset{\textrm{def}}{=} \mathbb{E}(y_i) = \int y \, p_{\kappa }(y) \textrm{d}\,y$|⁠. If |$w(\kappa )$| is monotonic on the positive real line, the method of moments gives an estimator $$\begin{align} \widehat{\kappa}_{\textrm{MoM}} = w^{-1}\left(\sum_{i=1}^m y_i / {m}\right). \end{align}$$ (A.2) We note that the estimator in (A.1) is a special case of (A.2). More generally, one could also estimate |$\kappa $| by using maximum likelihood $$\begin{equation*} \widehat{\kappa}_{\textrm{MLE}} = \underset{\kappa> 0}{\arg\,\max} \sum_{i \le m} \log \, p_\kappa(y_i), \end{equation*}$$ whose asymptotic consistency can be established under standard conditions [27] on the parametric density function |$p_\kappa (y)$|⁠. 
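As a small illustration of the estimator (A.1), the following sketch simulates the noiseless phase retrieval model described above (the dimensions and the target norm below are illustrative choices of ours) and recovers |$\kappa $| from the empirical first moment of the measurements.

```python
import numpy as np

# Minimal sketch of the norm estimator (A.1) for noiseless phase retrieval,
# y_i = (a_i^T xi)^2 with i.i.d. standard normal sensing vectors.
rng = np.random.default_rng(1)
n, m, kappa = 500, 4000, 2.5

xi = rng.standard_normal(n)
xi *= kappa / np.linalg.norm(xi)             # target vector with ||xi|| = kappa
A = rng.standard_normal((m, n))
y = (A @ xi) ** 2

# Since y_i has the same distribution as kappa^2 s_i^2 with s_i ~ N(0, 1),
# E y = kappa^2, and (A.1) is the method-of-moments estimator with w(kappa) = kappa^2.
kappa_hat = np.sqrt(y.mean())
print(f"true norm = {kappa:.3f},  estimate = {kappa_hat:.3f}")
```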
A.3 Proof of Proposition 3.1 By conjugating |$\boldsymbol{D}$| with a suitably chosen orthogonal transformation matrix of the form |${\operatorname{diag}}\{1, \boldsymbol{W}\}$|, where |${\boldsymbol{W}} \in \mathbb{R}^{(n-1)\times (n-1)}$| is an orthogonal matrix involving the last |$(n-1)$| rows and columns only, we can get a block-diagonal matrix of the form $$\begin{equation*} \widetilde{\boldsymbol{D}} = \begin{bmatrix} a & \widetilde{\boldsymbol{q}}^\top & \\ \widetilde{\boldsymbol{q}} & {\operatorname{diag}}\big\{\lambda_i^{\boldsymbol{P}}\big\}_{i \in \mathcal{I}_1} & \\ & & {\operatorname{diag}}\big\{\lambda_i^{\boldsymbol{P}}\big\}_{i \in \mathcal{I}_2} \end{bmatrix}, \end{equation*}$$ where |$\mathcal{I}_1, \mathcal{I}_2$| are the two sets of indices defined in (3.12) and |$\widetilde{\boldsymbol{q}}$| is a vector consisting of all the non-zero elements of |$\boldsymbol{W}^\top \boldsymbol{q}$|. Let |$\lambda _1^{\widetilde{\boldsymbol{D}}}$| and |$\widetilde{\boldsymbol{x}}_1$| be the largest eigenvalue of |$\widetilde{\boldsymbol{D}}$| and an associated unit-norm eigenvector, respectively. Clearly, |$\lambda _1^{\boldsymbol{D}} = \lambda _1^{\widetilde{\boldsymbol{D}}}$| and |$(\boldsymbol{e}_1^\top \boldsymbol{x}_1)^2 = (\boldsymbol{e}_1^\top \widetilde{\boldsymbol{x}}_1)^2$|. Thus, we just need to consider |$\widetilde{\boldsymbol{D}}$| in our proof. Due to its block-diagonal form, the eigenvalues of |$\widetilde{\boldsymbol{D}}$| are the union of those of its top-left submatrix |$\boldsymbol{s}$| and those of its bottom-right submatrix |${\operatorname{diag}}\left \{\lambda _i^{\boldsymbol{P}}: i \in \mathcal{I}_2\right \}$|. In particular, $$\begin{equation} \lambda_1^{\widetilde{\boldsymbol{D}}} = \lambda_1^{\boldsymbol{s}} \vee \max\left\{\lambda_i^{\boldsymbol{P}}: i \in \mathcal{I}_2\right\}. \end{equation}$$ (A.4) The eigenvectors associated with |${\operatorname{diag}}\left \{\lambda _i^{\boldsymbol{P}}: i \in \mathcal{I}_2\right \}$| are easy to characterize. Clearly, each |$\lambda _i^{\boldsymbol{P}}, i \in \mathcal{I}_2$| is an eigenvalue of |$\widetilde{\boldsymbol{D}}$|, and it corresponds to an eigenvector |$\boldsymbol{e}_{j(i)}$|, where |$j(i) \ge 3$| is the row index of |$\lambda _i^{\boldsymbol{P}}$| in |$\widetilde{\boldsymbol{D}}$|. The eigenvalues and eigenvectors of |$\boldsymbol{s}$| can also be precisely characterized. Due to its shape, |$\boldsymbol{s}$| is sometimes referred to in the literature as an arrowhead matrix [37,43]. It can be shown (see for instance [49, pp. 94–97]) that |$\lambda _1^{\boldsymbol{s}}$| is the unique point within the interval |$\lambda> \max \left \{\lambda _i^{\boldsymbol{P}}: i \in \mathcal{I}_1\right \}$| to satisfy the equation $$\begin{equation} a = \lambda_1^{\boldsymbol{s}} + R(\lambda_1^{\boldsymbol{s}}), \end{equation}$$ (A.5) where |$R(\lambda )$| is the function defined in (3.8). (Alternatively, we can use the Laplace expansion to explicitly derive the characteristic polynomial of |$\boldsymbol{s}$| as $$\begin{equation*} (\lambda - a) \prod_{i \in \mathcal{I}_1} (\lambda - \lambda_i^{\boldsymbol{P}}) - \sum_{i \in \mathcal{I}_1} (\boldsymbol{w}_i^\top \boldsymbol{q})^2 \prod_{j \in \mathcal{I}_1 \setminus i}(\lambda - \lambda^{\boldsymbol{P}}_j). \end{equation*}$$ Then by following similar arguments as those used in the proof of Lemma 3.1, we can reach the characterization (A.5) about |$\lambda _1^{\boldsymbol{s}}$|.) Furthermore, let |$\boldsymbol{x}_1^{\boldsymbol{s}}$| be a unit-norm eigenvector of |$\widetilde{\boldsymbol{D}}$| associated with |$\lambda _1^{\boldsymbol{s}}$|.
It is easily checked that $$\begin{equation} \boldsymbol{x}_1^{\boldsymbol{s}} = \begin{bmatrix}1 & \boldsymbol{y} & \boldsymbol{0}_r\end{bmatrix} / (1 + \lVert\boldsymbol{y}\rVert^2)^{1/2}, \end{equation}$$ (A.6) where |$\boldsymbol{y} = (\lambda _1^{\boldsymbol{s}}\boldsymbol{I} - {\operatorname{diag}}\left \{\lambda _i^{\boldsymbol{P}}\right \}_{i \in \mathcal{I}_1})^{-1} \widetilde{\boldsymbol{q}}$| and |$\boldsymbol{0}_r$| is a row vector of |$r$| zeroes with |$r$| being the cardinality of |$\mathcal{I}_2$|⁠. It follows that $$\begin{align} (\boldsymbol{e}_1^\top \boldsymbol{x}_1^{\boldsymbol{s}})^2 &= \Big(1 + \widetilde{\boldsymbol{q}}^\top\big(\lambda_1^{\boldsymbol{s}}\boldsymbol{I} - {\operatorname{diag}}\big\{\lambda_i^{\boldsymbol{P}}\big\}_{i \in \mathcal{I}_1}\big)^{-2} \widetilde{\boldsymbol{q}}\Big)^{-1}\nonumber\\ &= \big(1 + \boldsymbol{q}^\top (\lambda_1^{\boldsymbol{s}}\boldsymbol{I} - \boldsymbol{P})^{-2} \boldsymbol{q}\big)^{-1}\nonumber\\ &= \big(1 + R^{\prime}(\lambda_1^{\boldsymbol{s}})\big)^{-1}, \end{align}$$ (A.7) where |$R^{\prime}(\lambda )$| denotes the derivative of the function |$R(\lambda )$|⁠. To show the claim of the proposition, we consider the following three cases. Case 1: |$\lambda _1^{\boldsymbol{s}}> \max \left \{\lambda _i^{\boldsymbol{P}}: i \in \mathcal{I}_2\right \}$|⁠. We choose |$\mu ^\ast = -1/R(\lambda _1^{\boldsymbol{s}})$|⁠, and thus |$\lambda _1^{\boldsymbol{s}} = R^{-1}(-1/\mu ^\ast )$|⁠. It follows from Lemma 3.1 that $$\begin{equation*} L(\mu^\ast) = \lambda_1^{\boldsymbol{s}} \vee \lambda_1^{\boldsymbol{P}} = \lambda_1^{\boldsymbol{s}} = \lambda_1^{\widetilde{\boldsymbol{D}}}, \end{equation*}$$ where the second equality is due to the fact that $$\begin{equation} \lambda_1^{\boldsymbol{s}}> \max\big\{\lambda_i^{\boldsymbol{P}}: i \in \mathcal{I}_1\big\}, \end{equation}$$ (A.8) and the last equality comes from (A.4). Using the identity (A.5) for |$\lambda _1^{\boldsymbol{s}}$|⁠, we can also verify that |$\mu ^\ast $| indeed satisfies the equation (3.14). (Its uniqueness is always guaranteed; see Remark 3.2 at the end of Section 3.2.) The unit-norm leading eigenvector of |$\widetilde{\boldsymbol{D}}$| in this case is the vector |$\boldsymbol{x}_1^{\boldsymbol{s}}$| defined in (A.6). Since |$L(\mu ) = R^{-1}(-1/\mu )$| in a neighborhood of |$\mu ^\ast $|⁠, the function |$L(\mu )$| is differentiable at |$\mu ^\ast $| and $$\begin{equation} L^{\prime}(\mu^\ast) = \left(1 / R^{\prime}(\lambda_1^{\boldsymbol{s}})\right) (\mu^\ast)^{-2}. \end{equation}$$ (A.9) Substituting (A.9) into (A.7) leads to (3.16). Case 2: |$\lambda _1^{\boldsymbol{s}} < \max \left \{\lambda _i^{\boldsymbol{P}}: i \in \mathcal{I}_2\right \}$|⁠, in which case |$\lambda _1^{\widetilde{D}} = \max \left \{\lambda _i^{\boldsymbol{P}}: i \in \mathcal{I}_2\right \} = \lambda _1^{\boldsymbol{P}}$|⁠, where the last equality is due to (A.8). The corresponding leading eigenvector has non-zero elements only in its last |$r$| entries, where |$r$| is the cardinality of |$\mathcal{I}_2$|⁠. Thus, $$\begin{equation} (\boldsymbol{e}_1^\top \widetilde{\boldsymbol{x}}_1)^2 = 0. \end{equation}$$ (A.10) We set |$\mu ^\ast = (\lambda _1^{\boldsymbol{P}} - a)^{-1}$|⁠. (Note that we are guaranteed to have |$\mu ^\ast> 0$|⁠. This can be verified by observing that |$\lambda _1^{\boldsymbol{P}}> \lambda _1^{\boldsymbol{s}} = a - R(\lambda _1^{\boldsymbol{s}}) > a$|⁠, where the equality is due to (A.5) and the last inequality follows from the fact that |$R(\lambda ) < 0$|⁠.) 
Since |$R(\lambda )$| is a strictly increasing function, we have $$\begin{equation*} R^{-1}(-1/\mu^\ast) < R^{-1}(a - \lambda_1^{\boldsymbol{s}}) = \lambda_1^{\boldsymbol{s}} < \lambda_1^{\boldsymbol{P}}, \end{equation*}$$ where the equality comes from (A.5). It then follows from Lemma 3.1 that |$L(\mu ^\ast ) = \lambda _1^{\boldsymbol{P}} = \lambda _1^{\widetilde{\boldsymbol{D}}}$| and, moreover, |$\mu ^\ast $| satisfies the equation (3.14). To characterize the eigenvector, we note that |$L(\mu ) \equiv \lambda _1^{\boldsymbol{P}}$| in a neighborhood of |$\mu ^\ast $|⁠. We then have |$L^{\prime}(\mu ^\ast ) = 0$|⁠, which, together with (A.10), leads to (3.16). Case 3: |$\lambda _1^{\boldsymbol{s}} = \max \left \{\lambda _i^{\boldsymbol{P}}: i \in \mathcal{I}_2\right \}$|⁠. This is a special case, where the algebraic multiplicity of the leading eigenvalue |$\lambda _1^{\widetilde{\boldsymbol{D}}} = \lambda _1^{\boldsymbol{s}} = \lambda _1^{\boldsymbol{P}}$| is greater than one. The leading eigenvectors are not unique, and they can be any vector in the form of $$\begin{equation*} c_1 \boldsymbol{x}_1^{\boldsymbol{s}} + c_2 \boldsymbol{v}, \end{equation*}$$ where |$\boldsymbol{x}_1^{\boldsymbol{s}}$| is the eigenvector defined in (A.6) and |$\boldsymbol{v}$| is an eigenvector associated with |$\max \left \{\lambda _i^{\boldsymbol{P}}: i \in \mathcal{I}_2\right \}$|⁠, and |$c_1, c_2$| are two constants satisfying |$c_1^2 + c_2^2 = 1$|⁠. Since |$\boldsymbol{e}_1^\top \boldsymbol{v} = 0$|⁠, we have from (A.7) that $$\begin{equation} (\boldsymbol{e}_1^\top \widetilde{\boldsymbol{x}}_1)^2 \in \big[0, \big(1 + R^{\prime}(\lambda_1^{\boldsymbol{s}})\big)^{-1}\big]. \end{equation}$$ (A.11) Same as what we did in Case 2, we set |$\mu ^\ast = (\lambda _1^{\boldsymbol{P}} - a)^{-1}$|⁠. Following the same arguments there, we can show that |$L(\mu ^\ast ) = \lambda _1^{\boldsymbol{P}} = \lambda _1^{\widetilde{\boldsymbol{D}}}$| and |$\mu ^\ast $| satisfies the equation (3.14). Moreover, we can see that |$L(\mu ) = R^{-1}(-1/\mu )$| for |$\mu> \mu ^\ast $| and |$L(\mu ) \equiv \lambda _1^{\boldsymbol{P}}$| for |$\mu < \mu ^\ast $|⁠. The function |$L(\mu )$| is not differentiable at |$\mu ^\ast $|⁠, but its right and left derivatives do exist. It is easy to get |$\partial _{+}L(\mu ^\ast ) = \left (1 / R^{\prime}(\lambda _1^{\boldsymbol{s}})\right ) (\mu ^\ast )^{-2}$| [see (A.9)] and |$\partial _{-}L(\mu ^\ast ) = 0$|⁠. Substituting these quantities into (A.11), we reach the characterization given in (3.15). A.4 Proof of Proposition 3.2 To establish the almost-sure convergence of the random measure |$f^{\boldsymbol{M}_m}(\lambda )$| to the probability law of |$z$|⁠, we just need to show that, almost surely, the empirical distribution function $$\begin{align*} F^{\boldsymbol{M}_m}(\lambda) = \frac{1}{m-1} \#\left\{2 \le j \le m: \lambda_j^{\boldsymbol{M}_m} \le \lambda\right\} \end{align*}$$ converges to |$F_z(\lambda )$|⁠, the cumulative distribution function of z, at all points |$\lambda $| where |$F_z(\lambda )$| is continuous. Since |$\boldsymbol{M}_m$| is a rank-one perturbation of the diagonal matrix |$\boldsymbol{Z}$|⁠, standard interlacing theorems (see [21, Theorem 4.3.4]) give us $$\begin{equation} \lambda_k^{\boldsymbol{Z}} \ge \lambda_{k+1}^{\boldsymbol{M}_m} \ge \lambda_{k+2}^{\boldsymbol{Z}}, \end{equation}$$ (A.12) for |$1 \le k \le m-2$|⁠. 
Let |$F^{\boldsymbol{Z}}(\lambda ) = \frac{1}{m} \# \{1 \le j \le m: z_j \le \lambda \}$| be the empirical distribution function of the eigenvalues of |$\boldsymbol{Z}$|. We can then easily verify from (A.12) that $$\begin{equation} m F^{\boldsymbol{Z}}(\lambda) - 2 \le (m-1) F^{\boldsymbol{M}_m}(\lambda) \le m F^{\boldsymbol{Z}}(\lambda) + 1. \end{equation}$$ (A.13) Since |$ \{z_i \}_{1 \le i \le m}$| is an i.i.d. sample of the random variable |$z$|, with probability one |$F^{\boldsymbol{Z}}(\lambda )$| converges to |$F_z(\lambda )$| at all points |$\lambda $| where |$F_z(\lambda )$| is continuous. It then follows from (A.13) that |$F^{\boldsymbol{M}_m}(\lambda )$| converges almost surely to the same limit |$F_z(\lambda )$|. To study the leading eigenvalue |$\lambda _1^{\boldsymbol{M}_m}$|, we use Lemma 3.1. To apply that result, we require |$\boldsymbol{v} = [z_1 s_1, z_2 s_2, \ldots , z_m s_m]^\top $| as defined in (3.6) to be not equal to the all-zero vector. This condition holds almost surely for all sufficiently large |$m$|. To see this, we note that |$\mathbb{P}(z_i = 0) < 1$|, as otherwise assumption (A.7) will not hold. Moreover, |$s_i \neq 0$| with probability one. It follows that the i.i.d. sequence |$z_1 s_1, z_2 s_2, z_3 s_3, \ldots $| has an infinite number of non-zero elements. Thus, almost surely, the |$m$|-dimensional vector |$\boldsymbol{v} \neq \boldsymbol{0}$| for sufficiently large |$m$|. Applying (3.9) to our case, we have |$\lambda _1^{\boldsymbol{M}_m} = R_m^{-1}(-1/\mu ) \vee \max \{z_i \}_{1 \le i \le m}$|, where $$\begin{equation*} R_m(\lambda) = \frac{1}{m}\sum_{i=1}^{m} \frac{z_i^2 s_i^2}{z_i - \lambda} \end{equation*}$$ with this function defined on |$\lambda> \max \{z_i \}_{1 \le i \le m}$|. Since |$R_m^{-1}(-1/\mu )> \max \{z_i \}_{1 \le i \le m}$|, we can further simplify the characterization to $$\begin{equation*} \lambda_1^{\boldsymbol{M}_m} = R_m^{-1}(-1/\mu). \end{equation*}$$ For every |$\lambda> \tau $|, with |$\tau $| being the upper bound of the support of the probability distribution of |$z$|, it follows from the strong law of large numbers that |$R_m(\lambda )$| converges almost surely to $$\begin{equation*} \mathbb{E}\frac{z^2 s^2}{z - \lambda} = -Q(\lambda), \end{equation*}$$ where |$Q(\lambda )$| is defined in (3.23). On its domain |$\lambda> \tau $|, the function |$-Q(\lambda )$| is strictly increasing and thus it admits a functional inverse |$(-Q)^{-1}(x) = Q^{-1}(-x)$|. Applying Lemma A.1 in Appendix A.5, we have $$\begin{equation*} \lambda_1^{\boldsymbol{M}_m} = R_m^{-1}(-1/\mu) \overset{\textrm{a.s.}}{\longrightarrow} Q^{-1}(1/\mu) \end{equation*}$$ as |$m \rightarrow \infty $|. A.5 Auxiliary lemmas We prove here two auxiliary lemmas that are used in our proofs of Proposition 3.2 and Theorem 2.1. Lemma A.1 Let |$\left \{f_n(x)\right \}_{n \ge 1}$| be a family of (random) functions defined on an open interval |$(a, b)$|. Each |$f_n(x)$| is continuous and non-decreasing. For each |$x \in (a, b)$|, |$f_n(x) \overset{\textrm{a.s.}}{\longrightarrow } f(x)$| as |$n \rightarrow \infty $|, where |$f(x)$| is a continuous and non-decreasing function. Then, for any sequence |$\left \{x_n\right \} \subset (a, b)$| with |$x_n \overset{\textrm{a.s.}}{\longrightarrow } x^\ast \in (a, b)$|, we have $$\begin{equation} f_n(x_n) \overset{\textrm{a.s.}}{\longrightarrow} f(x^\ast).
\end{equation}$$ (A.14) If, in addition, the functions |$\left \{f_n(x)\right \}$| and |$f(x)$| are strictly increasing, we denote by |$\left \{f_n^{-1}(x)\right \}_{n \ge 1}$| and |$f^{-1}(x)$| the corresponding functional inverses. Assume that the domains of |$\left \{f_n^{-1}(x)\right \}_{n \ge 1}$| and |$f^{-1}(x)$| contain a common open interval |$\mathcal{I}$|. Then for any sequence |$\left \{y_n\right \}_{n \ge 1} \subset \mathcal{I}$| such that |$y_n \overset{\textrm{a.s.}}{\longrightarrow } y \in \mathcal{I}$|, we have $$\begin{equation} f_n^{-1}(y_n) \overset{\textrm{a.s.}}{\longrightarrow} f^{-1}(y). \end{equation}$$ (A.15) Proof. We first show (A.14). Let |$\beta _k = x^\ast - h / k$| for |$k = 1, 2, \ldots $| be a sequence that converges to |$x^\ast $| from the left. We choose |$h < \min \left \{x^\ast - a, b - x^\ast \right \}$| so that the entire sequence stays within the interval |$(a, b)$|. Similarly, define a sequence |$\gamma _k = x^\ast + h/k$|, for |$k = 1, 2, \ldots $|, that converges to |$x^\ast $| from the right. Denote by |$\mathcal{A}$| the intersection of the event that |$f_n(x) \rightarrow f(x)$| for all |$x \in \left \{\beta _k\right \}_k \cup \left \{\gamma _k\right \}_k$| and the event that |$x_n \rightarrow x^\ast $|. Clearly, |$\mathbb{P}(\mathcal{A}) = 1$|. Next we show that (A.14) holds within this almost sure event. Fix |$k \ge 1$|. As |$x_n \rightarrow x^\ast $|, we have |$\beta _k \le x_n \le \gamma _k$| for all sufficiently large |$n$|. By the monotonicity of |$f_n(x)$|, $$\begin{equation*} f_n(\beta_k) \le f_n(x_n) \le f_n(\gamma_k). \end{equation*}$$ It follows that $$\begin{equation*} f(\beta_k) \le \underset{n}{\lim\inf} \,f_n(x_n) \le \underset{n}{\lim\sup} \, f_n(x_n) \le f(\gamma_k). \end{equation*}$$ As |$k$| is arbitrary, we take the |$k \rightarrow \infty $| limit, which leads to |$\lim _n f_n(x_n) = f(x^\ast)$| by the continuity of |$f(x)$|. The proof of (A.15) is similar. We establish it under the additional assumption that |$\left \{f_n(x)\right \}$| and |$f(x)$| are strictly increasing. Construct two sequences |$\left \{\beta _k\right \}$| and |$\left \{\gamma _k\right \}$| as above, with |$x^\ast $| replaced by |$f^{-1}(y)$|. Also define the event |$\mathcal{A}$| similarly. We show that, within the almost sure event |$\mathcal{A}$|, we have |$f_n^{-1}(y_n) \rightarrow f^{-1}(y)$|. Fix |$k \ge 1$|. Since |$f(x)$| is strictly increasing, |$\beta _k < f^{-1}(y) < \gamma _k$| implies that $$\begin{equation*} f(\beta_k) < y < f(\gamma_k). \end{equation*}$$ As |$f_n(\beta _k) \rightarrow f(\beta _k)$|, |$f_n(\gamma _k) \rightarrow f(\gamma _k)$| and |$y_n \rightarrow y$|, the inequalities $$\begin{equation*} f_n(\beta_k) < y_n < f_n(\gamma_k), \end{equation*}$$ hold for all sufficiently large |$n$|. By the strict monotonicity of |$f_n(x)$|, $$\begin{equation*} \beta_k < f_n^{-1}(y_n) < \gamma_k, \end{equation*}$$ for all sufficiently large |$n$|. It then follows that |$\beta _k \le \lim \inf _n f_n^{-1}(y_n) \le \lim \sup _n f_n^{-1}(y_n) \le \gamma _k$|, for each |$k$|. As |$\beta _k \rightarrow f^{-1}(y)$| and |$\gamma _k \rightarrow f^{-1}(y)$|, we are done. Lemma A.2 Let |$\left \{f_n(x)\right \}_{n \ge 1}$| be a sequence of (random) convex functions defined on an open interval |$(a, b)$|. For each |$x \in (a, b)$|, |$f_n(x) \overset{\textrm{a.s.}}{\longrightarrow } f(x)$|.
Let |$ \{x_n \}_{n \ge 1} \subset (a, b)$| be a sequence such that |$x_n \overset{\textrm{a.s.}}{\longrightarrow } x^\ast $| for some |$x^\ast \in (a, b)$|. If |$f(x)$| is differentiable at |$x^\ast $|, then $$\begin{equation} \partial_- f_n(x_n) \overset{\textrm{a.s.}}{\longrightarrow} f^{\prime}(x^\ast) \quad\textrm{and}\quad \partial_+ f_n(x_n) \overset{\textrm{a.s.}}{\longrightarrow} f^{\prime}(x^\ast), \end{equation}$$ (A.16) where |$\partial _- f_n(x)$| and |$\partial _+ f_n(x)$| denote the left and right derivatives of |$f_n(x)$|, respectively. Proof. Similar to the proof of Lemma A.1, we construct two sequences: |$\left \{\beta _k\right \}_{k \ge 1}$| is strictly increasing and converges to |$x^\ast $| from the left, whereas |$\left \{\gamma _k\right \}_{k \ge 1}$| is strictly decreasing and converges to |$x^\ast $| from the right. Denote by |$\mathcal{A}$| the intersection of the event that |$f_n(x) \rightarrow f(x)$| for all |$x \in \left \{\beta _k\right \}_k \cup \left \{\gamma _k\right \}_k$| and the event that |$x_n \rightarrow x^\ast $|. It is easily checked that |$\mathbb{P}(\mathcal{A}) = 1$|. Next we establish (A.16) within this almost sure event. For any |$i < j$|, since |$\beta _i < \beta _j < x^\ast $| and |$x_n \rightarrow x^\ast $|, we must have |$\beta _i < \beta _j < x_n$| for all sufficiently large |$n$|. By the convexity of |$f_n(x)$|, its left derivatives always exist and we have $$\begin{equation*} \partial_- f_n(x_n) \ge \frac{f_n(\beta_i) - f_n(\beta_j)}{\beta_i - \beta_j}, \end{equation*}$$ for all sufficiently large |$n$|. It follows that $$\begin{equation*} \underset{n}{\lim\inf} \; \partial_- f_n(x_n) \ge \frac{f(\beta_i) - f(\beta_j)}{\beta_i - \beta_j} \end{equation*}$$ for all |$i < j$|. Since $$\begin{align*} \lim_{i \rightarrow \infty} \left(\lim_{j \rightarrow \infty} \frac{f(\beta_i) - f(\beta_j)}{\beta_i - \beta_j}\right) = f^{\prime}(x^\ast), \end{align*}$$ we must have $$\begin{equation} \underset{n}{\lim\inf} \; \partial_- f_n(x_n) \ge f^{\prime}(x^\ast). \end{equation}$$ (A.17) Working with the sequence |$ \{\gamma _k \}_{k \ge 1}$| and using similar arguments as above, we can show that $$\begin{equation} \underset{n}{\lim\sup} \; \partial_+ f_n(x_n) \le f^{\prime}(x^\ast). \end{equation}$$ (A.18) Since |$\lim \sup _n \partial _-f_n(x_n) \le \lim \sup _n \partial _+f_n(x_n)$|, we use (A.17) and (A.18) to conclude that |$\lim _n \partial _-f_n(x_n)$| exists and that it is equal to |$f^{\prime}(x^\ast )$|. By similar arguments, the same claim also holds for the sequence |$\left \{\partial _+f_n(x_n)\right \}$|, and thus the proof is complete. References 1. Arora, S., Ge, R., Ma, T. & Moitra, A. (2015) Simple, efficient, and neural algorithms for sparse coding. Proc. ACM Conference on Learning Theory (COLT), pp. 113–149. 2. Bahmani, S. & Romberg, J. (2016) Phase retrieval meets statistical learning theory: a flexible convex relaxation. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS'17), vol. 54 of Proceedings of Machine Learning Research, 252–260. 3. Bai, Z. & Silverstein, J. W. (2010) Spectral Analysis of Large Dimensional Random Matrices. Springer Series in Statistics. New York, NY: Springer New York. 4. Bai, Z. & Yao, J.-F. (2008) Central limit theorems for eigenvalues in a spiked population model. Ann. Inst. Henri Poincaré Probab. Stat., 44, 447–474.
References
1. Arora, S., Ge, R., Ma, T. & Moitra, A. (2015) Simple, efficient, and neural algorithms for sparse coding. Proc. ACM Conference on Learning Theory (COLT), pp. 113–149.
2. Bahmani, S. & Romberg, J. (2016) Phase retrieval meets statistical learning theory: a flexible convex relaxation. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS'17), vol. 54 of Proceedings of Machine Learning Research, pp. 252–260.
3. Bai, Z. & Silverstein, J. W. (2010) Spectral Analysis of Large Dimensional Random Matrices. Springer Series in Statistics. New York, NY: Springer.
4. Bai, Z. & Yao, J.-F. (2008) Central limit theorems for eigenvalues in a spiked population model. Ann. Inst. Henri Poincaré Probab. Stat., 44, 447–474.
5. Bai, Z. D. & Yao, J. (2012) On sample eigenvalues in a generalized spiked population model. J. Multivariate Anal., 106, 167–177.
6. Baik, J., Arous, G. B. & Péché, S. (2005) Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Ann. Probab., 33, 1643–1697.
7. Benaych-Georges, F. & Nadakuditi, R. R. (2011) The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Adv. Math., 227, 494–521.
8. Candes, E. J. & Li, X. (2014) Solving quadratic equations via PhaseLift when there are about as many equations as unknowns. Found. Comput. Math., 14, 1017–1026.
9. Candes, E. J., Li, X. & Soltanolkotabi, M. (2015) Phase retrieval via Wirtinger flow: theory and algorithms. IEEE Trans. Inf. Theory, 61, 1985–2007.
10. Candes, E. J., Strohmer, T. & Voroninski, V. (2013) PhaseLift: exact and stable signal recovery from magnitude measurements via convex programming. Comm. Pure Appl. Math., 66, 1241–1274.
11. Chen, Y. & Candès, E. (2018) The projected power method: an efficient algorithm for joint alignment from pairwise differences. Comm. Pure Appl. Math., 71, 1648–1714.
12. Chen, Y. & Candes, E. J. (2017) Solving random quadratic systems of equations is nearly as easy as solving linear systems. Comm. Pure Appl. Math., 70, 822–883.
13. Chi, Y. & Lu, Y. M. (2016) Kaczmarz method for solving quadratic equations. IEEE Signal Process. Lett., 23, 1183–1187.
14. Dhifallah, O. & Lu, Y. M. (2017) Fundamental limits of PhaseMax for phase retrieval: a replica analysis. Proc. 7th IEEE Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), Curacao, Dutch Antilles.
15. Dhifallah, O., Thrampoulidis, C. & Lu, Y. M. (2018) Phase retrieval via polytope optimization: geometry, phase transitions, and new algorithms. arXiv preprint arXiv:1805.09555 [cs.IT].
16. Dobson, A. J. & Barnett, A. (2008) An Introduction to Generalized Linear Models, 3rd edn. Boca Raton, FL: Chapman and Hall/CRC.
17. Fienup, J. R. (1982) Phase retrieval algorithms: a comparison. Appl. Optics, 21, 2758–2769.
18. Goldstein, T. & Studer, C. (2018) PhaseMax: convex phase retrieval via basis pursuit. IEEE Trans. Inf. Theory, 64, 2675–2689.
19. Harville, D. A. (2000) Matrix Algebra from a Statistician's Perspective. New York, NY: Springer-Verlag.
20. Hastie, T., Tibshirani, R. & Friedman, J. (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. New York, NY: Springer.
21. Horn, R. A. & Johnson, C. R. (1985) Matrix Analysis. Cambridge, UK: Cambridge University Press.
22. Jaganathan, K., Oymak, S. & Hassibi, B. (2013) Sparse phase retrieval: convex algorithms and limitations. 2013 IEEE International Symposium on Information Theory (ISIT), Istanbul, pp. 1022–1026.
23. Jain, P., Netrapalli, P. & Sanghavi, S. (2013) Low-rank matrix completion using alternating minimization. Proc. ACM Symposium on Theory of Computing (STOC), Palo Alto, CA, pp. 665–674.
24. Johnstone, I. M. (2001) On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist., 29, 295–327.
25. Keshavan, R. H., Montanari, A. & Oh, S. (2010) Matrix completion from a few entries. IEEE Trans. Inf. Theory, 56, 2980–2998.
26. Lee, K., Li, Y., Junge, M. & Bresler, Y. (2017) Blind recovery of sparse signals from subsampled convolution. IEEE Trans. Inf. Theory, 63, 802–821.
27. Lehmann, E. L. & Casella, G. (2003) Theory of Point Estimation, 2nd edn. New York, NY: Springer.
28. Li, G., Gu, Y. & Lu, Y. M. (2015) Phase retrieval using iterative projections: dynamics in the large systems limit. Proc. Allerton Conference on Communication, Control and Computing, Monticello, IL.
29. Li, K.-C. (1992) On principal Hessian directions for data visualization and dimension reduction: another application of Stein's lemma. J. Am. Stat. Assoc., 87, 1025–1039.
30. Li, X., Ling, S., Strohmer, T. & Wei, K. (2016) Rapid, robust, and reliable blind deconvolution via nonconvex optimization. Appl. Comput. Harmon. Anal., 47, 893–934.
31. Lu, Y. M. & Li, G. (2017a) Phase transitions of spectral initialization for high-dimensional nonconvex estimation. arXiv preprint arXiv:1702.06435 [cs.IT].
32. Lu, Y. M. & Li, G. (2017b) Spectral initialization for nonconvex estimation: high-dimensional limit and phase transitions. Proc. IEEE International Symposium on Information Theory (ISIT), Aachen, Germany.
33. Luo, W., Alghamdi, W. & Lu, Y. M. (2019) Optimal spectral initialization for signal recovery with applications to phase retrieval. IEEE Trans. Signal Process., 67, 2347–2356.
34. Marčenko, V. A. & Pastur, L. A. (1967) Distribution of eigenvalues for some sets of random matrices. Math. USSR-Sbornik, 1, 457.
35. Mondelli, M. & Montanari, A. (2017) Fundamental limits of weak recovery with applications to phase retrieval. Found. Comput. Math., 19, 703–773.
36. Netrapalli, P., Jain, P. & Sanghavi, S. (2013) Phase retrieval using alternating minimization. Advances in Neural Information Processing Systems, Lake Tahoe, pp. 2796–2804.
37. O'Leary, D. P. & Stewart, G. W. (1990) Computing the eigenvalues and eigenvectors of symmetric arrowhead matrices. J. Comput. Phys., 90, 497–505.
38. Plan, Y., Vershynin, R. & Yudovina, E. (2017) High-dimensional estimation with geometric constraints. Inf. Inference, 6, 1–40.
39. Rangan, S. & Goyal, V. K. (2001) Recursive consistent estimation with bounded noise. IEEE Trans. Inf. Theory, 47, 457–464.
40. Silverstein, J. W. & Bai, Z. D. (1995) On the empirical distribution of eigenvalues of a class of large dimensional random matrices. J. Multivariate Anal., 54, 175–192.
41. Silverstein, J. W. & Choi, S. I. (1995) Analysis of the limiting spectral distribution of large dimensional random matrices. J. Multivariate Anal., 54, 295–309.
42. Soltanolkotabi, M. (2017) Structured signal recovery from quadratic measurements: breaking sample complexity barriers via nonconvex optimization. IEEE Trans. Inf. Theory, 65, 2374–2400.
43. Stor, N. J., Slapnicar, I. & Barlow, J. L. (2015) Accurate eigenvalue decomposition of arrowhead matrices and applications. Linear Algebra Appl., 464, 62–89.
44. Tu, S., Boczar, R., Simchowitz, M., Soltanolkotabi, M. & Recht, B. (2016) Low-rank solutions of linear matrix equations via Procrustes flow. Proc. International Conference on Machine Learning (ICML), New York, NY.
45. Unser, M. & Eden, M. (1988) Maximum likelihood estimation of linear signal parameters for Poisson processes. IEEE Trans. Acoust., Speech, Signal Process., 36, 942–945.
46. Vershynin, R. (2012) Introduction to the non-asymptotic analysis of random matrices. Compressed Sensing: Theory and Applications (Y. C. Eldar & G. Kutyniok eds). Cambridge, UK: Cambridge University Press, pp. 210–268.
47. Waldspurger, I., d'Aspremont, A. & Mallat, S. (2015) Phase recovery, MaxCut and complex semidefinite programming. Math. Programming, 149, 47–81.
48. Wang, G., Giannakis, G. B. & Eldar, Y. C. (2016) Solving systems of random quadratic equations via truncated amplitude flow. IEEE Trans. Inf. Theory, 64, 773–794.
49. Wilkinson, J. H. (1988) The Algebraic Eigenvalue Problem. Oxford, UK: Clarendon Press.
50. Yang, F., Lu, Y. M., Sbaiz, L. & Vetterli, M. (2012) Bits from photons: oversampled image acquisition using binary Poisson statistics. IEEE Trans. Image Process., 21, 1421–1436.
51. Zhang, H. & Liang, Y. (2016) Reshaped Wirtinger flow for solving quadratic system of equations. Advances in Neural Information Processing Systems, Barcelona, Spain, pp. 2622–2630.
© The Author(s) 2019. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.