p^m_c$|. For every |$\epsilon>0$| the following two conditions hold: (1) For every constant |$M$| there exists |$l$| such that \begin{equation} \mathbf{P}_{p}^{m}(|\hat{\xi} _{t}^{0}|\geq M\ \forall t\geq l)\geq \rho _{p}^{m}-\epsilon . \label{eq:com1} \end{equation} (A.1) (2) There exists a constant |$M_\epsilon$| such that \begin{equation} \label{eq:com2} \mathbf{P}^m_{p}( |C|\leq M_\epsilon)>1-\rho^m_p-\epsilon. \end{equation} (A.2) Proof. The existence of |$M_\epsilon$| readily follows from the fact that |$|C|$| is finite with probability |$1-\rho^m_p$|. We turn to the proof of the first statement. We shall show first that for every constant |$M_p$|, \begin{equation}\label{eq:ne1} \mathbf{P}^m_p( 1\leq|\hat{\xi}^0_t|\leq M_p\ \text{infinitely often}\ )=0. \end{equation} (A.3) For each |$k$|, let |$A_{k}$| be the event that the inequality |$1\leq |\hat{\xi} _{t}^{0}|\leq M_{p}$| holds for at least |$k$| distinct times. For |$l\geq k,$| let |$A_{k,l}$| be the event that at time |$l$| the inequality |$1\leq |\hat{\xi} _{l}^{0}|\leq M_{p}$| holds for the |$k$|-th time between times |$0$| and |$l$|. Note that |$A_{k}=\cup _{l\geq k}A_{k,l}$| and that |$A_{k,l}\cap A_{k,m}=\varnothing $| for |$m\neq l$|. Note that if |$|\hat{\xi}_t^0|\leq M_p$|, then the conditional probability that |$\hat{\xi}_{t+1}^0=\varnothing$| is at least |$(1-p)^{mM_{p}}$|, which is the probability that none of the |$mM_{p}$| potential edges exiting |$M_p$| nodes is open. Therefore, conditional on |$|\hat{\xi}^0_t|\leq M_p,$| the probability that |$\hat{\xi}^0_{t+1}\neq\varnothing$| is bounded from above by |$\delta =1-(1-p)^{mM_{p}}.$| We shall show by induction that |$\mathbf{P}^m_{p}(A_{k})\leq \delta ^{k-1}$|. For |$ k=1$| we obviously have |$\mathbf{P}^m_{p}(A_{1})\leq 1$|. Assume that |$\mathbf{P}^m_{p}(A_{k})\leq \delta ^{k-1}$|.
We shall show that the following holds for |$ \mathbf{P}_{p}^m(A_{k+1})$|: \begin{align*} \mathbf{P}^m_{p}(A_{k+1}) =\sum_{l=k}^{\infty }\mathbf{P}^m_p(A_{k,l})\mathbf{P}^m_{p}(A_{k+1}|A_{k,l}) \leq \sum_{l=k}^{\infty }\mathbf{P}^m_{p}(A_{k,l})\delta =\mathbf{P}^m_{p}(A_{k})\delta \leq \delta ^{k}. \end{align*} The first equality follows from the law of total probability and the fact that |$A_{k+1}\subset A_{k}$|. The fact that |$\mathbf{P}^m_{p}(|\hat{\xi}_{l+1}^{0}|=0|\ |\hat{\xi}_{l}^{0}|\leq M_{p})\geq 1-\delta $| together with |$A_{k+1}\cap A_{k,l}\subset \{|\hat{\xi}_{l+1}^{0}|=0\}^{c}\cap A_{k,l}$| implies that |$\mathbf{P}^m _{p}(A_{k+1}|A_{k,l})\leq \delta.$| Hence, the first inequality follows. The last inequality follows from the induction hypothesis. Hence, since |$ \{A_{k}\}_{k}$| is a decreasing family of events, we have \begin{equation*} \mathbf{P}^m_{p}(1\leq |\hat{\xi}_{t}^{0}|\leq M_{p}\ \text{infinitely often})=\mathbf{P}^m_{p}(\bigcap_{k}A_{k})=\lim_{k}\mathbf{P}^m_{p}(A_{k})=0. \end{equation*} Note that if |$C$| is infinite, then, in particular, |$|\hat{\xi}^0_t|\geq 1$| for every |$t$|. By equation (A.3), for every constant |$M_p$|, |$ \mathbf{P}^m_p(1\leq |\hat{\xi}_{t}^{0}|\leq M_{p}\ \text{infinitely often}\big||C|=\infty)=0. $| Therefore, we must have that |$\mathbf{P}^m_p(\lim_{t\rightarrow\infty} |\hat{\xi}_{t}^{0}|=\infty\big||C|=\infty)=1.$| That is, conditional on the event that the origin is part of an infinite component, it holds that |$|\hat{\xi}_{t}^{0}|$|, the number of nodes that lie at distance |$t$| from the origin, goes to infinity. Since |$\mathbf{P}^m_p(|C|=\infty)=\rho^m_p$| the first part of the lemma follows. ‖ Essentially, Lemma 3 classifies the two possible events regarding the set of nodes that can be reached from the origin by a directed path.
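The zero-or-infinity dichotomy behind Lemma 3 can be seen in a quick simulation. The sketch below is a Galton–Watson caricature of the frontier process (each frontier node spawns |$\mathrm{Binomial}(m,p)$| children); it is a tree approximation, not the actual lattice process on |$Z^m_+$|, and the parameter values are purely illustrative.

```python
# Galton-Watson caricature of the frontier process: each frontier node spawns
# Binomial(m, p) children. A tree approximation, not the lattice process on
# Z^m_+; m, p, and the horizon are illustrative choices.
import random

def frontier_after(t: int, m: int, p: float, rng: random.Random) -> int:
    """Frontier size after t generations (0 once extinct)."""
    size = 1
    for _ in range(t):
        size = sum(1 for _ in range(size * m) if rng.random() < p)
        if size == 0:
            break
    return size

rng = random.Random(2)
runs = [frontier_after(15, 2, 0.8, rng) for _ in range(200)]
extinct = sum(r == 0 for r in runs)
small = sum(0 < r < 10 for r in runs)   # surviving but still small: rare
print(extinct, small, max(runs))
```

In a supercritical run a small fraction of trajectories dies out early, almost none survive while staying small, and the survivors are already large after a few steps, mirroring the alternative "empty or growing to infinity".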
It is either the case that |$C$| is infinite, which happens with probability |$\rho _{p}^{m}$| (by definition) and in which case |$|\hat{\xi}_{t}^{0}| $| grows to infinity with probability one, or that |$C$| is finite and bounded by |$M_{\epsilon }$| with probability |$1-\rho _{p}^{m}-\epsilon $|. Our result in the standard percolation model connects with our observation structure as follows. Recall from Section 3 that |$|\xi_{l}^{\mathbf{x}}|$| and |$|\hat{\xi}_{l}^{0}|$| have an identical distribution. This follows from the fact that there exists a natural isomorphism between the set of nodes that lie at a distance of |$l$| from the origin and those agents who decide |$l$| periods before |$\mathbf{x}$|. Unlike in the standard percolation model, in our model the set of agents whom any agent |$\mathbf{x}$| observes is finite and bounded above by a constant. Nonetheless, if |$\mathbf{x}_{-}$| is large, some conclusion can be drawn using our identification. Corollary 5. Let |$p\in(p^m_c,1)$| and consider the random observation structure induced by |$\mathbf{P}^m_p$| on |$Z^m_+$|. For every |$k>0$| and every |$\epsilon>0$| there exists a constant |$t_{k,\epsilon}$| such that if |$\mathbf{x}\in Z^m_+$| satisfies |$\mathbf{x}_{-}\geq t_{k,\epsilon}$|, then |$\mathbf{P}^m_p$| assigns a probability of at least |$\rho^m_p-\epsilon$| to the following event: |$B_G(\mathbf{x})$| contains at least |$k$| isolated agents who lie at a distance of exactly |$t_{k,\epsilon}$| from |$\mathbf{x}$|. Proof of Corollary 5. It follows from Lemma 3 that for every |$M$| and |$\epsilon>0$| there exists |$l$| such that |$\mathbf{P}^m_{p}( |\hat{\xi}^0_t|\geq M)>\rho^m_p-\epsilon$| for every |$t\geq l$|. Let |$\mathbf{x}$| be a node such that |$\mathbf{x}_{-}\geq l$|. By the above identification, |$|\xi _{l}^{ \mathbf{x}}|$| and |$|\hat{\xi}_{l}^{0}|$| have an identical distribution.
Therefore |$\mathbf{P}^m_{p}( |\xi _{l}^{ \mathbf{x}}|\geq M)>\rho^m_p-\epsilon.$| Consider the event that an agent |$\mathbf{x}\in B_t$| observes at least |$M$| agents from |$B_{t-l}$|. That is, |$|\xi_{l}^{ \mathbf{x}}|\geq M$|. We note that all agents in |$B_{t-l}$| who are observed by |$\mathbf{x}$| are isolated independently with probability |$(1-p)^m$|. Hence, for every |$\epsilon>0$|, if |$|\xi _{l}^{ \mathbf{x}}|\geq M$|, then for sufficiently large |$M$| agent |$\mathbf{x}$| observes at least |$\frac{(1-p)^m M}{2}$| isolated agents with probability at least |$1-\epsilon$|, by the weak law of large numbers. Therefore, for every |$k$| and |$\epsilon$| there exists a large enough |$t_{k,\epsilon}$| such that if |$\mathbf{x}_{-}\geq t_{k,\epsilon}$|, then agent |$\mathbf{x}$| observes at least |$k$| isolated agents in |$\xi_{t_{k,\epsilon}}^{ \mathbf{x}}$| with a probability of at least |$\rho^m_p-\epsilon$|. ‖ Let |$\epsilon >0$| and |$k\in \mathbb{N}$|. By Corollary 5 there exists |$t_{k,\epsilon }$| such that if |$\mathbf{x}\in B_{t}$| and |$\mathbf{x}_{-}\geq t_{k,\epsilon }$|, then |$\mathbf{x}$| observes at least |$k$| isolated agents with a probability of at least |$\rho _{p}^{m}-\epsilon $|. It follows that for sufficiently large |$k$|, an agent |$\mathbf{x}$| who observes at least |$k$| isolated agents takes the optimal action with a probability that is arbitrarily close to one. As |$t$| grows the proportion of agents |$\mathbf{x}\in B_{t}$| for which |$\mathbf{x}_{-}\geq t_{k,\epsilon }$| goes to one. We can deduce that the average expected welfare of agents in |$B_{t}$| is arbitrarily close to |$\rho _{p}^{m}$| as |$t$| grows. The rest of the proof is devoted to establishing that if |$p>p^{m}(\alpha )$|, then a proportion of |$\alpha $| of agents is guaranteed to take the optimal action in the long run.
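The counting step in the proof of Corollary 5 can be checked numerically: among |$M$| observed agents, each isolated independently with probability |$q=(1-p)^m$|, the number of isolated agents exceeds |$qM/2$| with probability tending to one as |$M$| grows. The sketch below computes the exact binomial tail; the values of |$p$| and |$m$| are illustrative.

```python
# Exact binomial tail P(Binomial(M, q) >= k): the probability of seeing at
# least k isolated agents among M observed agents, each isolated independently
# with probability q = (1-p)^m. The values of p and m are illustrative.
from math import comb

def tail_at_least(M: int, q: float, k: int) -> float:
    """P(Binomial(M, q) >= k), computed exactly."""
    return sum(comb(M, j) * q**j * (1 - q)**(M - j) for j in range(k, M + 1))

p, m = 0.6, 2
q = (1 - p)**m                 # isolation probability, here 0.16
for M in (10, 50, 200):
    k = int(q * M / 2)         # half the expected number of isolated agents
    print(M, k, round(tail_at_least(M, q, k), 4))
```

As |$M$| grows the tail probability approaches one, which is exactly the concentration invoked in the proof.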
The following lemma shows that if |$p\in (p^m_c,1)$|, then for every |$\epsilon>0$| and |$k,$| the proportion of agents in |$B_t$| who observe |$k$| isolated agents lies above |$\rho^m_p-\epsilon$| with a probability that approaches one as |$t$| goes to infinity. Let |$R^{k}_t $| be the set of agents in |$B_t$| who can observe at least |$k$| isolated agents, and let |$r^k_t$| be the size of |$R^{k}_t $|. Lemma 4. For every |$p\in (p^m_c,1)$|, |$k\in\mathbb{N},$| and |$\epsilon>0$|, |$\lim_{t\rightarrow \infty }\mathbf{P}^m_{p}(\frac{r_{t}^{k}}{b_{t}}>\rho^m_p-\epsilon)=1.$| For the proof of Lemma 4 we require the following result. Lemma 5. For every |$t\in\mathbb{N}$|, let |$\{X^t_i\}_{1\leq i\leq m_t} $| be a sequence of Bernoulli random variables for which there exist |$\epsilon>0$| and |$\beta>0$| such that |$E(X^t_i)\geq\beta+\epsilon$| for every |$i$|. Assume that there exists an integer |$n$| such that for every |$i$| the random variable |$X^t_i$| depends on at most |$n$| other random variables from |$\{X^t_i\}_{1\leq i\leq m_t}$|, and that |$m_t\rightarrow_{t\rightarrow \infty}\infty$|. Then, \begin{equation*} \lim_{t\rightarrow\infty}\mathbf{P}(\frac{1}{m_t}\sum_{i=1}^{m_t} X^t_i>\beta)=1. \end{equation*} Proof. The proof follows from Theorem 2 in Andrews (1988); it also follows directly from Chebyshev’s inequality, since the bounded dependence implies that the variance of |$\frac{1}{m_t}\sum_{i=1}^{m_t} X^t_i$| vanishes as |$m_t$| grows. ‖ Proof of Lemma 4. For every agent |$\mathbf{x}\in B_t$| define a random variable |$h_{\mathbf{x}}$| to be equal to |$1$| if |$\xi^{\mathbf{x}}_{t_{k,\frac{\epsilon}{2}}}$| contains at least |$k$| isolated agents, and to |$0$| otherwise.
It follows from Corollary 5 that for every |$\mathbf{x}$| with |$\mathbf{x}_{-}\geq t_{k,\frac{\epsilon}{2}}$| it holds that |$\mathbf{P}^m_{p} (h_{\mathbf{x}}=1)>\rho^m_p-\frac{\epsilon}{2}.$| In addition, note that if |$\mathbf{x},\mathbf{y}\in B_t$| are such that |$d_{Z^m_{+}}(\mathbf{x},\mathbf{y})\geq 2t_{k,\frac{\epsilon}{2}}+1$|, then |$ \xi^\mathbf{x}_{s}\cap\xi^\mathbf{y}_{s'}=\varnothing$| for all |$s,s'\leq t_{k,\frac{\epsilon}{2}}$| with probability one. Hence, if |$d_{Z^m_{+}}(\mathbf{x},\mathbf{y})\geq 2t_{k,\frac{\epsilon}{2}}+1$|, then |$h_\mathbf{x}$| and |$h_\mathbf{y} $| are independent random variables. Therefore, for every |$t$| and |$\mathbf{x}\in B_t,$| the random variable |$h_{\mathbf{x}}$| depends on at most |$n$| random variables |$h_{\mathbf{y}}$| in |$B_t$|, where |$n$| is the (fixed) number of nodes of |$Z^m_+$| that lie within distance |$2t_{k,\frac{\epsilon}{2}}$| of a given node. Moreover, the proportion of agents |$\mathbf{x}\in B_t$| for which |$\mathbf{x}_{-}\leq t_{k,\frac{\epsilon}{2}}$| goes to zero as |$t$| grows to infinity. Hence, based on Lemma 5, it follows that: \begin{equation*} \label{eq:sum} \lim_{t\rightarrow\infty}\mathbf{P}^m_{p} (\sum_{\mathbf{x}\in B_t}\frac{h_\mathbf{x}}{b_t}>\rho^m_p-\epsilon)=1. \end{equation*} Since, by definition, |$\sum_{\mathbf{x}\in B_t}\frac{h_\mathbf{x}}{b_t}= \frac{r^k_t}{b_t}$| we have that |$\lim_{t\rightarrow\infty}\mathbf{P}^m_{p}(\frac{r^k_t}{b_t}>\rho^m_p-\epsilon)=1.$| This concludes the proof of the lemma. ‖ Lemma 4 shows that for |$p\in (p_{c}^{m},1)$| the proportion of agents |$\mathbf{x}\in B_{t}$| who observe at least |$k$| isolated agents lies above |$\rho _{p}^{m}-\epsilon $|, for any |$\epsilon >0$|, with a probability that is arbitrarily close to one as |$t$| grows. This is true for every natural number |$k$|. As |$k$| grows, observing the decisions of at least |$k$| isolated agents serves as a sufficient statistic for taking the optimal action.
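The bounded-dependence law of large numbers of Lemma 5 can be illustrated with a toy construction of our own (unrelated to the lattice process): each |$X_i$| is a function of a sliding window of |$n+1$| shared uniform draws, so variables more than |$n$| indices apart are independent, and the sample mean still concentrates above |$\beta$|.

```python
# Toy illustration of Lemma 5: Bernoulli variables with sliding-window
# dependence (X_i and X_j are independent once |i - j| > n). The window
# construction and all numeric values are illustrative assumptions.
import random

def sample_mean(m_t: int, n: int, rng: random.Random) -> float:
    base = [rng.random() for _ in range(m_t + n)]
    # X_i = 1 iff the minimum over its window of n+1 base values exceeds 0.1,
    # so E[X_i] = 0.9**(n+1) for every i.
    xs = [1 if min(base[i:i + n + 1]) > 0.1 else 0 for i in range(m_t)]
    return sum(xs) / m_t

rng = random.Random(0)
n = 3
beta = 0.9**(n + 1) - 0.05      # so that E[X_i] = beta + 0.05
hits = sum(sample_mean(10_000, n, rng) > beta for _ in range(100))
print(hits)                      # close to 100: the mean exceeds beta w.h.p.
```

The dependence inflates the variance of the average by at most a constant factor, so Chebyshev's inequality still drives the probability to one, exactly as in the proof of Lemma 5.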
Hence a proportion of |$\rho _{p}^{m}$| agents in |$B_{t}$| must take the optimal action with a probability that approaches one as |$t$| goes to infinity. Therefore, in particular, |$\alpha $|-proportional learning holds. The rest of the proof is devoted to formally establishing this intuition. Assume that an agent |$\mathbf{x}$| observes at least |$k$| isolated agents. In equilibrium, as |$k$| grows to infinity, agent |$\mathbf{x}$| would learn the true state of the world and therefore choose the optimal action with arbitrarily high probability. Therefore, there exists a sequence |$\{q_{k}\}_{k} $| converging to one such that if a given agent observes at least |$k$| isolated agents, then his expected utility is at least |$q_{k}$|. We fix a Perfect Bayesian equilibrium |$\sigma$| of |$\Gamma^m_p$|. For every agent |$\mathbf{x}\in B_{t}$| we let |$Y_{\mathbf{x}}$| be the random variable that represents the utility of agent |$\mathbf{x}$|. Corollary 6. For every |$\epsilon>0$|, |$\delta>0$|, and |$p\in(p^m_c,1),$| there exists |$k_0$| such that for every |$k \geq k_0,$| and |$t\geq k$|, \begin{equation*} \mathbf{P}^m_{\sigma,p}(\frac{ \sum_{\mathbf{x}\in R^k_t}Y_{\mathbf{x}}}{r^k_t}\geq 1-\delta|\frac{r^k_t}{b_t}>\rho^m_p-\epsilon)\geq 1-\delta. \end{equation*} Proof of Corollary 6. Note that |$E^m_{\sigma,p}(Y_{\mathbf{x}}|\mathbf{x}\in R^k_t)\geq q_{k}$|. Let |$k$| be such that |$q_k\geq 1-\delta^2$|. It follows that |$E^m_{\sigma,p}[\frac{\sum_{\mathbf{x}\in R^k_t}Y_{\mathbf{x}}}{r^k_t}|\frac{r^k_t}{b_t}>\rho^m_p-\epsilon]\geq 1-\delta^2$|. Since |$\frac{\sum_{\mathbf{x}\in R^k_t}Y_{\mathbf{x}}}{r^k_t}\in[0,1]$|, applying Markov’s inequality to |$1-\frac{\sum_{\mathbf{x}\in R^k_t}Y_{\mathbf{x}}}{r^k_t}$| yields |$\mathbf{P}^m_{\sigma,p}(\frac{ \sum_{\mathbf{x}\in R^k_t}Y_{\mathbf{x}}}{r^k_t}\geq 1-\delta|\frac{r^k_t}{b_t}>\rho^m_p-\epsilon)> 1-\delta.$| ‖ Proof of Theorem 1. Let |$p\in(p^m_c,1)$| and |$\sigma$| be an equilibrium strategy of |$\Gamma^m_p$|.
For every |$\epsilon>0$|, |$\frac{\epsilon}{2}\geq\delta>0$|, and |$k$| it holds that \begin{equation}\label{eq:nap} \mathbf{P}^m_{\sigma,p}(\frac{\sum_{\mathbf{x}\in B_t}Y_{\mathbf{x}}}{b_t}>\rho^m_p-\epsilon)\geq \mathbf{P}^m_{\sigma,p}(\frac{r^k_t}{b_t}>\rho^m_p-\frac{\epsilon}{2})\mathbf{P}^m_{\sigma,p}(\frac{\sum_{\mathbf{x}\in R^k_t}Y_{\mathbf{x}}}{r^k_t}\geq 1-\delta|\frac{r^k_t}{b_t}>\rho^m_p-\frac{\epsilon}{2}). \end{equation} (A.4) Lemma 4 implies that the first expression on the right-hand side of (A.4) goes to |$1$| as |$t$| goes to infinity. Moreover, it follows from Corollary 6 that the second expression on the right-hand side of (A.4) is at least |$1-\delta$| for sufficiently large |$k$| and all |$t\geq k$|. Therefore, for every |$\epsilon>0$| and |$\frac{\epsilon}{2}\geq\delta>0$| it holds that |$ \liminf_{t\rightarrow\infty}\mathbf{P}^m_{\sigma,p}(\frac{\sum_{\mathbf{x}\in B_t}Y_{\mathbf{x}}}{b_t}>\rho^m_p-\epsilon)\geq 1-\delta. $| Taking |$\delta$| to zero yields that for every |$\epsilon>0$|, \begin{equation}\label{eq:final} \lim_{t\rightarrow\infty}\mathbf{P}^m_{\sigma,p}(\frac{\sum_{\mathbf{x}\in B_t}Y_{\mathbf{x}}}{b_t}>\rho^m_p-\epsilon)=1. \end{equation} (A.5) We now conclude the proof of Theorem 1. Let |$\alpha>0$| and |$p\in (p^m(\alpha),1)$|. By definition |$\alpha<\rho^m_p$|; hence, for |$\epsilon=\rho^m_p-\alpha>0$| equation (A.5) implies that |$\lim_{t\rightarrow\infty}\mathbf{P}^m_{\sigma,p}(\frac{\sum_{\mathbf{x}\in B_t}Y_{\mathbf{x}}}{b_t}>\alpha)=1.$| ‖ A.2. Proof of Theorem 2 Let |$\tau $| be a Perfect Bayesian equilibrium of the private observation model introduced in Section 3. Recall that |$p_{n}=\mathbf{P}_{\tau }(\omega =1|h_{n})$| is the probability that the state is |$\omega =1$| conditional on |$h_{n}\in \{0,1\}^{n-1},$| the history of decisions of the first |$n-1$| agents |$\{i_{1},\ldots ,i_{n-1}\}$| observed by the observer.
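Before turning to the formal argument it may help to see the mechanism of Lemma 1 numerically: each action multiplies the observer's likelihood ratio by a factor that is, on average, bounded away from one, so the posterior drifts toward the truth. In the sketch below the action likelihoods are fixed illustrative assumptions (in the model they vary with |$h_n$| but remain bounded as in Lemma 6).

```python
# Log-likelihood-ratio drift under w=1 with fixed (assumed) action likelihoods
# q1 = P(a=1|w=1) and q0 = P(a=1|w=0); the posterior p_n then tends to one.
import math
import random

rng = random.Random(1)
q1, q0 = 0.7, 0.4                  # illustrative likelihoods, q1 > q0
l = 0.0                            # log-likelihood ratio log(p_n / (1 - p_n))
for _ in range(500):
    a = 1 if rng.random() < q1 else 0                 # action drawn under w=1
    l += math.log(q1 / q0) if a == 1 else math.log((1 - q1) / (1 - q0))
p_n = math.exp(l) / (math.exp(l) + 1)
print(round(p_n, 6))
```

The expected per-step increment here is |$q_1\log(q_1/q_0)+(1-q_1)\log(\frac{1-q_1}{1-q_0})>0$|, which plays the role of the constant |$w$| in the proof below.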
Lemma 1 shows that when |$n$| grows large the observer can infer the true state with arbitrarily high probability. Assume, without loss of generality, that |$\omega =1$| and recall that |$a_{n}$| is the action of agent |$i_{n}$|. Note first that \begin{equation*} \mathbf{P}_{\tau }(\omega =1|h_{n},a_{n}=a)=\frac{p_{n}\mathbf{P}_{\tau }(a_{n}=a|h_{n},\omega =1)}{p_{n}\mathbf{P}_{\tau }(a_{n}=a|h_{n},\omega =1)+(1-p_{n})\mathbf{P}_{\tau }(a_{n}=a|h_{n},\omega =0)}, \end{equation*} for |$a\in\{0,1\}$|. Let |$l_{n}=\log (\frac{p_{n}}{1-p_{n}})$| be the log-likelihood ratio at stage |$n$|. Conditional on |$\omega=1$| and |$h_n$| we can write the distribution of |$l_{n+1} $| as a function of |$l_n$| as follows. With probability |$\mathbf{P} _\tau(a_{n}=a|h_n,\omega=1)$| we have \begin{equation} \label{eq:pc} l_{n+1}=l_n+\log(\frac{\mathbf{P}_\tau(a_{n}=a|h_n,\omega=1)}{\mathbf{P}_\tau(a_{n}=a|h_n,\omega=0)}), \end{equation} (A.6) for |$a=0,1$|. We claim the following: Lemma 6. There exist constants |$\beta>1$| and |$r>0$| such that for every |$n$| and |$h_n$|, \begin{equation*} r\geq \frac{\mathbf{P}_\tau(a_{n}=1|h_n,\omega=1)}{\mathbf{P}_\tau(a_{n}=1|h_n,\omega=0)}=\beta_n\geq\beta\text{ and } \frac{\mathbf{P}_\tau(a_{n}=0|h_n,\omega=1)}{\mathbf{P}_\tau(a_{n}=0|h_n,\omega=0)}\geq \frac{1}{r}. \end{equation*} Proof. For any |$n$|, recall that |$k_n$| is the probability that |$K_n=\varnothing $|. We can write \begin{eqnarray*} & &\mathbf{P}_\tau(a_{n}=1|h_n,\omega=1)\\ &=&\mathbf{P}_\tau(K_n=\varnothing)\mathbf{P}_\tau(a_{n}=1|h_n,\omega=1,K_n=\varnothing)\\ &+&\mathbf{P}_\tau(K_n\neq \varnothing)\mathbf{P}_\tau(a_{n}=1|h_n,\omega=1,K_n\neq\varnothing)\\ &=&k_n\mathbf{P}_\tau(a_{n}=1|h_n,\omega=1,K_n=\varnothing)+(1-k_n)\mathbf{P}_\tau(a_{n}=1|h_n,\omega=1,K_n\neq\varnothing).
\end{eqnarray*} Similarly, we can write \begin{eqnarray}\label{eq:a-iso} \notag& &\mathbf{P}_\tau(a_{n}=1|h_n,\omega=0)\\ &=&k_n\mathbf{P}_\tau(a_{n}=1|h_n,\omega=0,K_n=\varnothing)+(1-k_n)\mathbf{P}_\tau(a_{n}=1|h_n,\omega=0,K_n\neq\varnothing). \end{eqnarray} (A.7) We let |$\mathbf{P}_\tau(a_{n}=1|h_n,\omega=1,K_n=\varnothing)=\theta,$| and |$\mathbf{P}_\tau(a_{n}=1|h_n,\omega=0,K_n=\varnothing)=\eta.$| Hence we can write \begin{equation}\label{eq:eq-bound} \frac{\mathbf{P}_\tau(a_{n}=1|h_n,\omega=1)}{\mathbf{P}_\tau(a_{n}=1|h_n,\omega=0)}=\frac{k_n\theta+(1-k_n)\mathbf{P}_\tau(a_{n}=1|h_n,\omega=1,K_n\neq\varnothing)}{k_n\eta+(1-k_n)\mathbf{P}_\tau(a_{n}=1|h_n,\omega=0,K_n\neq\varnothing)}. \end{equation} (A.8) By (c) of Lemma A1 in Acemoglu et al. (2014) we have that |$1>\theta>\eta>0$|. Since, by definition, |$k_n\geq e$| we can bound Equation (A.8) from above by |$\frac{e\theta+1-e}{e\eta}$|. Similarly, \begin{equation*} \frac{\mathbf{P}_\tau(a_{n}=0|h_n,\omega=1)}{\mathbf{P}_\tau(a_{n}=0|h_n,\omega=0)}\geq \frac{e(1-\theta)}{e(1-\eta)+1-e}. \end{equation*} By letting |$r=\max\{\frac{e\theta+1-e}{e\eta},\frac{e(1-\eta)+1-e}{e(1-\theta)}\}$| we have that |$ r\geq \frac{\mathbf{P}_\tau(a_{n}=1|h_n,\omega=1)}{\mathbf{P}_\tau(a_{n}=1|h_n,\omega=0)}\text{ and } \frac{\mathbf{P}_\tau(a_{n}=0|h_n,\omega=1)}{\mathbf{P}_\tau(a_{n}=0|h_n,\omega=0)}\geq \frac{1}{r}. $| We next prove the existence of such a |$\beta>1$|. Again (c) of Lemma A1 in Acemoglu et al. (2014) shows that \begin{equation*} \mathbf{P}_\tau(a_{n}=1|h_n,\omega=1,K_n\neq\varnothing)\geq \mathbf{P}_\tau(a_{n}=1|h_n,\omega=0,K_n\neq\varnothing). \end{equation*} Therefore, it follows from Equation (A.8) that |$ \frac{\mathbf{P}_\tau(a_{n}=1|h_n,\omega=1)}{\mathbf{P}_\tau(a_{n}=1|h_n,\omega=0)}\geq\frac{k_n\theta+(1-k_n)\mathbf{P}_\tau(a_{n}=1|h_n,\omega=1,K_n\neq\varnothing)}{k_n\eta+(1-k_n)\mathbf{P}_\tau(a_{n}=1|h_n,\omega=1,K_n\neq\varnothing)}.
$| Since |$k_n\theta>k_n\eta$|, and since |$(1-k_n)\mathbf{P}_\tau(a_{n}=1|h_n,\omega=1,K_n\neq\varnothing)\leq 1$|, we have |$ \frac{\mathbf{P}_\tau(a_{n}=1|h_n,\omega=1)}{\mathbf{P}_\tau(a_{n}=1|h_n,\omega=0)}\geq\frac{k_n\theta+1}{k_n\eta+1}. $| Since |$k_n\geq e$| we have, |$ \frac{\mathbf{P}_\tau(a_{n}=1|h_n,\omega=1)}{\mathbf{P}_\tau(a_{n}=1|h_n,\omega=0)}\geq\frac{e\theta+1}{e\eta+1}. $| Since |$\theta>\eta$|, letting |$\beta=\frac{e\theta+1}{e\eta+1}>1$| concludes the proof of the lemma. ‖ Lemma 7. Let |$0<a\leq 1$| and |$0<\alpha<1$|, and define |$f(x,\gamma)=-x\log(\gamma)+(1-x)\log(\frac{1-x}{1-\gamma x})$| for |$x\in(0,1]$| and |$\gamma\in(0,1]$|. Then \begin{equation*} \min_{x\in[a,1]}\min_{\gamma\in(0,\alpha]}f(x,\gamma)=w>0. \end{equation*} Proof. We show first that |$g_x(\gamma)=f(x,\gamma)$| is strictly decreasing in |$\gamma$| for every |$x>0$|. It is easy to see that |$ g'_x(\gamma)=-\frac{x(1-\gamma)}{\gamma(1-\gamma x)}. $| Since |$x,\gamma>0$| it holds that |$g'_x(\gamma)<0$|. Hence, |$g_x(\gamma)$| is strictly decreasing in |$\gamma$| and |$\min\limits_{\gamma\in(0,\alpha]}f(x,\gamma)=f(x,\alpha).$| We note that |$f(x,1)=0$|. Therefore, since |$f(x,\gamma)$| is strictly decreasing in |$\gamma$|, and since |$\alpha<1,$| we get that |$f(x,\alpha)>0$| for every |$x\in[a,1]$|. From the continuity of |$f(x,\gamma)$| we get |$ \min_{x\in[a,1]}\min_{\gamma\in(0,\alpha]}f(x,\gamma)=\min_{x\in[a,1]}f(x,\alpha)=w>0. $| This concludes the proof of the lemma. ‖ Proof of Lemma 1. Let |$\alpha=\frac{1}{\beta}$|. We claim that Lemma 7 implies that |$ E_\tau[l_{n+1}-l_n|h_n,\omega=1]\geq w>0. $| To see this, we note that by Equation (A.6), we can write \begin{eqnarray}\label{eq:ce} & &E_\tau[l_{n+1}-l_n|h_n,\omega=1]\\ &=&\notag\mathbf{P}_\tau(a_{n}=1|h_n,\omega=1)\log(\frac{\mathbf{P}_\tau(a_{n}=1|h_n,\omega=1)}{\mathbf{P}_\tau(a_{n}=1|h_n,\omega=0)})\\ \notag&+&\mathbf{P}_\tau(a_{n}=0|h_n,\omega=1)\log(\frac{\mathbf{P}_\tau(a_{n}=0|h_n,\omega=1)}{\mathbf{P}_\tau(a_{n}=0|h_n,\omega=0)}). \end{eqnarray} (A.9) Following Lemma 6, |$ \frac{\mathbf{P}_\tau(a_{n}=1|h_n,\omega=1)}{\mathbf{P}_\tau(a_{n}=1|h_n,\omega=0)}=\beta_n\geq\beta.
$| Letting |$x_n=\mathbf{P}_\tau(a_{n}=1|h_n,\omega=1)$| we can rewrite Equation (A.9) as follows: \begin{eqnarray}\label{eq:de} & &E_\tau[l_{n+1}-l_n|h_n,\omega=1] =-x_n\log(\frac{1}{\beta_n})+(1-x_n)\log(\frac{1-x_n}{1-\frac{1}{\beta_n}x_n}). \end{eqnarray} (A.10) Letting |$\alpha_n=\frac{1}{\beta_n}$| we can rewrite (A.10) as follows: \begin{equation*} E_\tau[l_{n+1}-l_n|h_n,\omega=1]=-x_n\log(\alpha_n)+(1-x_n)\log(\frac{1-x_n}{1-\alpha_n x_n}). \end{equation*} It follows from the decomposition of |$\mathbf{P}_\tau(a_{n}=1|h_n,\omega=1)$| in the proof of Lemma 6 that |$x_n\geq e\theta>0$|. Hence, |$x_n\in[a,1]$| for |$a=e\theta$|. Moreover, since |$1<\beta\leq\beta_n$| we have |$ \alpha_n=\frac{1}{\beta_n}\leq \frac{1}{\beta}=\alpha<1. $| Therefore, |$\alpha_n\in(0,\alpha]$| with probability one. We can now deduce from Lemma 7 that for every history |$h_n\in \{0,1\}^{n-1},$| |$ E_\tau[l_{n+1}-l_n|h_n,\omega=1]\geq w>0. $| Let |$\mathbf{P}_{\tau,\omega=1}$| be the probability distribution of the learning process conditional on the realized state |$\omega=1$|. We note that |$X_n=l_n-wn$| is a sub-martingale with respect to |$\mathbf{P}_{\tau,\omega=1}$|. Moreover, Lemma 6 implies that |$|X_{n+1}-X_n|\leq c$| for |$c=\log(r)+w$|. We can therefore apply Azuma’s inequality to the sub-martingale |$X_n$| (see Alon and Spencer (2004), Theorem 7.2.1), which implies that for all |$t\geq 0$|, \begin{equation}\label{eq:azuma} \mathbf{P}_{\tau,\omega=1}(l_n-wn\leq-t)\leq \exp(\frac{-t^2}{2nc^2}). \end{equation} (A.11) For |$t=wn^{\frac{2}{3}}$| it follows that: |$\mathbf{P}_{\tau,\omega=1}(l_n-wn\leq -wn^{\frac{2}{3}})\leq \exp(-\frac{w^2}{2c^2}n^{\frac{1}{3}}).$| That is, |$ \mathbf{P}_{\tau,\omega=1}(l_n>w(n-n^{\frac{2}{3}}))\geq 1-\exp(-\frac{w^2}{2c^2}n^{\frac{1}{3}}). $| Since, by definition, |$p_n=\frac{\exp(l_n)}{\exp(l_n)+1}$|, for every |$\epsilon$| there exists |$n_\epsilon$| such that for every |$n\geq n_\epsilon,$| |$ \mathbf{P}_{\tau}(p_n\geq 1-\epsilon|\omega=1)\geq 1-\epsilon. $| This concludes the proof of Lemma 1.
‖ Proof of Theorem 2. It follows directly from Lemma 3 that for any |$\epsilon$| and |$M$| there exists |$l$| such that if agent |$\mathbf{x}\in B_t$| satisfies |$\mathbf{x}_{-}\geq l$|, then it holds that |$|B_G(\mathbf{x})|\geq M$| with probability at least |$\rho^m_p-\epsilon$|. It further follows from Corollary 2 that if |$M\geq n_\epsilon$|, then |$\hat{P}^m_{\sigma,p}(Y_{\mathbf{x}}=1||B_G(\mathbf{x})|\geq n_\epsilon)\geq 1-\epsilon.$| We can now continue exactly as in the proof of Theorem 1, from Lemma 4 onward, to deduce that if |$p\in (p^m(\alpha),1)$|, then |$\alpha$|-proportional learning holds. ‖ A.3. Proof of Theorem 3 In the proof of Theorem 3 we use the notation introduced in Section A.1. Proof of Theorem 3. We first provide the proof for the commonly known observation structure and then explain how to adapt the proof to the model with independent private observations. Let |$p>p_c^m$|, let |$F$| be any signal distribution, and let |$\sigma \in \Sigma _{F,p}^{m}$| be any Perfect Bayesian equilibrium of |$\Gamma^m_p$|. It follows from the definition of |$R^k_t$| that for every |$\epsilon>0$| there exists large enough |$k$| such that if |$\mathbf{x}\in R^k_t$|, then |$E^m_{\sigma,p}\big[Y_{\mathbf{x}}\big]\geq 1-\epsilon.$| Since |$E^m_{\sigma,p}\big[Y_{\mathbf{x}}\big]\geq y_F$| for every agent |$\mathbf{x}$|, we have that \begin{eqnarray*} &&\underline{l}_{\sigma ,p}^{m}(F)=\liminf_{t\rightarrow\infty}E^m_{\sigma,p}\big[\frac{\sum_{\mathbf{x}\in R^k_t}Y_{\mathbf{x}}}{b_t}+\frac{ \sum_{\mathbf{x}\not\in R^k_t}Y_{\mathbf{x}}}{b_t}\big]\geq \liminf_{t\rightarrow\infty}E^m_{\sigma,p}\big[\frac{(1-\epsilon)r^k_t}{b_t}+y_F(\frac{b_t-r^k_t}{b_t})\big]\\ &\geq& (1-\epsilon)\rho^m_p+(1-\rho^m_p)y_F. \end{eqnarray*} The last inequality follows since Lemma 4 implies that |$\liminf\limits_{t\rightarrow\infty}E^m_{\sigma,p}[\frac{r^k_t}{b_t}]\geq\rho^m_p.$| Since |$\epsilon>0$| is arbitrary, this shows that |$ \underline{l}_{\sigma ,p}^{m}(F)\geq \rho^m_p+(1-\rho^m_p)y_F=\underline{w}_{p}^{m}.
$| To adapt the proof to the independent observation structure, note that Lemma 3 implies that for every |$\epsilon>0$| and |$M$| there exists |$l$| such that if |$\mathbf{x}\in B_t$| satisfies |$\mathbf{x}_{-}\geq l$|, then it holds that |$|B_{G_{\mathbf{x}}}(\mathbf{x})|\geq M$| with probability at least |$\rho^m_p-\epsilon$|. It therefore follows from Corollary 2 that if |$M\geq n_\epsilon$|, then |$\hat{P}^m_{\sigma,p}(Y_{\mathbf{x}}=1||B_{G_{\mathbf{x}}}(\mathbf{x})|\geq n_\epsilon)\geq 1-\epsilon.$| We can then apply arguments identical to those used for the commonly known observation structure. ‖ A.4. Proof of Theorem 4 In the proof of Theorem 4 we use the notation introduced in Section A.1. Proof of Theorem 4. We prove the theorem for the commonly known observation model. The proof for the independent observation structure follows very similar lines and is therefore omitted. Let |$y_F$| and |$y_{F'}$| be the success probabilities under |$F$| and |$F'$|, respectively. Assume that |$y_F>y_{F'}$|; we need to show that there exists |$\hat p$| such that |$\underline{l}_{\sigma ,p}^{m}(F)>\underline{l}_{\sigma',p}^{m}(F')$| for all |$p\in(\hat p,1)$| and any |$\sigma \in \Sigma _{F,p}^{m}$| and |$\sigma' \in \Sigma _{F',p}^{m}$|. Theorem 3 implies that |$\underline{l}_{\sigma ,p}^{m}(F)\geq \underline{w}_{p}^{m}=\rho^m_p+(1-\rho^m_p)y_F.$| Therefore it is sufficient to show that there exists |$\hat{p}\in(0,1)$| such that |$\underline{l}_{\sigma',p}^{m}(F')<\underline{w}_{p}^{m}(F)$| for all |$p\in(\hat{p},1)$| and any |$\sigma' \in \Sigma _{F',p}^{m}.$| We consider first the standard percolation model. Consider the probability of the event that the origin can reach no node conditional on the event that its component contains at most |$n$| nodes, i.e., |$\mathbf{P}^m_p(C=\emptyset||C|\leq n)$|. It follows from Durrett (1984) that |$\mathbf{P}^m_p(C=\emptyset||C|\leq n)$| goes to one uniformly in |$n$| as |$p$| goes to one.
That is, for every |$\delta>0$| there exists |$\hat p(\delta)\in(0,1)$| such that for all |$p>\hat p(\delta)$| it holds for all |$n\geq 1$| that |$ \mathbf{P}^m_p(|C|=0||C|\leq n)> 1-\delta. $| Fix |$\delta>0$| and let |$ p\in(\hat p(\delta),1)$|. Let |$\sigma'\in \Sigma^m_{F',p}$| be any equilibrium with signal distribution |$F'$|. Lemma 3 implies that for every |$\epsilon>0$| there exists |$M_\epsilon$| such that |$|B_G(\mathbf{x})|$|, the size of the observation set of any agent |$\mathbf{x},$| is at most |$M_\epsilon$| with probability at least |$1-\rho^m_p-\epsilon$|. This again follows from the fact that for every agent |$\mathbf{x}$| the probability that |$\mathbf{x}$| observes at most |$M_\epsilon$| agents is at least the probability that in the standard percolation model the origin can reach at most |$M_\epsilon$| nodes. Let |$k$| be large enough so that |$q_k\geq 1-\epsilon$|. We can also assume that for every |$t$|, the event that a given agent |$\mathbf{x}$| belongs to |$R^k_t$| is disjoint from the event that his observation set |$B_G(\mathbf{x})$| contains at most |$M_\epsilon$| nodes. That is, |$\mathbf{x}\in R^k_t$| implies that |$|B_G(\mathbf{x})|>M_\epsilon.$| This holds for example if |$k\geq M_\epsilon$|. By Corollary 5 there exists |$t_{k,\epsilon}$| such that for every |$\mathbf{x}\in B_t$| with |$\mathbf{x}_{-}\geq t_{k,\epsilon}$| it holds that |$\mathbf{x}\in R^k_t$| with probability at least |$\rho^m_p-\epsilon$|. Let |$\mathbf{x}$| be such that |$\mathbf{x}_{-}\geq M_\epsilon$|.
It holds that \begin{align} \notag E^m_{\sigma',p}[Y_{\mathbf{x}}]&= \mathbf{P}^m_{\sigma',p}(|B_G(\mathbf{x})|\leq M_\epsilon)E^m_{\sigma',p}\big[Y_\mathbf{x}\big||B_G(\mathbf{x})|\leq M_\epsilon\big]+\mathbf{P}^m_{\sigma',p}(|B_G(\mathbf{x})|> M_\epsilon)E^m_{\sigma',p}\big[Y_\mathbf{x}\big||B_G(\mathbf{x})|> M_\epsilon\big]\\ &\leq\mathbf{P}^m_{\sigma',p}(|B_G(\mathbf{x})|\leq M_\epsilon)E^m_{\sigma',p}\big[Y_\mathbf{x}\big||B_G(\mathbf{x})|\leq M_\epsilon\big]+ \rho^m_p+\epsilon\tag{A.12}\\ &\leq(1-\rho^m_p)E^m_{\sigma',p}\big[Y_\mathbf{x}\big||B_G(\mathbf{x})|\leq M_\epsilon\big]+ \rho^m_p+\epsilon.\tag{A.13}\label{eq:he0} \end{align} Since |$\mathbf{x}_{-}\geq M_\epsilon$| and since |$p>\hat p(\delta)$| we have by the definition of |$\hat p(\delta)$| that \begin{align} \notag E^m_{\sigma',p}\big[Y_\mathbf{x}\big||B_G(\mathbf{x})|\leq M_\epsilon\big]&\leq\mathbf{P}^m_{\sigma',p}(|B_G(\mathbf{x})|=1\big||B_G(\mathbf{x})|\leq M_\epsilon)y_{F'}+\mathbf{P}^m_{\sigma',p}(|B_G(\mathbf{x})|>1\big||B_G(\mathbf{x})|\leq M_\epsilon)\\ &\leq (1-\delta)y_{F'}+\delta. \tag{A.14}\label{eq:he} \end{align} Since the proportion of agents |$\mathbf{x}\in B_t$| for whom |$\mathbf{x}_{-}\geq M_\epsilon$| goes to one, we get from Equation (A.13) and Equation (A.14) that |$ \underline{l}_{\sigma',p}^{m}(F')\leq (1-\rho^m_p)((1-\delta)y_{F'}+\delta)+\rho^m_p+\epsilon. $| Since |$\epsilon$| is arbitrary we get that |$ \underline{l}_{\sigma',p}^{m}(F')\leq (1-\rho^m_p)((1-\delta)y_{F'}+\delta)+\rho^m_p. $| If we choose |$\delta<\frac{y_F-y_{F'}}{1-y_{F'}}$|, then |$(1-\delta)y_{F'}+\delta<y_F$|, and hence for every |$p>\hat p(\delta)$| we get that |$\underline{l}_{\sigma',p}^{m}(F')< \underline{w}^m_p(F).$| ‖ A.5. Proof of Lemma 2 For the sake of clarity we prove Lemma 2 for |$m=2$|. The proof can be easily extended to the general case where |$m> 2$|. Proof of Lemma 2.
Assume for simplicity that the two agents |$(0,1)$| and |$(1,0)$| are playing according to the same strategy, and that for every action |$a$| played by agent |$(0,0)$| there exists a positive probability that the action |$1-a$| is being played by either of the two agents. That is, we assume that neither agent |$ (0,1)$| nor agent |$(1,0)$| is cascading. This assumption implies that \begin{equation} \mathbf{P}^2_{\sigma,1 }(a_{\mathbf{x}}=1|\omega =1,a_{(0,0)}=1)>\mathbf{P}^2 _{\sigma,1 }(a_{\mathbf{x}}=1|\omega =0,a_{(0,0)}=1), \label{eq:ml} \end{equation} (A.15) for |$\mathbf{x}\in \{(0,1),(1,0)\}$|. For every agent |$\mathbf{x}$|, let |$p_\mathbf{x}$| be the public belief of agent |$\mathbf{x}$| conditional on his observed history prior to receiving his signal, given that all the agents in his observation set play action |$1$|. Note that agent |$\mathbf{x}=(t,0)\in Z_{+}^{2}$| observes all agents |$ (u,0)$| for |$u