Networks of causal relationships in the U.S. stock market

1 Introduction

Stock markets are complex interconnected systems, where various “local” factors can cause “global” changes in the behavior of the entire market. For instance, favorable or unfavorable economic conditions in certain market segments, or in certain countries, may affect other countries and industries and potentially cause positive or negative fluctuations that span the entire U.S. and international markets. The idea of describing causal relationships between different components of the market system has been addressed in several recent studies. For instance, the survey [24] discussed the concept of contagion in financial markets, which essentially implies the propagation of impact (such as risk) between different components of the market. A network-based model is a natural way to mathematically represent these “contagion” processes; however, the principles for constructing the networks that reflect certain types of processes may vary depending on the respective goals and assumptions of a study.

A simple and intuitive technique for constructing a network-based (or graphical) model of the market is to represent its elements (e.g., stocks) as nodes and connect the nodes by links (arcs) based on pairwise correlations between the corresponding entities (i.e., the correlations between stock price fluctuations over a certain period of time). Such an approach was studied in [5,6,23] in the context of identifying large correlated clusters and diversified portfolios in the U.S. stock market.
Although pairwise correlation-based similarity measures have merit in certain applications, a substantial drawback of such measures is their inability to produce directed links between entities, that is, to establish the direction of “contagion” (i.e., propagation from node $i$ to node $j$ vs. propagation from node $j$ to node $i$).

In this work, we construct and analyze a directed network model, which describes causal relationships between all pairs of stocks in the U.S. stock market using the concept of Granger causality [15,16]. It should be noted that Granger causality (which will be formally defined later in the article) can be used to determine whether the time series describing stock $i$ contains useful information for predicting the behavior of the time series of stock $j$. This should not be confused with the statement “the increase/decrease in the price of stock $i$ causes the increase/decrease in the price of stock $j$,” which is not necessarily true.

There are several previous studies constructing networks based on Granger causality, such as [12,26]; however, there has been no thorough analysis of the resulting networks. Also, the current literature contains little discussion about the influence of market sectors. Further, little attention has been paid to approaches widely used in network science, such as PageRank and $k$-core-based methods.

As will be discussed in the next sections, networks constructed using Granger causality appear to capture certain structural properties of the stock market that reflect overall tendencies in its behavior. In particular, we investigate various aspects of connectivity patterns and the evolution of structural properties of the constructed network snapshots.
In addition, the considered network representation is used to group stocks into network-based clusters and to identify the most “influential” market entities (sectors/industries).

2 Basic concepts, data description, network construction

2.1 Relevant graph-theoretic concepts

Let $G=(N,A)$ be a simple directed graph with the set of nodes $N$ and the set of arcs $A=\{(i,j): i,j\in N\}$, where the head $j$ and tail $i$ of each arc $(i,j)$ are specified. The arc density of $G$ is defined as the ratio of the number of arcs in the graph to the maximum possible number of arcs: $\rho(G)=\frac{|A|}{|N|(|N|-1)}$, where $|A|$ is the number of arcs and $|N|$ is the number of nodes in graph $G$. Obviously, $\rho(G)\in[0,1]$.

Given a node $n\in N$, its in-degree $\deg_G^{\mathrm{in}}(n)$ is the number of incoming arcs and its out-degree $\deg_G^{\mathrm{out}}(n)$ is the number of outgoing arcs. Extensive empirical studies show that degree distributions of many real-life graphs representing diverse datasets follow the well-known power-law model [2,3,4,8,19]. According to this model, the probability that a node has degree $k$ (in- or out-degree for directed graphs) is $\mathbb{P}(k)\propto k^{-\gamma}$, or $\log\mathbb{P}(k)=-\gamma\log k+\mathrm{const}$ in the log-log scale, which is a straight line with slope $-\gamma$, where $\gamma$ is the parameter of the power-law degree distribution. One of the notable characteristics of such networks (known as the scale-free property) is that their power-law structure should not depend on the network’s size.

A directed graph $G=(N,A)$ is called strongly connected if there is a directed path from each node to every other node in the set $N$.
A graph that is not strongly connected can be decomposed into maximal strongly connected subgraphs, which are referred to as the strongly connected components of $G$. Distinct components can be interpreted as clusters in the corresponding dataset. Several algorithms exist for the efficient identification of strongly connected components in a directed graph. In our study, we use Tarjan’s popular algorithm based on the depth-first search technique [25].

In some situations, clusters based on strongly connected components can be extremely large and comparable with the size of the whole graph (which is in fact the case for the considered graphs, as will be shown later). Therefore, the clustering approach based on connected components may not be appropriate for drawing meaningful conclusions regarding specific groups of nodes within a graph. There is a variety of definitions for “tighter” structures that may be interpreted as clusters with specific cohesive properties of their connectivity patterns. In this study, we utilize the concepts of $k$-degenerate graphs and $k$-cores for undirected graphs, introduced in [22], and modify them for the case of directed graphs. A simple undirected graph is called $k$-degenerate if every subgraph of it has a vertex of degree at most $k$. The degeneracy of a simple undirected graph $G$, denoted by $\delta^{\ast}(G)$, is the smallest value of $k$ such that $G$ is $k$-degenerate. A $k$-core in a simple undirected graph is a subset of vertices that induces a subgraph with minimum degree at least $k$. Alternatively, one can define $k$-cores as the connected components that are left after all nodes of degree less than $k$ have been recursively removed from the graph $G$; therefore, $\delta^{\ast}(G)$ is the maximum $k$ for which $G$ contains a nonempty $k$-core.

An extension of the notion of a $k$-core was proposed in [14] by introducing the concept of a $D$-core in a directed graph.
The authors consider the min-in-degree and min-out-degree of a graph $G$, defined as $\delta^{\mathrm{in}}(G)=\min_{x\in N}\{\deg_G^{\mathrm{in}}(x)\}$ and $\delta^{\mathrm{out}}(G)=\min_{x\in N}\{\deg_G^{\mathrm{out}}(x)\}$, respectively. Then, for two positive integers $k,l$, a $(k,l)$-$D$-core is a maximal-size subgraph $G'$ of $G$ with $\delta^{\mathrm{in}}(G')\ge k$ and $\delta^{\mathrm{out}}(G')\ge l$. The intuition behind this notion is to find a subset of the graph where all the nodes have sufficient out- and in-degrees to form a “tight” cluster. For reasons that will become clear later in the article, we introduce a slightly different structure referred to as a $k$-out-core, where each node is only required to have an out-degree of at least $k$, i.e., the condition on the in-degree is relaxed. Therefore, for a positive integer $k$, a $k$-out-core of the graph $G$ is defined as a subgraph $G'$ of $G$ with $\delta^{\mathrm{out}}(G')\ge k$. As in the case of undirected graphs, we can define $k$-out-cores as the connected components that are left after all nodes of out-degree less than $k$ have been recursively removed from the graph $G$; therefore, $\delta_{\mathrm{out}}^{\ast}(G)$ is the maximum $k$ for which $G$ contains a nonempty $k$-out-core. Although the original definition of “degeneracy” differs from the definition of $\delta_{\mathrm{out}}^{\ast}(G)$, for simplicity, we will use the same term.

For a given $k$, $k$-out-cores can be easily found using a greedy algorithm, which recursively removes the nodes with out-degree less than $k$ one by one from the graph, until all the remaining nodes have sufficiently large out-degrees. Then one can decompose the resulting network into connected components, which are $k$-out-cores by definition.
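The greedy peeling described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, "connected components" is assumed to mean weakly connected components, and the largest feasible $k$ is located by bisection over the nested family of $k$-out-cores.

```python
from collections import deque

def k_out_core_nodes(arcs, k):
    """Nodes that survive repeated removal of nodes with out-degree < k."""
    succ, pred = {}, {}
    for u, v in arcs:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())
        pred.setdefault(v, set()).add(u)
        pred.setdefault(u, set())
    out_deg = {n: len(s) for n, s in succ.items()}
    alive = set(succ)
    queue = deque(n for n in alive if out_deg[n] < k)
    queued = set(queue)
    while queue:
        n = queue.popleft()
        alive.discard(n)
        # Removing n lowers the out-degree of each surviving predecessor.
        for p in pred[n]:
            if p in alive:
                out_deg[p] -= 1
                if out_deg[p] < k and p not in queued:
                    queue.append(p)
                    queued.add(p)
    return alive

def weak_components(nodes, arcs):
    """Weakly connected components of the subgraph induced by `nodes` (assumed meaning)."""
    nbrs = {n: set() for n in nodes}
    for u, v in arcs:
        if u in nbrs and v in nbrs:
            nbrs[u].add(v)
            nbrs[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            n = stack.pop()
            comp.add(n)
            for m in nbrs[n] - seen:
                seen.add(m)
                stack.append(m)
        comps.append(comp)
    return comps

def out_degeneracy(arcs):
    """Largest k with a nonempty k-out-core, found by binary search."""
    out_deg = {}
    for u, _ in arcs:
        out_deg[u] = out_deg.get(u, 0) + 1
    lo, hi = 0, max(out_deg.values(), default=0)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if k_out_core_nodes(arcs, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo

# Toy graph: A, B, C are mutually linked; D and E hang off the triangle.
toy_arcs = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "C"), ("B", "A"),
            ("C", "B"), ("D", "A"), ("E", "D")]
core = k_out_core_nodes(toy_arcs, 2)
print(sorted(core), out_degeneracy(toy_arcs), len(weak_components(core, toy_arcs)))
```

Binary search is valid here because every $(k+1)$-out-core is contained in the $k$-out-core, so nonemptiness is monotone in $k$.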
The degeneracy of $G$ can be found using a simple binary search technique.

2.2 Granger causality

In the aforementioned previous studies of the network-based model of the U.S. stock market [5,6], the market graph was constructed in such a way that a given pair of nodes is connected by an undirected edge if the corresponding stocks exhibit similar behavior over a certain period of time. The similarity was measured by Pearson’s correlation between the time series representing the returns of the corresponding stocks. In this study, we propose a different technique for constructing the set of arcs: the similarity between stocks is measured by Granger causality [15,16], which is extensively used across many application areas because of its simplicity, robustness, and flexibility [9,13]. The details of the network construction will be presented in Section 2.3, whereas here we introduce the definition of causality and the procedure for conducting the Granger causality test between two time series.

Consider two scalar-valued, stationary time series $\{x_t: t=1,\ldots,T\}$ and $\{y_t: t=1,\ldots,T\}$ corresponding to the returns $x$, $y$ of a pair of stocks. The basic idea behind the notion of causality is very general in nature: one can state that $x$ causes $y$, denoted by $x\Rightarrow y$, if $x$ contains some unique information about $y$, so that $y$ can be better predicted using this information than in its absence. In practice, Granger causality is often tested using the following linear autoregressive model:

$$y_t=\sum_{i=1}^{k}a_i y_{t-i}+\sum_{j=1}^{k}b_j x_{t-j}+\varepsilon_t, \tag{1}$$

where $k$ is the maximal time lag and $\varepsilon_t$ is a regression error.
Then, $x$ does not cause $y$ if and only if

$$\mathbf{H}_0: b_j=0,\quad j=1,\ldots,k.$$

To test this hypothesis, one can apply the $F$-test, and rejecting $\mathbf{H}_0$ implies that $x$ “Granger causes” $y$. The procedure of testing for the presence of causality in the other direction ($y\Rightarrow x$) is similar.

It should be noted that the Granger causality test is valid only if the time series are covariance (or weakly) stationary. In this article, we used the augmented Dickey-Fuller test [20] to check the stationarity of the time series. Further, we assume homoscedasticity, i.e., constant variance of $\varepsilon_t$.

2.3 Network construction

In the constructed directed unweighted network, the nodes are stocks represented by their “ticker” symbols. We used all the stocks listed on NYSE, NASDAQ, and AMEX as of December 31, 2020; there were 7,240 stock symbols in total. The list of stock symbols was obtained from EODdata (https://eoddata.com/). We obtained historical stock price data from Yahoo Finance using the yfinance Python library (https://pypi.org/project/yfinance/).

The adjusted close prices were transformed into time series of daily returns, since returns possess the scalability property (i.e., the values in the time series representing each stock’s returns have the same order of magnitude) and thus are easily comparable. Furthermore, the logarithms of returns were calculated, because log-returns have more attractive statistical properties [11], including weak stationarity, which was verified for all considered time series.
If $P_i(t)$ and $P_i(t-1)$ are the adjusted close prices of stock $i$ on days $t$ and $t-1$, respectively, then the log-return time series for each stock $i$ is defined as follows:

$$r_i(t)=\ln\frac{P_i(t)}{P_i(t-1)},\quad t=2,\ldots,T,$$

where $T$ is the number of trading days in each of the considered calendar years (2001–2020).

A directed network (referred to as a causal market graph) was constructed for each time period (calendar year) to reflect the causal relationships between stocks. It should be noted that a network constructed for each time period contains only those stocks that were present in the market during that entire time period; therefore, the cardinality and composition of the sets of nodes change from period to period. Every stock is represented by a node, and the existence of an arc $(i,j)$ means that the time series of stock $i$ causes the time series of stock $j$ in the sense of Granger causality. Recall that the Granger causality test checks the hypothesis that the coefficients $b_j=0$, $j=1,\ldots,k$. The null hypothesis (all $b_j$ are equal to zero) is rejected in favor of the alternative if the $p$-value of the $F$-test is less than a certain threshold. Hence, an arc from stock $i$ to stock $j$ is constructed if the corresponding $p$-value is less than a chosen threshold. We picked this threshold to be 0.001, i.e., each pairwise test is carried out at the 0.1% significance level. The motivation behind this threshold choice is to ensure that the constructed networks are sparse enough, so that it would be possible to observe significant changes in connectivity patterns over time (as opposed to the situation where each network contains close to the maximum possible number of arcs, which makes it difficult to detect any changes in connectivity patterns). Thus, only the most “meaningful” connections are reflected in the constructed networks.
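The construction just described (log-returns, a pairwise $F$-test of $\mathbf{H}_0$ in Eq. (1), and an arc whenever the $p$-value is below 0.001) can be illustrated with a small standard-library sketch. Several things here are assumptions made for illustration only: a single lag, no intercept (following Eq. (1) literally), synthetic data in place of real prices, hypothetical function names, and a hand-rolled tail probability for the $F(1,\nu)$ distribution computed through the equivalent two-sided $t$-test. In practice a statistics library would normally supply the test.

```python
import math
import random
from itertools import permutations

def log_returns(prices):
    """r(t) = ln(P(t) / P(t-1)) for consecutive adjusted close prices."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def f_tail_prob(f_stat, dof):
    """P(F > f_stat) for F(1, dof), via Simpson integration of the t density."""
    t = math.sqrt(max(f_stat, 0.0))
    log_c = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
             - 0.5 * math.log(dof * math.pi))

    def pdf(u):
        return math.exp(log_c - (dof + 1) / 2 * math.log1p(u * u / dof))

    steps = 2 * max(100, int(25 * t) + 1)  # even number of Simpson intervals
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

def granger_p_value(x, y):
    """p-value of H0: b_1 = 0 in y_t = a_1*y_{t-1} + b_1*x_{t-1} + e_t."""
    y_now, y_lag, x_lag = y[1:], y[:-1], x[:-1]
    syy = sum(v * v for v in y_lag)
    sxx = sum(v * v for v in x_lag)
    sxy = sum(u * v for u, v in zip(x_lag, y_lag))
    sy = sum(u * v for u, v in zip(y_lag, y_now))
    sx = sum(u * v for u, v in zip(x_lag, y_now))
    a_r = sy / syy  # restricted model: own lag only
    rss_r = sum((t - a_r * l) ** 2 for t, l in zip(y_now, y_lag))
    det = syy * sxx - sxy * sxy  # 2x2 normal equations of the full model
    a = (sy * sxx - sx * sxy) / det
    b = (sx * syy - sy * sxy) / det
    rss_u = sum((t - a * l - b * m) ** 2
                for t, l, m in zip(y_now, y_lag, x_lag))
    dof = len(y_now) - 2
    f_stat = max(rss_r - rss_u, 0.0) / (rss_u / dof)
    return f_tail_prob(f_stat, dof)

def causal_arcs(returns, alpha=0.001):
    """Arc (i, j) whenever series i Granger-causes series j at level alpha."""
    return [(i, j) for i, j in permutations(returns, 2)
            if granger_p_value(returns[i], returns[j]) < alpha]

# Synthetic returns: Y follows X with a one-day delay, X is pure noise.
rng = random.Random(0)
x = [rng.gauss(0, 0.01) for _ in range(251)]
y = [0.0] + [0.8 * x[t - 1] + rng.gauss(0, 0.002) for t in range(1, 251)]
arcs = causal_arcs({"X": x, "Y": y})
print(arcs)
```

With real data, `returns` would hold one log-return list per ticker for a given calendar year, and the double loop over ordered pairs is what makes the construction computationally heavy for thousands of stocks.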
The summary statistics of the coefficients $b$ of the autoregressive model in Eq. (1) for the arcs kept in the networks are shown in Table 1. It is interesting that the values of $b$ are often close to zero, even if the null hypothesis is rejected.

Table 1 Summary statistics for the autoregressive model from Eq. (1), all networks

       Mean       Std       Min        25%        50%       75%       Max
$b$    −0.000160  0.001923  −0.040823  −0.000692  0.000108  0.000621  0.046510

The Granger causality test can be performed with different numbers of lags. In our preliminary computations, we found that in many cases the Bayesian information criterion (BIC) [21] selected one or two lags as optimal. Moreover, the corresponding $p$-values were very close in both cases. Since it is computationally expensive to check the BIC for every pair of time series, one lag was chosen for all the Granger causality tests, as it was optimal or near-optimal for most pairs of time series. This choice can also be justified by the widely used assumption in financial mathematics that stock returns possess the Markovian property [18].

3 Dynamics of structural properties of causal market graph

To reveal the long-term evolution of causal market graph characteristics over time, we consider 20 nonoverlapping 1-year periods spanning the most recent two decades. We consider the dynamics of characteristics of the causal market graph, including the number of nodes, arc density, node degrees, connectivity, and degree distribution. In addition, we compute strongly connected components and $k$-out-cores and propose a structural decomposition of the causal market graph.

3.1 Basic characteristics

The set of stocks traded on NASDAQ, NYSE, and AMEX has undergone significant changes during 2001–2020. As shown in Table 2, the number of nodes (stocks) increased from 2,087 in period 1 to 7,240 in period 20.
The number of publicly traded stocks increased by about 247% despite the fact that many companies present in the market in earlier periods ceased to exist in later periods.

Table 2 Basic characteristics of networks corresponding to each time period

Year  #Nodes  #Arcs      Max o.d.  Max i.d.  Arc density (%)  GCC size (%)  In-in assort.  Out-out assort.
2001  2,087   30,082     176       456       0.69             95.35         −0.028         0.111
2002  2,253   33,625     292       641       0.66             94.85         −0.048         0.115
2003  2,352   15,230     84        153       0.28             89.71         −0.020         0.092
2004  2,482   27,947     154       242       0.45             93.03         0.160          0.244
2005  2,656   22,133     103       257       0.31             93.34         −0.084         0.100
2006  2,841   33,701     128       878       0.42             94.65         −0.067         0.177
2007  3,084   134,188    791       855       1.41             98.51         −0.028         0.053
2008  3,418   651,729    1,649     2,591     5.58             99.44         −0.214         0.060
2009  3,558   110,200    997       2,212     0.87             98.37         −0.069         −0.009
2010  3,700   95,185     1,432     1,820     0.70             95.62         0.009          0.023
2011  3,944   185,331    2,004     2,495     1.19             98.07         −0.091         −0.006
2012  4,130   82,420     450       1,598     0.48             97.34         −0.011         0.168
2013  4,410   93,113     598       978       0.48             96.67         −0.063         0.175
2014  4,697   121,270    557       1,630     0.55             98.59         −0.025         0.163
2015  5,061   224,480    999       2,120     0.88             99.19         −0.017         0.144
2016  5,403   205,611    558       2,106     0.70             99.44         −0.034         0.173
2017  5,720   101,240    335       1,742     0.31             99.28         −0.069         0.058
2018  6,147   407,644    1,198     3,081     1.08             99.74         −0.017         0.172
2019  6,701   273,246    1,101     2,417     0.61             99.69         −0.003         0.184
2020  7,240   3,416,051  4,720     5,685     6.52             99.90         −0.118         −0.091

(Max o.d. and Max i.d. are the maximum out-degree and maximum in-degree, respectively; GCC size is the size of the giant connected component as a percentage of the total number of nodes; the last two columns show the respective in- and out-degree assortativity.)

The threshold value used to identify whether two nodes are connected controls the total number of arcs in the graph. Although the threshold specified in the previous section was chosen to be rather conservative, one can see that the number of arcs can still be large; however, it varies greatly: from 15,230 arcs in 2003 to over 3.4 million arcs in 2020.
Due to the difference in the number of nodes in the networks corresponding to different time periods, it makes sense to calculate the arc density (i.e., the ratio of the number of arcs to the maximum possible number of arcs), which is a unitless measure; thus, it can be used to compare graphs with different numbers of nodes. Table 2 summarizes basic characteristics of the networks corresponding to all considered time periods.

In the case of correlation-based (undirected) graph instances constructed over a shorter timeframe, the arc density steadily increased over time [5]. However, the causal market graph does not have this property: Table 2 presents the nonmonotonic dynamics of the number of arcs and the arc density, the latter being also visualized in Figure 1. One can interpret the arc density of the causal market graph as the proportion of ordered pairs of stocks such that the data corresponding to returns of one stock can potentially be used to forecast the future return values of the other. Table 2 presents two other fields related to the network structure: maximum out- and in-degrees. Based on the model of causality, the stocks with high out-degrees are the most “informative” in the sense that their statistics could be used for investigating the behavior of a large number of adjacent stocks (successor nodes in the causal market graph). The in-degree of a node can be treated as a property reflecting the number of stocks containing unique information about this stock. Although this characteristic may be meaningful in certain contexts, in this part of the study, we concentrate mainly on out-degrees of nodes due to the aforementioned considerations.

Figure 1 Evolution of arc density.

The evolution of the density in the causal market graph is shown in Figure 1. One can observe that it has relatively small values during 2001–2006, but it starts to increase in 2007. Further, the arc density attains its highest values in 2008 and 2020.
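As a quick consistency check, the arc density formula $\rho(G)=|A|/(|N|(|N|-1))$ from Section 2.1 can be applied to the node and arc counts reported in Table 2. The short sketch below does this for the sparsest year and the two peak years; the function and variable names are illustrative only.

```python
def arc_density(num_nodes, num_arcs):
    """rho(G) = |A| / (|N| * (|N| - 1)) for a simple directed graph."""
    return num_arcs / (num_nodes * (num_nodes - 1))

# (#Nodes, #Arcs) copied from Table 2 for three selected years.
table2_counts = {2003: (2352, 15230), 2008: (3418, 651729), 2020: (7240, 3416051)}

density_pct = {year: round(100 * arc_density(n, a), 2)
               for year, (n, a) in table2_counts.items()}
print(density_pct)  # {2003: 0.28, 2008: 5.58, 2020: 6.52}
```

The recomputed percentages agree with the arc density column of Table 2, and they show that the 2020 graph is roughly 24 times denser than the 2003 graph even though it has only about three times as many nodes.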
Many economists associate 2007 with the beginning of the worst financial downturn since the Great Depression (which started with the U.S. subprime mortgage crisis). The most significant economic event of 2008 was the collapse of the stock market, when the Dow Jones and S&P 500 endured their worst year since 1930. In 2009, although the U.S. economy was still weak, the stock market started to slowly recover after hitting the bottom in March 2009. As one can see, the values of arc density fell drastically compared to 2008, and they stayed relatively stable until 2020, when the COVID-19 pandemic started. It can also be observed that in-in assortativity has its lowest values in 2008 and 2020.

Although the analysis of these basic properties of the constructed networks may not by itself be sufficient to draw comprehensive conclusions, it can be seen that extreme values of arc density and maximum out-degree of the causal market graph correspond to extreme events in the stock market, and the trends can be noted for the transition periods as well. The different nature of the events that impacted the market in 2008 and in 2020 may explain the difference in the magnitude of these metrics. In particular, it can be observed that the two drastic “spikes” of arc density of the causal market graph (in 2008 and 2020) appear to be inherently different: the 2008 spike was preceded by a smaller yet still significant increase of arc density in 2007 (in fact, the arc density of the 2007 graph is the third largest among the considered time periods), whereas the 2020 spike was not preceded by such an increase.
A notable difference between the respective underlying events that affected the market in 2008 and 2020 is that the 2008 crisis was anticipated by experts based on market trends that started during 2007, whereas the crisis associated with the 2020 COVID-19 pandemic was not anticipated during 2019.

In addition, we consider the specific nodes (stocks/companies) that are most “influential” in the sense that their time series data contain useful information about a large number of other stocks. Figure 2 presents the aggregate distribution of highest out-degree stocks by sector for all considered periods. As one may intuitively expect, the top sectors in this diagram are Funds (which corresponds to Funds, Trusts, and Tracking Stocks) and Financial Services, followed by several other important sectors of the market.

Figure 2 Distribution of highest out-degree stocks by sector for all considered years.

3.2 Degree distribution

As mentioned in Section 2.1, many previous studies have shown that the power-law distribution of out- and in-degrees appears to be a common property of many real-world networks. The degree distribution of most of the constructed causal market graphs also appears to follow a power law, although the quality of the power-law fit varies between different network snapshots. Table 3 summarizes the evolution of the power-law parameter $\gamma$ and the respective $R^2$ value (which reflects the quality of a least-squares fit of a straight line to the log-log data). One can observe from Table 3 that $R^2$ is only about 63% for the 2008 and about 71% for the 2020 out-degree distribution fit, but it is noticeably higher for the other years. Thus, it appears that more substantial deviations from power-law degree distributions coincide with significant events affecting the market.
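The least-squares fit behind $\gamma$ and $R^2$ can be sketched as follows. The exact fitting procedure (binning, treatment of zero-degree nodes) is not specified in the text, so this version makes the simplest assumptions: it uses the raw empirical frequency of every positive degree and fits an ordinary straight line to the log-log points. The function name is illustrative.

```python
import math
from collections import Counter

def power_law_fit(degrees):
    """Fit log P(k) = -gamma * log k + const; return (gamma, R^2)."""
    counts = Counter(d for d in degrees if d > 0)  # zero degrees have no log
    total = sum(counts.values())
    xs = [math.log(k) for k in counts]
    ys = [math.log(c / total) for c in counts.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = sxy * sxy / (sxx * syy)
    return -slope, r_squared

# Synthetic degree sequence whose frequencies are exactly proportional to k^-2.
synthetic = [k for k in range(1, 7) for _ in range(3600 // (k * k))]
gamma, r2 = power_law_fit(synthetic)
print(round(gamma, 6), round(r2, 6))
```

On real degree sequences the points in the tail are noisy, which is one reason why $R^2$ can drop well below 1 even when the bulk of the distribution looks linear on the log-log plot.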
For illustrative purposes, we present the out- and in-degree distributions for causal market graph instances (plotted in the log-log scale) for 2008 and 2019 (see Figures 3 and 4).

Table 3 Power-law fit results for in-degree and out-degree distributions

Year  γ(in)   R²(in)  γ(out)  R²(out)
2001  1.3114  0.8517  1.4595  0.8856
2002  1.2130  0.8418  1.4457  0.8497
2003  1.5744  0.8433  1.9149  0.8822
2004  1.4431  0.8929  1.5107  0.8469
2005  1.4516  0.8614  1.7908  0.8578
2006  1.1817  0.7709  1.6346  0.8683
2007  1.0375  0.8090  1.1608  0.8565
2008  0.7791  0.7800  0.7257  0.6368
2009  1.0251  0.7584  1.2947  0.8085
2010  1.1015  0.8113  1.1790  0.7953
2011  1.0251  0.7867  1.1766  0.8145
2012  1.0573  0.7613  1.4694  0.8593
2013  1.0987  0.8123  1.5229  0.8362
2014  1.0875  0.7789  1.3902  0.8391
2015  1.0006  0.7807  1.2824  0.8642
2016  0.9873  0.7449  1.3971  0.7651
2017  1.3792  0.8172  1.6743  0.7321
2018  0.8402  0.6907  1.1444  0.7974
2019  1.1093  0.7716  1.3782  0.8196
2020  0.7239  0.7070  0.7314  0.7109

Figure 3 Out-degree distributions for 2008 (left) and 2019 (right).

Figure 4 In-degree distributions for 2008 (left) and 2019 (right).

Although the value of the parameter $\gamma$ is rather stable for most of the considered time periods, there is a visible decrease for the out- and in-degree distributions corresponding to 2008 and 2020, which is consistent with the aforementioned observations of other metrics, since a smaller value of $\gamma$ implies a heavier tail of the distribution (i.e., more nodes with high degrees).

3.3 Strongly connected components

Another interesting question concerning the causal market graph is whether it is strongly connected. If the answer is “yes,” then it would mean that each stock $i$ has some relationship with every other stock $j$ via a directed path of causal relationships connecting nodes $i$ and $j$. To address this question, we have identified the largest strongly connected component in each considered network snapshot.
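To make this step concrete, here is a compact iterative variant of Tarjan's depth-first-search algorithm (the method named in Section 2.1) that returns all strongly connected components, from which the relative size of the largest one follows directly. It is a sketch on a toy graph with hypothetical names, not the code used for the reported results; the explicit work stack replaces recursion so that large graphs do not hit Python's recursion limit.

```python
def strongly_connected_components(nodes, arcs):
    """Tarjan's algorithm with an explicit DFS stack instead of recursion."""
    succ = {n: [] for n in nodes}
    for u, v in arcs:
        succ[u].append(v)
    index, low = {}, {}
    stack, on_stack, comps = [], set(), []
    counter = 0
    for root in nodes:
        if root in index:
            continue
        index[root] = low[root] = counter
        counter += 1
        stack.append(root)
        on_stack.add(root)
        work = [(root, iter(succ[root]))]
        while work:
            node, successors = work[-1]
            descended = False
            for nxt in successors:
                if nxt not in index:
                    # Tree arc: visit the unvisited successor first.
                    index[nxt] = low[nxt] = counter
                    counter += 1
                    stack.append(nxt)
                    on_stack.add(nxt)
                    work.append((nxt, iter(succ[nxt])))
                    descended = True
                    break
                if nxt in on_stack:
                    low[node] = min(low[node], index[nxt])
            if descended:
                continue
            work.pop()
            if work:
                parent = work[-1][0]
                low[parent] = min(low[parent], low[node])
            if low[node] == index[node]:
                # node is the root of a component: pop it off the stack.
                comp = set()
                while True:
                    m = stack.pop()
                    on_stack.discard(m)
                    comp.add(m)
                    if m == node:
                        break
                comps.append(comp)
    return comps

toy_nodes = [1, 2, 3, 4, 5]
toy_graph = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5)]
components = strongly_connected_components(toy_nodes, toy_graph)
giant_share = max(len(c) for c in components) / len(toy_nodes)
print(components, giant_share)
```

In the toy graph the cycle 1, 2, 3 forms the largest component, so its relative size is 0.6; nodes 4 and 5 are reachable from the cycle but cannot reach it, so each forms its own component.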
We observed that every considered causal market graph had a “giant” component containing almost all of the nodes. In particular, the smallest giant strongly connected component among the considered networks, which was observed for the 2003 network snapshot, contained almost 90% of the total number of nodes, whereas in many other instances, the relative size of the giant connected component was close to 100%.

Returning to Table 3, it can be seen that the parameter $\gamma$ of the power-law distribution fluctuates between 0.7 and 1.9 for both out- and in-degrees. Most of these values of $\gamma$ are consistent with the range corresponding to the existence (with high probability) of a giant connected component in a power-law random graph, which has been theoretically proven to be $(1, 3.4785)$ for the undirected version of the power-law model in [1].

3.4 Identifying cohesive clusters based on $k$-out-cores

Due to the presence of a giant strongly connected component discussed in the previous subsection, strongly connected components cannot be used for clustering (i.e., partitioning a graph into subgraphs according to some similarity criterion), since one cluster would contain virtually all nodes in the graph. Therefore, in this section, we focus our attention on $k$-out-cores, which are more “cohesive” network clusters compared to connected components.

Recall from Section 2.1 that a $k$-out-core is a highly interconnected set of nodes with out-degrees of at least $k$ within this set. Therefore, in the context of the causal market graph, this structure represents a group of stocks where each stock has causal relationships with at least $k$ other stocks within the group. To find out how large the number $k$ can be, we compute the graph degeneracy for each time period, as described in Section 2.1.
Table 4 presents the degeneracy ($\delta_{\mathrm{out}}^{\ast}(G)$), the $k$-out-core size ($|\mathcal{C}|$) for $k=\delta_{\mathrm{out}}^{\ast}(G)$, and the proportion of the $k$-out-core size to the number of nodes ($|\mathcal{C}|/|N|$) in the causal market graph for all considered periods.

Table 4 $k$-out-cores in causal market graphs for 2001–2020

Year  Degeneracy  k-out-core size  Proportion (%)
2001  6           519              24.87
2002  7           656              29.12
2003  2           1,754            74.57
2004  5           172              6.93
2005  3           1,735            65.32
2006  5           48               1.69
2007  17          952              30.87
2008  92          923              27.00
2009  9           493              13.86
2010  9           402              10.86
2011  17          365              9.25
2012  7           164              3.97
2013  9           394              8.93
2014  9           1,415            30.13
2015  29          173              3.42
2016  11          2,713            50.21
2017  7           3,774            65.98
2018  17          2,475            40.26
2019  19          524              7.82
2020  256         1,797            24.82

Taking a closer look at the $k$-out-core found in the 2008 network snapshot, one can see that 923 stocks form a connected cohesive structure, in which every stock has an out-degree of at least 92. This is a rather interesting observation, taking into account that Granger causality links were constructed using a very conservative threshold value. An intuitive explanation of this fact is that the crisis of 2007–2008 impacted a large portion of the market, which in turn substantially increased the number of statistically significant causal relationships between stocks. An even “denser” $k$-out-core (with the out-degree of each node at least 256!) was found in the 2020 network, which was affected by the COVID-19 pandemic. Overall, the $k$-out-core decomposition approach confirms the observations reported earlier; moreover, it allows one to observe “amplified” trends corresponding to significant events affecting the market.

4 Identifying influential market sectors using PageRank

While a stock’s out-degree appears to be a reasonable quantitative measure of the stock’s importance, it treats all links as equal and does not take into account the difference in importance of out-neighbors.
The PageRank method, which was proposed in [7] for ranking webpages in Google’s search engine, is a simple yet very effective technique that overcomes this drawback. It can be applied to rank nodes in a directed network according to their importance or “centrality” expressed by a certain score. Reference [10] describes the PageRank method as a “democracy,” with links interpreted as votes in favor of the webpages they are directed to. Each webpage can vote for other webpages, and its score is divided evenly over the set of webpages it is voting for. In the realm of a causal market graph, webpages are replaced with stocks, and hyperlinks are replaced with causality relations. In addition, we reverse the directions of arcs in the causal market graph to reflect the idea that stock $i$ causing stock $j$ corresponds to stock $j$ “voting” for stock $i$. We call the resulting network a reverse causal graph and denote it by $G_r=(N,A_r)$. Then, a stock’s score can be viewed as a weight $w_i$ assigned to the stock $i$, which is uniformly distributed among its out-neighbors in the reverse causal graph, and is computed as the sum of the corresponding proportional weights of in-neighbors, i.e.,

$$w_i=\sum_{j:(j,i)\in A_r}\frac{w_j}{\deg_{G_r}^{\mathrm{out}}(j)},\quad i=1,\ldots,|N|, \tag{2}$$

or, in the matrix form, $w=Bw$, where $B=[b_{ij}]_{i,j=1}^{|N|}$ is given by

$$b_{ij}=\begin{cases}\dfrac{1}{\deg_{G_r}^{\mathrm{out}}(j)}, & \text{if } (j,i)\in A_r;\\[4pt] 0, & \text{otherwise}.\end{cases} \tag{3}$$

Hence, the problem of finding the scores reduces to computing the eigenvector of the column-stochastic matrix $B$ that corresponds to the eigenvalue equal to 1.
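A small numerical illustration of Eqs. (2) and (3): the sketch below reverses the arcs of a toy causal graph and finds the eigenvector of $B$ for eigenvalue 1 by repeatedly applying Eq. (2) (power iteration). The toy graph is chosen so that every node has at least one outgoing arc in the reverse graph and the iteration converges without any correction; the names and the stopping rule are illustrative assumptions.

```python
def reverse_arcs(arcs):
    """Arc (i, j), 'i causes j', becomes (j, i), 'j votes for i'."""
    return [(j, i) for i, j in arcs]

def eigenvector_scores(nodes, arcs_r, tol=1e-12, max_iter=10000):
    """Power iteration for w = B w with B defined as in Eq. (3)."""
    out_deg = {n: 0 for n in nodes}
    for j, _ in arcs_r:
        out_deg[j] += 1
    w = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        new = {n: 0.0 for n in nodes}
        for j, i in arcs_r:
            new[i] += w[j] / out_deg[j]  # j splits its score evenly
        if sum(abs(new[n] - w[n]) for n in nodes) < tol:
            return new
        w = new
    return w

# Toy causal graph: A causes B and C, B causes C, C causes A.
toy_causal = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
scores = eigenvector_scores(["A", "B", "C"], reverse_arcs(toy_causal))
print({n: round(s, 4) for n, s in scores.items()})
```

Solving Eq. (2) by hand for this graph gives $w_A=w_C=0.4$ and $w_B=0.2$, which the iteration reproduces. Note that B receives a vote only from C, and C splits its score between A and B.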
As soon as the scores are computed, we can rank the stocks by ordering the scores from highest to lowest. To overcome technical shortcomings arising when the network has nodes of out-degree 0 or is not connected, the original PageRank method is based on solving the system $w=(dB+(1-d)S)w$ instead of $w=Bw$, where $d=0.85$ and $S$ is an $|N|\times|N|$ matrix with all entries equal to $1/|N|$. More details on the PageRank method and related literature are provided in [10].

In our experiments, we use PageRank to identify market sectors and industries within a given sector that are most important with respect to aggregated causal relationships. To rank the market sectors over a certain time period, we apply PageRank to the newly introduced causal market sector graph $G_s=(N_s,A_s)$ that is obtained from the reverse causal graph $G_r=(N,A_r)$ by merging each subset of nodes $I_r$ representing stocks from the same market sector into a single node $i_s$ (referred to as a sector node). In addition, for any two sector nodes $i_s$ and $j_s$ in $N_s$, we assign a weight $l(i_s,j_s)$ to the arc between them as follows:

$$l(i_s,j_s)=\sum_{i\in I_r,\, j\in J_r}\mathbf{1}_{A_r}((i,j)),$$

where $I_r$ and $J_r$ are the subsets of nodes in $N$ that were merged to define $i_s$ and $j_s$, respectively, and $\mathbf{1}_{A_r}((i,j))$ is the indicator function for $A_r$, which yields 1 if $(i,j)\in A_r$ and 0 otherwise.
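The aggregation into sector nodes, combined with the damped iteration $w=(dB+(1-d)S)w$ applied to weight-normalized arcs, might be sketched as follows. Two choices in this sketch are assumptions rather than details given in the text: arcs between stocks of the same sector are ignored, and a sector with no outgoing weight spreads its score uniformly over all sectors. The tickers, sector labels, and function names are made up for illustration.

```python
def sector_weights(arcs_r, sector_of):
    """l(i_s, j_s): number of reverse-graph arcs from sector i_s to sector j_s."""
    weights = {}
    for i, j in arcs_r:
        si, sj = sector_of[i], sector_of[j]
        if si != sj:  # assumption: intra-sector arcs are dropped
            weights[(si, sj)] = weights.get((si, sj), 0) + 1
    return weights

def weighted_pagerank(nodes, weights, d=0.85, tol=1e-12, max_iter=1000):
    """Damped PageRank where each node splits its score in proportion to l."""
    nodes = list(nodes)
    n = len(nodes)
    out_total = {q: 0.0 for q in nodes}
    for (q, _), l in weights.items():
        out_total[q] += l
    w = {p: 1.0 / n for p in nodes}
    for _ in range(max_iter):
        # Assumption: sectors without outgoing weight vote uniformly.
        dangling = sum(w[q] for q in nodes if out_total[q] == 0)
        new = {p: (1 - d) / n + d * dangling / n for p in nodes}
        for (q, p), l in weights.items():
            new[p] += d * w[q] * l / out_total[q]
        delta = sum(abs(new[p] - w[p]) for p in nodes)
        w = new
        if delta < tol:
            break
    return w

# Hypothetical stocks: two funds, one bank, one technology company.
sector_of = {"F1": "Funds", "F2": "Funds", "B1": "Financial", "T1": "Technology"}
causal = [("F1", "B1"), ("F1", "T1"), ("F2", "B1"), ("B1", "T1")]
reverse = [(j, i) for i, j in causal]  # j "votes" for i
l = sector_weights(reverse, sector_of)
sector_scores = weighted_pagerank(set(sector_of.values()), l)
print(l, max(sector_scores, key=sector_scores.get))
```

In this toy example the Funds sector collects votes from both other sectors and casts none itself, so it ends up with the highest score.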
To apply the PageRank method to the edge-weighted graph $G_s$, we need to solve the system

$$w^s=\left(dB^s+(1-d)S^s\right)w^s, \tag{4}$$

where $B^s=[b^s_{pq}]_{p,q=1}^{|N_s|}$ is given by

$$b^s_{pq}=\begin{cases}\dfrac{l(q,p)}{\sum_{p':(q,p')\in A_s}l(q,p')}, & \text{if } (q,p)\in A_s;\\ 0, & \text{otherwise},\end{cases} \tag{5}$$

$d=0.85$, and $S^s$ is an $|N_s|\times|N_s|$ matrix with all entries equal to $1/|N_s|$.

Figure 5 shows the breakdown of the most influential market sectors for each time period according to their PageRank scores. One can observe that Funds, Trusts, and Tracking Stocks is the top-ranked sector in all time periods except 2002, when the Financial Services sector had the same PageRank score. The fact that Funds, Trusts, and Tracking Stocks is the most influential market sector is not surprising, since many stocks in this sector are by definition reflective of the behavior of the entire market. The fact that Financial Services is the second-most influential sector in most of the considered time periods is also somewhat expected; however, it is interesting to observe that the PageRank scores of the Financial Services, Industrial, and Technology sectors have decreased in the most recent years. Although the PageRank-based approach has limitations since it takes into account only the respective network topology, these observations may be worth investigating further from more traditional economics- and finance-based perspectives.

Figure 5: Breakdown of the most influential market sectors for each time period based on the PageRank method.

5 Conclusion

In this article, we constructed a network-based map of causal relationships in the entire U.S. stock market. The considered network-based model of the stock market is based on publicly available stock price data and a quantitative causality measure, which makes the model easily interpretable and reproducible. The proposed approach enables one to apply the rich arsenal of network analysis tools toward revealing market trends and investigating the properties of individual nodes and market clusters that may not be apparent otherwise. We focused on studying the basic structural properties of the causal market graph and detecting its most influential entities. The considered network-based metrics are nonmonotonic over time, and their most significant changes appear to coincide with global-scale events, such as the COVID-19 pandemic and the 2008 financial crisis. In addition, the proposed PageRank-based technique for identifying “influential” market sectors revealed interesting observations that may be worth investigating further.

One potential direction of further research would be to analyze networks constructed using other connectedness computation methods, such as [17,27]. It would also be of interest to account for heteroscedasticity in the Granger causality tests and examine its effect on the resulting networks. Future research may also investigate the possibility of constructing a market index based solely on Granger causality metrics. The presence of a power-law degree distribution in many of the networks implies that a relatively small number of stocks have a large number of strong causal links to a large remaining portion of the market. Further, this observation suggests that the set of stocks comprising the $k$-out-cores can potentially be used to create a conceptually new network-based market index.

A limitation of this study, which may be addressed in future research, is the problem of multiple comparisons. In order to construct the edges, we perform pairwise Granger causality tests between each pair of nodes. For each pairwise comparison, the employed statistical test may incorrectly reject a true null hypothesis, and thus add a “wrong” edge, with a probability of 0.1%. Even though the probability of adding a “wrong” edge is low, the networks analyzed in this article contain thousands of nodes, and considering independent tests, these networks may contain some “wrong” edges. Although these potential effects cannot be completely ruled out, the networks presented in the article still exhibit interesting properties, such as the presence of power-law degree distributions, patterns of arc density changes corresponding to financial crises, and other observations, which are unlikely to appear solely due to statistical anomalies.

The considered approaches can potentially be applied in a wider variety of settings. One interesting future research direction would be to consider networks of causal relationships that span stock markets of multiple countries. Another potential area of interest would be applying these techniques to shorter time periods, possibly with smaller time increments between data points (e.g., one could consider hourly or minute-by-minute stock price data over a time period of several days or weeks). In particular, although this article focused mainly on a descriptive rather than predictive/prescriptive analysis of stock market data, it would be interesting to see if the considered network-based approaches (perhaps with some modifications) could be used in the context of predictive models of market trends.

Dependence Modeling, de Gruyter
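The multiple-comparisons limitation discussed in the Conclusion can be put in rough numbers. The sketch below computes a reference value: the expected number of falsely added arcs if every null hypothesis were true, which by linearity of expectation equals the significance level times the number of ordered pairs tested. It also computes a Bonferroni-corrected per-test threshold. Bonferroni is a standard correction that is not used in the article, and the family-wise level of 0.05 and the function names are illustrative assumptions. The node count 7,240 is the number of stock symbols the article reports for 2020.

```python
def expected_false_arcs(n_nodes: int, alpha: float) -> float:
    """Expected number of wrongly added arcs if all null hypotheses were true.

    One test is run per ordered pair of distinct nodes, so there are
    n_nodes * (n_nodes - 1) tests, each rejecting with probability alpha.
    """
    n_tests = n_nodes * (n_nodes - 1)
    return alpha * n_tests


def bonferroni_threshold(n_nodes: int, family_alpha: float = 0.05) -> float:
    """Per-test p-value threshold keeping the family-wise error rate at family_alpha."""
    n_tests = n_nodes * (n_nodes - 1)
    return family_alpha / n_tests


reference_2020 = expected_false_arcs(7240, 0.001)
strict_threshold = bonferroni_threshold(7240)
```

The reference value is an all-nulls-true worst case rather than an estimate of the actual number of spurious arcs. The Bonferroni threshold is very conservative at this scale, so a false-discovery-rate procedure may be a more practical alternative to explore.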

Publisher
de Gruyter
Copyright
© 2022 Oleg Shirokikh et al., published by De Gruyter
ISSN
2300-2298
eISSN
2300-2298
DOI
10.1515/demo-2022-0110

Abstract

1IntroductionStock markets are complex interconnected systems, where various “local” factors can cause “global” changes in the behavior of the entire market. For instance, favorable or unfavorable economic conditions in certain market segments, or in certain countries, may affect other countries and industries and potentially cause positive or negative fluctuations that span the entire U.S. and international markets. The idea of describing causal relationships between different components of the market system has been addressed in several recent studies. For instance, the survey [24] discussed the concept of contagion in financial markets, which essentially implies the propagation of impact (such as risk) between different components of the market. A network-based model is a natural way to mathematically represent these “contagion” processes; however, the principles for constructing the networks that reflect certain types of processes may vary depending on the respective goals and assumptions of a study.A simple and intuitive technique for constructing a network-based (or, graphical) model of the market is to represent its elements (e.g., stocks) as nodes and connect the nodes by links (arcs) based on pairwise correlations between the corresponding entities (i.e., the correlations between stock price fluctuations over a certain period of time). Such an approach was studied in [5,6,23] in the context of identifying large correlated clusters and diversified portfolios in the U.S. stock market. Although pairwise correlation-based similarity measures have merit in certain applications, a substantial drawback of such measures is in the inability to produce directed links between entities, that is, to establish the direction of “contagion” (i.e., the propagation from node iito node jjvs the propagation from node jjto node ii).In this work, we construct and analyze a directed network model, which describes causal relationships between all pairs of stocks in the U.S. 
stock market using the concept of Granger causality [15,16]. It should be noted that Granger causality (which will be formally defined later in the article) can be used to determine whether the time series describing stock iicontains useful information for predicting the behavior of the time series of stock jj. This should not be confused with the statement “the increase/decrease in the price of stock iicauses the increase/decrease in the price of stock jj,” which is not necessarily true.There are several previous studies constructing networks based on Granger causality, such as [12,26]; however, there has been no thorough analysis of the resulting networks. Also, the current literature contains little discussion about the influence of market sectors. Further, little attention has been paid to approaches widely used in network science, such as PageRank and kk-core-based methods.As it will be discussed in the next sections, networks constructed using Granger causality appear to capture certain structural properties of the stock market that reflect overall tendencies in its behavior. In particular, we investigate various aspects of connectivity patterns and the evolution of structural properties of the constructed network snapshots. In addition, the considered network representation is used to group stocks into network-based clusters and to identify the most “influential” market entities (sectors/industries).2Basic concepts, data description, network construction2.1Relevant graph-theoretic conceptsLet G=(N,A)G=\left(N,A)be a simple directed graph with the set of nodes NNand the set of arcs A={(i,j):i,j∈N}A=\left\{\left(i,j):i,j\in N\right\}, where the head jjand tail iiof each arc (i,j)\left(i,j)are specified. 
Arc density of GGis defined as the ratio of the number of arcs in the graph to the maximum possible number of arcs: ρ(G)=∣A∣∣N∣(∣N∣−1)\rho \left(G)=\frac{| A| }{| N| \left(| N| -1)}, where ∣A∣| A| is the number of arcs and ∣N∣| N| is the number of nodes in graph GG. Obviously, ρ(G)∈[0,1]\rho \left(G)\in \left[0,1].Given a node n∈Nn\in N, its in-degree degGin(n){{\bf{\deg }}}_{G}^{{\rm{in}}}\left(n)is the number of incoming arcs and its out-degree degGout(n){{\bf{\deg }}}_{G}^{{\rm{out}}}\left(n)is the number of outgoing arcs. Extensive empirical studies show that degree distributions of many real-life graphs representing diverse datasets follow the well-known power-law model [2,3,4, 8,19]. According to this model, the probability that a node has a degree kk(in- or out-degree for directed graphs) is P(k)∝k−γ{\mathbb{P}}\left(k)\propto {k}^{-\gamma }, or logP(k)=−γlogk+const\log {\mathbb{P}}\left(k)=-\gamma \log \hspace{0.33em}k+{\rm{const}}in the log–log scale, which can be described by a straight line with the slope equal to the parameter γ\gamma of the power-law degree distribution. One of the notable characteristics of such networks (known as the scale-free property) is that their power-law structure should not depend on the network’s size.A directed graph G=(N,A)G=\left(N,A)is called strongly connected if there is a directed path from each node to every other node in the set NN. A disconnected graph can be decomposed into strongly connected subgraphs, which are referred to as strongly connected components of GG. Distinct components can be interpreted as clusters in the corresponding dataset. Several algorithms exist for the efficient identification of strongly connected components in a directed graph. 
In our study, we use the popular Tarjan’s algorithm based on the depth-first search technique [25].In some situations, clusters based on strongly connected components can be extremely large and comparable with the size of the whole graph (which is in fact the case for the considered graphs, as it will be shown later). Therefore, the clustering approach based on connected components may not be necessarily appropriate for drawing meaningful conclusions regarding specific groups of nodes within a graph. There is a variety of definitions for “tighter” structures that may be interpreted as clusters that have specific cohesive properties of their connectivity patterns. In this study, we utilize the concepts of k-degenerate graph and k-cores for undirected graphs, introduced by [22], and modify them for the case of directed graphs. A simple undirected graph is called kk-degenerate if every its subgraph has a vertex of degree at most kk. The degeneracy of a simple undirected graph GG, denoted by δ∗(G){\delta }^{\ast }\left(G), is the smallest value of kksuch that GGis kk-degenerate. A kk-core in a simple undirected graph is a subset of vertices that induces a subgraph with the minimum degree at least kk. Alternatively, one can define kk-cores as connected components that are left after all nodes of degree less than kkhave been removed from the graph GG; therefore, δ∗(G){\delta }^{\ast }\left(G)is the maximum kkfor which GGcontains a nonempty kk-core.An extension of the notion of a kk-core was proposed in [14] by introducing the concept of a DD-core in a directed graph. The authors consider min-in-degree and min-out-degree of a graph GGdefined as δin(G)=minx∈N{degGin(x)}{\delta }^{{\rm{in}}}\left(G)={\min }_{x\in N}\left\{{{\bf{\deg }}}_{G}^{{\rm{in}}}\left(x)\right\}and δout(G)=minx∈N{degGout(x)}{\delta }^{{\rm{out}}}\left(G)={\min }_{x\in N}\left\{{{\bf{\deg }}}_{G}^{{\rm{out}}}\left(x)\right\}, respectively. 
Then, for two positive integers k,lk,l, a (k,l)\left(k,l)-DD-core is a maximal size subgraph G′G^{\prime} of GG, where δin(G′)≥k{\delta }^{{\rm{in}}}\left(G^{\prime} )\ge kand δout(G′)≥l{\delta }^{{\rm{out}}}\left(G^{\prime} )\ge l. The intuition behind this notion is to find a subset of the graph, where all the nodes have sufficient out- and in-degrees in order to form a “tight” cluster. For the reasons that will become clear later in the article, we introduce a slightly different structure referred to as a k-out-core, where each node is only required to have an out-degree of at least kk, i.e., the condition for the in-degree is relaxed. Therefore, for a positive integer kk, a kk-out-core of the graph GGis defined as a subgraph G′G^{\prime} of GG, where δout(G′)≥k{\delta }^{{\rm{out}}}\left(G^{\prime} )\ge k. As in the case of undirected graphs, we can define kk-out-cores as connected components that are left after all nodes of out-degree less than kkhave been removed from the graph GG; therefore, δout∗(G){\delta }_{{\rm{out}}}^{\ast }\left(G)is the maximum kkfor which GGcontains a nonempty kk-out-core. Although the original definition of “degeneracy” differs from the definition of δout∗(G){\delta }_{{\rm{out}}}^{\ast }\left(G), for simplicity, we will use the same notation.For a given kk, kk-out-cores can be easily found using a greedy algorithm, which recursively removes the nodes with out-degree less than kkone by one from the graph, until all the remaining nodes have sufficiently large out-degrees. Then one can decompose the resulting network into connected components, which are kk-out-cores by definition. The degeneracy of GGcan be found using a simple binary search technique.2.2Granger causalityIn the aforementioned previous studies of the network-based model of the U.S. 
stock market [5,6], the market graph was constructed in such a way that a given pair of nodes is connected by an undirected edge if the corresponding stocks exhibit a similar behavior over a certain period of time. The similarity was measured by Pearson’s correlation between the time series representing the returns of corresponding stocks. In this study, we propose a different technique for constructing the set of arcs: the similarity between stocks is measured by Granger causality [15,16], which is extensively used across many application areas because of its simplicity, robustness, and flexibility [9,13]. The details of the network construction will be presented in Section 2.3, whereas here we introduce the definition of causality and the procedure for conducting Granger causality test between two time series.Consider two scalar-valued, stationary time series {xt:t=1,…,T}\left\{{x}_{t}:t=1,\ldots ,T\right\}and {yt:t=1,…,T}\{{y}_{t}:t=1,\ldots ,T\}corresponding to the returns xx, yyof a pair of stocks. The basic idea behind the notion of causality is very general in its nature: one can state that xxcauses yy, denoted by x⇒yx\Rightarrow y, if xxcontains some unique information about yy, so that yycan be better predicted using this information than in the absence of this information. In practice, Granger causality is often tested using the following linear autoregressive model: (1)yt=∑i=1kaiyt−i+∑j=1kbjxt−j+εt,{y}_{t}=\mathop{\sum }\limits_{i=1}^{k}{a}_{i}{y}_{t-i}+\mathop{\sum }\limits_{j=1}^{k}{b}_{j}{x}_{t-j}+{\varepsilon }_{t},where kkis the maximal time lag and εt{\varepsilon }_{t}is a regression error. Then, xxdoes not cause yyif and only if H0:bj=0,j=1,…,k.{{\bf{H}}}_{0}:{b}_{j}=0,\hspace{1em}j=1,\ldots ,k.To test this hypothesis, one can apply the FF-test, and rejecting H0{{\bf{H}}}_{0}implies that xx“Granger causes” yy. 
The procedure of testing for the presence of causality in the other direction (y⇒x)(y\Rightarrow x)is similar.It should be noted that the Granger causality test is valid only if the time series are covariance (or weak) stationary. In this article, we used the Augmented Dickey-Fuller test [20] to check the stationarity of time series. Further, we assume homoscedasticity, i.e., constant variance of εt{\varepsilon }_{t}.2.3Network constructionIn the constructed directed unweighted network, the nodes are stocks represented as “ticker” symbols. We used all the stocks listed at NYSE, NASDAQ, and AMEX as of December 31, 2020: There were 7,240 stock symbols in total. The list of stock symbols was obtained from EODdata.https://eoddata.com/.We obtained historical stock prices data from Yahoo Finance using yfinancehttps://pypi.org/project/yfinance/.Python library.The adjusted close prices data were transformed into the time series of daily returns, since returns possess scalability property (i.e., the values in time series representing each stock returns have the same order of magnitude) and thus are easily comparable. Furthermore, the logarithms of returns were calculated, due to the fact that log-returns have more attractive statistical properties [11], including weak stationarity, which was verified for all considered time series. If Pi(t){P}_{i}\left(t)and Pi(t−1){P}_{i}\left(t-1)are the adjusted close prices of stock iion days ttand t−1t-1, respectively, then the log-return time series for each stock iiare defined as follows: ri(t)=lnPi(t)Pi(t−1),t=2,…,T,{r}_{i}\left(t)=\mathrm{ln}\frac{{P}_{i}\left(t)}{{P}_{i}\left(t-1)},\hspace{1em}t=2,\ldots ,T,where TTis the number of trading days in each of the considered calendar years (2001–2020).A directed network (referred to as a causal market graph) was constructed for each time period (calendar year) to reflect the causal relationships between stocks. 
It should be noted that a network constructed for each time period contains only those stocks that were present in the market during that entire time period; therefore, the cardinality and composition of the sets of nodes change from period to period. Every stock is represented by a node, and the existence of an arc (i,j)\left(i,j)means that the time series of stock iicauses the time series of stock jjin the sense of Granger causality. Recall that Granger causality test checks the hypothesis that coefficients bj=0,j=1,…k{b}_{j}=0,j=1,\ldots k. The null hypothesis (all bj{b}_{j}are equal to zero) is rejected in favor of alternative if the pp-value of FF-test is less than a certain threshold. Hence, an arc between stocks iiand jjis constructed if the corresponding pp-value is less than a chosen threshold. We picked this threshold to be 0.001, which means that Granger causality holds with 99.9% confidence. The motivation behind this threshold choice is to ensure that the constructed networks are sparse enough, so that it would be possible to observe significant changes in connectivity patterns over time (as opposed to the situation where each network contains close to the maximum possible number of arcs, which makes it difficult to detect any changes in connectivity patterns). Thus, only the most “meaningful” connections are reflected in the constructed networks. The summary of statistics of autoregression coefficients in Eq. (1) for edges kept in the networks is shown in Table 1. It is interesting that the values of bbare often close to zero, even if the null hypothesis is rejected.Table 1Summary statistics for autoregressive model from Eq. (1), all networksMeanStdmin25%50%75%maxbb−0.000160-0.0001600.001923−0.040823-0.040823−0.000692-0.0006920.0001080.0006210.046510The Granger causality test can be performed with different numbers of lags. 
In our preliminary computations, we found that in many cases the Bayesian information criterion (BIC) [21] produced the optimal quantity of one or two lags. Moreover, the corresponding pp-values were very close for both cases. Since it is computationally expensive to check the BIC for every pair of time series, one lag was chosen for all the Granger causality tests, as it was optimal or near-optimal for most pairs of time series. This choice can also be justified by the widely used assumption in financial mathematics that stock returns possess the Markovian property [18].3Dynamics of structural properties of causal market graphTo reveal the long-term evolution of causal market graph characteristics over time, we consider 20 nonoverlapping 1-year periods spanning the most recent two decades. We consider the dynamics of characteristics of the causal market graph, including the number of nodes, arc density, node degrees, connectivity, and degree distribution. In addition, we compute strongly connected components, kk-out-cores and propose a structural decomposition of the causal market graph.3.1Basic characteristicsThe set of stocks traded on NASDAQ, NYSE, and AMEX has undergone significant changes during 2001–2020. As it is shown in Table 2, the number of nodes (stocks) increased from 2087 in period 1 to 7240 in period 20. The number of publicly traded stocks increased by 246% despite the fact that many companies present in the market in earlier periods ceased to exist in later periods.Table 2Basic characteristics of networks corresponding to each time periodYear#Nodes#ArcsMax. 
o.d.Max i.d.Arc density (%)GCC size (%)In-in assort.Out-out assort.20012,08730,0821764560.6995.35−0.0280.11120022,25333,6252926410.6694.85−0.0480.11520032,35215,230841530.2889.71−0.0200.09220042,48227,9471542420.4593.030.1600.24420052,65622,1331032570.3193.34−0.0840.10020062,84133,7011288780.4294.65−0.0670.17720073,084134,1887918551.4198.51−0.0280.05320083,418651,7291,6492,5915.5899.44−0.2140.06020093,558110,2009972,2120.8798.37−0.069−0.00920103,70095,1851,4321,8200.7095.620.0090.02320113,944185,3312,0042,4951.1998.07−0.091−0.00620124,13082,4204501,5980.4897.34−0.0110.16820134,41093,1135989780.4896.67−0.0630.17520144,697121,2705571,6300.5598.59−0.0250.16320155,061224,4809992,1200.8899.19−0.0170.14420165,403205,6115582,1060.7099.44−0.0340.17320175,720101,2403351,7420.3199.28−0.0690.05820186,147407,6441,1983,0811.0899.74−0.0170.17220196,701273,2461,1012,4170.6199.69−0.0030.18420207,2403,416,0514,7205,6856.5299.90−0.118−0.091(max. o.d. and max i.d. are maximum out-degree and maximum in-degree, respectively; GCC size is the size of the giant connected component as the percentage of the total number of nodes; the last two columns show the respective in- and out-degree assortativity).The threshold value used to identify whether two nodes are connected controls the total number of arcs in the graph. Although the threshold specified in the previous section was chosen to be rather conservative, one can see that the number of arcs can still be large; however, it varies greatly: from 15,230 arcs in 2003 to over 3.4 million arcs in 2020. Due to the difference in the number of nodes in the networks corresponding to different time periods, it makes sense to calculate the arc density (i.e., the ratio of the number of arcs to the maximum possible number of arcs), which is a unit-less measure; thus, it can be used to compare graphs with different numbers of nodes. 
Table 2 summarizes basic characteristics of the networks corresponding to all considered time periods.In the case of correlation-based (undirected) graph instances constructed over a shorter timeframe, the arc density steadily increased over time [5]. However, the causal market graph does not have this property: Table 2 presents the nonmonotonic dynamics of the number of arcs and the arc density, the latter being also visualized in Figure 1. One can interpret the arc density of the causal market graph as a proportion of ordered pairs of stocks, such that the data corresponding to returns of one stock can be potentially used in order to forecast the future return values of the other. Table 2 presents two other fields related to the network structure: maximum out- and in-degrees. Based on the model of causality, the stocks with high out-degrees are the most “informative” in the sense that their statistics could be used for investigating the behavior of a large number of adjacent stocks (successor nodes in the causal market graph). The in-degree of a node can be treated as the property reflecting the number of stocks containing unique information about this stock. Although this characteristic may be meaningful in certain contexts, in this part of the study, we concentrate mainly on out-degrees of nodes due to the aforementioned considerations.Figure 1Evolution of arc density.The evolution of the density in the causal market graph is shown in Figure 1. One can observe that it has relatively small values during 2001–2006, but it starts to increase in 2007. Further, the arc density attains its highest values in 2008 and 2020. Many economists associate 2007 with the beginning of the worst financial downfall since the Great Depression (started with the U.S. subprime mortgage crisis). The most significant economic event of 2008 is the collapse of the stock market when Dow Jones and S&P500 endured their worst year since 1930. 
In 2009, although the US economy was still weak, the stock market started to slowly recover after hitting the bottom in March 2009. As one can see, the values of arc density fell drastically compared to 2008, and they stayed relatively stable until 2020, when COVID-19 pandemic started. It can also be observed that in-in assortativity has its lowest values in 2008 and 2020.Although the analysis of these basic properties of the constructed networks may not by itself be sufficient to draw comprehensive conclusions, it can be seen that extreme values of arc density and maximum out-degree of the causal market graph correspond to extreme events in the stock market, and the trends can be noted for the transition periods as well. The different nature of events that impacted the market between 2008 and 2020 may explain the difference in the magnitude of these metrics. In particular, it can be observed that two drastic “spikes” of arc density of the causal market graph (in 2008 and 2020) appear to be inherently different: the 2008 spike was preceded by a smaller yet still significant increase of arc density in 2007 (in fact, the arc density of the 2007 graph is the third largest among the considered time periods), whereas the 2020 spike was not preceded by such an increase. The difference between the respective underlying events that affected the market in 2008 and 2020 is that the 2008 crisis was anticipated by experts based on market trends that started during 2007, but the crisis associated with the 2020 COVID-19 pandemic was not anticipated during 2019.In addition, we consider the specific nodes (stocks/companies) that are most “influential” in the sense that their time series data contain useful information about a large number of other stocks. Figure 2 presents the aggregate distribution of highest out-degree stocks by sector for all considered periods. 
As one may intuitively expect, the top sectors in this diagram are Funds (that corresponds to Funds, Trusts, and Tracking Stocks) and Financial Services, followed by several other important sectors of the market.Figure 2Distribution of highest out-degree stocks by sectors for all considered years.3.2Degree distributionAs mentioned in Section 2.1, many previous studies have shown that the power-law distribution of out- and in-degrees appears to be a common property for many real-world networks. The degree distribution of most of the constructed causal market graphs also appears to follow a power law, although the quality of power-law fit varies between different network snapshots. Table 3 summarizes the evolution of the power-law parameter γ\gamma and the respective R2{R}^{2}value (which reflects the quality of a least-square fit of a straight line to the log–log data). One can observe from Table 3 that the R2{R}^{2}is only about 63% for 2008 and about 71% for 2020 out-degree distribution fit, but it is significantly higher for other years. Thus, it appears that more substantial deviations from power-law degree distributions coincide with significant events affecting the market. 
For illustrative purposes, we present the out- and in-degree distributions for causal market graph instances (plotted in the log–log scale) for 2008 and 2019 (see Figures 3 and 4).Table 3Power-law fit results for in-degree and out-degree distributionsYearγ(in){\gamma }_{\left({\rm{in}})}R(in)2{R}_{\left({\rm{in}})}^{2}γ(out){\gamma }_{\left({\rm{out}})}R(out)2{R}_{\left({\rm{out}})}^{2}20011.31140.85171.45950.885620021.21300.84181.44570.849720031.57440.84331.91490.882220041.44310.89291.51070.846920051.45160.86141.79080.857820061.18170.77091.63460.868320071.03750.80901.16080.856520080.77910.78000.72570.636820091.02510.75841.29470.808520101.10150.81131.17900.795320111.02510.78671.17660.814520121.05730.76131.46940.859320131.09870.81231.52290.836220141.08750.77891.39020.839120151.00060.78071.28240.864220160.98730.74491.39710.765120171.37920.81721.67430.732120180.84020.69071.14440.797420191.10930.77161.37820.819620200.72390.70700.73140.7109Figure 3Out-degree distributions for 2008 (left) and 2019 (right).Figure 4In-degree distributions for 2008 (left) and 2019 (right).Although the value of the parameter γ\gamma is rather stable for most of the considered time periods, there is a visible decrease for out- and in-degree distributions corresponding to 2008 and 2020, which is consistent with the aforementioned observations of other metrics, since a smaller value of γ\gamma implies a heavier tail of the distribution (i.e., more nodes with high degrees).3.3Strongly connected componentsAnother interesting question concerning the causal market graph is whether it is strongly connected. If the answer is “yes,” then it would mean that each stock iihas some relationship with every other stock jjvia a directed path of causal relationships connecting nodes iiand jj. To address this question, we have identified the largest strongly connected component in each considered network snapshot. 
We observed that every considered causal market graph had a “giant” component containing almost all of the nodes. In particular, the smallest size of a giant strongly connected component among the considered networks, which was observed for the 2003 network snapshot, contained almost 90% of the total number of nodes, whereas in many other instances, the relative size of the giant connected component was close to 100%.Returning to Table 3, it can be seen that the parameter γ\gamma of the power-law distribution fluctuates between 0.7 and 1.9 for both out- and in-degrees. Most of these values of γ\gamma are consistent with the range corresponding to the existence (with high probability) of a giant connected component in a power-law random graph, which has been theoretically proven to be (1,3.4785)\left(1,3.4785)for the undirected version of the power-law model in [1].3.4Identifying cohesive clusters based on kk-out-coresDue to the presence of a giant strongly connected component discussed in the previous subsection, strongly connected components cannot be used for clustering (i.e., partitioning a graph into subgraphs according to some similarity criterion), since one cluster would contain virtually all nodes in the graph. Therefore, in this section, we focus our attention on kk-out-cores, which are more “cohesive” network clusters compared to connected components.Recall from Section 2.1 that a kk-out-core is a highly interconnected set of nodes with out-degrees of at least kkwithin this set. Therefore, in the context of the causal market graph, this structure represents a group of stocks, where each stock has causal relationships with at least kkother stocks within the group. To find out how large the number kkcan be, we compute the graph degeneracy for each time period, as described in Section 2.1. 
Table 4 presents the degeneracy ($\delta_{\mathrm{out}}^{\ast}(G)$), the $k$-out-core size ($|\mathcal{C}|$) for $k = \delta_{\mathrm{out}}^{\ast}(G)$, and the proportion of the $k$-out-core size to the number of nodes ($|\mathcal{C}|/|N|$) in the causal market graph for all considered periods.

Table 4. $k$-out-cores in causal market graphs for 2001–2020

Year   Degeneracy   $k$-out-core size   Proportion (%)
2001   6            519                 24.87
2002   7            656                 29.12
2003   2            1,754               74.57
2004   5            172                 6.93
2005   3            1,735               65.32
2006   5            48                  1.69
2007   17           952                 30.87
2008   92           923                 27.00
2009   9            493                 13.86
2010   9            402                 10.86
2011   17           365                 9.25
2012   7            164                 3.97
2013   9            394                 8.93
2014   9            1,415               30.13
2015   29           173                 3.42
2016   11           2,713               50.21
2017   7            3,774               65.98
2018   17           2,475               40.26
2019   19           524                 7.82
2020   256          1,797               24.82

Taking a closer look at the $k$-out-core found in the 2008 network snapshot, one can see that 923 stocks form a connected cohesive structure, in which every stock has an out-degree of at least 92. This is a rather interesting observation, taking into account that Granger causality links were constructed using a very conservative threshold value. An intuitive explanation of this fact is that the crisis of 2007–2008 impacted a large portion of the market, which in turn substantially increased the number of statistically significant causal relationships between stocks. An even “denser” $k$-out-core (with the out-degree of each node at least 256) was found in the 2020 network, which was affected by the COVID-19 pandemic. Overall, the $k$-out-core decomposition approach confirms the observations reported earlier; moreover, it allows one to observe “amplified” trends corresponding to significant events affecting the market.

4 Identifying influential market sectors using PageRank

While a stock’s out-degree appears to be a reasonable quantitative measure of the stock’s importance, it treats all links as equal and does not take into account the difference in importance of out-neighbors.
The PageRank method, which was proposed in [7] for ranking webpages in Google’s search engine, is a simple yet very effective technique that overcomes this drawback. It can be applied to rank nodes in a directed network according to their importance or “centrality” expressed by a certain score. In [10], the PageRank method is described as a “democracy,” with links interpreted as votes in favor of the webpages they are directed to. Each webpage can vote for other webpages, and its score is divided evenly over the set of webpages it is voting for. In the context of a causal market graph, webpages are replaced with stocks, and hyperlinks with causality relations. In addition, we reverse the directions of arcs in the causal market graph to reflect the idea that stock $i$ causing stock $j$ corresponds to stock $j$ “voting” for stock $i$. We call the resulting network a reverse causal graph and denote it by $G_r = (N, A_r)$. Then, a stock’s score can be viewed as a weight $w_i$ assigned to the stock $i$, which is uniformly distributed among its out-neighbors in the reverse causal graph, and is computed as the sum of the corresponding proportional weights of in-neighbors, i.e.,

$$w_i = \sum_{j:\,(j,i)\in A_r} \frac{w_j}{\deg_{G_r}^{\mathrm{out}}(j)}, \quad i = 1,\ldots,|N|, \tag{2}$$

or, in the matrix form, $w = Bw$, where $B = [b_{ij}]_{i,j=1}^{|N|}$ is given by

$$b_{ij} = \begin{cases} \dfrac{1}{\deg_{G_r}^{\mathrm{out}}(j)}, & \text{if } (j,i)\in A_r; \\ 0, & \text{otherwise.} \end{cases} \tag{3}$$

Hence, the problem of finding the scores reduces to computing the eigenvector of the column-stochastic matrix $B$ that corresponds to the eigenvalue equal to 1.
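The fixed point $w = Bw$ of equations (2) and (3) can be approximated by power iteration. The following is a minimal sketch, assuming that every stock has at least one incoming causal arc (so that $B$ is column-stochastic) and that the reverse causal graph is strongly connected and aperiodic (so that the iteration converges). The function name `pagerank_scores`, the stopping rule, and the toy arcs are hypothetical and are not taken from the article.

```python
def pagerank_scores(nodes, causal_arcs, iters=1000, tol=1e-12):
    """Approximate the solution of w = B w on the reverse causal graph."""
    nodes = list(nodes)
    # Reverse each causal arc (i, j): stock j "votes" for stock i.
    out_r = {u: [] for u in nodes}
    for i, j in causal_arcs:
        out_r[j].append(i)
    if any(not targets for targets in out_r.values()):
        raise ValueError("every stock needs out-degree > 0 in the reverse graph")

    w = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(iters):
        new = {u: 0.0 for u in nodes}
        for j, targets in out_r.items():
            # The score of j is split evenly among its out-neighbors, as in (2).
            share = w[j] / len(targets)
            for i in targets:
                new[i] += share
        if max(abs(new[u] - w[u]) for u in nodes) < tol:
            return new
        w = new
    return w


# Hypothetical causal arcs (i, j), read as "i Granger-causes j".
toy_causal = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
scores = pagerank_scores(["A", "B", "C"], toy_causal)
print(sorted(scores, key=scores.get, reverse=True))
```

Because $B$ is column-stochastic, each iteration preserves the sum of the scores, so starting from a uniform vector keeps them normalized to 1 without an explicit rescaling step.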
As soon as the scores are computed, we can rank the stocks by ordering the scores from highest to lowest. To overcome technical shortcomings arising when the network has nodes of out-degree 0 or is not connected, the original PageRank method is based on solving the system $w = (dB + (1-d)S)w$ instead of $w = Bw$, where $d = 0.85$ and $S$ is an $|N| \times |N|$ matrix with all entries equal to $1/|N|$. More details on the PageRank method and related literature are provided in [10].

In our experiments, we use PageRank to identify market sectors and industries within a given sector that are most important with respect to aggregated causal relationships. To rank the market sectors over a certain time period, we apply PageRank to the newly introduced causal market sector graph $G_s = (N_s, A_s)$ that is obtained from the reverse causal graph $G_r = (N, A_r)$ by merging each subset of nodes $I_r$ representing stocks from the same market sector into a single node $i_s$ (referred to as a sector node). In addition, for any two sector nodes $i_s$ and $j_s$ in $N_s$, we assign a weight $l(i_s, j_s)$ to the arc between them as follows:

$$l(i_s, j_s) = \sum_{i\in I_r,\, j\in J_r} 1_{A_r}((i,j)),$$

where $I_r$ and $J_r$ are the subsets of nodes in $N$ that were merged to define $i_s$ and $j_s$, respectively, and $1_{A_r}((i,j))$ is the indicator function for $A_r$, which yields 1 if $(i,j)\in A_r$ and 0 otherwise.
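The aggregation into sector nodes and the damped iteration described above can be sketched as follows. This is an illustration under stated assumptions, not the article's implementation: arcs between stocks of the same sector are skipped, a sector with no outgoing weight spreads its score evenly over all sectors (a common convention), and each arc weight is normalized by the total outgoing weight of its source sector as the weighted analogue of dividing by out-degree. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def sector_weights(reverse_arcs, sector_of):
    """l(p, q): number of reverse-graph arcs from stocks in p to stocks in q."""
    l = defaultdict(float)
    for i, j in reverse_arcs:
        p, q = sector_of[i], sector_of[j]
        if p != q:  # assumption: intra-sector arcs are ignored
            l[(p, q)] += 1.0
    return dict(l)


def weighted_pagerank(sectors, l, d=0.85, iters=1000, tol=1e-12):
    """Iterate w <- (d * B + (1 - d) * S) w on the weighted sector graph."""
    sectors = list(sectors)
    n = len(sectors)
    total_out = defaultdict(float)
    for (q, _), weight in l.items():
        total_out[q] += weight

    w = {s: 1.0 / n for s in sectors}
    for _ in range(iters):
        # Uniform term (1 - d) * S * w; scores sum to 1, so each gets (1 - d) / n.
        new = {s: (1.0 - d) / n for s in sectors}
        for (q, p), weight in l.items():
            new[p] += d * w[q] * weight / total_out[q]
        # Assumption: sectors without outgoing weight spread their score evenly.
        dangling = sum(w[s] for s in sectors if total_out[s] == 0.0)
        for s in sectors:
            new[s] += d * dangling / n
        if max(abs(new[s] - w[s]) for s in sectors) < tol:
            return new
        w = new
    return w


# Hypothetical stocks, sectors, and reverse-graph arcs.
sector_of = {"a1": "Fin", "a2": "Fin", "b1": "Tech", "c1": "Funds"}
reverse_arcs = [("a1", "c1"), ("a2", "c1"), ("b1", "c1"),
                ("c1", "a1"), ("a1", "a2"), ("b1", "a1")]
l = sector_weights(reverse_arcs, sector_of)
print(weighted_pagerank(["Fin", "Tech", "Funds"], l))
```

With $d = 0.85$, each iteration contracts the error by a factor of at most 0.85, so the loop terminates well within the default iteration limit even when the sector graph is not strongly connected.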
To apply the PageRank method to the edge-weighted graph $G_s$, we need to solve the system

$$w^s = (dB^s + (1-d)S^s)\,w^s, \tag{4}$$

where $B^s = [b_{pq}^s]_{p,q=1}^{|N_s|}$ is given by

$$b_{pq}^s = \begin{cases} \dfrac{l(q,p)}{\sum_{p':\,(q,p')\in A_s} l(q,p')}, & \text{if } (q,p)\in A_s; \\ 0, & \text{otherwise,} \end{cases} \tag{5}$$

$d = 0.85$, and $S^s$ is an $|N_s| \times |N_s|$ matrix with all entries equal to $1/|N_s|$.

Figure 5 shows the breakdown of the most influential market sectors for each time period according to their PageRank scores. One can observe that Funds, Trusts, and Tracking Stocks is the top-ranked sector in all time periods except 2002, when the Financial Services sector had the same PageRank score. The fact that Funds, Trusts, and Tracking Stocks is the most influential market sector is not surprising, since many stocks in this sector are by definition reflective of the behavior of the entire market. The fact that Financial Services is the second-most influential sector in most of the considered time periods is also somewhat expected; however, it is interesting to observe that the PageRank scores of the Financial Services, Industrial, and Technology sectors have decreased in the most recent years. Although the PageRank-based approach has limitations since it takes into account only the respective network topology, these observations may be worth investigating further from more traditional economics- and finance-based perspectives.

Figure 5. Breakdown of the most influential market sectors for each time period based on the PageRank method.

5 Conclusion

In this article, we constructed a network-based map of causal relationships in the entire U.S.
stock market. The considered network-based model of the stock market is based on publicly available stock price data and a quantitative causality measure, which makes the model easily interpretable and reproducible. The proposed approach enables one to apply the rich arsenal of network analysis tools toward revealing market trends and investigating the properties of individual nodes and market clusters that may not be apparent otherwise. We focused on studying the basic structural properties of the causal market graph and detecting its most influential entities. The considered network-based metrics are nonmonotonic over time, and their significant changes appear to coincide with global-scale events, such as the COVID-19 pandemic and the 2008 financial crisis. In addition, the proposed PageRank-based technique for identifying “influential” market sectors revealed interesting observations that may be worth investigating further.

A potential direction of further research would be to analyze networks constructed using other connectedness computation methods, such as those in [17,27]. It would also be of interest to consider heteroscedasticity in Granger causality and see its effect on the resulting networks. Future research may also investigate the possibility of constructing a market index based solely on Granger causality metrics. The presence of a power-law degree distribution in many of the networks implies that a relatively small number of stocks have strong causal links to a large remaining portion of the market. Further, this observation suggests that the set of stocks comprising the $k$-out-cores can potentially be used to create a conceptually new network-based market index.

A limitation of this study, which may be addressed in future research, is the problem of multiple comparisons.
In order to construct the edges, we perform pairwise Granger causality tests between each pair of nodes. For each pairwise comparison, the employed statistical test may incorrectly reject the null hypothesis, and thus add a wrong edge, with a 0.1% chance. Even though the probability of adding a “wrong” edge is low, the networks analyzed in this article contain thousands of nodes, and assuming independent tests, these networks may contain a few “wrong” edges. Although these potential effects cannot be completely ruled out, the networks presented in the article still exhibit interesting properties, such as the presence of power-law degree distributions, patterns of arc density changes corresponding to financial crises, and other observations, which are unlikely to appear solely due to statistical anomalies.

The considered approaches can potentially be applied in a wider variety of settings. One interesting future research direction would be to consider networks of causal relationships that span stock markets of multiple countries. Another potential area of interest would be applying these techniques to shorter time periods, possibly with smaller time increments between data points (e.g., one could consider hourly or minute-by-minute stock price data over a time period of several days or weeks). Moreover, although this article focused mainly on a descriptive rather than predictive/prescriptive analysis of stock market data, it would be interesting to see if the considered network-based approaches (perhaps with some modifications) could be used in the context of predictive models of market trends.

Journal

Dependence Modeling, de Gruyter

Published: Jan 1, 2022

Keywords: network analysis; graph theory; causal market graph; Granger causality; k-core; PageRank; 91G45; 90B10
