Analyzing voter behavior on social media during the 2020 US presidential election campaign

* Fabrizio Marozzo (fmarozzo@dimes.unical.it), Loris Belcastro (lbelcastro@dimes.unical.it), Francesco Branda (fbranda@dimes.unical.it), Riccardo Cantini (rcantini@dimes.unical.it), Domenico Talia (talia@dimes.unical.it), Paolo Trunfio (trunfio@dimes.unical.it)
DIMES Department, University of Calabria, Rende, Italy

Social Network Analysis and Mining (2022) 12:83

Every day millions of people use social media platforms, generating a very large amount of opinion-rich data that can be exploited to extract valuable information about human dynamics and behaviors. In this context, the present manuscript provides a precise view of the 2020 US presidential election by jointly applying topic discovery, opinion mining, and emotion analysis techniques on social media data. In particular, we exploited a clustering-based technique for extracting the main discussion topics and monitoring their weekly impact on social media conversation. Afterward, we leveraged a neural-based opinion mining technique for determining the political orientation of social media users by analyzing the posts they published. In this way, we were able to determine, in the weeks preceding the Election Day, which candidate or party public opinion is most in favor of. We also investigated the temporal dynamics of the online discussions, by studying how users' publishing behavior is related to their political alignment. Finally, we combined sentiment analysis and text mining techniques to discover the relationship between the user polarity and the sentiment expressed referring to the different candidates, thus modeling political support of social media users from an emotional viewpoint.

Keywords: Social media analysis · Opinion mining · User polarization · Sentiment analysis · Political events

1 Introduction

In recent years, the growing use of social media has been generating an amount of information-rich data never seen before. This data, commonly referred to as Big Social Data, can be effectively leveraged by a wide range of techniques aimed at modeling the interactions of users on social media, their collective sentiment and behavior, the dynamics of public opinion, and the patterns of information production (Pang and Lee 2008; Cesario et al. 2016; Marozzo and Bessi 2018; Cantini et al. 2022). The knowledge extracted through such techniques makes it possible to outline a precise profile of social users, by describing them from a behavioral and psychological viewpoint, and by modeling their perception of events and public decisions.

This manuscript presents an in-depth analysis of the posts published on Twitter during the 2020 US election campaign, aiming at outlining an accurate view of the political event from different points of view. Specifically, several techniques of topic discovery, opinion mining, and emotion analysis were combined in a unified data analysis workflow for investigating: (i) trending topics and their evolution over time, (ii) users' political alignment and publishing behavior, and (iii) users' sentiment and emotional aspects.

Firstly, we extracted the main discussion topics characterizing the 2020 US election campaign by leveraging the unsupervised approach proposed in Cantini et al. (2021), which relies on the density-based clustering of the latent representation of trending hashtags. Afterward, in order to achieve a more accurate representation of social media conversation, we studied the weekly evolution of the detected topics, which is useful to understand how online discussion evolves over time.

Secondly, we modeled the political alignment of social media users, in order to understand which candidate or party public opinion is most in favor of in the weeks preceding the Election Day. For this purpose, we exploited IOM-NN (Iterative Opinion Mining using Neural Networks), a neural-based opinion mining methodology we previously proposed in Belcastro et al. (2020). Specifically, a real-time analysis was carried out during the 2020 US presidential election campaign using data gathered from Twitter, correctly determining Joe Biden's lead over Donald Trump before the Election Day. The achieved results, publicly available through our university web portal (https://scalab.dimes.unical.it/usa-2020/, text in Italian), represent a remarkable step forward with respect to previous works present in the literature. In fact, to the best of our knowledge, experimental evaluations in previous works are carried out after the end of the considered event, while in our case we have given a proof of the real-time effectiveness of IOM-NN, which leads to the possibility of using it for enhancing or even replacing traditional opinion polls. Furthermore, for the sake of completeness, we extended the results of the real-time analysis by focusing on the main swing states, i.e., those states characterized by a high uncertainty about the winning candidate and, for this reason, by a marked strategic importance. We assessed the statistical significance of the collected data by studying the age, gender and geographical distribution of Twitter users for understanding whether they can be considered voters of the political event. The obtained results confirm the effectiveness of our approach, which outperformed the average of the latest opinion polls by correctly identifying the leading candidate before the Election Day in 10 out of 11 swing states. Furthermore, the polarization information achieved by IOM-NN was also leveraged to investigate the temporal dynamics of social media conversation, with the aim of studying how users' publishing behavior is related to their political alignment, and how it reflected the occurrence of external events like debates or rallies.

Thirdly, we analyzed the relationship between the emotional sphere of Twitter users and their political alignment. In particular, we jointly exploited sentiment analysis and text mining techniques for extracting the sentiment of social media users. Then, we combined this information with the polarization achieved by IOM-NN for investigating how a user refers to the candidates while supporting his/her preferred faction, with respect to a broad spectrum of emotions. This step is useful for understanding how the supporters of a particular candidate express their preference on social media. Specifically, they can praise, as in the case of pro-Trump users, their favorite candidate with positive content that shows emotions like joy and confidence. Alternatively, they may be more likely to discredit the opposing candidate, as in the case of pro-Biden users, by producing negative online content characterized by emotions like anger, disgust and sadness.

The rest of the paper is organized as follows. Section 2 reports the most relevant approaches in computational politics and sentiment analysis present in the literature. Section 3 describes the different techniques combined in the proposed analysis workflow. Section 4 describes the experimental evaluation. Finally, Sect. 5 concludes the paper.

2 Related work

Computational politics is a research area that involves a set of techniques aimed at analyzing users' behavior during a political event of interest, both modeling and influencing their perception and opinion about facts, events and public decisions. With the rapid growth of social media usage, microblogging platforms have become a rich source of valuable information, which can be effectively exploited for investigating the patterns of information diffusion, the interactions between users and their opinion about a specific faction or candidate (Belcastro et al. 2020). According to a recent survey (Haq et al. 2020), existing literature on computational politics can be categorized into five classes, as discussed in the following.

Community and user modeling. This class of works focuses on modeling the behavior of social media users from both an individual and collective viewpoint. Many works in this category are related to the analysis of homophily, i.e., the connection of groups of users driven by common interests, which leads to the formation of community structures of like-minded people (Grevet et al. 2014; Bastos et al. 2013; Fraisier et al. 2017). Other works focus on modeling political affiliation of social users, exploiting community information for predicting the results of a political event (Belcastro et al. 2020; Chiu and Hsu 2018; Takikawa and Nagayoshi 2017).

Information flow. These works investigate how information flows within the network. Most of them analyze the spread of misinformation, trying to detect fake news and thus limit its distortion effects on public opinion (Kim et al. 2018; Ciampaglia et al. 2015; Gyongyi et al. 2004). Other works in this category are also aimed at identifying echo chambers, i.e., situations in which the repetition and sharing of information causes the strengthening of an opinion inside a community (Garimella et al. 2018; An et al. 2013; Shu et al. 2019).

Political discourse. Works in this category model online discussion from different points of view, taking into account demographic aspects, community structure and information diffusion patterns. Many works are aimed at extracting the main topics of discussion through topic modeling (Greene and Cross 2015; Trabelsi and Zaïane 2019), or identifying political crises (Keneshloo et al. 2014). Opinion mining techniques can also be exploited for identifying the opinion or mood of social media users about those topics, as users' interactions on social media can affect their political engagement (Hoffmann and Lutz 2017; Azarbonyad et al. 2017; Monti et al. 2013).

Election campaigns. Research contributions in this class are aimed at measuring the engagement of the online audience, enabling large-scale opinion polls and the management of the political campaign. In fact, social media provide an effective platform for engaging users in political discussion, which is often used by politicians during political campaigns (Wulf et al. 2013; Hong and Nadler 2015). Moreover, the analysis of political engagement of social users can accurately forecast the final results of the political event under analysis (Belcastro et al. 2020; Saleiro et al. 2016).

System design. Works in this category propose a full system design of computational politics systems. As an example, Cambre et al. (2017) propose a system design that can help to break the echo chamber effect, moderating the online political discussion, while Dade-Robertson et al. (2012) discuss the relationship between political processes, urban environments and situated technologies.

In this work we use opinion mining and sentiment analysis techniques in order to investigate the polarization of the US social media users toward the different candidates involved in the 2020 US presidential election. Starting from this, we identify the emotional state (mood) of social users and its relation with their political orientation. Finally, we exploit the results of polarization analysis in order to forecast the final results.

There are several works in the literature that rely on text mining and natural language processing algorithms for investigating the opinion of social users and their collective sentiment toward political candidates or parties. Oikonomou and Tjortjis (2018) used TextBlob (https://textblob.readthedocs.io/en/dev/), a Python library for natural language processing, to predict the outcome of the US presidential election in three states of interest (i.e., Florida, Ohio and North Carolina). Wong et al. (2016) exploited SentiStrength (http://sentistrength.wlv.ac.uk/), a lexicon-based sentiment analysis tool, for modeling the political behaviors of users by analyzing tweets and retweets. Alashri et al. (2016) analyzed Facebook posts about the 2016 US presidential election with CoreNLP (Manning et al. 2014; https://stanfordnlp.github.io/CoreNLP/), one of the most popular tools for natural language processing, to examine the dynamics between candidate posts and the comments they received on Facebook, and to calculate a score for each political candidate measuring his/her credibility on a given issue. Singh et al. (2021) carried out a comparison among four machine and deep learning algorithms (i.e., TextBlob, Naive Bayes, SVM, and BERT (Devlin et al. 2018)) for sentiment analysis. The authors used the 2020 US presidential election as a case study, finding that the use of BERT leads to the best results.

All of the aforementioned techniques are characterized by several issues related to the use of social media data for predicting the outcome of political events, namely language barrier, misclassification, data imbalance and reliability (Bilal et al. 2019). Consequently, in order to achieve a precise estimate of the political polarization of the US citizens, we leveraged the IOM-NN technique, specially designed to overcome these issues (Belcastro et al. 2020): (i) it is language-independent, as it uses a hashtag-based bag of words representation; (ii) it avoids misclassifications by using a high threshold on the polarization probability; (iii) it uses randomized class balancing algorithms in order to avoid the learning process being biased toward majority classes; (iv) it requires a preliminary study of users' representativeness, in order to understand whether they can be considered voters in the political event under analysis (see Sect. 4.1.1). Moreover, with respect to state-of-the-art techniques, IOM-NN allows the classification of a much greater number of tweets and users, due to its incremental and iterative nature, which leads to a better quality and robustness of the results.

3 Analysis workflow

In this work we present an in-depth analysis of the posts published on Twitter during the 2020 US election campaign, with the aim of outlining an accurate representation of this political event from different perspectives, including users' publishing behavior, discussion topics, political alignment and its relationships with the emotional sphere.

For this purpose, several techniques were combined in a unified analysis workflow, represented in Fig. 1, composed of the following steps:

• Collection of posts: data are gathered from social media by using a set of keywords related to the considered political event.
• Classification of posts: the collected posts are classified in favor of a faction according to the detected political support.
• Polarization of users: the classified posts are analyzed for determining the polarization of users toward a faction.
• Topic discovery: the collected posts are analyzed in order to identify the politically related discussion topics underlying the conversation on social media, modeling their evolution over time.
• Temporal analysis: the temporal dynamics of social media conversation are analyzed and combined with the polarity information of classified posts in order to study users' publishing behavior in relation to their political alignment.
• Emotion analysis: the polarized posts are exploited for investigating the relationship between the political orientation of users and the different emotions they expressed in referring to the different candidates.

Fig. 1 A graphic representation of our analysis workflow

Among the aforementioned steps, the first three jointly constitute the IOM-NN methodology (Belcastro et al. 2020), while the fourth follows the approach to topic detection in social data proposed in Cantini et al. (2021). IOM-NN is an opinion mining technique aimed at discovering the political polarization of social media users during election campaigns characterized by the competition of political factions. The methodology relies on an iterative and incremental procedure based on feed-forward neural networks, aimed at discovering the political polarization of social media users by analyzing the posts they publish. An open-source implementation of IOM-NN is available on Github (https://github.com/SCAlabUnical/IOM-NN).

In the following sections we provide a detailed description of the proposed analysis workflow. Moreover, to facilitate understanding of the different steps, we will show practical examples by examining a small subset of the collected data.

3.1 Collection of posts

The goal of this step is to collect a set P of social media posts from different sources (e.g., Twitter), related to the political event E under analysis. As a first step, the different factions, parties or candidates involved in the political event are identified, defined as the set $F = \{f_1, f_2, \ldots, f_n\}$. In particular, in the case of the 2020 US presidential election, we focused on the two main candidates Joe Biden and Donald Trump. Afterward, geotagged posts are gathered by using a set of keywords K that is partitioned as follows:

• neutral keywords ($K_{context}$), which contains generic keywords that can be associated to E without referring to any specific faction in F (e.g., #vote, #election2020);
• faction keywords ($K^{\oplus}_{F} = K^{\oplus}_{f_1}, \ldots, K^{\oplus}_{f_n}$), which contains the keywords used for supporting each faction (e.g., #votebiden, #maga).

The keywords selection process requires a small amount of domain knowledge, as these keywords can be manually selected among the trending hashtags that people commonly use to refer to E on social media. Moreover, the keyword selection process can be automatized by searching for specific patterns, like "#vote + candidate", often used for labeling politically polarized posts. We assessed the statistical significance of the collected posts by studying the age, gender and geographical distribution of Twitter users for understanding whether they can be considered voters of the political event. For this purpose we used a wide range of information which can be directly extracted from users' metadata (e.g., location and language), or examined starting from statistical reports about the usage of the social media platform in a given country (e.g., user distribution by age and gender). Furthermore, in order to improve the representativeness of the collected posts, user accounts are analyzed, filtering out those that show anomalous publishing activity, such as social bots or news sites, or those that have inconsistent information in their profile, such as a location that is not defined or does not belong to any of the states considered (see Sect. 4.1.1).

Collected posts undergo the following preprocessing operations: (i) the text of each post is converted to lowercase and accented characters are normalized; (ii) words are lemmatized and stemmed (e.g., vote or votes or voted → vot); (iii) stopwords are removed; and (iv) bigrams are identified (e.g., San Francisco → San_Francisco). Figure 2 shows an example of how posts are collected using keywords about the 2020 US presidential election. Some of these keywords are generic (e.g., #vpdebate2020), and others are used to support a specific candidate (e.g., #voteblue for Biden and #trump2020 for Trump).

Fig. 2 Example of how the collection of posts step works

3.2 Classification of posts and temporal analysis

During this step, the posts P collected in the previous step are classified in favor of a faction by using IOM-NN. Specifically, a preliminary iteration is performed for classifying input posts according to the faction keywords in $K^{\oplus}_{F}$. Posts containing keywords related to exactly one faction are polarized toward that faction, while remaining posts are labeled as neutral. Then, neutral posts undergo an iterative classification process, during which the model exploits the knowledge acquired at the previous iterations. It is worth noticing that, due to the incremental nature of the annotation process, IOM-NN is not tied to a specific set of initial faction keywords and does not require an in-depth knowledge of the political event under consideration. In fact, even starting from a small but representative set of faction keywords, IOM-NN is able to infer new classification rules iteratively, which implies a good robustness and generalizability of the methodology.

Figure 3 shows a classification example of a small set of tweets about the 2020 US presidential election, which exploits the following faction keywords:

• $K^{\oplus}_{Biden}$ = {#voteblue, #backtheblue, #votebiden, ...};
• $K^{\oplus}_{Trump}$ = {#votered, #trump2020, #maga, ...}.

At iteration 0, IOM-NN uses the keywords in $K^{\oplus}_{F}$ for classifying five tweets. In the subsequent iterations, the neural model iteratively exploits the tweets classified in the previous steps for generating new classification rules based on co-hashtag relationships. As an example, at iteration 1 the model is trained with the tweets classified at iteration 0, discovering new political-oriented topics of discussion and generating the following classification rules:

• tweets with keywords #bountygate are classified in favor of Biden since Donald Trump was accused of paying Moscow's secret agents for the killing of the US servicemen in Afghanistan;
• tweets with keywords #crookedbiden are classified in favor of Trump since Hunter Biden (i.e., second son of US President Joe Biden) was accused by Donald Trump of wrongdoing in regard to China and Ukraine.

This learning process iterates until the algorithm is no longer able to generate new classification rules and therefore to identify the polarization of new tweets.

Fig. 3 Example of how the classification of posts step works

After having classified the posts according to the polarity discovered by IOM-NN, this information is used for investigating how the users' publishing behavior is related to their political alignment. Specifically, temporal dynamics of social media conversation are analyzed, studying how information is produced by the supporters of both candidates, and how this reflects the occurrence of external events such as debates and rallies.

3.3 Polarization of users

This step is aimed at analyzing the set of previously classified posts in order to determine the polarization of users toward a faction. Specifically, the list of classified posts for each user u is computed, filtering out those users that published a number of posts below a given threshold. Afterward, a score vector v for each user u is computed, which contains his/her score for each faction. Finally, IOM-NN calculates the overall faction score as the normalized sum of the score vectors. Figure 4 shows how the polarization of users step works on the classified posts reported in Fig. 3. For each user, the posts in favor of Biden and Trump are counted, discarding those users who have published fewer than two tweets. Then, the polarization vector for each user is computed, containing the percentage of posts published in favor of his/her preferred faction. Lastly, the final score vector is determined, which contains the overall polarization percentages for the two candidates.

Fig. 4 Example of how the polarization of users step works

3.4 Emotion analysis

This step analyzes the polarized posts for identifying the users' sentiment underlying the online discussion about the presidential candidates. Specifically, we combined the information about the political alignment of social media users with their sentiment and emotional expressions.

Firstly, in order to extract the sentiment from online published content, we exploited SentiStrength (Thelwall 2017) for the annotation of social media posts. In particular, for each polarized post we computed a positive $Sc^{+}(p)$ and a negative $Sc^{-}(p)$ sentiment score, both ranging between 1 (neutral) and 5 (strongly positive/negative). Then, the overall sentiment score $Sc(p)$ of a polarized post is obtained as follows: $Sc(p) = Sc^{+}(p) - Sc^{-}(p)$. Secondly, we modeled the political orientation of social media users from an emotional point of view by exploiting NRC-EmoLex (Mohammad and Turney 2013), a publicly available emotion lexicon which has proven its performance in several sentiment and emotion classification tasks, as described in Kiritchenko et al. (2014), Mohammad (2012), and Nakov et al. (2016). Specifically, NRC contains more than 14 thousand English terms labeled by the expressed polarity (i.e., positive or negative) and the eight basic emotion categories of Plutchik (2001) (i.e., joy, trust, anticipation, sadness, surprise, disgust, fear or anger). Finally, we combined the obtained information with the political alignment discovered by using IOM-NN, in order to extract the overall sentiment and emotions expressed by social media users while talking about the two candidates.

As an example, Figs. 5 and 6 show how the sentiment analysis step works on the polarized tweets obtained at the previous step (i.e., a small subset of the collected tweets) for understanding the emotional state of the users who support the different candidates. As we can see, polarized tweets are quite positive for both candidates, but show different emotional profiles. In particular, Biden's supporters show trust in the new presidential candidate, while Trump's supporters express their joy at having Trump as president.

Fig. 5 Example of pro-Biden tweets

Fig. 6 Example of pro-Trump tweets

3.5 Topic discovery

This step is aimed at identifying the main politically-related discussion topics characterizing the 2020 US election campaign, by following the unsupervised approach used in Cantini et al. (2021). As a first step, a Word2Vec model is trained on the entire corpus of tweets, in order to get the latent representation of hashtags and words in a 150-dimensional vector space. We selected the dimension of the embedding space by conducting several experiments, finding out the smallest size for which a clear clustering structure emerged, i.e., the best trade-off between complexity and representativeness. Subsequently, all hashtags are embedded in that 150-dimensional space, whose dimensionality is then reduced by using the t-distributed stochastic neighbor embedding (t-SNE) technique, initialized through principal component analysis (PCA), to obtain a 2D projection of that space. Moreover, in order to reduce noise, all hashtags with a frequency lower than a given threshold are filtered out. Finally, the OPTICS algorithm is used for extracting a clustering structure based on the topic-based separation of hashtags, induced by the projection of their semantic distribution. We have chosen this clustering algorithm due to its ability to discover clusters with arbitrary shape. In addition, compared to classical density-based algorithms such as DBSCAN, it is able to extract clustering structures at different density levels, which in our work is useful for dealing with micro-topics.

4 The US 2020 Presidential election analysis and experimental results

In this section we provide an accurate description of the results coming from the analysis of the 2020 US presidential campaign, characterized by a strong rivalry between Joe Biden and Donald Trump. In particular, we analyzed election-related tweets with the aim of outlining a precise representation of this political event from different points of view, in terms of users' publishing behavior, sentiment, political alignment and discussion topics. For this purpose we combined several techniques in an analysis workflow, whose steps are accurately described in Sect. 3 and whose results are reported in the following sections.

4.1 Data description

The data used to perform the experimental evaluation comes from a public repository that contains a real-time collection of tweets related to the 2020 US presidential election from December 2019 to June 2021 (Chen et al. 2021). From such repository we considered only the tweets published close to the election event (from September 1 to October 31, 2020), i.e., about 160 million posts, of which 18 million are tweets (11%), 110 million are retweets (69%), and 32 million are replies (20%), posted by about 29 million users. Only 22% of filtered data contain hashtags (e.g., #trump2020, #bidenharris2020), useful to understand the arguments used in favor of the different candidates. In particular, the percentage of tweets published with at least one hashtag related to Trump (i.e., #trump, #trump2020, and #maga) and Biden (i.e., #bidenharris2020, #biden) is about 31% and 11%, respectively. However, 7% of tweets contain at least one negative hashtag about Trump (i.e., #trumpknew, #pedotrump, #trumphascovid, #trumptaxreturns, #bountygate), whereas only 1% of tweets contain a negative hashtag for Biden (i.e., #crookedjoebiden). In order to ensure the representativeness of the collected posts, we analyzed users' account information, filtering out content posted by users that show an anomalous publishing activity or inconsistent profile information. This step avoids the negative effects caused by the presence of content published by news sites and social bots, which can introduce a heavy bias in social media data (Cantini et al. 2022). We further analyzed the publishing behavior of the users in the filtered dataset by determining the Complementary Cumulative Density Function (CCDF) of shared tweets per user. Specifically, given the random variable X representing the number of shared tweets, it is determined by the frequency of users publishing a number of posts greater than x, i.e., the probability $P(X > x)$. The scatter plot in log-scale shown in Fig. 7 reveals a highly skewed distribution, with few active Twitter users posting a huge amount of tweets, and many users posting infrequently or not at all, the so-called social lurkers.

Fig. 7 Complementary Cumulative Density Function (CCDF) of published tweets per user

4.1.1 Statistical significance of collected data

Here we investigate the statistical significance of the collected data in order to assess users' representativeness, i.e., whether they can be considered voters of the political event under analysis.

Firstly, from tweets metadata we extracted aggregate information on the language used by social media users, discovering that most tweets have the lang field set to English (about 90%), whereas the remaining 10% is Undefined or set to other languages like Spanish. Secondly, we compared the number of Twitter users in our dataset, grouped by state, with the number of adult citizens actually living in that state and belonging to the voting-eligible population (VEP; http://www.electproject.org/2020g). Specifically, users were associated with states via Twitter metadata, by analyzing the location field present in each tweet, which indicates the location defined by the user in his/her Twitter account (e.g., Austin, TX). It is worth noting that, from the textual analysis of this field, it is not always easy to extract a meaningful city/state, as many users either left the field blank, or did not provide precise information (e.g., "USA"), or specified fictitious or nonexistent locations (e.g., "the moon" or "NY, Italy"). We measured the strength of the correlation between users and VEP per state, finding a Pearson coefficient $r = 0.97$, significant at $p < 0.01$. The linear relationship that links users and the voting-eligible population can be easily seen in Fig. 8, which depicts an interpolation of the related scatter plot, with a goodness-of-fit $R^2 = 0.93$. Notice that outlier states were not considered in this step in order to achieve meaningful results, by excluding data of different magnitude. In addition, we explored the age and gender distribution of analyzed users, finding out that about 94% of them are adults (at least 18 years old; https://www.statista.com/statistics/265647/share-of-us-internet-users-who-use-twitter-by-age-group/) and almost equally divided by gender (https://www.statista.com/statistics/265643/share-of-us-internet-users-who-use-twitter-by-gender/).

Fig. 8 Linear interpolation: analyzed users versus voting-eligible population grouped by the US states

Among all the available tweets we have selected those published by users located in the 11 main swing states (i.e., Arizona, Florida, Georgia, Michigan, Minnesota, Nevada, New Hampshire, North Carolina, Pennsylvania, Texas, Wisconsin). We analyzed only these states as they are characterized by a marked political uncertainty and their outcomes have a high probability of being a decisive factor of the electoral event. We made this data, used in all the subsequent analysis steps, publicly available on Github (https://github.com/SCAlabUnical/USA2020).

Table 1 Number of Twitter users versus voting-eligible population (VEP) grouped by swing states
State | #Users | #VEP
Arizona | 5,692 | 5,189,000
Florida | 16,921 | 15,551,739
Georgia | 5,841 | 7,383,562
Michigan | 8,411 | 7,550,147
Minnesota | 4,596 | 4,118,462
Nevada | 1,156 | 2,153,915
New Hampshire | 1,610 | 1,079,434
North Carolina | 7,245 | 7,759,051
Pennsylvania | 7,040 | 9,781,976
Texas | 19,119 | 18,784,280
Wisconsin | 3,898 | 4,368,530

Table 1 reports a comparison between the users we were able to capture for each swing state and the VEP. The high correlation between the number of analyzed users per state and the VEP leads to a significant set of social media data effectively exploitable to determine the polarization of public opinion. However, despite the representativeness of the considered posts, the results achieved by the analysis of the online conversation can be influenced by platform biases. Specifically, there exist usage biases due to the distribution of users of a social media platform in terms of gender, age, culture and social status, as well as technical biases related to platform policies about data availability and restrictions imposed in some areas of the world.

4.2 Trending topics of the election campaign

In this step we identified the main politically related discussion topics characterizing the 2020 US election campaign. Achieved results are shown in Fig. 9, where six clusters are clearly visible, each one related to a different topic of discussion. Moreover, Table 2 summarizes the discovered topics by reporting the corresponding top hashtags.

Fig. 9 Unsupervised detection of the main topics underlying the online discussion

Table 2 Brief description of the identified topics
Cluster ID | Topic | Top hashtags
#1 | Bad management of Covid-19 emergency | #trumpknew, #trumpvirus, #covid, #trumpisaloser, #trumpisanationaldisgrace, #trumpliedpeopledied
#2 | Town hall meetings; sub-topics: climate crisis, veterans, discrimination | #cnntownhall, #climatecrisis, #greennewdeal, #respectveterans, #hererightmatters, #stoptrumpsterror
#3 | Encouraging people to vote | #election2020, #voteearly, #vote2020, #votebymail, #voteready, #electionday
#4 | Accusations against Hunter Biden | #hunterbiden, #bidencrimefamily, #burisma, #ukraine, #hunterbidenemails, #china
#5 | The US Supreme Court; nomination of Amy Coney Barrett | #scotus, #amyconeybarrett, #filltheseat, #supremecourt, #riprbg, #scotushearings
#6 | Support for Trump | #maga, #votetrump2020, #maga2020, #kag, #voteredtosaveamerica2020, #trumppence2020

The first topic is focused on the criticisms leveled at Trump regarding the management of the health emergency in the USA caused by the Covid-19 pandemic.

[...] pandemic. In addition, the discussion about the US Supreme Court showed a slight increase close to the nomination, [...]
The second one announced by Donald Trump, of Judge Amy Coney Bar- is related to the online discussion about town hall meetings, rett as Associate Justice of the US Supreme Court to fill covering different topics against Trump like discrimina- the vacancy left by the death of Ruth Bader Ginsburg. In tion, veterans and climate crisis (e.g., he referred to climate the following weeks, the focus of the online conversation change as a “hoax”, and to veterans as “human scum”). The shifted to various topics related to the approach of the Elec- third one is a general topic about the presidential election. tion Day and the importance of voting. We also observed an The fourth topic is related to the accusations of corruption increase in the volume of tweets concerning the accusations and wrongdoing in regards to China and Ukraine leveled leveled against Joe Biden’s son (i.e., Hunter Biden), a topic against Hunter Biden, i.e., the son of the democratic can- discussed mostly by the Democratic candidate’s detractors. didate Joe Biden. The fifth topic focuses on the nomination Finally, other topics regarding the support voters expressed of the conservative Amy Coney Barrett for a seat on the toward Trump and their criticisms leveled against him linked Supreme Court as successor to the liberal Associate Justice to town hall meetings showed an almost constant impact on Ruth Bader Ginsburg. Finally, the last topic is related to the the online conversation. online discussion of Trump’s supporters, characterized by notorious hashtags like #maga or #kag.4.3 Temporal analysis Once the major discussion topics were detected, we analyzed their overall impact on the online conversation, In this step we investigated the temporal dynamics of social along with their evolution in the eight weeks included in media conversation, in order to analyze users’ publishing our observation period, as shown in Fig. 10. 
In particular, behavior, studying how it is related to the detected polarity we calculated the volume of each hashtag-based topic by and how it reflected the occurrence of external events (e.g., determining the percentage of tweets that contain hashtags debates, rallies, etc.). However, as described by the reposi- belonging to the corresponding cluster. Considering our tory owners in Chen et al. (2021), there may be gaps in the overall observation period, the most relevant topic is about dataset due to several issues. Firstly, the data collection step Covid-19 pandemic and it specifically refers to Trump’s was highly contingent upon the stability of the network and mismanagement of the health emergency. Other topics are hardware. Secondly, Twitter significantly limits the num- related to the presidential election in general or arise from ber of tweets that can be rehydrated. Finally, tweets may no the publishing activity of Trump’s supporters. Also Biden’s longer be available as users have been removed, banned, or supporters signic fi antly contributed to the online discussion, suspended. by leveraging anti-Trump sub-topics that have emerged from Figure 11 shows the timeline of polarized tweets volume several town hall meetings, about discrimination, veterans annotated with the four main political debates occurring dur- and the position of the Republican candidate about the cli- ing the election campaign, i.e., between September 1 and mate crisis. October 31. The first observation period (September 1 to For what concerns the temporal evolution of the detected September 28) exhibits significantly different communica- topics, we found that in the early weeks online conversation tion dynamics prior to the first debate. Interestingly, this focused on the relationship between Trump and Covid-19 1 3 Social Network Analysis and Mining (2022) 12:83 Page 11 of 16 83 Fig. 10 Weekly volume of tweets related to the detected topics from September 1 to October 31, 2020 Fig. 
11 Time series of polarized tweets published from Septem- ber 1 to October 31, 2020 image shows an intense activity spikes of Biden’s support- Vice Presidential debate) and October 13 (before the second ers, as a likely consequence of President Trump’s actions: Presidential debate). September 10: president Trump has attacked Democratic 4.4 Comparative analysis with opinion polls Vice Presidential candidate Kamala Harris. September 15: despite being banned by state authorities In this step we assessed the effectiveness of our approach in from holding rallies, President Trump still decided to determining the polarity of social media users with the aim hold one in Nevada. of understanding which candidate or party public opinion • September 18: president Trump blamed blue states for is most in favor of. A first remarkable result was obtained the high number of the US Covid-19 fatalities. through a real-time analysis, carried out on Twitter data • September 28: during a rally in Pennsylvania, Trump collected during the two weeks before the Election Day. called Biden “a dishonest politician and a puppet in the Specifically, IOM-NN was able to correctly determine Joe hands of the radical left”. Biden’s lead over Donald Trump, especially in Georgia, where a Democratic candidate had not won since 1992 with The second and third observation windows (from Septem- the election of Bill Clinton. This promising result, publicly ber 30 to October 31) show typical weekly cycles of social available through our university web portal, represents a media chatter, with no particular explosion or shock-related spike from external events, except for October 6 (before the https:// scalab. dimes. unical. it/ usa- 2020/ (text in Italian). 
step forward with respect to our previous work, as it gives a clear proof of the real-time effectiveness of IOM-NN, which suggests the possibility of using it to enhance or even replace traditional opinion polls.

Starting from the encouraging real-time results, we extended that analysis by focusing on the main eleven swing states, as described in Sect. 4.1.1. Specifically, we compared the results obtained through IOM-NN with the average values of the latest opinion polls before the election. For each analyzed state, Table 3 reports the real voting percentages, opinion polls, and IOM-NN estimates. The two candidates (i.e., Joe Biden and Donald Trump) are indicated with “B” and “T”, respectively. The winning candidate is written in bold when it is correctly identified.

Table 3 Comparison between voting percentages estimated by IOM-NN and the latest opinion polls
State | Real B | Real T | Polls B | Polls T | IOM-NN B | IOM-NN T
Arizona | 49.4 | 49.1 | 48.0 | 45.8 | 50.2 | 48.3
Florida | 47.9 | 51.2 | 48.7 | 46.0 | 48.0 | 51.1
Georgia | 49.5 | 49.2 | 47.6 | 47.4 | 52.7 | 46.0
Michigan | 50.6 | 47.8 | 49.9 | 44.4 | 55.4 | 43.0
Minnesota | 52.4 | 45.3 | 51.6 | 41.8 | 55.1 | 42.6
Nevada | 50.1 | 47.7 | 49.4 | 44.4 | 49.8 | 48.0
New Hampshire | 52.7 | 45.4 | 53.4 | 42.4 | 50.9 | 47.3
North Carolina | 48.6 | 49.9 | 47.8 | 47.5 | 56.6 | 41.9
Pennsylvania | 50.0 | 48.8 | 49.4 | 45.7 | 55.7 | 43.1
Texas | 46.5 | 52.1 | 47.5 | 48.8 | 46.1 | 52.5
Wisconsin | 49.4 | 48.8 | 52.0 | 42.8 | 56.3 | 41.9

Summary | Real percentages | Opinion polls | IOM-NN
Correctly classified | – | 9/11 | 10/11
Tweets | – | – | 670,451
Users | – | ≈ 11,000 | 57,116
Avg. Acc | – | 0.82 | 0.91

Fig. 12 Comparison between IOM-NN and the latest opinion polls in identifying the winning candidate

The results of the comparison are summarized in Fig. 12, which shows that the estimates achieved by IOM-NN, related to the voting intentions of social media users are more in-line with the actual behaviors of voters with respect to the opinion polls, thus giving a clue to the final result in 10 out of 11 swing states (with an average accuracy of 91%).
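The summary rows of Table 3 can be re-derived from the per-state percentages. The following is a minimal sketch, assuming that "Avg. Acc" is simply the fraction of swing states whose winner is identified correctly; this reading is consistent with 9/11 ≈ 0.82 and 10/11 ≈ 0.91 but is not spelled out in the text. The names `TABLE_3`, `winner` and `correctly_classified` are illustrative and do not come from the authors' code.

```python
# Table 3 percentages as (Biden, Trump) pairs for each swing state.
TABLE_3 = {
    "Arizona":        {"real": (49.4, 49.1), "polls": (48.0, 45.8), "iom_nn": (50.2, 48.3)},
    "Florida":        {"real": (47.9, 51.2), "polls": (48.7, 46.0), "iom_nn": (48.0, 51.1)},
    "Georgia":        {"real": (49.5, 49.2), "polls": (47.6, 47.4), "iom_nn": (52.7, 46.0)},
    "Michigan":       {"real": (50.6, 47.8), "polls": (49.9, 44.4), "iom_nn": (55.4, 43.0)},
    "Minnesota":      {"real": (52.4, 45.3), "polls": (51.6, 41.8), "iom_nn": (55.1, 42.6)},
    "Nevada":         {"real": (50.1, 47.7), "polls": (49.4, 44.4), "iom_nn": (49.8, 48.0)},
    "New Hampshire":  {"real": (52.7, 45.4), "polls": (53.4, 42.4), "iom_nn": (50.9, 47.3)},
    "North Carolina": {"real": (48.6, 49.9), "polls": (47.8, 47.5), "iom_nn": (56.6, 41.9)},
    "Pennsylvania":   {"real": (50.0, 48.8), "polls": (49.4, 45.7), "iom_nn": (55.7, 43.1)},
    "Texas":          {"real": (46.5, 52.1), "polls": (47.5, 48.8), "iom_nn": (46.1, 52.5)},
    "Wisconsin":      {"real": (49.4, 48.8), "polls": (52.0, 42.8), "iom_nn": (56.3, 41.9)},
}


def winner(shares):
    """Return 'B' when Biden's share is larger, otherwise 'T'."""
    biden, trump = shares
    return "B" if biden > trump else "T"


def correctly_classified(source):
    """States where the winner estimated by `source` matches the real winner."""
    return [
        state
        for state, row in TABLE_3.items()
        if winner(row[source]) == winner(row["real"])
    ]


polls_hits = correctly_classified("polls")
iom_hits = correctly_classified("iom_nn")

# Assumed definition: accuracy = correctly identified states / all states.
polls_accuracy = len(polls_hits) / len(TABLE_3)
iom_accuracy = len(iom_hits) / len(TABLE_3)

# States where each source picked the wrong winner (a polarity inversion).
polls_misses = sorted(set(TABLE_3) - set(polls_hits))
iom_misses = sorted(set(TABLE_3) - set(iom_hits))

print(f"Polls:  {len(polls_hits)}/11 correct, accuracy {polls_accuracy:.2f}, missed {polls_misses}")
print(f"IOM-NN: {len(iom_hits)}/11 correct, accuracy {iom_accuracy:.2f}, missed {iom_misses}")
```

Under this assumption the sketch reproduces the counts reported in the table: the opinion polls miss Florida and North Carolina (9/11), while IOM-NN misses only North Carolina (10/11).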
Using this metric we penalize the inversions of polarity which can be a crucial issue while analyzing these kinds of states characterized by a high degree of uncertainty. Notice that, for what concerns North Carolina, neither the estimates achieved by IOM-NN nor the opinion polls were in-line with the actual outcome in this state. This is a common situation as the results achieved by the polls and IOM-NN must be understood as an estimate of the polarization of public opinion in the weeks preceding the Election Day, not always in accordance with the actual behavior of voters. Moreover, a noteworthy advantage of IOM-NN with respect to traditional opinion polls, is the ability to capture the opinion of a larger number of people more quickly and at a lower cost. This makes IOM-NN a valid support to enhance or even replace opinion polls, by providing relevant insights useful to understand the dynamics of the election campaign.

Footnote: https://www.270towin.com/2020-polls-biden-trump/

Fig. 13 Distribution of sentiments and emotions of pro-Trump tweets

Fig. 14 Distribution of sentiments and emotions of pro-Biden tweets

Table 4 A sample of pro-Trump tweets showing different emotions
Tweet | About | Sentiment | Emotion
“First time registered voter excited to vote for@realDonaldTrump #FourMoreYears” | Trump | Positive | Joy
“#JoeBiden Democrats support domestic terrorists and exploit race and gender for political gain.I’m afraid for America #PennsylvaniansForTrump” | Biden | Negative | Fear
“You are a disgrace to politicize the death of these people, but obviously you don’t care. #JoeBiden #BidenHarris” | Biden | Negative | Disgust
“#realDonaldTrump If anyone can do it, you can. Best President ever! Godspeed sir. #AmericaFirst #MAGA2020” | Trump | Positive | Trust

Table 5 A sample of pro-Biden tweets showing different emotions
Tweet | About | Sentiment | Emotion
“We all need to #VoteBiden to make this happen #VOTE” | Biden | Positive | Anticipation
“#realDonaldTrump You are a racist and a loser. #TrumpIsALoser #RacistTrump” | Trump | Negative | Disgust
“Today is a sad day. News reports are talking about 200,000 Americans dead from Covid-19 so far #TrumpKnew #COVID19” | Trump | Negative | Sadness
“I want empathy and decency in the White House. #BidenHarris2020ToSaveAmerica” | Biden | Positive | Trust

4.5 Emotion analysis

The goal of this last step is to model the political orientation of Twitter users from an emotional point of view. To this purpose, we used the SentiStrength tool (as explained earlier in Sect. 3.4), for discovering the existing relationships between user polarity and the sentiment expressed in referring to the two presidential candidates. Then, for each polarized tweet we explored the emotion the tweet conveys. Figures 13 and 14 describe the sentiment and the emotional state of the tweets with the relative intensity of the tweets produced by Trump and Biden supporters, respectively. What appears evident is that, on average, the tweets produced by Trump’s supporters are significantly more positive than those produced by Biden’s supporters, which devote a significant number of negative tweets to their opponent. For what concerns the detected emotions, Trump’s supporters express joy and confidence about Trump, while fear about Biden’s election. Biden’s supporters, instead, show trust and anticipation in having Biden as future president of the USA, with a more marked presence of negative emotions about Trump, like anger, disgust and sadness. Tables 4 and 5 show various examples of tweets including in the analysis, showing how our approach can model social media conversation from an emotional point of view.

5 Conclusions and final remarks

The widespread use of social media can be exploited to extract useful information concerning people’s behaviors and interactions.

In this paper we presented an in-depth analysis of the posts published on Twitter during the 2020 US election campaign, jointly exploiting several techniques for topic discovery, opinion mining and emotion analysis in a unified analysis workflow, with the aim of outlining an accurate representation of this political event from different points of view. In particular, we extracted the main discussion topics following a clustering-based approach, monitoring their weekly impact on social media conversation. Moreover, we leveraged IOM-NN to estimate the polarization of Twitter users regarding the two main candidates Donald Trump and Joe Biden, both in real-time and by focusing on the main US swing states. We also investigated the temporal dynamics of the online discussion, combining it with the polarization information coming from IOM-NN, in order to study how users’ publishing behavior reflected external events, over time, in relation to their political orientation. Finally, we exploited sentiment analysis and text mining techniques to discover the relationship between the user polarization, determined with the aid of IOM-NN, and the sentiment expressed in referring to the different candidates, thus modeling political support of Twitter users from an emotional viewpoint.

Experimental evaluation shows that in the early weeks online conversation focused on the relationship between Trump and Covid-19 pandemic and on the nomination of Judge Amy Coney Barrett as Associate Justice of the US Supreme Court. In the following weeks, instead, the focus of the online conversation shifted to other topics including the accusations leveled to Hunter Biden and the criticism leveled against Trump linked to his position about the climate crisis and veterans. Regarding the political polarization of public
opinion, IOM-NN was able to achieve meaningful estimates of the voting intentions of social media users, which makes it a valid solution to go beyond traditional opinion polls, by providing relevant insights useful to understand the dynamics of the election campaign. One major drawback of this approach lies in different possible platform biases, such as usage biases due to the distribution of users of a social media platform in terms of gender, age, culture and social status, as well as technical biases related to platform policies about data availability and restrictions imposed in some areas of the world. Finally, as for the analysis of the emotional state of social users, we found out that the tweets produced by Trump’s supporters are significantly more positive than those produced by Biden’s supporters. In particular, i) Trump’s supporters express joy and confidence about Trump, while fear about Biden’s election; ii) Biden’s supporters show trust and anticipation in having Biden as future president of the USA, with a more marked presence of negative emotions about Trump, like anger, disgust and sadness.

As future work, we will apply the presented analysis workflow to other scenarios, such as product adoption analysis and reputation evaluation of companies. In fact, it can be easily generalized to different use cases, as it is not tied to any specific application domain, and only relies on the representativeness of the analyzed posts. Moreover, we can integrate other techniques in our workflow, introducing new steps aimed at improving the quality of the achieved results. As an example, a hashtag recommendation model can be used for enriching the information content of the analyzed data, since keyword-based approaches like IOM-NN are strongly dependent on the availability of consistent hashtags in social media posts (Cantini et al. 2021).

Acknowledgements This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955558. The JU receives support from the European Union’s Horizon 2020 research and innovation program and Spain, Germany, France, Italy, Poland, Switzerland, Norway.

Funding Open access funding provided by Università della Calabria within the CRUI-CARE Agreement.

Data availability The data that support the findings of this study are publicly available. In particular, this data was gathered using Twitter APIs (https://developer.twitter.com) and is hosted on Github (https://github.com/SCAlabUnical/USA2020).

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Alashri S, Kandala SS, Bajaj V, Ravi R, Smith KL, Desouza KC (2016) An analysis of sentiments on facebook during the 2016 us presidential election. In: 2016 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM). IEEE, pp 795–802
An J, Quercia D, Crowcroft J (2013) Fragmented social media: a look into selective exposure to political news. In: Proceedings of the 22nd international conference on world wide web, pp 51–52
Azarbonyad H, Dehghani M, Beelen K, Arkut A, Marx M, Kamps J (2017) Words are malleable: Computing semantic shifts in political and media discourse. In: Proceedings of the 2017 ACM on conference on information and knowledge management, pp 1509–1518
Bastos MT, Puschmann C, Travitzki R (2013) Tweeting across hashtags: overlapping users and the importance of language, topics, and politics. In: Proceedings of the 24th ACM conference on hypertext and social media, pp 164–168
Belcastro L, Cantini R, Marozzo F, Talia D, Trunfio P (2020) Learning political polarization on social media using neural networks. IEEE Access 8:47177–47187
Bilal M, Gani A, Marjani M, Malik N (2019) Predicting elections: Social media data and techniques. In: 2019 International conference on engineering and emerging technologies (ICEET), pp 1–6. IEEE
Cambre J, Klemmer SR, Kulkarni C (2017) Escaping the echo chamber: ideologically and geographically diverse discussions about politics. In: Proceedings of the 2017 CHI conference extended abstracts on human factors in computing systems, pp 2423–2428
Cantini R, Marozzo F, Bruno G, Trunfio P (2021) Learning sentence-to-hashtags semantic mapping for hashtag recommendation on microblogs. ACM Trans Knowl Discov Data (TKDD) 16(2):1–26
Cantini R, Marozzo F, Talia D, Trunfio P (2022) Analyzing political polarization on social media by deleting bot spamming. Big Data Cognit Comput 6(1):1. https://doi.org/10.3390/bdcc6010003
Cesario E, Iannazzo AR, Marozzo F, Morello F, Riotta G, Spada A, Talia D, Trunfio P (2016) Analyzing social media data to discover mobility patterns at expo 2015: methodology and results. In: International conference on high performance computing & simulation (HPCS). IEEE, pp 230–237
Chen E, Deb A, Ferrara E (2021) # election2020: the first public twitter dataset on the 2020 us presidential election. J Comput Soc Sci 1–18
Chiu SI, Hsu KW (2018) Predicting political tendency of posts on facebook. In: Proceedings of the 2018 7th international conference on software and computer applications, pp 110–114
Ciampaglia GL, Shiralkar P, Rocha LM, Bollen J, Menczer F, Flammini A (2015) Computational fact checking from knowledge networks. PloS One 10(6):e0128193
Dade-Robertson M, Taylor N, Marshall J, Olivier P (2012) The political sensorium. In: Proceedings of the 4th media architecture Biennale conference: participation, pp 47–50
Devlin J, Chang MW, Lee K, Toutanova K (2018) Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
Fraisier O, Cabanac G, Pitarch Y, Besançon R, Boughanem M (2017) Uncovering like-minded political communities on twitter. In: Proceedings of the ACM SIGIR international conference on theory of information retrieval, pp 261–264
Garimella K, De Francisci Morales G, Gionis A, Mathioudakis M (2018) Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In: Proceedings of the 2018 World Wide Web Conference, pp 913–922
Greene D, Cross JP (2015) Unveiling the political agenda of the european parliament plenary: a topical analysis. In: Proceedings of the ACM web science conference, pp 1–10
Grevet C, Terveen LG, Gilbert E (2014) Managing political differences in social media. In: Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing, pp 1400–1408
Gyongyi Z, Garcia-Molina H, Pedersen J (2004) Combating web spam with trustrank. In: Proceedings of the 30th international conference on very large data bases (VLDB)
Haq EU, Braud T, Kwon YD, Hui P (2020) A survey on computational politics. IEEE Access 8:197379–197406
Hoffmann CP, Lutz C (2017) Spiral of silence 2.0: Political self-censorship among young facebook users. In: Proceedings of the 8th international conference on social media & society, pp 1–12
Hong S, Nadler D (2015) Social media and political voices of organized interest groups: a descriptive analysis. In: Proceedings of the 16th annual international conference on digital government research, pp 210–216
Keneshloo Y, Cadena J, Korkmaz G, Ramakrishnan N (2014) Detecting and forecasting domestic political crises: A graph-based approach. In: Proceedings of the 2014 ACM conference on Web science, pp 192–196
Kim J, Tabibian B, Oh A, Schölkopf B, Gomez-Rodriguez M (2018) Leveraging the crowd to detect and reduce the spread of fake news and misinformation. In: Proceedings of the eleventh ACM international conference on web search and data mining, pp 324–332
Kiritchenko S, Zhu X, Mohammad SM (2014) Sentiment analysis of short informal texts. J Artif Intell Res 50:723–762
Manning CD, Surdeanu M, Bauer J, Finkel JR, Bethard S, McClosky D (2014) The stanford corenlp natural language processing toolkit. In: Proceedings of 52nd annual meeting of the association for computational linguistics: system demonstrations, pp 55–60
Marozzo F, Bessi A (2018) Analyzing polarization of social media users and news sites during political campaigns. Soc Netw Anal Mining 8(1):1–13
Mohammad S (2012) Portable features for classifying emotional text. In: Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp 587–591
Mohammad SM, Turney PD (2013) Crowdsourcing a word-emotion association lexicon. Comput Intell 29(3):436–465
Monti C, Rozza A, Zappella G, Zignani M, Arvidsson A, Colleoni E (2013) Modelling political disaffection from twitter data. In: Proceedings of the second international workshop on issues of sentiment discovery and opinion mining, pp 1–9
Nakov P, Rosenthal S, Kiritchenko S, Mohammad SM, Kozareva Z, Ritter A, Stoyanov V, Zhu X (2016) Developing a successful semeval task in sentiment analysis of twitter and other social media texts. Lang Resour Eval 50(1):35–65
Oikonomou L, Tjortjis C (2018) A method for predicting the winner of the usa presidential elections using data extracted from twitter. In: 2018 South-Eastern European Design Automation, Computer Engineering, Computer Networks and Society Media Conference (SEEDA_CECNSM). IEEE, pp 1–8
Pang B, Lee L (2008) Opinion mining and sentiment analysis. Foundations Trends (r) Inf Retriev 2(1-2):1–135
Plutchik R (2001) The nature of emotions: human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice. Am Sci 89(4):344–350
Saleiro P, Gomes L, Soares C (2016) Sentiment aggregate functions for political opinion polling using microblog streams. In: Proceedings of the Ninth International C* Conference on Computer Science & Software Engineering, pp 44–50
Shu K, Bernard HR, Liu H (2019) Studying fake news via network analysis: detection and mitigation. In: Emerging Research Challenges and Opportunities in Computational Social Network Analysis and Mining. Springer, pp 43–65
Singh A, kumar A, Dua N, Mishra VK, Singh D, Agrawal A (2021) Predicting elections results using social media activity a case study: Usa presidential election 2020. In: 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), vol 1, pp 314–319. https://doi.org/10.1109/ICACCS51430.2021.9441835
Takikawa H, Nagayoshi K (2017) Political polarization in social media: analysis of the “twitter political field” in Japan. In: 2017 IEEE international conference on big data (big data). IEEE, pp 3143–3150
Thelwall M (2017) The heart and soul of the web? sentiment strength detection in the social web with sentistrength. In: Cyberemotions. Springer, pp 119–134
Trabelsi A, Zaïane OR (2019) Phaitv: A phrase author interaction topic viewpoint model for the summarization of reasons expressed by polarized stances. In: Proceedings of the International AAAI Conference on Web and Social Media, vol 13, pp 482–492
Wong FMF, Tan CW, Sen S, Chiang M (2016) Quantifying political leaning from tweets, retweets, and retweeters. IEEE Trans Knowl Data Eng 28(8):2158–2172
Wulf V, Aal K, Abu Kteish I, Atam M, Schubert K, Rohde M, Yerousis GP, Randall D (2013) Fighting against the wall: Social media use by political activists in a palestinian village. In: Proceedings of the SIGCHI conference on human factors in computing systems, pp 1979–1988

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Social Network Analysis and Mining Springer Journals

Analyzing voter behavior on social media during the 2020 US presidential election campaign

Publisher: Springer Journals
Copyright: Copyright © The Author(s) 2022
ISSN: 1869-5450
eISSN: 1869-5469
DOI: 10.1007/s13278-022-00913-9

Abstract

Every day millions of people use social media platforms by generating a very large amount of opinion-rich data, which can be exploited to extract valuable information about human dynamics and behaviors. In this context, the present manuscript provides a precise view of the 2020 US presidential election by jointly applying topic discovery, opinion mining, and emotion analysis techniques on social media data. In particular, we exploited a clustering-based technique for extracting the main discussion topics and monitoring their weekly impact on social media conversation. Afterward, we leveraged a neural-based opinion mining technique for determining the political orientation of social media users by analyzing the posts they published. In this way, we were able to determine in the weeks preceding the Election Day which candidate or party public opinion is most in favor of. We also investigated the temporal dynamics of the online discussions, by studying how users’ publishing behavior is related to their political alignment. Finally, we combined sentiment analysis and text mining techniques to discover the relationship between the user polarity and sentiment expressed referring to the different candidates, thus modeling political support of social media users from an emotional viewpoint.

Keywords Social media analysis · Opinion mining · User polarization · Sentiment analysis · Political events

* Fabrizio Marozzo, fmarozzo@dimes.unical.it
Loris Belcastro, lbelcastro@dimes.unical.it
Francesco Branda, fbranda@dimes.unical.it
Riccardo Cantini, rcantini@dimes.unical.it
Domenico Talia, talia@dimes.unical.it
Paolo Trunfio, trunfio@dimes.unical.it
DIMES Department, University of Calabria, Rende, Italy

1 Introduction

In recent years, the growing use of social media is generating an amount of information-rich data never seen before. This data, commonly referred as Big Social Data, can be effectively leveraged by a wide range of techniques aimed at modeling the interactions of users on social media, their collective sentiment and behavior, the dynamics of public opinion, and the patterns of information production (Pang and Lee 2008; Cesario et al. 2016; Marozzo and Bessi 2018; Cantini et al. 2022). All the knowledge extracted through such techniques allows to outline a precise profile of social users, by describing them from a behavioral and psychological viewpoint, and by modeling their perception of events and public decisions.

This manuscript presents an in-depth analysis of the posts published on Twitter during the 2020 US election campaign, aiming at outlining an accurate view of the political event from different points of view. Specifically, several techniques of topic discovery, opinion mining, and emotion analysis were combined in a unified data analysis workflow for investigating: (i) trending topics and their evolution over time, (ii) users’ political alignment and publishing behavior, and (iii) users’ sentiment and emotional aspects.

Firstly, we extracted the main discussion topics characterizing the 2020 US election campaign by leveraging the unsupervised approach proposed in Cantini et al. (2021), which relies on the density-based clustering of the latent representation of trending hashtags. Afterward, in order to achieve a more accurate representation of social media conversation, we studied the weekly evolution of the detected topics, which is useful to understand how online discussion evolves over time.

Secondly, we modeled the political alignment of social media users, in order to understand which candidate or party public opinion is most in favor of in the weeks preceding the Election Day. For this purpose, we exploited IOM-NN (Iterative Opinion Mining using Neural Networks), a neural-based opinion mining methodology we previously proposed in Belcastro et al. (2020). Specifically, a real-time analysis was carried out during the 2020 US presidential election campaign using data gathered from Twitter, correctly determining Joe Biden’s lead over Donald Trump before the Election Day. The achieved results, publicly available through our university web portal, represent a remarkable step forward with respect to previous works present in the literature. In fact, to the best of our knowledge, experimental evaluations are carried out after the end of the considered event, while in our case we have given a proof of the real-time effectiveness of IOM-NN, which leads to the possibility of using it for enhancing or even replacing traditional opinion polls. Furthermore, for the sake of completeness, we extended the results of the real-time analysis by focusing on the main swing states, i.e., those states characterized by a high uncertainty about the winning candidate and for this reason by a marked strategic importance. We assessed the statistical significance of the collected data by studying the age, gender and geographical distribution of Twitter users for understanding whether they can be considered voters of the political event. The obtained results confirm the great effectiveness of our approach, which outperformed the average of the latest opinion polls by correctly identifying the leading candidate before the Election Day in 10 out of 11 swing states. Furthermore, the polarization information achieved by IOM-NN was also leveraged to investigate the temporal dynamics of social media conversation, with the aim of studying how users’ publishing behavior is related to their political alignment, and how it reflected the occurrence of external events like debates or rallies.

of pro-Trump users, their favorite candidate with positive content that shows emotions like joy and confidence. Alternatively, they may be more likely to discredit the opposing candidate, as in the case of pro-Biden users, by producing negative online content characterized by emotions like anger, disgust and sadness.

The rest of the paper is organized as follows. Section 2 reports the most relevant approaches in computational politics and sentiment analysis present in the literature. Section 3 describes the different techniques combined in the proposed analysis workflow. Section 4 describes the experimental evaluation. Finally, Sect. 5 concludes the paper.

2 Related work

Computational politics is a research area that involves a set of techniques aimed at analyzing users’ behavior during a political event of interest, both modeling and influencing their perception and opinion about facts, events and public decisions. With the rapid growth of social media usage, microblogging platforms have become a rich source of valuable information, which can be effectively exploited for investigating the patterns of information diffusion, the interactions between users and their opinion about a specific faction or candidate (Belcastro et al. 2020). According to a recent survey (Haq et al. 2020), existing literature on computational politics can be categorized into five classes, as discussed in the following.

Community and user modeling This class of works focuses on modeling the behavior of social media users from both an individual and collective viewpoint. Many works in this category are related to the analysis of homophily, i.e., the connection of groups of users driven by common interests, which leads to the formation of community structures of like-minded people (Grevet et al. 2014; Bastos et al. 2013; Fraisier et al. 2017). Other works focus on modeling
political affiliation of social users, exploiting community Thirdly, we analyzed the relationship between the emo- information for predicting the results of a political event tional sphere of Twitter users and their political alignment. (Belcastro et al. 2020; Chiu and Hsu 2018; Takikawa and In particular, we jointly exploited sentiment analysis and Nagayoshi 2017). text mining techniques for extracting the sentiment of social Information flow These works investigate how informa- media users. Then, we combined this information with the tion flows within the network. Most of them analyze the polarization achieved by IOM-NN for investigating how a misinformation spread, trying to detect fake news thus limit- user refers to the candidates while supporting his/her pre- ing its distortion effects on public opinion (Kim et al. 2018; ferred faction, with respect to a broad spectrum of emo- Ciampaglia et al. 2015; Gyongyi et al. 2004). Other works tions. This step is useful for understanding how the sup- in this category are also aimed at identifying echo cham- porters of a particular candidate express their preference on bers, i.e., situations in which the repetition and sharing of social media. Specifically, they can praise, as in the case information causes the strengthening of an opinion inside a community (Garimella et al. 2018; An et al. 2013; Shu et al. 2019). Political discourse Works in this category model online https:// scalab. dimes. unical. it/ usa- 2020/ (text in Italian). discussion from different points of view, taking into account 1 3 Social Network Analysis and Mining (2022) 12:83 Page 3 of 16 83 demographic aspects, community structure and information tool for natural language processing, to examine the dynam- diffusion patterns. 
Many works are aimed at extracting the main topics of discussion through topic modeling (Greene and Cross 2015; Trabelsi and Zaïane 2019), or identifying political crises (Keneshloo et al. 2014). Opinion mining techniques can also be exploited for identifying the opinion or mood of social media users about those topics, as users’ interactions on social media can affect their political engagement (Hoffmann and Lutz 2017; Azarbonyad et al. 2017; Monti et al. 2013).

Election campaigns. Research contributions in this class are aimed at measuring the engagement of the online audience, enabling large-scale opinion polls and the management of the political campaign. In fact, social media provide an effective platform for engaging users in political discussion, which is often used by politicians during the political campaigns (Wulf et al. 2013; Hong and Nadler 2015). Moreover, the analysis of political engagement of social users can accurately forecast the final results of the political event under analysis (Belcastro et al. 2020; Saleiro et al. 2016).

System design. Works in this category propose a full system design of computational politics systems. As an example, Cambre et al. (2017) propose a system design that can help to break the echo chamber effect, moderating the online political discussion, while Dade-Robertson et al. (2012) discuss the relationship between political processes, urban environments and situated technologies.

In this work we use opinion mining and sentiment analysis techniques in order to investigate the polarization of the US social media users toward the different candidates involved in the 2020 US presidential election. Starting from this, we identify the emotional state (mood) of social users and its relation with their political orientation. Finally, we exploit the results of polarization analysis in order to forecast the final results.

There are several works in the literature that rely on text mining and natural language processing algorithms for investigating the opinion of social users and their collective sentiment toward political candidates or parties. Oikonomou and Tjortjis (2018) used TextBlob (https://textblob.readthedocs.io/en/dev/), a Python library for natural language processing, to predict the outcome of the US presidential election in three states of interest (i.e., Florida, Ohio and North Carolina). Wong et al. (2016) exploited SentiStrength (http://sentistrength.wlv.ac.uk/), a lexicon-based sentiment analysis tool, for modeling the political behaviors of users by analyzing tweets and retweets. Alashri et al. (2016) analyzed Facebook posts about the 2016 US presidential election with CoreNLP (Manning et al. 2014; https://stanfordnlp.github.io/CoreNLP/), one of the most popular tools for natural language processing, to examine the dynamics between candidate posts and the comments they received on Facebook and calculate a score for each political candidate for measuring his/her credibility on a given issue. Singh et al. (2021) carried out a comparison among four machine and deep learning algorithms (i.e., TextBlob, Naive Bayes, SVM, and BERT (Devlin et al. 2018)) for sentiment analysis. The authors used the 2020 US presidential election as a case study, finding that the use of BERT leads to the best results.

All of the aforementioned techniques are characterized by several issues related to the use of social media data for predicting the outcome of political events, which are language barrier, misclassification, data imbalance and reliability (Bilal et al. 2019). Consequently, in order to achieve a precise estimate of the political polarization of the US citizens, we leveraged the IOM-NN technique, specially designed to overcome these issues (Belcastro et al. 2020): (i) it is language-independent, as it uses a hashtag-based bag of words representation; (ii) it avoids misclassifications using a high threshold on the polarization probability; (iii) it uses randomized class balancing algorithms in order to avoid the learning process being biased toward majority classes; (iv) it requires a preliminary study of users’ representativeness, in order to understand whether they can be considered voters in the political event under analysis (see Sect. 4.1.1). Moreover, with respect to state-of-the-art techniques, IOM-NN allows the classification of a much greater number of tweets and users, due to its incremental and iterative nature, which leads to a better quality and robustness of the results.

3 Analysis workflow

In this work we present an in-depth analysis of the posts published on Twitter during the 2020 US election campaign, with the aim of outlining an accurate representation of this political event from different perspectives, including users’ publishing behavior, discussion topics, political alignment and its relationships with the emotional sphere.

For this purpose, several techniques were combined in a unified analysis workflow, represented in Fig. 1, composed of the following steps:

• Collection of posts: data are gathered from social media by using a set of keywords related to the considered political event.
• Classification of posts: the collected posts are classified in favor of a faction according to the detected political support.
• Polarization of users: the classified posts are analyzed for determining the polarization of users toward a faction.
• Topic discovery: the collected posts are analyzed in order to identify the politically related discussion topics underlying the conversation on social media, modeling their evolution over time.
• Temporal analysis: the temporal dynamics of social media conversation are analyzed and combined with the polarity information of classified posts in order to study users’ publishing behavior in relation to their political alignment.
• Emotion analysis: the polarized posts are exploited for investigating the relationship between the political orientation of users and the different emotions they expressed in referring to the different candidates.

Fig. 1 A graphic representation of our analysis workflow

Among the aforementioned steps, the first three jointly constitute the IOM-NN methodology (Belcastro et al. 2020), while the fourth follows the approach to topic detection in social data proposed in Cantini et al. (2021). IOM-NN is an opinion mining technique aimed at discovering the political polarization of social media users during election campaigns characterized by the competition of political factions. The methodology relies on an iterative and incremental procedure based on feed-forward neural networks, which infers the polarization of users by analyzing the posts they publish. An open-source implementation of IOM-NN is available on GitHub (https://github.com/SCAlabUnical/IOM-NN).

In the following sections we provide a detailed description of the proposed analysis workflow. Moreover, to facilitate understanding of the different steps, we will show practical examples by examining a small subset of the collected data.

3.1 Collection of posts

The goal of this step is to collect a set P of social media posts from different sources (e.g., Twitter), related to the political event E under analysis. As a first step, the different factions, parties or candidates involved in the political event are identified, defined as the set $F = \{f_1, f_2, \ldots, f_n\}$. In particular, in the case of the 2020 US presidential election, we focused on the two main candidates Joe Biden and Donald Trump. Afterward, geotagged posts are gathered by using a set of keywords K that is partitioned as follows:

• neutral keywords ($K_{context}$), which contains generic keywords that can be associated to E without referring to any specific faction in F (e.g., #vote, #election2020);
• faction keywords ($K^{\oplus}_{F} = K^{\oplus}_{f_1}, \ldots, K^{\oplus}_{f_n}$), which contains the keywords used for supporting each faction (e.g., #votebiden, #maga).

The keywords selection process requires a small amount of domain knowledge, as these keywords can be manually selected among the trending hashtags that people commonly use to refer to E on social media. Moreover, the keyword selection process can be automated by searching for specific patterns, like “#vote + candidate”, often used for labeling politically polarized posts. We assessed the statistical significance of the collected posts by studying the age, gender and geographical distribution of Twitter users for understanding whether they can be considered voters of the political event. For this purpose we used a wide range of information which can be directly extracted from users’ metadata (e.g., location and language), or examined starting from statistical reports about the usage of the social media platform in a given country (e.g., user distribution by age and gender). Furthermore, in order to improve the representativeness of the collected posts, user accounts are analyzed, filtering those that show anomalous publishing activity, such as social bots or news sites, or those that have inconsistent information in their profile, such as a location that is not defined or does not belong to any of the states considered (see Sect. 4.1.1).

Collected posts undergo the following preprocessing operations: (i) the text of each post is converted to lowercase and accented characters are normalized; (ii) words are lemmatized and stemmed (e.g., vote or votes or voted → vot); (iii) stopwords are removed; and (iv) bigrams are identified (e.g., San Francisco → San_Francisco). Figure 2 shows an example of how posts are collected using keywords about the 2020 US presidential election. Some of these keywords are generic (e.g., #vpdebate2020), and others are used to support a specific candidate (e.g., #voteblue for Biden and #trump2020 for Trump).

Fig. 2 Example of how the collection of posts step works

3.2 Classification of posts and temporal analysis

During this step, the posts P collected in the previous step are classified in favor of a faction by using IOM-NN. Specifically, a preliminary iteration is performed for classifying input posts according to the keywords in $K^{\oplus}_{F}$. Posts containing keywords related to exactly one faction are polarized toward that faction, while remaining posts are labeled as neutral. Then, neutral posts undergo an iterative classification process, during which the model exploits the knowledge acquired at the previous iterations. It is worth noticing that, due to the incremental nature of the annotation process, IOM-NN is not tied to a specific set of initial faction keywords and does not require an in-depth knowledge of the political event under consideration. In fact, even starting from a small but representative set of faction keywords, IOM-NN is able to infer new classification rules iteratively, which implies a good robustness and generalizability of the methodology.

Figure 3 shows a classification example of a small set of tweets about the 2020 US presidential election, which exploits the following faction keywords:

• $K^{\oplus}_{Biden}$ = {#voteblue, #backtheblue, #votebiden, ...};
• $K^{\oplus}_{Trump}$ = {#votered, #trump2020, #maga, ...}.

At iteration 0, IOM-NN uses the keywords in $K^{\oplus}_{F}$ for classifying five tweets. In the subsequent iterations, the neural model iteratively exploits the tweets classified in the previous steps for generating new classification rules based on co-hashtag relationships. As an example, at iteration 1 the model is trained with the tweets classified at iteration 0, discovering new political-oriented topics of discussion and generating the following classification rules:

• tweets with keywords #bountygate are classified in favor of Biden, since Donald Trump was accused of not responding to reports that Russia had offered bounties for the killing of US servicemen in Afghanistan;
• tweets with keywords #crookedbiden are classified in favor of Trump, since Hunter Biden (i.e., the second son of Joe Biden) was accused by Donald Trump of wrongdoing in regard to China and Ukraine.

This learning process iterates until the algorithm is no longer able to generate new classification rules and therefore to identify the polarization of new tweets.

After having classified the posts according to the polarity discovered by IOM-NN, this information is used for investigating how the user publishing behavior is related to their political alignment. Specifically, temporal dynamics of social media conversation are analyzed, studying how information is produced by the supporters of both candidates, and how this reflects the occurrence of external events such as debates and rallies.

3.3 Polarization of users

This step is aimed at analyzing the set of previously classified posts in order to determine the polarization of users toward a faction. Specifically, the list of classified posts for each user u is computed, filtering out those users that published a number of posts below a given threshold.
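The keyword-based preliminary iteration of Sect. 3.2 and the per-user filtering described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the IOM-NN implementation: the function names, the whitespace tokenization and the default threshold of two posts are hypothetical, the keyword sets are the example ones listed in the text, and the subsequent neural iterations are not modeled.

```python
from collections import Counter, defaultdict

# Example faction keywords from the text (the real sets are larger).
FACTION_KEYWORDS = {
    "Biden": {"#voteblue", "#backtheblue", "#votebiden"},
    "Trump": {"#votered", "#trump2020", "#maga"},
}


def classify_post(text, faction_keywords=FACTION_KEYWORDS):
    """Iteration-0 rule: return the faction if the post contains keywords
    of exactly one faction, otherwise None (the post stays neutral)."""
    # Naive tokenization (an assumption): split on whitespace and strip
    # trailing punctuation so that "#maga!" still matches "#maga".
    tokens = {tok.strip(".,;:!?") for tok in text.lower().split()}
    matched = [f for f, kws in faction_keywords.items() if tokens & kws]
    return matched[0] if len(matched) == 1 else None


def classified_posts_per_user(posts, min_posts=2):
    """Count classified posts per user and faction, then drop users whose
    number of classified posts is below `min_posts` (assumed threshold).

    `posts` is an iterable of (user_id, text) pairs.
    """
    counts = defaultdict(Counter)
    for user, text in posts:
        faction = classify_post(text)
        if faction is not None:
            counts[user][faction] += 1
    return {u: c for u, c in counts.items() if sum(c.values()) >= min_posts}


if __name__ == "__main__":
    sample = [
        ("u1", "Go #VoteBlue today"),
        ("u1", "I will #votebiden!"),
        ("u2", "#maga"),
        ("u3", "#maga and #voteblue"),  # two factions: stays neutral
    ]
    print(classified_posts_per_user(sample))
```

Posts that match no faction, or more than one, are left unlabeled here; in the methodology these are the neutral posts that the iterative neural classification later tries to polarize.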
Afterward, tion process, during which the model exploits the knowledge a score vector v for each user u is computed, which contains 1 3 83 Page 6 of 16 Social Network Analysis and Mining (2022) 12:83 Fig. 3 Example of how the clas- sification of posts step works Fig. 4 Example of how the polarization of users step works his/her score for each faction. Finally, IOM-NN calculates information about the political alignment of social media the overall faction score as the normalized sum of the score users with their sentiment and emotional expressions. vectors. Figure 4 shows how the polarization of users step Firstly, in order to extract the sentiment from online pub- works on the classified posts reported in Fig.  3. For each lished content, we exploited SentiStrength (Thelwall 2017) user, the posts in favor of Biden and Trump are counted, for the annotation of social media posts. In particular, for discarding those users who have published less than two each polarized post we computed a positive Sc (p) and tweets. Then, the polarization vector for each user is com- negative Sc (p) sentiment score, both ranging between 1 puted containing the percentage of posts published in favor (neutral) and 5 (strongly positive/negative). Then, the over- of his/her preferred faction. Lastly, the final score vector is all sentiment score Sc(p) of a polarized post is obtained as + − determined, that contains the overall polarization percent- follows: Sc(p)= Sc (p)− Sc (p) . Secondly, we modeled ages for the two candidates. the political orientation of social media users from an emo- tional point of view by exploiting NRC-EmoLex (Moham- 3.4 Emotion analysis mad and Turney 2013), a publicly available emotion lexi- con which has proven its performance in several sentiment This step analyzes the polarized posts for identifying the and emotion classification tasks, as described in  Kiritch- users’ sentiment underlying the online discussion about enko et al. 
Specifically, NRC contains more than 14 thousand English terms labeled by the expressed polarity (i.e., positive or negative) and the eight basic emotion categories of Plutchik (2001) (i.e., joy, trust, anticipation, sadness, surprise, disgust, fear or anger). Finally, we combined the obtained information with the political alignment discovered by using IOM-NN, in order to extract the overall sentiment and emotions expressed by social media users while talking about the two candidates.

As an example, Figs. 5 and 6 show how the sentiment analysis step works on the polarized tweets obtained at the previous step (i.e., a small subset of the collected tweets) for understanding the emotional state of the users who support the different candidates. As we can see, polarized tweets are quite positive for both candidates, but show different emotional profiles. In particular, Biden’s supporters show trust in the new presidential candidate, while Trump’s ones express their joy at having Trump as president.

Fig. 5 Example of pro-Biden tweets

Fig. 6 Example of pro-Trump tweets

3.5 Topic discovery

This step is aimed at identifying the main politically-related discussion topics characterizing the 2020 US election campaign, by following the unsupervised approach used in Cantini et al. (2021). As a first step, a Word2Vec model is trained on the entire corpus of tweets, in order to get the latent representation of hashtags and words in a 150-dimensional vector space. We selected the dimension of the embedding space by conducting several experiments, finding out the smallest size for which a clear clustering structure emerged, i.e., the best trade-off between complexity and representativeness. Subsequently, all hashtags are embedded in that 150-dimensional space, whose dimensionality is then reduced by using the t-distributed stochastic neighbor embedding (t-SNE) technique, initialized through principal component analysis (PCA), to obtain a 2D projection of that space. Moreover, in order to reduce noise, all hashtags with a frequency lower than a given threshold are filtered out. Finally, the OPTICS algorithm is used for extracting a clustering structure based on the topic-based separation of hashtags, induced by the projection of their semantic distribution. We have chosen this clustering algorithm due to its ability to discover clusters with arbitrary shape. In addition, compared to classical density-based algorithms such as DBSCAN, it is able to extract clustering structures at different density levels, which in our work is useful for dealing with micro-topics.

4 The US 2020 Presidential election analysis and experimental results

In this section we provide an accurate description of the results coming from the analysis of the 2020 US presidential campaign, characterized by a strong rivalry between Joe Biden and Donald Trump. In particular, we analyzed election-related tweets with the aim of outlining a precise representation of this political event from different points of view, in terms of users’ publishing behavior, sentiment, political alignment and discussion topics. For this purpose we combined several techniques in an analysis workflow, whose steps are accurately described in Sect. 3 and whose results are reported in the following sections.

4.1 Data description

The data used to perform the experimental evaluation comes from a public repository that contains a real-time collection of tweets related to the 2020 US presidential election from December 2019 to June 2021 (Chen et al. 2021). From such repository we considered only the tweets published close to the election event (from September 1 to October 31, 2020), i.e., about 160 million posts, of which 18 million are tweets (11%), 110 million are retweets (69%), and 32 million are replies (20%), posted by about 29 million users. Only 22% of filtered data contain hashtags (e.g., #trump2020, #bidenharris2020), useful to understand the arguments used in favor of the different candidates. In particular, the percentage of tweets published with at least one hashtag related to Trump (i.e., #trump, #trump2020, and #maga) and Biden (i.e., #bidenharris2020, #biden) is about 31% and 11%, respectively. However, 7% of tweets contain at least one negative hashtag about Trump (i.e., #trumpknew, #pedotrump, #trumphascovid, #trumptaxreturns, #bountygate), whereas only 1% of tweets contain a negative hashtag for Biden (i.e., #crookedjoebiden). In order to ensure the representativeness of the collected posts, we analyzed users’ account information, filtering out content posted by users that show an anomalous publishing activity or inconsistent profile information. This step makes it possible to avoid the negative effects caused by the presence of content published by news sites and social bots, which can introduce a heavy bias in social media data (Cantini et al. 2022). We further analyzed the publishing behavior of the users in the filtered dataset by determining the Complementary Cumulative Density Function (CCDF) of shared tweets per user. Specifically, given the random variable X representing the number of shared tweets, it is determined by the frequency of users publishing a number of posts greater than x, i.e., the probability P(X > x). The scatter plot in log-scale shown in Fig. 7 reveals a highly skewed distribution, with few active Twitter users posting a huge amount of tweets, and many users posting infrequently or not at all, the so-called social lurkers.

Fig. 7 Complementary Cumulative Density Function (CCDF) of published tweets per user

4.1.1 Statistical significance of collected data

Here we investigate the statistical significance of the collected data in order to assess users’ representativeness, i.e., whether they can be considered voters of the political event under analysis.

Firstly, from tweets metadata we extracted aggregate information on the language used by social media users, discovering that most tweets have the lang field set to English (about 90%), whereas the remaining 10% is Undefined or set to other languages like Spanish. Secondly, we compared the number of Twitter users in our dataset, grouped by state, with the number of adult citizens actually living in that state, belonging to the voting-eligible population (VEP; http://www.electproject.org/2020g). Specifically, users were associated with states via Twitter metadata, by analyzing the location field present in each tweet, which indicates the location defined by the user in his/her Twitter account (e.g., Austin, TX). It is worth noting that, from the textual analysis of this field, it is not always easy to extract a meaningful city/state, as many users either left the field blank, or did not provide precise information (e.g., “USA”), or specified fictitious or nonexistent locations (e.g., “the moon” or “NY, Italy”). We measured the strength of the correlation between the number of users and the VEP, finding a Pearson coefficient r = 0.97, significant at p < 0.01. The linear relationship that links users and the voting-eligible population can be easily seen in Fig. 8, which depicts an interpolation of the related scatter plot, with a goodness-of-fit $R^2 = 0.93$. Notice that outlier states were not considered in this step in order to achieve meaningful results, by excluding data of different magnitude.

Fig. 8 Linear interpolation: analyzed users versus voting-eligible population grouped by the US states
In addition, we explored age and gender distribution of analyzed users, finding out that about 94% of them are adults (at least 18 years old; https://www.statista.com/statistics/265647/share-of-us-internet-users-who-use-twitter-by-age-group/) and almost equally divided by gender (https://www.statista.com/statistics/265643/share-of-us-internet-users-who-use-twitter-by-gender/).

Among all the available tweets we have selected those published by users located in the 11 main swing states (i.e., Arizona, Florida, Georgia, Michigan, Minnesota, Nevada, New Hampshire, North Carolina, Pennsylvania, Texas, Wisconsin). We analyzed only these states as they are characterized by a marked political uncertainty and their outcomes have a high probability of being a decisive factor of the electoral event. We made this data, used in all the subsequent analysis steps, publicly available on GitHub (https://github.com/SCAlabUnical/USA2020).

Table 1 reports a comparison between the users we were able to capture for each swing state and the VEP. The high correlation between the number of analyzed users per state and the VEP leads to a significant set of social media data effectively exploitable to determine the polarization of public opinion. However, despite the representativeness of the considered posts, the results achieved by the analysis of the online conversation can be influenced by platform biases. Specifically, there exist usage biases due to the distribution of users of a social media platform in terms of gender, age, culture and social status, as well as technical biases related to platform policies about data availability and restrictions imposed in some areas of the world.

Table 1 Number of Twitter users versus voting-eligible population (VEP) grouped by swing states

State             #Users    #VEP
Arizona            5,692     5,189,000
Florida           16,921    15,551,739
Georgia            5,841     7,383,562
Michigan           8,411     7,550,147
Minnesota          4,596     4,118,462
Nevada             1,156     2,153,915
New Hampshire      1,610     1,079,434
North Carolina     7,245     7,759,051
Pennsylvania       7,040     9,781,976
Texas             19,119    18,784,280
Wisconsin          3,898     4,368,530

4.2 Trending topics of the election campaign

In this step we identified the main politically related discussion topics characterizing the 2020 US election campaign. Achieved results are shown in Fig. 9, where six clusters are clearly visible, each one related to a different topic of discussion. Moreover, Table 2 summarizes the discovered topics by reporting the corresponding top hashtags.

Fig. 9 Unsupervised detection of the main topics underlying the online discussion

Table 2 Brief description of the identified topics

Cluster ID | Topic | Top hashtags
#1 | Bad management of Covid-19 emergency | #trumpknew, #trumpvirus, #covid, #trumpisaloser, #trumpisanationaldisgrace, #trumpliedpeopledied
#2 | Town hall meetings; sub-topics: climate crisis, veterans, discrimination | #cnntownhall, #climatecrisis, #greennewdeal, #respectveterans, #hererightmatters, #stoptrumpsterror
#3 | Encouraging people to vote | #election2020, #voteearly, #vote2020, #votebymail, #voteready, #electionday
#4 | Accusations against Hunter Biden | #hunterbiden, #bidencrimefamily, #burisma, #ukraine, #hunterbidenemails, #china
#5 | The US Supreme Court; nomination of Amy Coney Barrett | #scotus, #amyconeybarrett, #filltheseat, #supremecourt, #riprbg, #scotushearings
#6 | Support for Trump | #maga, #votetrump2020, #maga2020, #kag, #voteredtosaveamerica2020, #trumppence2020

The first topic is focused on the criticisms leveled at Trump regarding the management of the health emergency in the USA caused by Covid-19 pandemic.
The second one announced by Donald Trump, of Judge Amy Coney Bar- is related to the online discussion about town hall meetings, rett as Associate Justice of the US Supreme Court to fill covering different topics against Trump like discrimina- the vacancy left by the death of Ruth Bader Ginsburg. In tion, veterans and climate crisis (e.g., he referred to climate the following weeks, the focus of the online conversation change as a “hoax”, and to veterans as “human scum”). The shifted to various topics related to the approach of the Elec- third one is a general topic about the presidential election. tion Day and the importance of voting. We also observed an The fourth topic is related to the accusations of corruption increase in the volume of tweets concerning the accusations and wrongdoing in regards to China and Ukraine leveled leveled against Joe Biden’s son (i.e., Hunter Biden), a topic against Hunter Biden, i.e., the son of the democratic can- discussed mostly by the Democratic candidate’s detractors. didate Joe Biden. The fifth topic focuses on the nomination Finally, other topics regarding the support voters expressed of the conservative Amy Coney Barrett for a seat on the toward Trump and their criticisms leveled against him linked Supreme Court as successor to the liberal Associate Justice to town hall meetings showed an almost constant impact on Ruth Bader Ginsburg. Finally, the last topic is related to the the online conversation. online discussion of Trump’s supporters, characterized by notorious hashtags like #maga or #kag.4.3 Temporal analysis Once the major discussion topics were detected, we analyzed their overall impact on the online conversation, In this step we investigated the temporal dynamics of social along with their evolution in the eight weeks included in media conversation, in order to analyze users’ publishing our observation period, as shown in Fig. 10. 
In particular, behavior, studying how it is related to the detected polarity we calculated the volume of each hashtag-based topic by and how it reflected the occurrence of external events (e.g., determining the percentage of tweets that contain hashtags debates, rallies, etc.). However, as described by the reposi- belonging to the corresponding cluster. Considering our tory owners in Chen et al. (2021), there may be gaps in the overall observation period, the most relevant topic is about dataset due to several issues. Firstly, the data collection step Covid-19 pandemic and it specifically refers to Trump’s was highly contingent upon the stability of the network and mismanagement of the health emergency. Other topics are hardware. Secondly, Twitter significantly limits the num- related to the presidential election in general or arise from ber of tweets that can be rehydrated. Finally, tweets may no the publishing activity of Trump’s supporters. Also Biden’s longer be available as users have been removed, banned, or supporters signic fi antly contributed to the online discussion, suspended. by leveraging anti-Trump sub-topics that have emerged from Figure 11 shows the timeline of polarized tweets volume several town hall meetings, about discrimination, veterans annotated with the four main political debates occurring dur- and the position of the Republican candidate about the cli- ing the election campaign, i.e., between September 1 and mate crisis. October 31. The first observation period (September 1 to For what concerns the temporal evolution of the detected September 28) exhibits significantly different communica- topics, we found that in the early weeks online conversation tion dynamics prior to the first debate. Interestingly, this focused on the relationship between Trump and Covid-19 1 3 Social Network Analysis and Mining (2022) 12:83 Page 11 of 16 83 Fig. 10 Weekly volume of tweets related to the detected topics from September 1 to October 31, 2020 Fig. 
11 Time series of polarized tweets published from Septem- ber 1 to October 31, 2020 image shows an intense activity spikes of Biden’s support- Vice Presidential debate) and October 13 (before the second ers, as a likely consequence of President Trump’s actions: Presidential debate). September 10: president Trump has attacked Democratic 4.4 Comparative analysis with opinion polls Vice Presidential candidate Kamala Harris. September 15: despite being banned by state authorities In this step we assessed the effectiveness of our approach in from holding rallies, President Trump still decided to determining the polarity of social media users with the aim hold one in Nevada. of understanding which candidate or party public opinion • September 18: president Trump blamed blue states for is most in favor of. A first remarkable result was obtained the high number of the US Covid-19 fatalities. through a real-time analysis, carried out on Twitter data • September 28: during a rally in Pennsylvania, Trump collected during the two weeks before the Election Day. called Biden “a dishonest politician and a puppet in the Specifically, IOM-NN was able to correctly determine Joe hands of the radical left”. Biden’s lead over Donald Trump, especially in Georgia, where a Democratic candidate had not won since 1992 with The second and third observation windows (from Septem- the election of Bill Clinton. This promising result, publicly ber 30 to October 31) show typical weekly cycles of social available through our university web portal, represents a media chatter, with no particular explosion or shock-related spike from external events, except for October 6 (before the https:// scalab. dimes. unical. it/ usa- 2020/ (text in Italian). 
step forward with respect to our previous work, as it gives a clear proof of the real-time effectiveness of IOM-NN, which suggests the possibility of using it to enhance or even replace traditional opinion polls.

Starting from the encouraging real-time results, we extended that analysis by focusing on the main eleven swing states, as described in Sect. 4.1.1. Specifically, we compared the results obtained through IOM-NN with the average values of the latest opinion polls before the election. For each analyzed state, Table 3 reports the real voting percentages, opinion polls, and IOM-NN estimates. The two candidates (i.e., Joe Biden and Donald Trump) are indicated with “B” and “T”, respectively. The winning candidate is written in bold when it is correctly identified.

Table 3 Comparison between voting percentages estimated by IOM-NN and the latest opinion polls

State                  Real percentages   Opinion polls     IOM-NN
                       B      T           B      T          B      T
Arizona                49.4   49.1        48.0   45.8       50.2   48.3
Florida                47.9   51.2        48.7   46.0       48.0   51.1
Georgia                49.5   49.2        47.6   47.4       52.7   46.0
Michigan               50.6   47.8        49.9   44.4       55.4   43.0
Minnesota              52.4   45.3        51.6   41.8       55.1   42.6
Nevada                 50.1   47.7        49.4   44.4       49.8   48.0
New Hampshire          52.7   45.4        53.4   42.4       50.9   47.3
North Carolina         48.6   49.9        47.8   47.5       56.6   41.9
Pennsylvania           50.0   48.8        49.4   45.7       55.7   43.1
Texas                  46.5   52.1        47.5   48.8       46.1   52.5
Wisconsin              49.4   48.8        52.0   42.8       56.3   41.9
Correctly classified   –                  9/11              10/11
Tweets                 –                  –                 670,451
Users                  –                  ≈ 11,000          57,116
Avg. Acc               –                  0.82              0.91

Fig. 12 Comparison between IOM-NN and the latest opinion polls in identifying the winning candidate

The results of the comparison are summarized in Fig. 12, which shows that the estimates achieved by IOM-NN, related to the voting intentions of social media users, are more in line with the actual behaviors of voters with respect to the opinion polls, thus giving a clue to the final result in 10 out of 11 swing states (with an average accuracy of 91%).
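The "Correctly classified" and "Avg. Acc" rows of Table 3 can be checked with a short script. This is only a minimal sketch, not the authors' evaluation code: it assumes that accuracy is the share of states whose predicted winner matches the real one, which is consistent with 9/11 ≈ 0.82 and 10/11 ≈ 0.91. The names `TABLE_3`, `winner`, `winner_accuracy` and `missed_states` are hypothetical and introduced here for illustration.

```python
# Vote shares (%) transcribed from Table 3, as (Biden, Trump) pairs.
# Per state: real result, latest opinion polls, IOM-NN estimate.
TABLE_3 = {
    "Arizona":        ((49.4, 49.1), (48.0, 45.8), (50.2, 48.3)),
    "Florida":        ((47.9, 51.2), (48.7, 46.0), (48.0, 51.1)),
    "Georgia":        ((49.5, 49.2), (47.6, 47.4), (52.7, 46.0)),
    "Michigan":       ((50.6, 47.8), (49.9, 44.4), (55.4, 43.0)),
    "Minnesota":      ((52.4, 45.3), (51.6, 41.8), (55.1, 42.6)),
    "Nevada":         ((50.1, 47.7), (49.4, 44.4), (49.8, 48.0)),
    "New Hampshire":  ((52.7, 45.4), (53.4, 42.4), (50.9, 47.3)),
    "North Carolina": ((48.6, 49.9), (47.8, 47.5), (56.6, 41.9)),
    "Pennsylvania":   ((50.0, 48.8), (49.4, 45.7), (55.7, 43.1)),
    "Texas":          ((46.5, 52.1), (47.5, 48.8), (46.1, 52.5)),
    "Wisconsin":      ((49.4, 48.8), (52.0, 42.8), (56.3, 41.9)),
}
REAL, POLLS, IOM_NN = 0, 1, 2  # positions inside each state's tuple


def winner(shares):
    """Return 'B' or 'T' for the candidate with the larger share."""
    biden, trump = shares
    return "B" if biden > trump else "T"


def winner_accuracy(estimate):
    """Count states where the estimated winner equals the real winner.

    Assumption: this is the metric behind the 'Avg. Acc' row of Table 3.
    """
    hits = sum(
        winner(row[REAL]) == winner(row[estimate]) for row in TABLE_3.values()
    )
    return hits, hits / len(TABLE_3)


def missed_states(estimate):
    """List the states where the estimated winner differs from the real one."""
    return [
        state
        for state, row in TABLE_3.items()
        if winner(row[REAL]) != winner(row[estimate])
    ]


polls_hits, polls_acc = winner_accuracy(POLLS)
iomnn_hits, iomnn_acc = winner_accuracy(IOM_NN)
print(f"Opinion polls: {polls_hits}/{len(TABLE_3)} correct, accuracy {polls_acc:.2f}")
print(f"IOM-NN:        {iomnn_hits}/{len(TABLE_3)} correct, accuracy {iomnn_acc:.2f}")
print("Polls missed: ", missed_states(POLLS))
print("IOM-NN missed:", missed_states(IOM_NN))
```

Under this reading, the polls miss Florida and North Carolina, while IOM-NN misses only North Carolina. Because only the sign of the Biden-Trump margin is scored, an estimate with a large error in the margin still counts as correct as long as it picks the right winner.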
This accuracy metric penalizes inversions of polarity, which can be a crucial issue when analyzing states characterized by a high degree of uncertainty. Notice that, as regards North Carolina, neither the estimates achieved by IOM-NN nor the opinion polls were in line with the actual outcome in this state. This is a common situation, as the results achieved by the polls and IOM-NN must be understood as an estimate of the polarization of public opinion in the weeks preceding the Election Day, not always in accordance with the actual behavior of voters. Moreover, a noteworthy advantage of IOM-NN with respect to traditional opinion polls is the ability to capture the opinion of a larger number of people more quickly and at a lower cost. This makes IOM-NN a valid support to enhance or even replace opinion polls, by providing relevant insights useful to understand the dynamics of the election campaign.

Opinion polls source: https://www.270towin.com/2020-polls-biden-trump/

Fig. 13 Distribution of sentiments and emotions of pro-Trump tweets

Fig. 14 Distribution of sentiments and emotions of pro-Biden tweets

Table 4 A sample of pro-Trump tweets showing different emotions

Tweet: “First time registered voter excited to vote for@realDonaldTrump #FourMoreYears”
About: Trump. Sentiment: Positive. Emotion: Joy

Tweet: “#JoeBiden Democrats support domestic terrorists and exploit race and gender for political gain.I’m afraid for America #PennsylvaniansForTrump”
About: Biden. Sentiment: Negative. Emotion: Fear

Tweet: “You are a disgrace to politicize the death of these people, but obviously you don’t care. #JoeBiden #BidenHarris”
About: Biden. Sentiment: Negative. Emotion: Disgust

Tweet: “#realDonaldTrump If anyone can do it, you can. Best President ever! Godspeed sir.
#AmericaFirst #MAGA2020”
About: Trump. Sentiment: Positive. Emotion: Trust

Table 5 A sample of pro-Biden tweets showing different emotions

Tweet: “We all need to #VoteBiden to make this happen #VOTE”
About: Biden. Sentiment: Positive. Emotion: Anticipation

Tweet: “#realDonaldTrump You are a racist and a loser. #TrumpIsALoser #RacistTrump”
About: Trump. Sentiment: Negative. Emotion: Disgust

Tweet: “Today is a sad day. News reports are talking about 200,000 Americans dead from Covid-19 so far #TrumpKnew #COVID19”
About: Trump. Sentiment: Negative. Emotion: Sadness

Tweet: “I want empathy and decency in the White House. #BidenHarris2020ToSaveAmerica”
About: Biden. Sentiment: Positive. Emotion: Trust

4.5 Emotion analysis

The goal of this last step is to model the political orientation of Twitter users from an emotional point of view. To this purpose, we used the SentiStrength tool (as explained earlier in Sect. 3.4) to discover the existing relationships between user polarity and the sentiment expressed in referring to the two presidential candidates. Then, for each polarized tweet we explored the emotion the tweet conveys. Figures 13 and 14 describe the sentiment and the emotional state of the tweets, with the relative intensity, of the tweets produced by Trump and Biden supporters, respectively. What appears evident is that, on average, the tweets produced by Trump’s supporters are significantly more positive than those produced by Biden’s supporters, who devote a significant number of negative tweets to their opponent.

Regarding the detected emotions, Trump’s supporters express joy and confidence about Trump, and fear about Biden’s election. Biden’s supporters, instead, show trust and anticipation in having Biden as future president of the USA, with a more marked presence of negative emotions about Trump, like anger, disgust and sadness.

Tables 4 and 5 show various examples of tweets included in the analysis, showing how our approach can model social media conversation from an emotional point of view.

5 Conclusions and final remarks

The widespread use of social media can be exploited to extract useful information concerning people’s behaviors and interactions. In this paper we presented an in-depth analysis of the posts published on Twitter during the 2020 US election campaign, jointly exploiting several techniques for topic discovery, opinion mining and emotion analysis in a unified analysis workflow, with the aim of outlining an accurate representation of this political event from different points of view. In particular, we extracted the main discussion topics following a clustering-based approach, monitoring their weekly impact on social media conversation. Moreover, we leveraged IOM-NN to estimate the polarization of Twitter users regarding the two main candidates Donald Trump and Joe Biden, both in real-time and by focusing on the main US swing states. We also investigated the temporal dynamics of the online discussion, combining it with the polarization information coming from IOM-NN, in order to study how users’ publishing behavior reflected external events, over time, in relation to their political orientation. Finally, we exploited sentiment analysis and text mining techniques to discover the relationship between the user polarization, determined with the aid of IOM-NN, and the sentiment expressed in referring to the different candidates, thus modeling political support of Twitter users from an emotional viewpoint.

Experimental evaluation shows that in the early weeks online conversation focused on the relationship between Trump and the Covid-19 pandemic and on the nomination of Judge Amy Coney Barrett as Associate Justice of the US Supreme Court. In the following weeks, instead, the focus of the online conversation shifted to other topics including the accusations leveled against Hunter Biden and the criticism leveled against Trump linked to his position about the climate crisis and veterans. Regarding the political polarization of public
opinion, IOM-NN was able to achieve meaningful estimates of the voting intentions of social media users, which makes it a valid solution to go beyond traditional opinion polls, by providing relevant insights useful to understand the dynamics of the election campaign. One major drawback of this approach lies in different possible platform biases, such as usage biases due to the distribution of users of a social media platform in terms of gender, age, culture and social status, as well as technical biases related to platform policies about data availability and restrictions imposed in some areas of the world. Finally, as for the analysis of the emotional state of social users, we found out that the tweets produced by Trump’s supporters are significantly more positive than those produced by Biden’s supporters. In particular, i) Trump’s supporters express joy and confidence about Trump, while fear about Biden’s election; ii) Biden’s supporters show trust and anticipation in having Biden as future president of the USA, with a more marked presence of negative emotions about Trump, like anger, disgust and sadness.

As future work, we will apply the presented analysis workflow to other scenarios, such as product adoption analysis and reputation evaluation of companies. In fact, it can be easily generalized to different use cases, as it is not tied to any specific application domain, and only relies on the representativeness of the analyzed posts. Moreover, we can integrate other techniques in our workflow, introducing new steps aimed at improving the quality of the achieved results. As an example, a hashtag recommendation model can be used for enriching the information content of the analyzed data, since keyword-based approaches like IOM-NN are strongly dependent on the availability of consistent hashtags in social media posts (Cantini et al. 2021).

Acknowledgements This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955558. The JU receives support from the European Union’s Horizon 2020 research and innovation program and Spain, Germany, France, Italy, Poland, Switzerland, Norway.

Funding Open access funding provided by Università della Calabria within the CRUI-CARE Agreement.

Data availability The data that support the findings of this study are publicly available. In particular, this data was gathered using Twitter APIs (https://developer.twitter.com) and is hosted on Github (https://github.com/SCAlabUnical/USA2020).

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Alashri S, Kandala SS, Bajaj V, Ravi R, Smith KL, Desouza KC (2016) An analysis of sentiments on facebook during the 2016 us presidential election. In: 2016 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM). IEEE, pp 795–802
An J, Quercia D, Crowcroft J (2013) Fragmented social media: a look into selective exposure to political news. In: Proceedings of the 22nd international conference on world wide web, pp 51–52
Azarbonyad H, Dehghani M, Beelen K, Arkut A, Marx M, Kamps J (2017) Words are malleable: Computing semantic shifts in political and media discourse. In: Proceedings of the 2017 ACM on conference on information and knowledge management, pp 1509–1518
Bastos MT, Puschmann C, Travitzki R (2013) Tweeting across hashtags: overlapping users and the importance of language, topics, and politics. In: Proceedings of the 24th ACM conference on hypertext and social media, pp 164–168
Belcastro L, Cantini R, Marozzo F, Talia D, Trunfio P (2020) Learning political polarization on social media using neural networks. IEEE Access 8:47177–47187
Bilal M, Gani A, Marjani M, Malik N (2019) Predicting elections: Social media data and techniques. In: 2019 International conference on engineering and emerging technologies (ICEET). IEEE, pp 1–6
Cambre J, Klemmer SR, Kulkarni C (2017) Escaping the echo chamber: ideologically and geographically diverse discussions about politics. In: Proceedings of the 2017 CHI conference extended abstracts on human factors in computing systems, pp 2423–2428
Cantini R, Marozzo F, Bruno G, Trunfio P (2021) Learning sentence-to-hashtags semantic mapping for hashtag recommendation on microblogs. ACM Trans Knowl Discov Data (TKDD) 16(2):1–26
Cantini R, Marozzo F, Talia D, Trunfio P (2022) Analyzing political polarization on social media by deleting bot spamming. Big Data Cognit Comput 6(1):1. https://doi.org/10.3390/bdcc6010003
Cesario E, Iannazzo AR, Marozzo F, Morello F, Riotta G, Spada A, Talia D, Trunfio P (2016) Analyzing social media data to discover mobility patterns at expo 2015: methodology and results. In: International conference on high performance computing & simulation (HPCS). IEEE, pp 230–237
Chen E, Deb A, Ferrara E (2021) #election2020: the first public twitter dataset on the 2020 us presidential election. J Comput Soc Sci 1–18
Chiu SI, Hsu KW (2018) Predicting political tendency of posts on facebook. In: Proceedings of the 2018 7th international conference on software and computer applications, pp 110–114
Ciampaglia GL, Shiralkar P, Rocha LM, Bollen J, Menczer F, Flammini A (2015) Computational fact checking from knowledge networks. PloS One 10(6):e0128193
Dade-Robertson M, Taylor N, Marshall J, Olivier P (2012) The political sensorium. In: Proceedings of the 4th media architecture Biennale conference: participation, pp 47–50
Devlin J, Chang MW, Lee K, Toutanova K (2018) Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
Fraisier O, Cabanac G, Pitarch Y, Besançon R, Boughanem M (2017) Uncovering like-minded political communities on twitter. In: Proceedings of the ACM SIGIR international conference on theory of information retrieval, pp 261–264
Garimella K, De Francisci Morales G, Gionis A, Mathioudakis M (2018) Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In: Proceedings of the 2018 World Wide Web Conference, pp 913–922
Greene D, Cross JP (2015) Unveiling the political agenda of the european parliament plenary: a topical analysis. In: Proceedings of the ACM web science conference, pp 1–10
Grevet C, Terveen LG, Gilbert E (2014) Managing political differences in social media. In: Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing, pp 1400–1408
Gyongyi Z, Garcia-Molina H, Pedersen J (2004) Combating web spam with trustrank. In: Proceedings of the 30th international conference on very large data bases (VLDB)
Haq EU, Braud T, Kwon YD, Hui P (2020) A survey on computational politics. IEEE Access 8:197379–197406
Hoffmann CP, Lutz C (2017) Spiral of silence 2.0: Political self-censorship among young facebook users. In: Proceedings of the 8th international conference on social media & society, pp 1–12
Hong S, Nadler D (2015) Social media and political voices of organized interest groups: a descriptive analysis. In: Proceedings of the 16th annual international conference on digital government research, pp 210–216
Keneshloo Y, Cadena J, Korkmaz G, Ramakrishnan N (2014) Detecting and forecasting domestic political crises: A graph-based approach. In: Proceedings of the 2014 ACM conference on Web science, pp 192–196
Kim J, Tabibian B, Oh A, Schölkopf B, Gomez-Rodriguez M (2018) Leveraging the crowd to detect and reduce the spread of fake news and misinformation. In: Proceedings of the eleventh ACM international conference on web search and data mining, pp 324–332
Kiritchenko S, Zhu X, Mohammad SM (2014) Sentiment analysis of short informal texts. J Artif Intell Res 50:723–762
Manning CD, Surdeanu M, Bauer J, Finkel JR, Bethard S, McClosky D (2014) The stanford corenlp natural language processing toolkit. In: Proceedings of 52nd annual meeting of the association for computational linguistics: system demonstrations, pp 55–60
Marozzo F, Bessi A (2018) Analyzing polarization of social media users and news sites during political campaigns. Soc Netw Anal Mining 8(1):1–13
Mohammad S (2012) Portable features for classifying emotional text. In: Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp 587–591
Mohammad SM, Turney PD (2013) Crowdsourcing a word-emotion association lexicon. Comput Intell 29(3):436–465
Monti C, Rozza A, Zappella G, Zignani M, Arvidsson A, Colleoni E (2013) Modelling political disaffection from twitter data. In: Proceedings of the second international workshop on issues of sentiment discovery and opinion mining, pp 1–9
Nakov P, Rosenthal S, Kiritchenko S, Mohammad SM, Kozareva Z, Ritter A, Stoyanov V, Zhu X (2016) Developing a successful semeval task in sentiment analysis of twitter and other social media texts. Lang Resour Eval 50(1):35–65
Oikonomou L, Tjortjis C (2018) A method for predicting the winner of the usa presidential elections using data extracted from twitter. In: 2018 South-Eastern European Design Automation, Computer Engineering, Computer Networks and Society Media Conference (SEEDA_CECNSM). IEEE, pp 1–8
Pang B, Lee L (2008) Opinion mining and sentiment analysis. Foundations Trends Inf Retriev 2(1–2):1–135
Plutchik R (2001) The nature of emotions: human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice. Am Sci 89(4):344–350
Saleiro P, Gomes L, Soares C (2016) Sentiment aggregate functions for political opinion polling using microblog streams. In: Proceedings of the Ninth International C* Conference on Computer Science & Software Engineering, pp 44–50
Shu K, Bernard HR, Liu H (2019) Studying fake news via network analysis: detection and mitigation. In: Emerging Research Challenges and Opportunities in Computational Social Network Analysis and Mining. Springer, pp 43–65
Singh A, kumar A, Dua N, Mishra VK, Singh D, Agrawal A (2021) Predicting elections results using social media activity a case study: Usa presidential election 2020. In: 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), vol 1, pp 314–319. https://doi.org/10.1109/ICACCS51430.2021.9441835
Takikawa H, Nagayoshi K (2017) Political polarization in social media: analysis of the “twitter political field” in Japan. In: 2017 IEEE international conference on big data (big data). IEEE, pp 3143–3150
Thelwall M (2017) The heart and soul of the web? sentiment strength detection in the social web with sentistrength. In: Cyberemotions. Springer, pp 119–134
Trabelsi A, Zaïane OR (2019) Phaitv: A phrase author interaction topic viewpoint model for the summarization of reasons expressed by polarized stances. In: Proceedings of the International AAAI Conference on Web and Social Media, vol 13, pp 482–492
Wong FMF, Tan CW, Sen S, Chiang M (2016) Quantifying political leaning from tweets, retweets, and retweeters. IEEE Trans Knowl Data Eng 28(8):2158–2172
Wulf V, Aal K, Abu Kteish I, Atam M, Schubert K, Rohde M, Yerousis GP, Randall D (2013) Fighting against the wall: Social media use by political activists in a palestinian village. In: Proceedings of the SIGCHI conference on human factors in computing systems, pp 1979–1988

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Journal

Social Network Analysis and Mining, Springer Journals

Published: Dec 1, 2022

Keywords: Social media analysis; Opinion mining; User polarization; Sentiment analysis; Political events

References