Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Identifying lexical change in negative word-of-mouth on social media

Identifying lexical change in negative word-of-mouth on social media Negative word-of-mouth is a strong consumer and user response to dissatisfaction. Moral outrages can create an excessive collective aggressiveness against one single argument, one single word, or one action of a person resulting in hateful speech. In this work, we examine the change of vocabulary to explore the outbreak of online firestorms on Twitter. The sudden change of an emotional state can be captured in language. It reveals how people connect with each other to form outrage. We find that when users turn their outrage against somebody, the occurrence of self-referencing pronouns like ‘I’ and ‘me’ reduces significantly. Using data from Twitter, we derive such linguistic features together with features based on retweets and mention networks to use them as indicators for negative word-of-mouth dynamics in social media networks. Based on these features, we build three classification models that can predict the outbreak of a firestorm with high accuracy. 1 Introduction politics. Even though online firestorms are a new phenom- enon, their dynamics are similar to the way in which rumors As social media platforms with hundreds of millions of users are circulated. In 1947, Gordon Allport and Leo Postman interacting in real time on topics and events all over the defined a rumor as a “proposition for belief, passed along world, social media networks are social sensors for online from person to person, usually by word of mouth, without discussions and are known for quick and often emotional secure standards of evidence being presented” (Allport and disputes (Chadwick 2017). Online firestorms can be defined Postman 1947). as the sudden discharge of large quantities of messages When people are active on social media, they act in a containing negative word of mouth and complaint behavior socio-technical system that is mediated and driven by algo- against a person, company or group in social media networks rithms. The goal of social media platforms is to keep users (Pfeffer et al. 2014). The negative dynamics often start with engaged and to maximize their time spent on the platform. a collective “against the others” (Strathern et al. 2020). Highly engaged users who spend a lot of time on platforms In social media, negative opinions about products or are the core of a social media business model that is based companies are formed by and propagated via thousands or on selling more and better targeted ads. But the question millions of people within hours. Furthermore, massive nega- is always which content will be interesting for a particular tive online dynamics are not only limited to the business user? To answer this, recommendation systems are devel- domain, but they also affect organizations and individuals in oped to increase the chance that a user will click on a sug- gested link and read its content. These recommendation algorithms incorporate socio-demographic information, but * Wienke Strathern also data of a user’s previous activity (Leskovec et al. 2014; wienke.strathern@tum.de Anderson 2006). Raji Ghawi Furthermore, behavioral data of alters (friends) of a user raji.ghawi@tum.de are also used to suggest new content (Appel et al. 2020). Mirco Schönfeld Social scientists have studied the driving forces of social mirco.schoenfeld@uni-bayreuth.de relationships for decades, i.e., why do people connect with Jürgen Pfeffer each other. Homophily and transitivity are the most impor- juergen.pfeffer@tum.de tant factors for network formation. Homophily means that School of Social Science and Technology, Technical your friends are similar to yourself (McPherson et al. 2001). University of Munich, Munich, Germany They like similar things and are interested in similar topics. University of Bayreuth, Bayreuth, Germany Vol.:(0123456789) 1 3 59 Page 2 of 13 Social Network Analysis and Mining (2022) 12:59 Transitivity describes the fact that a person’s friends are often connected among each other (Heider 1946; Cartwright and Harary 1956). Combining these two aspects results in the fact that most people are embedded in personal networks with people that are similar to themselves and who are to a 1 high degree connected among each other. The above-described forces of how humans create net- works combined with recommendation systems have prob- lematic implications. Recommendation systems filter the content that is presented on social media and suggest new Fig. 1 Early detection of linguistic indicators (1) and prediction of “friends” to us. As a result, filter bubbles (Pariser 2011) are firestorm (2) formed around individuals on social media, i.e., they are connected to like-minded people and familiar content. The lack of diversity in access to people and content can easily on a firestorm as depicted in Fig.  1, the indicators show at lead to polarization (Dandekar et al. 2013). If we now add time point 1), whereas the firestorm takes place starting another key characteristic of social media, abbreviated com- during the phase marked by 2) in the figure. Hence, in this munication with little space for elaborate exchange, a perfect paper, we build upon and extend the work presented by breeding ground for online firestorms emerges. Consider a Strathern et al. (2020). couple of people disliking a statement or action of a politi- Our choice of methods to answer our research ques- cian, celebrity or any private individual and these people tion regarding the prediction of the beginning of online voicing their dislike aggressively on social media. Their firestorms is based on text statistics and social network online peers, who most likely have similar views (see above), analysis for longitudinal network data. We assume that will easily and quickly agree by sharing or retweeting the anomalies in behavior can be detected by statistical analy- discontent. Within hours, these negative dynamics can reach sis applied to processes over time. Hence, in this work, tens of thousands of users (Newman et al. 2006). A major we extract lexical and network-based properties, meas- problem, however, is to capture first signals of online outrage ure their occurrence for different tweet periods and use at an early stage. Knowing about these signals would help to these features to predict the outbreak of a firestorm. For intervene in a proper way to avoid escalations and negative the scope of this work, we are mainly interested in textual dynamics. data from tweets and in mention and retweet networks. In previous work, Strathern et al. (2020) tackled the ques- We use quantitative linguistics to study lexical properties. tion of anomaly detection in a network by exploring major For our linguistic analysis, we apply the Linguistic Inquiry features that indicate the outbreak of a firestorm; hence, the Word Count Tool by Pennebaker et al. (2015). To contrast goal was to early detect change and extract linguistic fea- this linguistic perspective, we also investigate mention tures. Detection of outrage (e.g., hate speech) is based on and retweet networks. Mentions and hashtags represent identification of predefined keywords, while the context in speech acts in linguistic pragmatics and are interesting in which certain topics and words are being used has to be that they represent behavioral properties in addition to the almost disregarded. To name just one extreme example, hate lexical properties (Scott 2015). For predictive analysis, we groups have managed to escape keyword-based machine define models based on linguistic features as well as mod- detection through clever combinations of words, misspell- els based on features derived from mention and retweet ings, satire and coded language (Udupa 2020). The focus of networks and compare them with each other. the analysis of Strathern et al. was on more complex lexical characteristics, which they applied as a basis for automated Our contributions are: detection. Our research question is the following: On Twitter, • Extracting linguistic and sentimental features from tex- there is constant fluctuation of content and tweets and the tual data as indicators of firestorms. question arises if, in these fluctuations, we can detect early • Defining a prediction model that accounts for linguistic that a negative event starts solely based on linguistic fea- features. tures. We assume that the start of a firestorm is a process, and because of a sudden change of emotions it can be The remainder of the paper is organized as follows: Sect. 2 early detected in sentiments and lexical items. With this highlights important related works. In Sect. 3, we introduce work, we aim at answering the following question: Once the dataset used for this analysis together with a few descrip- we identify the linguistic changes as indicators of a fire - tive statistics. What follows in Sects. 4 and 5 is a descrip- storm, can we also predict a firestorm? In an abstract view tion of the linguistic and network-based features that our 1 3 Social Network Analysis and Mining (2022) 12:59 Page 3 of 13 59 prediction is based upon. The prediction task is described in gave helpful insights about the mathematical approach to detail in Sect. 6. Section 7 concludes the paper. sentiment dynamics (Charlton et al. 2016). Arguing that rational and emotional styles of commu- nication have strong influence on conversational dynam- 2 Related work ics, sentiments were the basis to measure the frequency of cognitive and emotional language on Facebook. Bail While online firestorms are similar to rumors to some extent, et al. (2017). e.g. they often rely on hearsay and uncertainty, online fire- Instead, the analysis of linguistic patterns was used to storms pose new challenges due to the speed and potential understand affective arousal and linguist output (Sharp and global reach of social media dynamics (Pfeffer et al. 2014). Hargrove 2004). Extracting the patterns of word choice in an With respect to firestorms on social media, the analysis of online social platform reflecting on pronouns is one way to social dynamics, their early detection and prediction often characterize how a community forms in response to adverse involves research from the field of sentiment analysis, net- events such as a terrorist attack (Shaikh et al. 2017). Syn- work analysis as well as change detection. There is work chronized verbal behavior can reveal important information asking why do people join online firestorms (Delgado-Ball- about social dynamics. The effectiveness of using language ester et al. 2021). Based on the concept of moral panics, the to predict change in social psychological factors of inter- authors argue that participation behavior is driven by a moral est can be demonstrated nicely (Gonzales et al. 2010). In compass and a desire for social recognition (Johnen et al. Lamba et al. (2015), the authors detected and described 21 2018). Social norm theory refers to understanding online online firestorms discussing their impact on the network. aggression in a social–political online setting, challenging To advance knowledge about firestorms and the spread of the popular assumption that online anonymity is one of the rumors, we use the extracted data as a starting point to fol- principle factors that promote aggression (Rost et al. 2016). low up on the research findings. 2.1 Sentiment analysis 2.2 Network analysis Approaches to the analysis of firestorms focusing on the mood of the users and their expressed sentiments unveil, for Social media dynamics can be described with models and example, that in the context of online firestorms, non-anon- methods of social networks (Wasserman and Faust 1994; ymous individuals are more aggressive compared to anony- Newman 2010; Hennig et  al. 2012). Approaches mainly mous individuals (Rost et al. 2016). Online firestorms are evaluating network dynamics are, for example, proposed used as a topic of news coverage by journalists and explore by Snijders et al. Here, network dynamics were modeled journalists’ contribution to attempts of online scandalization. as network panel data (Snijders et al. 2010). The assump- By covering the outcry, journalists elevate it onto a main- tion is that the observed data are discrete observations of a stream communication platform and support the process of continuous-time Markov process on the space of all directed scandalization. Based on a typology of online firestorms, the graphs on a given node set, in which changes in tie vari- authors have found that the majority of cases address events ables are independent conditional on the current graph. The of perceived discrimination and moral misconduct aiming model for tie changes is parametric and designed for applica- at societal change (Stich et al. 2014). Online firestorms on tions to social network analysis, where the network dynam- social media have been studied to design an Online Fire- ics can be interpreted as being generated by choices made storm Detector that includes an algorithm inspired by epide- by the social actors represented by the nodes of the graph. miological surveillance systems using real-world data from This study demonstrated ways in which network structure a firestorm (Drasch et al. 2015). reacts to users posting and sharing content. While exam- Sentiment analysis was applied to analyze the emotional ining the complete dynamics of the Twitter information shape of moral discussions in social networks (Brady network, the authors showed where users post and reshare et al. 2017). It has been argued that moral–emotional lan- information while creating and destroying connections. guage increased diffusion more strongly. Highlighting the Dynamics of network structure can be characterized by importance of emotion in the social transmission of moral steady rates of change, interrupted by sudden bursts (Myers ideas, the authors demonstrate the utility of social network et  al. 2012). Network dynamics were modeled as a class methods for studying morality. A different approach is to of statistical models for longitudinal network data (Snijders measure emotional contagion in social media and networks 2001). Dynamics of online firestorms were analyzed using by evaluating the emotional valence of content the users an agent-based computer simulation (ABS) (Hauser et al. are exposed to before posting their own tweets (Ferrara 2017)—information diffusion and opinion adoption are trig- and Yang 2015). Modeling collective sentiment on Twitter gered by negative conflict messages. 1 3 59 Page 4 of 13 Social Network Analysis and Mining (2022) 12:59 Table 1 Firestorm events sorted by number of tweets 2.3 Classification in machine learning Firestorm hashtag/mention Tweets Users First day In order to efficiently analyze big data, machine learning #whyimvotingukip 39,969 32,382 2014-05-21 methods are used, with the goal of learning from experience #muslimrage 15,721 11,952 2012-09-17 in certain tasks. In particular, in supervised learning, the #CancelColbert 13,277 10,353 2014-03-28 goal is to predict some output variable that is associated with #myNYPD 12,762 10,362 2014-04-23 each input item. This task is called classification when the @TheOnion 9959 8803 2013-02-25 output variable is a category. Many standard classification @KLM 8716 8050 2014-06-29 algorithms have been developed over the last decades, such #qantas 8649 5405 2011-10-29 as logistic regression, random forests, k nearest neighbors, @David_Cameron 7096 6447 2014-03-06 support vector machines and many more (Friedman et al. suey_park 6919 3854 2014-03-28 2001; James et al. 2014). @celebboutique 6679 6189 2012-07-20 Machine learning methods have been used widely for @GaelGarciaB 6646 6234 2014-06-29 studying users’ behavior on social media (Ruths and Pfef- #NotIntendedtobeaFactualStat. 6261 4389 2011-04-13 fer 2014), predicting the behavior of techno-social systems #AskJPM 4321 3418 2013-11-14 (Vespignani 2009) and predicting consumer behavior with @SpaghettiOs 2890 2704 2013-12-07 Web search (Goel et al. 2010). Moreover, such methods are #McDStories 2374 1993 2012-01-24 also used in identifying relevant electronic word of mouth #AskBG 2221 1933 2013-10-17 in social media (Vermeer et al. 2019; Strathern et al. 2021). #QantasLuxury 2098 1658 2011-11-22 #VogueArticles 1894 1819 2014-09-14 2.4 Mixed approaches @fafsa 1828 1693 2014-06-25 @UKinUSA 142 140 2014-08-27 More recent approaches analyze online firestorms by analyz- ing both content and structural information. A text-mining study on online firestorms evaluates negative eWOM that demonstrates distinct impacts of high- and low-arousal decahose, a random 10% sample of all tweets. This is a emotions, structural tie strength, and linguistic style match (between sender and brand community) on firestorm poten- scaled up version of Twitter’s Sample API, which gives a stream of a random 1% sample of all tweets. tial (Herhausen et al. 2019). Online Firestorms were studied to develop optimized forms of counteraction, which engage Mention and retweet networks based on these sam- ples can be considered as random edge sampled networks individuals to act as supporters and initiate the spread of positive word of mouth, helping to constrain the firestorm (Wagner et al. 2017) since sampling and network construc- tion is based on Tweets that constitute the links in the as much as possible (Mochalova and Nanopoulos 2014). By monitoring psychological and linguistic features in the network. As found by Morstatter et al. (2013), the Sample API (unlike the Streaming API) indeed gives an accurate tweets and network features, we combine methods from text analysis, social network analysis and change detection to representation of the relative frequencies of hashtags over time. We assume that the decahose has this property as early detect and predict the start of a firestorm. well, with the significant benefit that it gives us more sta- tistical power to estimate the true size of smaller events. 3 Data The dataset consists of 20 firestorms with the  high- est volume of tweets as identified in Lamba et al. (2015). To address our research question, we examined 20 different Table  1 shows those events along with the number of tweets, number of users, and the date of the first day of firestorms. Some are directed against individuals and a sin- gle statement; some are against companies, campaigns and the event. The set of tweets of each firestorm covers the first week of the event. We also augmented this dataset via marketing actions. They have all received widespread public attention in social media as well as mainstream media. As including additional tweets, of the same group of users, during the same week of the event (7 days) and the week shown in Table 1, there are hashtags and also @mentions that name the target. before (8 days), such that the volume of tweets is balanced between the 2 weeks (about 50% each). The fraction of 3.1 Dataset firestorm-related tweets is between 2 and 8% of the tweets of each event (Table 1)—it is important to realize at this We used the same set of firestorms as in Lamba et  al. point that even for users engaging in online firestorms, (2015), whose data source is an archive of the Twitter 1 3 Social Network Analysis and Mining (2022) 12:59 Page 5 of 13 59 this activity is a minor part of their overall activity on the during firestorms compared to non-firestorm periods. Fur - platform. thermore, we would like to know which lexical items differ Thus, for each of the 20 firestorms, we have three types of in different phases. We extracted 90 lexical features for each tweets: (1) tweets related to the firestorm, (2) tweets posted tweet of each of the 20 firestorms. We used variables that 1 week before the firestorm and (3) tweets posted during the give standard linguistic dimensions (percentage of words firestorm (same week) but not related to it. Let us denote in the text that are pronouns, articles, auxiliary verbs) and these three sets of tweets T , T and T , respectively . informal language markers (percentage of words that refer 1 2 3 For each event, we also extracted tweets metadata includ- to the category assents, fillers, swear words, netspeak). To ing timestamp, hashtags, mentions and retweet information discover sentiments, we also used the variables affective (user and tweet ID). processes, cognitive processes, perceptual processes. The categories provide a sentiment score of positivity and nega- tivity to every single tweet. We also considered the category 4 Linguistic features posemo and negemo to see if a tweet is considered positive or negative. We also constructed our own category ‘emo’ Negative word-of-mouth sometimes contains strong emo- by calculating the difference between positive and negative tional expressions and even highly aggressive words against sentiments in tweets. Thus, weights of this category can a person or a company. Hence, the start of a firestorm might be negative and should describe the overall sentiment of be indicated by a sudden change of vocabulary and emo- a tweet. tions. Do people become emotionally thrilled and can we These categories each contain several subcategories that find changes in tweets? Can we capture a change of perspec- can be subsumed under the category names. The category of tive in the text against a target? Emotionality is reflected in personal pronouns, for example, contains several subcatego- words, the first analysis is based on the smallest structural ries referring to personal pronouns in numerous forms. One unit in language: words (Bybee and Hopper 2001). of these subcategories ‘I,’ for example, includes—besides the pronoun ‘I’—‘me,’ ‘mine,’ ‘my,’ and special netspeak 4.1 Extraction of features forms such as ‘idk’ (which means “I don’t know”). Netspeak is a written and oral language, an internet chat, To extract linguistic features and sentiment scores we use the which has developed mainly from the technical circum- Linguistic Inquiry Word Count classification scheme, short stances: the keyboard and the screen. The combination of LIWCTool (Pennebaker et al. 2015). In this way, first tex- technology and language makes it possible to write the way tual differences and similarities can be quantified by simple you speak (Crystal 2002). word frequency distribution (Baayen 1993). Furthermore, to Finally, for each individual subcategory, we obtain the understand emotions in tweets we use the sentiment analysis mean value of the respective LIWC values for the firestorm provided by the LIWCTool. Essentially, sentiment analysis tweets and the non-r fi estorm tweets. Comparing these values is the automatic determination of the valence or polarity of gives first insights about lexical differences and similarities. a text part, i.e., the classification of whether a text part has a positive, negative or neutral valence. Basically, automatic 4.2 Comparing firestorm and non‑firestorm tweets methods of sentiment analysis work either lexicon based or on the basis of machine learning. Lexicon-based meth- In order to explore how the linguistic and sentiment features ods use extensive lexicons in which individual words are of tweets change during firestorms, we perform compari- assigned positive or negative numerical values to determine sons between firestorm tweets and non-firestorm tweets with the valence of a text section (usually at the sentence level) regard to the individual LIWC subcategories. The firestorm of a text part (mostly on sentence level) (Tausczik and Pen- tweets (T ) were compared with tweets from the same user nebaker 2009). accounts from the week immediately before the firestorm LIWC contains a dictionary with about 90 output vari- (T ) and the same week of the firestorm ( T ). We used t-tests 2 3 ables, so each tweet is matched with about 90 different cat- to compare the mean value of the respective LIWC values egories. The classification scheme is based on psychologi- for the firestorm tweets and the non-firestorm tweets, where cal and linguistic research. Particularly, we were interested the level of statistical significance of those tests is expressed in sentiments to see if users show ways of aggressiveness using p-values (we used p < 0.01). Figure  2a depicts the comparisons between firestorm tweets and non-firestorm tweets with regard to the individual subcategories. Every subcategory was examined separately Comparing with (Lamba et  al. 2015), we have excluded ‘Ask- for all 20 firestorms. Thicke’ firestorm, because it has a gap of 24  h between T and T ; 2 1 hence, we added ‘suey_park’ firestorm instead. 1 3 59 Page 6 of 13 Social Network Analysis and Mining (2022) 12:59 Fig. 2 Comparison between firestorm-related tweets (T1) and non-firestorm tweets (T2 askbg and T3) w.r.t various linguistic features using T-tests with p askjpm value < 0.01 cancelcolbert celebbou�que david_cameron fafsa gaelgarciab klm mcdstories muslimrage mynypd no�ntendedtobeafact qantas qantasluxury spaghe‚os suey_park theonion ukinusa voguear�cle s whyimvo�nguki p significantly LOWER in firestorm-tweets w.r.t non-firestorm tweets no significant diff erence between firestorm-tweets and non-firestorm tweets significantly HIGHER in firestorm-tweets w.r.t non-firestorm tweets (a) Resultsof comparison lower 15 811158 16 5187 2 same 030 05 21211 higher 599 572 14 01217 (b) Numberof firestorms The blue (turquoise) cells represent the firestorms in while in 15 firestorms these words were used significantly which terms from the respective category occurred more less. Similar results are observed for category ‘she/he’ In frequently during the firestorms. The red (brick) cells rep- addition to the category ‘I’, the categories ‘posemo’ and resent the firestorms in which the same words occurred less ‘negemo’ should also be highlighted. Words representing frequently during the firestorms. The light gray cells repre- positive emotions like ‘love,’ ‘nice,’ ‘sweet’—the ‘posemo’ sent the firestorms in which there is no significant difference category—are used significantly less in almost all firestorms: between firestorm tweets and non-firestorm tweets. positive emotions were less present in 16 out of 20 fire- The results of comparison are aggregated in the table in storms. For the category ‘negemo,’ which contains words Fig. 2b, which shows, for each feature, the number of fire- representing negative emotions, this effect is reversed for storms according to the three cases of comparison: lower, all tweets—words in this category are used significantly higher and same (no significant difference). more often during most of the firestorms (14 out of 20). Results For category ‘I’ this means that in five firestorms There are 18 firestorms in which the ‘emo’ values were sig - people used words of this category significantly more often, nificantly lower during a firestorm. At the same time, there 1 3 we we you you shehe shehe they they posemo posemo negemo negemo emo emo netspeak netspeak assent assent Social Network Analysis and Mining (2022) 12:59 Page 7 of 13 59 Fig. 3 Evolution of network features over time (#myNYPD firestorm). Highlighted area indicates the start of the fire- storm (first 24 h) were only two firestorms where the differences in the values Moreover, at each time point we construct mention networks, of ‘emo’ were not significant. Another remarkable category and retweet networks taking into account all the tweets dur- is ‘assent,’ which contains words like ‘agree,’ ‘OK,’ ‘yes.’ ing the last 12 h. This way, we obtain a moving window of In this category, the effect is also reversed—words in this tweets: with a window size of 12 slices at steps of 1 h. The category are used significantly more often during almost all mention network of each moving window contains an edge firestorms (17 out of 20). Interpretation. We can state that ( user , user ) if a tweet (among tweets under consideration) 1 2 during firestorms, the I vanishes and users talk significantly posted by user contains a mention to user . The retweet 1 2 less about themselves compared to non-firestorm periods. network of each moving window contains an edge ( user , Simultaneously, the positivity in firestorms tweets vanishes user ) if a tweet (among tweets under consideration) posted and negativity rises. by user is a retweet of another (original) tweet posted by user . For each event, the mention networks constructed at dif- 5 Mention and retweet networks ferent time points are directed, unweighted networks. We performed several types of social network analysis and Besides linguistic features and sentiments expressed in extracted a set of metrics, including: tweets, online firestorms have also impact on the struc- ture of user’s social networks, such as mention and retweet • Number of nodes N and edges E, networks. Average out-degree (which equals avg. in-degree). To get insight on the evolution of each firestorm over • Maximum out-degree and maximum in-degree. time, we first split the time-line of each of the firestorm data- Relative size of the largest connected component. sets into buckets of one hour and assign tweets to buckets based on their timestamp. The result of this splitting is a Each of the aforementioned features leads to a time-series series of about 360 time slices (since the studied time-span when taken over the entire time-span of the event. For exam- of an event is 15 days). This allows us to perform analysis ple, Fig. 3 depicts some of those time-series for the features at fine granularity. of the mention and retweet networks of #myNYPD fire- First, at each time slice, we extract several basic features storm, showing how those features evolve over time. While of the corresponding hourly buckets of tweets, including: network metrics are affected by sampled datasets, we still believe that these metrics are meaningful since the sampling • Number of tweets N process was consistent over all firestorms. Number of mention tweets N Results One can clearly observe the oscillating behavior mt • Number of mentions N of those features. This oscillation is due to the alternation Ratio of mention tweets to all tweets N ∕N . of tweeting activity between daytime and night. More inter- mt t • Mention per tweet ratio: N ∕N . esting observation is the manifest change of behavior that m t occurs near the middle of the time span, which evidently 1 3 59 Page 8 of 13 Social Network Analysis and Mining (2022) 12:59 tweets before firestorm (T ) Fig. 4 Timeline of a firestorm firestorm tweets (T ) firestorm start observation period target = 0 target = 1 signals the beginning of the firestorm event. This apparent Thus, for each time slice, the corresponding bucket of change can be observed in most of the features for the event tweets is described by several features. Mainly, we distin- at hand. However, not all the features are useful to detect the guish between different types of features; each type of them trigger of the firestorm in all events. In particular, we find defines a prediction model: the maximum in-degree feature is one of the best features to detect this change. This feature can clearly detect the start Baseline model includes the basic features, such as of the firestorm (in all events). The maximum in-degree in number of tweets N , number of mentions N , etc. (see t m mention networks means the highest number of mentions Sect. 5). received by a particular user. • Mention-network model includes network features, such Interpretation Thus, the ability of this feature to detect a as, number of nodes and edges, density, reciprocity, aver- firestorm can be interpreted by considering that, generally age and max in-degree and out-degree, etc., extracted speaking, a firestorm occurs when one user is being men- from mention networks. tioned unusually high. This result is intuitive since Tweets • Retweet-network model includes the same set of network related to a certain Firestorm normally mention the victim’s features extracted from retweet networks. Twitter account. • Linguistic model extends the basic model by including Monitoring this feature in real-time would be certainly linguistic features, i.e., the mean values of extracted handy at detecting firestorms as early as possible, by signal- LIWC features (over the hourly bucket of tweets). In ing abnormal changes (increase) in this feature. However, particular, we are interested in the following features: the change of focus to a particular user can be the result of pronouns: namely: ‘i,’ ‘we,’ ‘you,’ ‘shehe,’ and ‘they’; different (including positive) events. emotions: ‘posemo,’ ‘negemo’ and ‘emo’; and ‘netspeak’ From a network perspective, an online firestorm occurs and ‘assent.’ when one user is mentioned unusually high, focusing on a Twitter handle or a hashtag. The maximum in-degree in @ By doing so, we create separate time series for each of the mention networks is significantly deviating from comparable features mentioned above. time periods. 6.2 Target variable 6 Predicting the start of a firestorm As shown in Fig. 4, the time span of the two sets of tweets T and T is 8 and 7 days, respectively, with an overlap of 1 2 1 In the previous section we identified slight changes in lexi- day between the two periods. We consider the first day of cal and sentimental cues as indicators of a firestorm. From a the firestorm as its start. Hence, we create a target variable network perspective, we identified the maximum in-degree whose value is 0 for the time points t occurring entirely to be a very good indicator for a firestorm to occur. Based before the firestorm (the first 7 days of T ) and 1 for the on these findings we want to test and compare our extracted time points t occurring during the first day of the firestorm. features for a classification task in order to build models for The rest of the firestorm days are omitted. Hence, we obtain predicting the start of a firestorm. about 7 × 24 = 168 time points where target = 0 (negative instances), as well as 24 points where target = 1 (positive 6.1 Prediction models (predictor variables) instances). As mentioned earlier, we split the time-line of each firestorm into buckets of one hour and assign tweets to buckets based on their timestamp. This number slightly varies from one firestorm to another. 1 3 Social Network Analysis and Mining (2022) 12:59 Page 9 of 13 59 Table 2 Pearson correlation of Basic features Network features Linguistic features basic features, network features and linguistic features with the Mention Retweet target variable (#myNYPD N 0.70 N 0.76 0.83 i 0.16 firestorm) N 0.71 E 0.80 0.86 we 0.12 mn − − − N 0.67 density 0.61 0.68 you 0.15 − − N ∕N 0.40 recip. 0.34 0.38 she/he 0.06 mt t N ∕N 0.21 lwcc 0.82 0.85 they 0.35 m t avg d 0.82 0.87 posemo 0.01 in 0.96 0.96 negemo 0.27 max d in − − max d 0.00 0.11 emo 0.23 out netspeak 0.56 assent 0.19 Our objective is thus to predict the value of this target Regarding linguistic features, most of those features have variable using the aforementioned sets of predictors. Hence, weak correlation (positive or negative), or no correlation the prediction turns into a binary classification task, where with the target variable. The highest correlations are for ‘net- we want to classify whether a time point t belongs to the speak’ (0.56) and ‘they’ (0.35). period of firestorm start (target = 1) or not (belongs to the period before the firestorm: target = 0), using different types 6.4 Design of the classification task of features of the tweets. This classification task needs to be performed for each firestorm separately and independently 6.4.1 Split into training and test sets from other firestorms. As in any supervised machine learning task, data instances 6.3 Comparing features between before and the need to be split into training and test subsets: the first is start of the firestorm used to train the classifier while the other is used to test it, i.e., to evaluate its performance. Typically, such splitting Before we dive deeper into the details of the classification of the dataset is performed in a random fashion, with, for task, it is interesting at this point to look at how different example, 75% of instances for training and the remaining predictor features correlate with our target variable (which 25% for testing. Moreover, in order to make a more reliable indicates the firestorm start). This would help us get insight evaluation, a cross validation approach is typically used, on the ability of those features to predict that target vari- such as the k-folds method. In k-folds cross-validation, the able. For this purpose, we calculate the Pearson correlation dataset is split into k consecutive folds, and each fold is of each feature with the target variable (its numeric value 0 then used once as a validation while the k − 1 remaining or 1). Table 2 shows the correlation values for the case of folds form the training set. This method generally results #myNYPD firestorm. in a less biased model compared to other methods, because We can observe that basic features—in particular, number it ensures that every observation from the original dataset of tweets N , number of mention tweets N and number of has the chance of appearing in the training and test set. t mt mentions N —have a relatively strong positive correlation However, in our firestorm dataset(s), positive and nega- with the target variable. tive classes are highly unbalanced, with a 1:7 ratio, i.e., This effect of strong positive correlation can be also for each positive instance there are 7 negative instances. observed for most of network features, such as number of To tackle this unbalanced issue, we use stratified k-folds, nodes N and edges E, relative size of largest (weakly) con- which is a variation in k-folds cross-validation that returns nected component lwcc, avg. and max. in-degree. In contrast, stratified folds, that is, the folds are made by preserving density has a strong negative correlation, which means that the percentage of samples for each class. this feature is lower at the start of the firestorm compared In this study, we opt to use k = 4 , and the dataset is to before the firestorm. On the other hand, reciprocity has split hence into 4 stratified folds. Thus, when the data - rather a weak correlation with the target variable; this corre- set contains 24 positive samples, and 168 negative ones, lation is positive for mention networks (+0.34) and negative then each fold will contain 24∕4 = 6 positive samples, and for retweet networks (-0.38). Finally, max d , the maximum about 168∕4 = 42 negative ones. The training is also per- out out-degree, has no correlation at all. formed 4 times, each time one of the folds is used as a test set while the remaining 3 folds are used as a training set. 1 3 59 Page 10 of 13 Social Network Analysis and Mining (2022) 12:59 This means that, each time, the 24 positive instances will Table 3 Accuracy of prediction models be distributed such that 6 instances will be in the test set Basic Linguistic Mention Retweet and 18 instances in the training set. This approach avoids askbg 0.926 0.916 0.958 0.953 the undesired situations where the training is performed askjpm 0.953 0.953 0.995 0.995 with very few or with too many positive instances. The cancelcolbert 0.948 0.953 0.990 0.984 overall evaluation score is calculated as the average over celebboutique 0.915 0.945 0.937 0.963 the 4 training times. david_cameron 1.000 0.995 0.995 0.995 fafsa 0.932 0.943 0.989 0.989 6.4.2 Feature scaling gaelgarciab 0.906 0.885 0.956 0.956 klm 0.891 0.902 0.907 0.907 In our case, different features have their values on very dif- mcdstories 0.943 0.938 0.923 0.961 ferent scales. For instance, regarding network features, the muslimrage 0.956 0.990 0.980 0.969 number of nodes N and edges E are usually > 10 , while −3 −2 mynypd 0.958 0.984 1.000 0.995 density is < 10 and reciprocity is < 10 . Thus, in order notintendedto. 0.990 0.984 0.989 0.989 to improve the prediction accuracy, we need to avoid some qantas 0.922 0.922 0.939 0.956 low-scale features being overwhelmed by other high-scale qantasluxury 0.932 0.943 0.972 0.972 ones; therefore, we use feature scaling in order to put the spaghettios 0.944 0.964 0.989 0.989 features roughly on the same scale. suey_park 0.943 0.948 0.990 0.995 We use the standard scaling approach, where each fea- theonion 0.974 0.974 0.989 0.989 ture is standardized by centering and scaling to unit vari- ukinusa 0.870 0.875 0.995 0.995 ance. The standard score of a sample x is calculated as: voguearticles 0.951 0.967 0.971 0.977 z =(x − (x))∕(x) where  is the mean of the samples, and whyimvotingukip 0.943 0.969 0.956 0.950 is the standard deviation of the samples. avg. 0.940 0.948 0.971 0.974 Centering and scaling happen independently on each fea- ture by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using transform. Standardization of a mention network model and linguistic model. Table 3 shows the overall accuracy for each firestorm, with respect to each dataset is a common requirement for many machine learning algorithms, as they might behave badly if the individual fea- prediction model. We can see that the prediction accuracy is pretty high in general where the accuracy is within the range tures do not roughly look like standard normally distributed data (e.g., Gaussian with 0 mean and unit variance). of 87% to 100%. For the basic model, the accuracy ranges between 87% (for 6.4.3 Algorithm ‘ukinusa’) and 100% (for ‘david_cameron’), with an average of 94%. For the linguistic model, the accuracy ranges between As a classification algorithm, we used the logistic regres- about 87% (e.g., ‘ukinusa’) and 99.5% (@David_Cameron), with an average of 95%. sion algorithm. Logistic regression is a well-known and widely used classification algorithm which extends linear Finally, the two network models, mention and retweet, show very similar results in general. The accuracy ranges between regression. Instead of fitting a straight line or hyperplane, the logistic regression model uses the logistic function to about 90% (klm) and 100% (myNYPD), with an average of 97%. Overall we can see that all the prediction models are able squeeze the output of a linear equation between 0 and 1. The logistic function is defined as: (x)= 1∕(1 + exp(−x)) to predict the start of the firestorm with very high accuracy. Interpretation Network models are slightly more accu- 6.4.4 Evaluation rate than the linguistic model, which is in turn slightly more accurate than the basic model. It is logical that in times of As an evaluation measure, we used Accuracy, which is firestorms there are a lot of mentions, hashtags and retweets, i.e., explicit network properties. Even more important and simply the fraction of correctly classified instances (to all instances). For each firestorm, the prediction accuracy is interesting is the result that we can measure early changes already in the language and that these properties are much calculated as the average of the accuracy over the 4 folds. more important for the early detection of changes. The fact that we make a comparison here should illustrate how well 6.5 Results our model works alongside other more explicit models. We applied the logistic regression algorithm to each fire- storm using different prediction models: basic model, 1 3 Social Network Analysis and Mining (2022) 12:59 Page 11 of 13 59 a company and that this is seen as less bad. Another path- 7 Conclusion way worth following would be to leverage contextualized word embeddings (Peters et al. 2018) to identify especially Our goal was to predict the outbreak of a firestorm using harmful words that demand early attention. Generally, the linguistic and network-based features. Therefore, we exam- question of what motivates people to ally against a target is ined the vocabulary of tweets from a diverse set of firestorms of great scientific and social interest. and compared it to non-firestorm tweets posted by the same Our results give insights about how negative word-of- users. Additionally, we measured features describing the mouth dynamics on social media evolve and how people mention and retweet networks also comparing firestorm speak when forming an outrage collectively. Our work con- with non-firestorm tweets. We used the features in a logistic tributed to the task of predicting outbreaks of firestorms. regression model to predict the outbreak of firestorms. The Knowing where a firestorm is likely to occur can help, for identie fi d linguistic and sentimental changes were good indi - example, platform moderators to know where an interven- cators for the outbreak of a firestorm. tion in a calming manner will be required. Ultimately, this Observing linguistic features, we found that during fire- can save individuals from being harassed and insulted in storms users talk significantly less about themselves com- online social networks. pared to non-firestorm periods which manifested in sig- nificantly fewer occurrences of self-referencing pronouns like ‘I,’ ‘me’ and the like. Simultaneously, the positivity in Funding Open Access funding enabled and organized by Projekt firestorm tweets vanishes and negativity rises. Especially DEAL. the change in the use of personal pronouns served as a good indicator for the outbreak of online firestorms. This Data availability The lists of Tweet IDs of the analyzed data can be change of subject to a different object of discussion could be shared upon request. observed in an increased mentioning of a user or a hashtag who/that was the target of a firestorm, hence the perspec- Declarations tive changes. Users start pointing at others. This expressed Conflict of interests The author(s) gratefully acknowledge the finan- itself in a maximum in-degree in mention networks that cial support from the Technical University of Munich—Institute for significantly deviated from comparable time periods giving Ethics in Artificial Intelligence (IEAI). Any opinions, findings and evidence for the pragmatic action from a network perspec- conclusions or recommendations expressed in this material are those tive. However, we are aware of the fact that we have only of the author(s) and do not necessarily reflect the views of the IEAI or its partners. measured cases in which the in-degree change happens in the context of something negative. Open Access This article is licensed under a Creative Commons Attri- Our models were able to predict the outbreak of a fire- bution 4.0 International License, which permits use, sharing, adapta- storm accurately. We were able to classify the outbreak of a tion, distribution and reproduction in any medium or format, as long firestorm with high accuracy (above 87% ) in all scenarios. It as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes showed, however, that classification models using features were made. The images or other third party material in this article are derived from the mention and retweet networks performed included in the article's Creative Commons licence, unless indicated slightly better than models based on linguistic features. otherwise in a credit line to the material. If material is not included in Overall, verbal interaction is a social process and linguis- the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will tic phenomena are analyzable both within the context of lan- need to obtain permission directly from the copyright holder. To view a guage itself and in the broader context of social behavior copy of this licence, visit http://cr eativ ecommons. or g/licen ses/ b y/4.0/ . (Gumperz 1968). From a linguistic perspective, the results give an idea of how people interact with one another. For this purpose, it was important to understand both the net- work and the speech acts. Changes in  the linguistic and References sentimental characteristics of the tweets thus proved to Allport G, Postman L (1947) The psychology of rumor. J Clin Psychol be early indicators of change in the parts of social media 3(4):402 networks studied. Besides the fact that users changed their Anderson C (2006) The long tail: why the future of business is selling perspective, we could also observe that positivity in words less of more. Hyperion Books, New York vanished and negativity increased. Appel G, Grewal L, Hadi R, Stephen AT (2020) The future of social media in marketing. J Acad Mark Sci 48(1):79–95 Future work could consider clustering firestorms accord- Auger IE, Lawrence CE (1989) Algorithms for the optimal identifica- ing to their dynamics, i.e., can firestorms be differentiated tion of segment neighborhoods. Bull Math Biol 51(1):39–54 in the way users ally against a target? This is of interest Baayen H (1993) Statistical models for frequency distributions: a lin- insofar as we know that negative PR can also mean profit for guistic evaluation. Comput Humanit 26:347–363 1 3 59 Page 12 of 13 Social Network Analysis and Mining (2022) 12:59 Bail CA, Brown TW, Mann M (2017) Channeling hearts and minds: Leskovec J, Rajaraman A, Ullman JD (2014) Mining of massive data- advocacy organizations, cognitive-emotional currents, and public sets, 2nd edn. Cambridge University Press, Cambridge conversation. Am Sociol Rev 82(6):1188–1213 McCulloh I, Carley KM (2011) Detecting change in longitudinal social Brady WJ, Wills JA, Jost JT, Tucker JA, Bavel JJV (2017) Emotion networks. J Soc Struct 12(3):1–37 shapes the diffusion of moralized content in social networks. McPherson M, Smith-Lovin L, Cook JM (2001) Birds of a feather: PNAS 114(28):7313–7318 homophily in social networks. Ann Rev Sociol 27(1):415–444 Bybee J, Hopper P (2001) Frequency and the emergence of linguistic Mochalova A, Nanopoulos A (2014) Restricting the spread of fire - structure. In: Bybee J, Hopper P (eds) Typological studies in lan- storms in social networks. In: ECIS 2014 proceedings guage, vol 45. John Benjamins Publishing Company, Amsterdam, Morstatter F, Pfeffer J, Liu H, Carley K (2013) Is the sample good pp 1–24 enough? comparing data from Twitter’s streaming API with Twit- Cartwright D, Harary F (1956) Structural balance: a generalization of ter’s firehose. In: Proceedings of the international AAAI confer - Heider’s theory. Psychol Rev 63(5):277–293 ence on web and social media, vol 7, no 1 Chadwick A (2017) The hybrid media system: politics and power, 2nd Myers S, Zhu C, Leskovec J (2012) Information diffusion and external edn. Oxford University Press, Oxford inu fl ence in networks. In: Proceedings of the 18th ACM SIGKDD Charlton N, Singleton C, Greetham DV (2016) In the mood: the international conference on knowledge discovery and data min- dynamics of collective sentiments on Twitter. R Soc Open Sci ing, pp 33–41 3(6):160162 Newman M (2010) Networks: an introduction. Oxford University Press Crystal D (2002) Language and the internet. IEEE Trans Prof Com- Inc, New York mun 45:142–144 Newman M, Barabási AL, Watts DJ (2006) The structure and dynamics Dandekar P, Goel A, Lee DT (2013) Biased assimilation, homo- of networks. Princeton University Press, Princeton phily, and the dynamics of polarization. Proc Natl Acad Sci Pariser E (2011) The filter bubble. What the internet is hiding from 110(15):5791–5796 you. The New York Press, New York Delgado-Ballester E, López-López I, Bernal-Palazón A (2021) Why Pennebaker JW, Boyd RL, Jordan K, Blackburn K (2015) The devel- do people initiate an online firestorm? the role of sadness, anger, opment and psychometric properties of LIWC2015. Technical and dislike. Int J Electron Commer 25:313–337 report, The University of Texas at Austin Drasch B, Huber J, Panz S, Probst F (2015) Detecting online firestorms Peters M, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zet- in social media. In: ICIS tlemoyer L (2018) Deep contextualized word representations. In: Ferrara E, Yang Z (2015) Measuring emotional contagion in social NAACL-HLT 2018, pp 2227–2237 media. PLOS ONE 10(11):e0142390 Pfeffer J, Zorbach T, Carley KM (2014) Understanding online fire- Friedman J, Hastie T, Tibshirani R (2001) The elements of statistical storms: negative word-of-mouth dynamics in social media net- learning. Springer series in statistics. Springer, New York works. J Mark Commun 20(1–2):117–128 Goel S, Hofman J, Lahaie S, Pennock D, Watts D (2010) Predict- Rost K, Stahel L, Frey BS (2016) Digital social norm enforcement: ing consumer behavior with Web search. In: Proceedings of the online firestorms in social media. PLoS ONE 11(6):e0155923 National Academy of Sciences. National Academy of Sciences Ruths D, Pfeffer J (2014) Social media for large studies of behavior. Section: Physical Sciences, pp 17486–17490 Science 346:1063–1064 Gonzales AL, Hancock JT, Pennebaker JW (2010) Language style Scott K (2015) The pragmatics of hashtags: inference and conversa- matching as a predictor of social dynamics in small groups. Com- tional style on Twitter. J Pragmat 81:8–20 mun Res 37(1):3–19 Scott AJ, Knott M (1974) A cluster analysis method for grouping Gumperz J (1968) The speech community. In: Duranti A (ed) Linguis- means in the analysis of variance. Biometrics 30:507–512 tic anthropology: a reader. Wiley, New York, pp 166–173 Sen A, Srivastava MS (1975) On tests for detecting change in mean. Hauser F, Hautz J, Hutter K, Füller J (2017) Firestorms: modeling con- Ann Stat 3(1):98–108 flict diffusion and management strategies in online communities. Shaikh S, Feldman LB, Barach E, Marzouki Y (2017) Tweet sentiment J Strat Inf Syst 26(4):285–321 analysis with pronoun choice reveals online community dynamics Heider F (1946) Attitudes and cognitive organization. J Psychol in response to crisis events. In: Advances in cross-cultural deci- 21:107–112 sion making. Springer, pp 345–356 Hennig M, Brandes U, Pfeffer J, Mergel I (2012) Studying social net- Sharp WG, Hargrove DS (2004) Emotional expression and modality: works. A guide to empirical research. Campus Verlag, Frankfurt an analysis of affective arousal and linguistic output in a computer Herhausen D, Ludwig S, Grewal D, Wulf J, Schoegel M (2019) Detect- vs. paper paradigm. Comput Hum Behav 20(4):461–475 ing, preventing, and mitigating online firestorms in brand com - Snijders TAB (2001) The statistical evaluation of social network munities. J Mark 83(3):1–21 dynamics. Sociol Methodol 31(1):361–395 Jackson B, Scargle JD, Barnes D, Arabhi S, Alt A, Gioumousis P, Gwin Snijders TA, Koskinen J, Schweinberger M (2010) Maximum like- E, Sangtrakulcharoen P, Tan L, Tsai TT (2005) An algorithm for lihood estimation for social network dynamics. Ann Appl Stat optimal partitioning of data on an interval. IEEE Signal Process 4(2):567–588 Lett 12(2):105–108 Stich L, Golla G, Nanopoulos A (2014) Modelling the spread of James G, Witten D, Hastie T, Tibshirani R (2014) An introduction to negative word-of-mouth in online social networks. J Decis Syst statistical learning. Springer, Cham 23(2):203–221 Johnen M, Jungblut M, Ziegele M (2018) The digital outcry: what Strathern W, Schönfeld M, Ghawi R, Pfeffer J (2020) Against the oth- incites participation behavior in an online firestorm? New Media ers! Detecting moral outrage in social media networks. In: IEEE/ Soc 20(9):3140–3160 ACM International Conference on Advances in Social Networks Killick R, Fearnhead P, Eckley IA (2012) Optimal detection of Analysis and Mining, pp 322–326 changepoints with a linear computational cost. J Am Stat Assoc Strathern W, Ghawi R, Pfeffer J (2021) Advanced statistical analysis 107(500):1590–1598 of large-scale web-based data. In: Data science in economics and Lamba H, Malik MM, Pfeffer J (2015) A tempest in a teacup? Analyz- finance for decision makers. Edited by Per Nymand-Andersen ing firestorms on Twitter. In: 2015 IEEE/ACM ASONAM. New Tausczik YR, Pennebaker JW (2009) The psychological meaning of York, NY, USA, pp 17–24 words: LIWC and computerized text analysis methods. J Lang Soc Psychol 29:24–54 1 3 Social Network Analysis and Mining (2022) 12:59 Page 13 of 13 59 Udupa S (2020) Artificial intelligence and the cultural problem of Wasserman S, Faust K (1994) Social network analysis: methods and extreme speech. Social Science Research Council (20 December applications. Cambridge University Press, Cambridge, MA 2020) Vermeer S, Araujo T, Bernritter S, Noort G (2019) Seeing the wood Publisher's Note Springer Nature remains neutral with regard to for the trees: How machine learning can help firms in identify - jurisdictional claims in published maps and institutional affiliations. ing relevant electronic word-of-mouth in social media. Int J Res Mark 36:492–508 Vespignani A (2009) Predicting the behavior of techno-social systems. Science 325:425–428 Wagner C, Singer P, Karimi F, Pfeffer J, Strohmaier M (2017) Sam- pling from social networks with attributes. In: Proceedings of the WWW conference, pp 1181–1190 1 3 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Social Network Analysis and Mining Springer Journals

Identifying lexical change in negative word-of-mouth on social media

Loading next page...
 
/lp/springer-journals/identifying-lexical-change-in-negative-word-of-mouth-on-social-media-OYWtkcsiL7

References (67)

Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2022
ISSN
1869-5450
eISSN
1869-5469
DOI
10.1007/s13278-022-00881-0
Publisher site
See Article on Publisher Site

Abstract

Negative word-of-mouth is a strong consumer and user response to dissatisfaction. Moral outrages can create an excessive collective aggressiveness against one single argument, one single word, or one action of a person resulting in hateful speech. In this work, we examine the change of vocabulary to explore the outbreak of online firestorms on Twitter. The sudden change of an emotional state can be captured in language. It reveals how people connect with each other to form outrage. We find that when users turn their outrage against somebody, the occurrence of self-referencing pronouns like ‘I’ and ‘me’ reduces significantly. Using data from Twitter, we derive such linguistic features together with features based on retweets and mention networks to use them as indicators for negative word-of-mouth dynamics in social media networks. Based on these features, we build three classification models that can predict the outbreak of a firestorm with high accuracy. 1 Introduction politics. Even though online firestorms are a new phenom- enon, their dynamics are similar to the way in which rumors As social media platforms with hundreds of millions of users are circulated. In 1947, Gordon Allport and Leo Postman interacting in real time on topics and events all over the defined a rumor as a “proposition for belief, passed along world, social media networks are social sensors for online from person to person, usually by word of mouth, without discussions and are known for quick and often emotional secure standards of evidence being presented” (Allport and disputes (Chadwick 2017). Online firestorms can be defined Postman 1947). as the sudden discharge of large quantities of messages When people are active on social media, they act in a containing negative word of mouth and complaint behavior socio-technical system that is mediated and driven by algo- against a person, company or group in social media networks rithms. The goal of social media platforms is to keep users (Pfeffer et al. 2014). The negative dynamics often start with engaged and to maximize their time spent on the platform. a collective “against the others” (Strathern et al. 2020). Highly engaged users who spend a lot of time on platforms In social media, negative opinions about products or are the core of a social media business model that is based companies are formed by and propagated via thousands or on selling more and better targeted ads. But the question millions of people within hours. Furthermore, massive nega- is always which content will be interesting for a particular tive online dynamics are not only limited to the business user? To answer this, recommendation systems are devel- domain, but they also affect organizations and individuals in oped to increase the chance that a user will click on a sug- gested link and read its content. These recommendation algorithms incorporate socio-demographic information, but * Wienke Strathern also data of a user’s previous activity (Leskovec et al. 2014; wienke.strathern@tum.de Anderson 2006). Raji Ghawi Furthermore, behavioral data of alters (friends) of a user raji.ghawi@tum.de are also used to suggest new content (Appel et al. 2020). Mirco Schönfeld Social scientists have studied the driving forces of social mirco.schoenfeld@uni-bayreuth.de relationships for decades, i.e., why do people connect with Jürgen Pfeffer each other. Homophily and transitivity are the most impor- juergen.pfeffer@tum.de tant factors for network formation. Homophily means that School of Social Science and Technology, Technical your friends are similar to yourself (McPherson et al. 2001). University of Munich, Munich, Germany They like similar things and are interested in similar topics. University of Bayreuth, Bayreuth, Germany Vol.:(0123456789) 1 3 59 Page 2 of 13 Social Network Analysis and Mining (2022) 12:59 Transitivity describes the fact that a person’s friends are often connected among each other (Heider 1946; Cartwright and Harary 1956). Combining these two aspects results in the fact that most people are embedded in personal networks with people that are similar to themselves and who are to a 1 high degree connected among each other. The above-described forces of how humans create net- works combined with recommendation systems have prob- lematic implications. Recommendation systems filter the content that is presented on social media and suggest new Fig. 1 Early detection of linguistic indicators (1) and prediction of “friends” to us. As a result, filter bubbles (Pariser 2011) are firestorm (2) formed around individuals on social media, i.e., they are connected to like-minded people and familiar content. The lack of diversity in access to people and content can easily on a firestorm as depicted in Fig.  1, the indicators show at lead to polarization (Dandekar et al. 2013). If we now add time point 1), whereas the firestorm takes place starting another key characteristic of social media, abbreviated com- during the phase marked by 2) in the figure. Hence, in this munication with little space for elaborate exchange, a perfect paper, we build upon and extend the work presented by breeding ground for online firestorms emerges. Consider a Strathern et al. (2020). couple of people disliking a statement or action of a politi- Our choice of methods to answer our research ques- cian, celebrity or any private individual and these people tion regarding the prediction of the beginning of online voicing their dislike aggressively on social media. Their firestorms is based on text statistics and social network online peers, who most likely have similar views (see above), analysis for longitudinal network data. We assume that will easily and quickly agree by sharing or retweeting the anomalies in behavior can be detected by statistical analy- discontent. Within hours, these negative dynamics can reach sis applied to processes over time. Hence, in this work, tens of thousands of users (Newman et al. 2006). A major we extract lexical and network-based properties, meas- problem, however, is to capture first signals of online outrage ure their occurrence for different tweet periods and use at an early stage. Knowing about these signals would help to these features to predict the outbreak of a firestorm. For intervene in a proper way to avoid escalations and negative the scope of this work, we are mainly interested in textual dynamics. data from tweets and in mention and retweet networks. In previous work, Strathern et al. (2020) tackled the ques- We use quantitative linguistics to study lexical properties. tion of anomaly detection in a network by exploring major For our linguistic analysis, we apply the Linguistic Inquiry features that indicate the outbreak of a firestorm; hence, the Word Count Tool by Pennebaker et al. (2015). To contrast goal was to early detect change and extract linguistic fea- this linguistic perspective, we also investigate mention tures. Detection of outrage (e.g., hate speech) is based on and retweet networks. Mentions and hashtags represent identification of predefined keywords, while the context in speech acts in linguistic pragmatics and are interesting in which certain topics and words are being used has to be that they represent behavioral properties in addition to the almost disregarded. To name just one extreme example, hate lexical properties (Scott 2015). For predictive analysis, we groups have managed to escape keyword-based machine define models based on linguistic features as well as mod- detection through clever combinations of words, misspell- els based on features derived from mention and retweet ings, satire and coded language (Udupa 2020). The focus of networks and compare them with each other. the analysis of Strathern et al. was on more complex lexical characteristics, which they applied as a basis for automated Our contributions are: detection. Our research question is the following: On Twitter, • Extracting linguistic and sentimental features from tex- there is constant fluctuation of content and tweets and the tual data as indicators of firestorms. question arises if, in these fluctuations, we can detect early • Defining a prediction model that accounts for linguistic that a negative event starts solely based on linguistic fea- features. tures. We assume that the start of a firestorm is a process, and because of a sudden change of emotions it can be The remainder of the paper is organized as follows: Sect. 2 early detected in sentiments and lexical items. With this highlights important related works. In Sect. 3, we introduce work, we aim at answering the following question: Once the dataset used for this analysis together with a few descrip- we identify the linguistic changes as indicators of a fire - tive statistics. What follows in Sects. 4 and 5 is a descrip- storm, can we also predict a firestorm? In an abstract view tion of the linguistic and network-based features that our 1 3 Social Network Analysis and Mining (2022) 12:59 Page 3 of 13 59 prediction is based upon. The prediction task is described in gave helpful insights about the mathematical approach to detail in Sect. 6. Section 7 concludes the paper. sentiment dynamics (Charlton et al. 2016). Arguing that rational and emotional styles of commu- nication have strong influence on conversational dynam- 2 Related work ics, sentiments were the basis to measure the frequency of cognitive and emotional language on Facebook. Bail While online firestorms are similar to rumors to some extent, et al. (2017). e.g. they often rely on hearsay and uncertainty, online fire- Instead, the analysis of linguistic patterns was used to storms pose new challenges due to the speed and potential understand affective arousal and linguist output (Sharp and global reach of social media dynamics (Pfeffer et al. 2014). Hargrove 2004). Extracting the patterns of word choice in an With respect to firestorms on social media, the analysis of online social platform reflecting on pronouns is one way to social dynamics, their early detection and prediction often characterize how a community forms in response to adverse involves research from the field of sentiment analysis, net- events such as a terrorist attack (Shaikh et al. 2017). Syn- work analysis as well as change detection. There is work chronized verbal behavior can reveal important information asking why do people join online firestorms (Delgado-Ball- about social dynamics. The effectiveness of using language ester et al. 2021). Based on the concept of moral panics, the to predict change in social psychological factors of inter- authors argue that participation behavior is driven by a moral est can be demonstrated nicely (Gonzales et al. 2010). In compass and a desire for social recognition (Johnen et al. Lamba et al. (2015), the authors detected and described 21 2018). Social norm theory refers to understanding online online firestorms discussing their impact on the network. aggression in a social–political online setting, challenging To advance knowledge about firestorms and the spread of the popular assumption that online anonymity is one of the rumors, we use the extracted data as a starting point to fol- principle factors that promote aggression (Rost et al. 2016). low up on the research findings. 2.1 Sentiment analysis 2.2 Network analysis Approaches to the analysis of firestorms focusing on the mood of the users and their expressed sentiments unveil, for Social media dynamics can be described with models and example, that in the context of online firestorms, non-anon- methods of social networks (Wasserman and Faust 1994; ymous individuals are more aggressive compared to anony- Newman 2010; Hennig et  al. 2012). Approaches mainly mous individuals (Rost et al. 2016). Online firestorms are evaluating network dynamics are, for example, proposed used as a topic of news coverage by journalists and explore by Snijders et al. Here, network dynamics were modeled journalists’ contribution to attempts of online scandalization. as network panel data (Snijders et al. 2010). The assump- By covering the outcry, journalists elevate it onto a main- tion is that the observed data are discrete observations of a stream communication platform and support the process of continuous-time Markov process on the space of all directed scandalization. Based on a typology of online firestorms, the graphs on a given node set, in which changes in tie vari- authors have found that the majority of cases address events ables are independent conditional on the current graph. The of perceived discrimination and moral misconduct aiming model for tie changes is parametric and designed for applica- at societal change (Stich et al. 2014). Online firestorms on tions to social network analysis, where the network dynam- social media have been studied to design an Online Fire- ics can be interpreted as being generated by choices made storm Detector that includes an algorithm inspired by epide- by the social actors represented by the nodes of the graph. miological surveillance systems using real-world data from This study demonstrated ways in which network structure a firestorm (Drasch et al. 2015). reacts to users posting and sharing content. While exam- Sentiment analysis was applied to analyze the emotional ining the complete dynamics of the Twitter information shape of moral discussions in social networks (Brady network, the authors showed where users post and reshare et al. 2017). It has been argued that moral–emotional lan- information while creating and destroying connections. guage increased diffusion more strongly. Highlighting the Dynamics of network structure can be characterized by importance of emotion in the social transmission of moral steady rates of change, interrupted by sudden bursts (Myers ideas, the authors demonstrate the utility of social network et  al. 2012). Network dynamics were modeled as a class methods for studying morality. A different approach is to of statistical models for longitudinal network data (Snijders measure emotional contagion in social media and networks 2001). Dynamics of online firestorms were analyzed using by evaluating the emotional valence of content the users an agent-based computer simulation (ABS) (Hauser et al. are exposed to before posting their own tweets (Ferrara 2017)—information diffusion and opinion adoption are trig- and Yang 2015). Modeling collective sentiment on Twitter gered by negative conflict messages. 1 3 59 Page 4 of 13 Social Network Analysis and Mining (2022) 12:59 Table 1 Firestorm events sorted by number of tweets 2.3 Classification in machine learning Firestorm hashtag/mention Tweets Users First day In order to efficiently analyze big data, machine learning #whyimvotingukip 39,969 32,382 2014-05-21 methods are used, with the goal of learning from experience #muslimrage 15,721 11,952 2012-09-17 in certain tasks. In particular, in supervised learning, the #CancelColbert 13,277 10,353 2014-03-28 goal is to predict some output variable that is associated with #myNYPD 12,762 10,362 2014-04-23 each input item. This task is called classification when the @TheOnion 9959 8803 2013-02-25 output variable is a category. Many standard classification @KLM 8716 8050 2014-06-29 algorithms have been developed over the last decades, such #qantas 8649 5405 2011-10-29 as logistic regression, random forests, k nearest neighbors, @David_Cameron 7096 6447 2014-03-06 support vector machines and many more (Friedman et al. suey_park 6919 3854 2014-03-28 2001; James et al. 2014). @celebboutique 6679 6189 2012-07-20 Machine learning methods have been used widely for @GaelGarciaB 6646 6234 2014-06-29 studying users’ behavior on social media (Ruths and Pfef- #NotIntendedtobeaFactualStat. 6261 4389 2011-04-13 fer 2014), predicting the behavior of techno-social systems #AskJPM 4321 3418 2013-11-14 (Vespignani 2009) and predicting consumer behavior with @SpaghettiOs 2890 2704 2013-12-07 Web search (Goel et al. 2010). Moreover, such methods are #McDStories 2374 1993 2012-01-24 also used in identifying relevant electronic word of mouth #AskBG 2221 1933 2013-10-17 in social media (Vermeer et al. 2019; Strathern et al. 2021). #QantasLuxury 2098 1658 2011-11-22 #VogueArticles 1894 1819 2014-09-14 2.4 Mixed approaches @fafsa 1828 1693 2014-06-25 @UKinUSA 142 140 2014-08-27 More recent approaches analyze online firestorms by analyz- ing both content and structural information. A text-mining study on online firestorms evaluates negative eWOM that demonstrates distinct impacts of high- and low-arousal decahose, a random 10% sample of all tweets. This is a emotions, structural tie strength, and linguistic style match (between sender and brand community) on firestorm poten- scaled up version of Twitter’s Sample API, which gives a stream of a random 1% sample of all tweets. tial (Herhausen et al. 2019). Online Firestorms were studied to develop optimized forms of counteraction, which engage Mention and retweet networks based on these sam- ples can be considered as random edge sampled networks individuals to act as supporters and initiate the spread of positive word of mouth, helping to constrain the firestorm (Wagner et al. 2017) since sampling and network construc- tion is based on Tweets that constitute the links in the as much as possible (Mochalova and Nanopoulos 2014). By monitoring psychological and linguistic features in the network. As found by Morstatter et al. (2013), the Sample API (unlike the Streaming API) indeed gives an accurate tweets and network features, we combine methods from text analysis, social network analysis and change detection to representation of the relative frequencies of hashtags over time. We assume that the decahose has this property as early detect and predict the start of a firestorm. well, with the significant benefit that it gives us more sta- tistical power to estimate the true size of smaller events. 3 Data The dataset consists of 20 firestorms with the  high- est volume of tweets as identified in Lamba et al. (2015). To address our research question, we examined 20 different Table  1 shows those events along with the number of tweets, number of users, and the date of the first day of firestorms. Some are directed against individuals and a sin- gle statement; some are against companies, campaigns and the event. The set of tweets of each firestorm covers the first week of the event. We also augmented this dataset via marketing actions. They have all received widespread public attention in social media as well as mainstream media. As including additional tweets, of the same group of users, during the same week of the event (7 days) and the week shown in Table 1, there are hashtags and also @mentions that name the target. before (8 days), such that the volume of tweets is balanced between the 2 weeks (about 50% each). The fraction of 3.1 Dataset firestorm-related tweets is between 2 and 8% of the tweets of each event (Table 1)—it is important to realize at this We used the same set of firestorms as in Lamba et  al. point that even for users engaging in online firestorms, (2015), whose data source is an archive of the Twitter 1 3 Social Network Analysis and Mining (2022) 12:59 Page 5 of 13 59 this activity is a minor part of their overall activity on the during firestorms compared to non-firestorm periods. Fur - platform. thermore, we would like to know which lexical items differ Thus, for each of the 20 firestorms, we have three types of in different phases. We extracted 90 lexical features for each tweets: (1) tweets related to the firestorm, (2) tweets posted tweet of each of the 20 firestorms. We used variables that 1 week before the firestorm and (3) tweets posted during the give standard linguistic dimensions (percentage of words firestorm (same week) but not related to it. Let us denote in the text that are pronouns, articles, auxiliary verbs) and these three sets of tweets T , T and T , respectively . informal language markers (percentage of words that refer 1 2 3 For each event, we also extracted tweets metadata includ- to the category assents, fillers, swear words, netspeak). To ing timestamp, hashtags, mentions and retweet information discover sentiments, we also used the variables affective (user and tweet ID). processes, cognitive processes, perceptual processes. The categories provide a sentiment score of positivity and nega- tivity to every single tweet. We also considered the category 4 Linguistic features posemo and negemo to see if a tweet is considered positive or negative. We also constructed our own category ‘emo’ Negative word-of-mouth sometimes contains strong emo- by calculating the difference between positive and negative tional expressions and even highly aggressive words against sentiments in tweets. Thus, weights of this category can a person or a company. Hence, the start of a firestorm might be negative and should describe the overall sentiment of be indicated by a sudden change of vocabulary and emo- a tweet. tions. Do people become emotionally thrilled and can we These categories each contain several subcategories that find changes in tweets? Can we capture a change of perspec- can be subsumed under the category names. The category of tive in the text against a target? Emotionality is reflected in personal pronouns, for example, contains several subcatego- words, the first analysis is based on the smallest structural ries referring to personal pronouns in numerous forms. One unit in language: words (Bybee and Hopper 2001). of these subcategories ‘I,’ for example, includes—besides the pronoun ‘I’—‘me,’ ‘mine,’ ‘my,’ and special netspeak 4.1 Extraction of features forms such as ‘idk’ (which means “I don’t know”). Netspeak is a written and oral language, an internet chat, To extract linguistic features and sentiment scores we use the which has developed mainly from the technical circum- Linguistic Inquiry Word Count classification scheme, short stances: the keyboard and the screen. The combination of LIWCTool (Pennebaker et al. 2015). In this way, first tex- technology and language makes it possible to write the way tual differences and similarities can be quantified by simple you speak (Crystal 2002). word frequency distribution (Baayen 1993). Furthermore, to Finally, for each individual subcategory, we obtain the understand emotions in tweets we use the sentiment analysis mean value of the respective LIWC values for the firestorm provided by the LIWCTool. Essentially, sentiment analysis tweets and the non-r fi estorm tweets. Comparing these values is the automatic determination of the valence or polarity of gives first insights about lexical differences and similarities. a text part, i.e., the classification of whether a text part has a positive, negative or neutral valence. Basically, automatic 4.2 Comparing firestorm and non‑firestorm tweets methods of sentiment analysis work either lexicon based or on the basis of machine learning. Lexicon-based meth- In order to explore how the linguistic and sentiment features ods use extensive lexicons in which individual words are of tweets change during firestorms, we perform compari- assigned positive or negative numerical values to determine sons between firestorm tweets and non-firestorm tweets with the valence of a text section (usually at the sentence level) regard to the individual LIWC subcategories. The firestorm of a text part (mostly on sentence level) (Tausczik and Pen- tweets (T ) were compared with tweets from the same user nebaker 2009). accounts from the week immediately before the firestorm LIWC contains a dictionary with about 90 output vari- (T ) and the same week of the firestorm ( T ). We used t-tests 2 3 ables, so each tweet is matched with about 90 different cat- to compare the mean value of the respective LIWC values egories. The classification scheme is based on psychologi- for the firestorm tweets and the non-firestorm tweets, where cal and linguistic research. Particularly, we were interested the level of statistical significance of those tests is expressed in sentiments to see if users show ways of aggressiveness using p-values (we used p < 0.01). Figure  2a depicts the comparisons between firestorm tweets and non-firestorm tweets with regard to the individual subcategories. Every subcategory was examined separately Comparing with (Lamba et  al. 2015), we have excluded ‘Ask- for all 20 firestorms. Thicke’ firestorm, because it has a gap of 24  h between T and T ; 2 1 hence, we added ‘suey_park’ firestorm instead. 1 3 59 Page 6 of 13 Social Network Analysis and Mining (2022) 12:59 Fig. 2 Comparison between firestorm-related tweets (T1) and non-firestorm tweets (T2 askbg and T3) w.r.t various linguistic features using T-tests with p askjpm value < 0.01 cancelcolbert celebbou�que david_cameron fafsa gaelgarciab klm mcdstories muslimrage mynypd no�ntendedtobeafact qantas qantasluxury spaghe‚os suey_park theonion ukinusa voguear�cle s whyimvo�nguki p significantly LOWER in firestorm-tweets w.r.t non-firestorm tweets no significant diff erence between firestorm-tweets and non-firestorm tweets significantly HIGHER in firestorm-tweets w.r.t non-firestorm tweets (a) Resultsof comparison lower 15 811158 16 5187 2 same 030 05 21211 higher 599 572 14 01217 (b) Numberof firestorms The blue (turquoise) cells represent the firestorms in while in 15 firestorms these words were used significantly which terms from the respective category occurred more less. Similar results are observed for category ‘she/he’ In frequently during the firestorms. The red (brick) cells rep- addition to the category ‘I’, the categories ‘posemo’ and resent the firestorms in which the same words occurred less ‘negemo’ should also be highlighted. Words representing frequently during the firestorms. The light gray cells repre- positive emotions like ‘love,’ ‘nice,’ ‘sweet’—the ‘posemo’ sent the firestorms in which there is no significant difference category—are used significantly less in almost all firestorms: between firestorm tweets and non-firestorm tweets. positive emotions were less present in 16 out of 20 fire- The results of comparison are aggregated in the table in storms. For the category ‘negemo,’ which contains words Fig. 2b, which shows, for each feature, the number of fire- representing negative emotions, this effect is reversed for storms according to the three cases of comparison: lower, all tweets—words in this category are used significantly higher and same (no significant difference). more often during most of the firestorms (14 out of 20). Results For category ‘I’ this means that in five firestorms There are 18 firestorms in which the ‘emo’ values were sig - people used words of this category significantly more often, nificantly lower during a firestorm. At the same time, there 1 3 we we you you shehe shehe they they posemo posemo negemo negemo emo emo netspeak netspeak assent assent Social Network Analysis and Mining (2022) 12:59 Page 7 of 13 59 Fig. 3 Evolution of network features over time (#myNYPD firestorm). Highlighted area indicates the start of the fire- storm (first 24 h) were only two firestorms where the differences in the values Moreover, at each time point we construct mention networks, of ‘emo’ were not significant. Another remarkable category and retweet networks taking into account all the tweets dur- is ‘assent,’ which contains words like ‘agree,’ ‘OK,’ ‘yes.’ ing the last 12 h. This way, we obtain a moving window of In this category, the effect is also reversed—words in this tweets: with a window size of 12 slices at steps of 1 h. The category are used significantly more often during almost all mention network of each moving window contains an edge firestorms (17 out of 20). Interpretation. We can state that ( user , user ) if a tweet (among tweets under consideration) 1 2 during firestorms, the I vanishes and users talk significantly posted by user contains a mention to user . The retweet 1 2 less about themselves compared to non-firestorm periods. network of each moving window contains an edge ( user , Simultaneously, the positivity in firestorms tweets vanishes user ) if a tweet (among tweets under consideration) posted and negativity rises. by user is a retweet of another (original) tweet posted by user . For each event, the mention networks constructed at dif- 5 Mention and retweet networks ferent time points are directed, unweighted networks. We performed several types of social network analysis and Besides linguistic features and sentiments expressed in extracted a set of metrics, including: tweets, online firestorms have also impact on the struc- ture of user’s social networks, such as mention and retweet • Number of nodes N and edges E, networks. Average out-degree (which equals avg. in-degree). To get insight on the evolution of each firestorm over • Maximum out-degree and maximum in-degree. time, we first split the time-line of each of the firestorm data- Relative size of the largest connected component. sets into buckets of one hour and assign tweets to buckets based on their timestamp. The result of this splitting is a Each of the aforementioned features leads to a time-series series of about 360 time slices (since the studied time-span when taken over the entire time-span of the event. For exam- of an event is 15 days). This allows us to perform analysis ple, Fig. 3 depicts some of those time-series for the features at fine granularity. of the mention and retweet networks of #myNYPD fire- First, at each time slice, we extract several basic features storm, showing how those features evolve over time. While of the corresponding hourly buckets of tweets, including: network metrics are affected by sampled datasets, we still believe that these metrics are meaningful since the sampling • Number of tweets N process was consistent over all firestorms. Number of mention tweets N Results One can clearly observe the oscillating behavior mt • Number of mentions N of those features. This oscillation is due to the alternation Ratio of mention tweets to all tweets N ∕N . of tweeting activity between daytime and night. More inter- mt t • Mention per tweet ratio: N ∕N . esting observation is the manifest change of behavior that m t occurs near the middle of the time span, which evidently 1 3 59 Page 8 of 13 Social Network Analysis and Mining (2022) 12:59 tweets before firestorm (T ) Fig. 4 Timeline of a firestorm firestorm tweets (T ) firestorm start observation period target = 0 target = 1 signals the beginning of the firestorm event. This apparent Thus, for each time slice, the corresponding bucket of change can be observed in most of the features for the event tweets is described by several features. Mainly, we distin- at hand. However, not all the features are useful to detect the guish between different types of features; each type of them trigger of the firestorm in all events. In particular, we find defines a prediction model: the maximum in-degree feature is one of the best features to detect this change. This feature can clearly detect the start Baseline model includes the basic features, such as of the firestorm (in all events). The maximum in-degree in number of tweets N , number of mentions N , etc. (see t m mention networks means the highest number of mentions Sect. 5). received by a particular user. • Mention-network model includes network features, such Interpretation Thus, the ability of this feature to detect a as, number of nodes and edges, density, reciprocity, aver- firestorm can be interpreted by considering that, generally age and max in-degree and out-degree, etc., extracted speaking, a firestorm occurs when one user is being men- from mention networks. tioned unusually high. This result is intuitive since Tweets • Retweet-network model includes the same set of network related to a certain Firestorm normally mention the victim’s features extracted from retweet networks. Twitter account. • Linguistic model extends the basic model by including Monitoring this feature in real-time would be certainly linguistic features, i.e., the mean values of extracted handy at detecting firestorms as early as possible, by signal- LIWC features (over the hourly bucket of tweets). In ing abnormal changes (increase) in this feature. However, particular, we are interested in the following features: the change of focus to a particular user can be the result of pronouns: namely: ‘i,’ ‘we,’ ‘you,’ ‘shehe,’ and ‘they’; different (including positive) events. emotions: ‘posemo,’ ‘negemo’ and ‘emo’; and ‘netspeak’ From a network perspective, an online firestorm occurs and ‘assent.’ when one user is mentioned unusually high, focusing on a Twitter handle or a hashtag. The maximum in-degree in @ By doing so, we create separate time series for each of the mention networks is significantly deviating from comparable features mentioned above. time periods. 6.2 Target variable 6 Predicting the start of a firestorm As shown in Fig. 4, the time span of the two sets of tweets T and T is 8 and 7 days, respectively, with an overlap of 1 2 1 In the previous section we identified slight changes in lexi- day between the two periods. We consider the first day of cal and sentimental cues as indicators of a firestorm. From a the firestorm as its start. Hence, we create a target variable network perspective, we identified the maximum in-degree whose value is 0 for the time points t occurring entirely to be a very good indicator for a firestorm to occur. Based before the firestorm (the first 7 days of T ) and 1 for the on these findings we want to test and compare our extracted time points t occurring during the first day of the firestorm. features for a classification task in order to build models for The rest of the firestorm days are omitted. Hence, we obtain predicting the start of a firestorm. about 7 × 24 = 168 time points where target = 0 (negative instances), as well as 24 points where target = 1 (positive 6.1 Prediction models (predictor variables) instances). As mentioned earlier, we split the time-line of each firestorm into buckets of one hour and assign tweets to buckets based on their timestamp. This number slightly varies from one firestorm to another. 1 3 Social Network Analysis and Mining (2022) 12:59 Page 9 of 13 59 Table 2 Pearson correlation of Basic features Network features Linguistic features basic features, network features and linguistic features with the Mention Retweet target variable (#myNYPD N 0.70 N 0.76 0.83 i 0.16 firestorm) N 0.71 E 0.80 0.86 we 0.12 mn − − − N 0.67 density 0.61 0.68 you 0.15 − − N ∕N 0.40 recip. 0.34 0.38 she/he 0.06 mt t N ∕N 0.21 lwcc 0.82 0.85 they 0.35 m t avg d 0.82 0.87 posemo 0.01 in 0.96 0.96 negemo 0.27 max d in − − max d 0.00 0.11 emo 0.23 out netspeak 0.56 assent 0.19 Our objective is thus to predict the value of this target Regarding linguistic features, most of those features have variable using the aforementioned sets of predictors. Hence, weak correlation (positive or negative), or no correlation the prediction turns into a binary classification task, where with the target variable. The highest correlations are for ‘net- we want to classify whether a time point t belongs to the speak’ (0.56) and ‘they’ (0.35). period of firestorm start (target = 1) or not (belongs to the period before the firestorm: target = 0), using different types 6.4 Design of the classification task of features of the tweets. This classification task needs to be performed for each firestorm separately and independently 6.4.1 Split into training and test sets from other firestorms. As in any supervised machine learning task, data instances 6.3 Comparing features between before and the need to be split into training and test subsets: the first is start of the firestorm used to train the classifier while the other is used to test it, i.e., to evaluate its performance. Typically, such splitting Before we dive deeper into the details of the classification of the dataset is performed in a random fashion, with, for task, it is interesting at this point to look at how different example, 75% of instances for training and the remaining predictor features correlate with our target variable (which 25% for testing. Moreover, in order to make a more reliable indicates the firestorm start). This would help us get insight evaluation, a cross validation approach is typically used, on the ability of those features to predict that target vari- such as the k-folds method. In k-folds cross-validation, the able. For this purpose, we calculate the Pearson correlation dataset is split into k consecutive folds, and each fold is of each feature with the target variable (its numeric value 0 then used once as a validation while the k − 1 remaining or 1). Table 2 shows the correlation values for the case of folds form the training set. This method generally results #myNYPD firestorm. in a less biased model compared to other methods, because We can observe that basic features—in particular, number it ensures that every observation from the original dataset of tweets N , number of mention tweets N and number of has the chance of appearing in the training and test set. t mt mentions N —have a relatively strong positive correlation However, in our firestorm dataset(s), positive and nega- with the target variable. tive classes are highly unbalanced, with a 1:7 ratio, i.e., This effect of strong positive correlation can be also for each positive instance there are 7 negative instances. observed for most of network features, such as number of To tackle this unbalanced issue, we use stratified k-folds, nodes N and edges E, relative size of largest (weakly) con- which is a variation in k-folds cross-validation that returns nected component lwcc, avg. and max. in-degree. In contrast, stratified folds, that is, the folds are made by preserving density has a strong negative correlation, which means that the percentage of samples for each class. this feature is lower at the start of the firestorm compared In this study, we opt to use k = 4 , and the dataset is to before the firestorm. On the other hand, reciprocity has split hence into 4 stratified folds. Thus, when the data - rather a weak correlation with the target variable; this corre- set contains 24 positive samples, and 168 negative ones, lation is positive for mention networks (+0.34) and negative then each fold will contain 24∕4 = 6 positive samples, and for retweet networks (-0.38). Finally, max d , the maximum about 168∕4 = 42 negative ones. The training is also per- out out-degree, has no correlation at all. formed 4 times, each time one of the folds is used as a test set while the remaining 3 folds are used as a training set. 1 3 59 Page 10 of 13 Social Network Analysis and Mining (2022) 12:59 This means that, each time, the 24 positive instances will Table 3 Accuracy of prediction models be distributed such that 6 instances will be in the test set Basic Linguistic Mention Retweet and 18 instances in the training set. This approach avoids askbg 0.926 0.916 0.958 0.953 the undesired situations where the training is performed askjpm 0.953 0.953 0.995 0.995 with very few or with too many positive instances. The cancelcolbert 0.948 0.953 0.990 0.984 overall evaluation score is calculated as the average over celebboutique 0.915 0.945 0.937 0.963 the 4 training times. david_cameron 1.000 0.995 0.995 0.995 fafsa 0.932 0.943 0.989 0.989 6.4.2 Feature scaling gaelgarciab 0.906 0.885 0.956 0.956 klm 0.891 0.902 0.907 0.907 In our case, different features have their values on very dif- mcdstories 0.943 0.938 0.923 0.961 ferent scales. For instance, regarding network features, the muslimrage 0.956 0.990 0.980 0.969 number of nodes N and edges E are usually > 10 , while −3 −2 mynypd 0.958 0.984 1.000 0.995 density is < 10 and reciprocity is < 10 . Thus, in order notintendedto. 0.990 0.984 0.989 0.989 to improve the prediction accuracy, we need to avoid some qantas 0.922 0.922 0.939 0.956 low-scale features being overwhelmed by other high-scale qantasluxury 0.932 0.943 0.972 0.972 ones; therefore, we use feature scaling in order to put the spaghettios 0.944 0.964 0.989 0.989 features roughly on the same scale. suey_park 0.943 0.948 0.990 0.995 We use the standard scaling approach, where each fea- theonion 0.974 0.974 0.989 0.989 ture is standardized by centering and scaling to unit vari- ukinusa 0.870 0.875 0.995 0.995 ance. The standard score of a sample x is calculated as: voguearticles 0.951 0.967 0.971 0.977 z =(x − (x))∕(x) where  is the mean of the samples, and whyimvotingukip 0.943 0.969 0.956 0.950 is the standard deviation of the samples. avg. 0.940 0.948 0.971 0.974 Centering and scaling happen independently on each fea- ture by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using transform. Standardization of a mention network model and linguistic model. Table 3 shows the overall accuracy for each firestorm, with respect to each dataset is a common requirement for many machine learning algorithms, as they might behave badly if the individual fea- prediction model. We can see that the prediction accuracy is pretty high in general where the accuracy is within the range tures do not roughly look like standard normally distributed data (e.g., Gaussian with 0 mean and unit variance). of 87% to 100%. For the basic model, the accuracy ranges between 87% (for 6.4.3 Algorithm ‘ukinusa’) and 100% (for ‘david_cameron’), with an average of 94%. For the linguistic model, the accuracy ranges between As a classification algorithm, we used the logistic regres- about 87% (e.g., ‘ukinusa’) and 99.5% (@David_Cameron), with an average of 95%. sion algorithm. Logistic regression is a well-known and widely used classification algorithm which extends linear Finally, the two network models, mention and retweet, show very similar results in general. The accuracy ranges between regression. Instead of fitting a straight line or hyperplane, the logistic regression model uses the logistic function to about 90% (klm) and 100% (myNYPD), with an average of 97%. Overall we can see that all the prediction models are able squeeze the output of a linear equation between 0 and 1. The logistic function is defined as: (x)= 1∕(1 + exp(−x)) to predict the start of the firestorm with very high accuracy. Interpretation Network models are slightly more accu- 6.4.4 Evaluation rate than the linguistic model, which is in turn slightly more accurate than the basic model. It is logical that in times of As an evaluation measure, we used Accuracy, which is firestorms there are a lot of mentions, hashtags and retweets, i.e., explicit network properties. Even more important and simply the fraction of correctly classified instances (to all instances). For each firestorm, the prediction accuracy is interesting is the result that we can measure early changes already in the language and that these properties are much calculated as the average of the accuracy over the 4 folds. more important for the early detection of changes. The fact that we make a comparison here should illustrate how well 6.5 Results our model works alongside other more explicit models. We applied the logistic regression algorithm to each fire- storm using different prediction models: basic model, 1 3 Social Network Analysis and Mining (2022) 12:59 Page 11 of 13 59 a company and that this is seen as less bad. Another path- 7 Conclusion way worth following would be to leverage contextualized word embeddings (Peters et al. 2018) to identify especially Our goal was to predict the outbreak of a firestorm using harmful words that demand early attention. Generally, the linguistic and network-based features. Therefore, we exam- question of what motivates people to ally against a target is ined the vocabulary of tweets from a diverse set of firestorms of great scientific and social interest. and compared it to non-firestorm tweets posted by the same Our results give insights about how negative word-of- users. Additionally, we measured features describing the mouth dynamics on social media evolve and how people mention and retweet networks also comparing firestorm speak when forming an outrage collectively. Our work con- with non-firestorm tweets. We used the features in a logistic tributed to the task of predicting outbreaks of firestorms. regression model to predict the outbreak of firestorms. The Knowing where a firestorm is likely to occur can help, for identie fi d linguistic and sentimental changes were good indi - example, platform moderators to know where an interven- cators for the outbreak of a firestorm. tion in a calming manner will be required. Ultimately, this Observing linguistic features, we found that during fire- can save individuals from being harassed and insulted in storms users talk significantly less about themselves com- online social networks. pared to non-firestorm periods which manifested in sig- nificantly fewer occurrences of self-referencing pronouns like ‘I,’ ‘me’ and the like. Simultaneously, the positivity in Funding Open Access funding enabled and organized by Projekt firestorm tweets vanishes and negativity rises. Especially DEAL. the change in the use of personal pronouns served as a good indicator for the outbreak of online firestorms. This Data availability The lists of Tweet IDs of the analyzed data can be change of subject to a different object of discussion could be shared upon request. observed in an increased mentioning of a user or a hashtag who/that was the target of a firestorm, hence the perspec- Declarations tive changes. Users start pointing at others. This expressed Conflict of interests The author(s) gratefully acknowledge the finan- itself in a maximum in-degree in mention networks that cial support from the Technical University of Munich—Institute for significantly deviated from comparable time periods giving Ethics in Artificial Intelligence (IEAI). Any opinions, findings and evidence for the pragmatic action from a network perspec- conclusions or recommendations expressed in this material are those tive. However, we are aware of the fact that we have only of the author(s) and do not necessarily reflect the views of the IEAI or its partners. measured cases in which the in-degree change happens in the context of something negative. Open Access This article is licensed under a Creative Commons Attri- Our models were able to predict the outbreak of a fire- bution 4.0 International License, which permits use, sharing, adapta- storm accurately. We were able to classify the outbreak of a tion, distribution and reproduction in any medium or format, as long firestorm with high accuracy (above 87% ) in all scenarios. It as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes showed, however, that classification models using features were made. The images or other third party material in this article are derived from the mention and retweet networks performed included in the article's Creative Commons licence, unless indicated slightly better than models based on linguistic features. otherwise in a credit line to the material. If material is not included in Overall, verbal interaction is a social process and linguis- the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will tic phenomena are analyzable both within the context of lan- need to obtain permission directly from the copyright holder. To view a guage itself and in the broader context of social behavior copy of this licence, visit http://cr eativ ecommons. or g/licen ses/ b y/4.0/ . (Gumperz 1968). From a linguistic perspective, the results give an idea of how people interact with one another. For this purpose, it was important to understand both the net- work and the speech acts. Changes in  the linguistic and References sentimental characteristics of the tweets thus proved to Allport G, Postman L (1947) The psychology of rumor. J Clin Psychol be early indicators of change in the parts of social media 3(4):402 networks studied. Besides the fact that users changed their Anderson C (2006) The long tail: why the future of business is selling perspective, we could also observe that positivity in words less of more. Hyperion Books, New York vanished and negativity increased. Appel G, Grewal L, Hadi R, Stephen AT (2020) The future of social media in marketing. J Acad Mark Sci 48(1):79–95 Future work could consider clustering firestorms accord- Auger IE, Lawrence CE (1989) Algorithms for the optimal identifica- ing to their dynamics, i.e., can firestorms be differentiated tion of segment neighborhoods. Bull Math Biol 51(1):39–54 in the way users ally against a target? This is of interest Baayen H (1993) Statistical models for frequency distributions: a lin- insofar as we know that negative PR can also mean profit for guistic evaluation. Comput Humanit 26:347–363 1 3 59 Page 12 of 13 Social Network Analysis and Mining (2022) 12:59 Bail CA, Brown TW, Mann M (2017) Channeling hearts and minds: Leskovec J, Rajaraman A, Ullman JD (2014) Mining of massive data- advocacy organizations, cognitive-emotional currents, and public sets, 2nd edn. Cambridge University Press, Cambridge conversation. Am Sociol Rev 82(6):1188–1213 McCulloh I, Carley KM (2011) Detecting change in longitudinal social Brady WJ, Wills JA, Jost JT, Tucker JA, Bavel JJV (2017) Emotion networks. J Soc Struct 12(3):1–37 shapes the diffusion of moralized content in social networks. McPherson M, Smith-Lovin L, Cook JM (2001) Birds of a feather: PNAS 114(28):7313–7318 homophily in social networks. Ann Rev Sociol 27(1):415–444 Bybee J, Hopper P (2001) Frequency and the emergence of linguistic Mochalova A, Nanopoulos A (2014) Restricting the spread of fire - structure. In: Bybee J, Hopper P (eds) Typological studies in lan- storms in social networks. In: ECIS 2014 proceedings guage, vol 45. John Benjamins Publishing Company, Amsterdam, Morstatter F, Pfeffer J, Liu H, Carley K (2013) Is the sample good pp 1–24 enough? comparing data from Twitter’s streaming API with Twit- Cartwright D, Harary F (1956) Structural balance: a generalization of ter’s firehose. In: Proceedings of the international AAAI confer - Heider’s theory. Psychol Rev 63(5):277–293 ence on web and social media, vol 7, no 1 Chadwick A (2017) The hybrid media system: politics and power, 2nd Myers S, Zhu C, Leskovec J (2012) Information diffusion and external edn. Oxford University Press, Oxford inu fl ence in networks. In: Proceedings of the 18th ACM SIGKDD Charlton N, Singleton C, Greetham DV (2016) In the mood: the international conference on knowledge discovery and data min- dynamics of collective sentiments on Twitter. R Soc Open Sci ing, pp 33–41 3(6):160162 Newman M (2010) Networks: an introduction. Oxford University Press Crystal D (2002) Language and the internet. IEEE Trans Prof Com- Inc, New York mun 45:142–144 Newman M, Barabási AL, Watts DJ (2006) The structure and dynamics Dandekar P, Goel A, Lee DT (2013) Biased assimilation, homo- of networks. Princeton University Press, Princeton phily, and the dynamics of polarization. Proc Natl Acad Sci Pariser E (2011) The filter bubble. What the internet is hiding from 110(15):5791–5796 you. The New York Press, New York Delgado-Ballester E, López-López I, Bernal-Palazón A (2021) Why Pennebaker JW, Boyd RL, Jordan K, Blackburn K (2015) The devel- do people initiate an online firestorm? the role of sadness, anger, opment and psychometric properties of LIWC2015. Technical and dislike. Int J Electron Commer 25:313–337 report, The University of Texas at Austin Drasch B, Huber J, Panz S, Probst F (2015) Detecting online firestorms Peters M, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zet- in social media. In: ICIS tlemoyer L (2018) Deep contextualized word representations. In: Ferrara E, Yang Z (2015) Measuring emotional contagion in social NAACL-HLT 2018, pp 2227–2237 media. PLOS ONE 10(11):e0142390 Pfeffer J, Zorbach T, Carley KM (2014) Understanding online fire- Friedman J, Hastie T, Tibshirani R (2001) The elements of statistical storms: negative word-of-mouth dynamics in social media net- learning. Springer series in statistics. Springer, New York works. J Mark Commun 20(1–2):117–128 Goel S, Hofman J, Lahaie S, Pennock D, Watts D (2010) Predict- Rost K, Stahel L, Frey BS (2016) Digital social norm enforcement: ing consumer behavior with Web search. In: Proceedings of the online firestorms in social media. PLoS ONE 11(6):e0155923 National Academy of Sciences. National Academy of Sciences Ruths D, Pfeffer J (2014) Social media for large studies of behavior. Section: Physical Sciences, pp 17486–17490 Science 346:1063–1064 Gonzales AL, Hancock JT, Pennebaker JW (2010) Language style Scott K (2015) The pragmatics of hashtags: inference and conversa- matching as a predictor of social dynamics in small groups. Com- tional style on Twitter. J Pragmat 81:8–20 mun Res 37(1):3–19 Scott AJ, Knott M (1974) A cluster analysis method for grouping Gumperz J (1968) The speech community. In: Duranti A (ed) Linguis- means in the analysis of variance. Biometrics 30:507–512 tic anthropology: a reader. Wiley, New York, pp 166–173 Sen A, Srivastava MS (1975) On tests for detecting change in mean. Hauser F, Hautz J, Hutter K, Füller J (2017) Firestorms: modeling con- Ann Stat 3(1):98–108 flict diffusion and management strategies in online communities. Shaikh S, Feldman LB, Barach E, Marzouki Y (2017) Tweet sentiment J Strat Inf Syst 26(4):285–321 analysis with pronoun choice reveals online community dynamics Heider F (1946) Attitudes and cognitive organization. J Psychol in response to crisis events. In: Advances in cross-cultural deci- 21:107–112 sion making. Springer, pp 345–356 Hennig M, Brandes U, Pfeffer J, Mergel I (2012) Studying social net- Sharp WG, Hargrove DS (2004) Emotional expression and modality: works. A guide to empirical research. Campus Verlag, Frankfurt an analysis of affective arousal and linguistic output in a computer Herhausen D, Ludwig S, Grewal D, Wulf J, Schoegel M (2019) Detect- vs. paper paradigm. Comput Hum Behav 20(4):461–475 ing, preventing, and mitigating online firestorms in brand com - Snijders TAB (2001) The statistical evaluation of social network munities. J Mark 83(3):1–21 dynamics. Sociol Methodol 31(1):361–395 Jackson B, Scargle JD, Barnes D, Arabhi S, Alt A, Gioumousis P, Gwin Snijders TA, Koskinen J, Schweinberger M (2010) Maximum like- E, Sangtrakulcharoen P, Tan L, Tsai TT (2005) An algorithm for lihood estimation for social network dynamics. Ann Appl Stat optimal partitioning of data on an interval. IEEE Signal Process 4(2):567–588 Lett 12(2):105–108 Stich L, Golla G, Nanopoulos A (2014) Modelling the spread of James G, Witten D, Hastie T, Tibshirani R (2014) An introduction to negative word-of-mouth in online social networks. J Decis Syst statistical learning. Springer, Cham 23(2):203–221 Johnen M, Jungblut M, Ziegele M (2018) The digital outcry: what Strathern W, Schönfeld M, Ghawi R, Pfeffer J (2020) Against the oth- incites participation behavior in an online firestorm? New Media ers! Detecting moral outrage in social media networks. In: IEEE/ Soc 20(9):3140–3160 ACM International Conference on Advances in Social Networks Killick R, Fearnhead P, Eckley IA (2012) Optimal detection of Analysis and Mining, pp 322–326 changepoints with a linear computational cost. J Am Stat Assoc Strathern W, Ghawi R, Pfeffer J (2021) Advanced statistical analysis 107(500):1590–1598 of large-scale web-based data. In: Data science in economics and Lamba H, Malik MM, Pfeffer J (2015) A tempest in a teacup? Analyz- finance for decision makers. Edited by Per Nymand-Andersen ing firestorms on Twitter. In: 2015 IEEE/ACM ASONAM. New Tausczik YR, Pennebaker JW (2009) The psychological meaning of York, NY, USA, pp 17–24 words: LIWC and computerized text analysis methods. J Lang Soc Psychol 29:24–54 1 3 Social Network Analysis and Mining (2022) 12:59 Page 13 of 13 59 Udupa S (2020) Artificial intelligence and the cultural problem of Wasserman S, Faust K (1994) Social network analysis: methods and extreme speech. Social Science Research Council (20 December applications. Cambridge University Press, Cambridge, MA 2020) Vermeer S, Araujo T, Bernritter S, Noort G (2019) Seeing the wood Publisher's Note Springer Nature remains neutral with regard to for the trees: How machine learning can help firms in identify - jurisdictional claims in published maps and institutional affiliations. ing relevant electronic word-of-mouth in social media. Int J Res Mark 36:492–508 Vespignani A (2009) Predicting the behavior of techno-social systems. Science 325:425–428 Wagner C, Singer P, Karimi F, Pfeffer J, Strohmaier M (2017) Sam- pling from social networks with attributes. In: Proceedings of the WWW conference, pp 1181–1190 1 3

Journal

Social Network Analysis and MiningSpringer Journals

Published: Dec 1, 2022

There are no references for this article.