A big data analysis of Twitter data during premier league matches: do tweets contain information valuable for in-play forecasting of goals in football?

Data-related analysis in football increasingly benefits from Big Data approaches and machine learning methods. One relevant application of data analysis in football is forecasting, which relies on understanding and accurately modelling the process of a match. The present paper tackles two neglected facets of forecasting in football: Forecasts on the total number of goals and in-play forecasting (forecasts based on within-match information). Sentiment analysis techniques were used to extract the information reflected in almost two million tweets from more than 400 Premier League matches. By means of wordclouds and timely analysis of several tweet-based features, the Twitter communication over the full course of matches and shortly before and after goals was visualized and systematically analysed. Moreover, several forecasting models including a random forest model have been used to obtain in-play forecasts. Results suggest that in-play forecasting of goals is highly challenging, and in-play information does not improve forecasting accuracy. An additional analysis of goals from more than 30,000 matches from the main European football leagues supports the notion that the predictive value of in-play information is highly limited compared to pre-game information. This is a relevant result for coaches, match analysts and broadcasters who should not overestimate the value of in-play information. The present study also sheds light on how the perception and behaviour of Twitter users change over the course of a football match. A main result is that the sentiment of Twitter users decreases when the match progresses, which might be caused by an unjustified high expectation of football fans before the match.
Keywords Big data · Data mining · Social networks · Twitter · Football forecasting · In-play forecasting

* Fabian Wunderlich, f.wunderlich@dshs-koeln.de
1 Institute of Exercise Training and Sport Informatics, German Sport University Cologne, Köln, Germany
Social Network Analysis and Mining (2022) 12:23

1 Introduction

In recent years football analysis has increasingly benefited from Big Data analysis and machine learning methods, in particular in an attempt to understand tactical behaviour and identify success-enhancing strategies (Dick and Brefeld 2019; Grunz et al. 2012; Memmert and Raabe 2018; Rein and Memmert 2016). The present paper puts the approach of Big Data analysis and machine learning into a slightly different context by incorporating Twitter data into the analysis. It focuses on in-play forecasts in football by examining the question whether information becoming available during a match is valuable to forecast the further course of events. This analysis is relevant to better understand football-related Twitter communication, to assess the role of randomness in football and valuable for coaches, match analysts and broadcasters to better understand the influence of in-play events on the further course of a match.

The forecasting literature reflects two important aspects researchers have faced when investigating predictive tasks in football. The first aspect of forecasting is statistical and related to developing team ratings and forecasting models with the best possible ability to derive forecasts from obvious predictors such as prior match results. One of the most prominent approaches is to estimate offensive and defensive strength parameters of the teams and use these as inputs for probability models including Poisson models (Koopman and Lit 2015; Maher 1982), birth process models (Dixon and Robinson 1998) and Weibull count models (Boshnakov et al. 2017). Other researchers have used regression models based on one or various covariates such as Hvattum and Arntzen (2010) using ELO ratings in combination with an ordered logit regression model or Goddard and Asimakopoulos (2004) using various covariates in an ordered probit regression model. The present approach is primarily related to the second aspect of forecasting, which is data-based and attempts to identify and investigate further sources of information that prove useful in football forecasting. One obvious source of information is betting odds (Forrest et al. 2005), which are interpreted as a forecast and used as a standard benchmark. Further sources include human forecasts (Andersson et al. 2005), prediction markets (Spann and Skiera 2009), ranking systems such as the FIFA World Ranking (Lasek et al. 2013), market values (Peeters 2018) or sets with various explanatory variables including match significance, involvement in cup competitions and geographical distance between teams (Goddard and Asimakopoulos 2004).

In the literature, football forecasting is most prominently associated with forecasting the match result in terms of win, draw or loss. This seems a little one-dimensional, in the light of the wide range of events taking place during a football match. With regard to the common win/draw/loss forecast, Koopman and Lit (2019) introduced a categorization of methods, namely models indirectly based on modelling the number of goals scored by both teams, indirectly based on modelling the goal difference or modelling the result in terms of win, draw, loss directly. Forecasting the number of goals, in that sense, is not an exotic task as models falling into the first category and often being based on Poisson distributions (Karlis and Ntzoufras 2003; Koopman and Lit 2015; Maher 1982) can easily be reused for goal forecasting. Boshnakov et al. (2017) pursue this strategy by using a Weibull count model to obtain forecasts for both match result and total number of goals. Wheatcroft (2020) uses ratings based on match statistics and logistic regression to forecast the number of goals and is, to the best of our knowledge, the only paper focusing in particular on this type of forecasting. Forecasting of total goals thus can be considered a neglected aspect in the forecasting literature, presumably driven by the fact that the match results have stronger emotional and financial consequences for the fans and teams than the total number of goals.

Another research gap in football forecasting is the investigation of forecasts made during the course of a match. This comes as a surprise as so-called in-play betting has gained significant importance for bookmakers (Killick and Griffiths 2019; Lopez-Gonzalez and Griffiths 2016). Moreover, coaches, match analysts and broadcasters are highly interested in analysing matches in-play. In fact, some researchers have examined the scoring process during the course of the match in more detail. Dixon and Robinson (1998) use a birth process model allowing scoring intensities to change during the match and depend on the score to analyse the deviations from constant scoring rates. Similarly, Heuer and Rubner (2012) use a model-free statistical analysis to investigate in which match situations scoring intensities deviate from a constant rate. Both approaches are mainly focused on understanding the process of a football match and whether certain game situations influence the scoring behaviour. None of these articles investigates in-play forecasts by calculating the effect of scoring deviations on the accuracy of in-play forecasts. To the best of our knowledge, the only paper investigating the role of in-play information in forecasting football is the recent work of Zou et al. (2020), which, however, is limited to the number of goals as the only in-play information. While our paper is limited to football, in-play models and their relation to in-play betting odds have been investigated in other sports such as tennis (Easton and Uylangco 2010; Kovalchik and Reid 2019) and cricket (Akhtar and Scarf 2012; Asif and McHale 2016). Reasons for the little effort made so far on in-play forecasting in football might be a higher model complexity, less availability of in-play betting odds as a benchmark in comparison to pre-game betting odds, and higher effort to gather and handle in-play data.

The difficulty of in-play forecasting of goals in football might be surprising because intuitively fans, experts and commentators commonly argue that they have anticipated a goal; they've seen it coming or explain it as the logical consequence of the course of play. This, however, could be a biased perception and it would be quite costly to measure the collaborative human perception of a football match and the collaborative anticipation of the further progress in an experimental approach. For that reason, we make use of an existing source of (big) data: Short textual messages from the microblogging platform Twitter with regard to a certain football match, which can be considered an in-play reflection of collaborative human perception on this match. While traditional datasets and probability models remain a predominant approach in football forecasting (Boshnakov et al. 2017; Koopman and Lit 2019; Wheatcroft 2020), researchers have also started to make use of Big Data (Brown et al. 2017) and machine learning (Berrar et al. 2019; Hubáček et al. 2019) in this domain. Twitter data itself has been used in various domains of forecasting including elections (Huberty 2015; Tumasjan et al. 2010) or stock prices (Bollen et al. 2011; Zhang et al. 2011), but its use has been discussed controversially and critically (Gayo-Avello 2013; Huberty 2015; Jungherr et al. 2011). While Twitter certainly provides the possibility to gather massive datasets, the process of actually extracting relevant information is challenging and attempts to use Twitter in football forecasting have reported mixed results (Brown et al. 2017; Godin et al. 2014; Schumaker et al. 2016). In economic and political situations, the theoretical mechanism is viable as Twitter may reflect the opinion of the users and both election results and stock prices are directly influenced by the perception of the public. In football, this mechanism is evidently not present as a team will not succeed in a match only because the public would like to see the team win. In forecasting goals in-play, however, the following mechanism is conceivable: The course of the match influences the perception of the fans that will share their opinion on Twitter. If the course of play is actually a predictor for upcoming goals, Twitter data might indeed have predictive value. Though not considering predictive aspects, some researchers have focused on analysis of in-play Twitter data in relation to football matches. It has been reported that fans' sentiments reflect reactions to goals of the own or opposing team (Yu and Wang 2015), fans tend to have a higher team identification when the team is leading than when it is trailing (Fan et al. 2020) and communication on the video assistant referee (VAR) is strongly associated with negative sentiment (Kolbinger and Knopp 2020). In contrast to the present study, however, analyses were based on highly limited sample sizes of five or fewer matches (Fan et al. 2020; Yu and Wang 2015) or on a very specific type of event during the matches, namely the VAR (Kolbinger and Knopp 2020).

The contributions of the present approach are threefold. First, a preliminary analysis sheds light on the general difficulty of in-play forecasting. Second, the topics discussed by Twitter users as well as their perception of the match over the course of football matches and as a reaction to goals are analysed by means of sentiment analysis techniques and further non-semantic tweet characteristics. Third, the possible informative value of Twitter data when used in in-play forecasting models is investigated.

2 Data and methods

2.1 Data

For a preliminary analysis, a dataset consisting of a total of 31,912 matches from 10 seasons (07/08–16/17) in 10 major European leagues (first divisions of England, Spain, Germany, Italy, France, Portugal, Belgium, Turkey, the Netherlands and Greece) was used. Data were obtained from http://football-data.co.uk and included the following information for each match: Teams involved, date, halftime score, final score and betting odds for over-under 2.5 goals. Betting odds can be interpreted as an aggregated market forecast and reflect a very strong benchmark for forecasting models in football (Hvattum and Arntzen 2010; Štrumbelj and Šikonja 2010). Over-under bets reflect betting opportunities on the total number of goals, in this case with the possibility to bet on two or fewer, or three or more, goals. For the main analysis, data of Premier League football matches were obtained in the period from 22 February 2019 to the last game before the interruption of the league caused by the COVID-19 pandemic on 09 March 2020. Removing three matches due to missing data, this adds up to a total of 404 matches, representing a smaller, but richer dataset. Data source and information included are analogous to the previously mentioned dataset. Additionally, it includes betting odds for over-under 1.5 goals in the second half collected from http://www.oddsportal.com and meta-data for all goals scored (namely the current score as well as the minute of the goal) collected from the official website of the English Premier League http://premierleague.com. Moreover, short textual messages (so-called tweets) were obtained from the microblogging platform Twitter for each match covering the day of the game and including the official match hashtag (e.g. #ARSMUN for the match Arsenal vs. Manchester United) making use of the official Twitter API (Twitter API 2020). Information on the tweets includes the textual data itself as well as the exact date of creation which we relabelled to the time within the match (i.e. -30 for a tweet created 30 min prior to the match and 38.5 for a tweet created after 38.5 min of match time). A total of 3,139,441 tweets were collected; the final analysis only included tweets written one hour prior to the match or during the actual match time, adding up to 1,765,379 tweets. Please note that both the Twitter Search API used within this study and the real-time Streaming API only provide a sample of the full available data.

2.2 Feature extraction

The tweets in the final analysis alone comprise more than 25 million words, and due to the volume and the highly unstructured nature of textual data, it is not straightforward to extract machine-useable information, which underlines the importance of an elaborate feature extraction process. English tweets including the official hashtag of a match have been collected. No tweets in other languages and no additional hashtags or search terms related to the match or any official or unofficial hashtags related to the Premier League and the clubs were considered. Pre-processing of tweets included removing content other than evaluable words, like URLs, mentions, punctuation, hashtag signs, emoticons, characters and digits. Moreover, known contractions and acronyms were replaced with full forms and intentionally misspelled words (e.g. “niiiiiiiice”) were doubled (“nice nice”) to correct for the expressed intensification. Cleaned tweets were then analysed by three different lexicon-based sentiment analysis methods, namely the commercial LIWC 2015 software (Pennebaker et al. 2015) as well as the QDAP dictionary (Rinker 2013) and the SenticNet4 lexicon based on the work of Cambria et al. (2016). Finally, an average score of positivity and negativity was assigned to each tweet. For more details, we refer to Wunderlich and Memmert (2020), who validated this exact method using football-specific Twitter data and reported a reasonable accuracy in applications with a sufficiently high number of tweets. The sentiment of a tweet is defined as the difference between the positivity and the negativity score. Sentiment has been included for analyses, but excluded as a feature in classification methods for reasons of multicollinearity and redundancy. Further non-semantic features extracted from the tweets are the average number of words (based on tweets after pre-processing), hashtags and emoticons included as well as the tweet intensity, simply referring to the number of tweets. As a further in-play feature, the total number of goals scored is considered. Pre-game features are the probability for over 2.5 goals and the probability for over 1.5 goals in the second half as obtained from the pre-game betting odds by converting decimal odds to probabilities (cf. Wunderlich and Memmert 2018). The corresponding counter-probabilities (under 2.5/1.5 goals) are not considered for reasons of redundancy. Table 1 summarizes the features used for further analysis.

2.3 Normalization

Throughout our analysis, two different forms of normalization are used for the Twitter-based features. Using the example of number of words in a tweet, let $w_{mit}$ be the number of words of each of the $n$ tweets $i = 1, \dots, n$ in match $m$ at time $t$ (where $t$ can be a minute of play or some longer time interval such as the first half). Then we define $w_{mt}$ to be the average number of words in all tweets from that time interval. Let $w_m$ be the average number of words in the last hour prior to the match, then $\tilde{w}_{mt} = w_{mt} / w_m$ is the average number of words for time interval $t$ in match $m$ normalized for pre-match data. This normalization will be denoted as pre-match normalized throughout the paper. Further, let $\bar{w}_t$ be the average number of (pre-match normalized) words in time interval $t$ across all matches, then $\hat{w}_{mt} = \tilde{w}_{mt} / \bar{w}_t$ is the number of words normalized for pre-match data and additionally for time, which will simply be denoted as time normalized subsequently.

This definition can be used in complete analogy for the other Twitter-based features, except for the number of tweets (tweet intensity), where average values do not apply. With regard to the tweet intensity, the number of tweets per minute in a certain time interval is divided by the number of tweets per minute in the last hour before the match. A pre-match normalized tweet intensity of 2 in the first half thus means that during the first half the number of tweets per minute was twice as high as before the match. Time normalization is then performed by dividing by the number of tweets per minute across all matches. Reasons and consequences of normalizing the data in the above way as well as the results based on normalized data will be outlined in the Analysis section.

2.4 Random forest model

Random forests are an ensemble learning method based on the idea of using a multitude of decision trees (a so-called forest) going back to the work of Ho (1995). Current applications commonly refer to the method developed by Breiman (2001). Random forest methods have already been applied to forecasting in football (Schaumberger and Groll 2018) and other sports (Lessmann et al. 2010). For the present analysis, random forests were implemented in Python using the RandomForestClassifier from the package sklearn.ensemble. The following hyperparameters were tested with regard to the random forest classifier: The number of trees per forest (n_estimators ranging from 250 to 1000 in steps of 250) and the maximum depth of a tree (max_depth ranging from 1 to 8). Detailed results of the hyperparameter tuning and the effect of parameters on the results are discussed in the Analysis section.

2.5 Cross validation

To validate the accuracy of forecasting models, k-fold cross validation was used, i.e. the data were split into k subsamples using one of the subsamples as test set and the remaining data as training set (Browne 2000). The choice of the number of subsamples k is a trade-off dependent on the time required for training and the total size of the data sample available. For the analysis of small time intervals data were split into 17 subsamples resulting in training sets of 6464 time intervals and test sets of 404 time intervals each.
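The split sizes quoted above can be reproduced with a small index-based sketch. It assumes 6868 time intervals in total (the sum of the stated training and test set sizes) and assigns consecutive, unshuffled blocks to folds; the paper does not state how intervals were assigned to subsamples, so both the function name `k_fold_indices` and the assignment rule are illustrative assumptions, not the authors' procedure.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k consecutive folds.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    Assumption: folds are contiguous blocks without shuffling.
    """
    base, remainder = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        # The first `remainder` folds receive one extra sample.
        size = base + (1 if fold < remainder else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


# 6868 = 6464 + 404 time intervals, split into k = 17 subsamples.
folds = k_fold_indices(6868, 17)
train_sizes = {len(train) for train, _ in folds}
test_sizes = {len(test) for _, test in folds}
```

With these inputs every fold has the same size, because 6868 is an exact multiple of 17.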
Table 1 Features used for forecasting models throughout the paper

| Feature | Source | Time |
|---|---|---|
| Probability over 2.5 goals | Betting market | Pre-game |
| Probability over 1.5 goals 2nd half | Betting market | Pre-game |
| Number of goals scored 1st half | Match data | In-play |
| Average negativity score | Twitter | In-play |
| Average positivity score | Twitter | In-play |
| Tweet intensity | Twitter | In-play |
| Average number words | Twitter | In-play |
| Average number hashtags | Twitter | In-play |
| Average number emoticons | Twitter | In-play |

2.6 Forecasting accuracy

Statistical measures of forecasting accuracy are based on the idea of quantifying the difference between the forecasted probabilities and the actual outcomes. Driven by the inconsistent use of such measures in the literature, Constantinou and Fenton (2012) have assessed this topic and proposed to use the Rank Probability Score (RPS) as an adequate measure for forecasting models in football. As the over-under market just possesses two possible outcomes, the RPS can be simplified to

$$\mathrm{RPS} = (p_1 - o_1)^2$$

where $p_1$ is the forecasted probability of outcome 1 and $o_1$ equals 1 if outcome 1 occurred and 0 otherwise. Due to the symmetry of binary forecasts this is equivalent to calculating RPS based on the second outcome. While being the standard approach, RPS is not undisputed and also has weaknesses which have been demonstrated by Wheatcroft (2019), who suggests using the ignorance score instead, defined as

$$\mathrm{IGN} = -\log_2(p_i)$$

where $p_i$ is the forecasted probability of the actual outcome $i$. For reasons of simplicity we only report the average RPS for each model in the Results section, but we have tested results for robustness by repeating analysis with IGN and did not experience any differences with regard to the main conclusions of the paper.
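As a minimal sketch of the two accuracy measures for a binary over-under market, the following functions compute the simplified RPS and the ignorance score. The function names are illustrative and not taken from the paper. A constant 50% forecast scores an RPS of 0.25 whatever the outcome, which agrees with the 0.2500 reported for the UNI benchmark in Table 2.

```python
import math


def binary_rps(p1, o1):
    """Simplified RPS for two outcomes: squared error of the forecast for outcome 1."""
    return (p1 - o1) ** 2


def ignorance(p_actual):
    """Ignorance score: negative base-2 log of the probability given to the actual outcome."""
    return -math.log2(p_actual)


def mean_binary_rps(probabilities, outcomes):
    """Average simplified RPS over a set of forecasts."""
    scores = [binary_rps(p, o) for p, o in zip(probabilities, outcomes)]
    return sum(scores) / len(scores)


# A 50% forecast yields the same RPS whether or not outcome 1 occurs.
uniform_rps = mean_binary_rps([0.5, 0.5], [1, 0])  # 0.25
uniform_ign = ignorance(0.5)                        # 1.0 bit
```

Lower values are better for both measures; the ignorance score penalizes confident wrong forecasts much more strongly, since it grows without bound as the probability assigned to the actual outcome approaches zero.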
2.7 Bootstrapping

To avoid any assumptions about the theoretical distribution of the data, bootstrapping methods with 10,000 resamples were used to calculate confidence intervals and as an alternative to parametric hypotheses tests when comparing forecasting models in Tables 2 and 4 as well as in Figs. 2 and 3 with regard to sections Time dependence and Goal analysis. For an overview on bootstrapping methods and details on the calculations, we refer to Efron and Tibshirani (1994). We highlight p-values falling below a significance level of 5% as significant throughout the paper.

3 Analysis

3.1 Preliminary analysis: difficulty of in-play forecasting

To demonstrate the general idea of probabilistic in-play forecasting of goals and the difficulty of this task, the large dataset of more than 30,000 matches as defined in the Data section is used. The first aspect to consider is to what extent the total number of goals is predictable at all or just reflects pure random processes. Figure 1 shows a box plot of the pre-game probabilities for over 2.5 goals obtained from the betting odds. If there were no different expectations for the number of goals in a match, betting odds and thus probabilities would be constant for all matches. The dispersion of values proves that there are indeed different goal expectations; however, the expectations seem to be rather homogeneous as only one in ten matches has a lower probability than 40.3%, and only one in ten matches has a higher probability than 60.4%.

Fig. 1 Boxplot illustrating the distribution of probabilities for over 2.5 goals across the dataset

Besides the betting odds, which in principle only reflect differences in the expectations, it should also be possible to find evidence in the results directly. If matches with a systematically higher or lower total scoring intensity exist, the number of goals in the first half and second half should be correlated. Correlation was found to be r = 0.05 (t(31,910) = 13.98, p < 0.001), which is evidence that systematic differences in the goal expectations exist, but given the very small correlation coefficient a predominant influence of randomness exists. In summary, systematic differences in terms of expected scoring intensities across matches exist, but are highly limited.

A general predictability of the number of goals, however, does not necessarily imply that real in-play forecasting is possible at all. Thus, the next question is whether the goal expectation is predefined at the start of the match or whether information becoming available during the match helps to forecast the further course of events. In order to investigate this question, forecasts for the number of goals (i.e. probability for over 1.5 goals) in the second half are performed based on two variables: The pre-game probability for over 2.5 goals as a market reflection of goal expectation available prior to the match as well as the total number of goals actually scored in the first half as in-play information. Different numbers of goals (i.e. 2.5 goals for the complete match and 1.5 goals for the second half) were chosen to consider the option with most balanced probabilities differing due to the remaining match time. The data sample was split into 5 seasons of in-sample data (15,844 matches) and 5 seasons of out-of-sample data (16,068 matches). A total of five different forecasting models are analysed: The common naïve benchmark models UNI attaching a probability of 50% to over 1.5 goals for each match and FRQ using the observed frequency of over 1.5 goals in the in-sample data for each match (cf. Hvattum and Arntzen 2010) as well as three logistic regression models using the probability of over 1.5 goals as dependent variable and the pre-game probability for over 2.5 goals as obtained from the betting odds (PROB), the total number of goals scored in the first half (GOAL) or both variables (BOTH) as independent variables. Table 2 presents the average rank probability scores for each forecasting model when using the estimated model parameters to obtain forecasts for all matches in the out-of-sample dataset. In addition, pair-wise p-values comparing the RPS values across the models are presented.

Table 2 Results for various models forecasting over-under 1.5 goals in the second half

| Model | Information | RPS | p vs. UNI | p vs. FRQ | p vs. GOAL | p vs. PROB |
|---|---|---|---|---|---|---|
| UNI | None | 0.2500 | – | – | – | – |
| FRQ | Pre-game | 0.2485 | 0.0069* | – | – | – |
| GOAL | In-play | 0.2480 | 0.0012* | 0.0467* | – | – |
| PROB | Pre-game | 0.2434 | < 0.0001* | < 0.0001* | < 0.0001* | – |
| BOTH | Both | 0.2433 | < 0.0001* | < 0.0001* | < 0.0001* | 0.6073 |

*p-value lower than 5%

The forecasting results paint a clear picture and suggest that in-play information does have some very weak predictive value when compared to simple benchmarks, but no additional value when controlling for pre-game information. The model PROB using solely the pre-game expectation significantly outperforms both benchmarks and the model GOAL using only in-play information. Once the pre-game information is included, the average rank probability score for the model hardly improves when in-play information on the number of goals is added. Despite the large database the model BOTH using both pre-game and in-play information fails to significantly outperform PROB. The above results are clear evidence for the difficulty of forecasting the total number of goals in-play, yet it is not clear whether the small benefit of in-play information is based on the fact that the goal expectation is predefined prior to the match or that prior goals are just not a useful in-play predictor. For that reason, Twitter data as a potential source of in-play information are analysed in the next section. Before turning to forecasts from Twitter data, the data are analysed with respect to several other aspects, namely factors influencing tweet intensity, and the effect of time and goals on Twitter communication.

3.2 Twitter analysis

3.2.1 Match-based analysis of tweet intensity

A first qualitative observation in the Twitter data is that differences across matches seem to only partly depend on in-play events as even before the start massive differences can occur. As an extreme example the tweet intensity for the match Manchester vs. Arsenal was more than 100 times higher than for the match Brighton vs. Burnley both pre-match and in-play. For this reason, a closer look at the reasons for varying tweet intensities in the matches shall be given. In particular, four factors potentially having influence on tweet intensity will be analysed:

3.2.1.1 Popularity First, we use the average number of spectators at home matches of a team as an estimation of the general popularity of this team. The numbers were obtained from https://www.weltfussball.de and as the dataset contains matches from 18/19 to 19/20 season, the spectator numbers were averaged across both seasons. Moreover, spectators were normalized by the maximum number of spectators of any team, which yields popularities ranging from 1.0 for Manchester United with the highest number of spectators to 0.14 for Bournemouth being the least popular team. The number of spectators arguably is not a perfect representation of popularity given that the capacity of a stadium can be a highly confounding factor. Still, it can be assumed to be an easily available and transparent measure with a reasonably high correlation to popularity. To account for both teams' popularities in a match concurrently, the popularity of a match is determined by multiplying the popularity of both teams.

3.2.1.2 Goals Match events in general and goals in particular can be assumed to stimulate Twitter activity. Thus, the
Thus, the 1 3 Social Network Analysis and Mining (2022) 12:23 Page 7 of 15 23 total number of goals is used as the second factor potentially influencing the tweet intensity. 3.2.1.3 Scoreline The scoreline during a match determines whether the game has already been decided or whether the results is still open. This may take influence on the behav - iour of Twitter users in several ways. It could be argued that a close scoreline captivates the audience and stimulates tweet intensity. At the same time, matches being decided early may stimulate early analysis of results including joy about an upcoming victory or discussing reasons for a lost match. We summarize the scoreline of a match by summing up the length of time intervals in a match where both teams differ at least by two goals and consequently a single goal would not significantly alter the match outcome. For exam- ple, a match ending 2–0 with the second goal being scored after exactly 60 min has been “decided” for 30 min. 3.2.1.4 Weekend Finally, external factors neither related to the teams, nor related to the events in a match may have an influence on the possibility and motivation for fans to watch and tweet on football matches. Thus, we introduce a dummy variable indicating whether a match took place on a weekend (Saturday or Sunday) or during the week (Mon- day–Friday). Three linear regression models were fitted using the tweet intensity (pre-game, in-game, and total, respectively) as dependent variable and the four factors as independ- ent variables. All three regression models indicate a sig- nificant influence of the factors on the tweet intensities: F(4,399) = 91.72, p < 0.001, R = 0.474 for pre-game adj tweet intensity; F(4,399) = 88.62, p < 0.001, R = 0.465 adj for in-game tweet intensity; F(4,399) = 92.54, p < 0.001, R = 0.476 for total tweet intensity. The detailed results adj for each factor are summarized in Table 3. 
Results are evidence that tweet intensity both pre-game and in-game is highly significantly (p < 0.001) influenced by popularity, while there is no significant influence of the weekend on the number of tweets. Naturally, Goals and Scoreline, being linked to match events unknown pre-game, do not possess any significant influence on pre-game intensities. In-game, however, the number of goals significantly increases (p < 0.01) the number of tweets, indicating an increased stimulation of tweets via goals that will be analysed in more detail in the section Goal analysis. Scoreline does not have a significant influence on in-game tweet intensity (p = 0.10); however, there is a slight tendency of more tweets in case of already decided matches. Driven by the larger number of in-game tweets, the results for total tweet intensity are largely consistent with the results for in-game tweet intensity.

Table 3 Results for the linear regression model using tweet intensities as dependent variable on a match level. Separate models are summarized for tweet intensity pre-game, tweet intensity in-game and total tweet intensity

Dependent variable: IntensityPreGame
Variable      Coefficient   Standard error   beta     t        p-value
Popularity    3324.44       174.95           0.69     19.00    < 0.001*
Goals         −3.56         20.11            −0.01    −0.18    0.86
Scoreline     1.11          1.40             0.03     0.79     0.43
Weekend       −59.89        66.80            −0.03    −0.90    0.37
Constant      −238.93       90.94            –        −2.63    < 0.01*

Dependent variable: IntensityInGame
Variable      Coefficient   Standard error   beta     t        p-value
Popularity    20,389.08     1116.81          0.67     18.26    < 0.001*
Goals         363.58        128.35           0.12     2.83     < 0.01*
Scoreline     14.88         8.97             0.07     1.66     0.10
Weekend       162.45        426.42           0.01     0.38     0.70
Constant      −2845.20      580.52           –        −4.90    < 0.001*

Dependent variable: Intensity (total)
Variable      Coefficient   Standard error   beta     t        p-value
Popularity    23,713.52     1263.47          0.68     18.77    < 0.001*
Goals         360.02        145.20           0.10     2.48     0.014*
Scoreline     15.99         10.15            0.06     1.56     0.12
Weekend       102.56        482.42           0.01     0.21     0.83
Constant      −3084.13      656.75           –        −4.70    < 0.001*

*Significant at 5% level

Given the heterogeneity of matches in terms of large differences in tweet intensity and the fact that other features (yet to a lower degree) differ pre-match, it seems unreasonable to draw any conclusion about in-match processes from non-normalized values. For this reason, the subsequent analysis is based on pre-match normalized data as explained in the Method section.

3.2.2 Time dependence

Before considering potential predictive value, it seems reasonable to take a more general look at what happens over the course of the match and directly before and after goals. Figure 2 illustrates the evolvement of features over time during the matches for ten time intervals (the hour pre-game and nine intervals of 10 min each within the match), as well as 95% confidence intervals. Please note that pre-game values equal 1.0 for each feature due to the pre-match normalization.

The tweet intensity jumps when the match starts and slightly increases over the course of the match. Interestingly, the overall sentiment of tweets decreases due to decreasing positivity and increasing negativity. Tweets get shorter once the match starts as the number of words drops after the kick-off. The number of hashtags and emoticons decreases as well, which can only partly be attributed to the shorter tweets. Confidence intervals are hardly visible in the figure except for the tweet intensity, where the sample size is about 400 matches compared to almost 2 million tweets for the other features. The narrow confidence intervals (even for the tweet intensity) suggest that all results are highly robust. Some interesting conclusions can be drawn from the evolvement of features: First, if analysing the time intervals before and after goals, a useful normalization for time is needed. Therefore, the time normalized data, as described in the Method section, is used for all further analyses. Second, football fans seem to be the happiest before the kick-off and a football match does not seem to be good for the mood (at least of tweeters). In summary, one can say that anticipation clearly is the most beautiful kind of joy.

Fig. 2 Evolvement of features over the course of the match. Asterisks indicate a p-value of lower than 0.05 when comparing the respective time interval to pre-game

3.2.3 Goal analysis

Usage of the time normalized data makes it possible to take a direct look at what happens before and after goals are scored. Therefore, a minute value with respect to goals was assigned to each tweet. Negative values were attached to tweets that were posted within the last 10 min before a goal (e.g. −7 if the tweet was posted 7 min before the goal). Analogously, positive values were attached to tweets posted within the 10 min following a goal and 0 if the tweet was posted in the same minute as the goal. Tweets that were posted before or after several different goals cannot be unambiguously assigned and thus were excluded from analysis. Tweets that are not close in time to any goal were put into an additional category and used as a benchmark. Figure 3 illustrates what happens to the various Twitter features in the 10 min before and after a goal. The dotted horizontal line refers to the benchmark of tweets that are independent from goals, and the grey areas refer to 95% confidence intervals.

The analysis includes a total of 1118 goals scored in the matches from our dataset. Again, confidence intervals are narrow or even hardly visible, indicating the robustness of results. A clear and intuitive interpretation can be given for the time interval after the goals: Goals evoke a large number of relatively short tweets. While the number of tweets increases by a factor of three shortly after the goal, the tweet length decreases by roughly 30%. The effects for hashtags and emoticons are partly attributable to the shortness of the tweets. Only small effects are visible with regard to the sentiment analysis, where surprisingly both negativity and positivity slightly decrease, resulting in a pretty stable overall tweet sentiment. Please note that the tweets are attached to a match and not to particular teams, which means that we cannot distinguish between the fans of the scoring and the conceding team. In the forecasting context, the idea is to find signs that are already present in the data during the minutes before the goals are scored. Most features are in line with the benchmark and thus do not support this idea. However, a slightly increased positivity and overall sentiment can be found prior to the goals, potentially being a weak early indication of goals. The same is true for emoticons, although less clearly.

Fig. 3 Evolvement of features shortly before and after goals. The vertical line illustrates the time when the goal was scored. The horizontal line refers to the benchmark of tweets that were neither written shortly before nor after a goal. Asterisks indicate a p-value of lower than 0.05 when comparing the respective minute to the benchmark

3.2.4 Analysis of words and topics

The analyses so far have taken account of the number of words or the sentiment of words, but not visualized the communication in a more detailed way. In order to gain insights on the topics and frequently used words in association with a match, four different wordclouds representing different phases before and during the matches are used. Figure 4 analyses pre-match communication and thus refers solely to tweets written in the last hour before a match started. Figures 5 and 6 refer solely to tweets written during the first half and the second half, respectively. However, in order to capture general instead of event-based communication on the matches, those tweets being associated to one or more goals (i.e. written in the 10 min before or after goals) were not considered. Finally, Fig. 7 analyses event-based communication and thus includes tweets written in the 10 min following a goal. Tweets that cannot be unambiguously associated with a single goal have not been considered, in consistency with the section Goal analysis. Wordclouds were created by means of the python package wordcloud (version 1.8.1) using 50 as the maximum number of words. The official hashtags of the matches, all team names and known acronyms of team names (such as “lfc” for “liverpool football club”) were not considered, as well as the predefined list of typical stopwords including words like e.g. “it”, “was” or “this”.

Fig. 4 Wordcloud visualizing frequently used words from tweets written pre-match, i.e. within the last hour before the match

Fig. 5 Wordcloud visualizing frequently used words from tweets written during the first half of a match, but excluding tweets being written shortly before and after goals

Fig. 6 Wordcloud visualizing frequently used words from tweets written during the second half of a match, but excluding tweets written shortly before and after goals

Fig. 7 Wordcloud visualizing frequently used words from tweets written shortly after goals

Pre-game communication includes plenty of words related to broadcasting of the match, such as “live stream”, “watch live” and “hd”. Moreover, several words including “start”, “starting”, “today”, “tonight” or “now” directly refer to the upcoming start of the match. Finally, there seems to be noticeable discussion on which players were chosen to play or not play by the coach, evidenced by words like “lineup”, “bench”, “player” and possibly also “team” or “starting”. Please note that unusual terms like “coyg” (come on you gunners) or “ynwa” (you’ll never walk alone) refer to football-related acronyms that were not contained in our list of known acronyms and thus remained included in the data.

Communication during the match is still subject to a lot of discussion on how to follow broadcasts of the match, as words like “live stream”, “live” and “hd” stay highly present. The presence of words like “play”, “player”, “playing” or “fan” can be considered very general and expectable for communication during the match that is not associated to goals. In general, differences between first and second half seem to be rather limited, except for the prominent role of the word “game” in the second half. This suggests that towards the end of the match or given a clear scoreline, users tend to already discuss the game as a whole, summarize it or draw conclusions from it.

Communication directly following a goal is naturally strongly influenced by words in direct association to the goal (“goal”, “score”, “lead”), or related to discussing the circumstances of a goal, such as “var” (video assistant referee) or “penalty”. The large occurrence of the word “game” might again indicate that once a goal decides a match, it is already discussed as a whole. Moreover, some expectable expressions of emotional reactions like “good” or “shit” are included, however, by far not dominating the wordcloud.

3.2.5 In-play goal forecast

In order to answer the question whether the information in the data is sufficient to forecast goals in-play, small time intervals were considered. Matches were split into intervals of 5 min and normalized Twitter features were calculated accordingly. The variable to be forecasted is an indicator of whether a goal was scored in the next time interval and the last interval of 5 min per match was consequently excluded from the data. This results in a sample of 6868 time intervals. No betting odds are available for time intervals of 5 min, therefore ODDS refers to a logistic regression based on the betting odds of over-under 2.5 goals as well as over-under 1.5 goals in the second half.

Results show that UNI is an unreasonable choice for the short time intervals, which is attributable to the fact that goal scoring probabilities for intervals of 5 min are way smaller than 50%. As expectable, FRQ representing the lowest level of information also possesses the weakest predictive accuracy. ODDS possesses the highest predictive quality, which is in line with the notion that betting odds are a strong predictor of football matches. The main result is that LR and RF, although including additional in-play information, both fail to outperform pre-game information based on the betting odds. As such, Twitter data did not improve pre-game forecasts for the number of goals in matches. Except for UNI, all other models are pretty close in terms of accuracy with the only significant difference between FRQ and ODDS. This underlines that in-play forecasting seems to be a difficult task. Results for the hyperparameter tuning are summarized in Fig. 8, which shows that increasing the number of trees had a very limited effect, while the optimal maximum depth lies around 3 to 4. The results in Table 4 refer to the optimal specification of n_estimators = 500 and max_depth = 3. Please note that the hyperparameters did not have major effects on the forecasting accuracy of RF ranging from 0.1466 to 0.1470. More importantly, for none of the hyperparameters tested, a significant difference between RF and LR or ODDS was found. As such, the hyperparameter selection does not affect any of the results of the present study.

Fig. 8 Results of hyperparameter tuning for the random forest model. The forecasting accuracy as RPS is illustrated in dependence of the number of trees (n_estimators) and the maximum tree depth (max_depth)

Table 4 Results for forecasting goals in time intervals of 5 min from the preceding time interval

                                 p-value compared to
Model   Information   RPS        UNI          FRQ        LR        RF
UNI     None          0.2500     –            –          –         –
FRQ     Pre-game      0.1469     < 0.0001*    –          –         –
LR      Both          0.1467     < 0.0001*    0.4362     –         –
RF      Both          0.1466     < 0.0001*    0.1578     0.7966    –
ODDS    Pre-game      0.1465     < 0.0001*    0.0452*    0.3642    0.3584

*p-value lower than 5%

4 Discussion

The results of the present study shed light on three different aspects of in-play forecasting with Twitter data, namely in-play forecasting in general, a detailed analysis of Twitter communication over the course of matches and the value of Twitter in in-play forecasting in football.

The preliminary analysis suggests that in-play forecasting of goals in general is a difficult task. Results are evidence for the limited value of in-play information (i.e. goals) to forecast the further course of a match when compared to betting odds as pre-game information, a fact that football players, coaches, match analysts, broadcasters and fans would probably strongly deny. Possible explanations are the high predictive quality of betting odds in football forecasting (Forrest et al. 2005; Hvattum and Arntzen 2010; Štrumbelj and Šikonja 2010) and the significant role of randomness in goal scoring in football (Brechot and Flepp 2020; Lames 2018; Wunderlich et al. 2021). Moreover, the result is in line with Wunderlich and Memmert (2018) who showed that betting odds of prior matches possess more predictive value than the results of the matches themselves.

The analysis of tweet intensity revealed that both in-play and pre-game, tweet intensity is predominantly driven by the popularity of the two teams competing. Moreover, tweet intensity is increased in-play in matches with a higher number of total goals scored. The analysis of time dependence and goal analysis reveal how the reactions of Twitter users change over the course of matches and after goals are scored. Before the matches start, fewer but longer tweets are written when compared to during the match, which is explainable by a heightened interest and faster sequence of events in-play. In terms of the topics and words contained, pre-game tweets are highly influenced by communication on how to follow broadcasts of the match and discussing which players are playing. Differences between communication in the first and second half are highly limited, while tweets directly following goals are naturally dominated by discussion on the score, the goal itself and its possible causes. The most striking result with regard to time dependence is a steadily increasing negativity and a steadily decreasing positivity while the match evolves, resulting in a clearly decreasing sentiment. It seems that fans (or at least those active on Twitter) tend to be disappointed by football matches, possibly caused by unjustified high expectations before and at the beginning of matches. The use of Twitter data and sentiment analysis techniques enables researchers to investigate perception and psychological reactions of users during football matches. Further research with a psychological focus could investigate which mechanisms drive the disappointment of fans during matches.

The analysis of minutes before and after goals reveals the reaction to goals, in particular a dramatic increase in tweet intensity where tweets are significantly shorter and a resulting lower number of hashtags and emoticons. The most unintuitive and difficult to explain result is the slightly lower negativity and positivity directly after goals. It is important to note that the tweets in our database are assigned to the match and not to a single team, thus emotions of both teams’ fans should be included, which makes an unchanged overall sentiment comprehensible. However, even if including fans of the team scoring, the team receiving and even neutral observers, one would at least expect an increased emotionality as a reaction to the goal. One explanation could be neutral tweets that have a descriptive and no evaluative expression (e.g. “Penalty for The Red Devils. Rashford steps up and CONVERTS! Manchester United 1–0 Chelsea.”) or tweets that were potentially written with a lot of emotion, but do not include any words with a clear positive or negative connotation identifiable by a sentiment analysis algorithm (e.g. “GOOO[…]OOOL!!!! Rashford!!! 1–0 United!!!”). With regard to the sentiments, although being validated in football, textual data are highly domain-specific and increased accuracy might be achievable if using domain-specific methods such as football-specific lexica of words. A more detailed analysis on what drives this unintuitive result, however, is beyond the scope of this study.

While the Twitter data clearly react to goals scored, a main focus of our approach was to test Twitter data for possible predictive value. The present data clearly do not support the idea that in-play Twitter data have predictive value as forecasts based on pre-game betting odds were not outperformed by a logistic regression model as well as a random forest model including in-play Twitter information. The fact that random forest models did not outperform logistic regression and hyperparameter tuning had very limited effects on the accuracy suggests that this is actually attributable to the missing informative value of the Twitter data and not to the selection of methods. Put simply, we could not extract information from Twitter data that helps to forecast upcoming goals. Three possible aspects could explain this result. First, the in-play predictability seems to be very limited in general as previously demonstrated. Further studies investigating in-play notational data or positional data could shed more light on the question to which degree in-play forecasting is possible at all. Second, Twitter data might not include information that is relevant for forecasting. In a way, this is surprising as Twitter can be seen as a source of crowd wisdom and such sources have been shown to be highly valuable in forecasting football (Forrest et al. 2005; Peeters 2018; Spann and Skiera 2009). On the other hand, Twitter is not a vehicle directly related to forecasting such as the betting market or prediction markets and moreover information is not easily extractable from Twitter. Thus, the third possible aspect is that the information reflected in Twitter data might not have been extracted effectively. Textual data are highly unstructured, which makes the extraction of information difficult and leads to a limited degree of accuracy for sentiment analysis techniques (Wunderlich and Memmert 2020). Further progress in this domain can be expected as sentiment analysis is a highly relevant topic in computer science (Mäntylä et al. 2018; Piryani et al. 2017), nevertheless it will remain challenging to algorithmically reproduce human understanding of textual data. The problem of extracting relevant data might be aggravated by the short time intervals of 5 min yielding limited tweet samples and a higher randomness in the features. To account for the issue of short time intervals, we repeated the analysis using data from the complete first half of a match to forecast the number of goals in the second half of a match. Despite larger time intervals, results implied the same conclusions, which suggests that the limited in-play predictive value is not attributable to the small time intervals.

In experimental research, the present results could be assessed as a null result as they do not support the notion of predictive in-play value of Twitter data and question the general value of in-play information including goals. Still, this is surprising and valuable information to coaches, match analysts and broadcasters who should question carefully to what extent in-play information can be used at all to draw conclusions on the further course of a match.

5 Conclusions

The present approach investigates in-play forecasting of football matches in general and a Big Data approach using Twitter data in particular. Results are evidence that in-play forecasting of goals is a highly challenging task as information gathered in-play (both basic events like goals and textual data from Twitter) does not improve forecasting accuracy when compared to pre-game information. In addition, results suggest that the fans’ perception of a match gets more and more negative over time as the sentiment of tweets on Twitter is decreasing over the course of the match.

Funding Open Access funding enabled and organized by Projekt DEAL. The authors received no specific funding for this project.

Declarations

Conflict of interest The authors declare to have no conflict of interest.

Ethical approval We did not conduct any experimental research involving humans or animals.

Data availability No data with regard to the direct content of tweets will be published or forwarded in line with the Developer Agreement and Policy of the Twitter API. All further data used within this study are available in the public domain and the respective sources have been mentioned in the manuscript.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Akhtar S, Scarf P (2012) Forecasting test cricket match outcomes in play. Int J Forecast 28(3):632–643. https://doi.org/10.1016/j.ijforecast.2011.08.005
Andersson P, Edman J, Ekman M (2005) Predicting the World Cup 2002 in soccer: performance and confidence of experts and non-experts. Int J Forecast 21(3):565–576. https://doi.org/10.1016/j.ijforecast.2005.03.004
Asif M, McHale IG (2016) In-play forecasting of win probability in one-day international cricket: a dynamic logistic regression model.
Int J Forecast 32(1):34–43. https://doi.org/10.1016/j.ijforecast.2015.02.005
Berrar D, Lopes P, Davis J, Dubitzky W (2019) Guest editorial: special issue on machine learning for soccer. Mach Learn 108(1):1–7. https://doi.org/10.1007/s10994-018-5763-8
Bollen J, Mao H, Zeng X-J (2011) Twitter mood predicts the stock market. J Comput Sci 2(1):1–8. https://doi.org/10.1016/j.jocs.2010.12.007
Boshnakov G, Kharrat T, McHale IG (2017) A bivariate Weibull count model for forecasting association football scores. Int J Forecast 33(2):458–466. https://doi.org/10.1016/j.ijforecast.2016.11.006
Brechot M, Flepp R (2020) Dealing with randomness in match outcomes: how to rethink performance evaluation in European club football using expected goals. J Sports Econ 21(4):335–362. https://doi.org/10.1177/1527002519897962
Breiman L (2001) Random forests. Mach Learn 45(1):5–32. https://doi.org/10.1023/A:1010933404324
Brown A, Rambaccussing D, Reade JJ, Rossi G (2017) Forecasting with social media: evidence from tweets on soccer matches. Econ Inq 20(3):1363. https://doi.org/10.1111/ecin.12506
Browne (2000) Cross-validation methods. J Math Psychol 44(1):108–132. https://doi.org/10.1006/jmps.1999.1279
Cambria E, Poria S, Bajpai R, Schuller B (2016) SenticNet 4: a semantic resource for sentiment analysis based on conceptual primitives. In: Proceedings of COLING 2016, the 26th international conference on computational linguistics: technical papers, pp 2666–2677
Constantinou AC, Fenton NE (2012) Solving the problem of inadequate scoring rules for assessing probabilistic football forecast models. J Quant Anal Sports. https://doi.org/10.1515/1559-0410.1418
Dick U, Brefeld U (2019) Learning to rate player positioning in soccer. Big Data 7(1):71–82. https://doi.org/10.1089/big.2018.0054
Dixon MJ, Robinson ME (1998) A birth process model for association football matches. Statistician 47(3):523–538
Easton S, Uylangco K (2010) Forecasting outcomes in tennis matches using within-match betting markets. Int J Forecast 26(3):564–575. https://doi.org/10.1016/j.ijforecast.2009.10.004
Efron B, Tibshirani RJ (1994) An introduction to the bootstrap. CRC Press, Boca Raton
Fan M, Billings A, Zhu X, Yu P (2020) Twitter-based BIRGing: big data analysis of english national team fans during the 2018 FIFA world cup. Commun Sport 8(3):317–345. https://doi.org/10.1177/2167479519834348
Forrest D, Goddard J, Simmons R (2005) Odds-setters as forecasters: the case of English football. Int J Forecast 21(3):551–564. https://doi.org/10.1016/j.ijforecast.2005.03.003
Gayo-Avello D (2013) A meta-analysis of state-of-the-art electoral prediction from Twitter data. Soc Sci Comput Rev 31(6):649–679. https://doi.org/10.1177/0894439313493979
Goddard J, Asimakopoulos I (2004) Forecasting football results and the efficiency of fixed-odds betting. J Forecast 23(1):51–66. https://doi.org/10.1002/for.877
Godin F, Zuallaert J, Vandersmissen B, de Neve W, van de Walle R (2014) Beating the bookmakers: leveraging statistics and Twitter microposts for predicting soccer results. In: KDD Workshop on large-scale sports analytics
Grunz A, Memmert D, Perl J (2012) Tactical pattern recognition in soccer games by means of special self-organizing maps. Hum Mov Sci 31(2):334–343. https://doi.org/10.1016/j.humov.2011.02.008
Heuer A, Rubner O (2012) How does the past of a soccer match influence its future? Concepts and statistical analysis. PLoS ONE 7(11):e47678. https://doi.org/10.1371/journal.pone.0047678
Ho TK (1995) Random decision forests. In: Proceedings of the third international conference on document analysis and recognition, August 14–16, 1995, Montréal, Canada. IEEE Computer Society Press, Los Alamitos, pp 278–282. https://doi.org/10.1109/ICDAR.1995.598994
Hubáček O, Šourek G, Železný F (2019) Exploiting sports-betting market using machine learning. Int J Forecast. https://doi.org/10.1016/j.ijforecast.2019.01.001
Huberty M (2015) Can we vote with our tweet? On the perennial difficulty of election forecasting with social media. Int J Forecast 31(3):992–1007. https://doi.org/10.1016/j.ijforecast.2014.08.005
Hvattum LM, Arntzen H (2010) Using ELO ratings for match result prediction in association football. Int J Forecast 26(3):460–470. https://doi.org/10.1016/j.ijforecast.2009.10.002
Jungherr A, Jürgens P, Schoen H (2011) Why the pirate party won the German election of 2009 or the trouble with predictions: a response to Tumasjan, A., Sprenger, T. O., Sander, P. G., & Welpe, I. M. “Predicting Elections With Twitter: What 140 Characters Reveal About Political Sentiment”. Soc Sci Comput Rev 30(2):229–234. https://doi.org/10.1177/0894439311404119
Karlis D, Ntzoufras I (2003) Analysis of sports data by using bivariate Poisson models. J R Stat Soc Ser D (The Stat) 52(3):381–393. https://doi.org/10.1111/1467-9884.00366
Killick EA, Griffiths MD (2019) In-play sports betting: a scoping study. Int J Ment Heal Addict 17(6):1456–1495. https://doi.org/10.1007/s11469-018-9896-6
Kolbinger O, Knopp M (2020) Video kills the sentiment-exploring fans’ reception of the video assistant referee in the English premier league using Twitter data. PLoS ONE 15(12):e0242728. https://doi.org/10.1371/journal.pone.0242728
Koopman SJ, Lit R (2015) A dynamic bivariate Poisson model for analysing and forecasting match results in the English premier league. J R Stat Soc A Stat Soc 178(1):167–186. https://doi.org/10.1111/rssa.12042
Koopman SJ, Lit R (2019) Forecasting football match results in national league competitions using score-driven time series models. Int J Forecast 35(2):797–809. https://doi.org/10.1016/j.ijforecast.2018.10.011
Kovalchik S, Reid M (2019) A calibration method with dynamic updates for within-match forecasting of wins in tennis. Int J Forecast 35(2):756–766. https://doi.org/10.1016/j.ijforecast.2017.11.008
Lames M (2018) Chance involvement in goal scoring in football—an empirical approach. Ger J Exerc Sport Res 48(2):278–286. https://doi.org/10.1007/s12662-018-0518-z
Lasek J, Szlávik Z, Bhulai S (2013) The predictive power of ranking systems in association football. Int J Appl Pattern Recognit 1(1):27. https://doi.org/10.1504/IJAPR.2013.052339
Lessmann S, Sung M-C, Johnson JE (2010) Alternative methods of predicting competitive events: an application in horserace betting markets. Int J Forecast 26(3):518–536. https://doi.org/10.1016/j.ijforecast.2009.12.013
Lopez-Gonzalez H, Griffiths MD (2016) Is European online gambling regulation adequately addressing in-play betting advertising? Gaming Law Rev Econ 20(6):495–503. https://doi.org/10.1089/glre.2016.2064
Maher MJ (1982) Modelling association football scores. Stat Neerl 36(3):109–118. https://doi.org/10.1111/j.1467-9574.1982.tb00782.x
Mäntylä MV, Graziotin D, Kuutila M (2018) The evolution of sentiment analysis—a review of research topics, venues, and top cited papers. Comput Sci Rev 27:16–32. https://doi.org/10.1016/j.cosrev.2017.10.002
Memmert D, Raabe D (2018) Data analytics in football. Routledge, Abingdon. https://doi.org/10.4324/9781351210164
Peeters T (2018) Testing the wisdom of crowds in the field: transfermarkt valuations and international soccer results. Int J Forecast 34(1):17–29. https://doi.org/10.1016/j.ijforecast.2017.08.002
Pennebaker JW, Boyd RL, Jordan K, Blackburn K (2015) The development and psychometric properties of LIWC2015. University of Texas at Austin. https://doi.org/10.15781/T29G6Z
Piryani R, Madhavi D, Singh VK (2017) Analytical mapping of opinion mining and sentiment analysis research during 2000–2015. Inf Process Manag 53(1):122–150. https://doi.org/10.1016/j.ipm.2016.07.001
Rein R, Memmert D (2016) Big data and tactical analysis in elite soccer: future challenges and opportunities for sports science. Springerplus 5(1):1410. https://doi.org/10.1186/s40064-016-3108-2
Rinker TW (2013) qdapDictionaries: dictionaries to accompany the qdap package. Retrieved from http://github.com/trinker/qdapDictionaries
Schaumberger G, Groll A (2018) Predicting matches in international football tournaments with random forests. Stat Model 18(5–6):460–482
Schumaker RP, Jarmoszko AT, Labedz CS (2016) Predicting wins and spread in the premier league using a sentiment analysis of twitter. Decis Support Syst 88:76–84. https://doi.org/10.1016/j.dss.2016.05.010
Spann M, Skiera B (2009) Sports forecasting: a comparison of the forecast accuracy of prediction markets, betting odds and tipsters. J Forecast 28(1):55–72. https://doi.org/10.1002/for.1091
Štrumbelj E, Šikonja MR (2010) Online bookmakers’ odds as forecasts: the case of European soccer leagues. Int J Forecast 26(3):482–488. https://doi.org/10.1016/j.ijforecast.2009.10.005
Tumasjan A, Sprenger TO, Sandner PG, Welpe IM (2010) Predicting elections with Twitter: what 140 characters reveal about political sentiment. Icwsm 10(1):178–185
Twitter API (2020) Retrieved from https://developer.twitter.com/
Wheatcroft E (2019) Evaluating probabilistic forecasts of football matches: the case against the ranked probability score. https://arxiv.org/abs/1908.08980
Wheatcroft E (2020) A profitable model for predicting the over/under market in football. Int J Forecast. https://doi.org/10.1016/j.ijforecast.2019.11.001
Wunderlich F, Memmert D (2018) The betting odds rating system: using soccer forecasts to forecast soccer. PLoS ONE 13(6):e0198668. https://doi.org/10.1371/journal.pone.0198668
Wunderlich F, Memmert D (2020) Innovative approaches in sports science—lexicon-based sentiment analysis as a tool to analyze sports-related Twitter communication. Appl Sci 10(2):431. https://doi.org/10.3390/app10020431
Wunderlich F, Seck A, Memmert D (2021) The influence of randomness on goals in football decreases over time. An empirical analysis of randomness involved in goal scoring in the English Premier League. J Sports Sci 39(20):2322–2337. https://doi.org/10.1080/02640414.2021.1930685
Yu Y, Wang X (2015) World cup 2014 in the Twitter world: a big data analysis of sentiments in U.S. sports fans’ tweets. Comput Hum Behav 48:392–400. https://doi.org/10.1016/j.chb.2015.01.075
Zhang X, Fuehres H, Gloor PA (2011) Predicting stock market indicators through Twitter “I hope it is not as bad as I fear.” Proc Soc Behav Sci 26:55–62. https://doi.org/10.1016/j.sbspro.2011.10.562
Zou Q, Song K, Shi J (2020) A Bayesian in-play prediction model for association football outcomes. Appl Sci 10(8):2904. https://doi.org/10.3390/app10082904

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Social Network Analysis and Mining
Springer Journals

A big data analysis of Twitter data during premier league matches: do tweets contain information valuable for in-play forecasting of goals in football?

Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2021
ISSN
1869-5450
eISSN
1869-5469
DOI
10.1007/s13278-021-00842-z

Abstract

Data-related analysis in football increasingly benefits from Big Data approaches and machine learning methods. One relevant application of data analysis in football is forecasting, which relies on understanding and accurately modelling the process of a match. The present paper tackles two neglected facets of forecasting in football: forecasts of the total number of goals and in-play forecasting (forecasts based on within-match information). Sentiment analysis techniques were used to extract the information reflected in almost two million tweets from more than 400 Premier League matches. By means of wordclouds and a time-based analysis of several tweet-based features, the Twitter communication over the full course of matches and shortly before and after goals was visualized and systematically analysed. Moreover, several forecasting models including a random forest model have been used to obtain in-play forecasts. Results suggest that in-play forecasting of goals is highly challenging, and in-play information does not improve forecasting accuracy. An additional analysis of goals from more than 30,000 matches from the main European football leagues supports the notion that the predictive value of in-play information is highly limited compared to pre-game information. This is a relevant result for coaches, match analysts and broadcasters, who should not overestimate the value of in-play information. The present study also sheds light on how the perception and behaviour of Twitter users change over the course of a football match. A main result is that the sentiment of Twitter users decreases as the match progresses, which might be caused by unjustifiably high expectations of football fans before the match.
Keywords Big data · Data mining · Social networks · Twitter · Football forecasting · In-play forecasting

* Fabian Wunderlich (f.wunderlich@dshs-koeln.de), Institute of Exercise Training and Sport Informatics, German Sport University Cologne, Köln, Germany

1 Introduction

In recent years football analysis has increasingly benefited from Big Data analysis and machine learning methods, in particular in an attempt to understand tactical behaviour and identify success-enhancing strategies (Dick and Brefeld 2019; Grunz et al. 2012; Memmert and Raabe 2018; Rein and Memmert 2016). The present paper puts the approach of Big Data analysis and machine learning into a slightly different context by incorporating Twitter data into the analysis. It focuses on in-play forecasts in football by examining the question whether information becoming available during a match is valuable to forecast the further course of events. This analysis is relevant to better understand football-related Twitter communication and to assess the role of randomness in football, and it is valuable for coaches, match analysts and broadcasters who want to better understand the influence of in-play events on the further course of a match.

The forecasting literature reflects two important aspects researchers have faced when investigating predictive tasks in football. The first aspect of forecasting is statistical and related to developing team ratings and forecasting models with the best possible ability to derive forecasts from obvious predictors such as prior match results. One of the most prominent approaches is to estimate offensive and defensive strength parameters of the teams and use these as inputs for probability models including Poisson models (Koopman and Lit 2015; Maher 1982), birth process models (Dixon and Robinson 1998) and Weibull count models (Boshnakov et al. 2017). Other researchers have used regression models based on one or various covariates, such as Hvattum and Arntzen (2010) using ELO ratings in combination with an ordered logit regression model or Goddard and Asimakopoulos (2004) using various covariates in an ordered probit regression model. The present approach is primarily related to the second aspect of forecasting, which is data-based and attempts to identify and investigate further sources of information that prove useful in football forecasting. One obvious source of information is betting odds (Forrest et al. 2005), which are interpreted as a forecast and used as a standard benchmark. Further sources include human forecasts (Andersson et al. 2005), prediction markets (Spann and Skiera 2009), ranking systems such as the FIFA World Ranking (Lasek et al. 2013), market values (Peeters 2018) or sets with various explanatory variables including match significance, involvement in cup competitions and geographical distance between teams (Goddard and Asimakopoulos 2004).

In the literature, football forecasting is most prominently associated with forecasting the match result in terms of win, draw or loss. This seems a little one-dimensional in the light of the wide range of events taking place during a football match. With regard to the common win/draw/loss forecast, Koopman and Lit (2019) introduced a categorization of methods, namely models indirectly based on modelling the number of goals scored by both teams, models indirectly based on modelling the goal difference, and models of the result in terms of win, draw or loss directly. Forecasting the number of goals, in that sense, is not an exotic task, as models falling into the first category, which are often based on Poisson distributions (Karlis and Ntzoufras 2003; Koopman and Lit 2015; Maher 1982), can easily be reused for goal forecasting. Boshnakov et al. (2017) pursue this strategy by using a Weibull count model to obtain forecasts for both match result and total number of goals. Wheatcroft (2020) uses ratings based on match statistics and logistic regression to forecast the number of goals and is, to the best of our knowledge, the only paper focusing in particular on this type of forecasting. Forecasting of total goals thus can be considered a neglected aspect in the forecasting literature, presumably driven by the fact that the match results have stronger emotional and financial consequences for the fans and teams than the total number of goals.

Another research gap in football forecasting is the investigation of forecasts made during the course of a match. This comes as a surprise as so-called in-play betting has gained significant importance for bookmakers (Killick and Griffiths 2019; Lopez-Gonzalez and Griffiths 2016). Moreover, coaches, match analysts and broadcasters are highly interested in analysing matches in-play. In fact, some researchers have examined the scoring processes during the course of the match in more detail. Dixon and Robinson (1998) use a birth process model allowing scoring intensities to change during the match and depend on the score to analyse the deviations from constant scoring rates. Similarly, Heuer and Rubner (2012) use a model-free statistical analysis to investigate in which match situations scoring intensities deviate from a constant rate. Both approaches are mainly focused on understanding the process of a football match and whether certain game situations influence the scoring behaviour. None of these articles investigates in-play forecasts by calculating the effect of scoring deviations on the accuracy of in-play forecasts. To the best of our knowledge, the only paper investigating the role of in-play information in forecasting football is the recent work of Zou et al. (2020), which, however, is limited to the number of goals as the only in-play information. While our paper is limited to football, in-play models and their relation to in-play betting odds have been investigated in other sports such as tennis (Easton and Uylangco 2010; Kovalchik and Reid 2019) and cricket (Akhtar and Scarf 2012; Asif and McHale 2016). Reasons for the little effort made so far on in-play forecasting in football might be a higher model complexity, less availability of in-play betting odds as a benchmark in comparison to pre-game betting odds, and higher effort to gather and handle in-play data.

The difficulty of in-play forecasting of goals in football might be surprising because fans, experts and commentators commonly argue that they have anticipated a goal; they have seen it coming or explain it as the logical consequence of the course of play. This, however, could be a biased perception, and it would be quite costly to measure the collaborative human perception of a football match and the collaborative anticipation of the further progress in an experimental approach. For that reason, we make use of an existing source of (big) data: short textual messages from the microblogging platform Twitter with regard to a certain football match, which can be considered an in-play reflection of collaborative human perception of this match. While traditional datasets and probability models remain a predominant approach in football forecasting (Boshnakov et al. 2017; Koopman and Lit 2019; Wheatcroft 2020), researchers have also started to make use of Big Data (Brown et al. 2017) and machine learning (Berrar et al. 2019; Hubáček et al. 2019) in this domain. Twitter data itself has been used in various domains of forecasting including elections (Huberty 2015; Tumasjan et al. 2010) or stock prices (Bollen et al. 2011; Zhang et al. 2011), but has been discussed controversially and critically (Gayo-Avello 2013; Huberty 2015; Jungherr et al. 2011). While Twitter certainly provides the possibility to gather massive datasets, the process of actually extracting relevant information is challenging and attempts to use Twitter in football forecasting have reported mixed results (Brown et al. 2017; Godin et al. 2014; Schumaker et al. 2016). In economic and political situations, the theoretical mechanism is viable as Twitter may reflect the opinion of the users and both election results and stock prices are directly influenced by the perception of the public. In football, this mechanism is evidently not present as a team will not succeed in a match only because the public would like to see the team win. In forecasting goals in-play, however, the following mechanism is conceivable: the course of the match influences the perception of the fans, who will share their opinion on Twitter. If the course of play is actually a predictor for upcoming goals, Twitter data might indeed have predictive value. Though not considering predictive aspects, some researchers have focused on analysis of in-play Twitter data in relation to football matches. It has been reported that fans' sentiments reflect reactions to goals of the own or opposing team (Yu and Wang 2015), fans tend to have a higher team identification when the team is leading than when it is trailing (Fan et al. 2020) and communication on the video assistant referee (VAR) is strongly associated with negative sentiment (Kolbinger and Knopp 2020). In contrast to the present study, however, analyses were based on highly limited sample sizes of five or fewer matches (Fan et al. 2020; Yu and Wang 2015) or on a very specific type of event during the matches, namely the VAR (Kolbinger and Knopp 2020).

The contributions of the present approach are threefold. First, a preliminary analysis sheds light on the general difficulty of in-play forecasting. Second, the topics discussed by Twitter users as well as their perception of the match over the course of football matches and as a reaction to goals are analysed by means of sentiment analysis techniques and further non-semantic tweet characteristics. Third, the possible informative value of Twitter data when used in in-play forecasting models is investigated.

2 Data and methods

2.1 Data

For a preliminary analysis, a dataset consisting of a total of 31,912 matches from 10 seasons (07/08–16/17) in 10 major European leagues (first divisions of England, Spain, Germany, Italy, France, Portugal, Belgium, Turkey, the Netherlands and Greece) was used. Data were obtained from http://football-data.co.uk and included the following information for each match: teams involved, date, halftime score, final score and betting odds for over-under 2.5 goals. Betting odds can be interpreted as an aggregated market forecast and reflect a very strong benchmark for forecasting models in football (Hvattum and Arntzen 2010; Štrumbelj and Šikonja 2010). Over-under bets reflect betting opportunities on the total number of goals, in this case with the possibility to bet on two or fewer or three or more goals.

For the main analysis, data of Premier League football matches were obtained in the period from 22 February 2019 to the last game before the interruption of the league caused by the COVID-19 pandemic on 09 March 2020. Removing three matches due to missing data, this adds up to a total of 404 matches, representing a smaller, but richer dataset. Data source and information included are analogous to the previously mentioned dataset. Additionally, it includes betting odds for over-under 1.5 goals in the second half collected from http://www.oddsportal.com and meta-data for all goals scored (namely the current score as well as the minute of the goal) collected from the official website of the English Premier League http://premierleague.com. Moreover, short textual messages (so-called tweets) were obtained from the microblogging platform Twitter for each match, covering the day of the game and including the official match hashtag (e.g. #ARSMUN for the match Arsenal vs. Manchester United), making use of the official Twitter API (Twitter API 2020). Information on the tweets includes the textual data itself as well as the exact date of creation, which we relabelled to the time within the match (i.e. -30 for a tweet created 30 min prior to the match and 38.5 for a tweet created after 38.5 min of match time). A total of 3,139,441 tweets were collected; the final analysis only included tweets written one hour prior to the match or during the actual match time, adding up to 1,765,379 tweets. Please note that both the Twitter Search API used within this study and the real-time Streaming API only provide a sample of the full available data.

2.2 Feature extraction

The tweets in the final analysis alone consist of more than 25 million words and, due to the volume and the highly unstructured nature of textual data, it is not straightforward to extract machine-useable information, which underlines the importance of an elaborate feature extraction process. English tweets including the official hashtag of a match have been collected. No tweets in other languages and no additional hashtags or search terms related to the match or any official or unofficial hashtags related to the Premier League and the clubs were considered. Pre-processing of tweets included removing content other than evaluable words, like URLs, mentions, punctuation, hashtag signs, emoticons, characters and digits. Moreover, known contractions and acronyms were replaced with full forms and intentionally misspelled words (e.g. "niiiiiiiice") were doubled ("nice nice") to correct for the expressed intensification. Cleaned tweets were then analysed by three different lexicon-based sentiment analysis methods, namely the commercial LIWC 2015 software (Pennebaker et al. 2015) as well as the QDAP dictionary (Rinker 2013) and the SenticNet4 lexicon based on the work of Cambria et al. (2016). Finally, an average score of positivity and negativity was assigned to each tweet. For more details, we refer to Wunderlich and Memmert (2020), who validated this exact method using football-specific Twitter data and reported a reasonable accuracy in applications with a sufficiently high number of tweets. The sentiment of a tweet is defined as the difference between the positivity and the negativity score. Sentiment has been included for analyses, but excluded as a feature in classification methods for reasons of multicollinearity and redundancy.

Further non-semantic features extracted from the tweets are the average number of words (based on tweets after pre-processing), hashtags and emoticons included as well as the tweet intensity, simply referring to the number of tweets. As a further in-play feature, the total number of goals scored is considered. Pre-game features are the probability for over 2.5 goals and the probability for over 1.5 goals in the second half as obtained from the pre-game betting odds by converting decimal odds to probabilities (cf. Wunderlich and Memmert 2018). The corresponding counter-probabilities (under 2.5/1.5 goals) are not considered for reasons of redundancy. Table 1 summarizes the features used for further analysis.

Table 1 Features used for forecasting models throughout the paper

Feature | Source | Time
Probability over 2.5 goals | Betting market | Pre-game
Probability over 1.5 goals 2nd half | Betting market | Pre-game
Number of goals scored 1st half | Match data | In-play
Average negativity score | Twitter | In-play
Average positivity score | Twitter | In-play
Tweet intensity | Twitter | In-play
Average number words | Twitter | In-play
Average number hashtags | Twitter | In-play
Average number emoticons | Twitter | In-play

2.3 Normalization

Throughout our analysis, two different forms of normalization are used for the Twitter-based features. Using the example of number of words in a tweet, let $w_{mit}$ be the number of words of each of the $n$ tweets $i = 1, \dots, n$ in match $m$ at time $t$ (where $t$ can be a minute of play or some longer time interval such as the first half). Then we define $\bar{w}_{mt}$ to be the average number of words in all tweets from that time interval. Let $\bar{w}_{m}$ be the average number of words in the last hour prior to the match, then $\tilde{w}_{mt} = \bar{w}_{mt} / \bar{w}_{m}$ is the average number of words for time interval $t$ in match $m$ normalized for pre-match data. This normalization will be denoted as pre-match normalized throughout the paper. Further, let $\tilde{w}_{t}$ be the average number of (pre-match normalized) words in time interval $t$ across all matches, then $\hat{w}_{mt} = \tilde{w}_{mt} / \tilde{w}_{t}$ is the number of words normalized for pre-match data and additionally for time, which will simply be denoted as time normalized subsequently.

This definition can be used in complete analogy for the other Twitter-based features, except for the number of tweets (tweet intensity), where average values do not apply. With regard to the tweet intensity, the number of tweets per minute in a certain time interval is divided by the number of tweets per minute in the last hour before the match. A pre-match normalized tweet intensity of 2 in the first half thus means that during the first half the number of tweets per minute was twice as high as before the match. Time normalization is then performed by dividing by the number of tweets per minute across all matches. Reasons and consequences of normalizing the data in the above way as well as the results based on normalized data will be outlined in the Analysis section.

2.4 Random forest model

Random forests are an ensemble learning method based on the idea of using a multitude of decision trees (a so-called forest), going back to the work of Ho (1995). Current applications commonly refer to the method developed by Breiman (2001). Random forest methods have already been applied to forecasting in football (Schaumberger and Groll 2018) and other sports (Lessmann et al. 2010). For the present analysis, random forests were implemented in Python using the RandomForestClassifier from the package sklearn.ensemble. The following hyperparameters were tested with regard to the random forest classifier: the number of trees per forest (n_estimators ranging from 250 to 1000 in steps of 250) and the maximum depth of a tree (max_depth ranging from 1 to 8). Detailed results of the hyperparameter tuning and the effect of parameters on the results are discussed in the Analysis section.

2.5 Cross validation

To validate the accuracy of forecasting models, k-fold cross validation was used, i.e. the data were split into k subsamples using one of the subsamples as test set and the remaining data as training set (Browne 2000). The choice of the number of subsamples k is a trade-off dependent on the time required for training and the total size of the data sample available. For the analysis of small time intervals data were split into 17 subsamples resulting in training sets of 6464 time intervals and test sets of 404 time intervals each.

2.6 Forecasting accuracy

Statistical measures of forecasting accuracy are based on the idea of quantifying the difference between the forecasted probabilities and the actual outcomes. Driven by the inconsistent use of such measures in the literature, Constantinou and Fenton (2012) have assessed this topic and proposed using the Rank Probability Score (RPS) as an adequate measure for forecasting models in football. As the over-under market just possesses two possible outcomes, the RPS can be simplified to

$$\mathrm{RPS} = (p_1 - o_1)^2$$

where $p_1$ is the forecasted probability of outcome 1 and $o_1$ equals 1 if outcome 1 occurred and 0 otherwise. Due to the symmetry of binary forecasts this is equivalent to calculating RPS based on the second outcome. While being the standard approach, RPS is not undisputed and also has weaknesses, which have been demonstrated by Wheatcroft (2019), who suggests using the ignorance score instead, defined as

$$\mathrm{IGN} = -\log_2(p_i)$$

where $p_i$ is the forecasted probability of the actual outcome $i$. For reasons of simplicity we only report the average RPS for each model in the Results section, but we have tested results for robustness by repeating the analysis with IGN and did not find any differences with regard to the main conclusions of the paper.
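The two accuracy measures of Section 2.6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`binary_rps`, `ignorance`, `average_score`) and the three example forecasts are hypothetical.

```python
import math
from statistics import mean


def binary_rps(p_over: float, over_occurred: bool) -> float:
    """RPS for a two-outcome market: squared gap between forecast and outcome."""
    o = 1.0 if over_occurred else 0.0
    return (p_over - o) ** 2


def ignorance(p_over: float, over_occurred: bool) -> float:
    """Ignorance score: -log2 of the probability assigned to the actual outcome."""
    p_actual = p_over if over_occurred else 1.0 - p_over
    return -math.log2(p_actual)


def average_score(score, forecasts, outcomes):
    """Mean score over a set of matches; lower is better for both measures."""
    return mean(score(p, o) for p, o in zip(forecasts, outcomes))


# Hypothetical forecasts for "over 1.5 goals in the second half" in three matches.
forecasts = [0.50, 0.62, 0.41]
outcomes = [True, True, False]
print(average_score(binary_rps, forecasts, outcomes))
print(average_score(ignorance, forecasts, outcomes))
```

Under this definition a constant 50% forecast scores an RPS of 0.25 in every match, whatever the outcome, which agrees with the value reported for the UNI benchmark in Table 2.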
2.7 Bootstrapping

To avoid any assumptions about the theoretical distribution of the data, bootstrapping methods with 10,000 resamples were used to calculate confidence intervals and as an alternative to parametric hypothesis tests when comparing forecasting models in Tables 2 and 4 as well as in Figs. 2 and 3 with regard to sections Time dependence and Goal analysis. For an overview on bootstrapping methods and details on the calculations, we refer to Efron and Tibshirani (1994). We highlight p-values falling below a significance level of 5% as significant throughout the paper.

3 Analysis

3.1 Preliminary analysis: difficulty of in-play forecasting

To demonstrate the general idea of probabilistic in-play forecasting of goals and the difficulty of this task, the large dataset of more than 30,000 matches as defined in the Data section is used. The first aspect to consider is to what extent the total number of goals is predictable at all or just reflects pure random processes. Figure 1 shows a box plot of the pre-game probabilities for over 2.5 goals obtained from the betting odds. If there were no different expectations for the number of goals in a match, betting odds and thus probabilities would be constant for all matches. The dispersion of values proves that there are indeed different goal expectations; however, the expectations seem to be rather homogeneous as only one in ten matches has a lower probability than 40.3%, and only one in ten matches has a higher probability than 60.4%.

Fig. 1 Boxplot illustrating the distribution of probabilities for over 2.5 goals across the dataset

Besides the betting odds, which in principle only reflect differences in the expectations, it should also be possible to find evidence in the results directly. If matches with a systematically higher or lower total scoring intensity exist, the number of goals in the first half and second half should be correlated. Correlation was found to be r = 0.05 (t(31,910) = 13.98, p < 0.001), which is evidence that systematic differences in the goal expectations exist, but the very small correlation coefficient points to a predominant influence of randomness. In summary, systematic differences in terms of expected scoring intensities across matches exist, but are highly limited.

A general predictability of the number of goals, however, does not necessarily imply that real in-play forecasting is possible at all. Thus, the next question is whether the goal expectation is predefined at the start of the match or whether information becoming available during the match helps to forecast the further course of events. In order to investigate this question, forecasts for the number of goals (i.e. probability for over 1.5 goals) in the second half are performed based on two variables: the pre-game probability for over 2.5 goals as a market reflection of goal expectation available prior to the match, and the total number of goals actually scored in the first half as in-play information. Different numbers of goals (i.e. 2.5 goals for the complete match and 1.5 goals for the second half) were chosen to consider the option with the most balanced probabilities, which differ due to the remaining match time. The data sample was split into 5 seasons of in-sample data (15,844 matches) and 5 seasons of out-of-sample data (16,068 matches). A total of five different forecasting models are analysed: the common naïve benchmark models UNI, attaching a probability of 50% to over 1.5 goals for each match, and FRQ, using the observed frequency of over 1.5 goals in the in-sample data for each match (cf. Hvattum and Arntzen 2010), as well as three logistic regression models using the probability of over 1.5 goals as dependent variable and the pre-game probability for over 2.5 goals as obtained from the betting odds (PROB), the total number of goals scored in the first half (GOAL) or both variables (BOTH) as independent variables. Table 2 presents the average rank probability scores for each forecasting model when using the estimated model parameters to obtain forecasts for all matches in the out-of-sample dataset. In addition, pair-wise p-values comparing the RPS values across the models are presented.

Table 2 Results for various models forecasting over-under 1.5 goals in the second half (the columns UNI to PROB contain pair-wise p-values)

Model | Information used | RPS | UNI | FRQ | GOAL | PROB
UNI | None | 0.2500 | – | – | – | –
FRQ | Pre-game | 0.2485 | 0.0069* | – | – | –
GOAL | In-play | 0.2480 | 0.0012* | 0.0467* | – | –
PROB | Pre-game | 0.2434 | < 0.0001* | < 0.0001* | < 0.0001* | –
BOTH | Both | 0.2433 | < 0.0001* | < 0.0001* | < 0.0001* | 0.6073

*p-value lower than 5%

The forecasting results paint a clear picture and suggest that in-play information does have some very weak predictive value when compared to simple benchmarks, but no additional value when controlling for pre-game information. The model PROB using solely the pre-game expectation significantly outperforms both benchmarks and the model GOAL using only in-play information. Once the pre-game information is included, the average rank probability score of the model hardly improves when in-play information on the number of goals is added. Despite the large database, the model BOTH using both pre-game and in-play information fails to significantly outperform PROB. The above results are clear evidence for the difficulty of forecasting the total number of goals in-play, yet it is not clear whether the small benefit of in-play information is based on the fact that the goal expectation is predefined prior to the match or that prior goals are just not a useful in-play predictor. For that reason, Twitter data as a potential source of in-play information are analysed in the next section. Before turning to forecasts from Twitter data, the data are analysed with respect to several other aspects, namely factors influencing tweet intensity, and the effect of time and goals on Twitter communication.

3.2 Twitter analysis

3.2.1 Match-based analysis of tweet intensity

A first qualitative observation in the Twitter data is that differences across matches seem to only partly depend on in-play events, as massive differences can occur even before the start. As an extreme example, the tweet intensity for the match Manchester vs. Arsenal was more than 100 times higher than for the match Brighton vs. Burnley both pre-match and in-play. For this reason, a closer look at the reasons for varying tweet intensities in the matches shall be given. In particular, four factors potentially having influence on tweet intensity will be analysed:

3.2.1.1 Popularity First, we use the average number of spectators at home matches of a team as an estimation of the general popularity of this team. The numbers were obtained from https://www.weltfussball.de and, as the dataset contains matches from the 18/19 and 19/20 seasons, the spectator numbers were averaged across both seasons. Moreover, spectators were normalized by the maximum number of spectators of any team, which yields popularities ranging from 1.0 for Manchester United with the highest number of spectators to 0.14 for Bournemouth being the least popular team. The number of spectators arguably is not a perfect representation of popularity given that the capacity of a stadium can be a highly confounding factor. Still, it can be assumed to be an easily available and transparent measure with a reasonably high correlation to popularity. To account for both teams' popularities in a match concurrently, the popularity of a match is determined by multiplying the popularity of both teams.

3.2.1.2 Goals Match events in general and goals in particular can be assumed to stimulate Twitter activity. Thus, the total number of goals is used as the second factor potentially influencing the tweet intensity.

3.2.1.3 Scoreline The scoreline during a match determines whether the game has already been decided or whether the result is still open. This may influence the behaviour of Twitter users in several ways. It could be argued that a close scoreline captivates the audience and stimulates tweet intensity. At the same time, matches being decided early may stimulate early analysis of results including joy about an upcoming victory or discussing reasons for a lost match. We summarize the scoreline of a match by summing up the length of time intervals in a match where both teams differ at least by two goals and consequently a single goal would not significantly alter the match outcome. For example, a match ending 2–0 with the second goal being scored after exactly 60 min has been "decided" for 30 min.

3.2.1.4 Weekend Finally, external factors neither related to the teams, nor related to the events in a match may have an influence on the possibility and motivation for fans to watch and tweet on football matches. Thus, we introduce a dummy variable indicating whether a match took place on a weekend (Saturday or Sunday) or during the week (Monday–Friday).

Three linear regression models were fitted using the tweet intensity (pre-game, in-game, and total, respectively) as dependent variable and the four factors as independent variables. All three regression models indicate a significant influence of the factors on the tweet intensities: F(4,399) = 91.72, p < 0.001, $R^2_{adj}$ = 0.474 for pre-game tweet intensity; F(4,399) = 88.62, p < 0.001, $R^2_{adj}$ = 0.465 for in-game tweet intensity; F(4,399) = 92.54, p < 0.001, $R^2_{adj}$ = 0.476 for total tweet intensity. The detailed results for each factor are summarized in Table 3.
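Two of the match-level factors described above, match popularity and the Scoreline measure, lend themselves to a short worked sketch. The code below is illustrative only: the function names and the goal-event format are assumptions, and the match is taken to last exactly 90 minutes (stoppage time is ignored), which the text does not specify.

```python
def match_popularity(avg_spectators_a, avg_spectators_b, max_avg_spectators):
    """Product of both teams' popularities, each normalized by the league maximum."""
    pop_a = avg_spectators_a / max_avg_spectators
    pop_b = avg_spectators_b / max_avg_spectators
    return pop_a * pop_b


def decided_minutes(goals, match_length=90.0):
    """Total time during which the teams were separated by two or more goals.

    `goals` is a list of (minute, team) tuples with team in {"home", "away"}.
    The 90-minute match length is an assumption of this sketch.
    """
    goal_difference = 0
    decided = 0.0
    last_minute = 0.0
    for minute, team in sorted(goals):
        # The interval since the previous event counts only if the lead was >= 2.
        if abs(goal_difference) >= 2:
            decided += minute - last_minute
        goal_difference += 1 if team == "home" else -1
        last_minute = minute
    if abs(goal_difference) >= 2:
        decided += match_length - last_minute
    return decided


# The example from the text: a 2-0 win with the second goal after exactly 60 min.
print(decided_minutes([(25.0, "home"), (60.0, "home")]))
```

Values computed this way, together with the total number of goals and a weekend dummy, would form the four independent variables of the regression models.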
Results are evidence that tweet intensity, both pre-game and in-game, is highly significantly (p < 0.001) influenced by popularity, while there is no significant influence of the weekend on the number of tweets. Naturally, Goals and Scoreline, being linked to match events unknown pre-game, do not have any significant influence on pre-game intensities. In-game, however, the number of goals significantly increases (p < 0.01) the number of tweets, indicating a stimulation of tweets by goals that will be analysed in more detail in the section Goal analysis. Scoreline does not have a significant influence on in-game tweet intensity (p = 0.10); however, there is a slight tendency towards more tweets in already decided matches. Driven by the larger number of in-game tweets, the results for total tweet intensity are largely consistent with the results for in-game tweet intensity.

Given the heterogeneity of matches in terms of large differences in tweet intensity and the fact that other features (yet to a lower degree) differ pre-match, it seems unreasonable to draw any conclusion about in-match processes from non-normalized values. For this reason, the subsequent analysis is based on pre-match normalized data as explained in the Method section.

Table 3 Results for the linear regression model using tweet intensities as dependent variable on a match level. Separate models are summarized for tweet intensity pre-game (IntensityPreGame), tweet intensity in-game (IntensityInGame) and total tweet intensity (Intensity)

IntensityPreGame
Variable      Coefficient   Standard error   beta     t        p-value
Popularity    3324.44       174.95           0.69     19.00    < 0.001*
Goals         −3.56         20.11            −0.01    −0.18    0.86
Scoreline     1.11          1.40             0.03     0.79     0.43
Weekend       −59.89        66.80            −0.03    −0.90    0.37
Constant      −238.93       90.94            –        −2.63    < 0.01*

IntensityInGame
Variable      Coefficient   Standard error   beta     t        p-value
Popularity    20,389.08     1116.81          0.67     18.26    < 0.001*
Goals         363.58        128.35           0.12     2.83     < 0.01*
Scoreline     14.88         8.97             0.07     1.66     0.10
Weekend       162.45        426.42           0.01     0.38     0.70
Constant      −2845.20      580.52           –        −4.90    < 0.001*

Intensity
Variable      Coefficient   Standard error   beta     t        p-value
Popularity    23,713.52     1263.47          0.68     18.77    < 0.001*
Goals         360.02        145.20           0.10     2.48     0.014*
Scoreline     15.99         10.15            0.06     1.56     0.12
Weekend       102.56        482.42           0.01     0.21     0.83
Constant      −3084.13      656.75           –        −4.70    < 0.001*

*Significant at 5% level

3.2.2 Time dependence

Before considering potential predictive value, it seems reasonable to take a more general look at what happens over the course of the match and directly before and after goals. Figure 2 illustrates the evolution of features over time during the matches for ten time intervals (the hour pre-game and nine intervals of 10 min each within the match), as well as 95% confidence intervals. Please note that pre-game values equal 1.0 for each feature due to the pre-match normalization.

The tweet intensity jumps when the match starts and slightly increases over the course of the match. Interestingly, the overall sentiment of tweets decreases due to decreasing positivity and increasing negativity. Tweets get shorter once the match starts, as the number of words drops after the kick-off. The number of hashtags and emoticons decreases as well, which can only partly be attributed to the shorter tweets. Confidence intervals are hardly visible in the figure except for the tweet intensity, where the sample size is about 400 matches compared to almost 2 million tweets for the other features. The narrow confidence intervals (even for the tweet intensity) suggest that all results are highly robust. Some interesting conclusions can be drawn from the evolution of features: First, when analysing the time intervals before and after goals, a useful normalization for time is needed. Therefore, the time normalized data, as described in the Method section, is used for all further analyses. Second, football fans seem to be happiest before the kick-off, and a football match does not seem to be good for the mood (at least of tweeters). In summary, one can say that anticipation clearly is the most beautiful kind of joy.

Fig. 2 Evolvement of features over the course of the match. Asterisks indicate a p-value of lower than 0.05 when comparing the respective time interval to pre-game

3.2.3 Goal analysis

Usage of the time normalized data makes it possible to take a direct look at what happens before and after goals are scored. Therefore, a minute value with respect to goals was assigned to each tweet. Negative values were attached to tweets that were posted within the last 10 min before a goal (e.g. −7 if the tweet was posted 7 min before the goal). Analogously, positive values were attached to tweets posted within the 10 min following a goal, and 0 if the tweet was posted in the same minute as the goal. Tweets that were posted before or after several different goals cannot be unambiguously assigned and thus were excluded from analysis. Tweets that are not close in time to any goal were put into an additional category and used as a benchmark. Figure 3 illustrates what happens to the various Twitter features in the 10 min before and after a goal. The dotted horizontal line refers to the benchmark of tweets that are independent from goals, and the grey areas refer to 95% confidence intervals.

The analysis includes a total of 1118 goals scored in the matches from our dataset. Again, confidence intervals are narrow or even hardly visible, indicating the robustness of results. A clear and intuitive interpretation can be given for the time interval after the goals: Goals evoke a large number of relatively short tweets. While the number of tweets increases by a factor of three shortly after the goal, the tweet length decreases by roughly 30%. The effects for hashtags and emoticons are partly attributable to the shortness of the tweets. Only small effects are visible with regard to the sentiment analysis, where surprisingly both negativity and positivity slightly decrease, resulting in a fairly stable overall tweet sentiment. Please note that the tweets are attached to a match and not to particular teams, which means that we cannot distinguish between the fans of the scoring and the conceding team. In the forecasting context, the idea is to find signs that are already present in the data during the minutes before the goals are scored. Most features are in line with the benchmark and thus do not support this idea. However, a slightly increased positivity and overall sentiment can be found prior to the goals, potentially being a weak early indication of goals. The same is true for emoticons, although less clearly.

Fig. 3 Evolvement of features shortly before and after goals. The vertical line illustrates the time when the goal was scored. The horizontal line refers to the benchmark of tweets that were neither written shortly before nor after a goal. Asterisks indicate a p-value of lower than 0.05 when comparing the respective minute to the benchmark

3.2.4 Analysis of words and topics

The analyses so far have taken account of the number of words or the sentiment of words, but have not visualized the communication in a more detailed way. In order to gain insights into the topics and frequently used words in association with a match, four different wordclouds representing different phases before and during the matches are used. Figure 4 analyses pre-match communication and thus refers solely to tweets written in the last hour before a match started. Figures 5 and 6 refer solely to tweets written during the first half and the second half, respectively. However, in order to capture general instead of event-based communication on the matches, tweets associated with one or more goals (i.e. written in the 10 min before or after goals) were not considered. Finally, Fig. 7 analyses event-based communication and thus includes tweets written in the 10 min following a goal. Tweets that cannot be unambiguously associated with a single goal have not been considered, consistent with the section Goal analysis. Wordclouds were created by means of the python package wordcloud (version 1.8.1) using 50 as the maximum number of words. The official hashtags of the matches, all team names and known acronyms of team names (such as “lfc” for “liverpool football club”) were not considered, nor was the predefined list of typical stopwords, which includes words such as “it”, “was” or “this”.

Fig. 4 Wordcloud visualizing frequently used words from tweets written pre-match, i.e. within the last hour before the match

Fig. 5 Wordcloud visualizing frequently used words from tweets written during the first half of a match, but excluding tweets being written shortly before and after goals

Fig. 6 Wordcloud visualizing frequently used words from tweets written during the second half of a match, but excluding tweets written shortly before and after goals

Fig. 7 Wordcloud visualizing frequently used words from tweets written shortly after goals

Pre-game communication includes plenty of words related to broadcasting of the match, such as “live stream”, “watch live” and “hd”. Moreover, several words including “start”, “starting”, “today”, “tonight” or “now” directly refer to the upcoming start of the match. Finally, there seems to be noticeable discussion on which players were chosen to play or not play by the coach, evidenced by words like “lineup”, “bench”, “player” and possibly also “team” or “starting”. Please note that unusual terms like “coyg” (come on you gunners) or “ynwa” (you’ll never walk alone) refer to football-related acronyms that were not contained in our list of known acronyms and thus remained included in the data.

Communication during the match is still subject to a lot of discussion on how to follow broadcasts of the match, as words like “live stream”, “live” and “hd” stay highly present. The presence of words like “play”, “player”, “playing” or “fan” can be considered very general and expectable for communication during the match that is not associated with goals. In general, differences between first and second half seem to be rather limited, except for the prominent role of the word “game” in the second half. This suggests that towards the end of the match or given a clear scoreline, users tend to already discuss the game as a whole, summarize it or draw conclusions from it.

Communication directly following a goal is naturally strongly influenced by words in direct association with the goal (“goal”, “score”, “lead”), or related to discussing the circumstances of a goal, such as “var” (video assistant referee) or “penalty”. The frequent occurrence of the word “game” might again indicate that once a goal decides a match, it is already discussed as a whole. Moreover, some expectable expressions of emotional reactions like “good” or “shit” are included, although they by far do not dominate the wordcloud.

3.2.5 In-play goal forecast

In order to answer the question whether the information in the data is sufficient to forecast goals in-play, small time intervals were considered. Matches were split into intervals of 5 min and normalized Twitter features were calculated accordingly. The variable to be forecasted is an indicator of whether a goal was scored in the next time interval, and the last interval of 5 min per match was consequently excluded from the data. This results in a sample of 6868 time intervals. No betting odds are available for time intervals of 5 min, therefore ODDS refers to a logistic regression based on the betting odds of over-under 2.5 goals as well as over-under 1.5 goals in the second half.

Results show that UNI is an unreasonable choice for the short time intervals, which is attributable to the fact that goal scoring probabilities for intervals of 5 min are far smaller than 50%. As expectable, FRQ, representing the lowest level of information, also possesses the weakest predictive accuracy among the remaining models. ODDS possesses the highest predictive quality, which is in line with the notion that betting odds are a strong predictor of football matches. The main result is that LR and RF, although including additional in-play information, both fail to outperform pre-game information based on the betting odds. As such, Twitter data did not improve pre-game forecasts for the number of goals in matches. Except for UNI, all models are very close in terms of accuracy, with the only significant difference being between FRQ and ODDS. This underlines that in-play forecasting seems to be a difficult task. Results for the hyperparameter tuning are summarized in Fig. 8, which shows that increasing the number of trees had a very limited effect, while the optimal maximum depth lies around 3 to 4. The results in Table 4 refer to the optimal specification of n_estimators = 500 and max_depth = 3. Please note that the hyperparameters did not have major effects on the forecasting accuracy of RF, which ranged from 0.1466 to 0.1470. More importantly, for none of the hyperparameters tested was a significant difference between RF and LR or ODDS found. As such, the hyperparameter selection does not affect any of the results of the present study.

Fig. 8 Results of hyperparameter tuning for the random forest model. The forecasting accuracy as RPS is illustrated in dependence of the number of trees (n_estimators) and the maximum tree depth (max_depth)

Table 4 Results for forecasting goals in time intervals of 5 min from the preceding time interval

Model   Information   RPS      p-value compared to
                               UNI         FRQ       LR       RF
UNI     None          0.2500   –           –         –        –
FRQ     Pre-game      0.1469   < 0.0001*   –         –        –
LR      Both          0.1467   < 0.0001*   0.4362    –        –
RF      Both          0.1466   < 0.0001*   0.1578    0.7966   –
ODDS    Pre-game      0.1465   < 0.0001*   0.0452*   0.3642   0.3584

*p-value lower than 5%

4 Discussion

The results of the present study shed light on three different aspects of in-play forecasting with Twitter data, namely in-play forecasting in general, a detailed analysis of Twitter communication over the course of matches and the value of Twitter in in-play forecasting in football.

The preliminary analysis suggests that in-play forecasting of goals in general is a difficult task. Results are evidence for the limited value of in-play information (i.e. goals) to forecast the further course of a match when compared to betting odds as pre-game information, a fact that football players, coaches, match analysts, broadcasters and fans would probably strongly deny. Possible explanations are the high predictive quality of betting odds in football forecasting (Forrest et al. 2005; Hvattum and Arntzen 2010; Štrumbelj and Šikonja 2010) and the significant role of randomness in goal scoring in football (Brechot and Flepp 2020; Lames 2018; Wunderlich et al. 2021). Moreover, the result is in line with Wunderlich and Memmert (2018), who showed that betting odds of prior matches possess more predictive value than the results of the matches themselves.

The analysis of tweet intensity revealed that both in-play and pre-game, tweet intensity is predominantly driven by the popularity of the two teams competing. Moreover, tweet intensity is increased in-play in matches with a higher number of total goals scored. The analysis of time dependence and the goal analysis reveal how the reactions of Twitter users change over the course of matches and after goals are scored. Before the matches start, fewer but longer tweets are written than during the match, which is explainable by a heightened interest and faster sequence of events in-play. In terms of the topics and words contained, pre-game tweets are highly influenced by communication on how to follow broadcasts of the match and by discussion of which players are playing. Differences between communication in the first and second half are highly limited, while tweets directly following goals are naturally dominated by discussion of the score, the goal itself and its possible causes. The most striking result with regard to time dependence is a steadily increasing negativity and a steadily decreasing positivity while the match evolves, resulting in a clearly decreasing sentiment. It seems that fans (or at least those active on Twitter) tend to be disappointed by football matches, possibly caused by unjustifiably high expectations before and at the beginning of matches. The use of Twitter data and sentiment analysis techniques enables researchers to investigate perception and psychological reactions of users during football matches. Further research with a psychological focus could investigate which mechanisms drive the disappointment of fans during matches.

The analysis of minutes before and after goals reveals the reaction to goals, in particular a dramatic increase in tweet intensity, with tweets being significantly shorter and consequently containing a lower number of hashtags and emoticons. The most unintuitive and difficult to explain result is the slightly lower negativity and positivity directly after goals. It is important to note that the tweets in our database are assigned to the match and not to a single team, thus emotions of both teams’ fans should be included, which makes an unchanged overall sentiment comprehensible. However, even when including fans of the scoring team, fans of the conceding team and even neutral observers, one would at least expect an increased emotionality as a reaction to the goal. One explanation could be neutral tweets that are descriptive rather than evaluative (e.g. “Penalty for The Red Devils. Rashford steps up and CONVERTS! Manchester United 1–0 Chelsea.”) or tweets that were potentially written with a lot of emotion, but do not include any words with a clear positive or negative connotation identifiable by a sentiment analysis algorithm (e.g. “GOOO[…]OOOL!!!! Rashford!!! 1–0 United!!!”). With regard to the sentiments, although the method has been validated in football, textual data are highly domain-specific and increased accuracy might be achievable when using domain-specific methods such as football-specific lexica of words. A more detailed analysis of what drives this unintuitive result, however, is beyond the scope of this study.

While the Twitter data clearly react to goals scored, a main focus of our approach was to test Twitter data for possible predictive value. The present data clearly do not support the idea that in-play Twitter data have predictive value, as forecasts based on pre-game betting odds were outperformed neither by a logistic regression model nor by a random forest model including in-play Twitter information. The fact that random forest models did not outperform logistic regression and that hyperparameter tuning had only very limited effects on the accuracy suggests that this is actually attributable to the missing informative value of the Twitter data and not to the selection of methods. Put simply, we could not extract information from Twitter data that helps to forecast upcoming goals. Three possible aspects could explain this result. First, the in-play predictability seems to be very limited in general, as previously demonstrated. Further studies investigating in-play notational data or positional data could shed more light on the question to which degree in-play forecasting is possible at all. Second, Twitter data might not include information that is relevant for forecasting. In a way, this is surprising as Twitter can be seen as a source of crowd wisdom and such sources have been shown to be highly valuable in forecasting football (Forrest et al. 2005; Peeters 2018; Spann and Skiera 2009). On the other hand, Twitter is not a vehicle directly related to forecasting such as the betting market or prediction markets, and moreover information is not easily extractable from Twitter. Thus, the third possible aspect is that the information reflected in Twitter data might not have been extracted effectively. Textual data are highly unstructured, which makes the extraction of information difficult and leads to a limited degree of accuracy for sentiment analysis techniques (Wunderlich and Memmert 2020). Further progress in this domain can be expected as sentiment analysis is a highly relevant topic in computer science (Mäntylä et al. 2018; Piryani et al. 2017); nevertheless, it will remain challenging to algorithmically reproduce human understanding of textual data. The problem of extracting relevant data might be aggravated by the short time intervals of 5 min, which yield limited tweet samples and a higher randomness in the features. To account for the issue of short time intervals, we repeated the analysis using data from the complete first half of a match to forecast the number of goals in the second half of a match. Despite larger time intervals, results implied the same conclusions, which suggests that the limited in-play predictive value is not attributable to the small time intervals.

In experimental research, the present results could be assessed as a null result as they do not support the notion of predictive in-play value of Twitter data and question the general value of in-play information including goals. Still, this is surprising and valuable information to coaches, match analysts and broadcasters, who should question carefully to what extent in-play information can be used at all to draw conclusions on the further course of a match.

5 Conclusions

The present approach investigates in-play forecasting of football matches in general and a Big Data approach using Twitter data in particular. Results are evidence that in-play forecasting of goals is a highly challenging task, as information gathered in-play (both basic events like goals and textual data from Twitter) does not improve forecasting accuracy when compared to pre-game information. In addition, results suggest that the fans’ perception of a match gets more and more negative over time, as the sentiment of tweets on Twitter decreases over the course of the match.

Funding Open Access funding enabled and organized by Projekt DEAL. The authors received no specific funding for this project.

Declarations

Conflict of interest The authors declare that they have no conflict of interest.

Ethical approval We did not conduct any experimental research involving humans or animals.

Data availability No data with regard to the direct content of tweets will be published or forwarded in line with the Developer Agreement and Policy of the Twitter API. All further data used within this study are available in the public domain and the respective sources have been mentioned in the manuscript.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Akhtar S, Scarf P (2012) Forecasting test cricket match outcomes in play. Int J Forecast 28(3):632–643. https://doi.org/10.1016/j.ijforecast.2011.08.005

Andersson P, Edman J, Ekman M (2005) Predicting the World Cup 2002 in soccer: performance and confidence of experts and non-experts. Int J Forecast 21(3):565–576. https://doi.org/10.1016/j.ijforecast.2005.03.004

Asif M, McHale IG (2016) In-play forecasting of win probability in one-day international cricket: a dynamic logistic regression model.
Int J Forecast 32(1):34–43. https://doi.org/10.1016/j.ijforecast.2015.02.005

Berrar D, Lopes P, Davis J, Dubitzky W (2019) Guest editorial: special issue on machine learning for soccer. Mach Learn 108(1):1–7. https://doi.org/10.1007/s10994-018-5763-8

Bollen J, Mao H, Zeng X-J (2011) Twitter mood predicts the stock market. J Comput Sci 2(1):1–8. https://doi.org/10.1016/j.jocs.2010.12.007

Boshnakov G, Kharrat T, McHale IG (2017) A bivariate Weibull count model for forecasting association football scores. Int J Forecast 33(2):458–466. https://doi.org/10.1016/j.ijforecast.2016.11.006

Brechot M, Flepp R (2020) Dealing with randomness in match outcomes: how to rethink performance evaluation in European club football using expected goals. J Sports Econ 21(4):335–362. https://doi.org/10.1177/1527002519897962

Breiman L (2001) Random forests. Mach Learn 45(1):5–32. https://doi.org/10.1023/A:1010933404324

Brown A, Rambaccussing D, Reade JJ, Rossi G (2017) Forecasting with social media: evidence from tweets on soccer matches. Econ Inq 20(3):1363. https://doi.org/10.1111/ecin.12506

Browne (2000) Cross-validation methods. J Math Psychol 44(1):108–132. https://doi.org/10.1006/jmps.1999.1279

Cambria E, Poria S, Bajpai R, Schuller B (2016) SenticNet 4: a semantic resource for sentiment analysis based on conceptual primitives. In: Proceedings of COLING 2016, the 26th international conference on computational linguistics: technical papers, pp 2666–2677

Constantinou AC, Fenton NE (2012) Solving the problem of inadequate scoring rules for assessing probabilistic football forecast models. J Quant Anal Sports. https://doi.org/10.1515/1559-0410.1418

Dick U, Brefeld U (2019) Learning to rate player positioning in soccer. Big Data 7(1):71–82. https://doi.org/10.1089/big.2018.0054

Dixon MJ, Robinson ME (1998) A birth process model for association football matches. Statistician 47(3):523–538

Easton S, Uylangco K (2010) Forecasting outcomes in tennis matches using within-match betting markets. Int J Forecast 26(3):564–575. https://doi.org/10.1016/j.ijforecast.2009.10.004

Efron B, Tibshirani RJ (1994) An introduction to the bootstrap. CRC Press, Boca Raton

Fan M, Billings A, Zhu X, Yu P (2020) Twitter-based BIRGing: big data analysis of english national team fans during the 2018 FIFA world cup. Commun Sport 8(3):317–345. https://doi.org/10.1177/2167479519834348

Forrest D, Goddard J, Simmons R (2005) Odds-setters as forecasters: the case of English football. Int J Forecast 21(3):551–564. https://doi.org/10.1016/j.ijforecast.2005.03.003

Gayo-Avello D (2013) A meta-analysis of state-of-the-art electoral prediction from Twitter data. Soc Sci Comput Rev 31(6):649–679. https://doi.org/10.1177/0894439313493979

Goddard J, Asimakopoulos I (2004) Forecasting football results and the efficiency of fixed-odds betting. J Forecast 23(1):51–66. https://doi.org/10.1002/for.877

Godin F, Zuallaert J, Vandersmissen B, de Neve W, van de Walle R (2014) Beating the bookmakers: leveraging statistics and Twitter microposts for predicting soccer results. In: KDD Workshop on large-scale sports analytics

Grunz A, Memmert D, Perl J (2012) Tactical pattern recognition in soccer games by means of special self-organizing maps. Hum Mov Sci 31(2):334–343. https://doi.org/10.1016/j.humov.2011.02.008

Heuer A, Rubner O (2012) How does the past of a soccer match influence its future? Concepts and statistical analysis. PLoS ONE 7(11):e47678. https://doi.org/10.1371/journal.pone.0047678

Ho TK (1995) Random decision forests. In: Proceedings of the third international conference on document analysis and recognition, August 14–16, 1995, Montréal, Canada. IEEE Computer Society Press, Los Alamitos, pp 278–282. https://doi.org/10.1109/ICDAR.1995.598994

Hubáček O, Šourek G, Železný F (2019) Exploiting sports-betting market using machine learning. Int J Forecast. https://doi.org/10.1016/j.ijforecast.2019.01.001

Huberty M (2015) Can we vote with our tweet? On the perennial difficulty of election forecasting with social media. Int J Forecast 31(3):992–1007. https://doi.org/10.1016/j.ijforecast.2014.08.005

Hvattum LM, Arntzen H (2010) Using ELO ratings for match result prediction in association football. Int J Forecast 26(3):460–470. https://doi.org/10.1016/j.ijforecast.2009.10.002

Jungherr A, Jürgens P, Schoen H (2011) Why the pirate party won the German election of 2009 or the trouble with predictions: a response to Tumasjan, A., Sprenger, T. O., Sander, P. G., & Welpe, I. M. “Predicting Elections With Twitter: What 140 Characters Reveal About Political Sentiment”. Soc Sci Comput Rev 30(2):229–234. https://doi.org/10.1177/0894439311404119

Karlis D, Ntzoufras I (2003) Analysis of sports data by using bivariate Poisson models. J R Stat Soc Ser D (The Stat) 52(3):381–393. https://doi.org/10.1111/1467-9884.00366

Killick EA, Griffiths MD (2019) In-play sports betting: a scoping study. Int J Ment Heal Addict 17(6):1456–1495. https://doi.org/10.1007/s11469-018-9896-6

Kolbinger O, Knopp M (2020) Video kills the sentiment-exploring fans’ reception of the video assistant referee in the English premier league using Twitter data. PLoS ONE 15(12):e0242728. https://doi.org/10.1371/journal.pone.0242728

Koopman SJ, Lit R (2015) A dynamic bivariate Poisson model for analysing and forecasting match results in the English premier league. J R Stat Soc A Stat Soc 178(1):167–186. https://doi.org/10.1111/rssa.12042

Koopman SJ, Lit R (2019) Forecasting football match results in national league competitions using score-driven time series models. Int J Forecast 35(2):797–809. https://doi.org/10.1016/j.ijforecast.2018.10.011

Kovalchik S, Reid M (2019) A calibration method with dynamic updates for within-match forecasting of wins in tennis. Int J Forecast 35(2):756–766. https://doi.org/10.1016/j.ijforecast.2017.11.008

Lames M (2018) Chance involvement in goal scoring in football—an empirical approach. Ger J Exerc Sport Res 48(2):278–286. https://doi.org/10.1007/s12662-018-0518-z

Lasek J, Szlávik Z, Bhulai S (2013) The predictive power of ranking systems in association football. Int J Appl Pattern Recognit 1(1):27. https://doi.org/10.1504/IJAPR.2013.052339

Lessmann S, Sung M-C, Johnson JE (2010) Alternative methods of predicting competitive events: an application in horserace betting markets. Int J Forecast 26(3):518–536. https://doi.org/10.1016/j.ijforecast.2009.12.013

Lopez-Gonzalez H, Griffiths MD (2016) Is European online gambling regulation adequately addressing in-play betting advertising? Gaming Law Rev Econ 20(6):495–503. https://doi.org/10.1089/glre.2016.2064

Maher MJ (1982) Modelling association football scores. Stat Neerl 36(3):109–118. https://doi.org/10.1111/j.1467-9574.1982.tb00782.x

Mäntylä MV, Graziotin D, Kuutila M (2018) The evolution of sentiment analysis—a review of research topics, venues, and top cited papers. Comput Sci Rev 27:16–32. https://doi.org/10.1016/j.cosrev.2017.10.002

Memmert D, Raabe D (2018) Data analytics in football. Routledge, Abingdon. https://doi.org/10.4324/9781351210164

Peeters T (2018) Testing the wisdom of crowds in the field: transfermarkt valuations and international soccer results. Int J Forecast 34(1):17–29. https://doi.org/10.1016/j.ijforecast.2017.08.002

Pennebaker JW, Boyd RL, Jordan K, Blackburn K (2015) The development and psychometric properties of LIWC2015. University of Texas at Austin. https://doi.org/10.15781/T29G6Z

Piryani R, Madhavi D, Singh VK (2017) Analytical mapping of opinion mining and sentiment analysis research during 2000–2015. Inf Process Manag 53(1):122–150. https://doi.org/10.1016/j.ipm.2016.07.001

Rein R, Memmert D (2016) Big data and tactical analysis in elite soccer: future challenges and opportunities for sports science. Springerplus 5(1):1410. https://doi.org/10.1186/s40064-016-3108-2

Rinker TW (2013) qdapDictionaries: dictionaries to accompany the qdap package. Retrieved from http://github.com/trinker/qdapDictionaries

Schaumberger G, Groll A (2018) Predicting matches in international football tournaments with random forests. Stat Model 18(5–6):460–482

Schumaker RP, Jarmoszko AT, Labedz CS (2016) Predicting wins and spread in the premier league using a sentiment analysis of twitter. Decis Support Syst 88:76–84. https://doi.org/10.1016/j.dss.2016.05.010

Spann M, Skiera B (2009) Sports forecasting: a comparison of the forecast accuracy of prediction markets, betting odds and tipsters. J Forecast 28(1):55–72. https://doi.org/10.1002/for.1091

Štrumbelj E, Šikonja MR (2010) Online bookmakers’ odds as forecasts: the case of European soccer leagues. Int J Forecast 26(3):482–488. https://doi.org/10.1016/j.ijforecast.2009.10.005

Tumasjan A, Sprenger TO, Sandner PG, Welpe IM (2010) Predicting elections with Twitter: what 140 characters reveal about political sentiment. Icwsm 10(1):178–185

Twitter API (2020) Retrieved from https://developer.twitter.com/

Wheatcroft E (2019) Evaluating probabilistic forecasts of football matches: the case against the ranked probability score. https://arxiv.org/abs/1908.08980

Wheatcroft E (2020) A profitable model for predicting the over/under market in football. Int J Forecast. https://doi.org/10.1016/j.ijforecast.2019.11.001

Wunderlich F, Memmert D (2018) The betting odds rating system: using soccer forecasts to forecast soccer. PLoS ONE 13(6):e0198668. https://doi.org/10.1371/journal.pone.0198668

Wunderlich F, Memmert D (2020) Innovative approaches in sports science—lexicon-based sentiment analysis as a tool to analyze sports-related Twitter communication. Appl Sci 10(2):431. https://doi.org/10.3390/app10020431

Wunderlich F, Seck A, Memmert D (2021) The influence of randomness on goals in football decreases over time. An empirical analysis of randomness involved in goal scoring in the English Premier League. J Sports Sci 39(20):2322–2337. https://doi.org/10.1080/02640414.2021.1930685

Yu Y, Wang X (2015) World cup 2014 in the Twitter world: a big data analysis of sentiments in U.S. sports fans’ tweets. Comput Hum Behav 48:392–400. https://doi.org/10.1016/j.chb.2015.01.075

Zhang X, Fuehres H, Gloor PA (2011) Predicting stock market indicators through Twitter “I hope it is not as bad as I fear.” Proc Soc Behav Sci 26:55–62. https://doi.org/10.1016/j.sbspro.2011.10.562

Zou Q, Song K, Shi J (2020) A Bayesian in-play prediction model for association football outcomes. Appl Sci 10(8):2904. https://doi.org/10.3390/app10082904

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Journal

Social Network Analysis and Mining, Springer Journals

Published: Dec 1, 2022

Keywords: Big data; Data mining; Social networks; Twitter; Football forecasting; In-play forecasting

References