D’Urso, Pierpaolo; De Giovanni, Livia; Vitale, Vincenzina
doi: 10.1007/s10479-022-04558-x pmid: N/A
A robust fuzzy clustering model for mixed data is proposed. For each variable, or attribute, a proper dissimilarity measure is computed and the clustering procedure combines the dissimilarity matrices with weights objectively computed during the optimization process. The weights reflect the relevance of each attribute type in the clustering results. A simulation study and an empirical application to football player data are presented that show the effectiveness of the proposed clustering algorithm in finding clusters that would be hidden unless a multi-attribute approach were used.
Carpita, Maurizio; Pasca, Paola; Arima, Serena; Ciavolino, Enrico
doi: 10.1007/s10479-023-05185-w pmid: N/A
In sports, studying player performances is a key issue since it provides a guideline for strategic choices and helps teams in the complex procedure of buying and selling players. In this paper we aim at investigating the ability of various composite indicators to define a measurement structure for the global soccer performance. We rely on data provided by the EA Sports experts, who are the ultimate authority on soccer performance measurement: they periodically produce a set of players’ attributes that make up the broader, theoretical performance dimensions. Considering the potential of clustering techniques to confirm or disconfirm the experts’ assumptions in terms of aggregations between indicators, 29 players’ performance attributes or variables (from the FIFA19 version of the videogame, that is, sofifa) have been considered and processed with three different techniques: the Cluster of variables around latent variables (CLV), the Principal covariates regression (PCovR) and Bayesian model-based clustering (B-MBC). The three procedures yielded clusters that differed from experts’ classification. In order to identify the most appropriate measurement structure, the resulting clusters have been embedded into Structural equation models with partial least squares (PLS-SEMs) with a Higher-Order Component (that is, the overall soccer performance). The statistically derived composite indicators have been compared with those of experts’ classification. Results support the concurrent validity of composite indicators derived through the statistical methods: overall, they show that, in the absence of expert judgement, composite indicators, as well as the resulting PLS-SEM models, are a viable alternative given their greater correlation to players’ economic value and salary.
doi: 10.1007/s10479-021-04439-9 pmid: N/A
In this work we analyse the global soccer player transfer market providing a network approach that takes into account both the number of transfers and the related costs for football players in the world market. We propose a community detection methodology that considers different features of the network. We cluster countries according to similarities in their roles in the transfer market and to the presence of indirect connections due to common neighbours. Numerical results show a strict relation between the composition of clusters and the economic value of the football leagues of different countries. Indeed, we observe that, on average, leagues with a similar economic value belong to the same cluster. The analysis has also been extended by providing a comparison based on the world trade network. We observe that prominent European players in the economic trades are also relevant in the soccer transfer network.
Ficcadenti, Valerio; Cerqueti, Roy; Varde’i, Ciro Hosseini
doi: 10.1007/s10479-022-04609-3 pmid: N/A
In this paper, we present a data-analysis rank-size approach to assess the features of soccer competitions and competitors. We investigate the championships rankings and the teams’ final scores in the most relevant Italian league, the “Serie A”, between 1930 and 2020. We use the final rankings and the teams’ scores to explore the presence of rank-size regimes in the various yearly championships. Besides, we analyse the teams one by one, ranking their performance over the years and using the rank-size law’s parameters to compare their performances across the tournaments. We chose to do so via the Discrete Generalised Beta Distribution, a three-parameter rank-size function. We offer a cluster analysis of the rank-size law parameters based on a k-means algorithm to provide additional insights and capture similarities and deviations among championships and teams. Concluding, we propose a measure of competitiveness within championships and per team. The best fit results are statistically outstanding, and the cluster analysis presents two main clusters capturing teams’ performances and years in which they have competed in the “Serie A”. The competitiveness analysis shows that the teams at the bottom of the championships ranking have obtained decreasing scores in recent years.
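The abstract does not spell out the rank-size function. As a minimal sketch, assuming the form in which the Discrete Generalised Beta Distribution is commonly written, f(r) = A (N + 1 - r)^b / r^a, the predicted size at a given rank can be computed as below. The function name, the parameter names and the example values are illustrative assumptions, not taken from the paper.

```python
def dgbd(rank, n, a, b, scale):
    """Rank-size value under the commonly used DGBD form (an assumption here).

    rank  : position in the ranking, 1 = best
    n     : number of ranked items (e.g. teams in one championship)
    a, b  : shape parameters governing the top and the bottom of the ranking
    scale : normalising constant A
    """
    return scale * (n + 1 - rank) ** b / rank ** a


# Hypothetical parameters for a 20-team championship.
curve = [dgbd(r, n=20, a=0.3, b=0.4, scale=30.0) for r in range(1, 21)]
```

With positive a and b the curve decreases with rank, and b controls how sharply values drop for the lowest-ranked items.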
doi: 10.1007/s10479-021-04224-8 pmid: N/A
Several studies deal with the development of advanced statistical methods for predicting football match results. These predictions are then used to construct profitable betting strategies. Even if the most popular bets are based on whether one expects that a team will win, lose, or draw in the next game, nowadays a variety of other outcomes are available for betting purposes. While some of these events are binary in nature (e.g. the occurrence of red cards), others can be seen as binary outcomes. In this paper we propose a simple framework, based on score-driven models, able to obtain accurate forecasts for binary outcomes in soccer matches. To show the usefulness of the proposed statistical approach, two experiments on the English Premier League and the Italian Serie A are provided for predicting the occurrence of red cards, Under/Over and Goal/No Goal events.
doi: 10.1007/s10479-022-04722-3 pmid: N/A
Implied winning probabilities are usually derived from betting odds by normalization: inverse odds are divided by the booksum (sum of the inverse odds) to ensure that the implied probabilities add up to 1. Another, less frequently used method is Shin’s model, which endogenously accounts for a possible favourite-longshot bias. In this paper, we compare these two methods in two betting markets on soccer games. The method we use for the comparison is new and has two advantages. Unlike the binning method that is used predominantly, it is based on match-level data. The method allows for residual favourite-longshot bias, and also allows for incorporation of match-specific variables that may determine the relation between the actual probability of the outcome and the implied winning probabilities. The method can be applied to any probabilistic classification problem. In our application, we find that Shin’s model yields unbiased estimates for the actual probability of outcome in the English Premier League. In the Spanish La Liga, implied probabilities derived from the betting odds using either the method of normalization or Shin’s model suffer from favourite bias: favourites tend to win their matches more frequently than the implied probabilities suggest.
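The normalization step described in this abstract can be sketched in a few lines. This is a minimal illustration, assuming decimal odds (payout per unit staked, including the stake); the function name and the example odds are hypothetical and do not come from the paper.

```python
def normalized_probabilities(decimal_odds):
    """Implied probabilities by basic normalization.

    Each inverse odd is divided by the booksum (the sum of the inverse
    odds), so the resulting probabilities add up to 1.
    """
    inverse = [1.0 / o for o in decimal_odds]
    booksum = sum(inverse)
    return [q / booksum for q in inverse]


# Hypothetical home / draw / away odds for a single match.
probs = normalized_probabilities([2.10, 3.40, 3.60])
```

A booksum above 1 reflects the bookmaker's margin; normalization spreads that margin proportionally over all outcomes, which is the assumption that Shin's model relaxes.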
Badiella, Llorenç; Puig, Pedro; Lago-Peñas, Carlos; Casals, Martí
doi: 10.1007/s10479-022-04733-0 pmid: N/A
The aim of the current study is to analyze the effects of red and yellow cards on the scoring rate in elite soccer. The sample was composed of 1826 matches in the top five European leagues. All events were structured in 5-min intervals and were analyzed by means of a Generalized Linear Mixed Model with Poisson distribution, considering the presence of correlated data, where the dependent variable is represented by scoring rate. Team strength and home advantage were considered implicitly by means of a transformation of the betting odds for each game. The model also took into account the goal difference and time evolution. Overall, we found that after a sending off, each team’s scoring rate changes significantly, damaging the penalised team and favouring its opponent. When the player who is sent off belongs to the Away team, the impact of a red card is more or less maintained over time intervals. The red card effect, on the other hand, tends to fade over time when the affected team is stronger. The relative difference in scoring rates is also affected by the goal difference and the difference in booked players, being slightly lower for the team going ahead if it has more booked players. Our approach allows estimating the expected cumulative scoring rate through time for various red card scenarios. In particular, if a red card is given with 30 min of remaining time, the expected impact is 0.39 goals if the guilty player is on the visiting team and 0.50 if he plays for the home team. Coaches and analysts could use this information to establish objectives for players and teams in training and matches and to be prepared for these very different scenarios of numerical superiority or inferiority.
Ötting, Marius; Karlis, Dimitris
doi: 10.1007/s10479-022-04660-0 pmid: N/A
Driven by recent advances in technology, tracking devices make it possible to collect high-frequency data on the position of players in (association) football matches and in many other sports. Although such data sets are available to every professional team, most teams still rely on time-consuming video analysis when analysing future opponents, for example with regard to how goals were scored or a team’s general style of play. In this contribution, we provide a data-driven approach for automated classification of tactics in football. For that purpose, we consider hidden Markov models (HMMs) to analyse high-frequency tracking data, where the underlying states serve as proxies for a team’s tactic. In particular, as space control in football has been considered a major driver of success, we focus on the effective playing space, which is the convex hull created by the players excluding the goalkeeper. This quantity relates to both playing style and team behavior. Using copula-based HMMs, we jointly model the effective playing space of both teams to account for the competitive nature of the game. Our model thus provides an estimate of a team’s playing style at each time point, which can be beneficial for team managers but also of huge interest to football fans.
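To make the effective playing space concrete, the sketch below computes the area of the convex hull of the outfield players at a single time point. It is an illustration only: it uses Andrew's monotone chain algorithm and the shoelace formula, assumes positions are given as (x, y) tuples in metres, and all names are hypothetical. The abstract does not say how the authors compute the hull.

```python
def _cross(o, a, b):
    """Cross product of vectors OA and OB; positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def effective_playing_space(outfield_positions):
    """Area of the convex hull of the outfield players (goalkeeper excluded)."""
    hull = convex_hull(outfield_positions)
    twice_area = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        twice_area += x1 * y2 - x2 * y1  # shoelace formula
    return abs(twice_area) / 2.0


# Hypothetical positions: four players on the corners of a 10 m square
# and one player inside it, who does not affect the hull.
area = effective_playing_space([(0, 0), (10, 0), (10, 10), (0, 10), (5, 5)])
```

Evaluating this function for both teams at every tracked time point would give a bivariate series of the kind a model could take as input.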
Darko, Adjei Peter; Liang, Decui; Zhang, Yinrunjie; Kobina, Agbodah
doi: 10.1007/s10479-022-04992-x pmid: 36187176
The emergence of sports tourism has compelled sports managers to rethink the management and improvement of sports facilities. Through service quality analysis, sports managers can identify the strengths and weaknesses of their activities for possible advancement. Hence, this study aims to develop a decision support model based on integrating online reviews and data envelopment analysis to measure the degree of tourist satisfaction and provide benchmarking goals for service improvement. The proposed model employs text mining techniques to discover service quality attributes from text reviews. According to the discovered service quality attributes, we conduct sentiment analysis to reveal the sentiment polarities of the text reviews. Then, we refine the polarities and ratings of online reviews into linguistic distribution assessments. Furthermore, we develop a linguistic distribution output-oriented non-discretionary bestpoint slack-based measure (BP-SBM) to compute the degree of tourist satisfaction and benchmarking goals. The linguistic distribution output-oriented non-discretionary BP-SBM can handle both positive and negative data values, thus overcoming the flaws of the traditional model. Meanwhile, the proposed decision support model investigates how the service-quality attributes interact to provide improvement pathways for an underperforming stadium based on association rule mining. We test the applicability of the proposed decision support model on some Elite stadia in Europe.