TY  - JOUR
AU  - Kim, Tai-hoon
AB  - Social computing is a disruptive technology that is changing how we do business and build enterprises. Mining consumer insights and consumer segmentation are the new drivers that are likely to be the cornerstone of new service innovations and social interactions. This special issue was designed to stimulate research on data mining tools and methods for social computing. In this editorial, we outline the broad research framework of social computing, introduce the papers in this special issue and point to areas for future research.

1. INTRODUCTION

Social computing involves a broad spectrum of web- and cloud-based services that encourage and exploit the active participation of users and the content they create. Blogs, email, instant messaging, chat rooms, wikis, social bookmarking, group discussions over the internet and other instances of what is often called social software illustrate the paradigm of social computing. The premise of social computing is that it is possible to design digital systems that support useful functionality by making socially produced information available to their users. A considerable body of scholars believes that this promise can be realized through the analytics of Big Social Data (or Social Big Data), as the popularity of social media and computer-mediated communication has produced high-volume and highly semantic data about digital social interactions [1]. Such analytics can extract quantitative indicators of the social behaviors present in the data. However, there are no solid, commonly agreed reference studies establishing how effective these analytics are for understanding the very substance of the social behaviors expressed over these media. The lack of consensus on what Big Social Data can do [2] stems from the need to involve humanities disciplines such as the qualitative schools of psychology, sociology, anthropology and ethnography, to mention a few. Relevant methods include hermeneutics, participant observation, thick description, semiotics and close reading. This approach, which draws on the humanities, is now gaining popularity and is generally known as Thick Data Analytics [3, 4]. The emergence of social computing with Big and Thick analytics will create opportunities to study social and cultural processes and dynamics in new ways. For the first time, we can follow the imaginations, opinions, ideas and feelings of hundreds of millions of people, as well as go deeper to understand the pain points and issues of particular small groups. Figure 1 illustrates the main analytics tools used for social computing.

FIGURE 1. Notable analytics tools for social computing.

This special issue aims to provide comprehensive and high-quality strategies, methods, architectures, algorithms and features of advanced data mining tools and methods for social computing.

2. IN THIS ISSUE

We accepted 18 of the 31 submissions, and we were delighted by this strong interest in the research topic raised by this special issue.
Before summarizing the accepted papers, we would like to thank the Editor-in-Chief of The Computer Journal and all the staff who helped us throughout the term of this special issue, as well as our reviewers, who dedicated their valuable time to reviewing the submissions and suggesting avenues for improvement and revision for many of them.

The first paper, by Rui Gao et al., studies a couplet generation model that automatically generates the second line of a couplet given the first line. Unlike other sequence generation problems, couplet generation must consider not only the sequential context within a line but also the relationships between the corresponding words of the first and second lines. The authors therefore first develop a trapezoidal-context character embedding vector model, which considers the ‘sequence context’ and the ‘corresponding word context’ simultaneously. They then adopt the typical encoder–decoder framework for sequence-to-sequence problems, using a bi-directional GRU (Gated Recurrent Unit) as the encoder and a uni-directional GRU as the decoder (a minimal sketch of this pairing appears after this group of summaries). To further increase the semantic consistency of the first and second lines, the pre-trained sentence vector of the first line is added to the attention mechanism of the model.

Weidong Huang et al. analyze a tourist destination image model of cognitive and emotional experience and establish a research framework, from the perspective of cognition, based on analysis of user-generated content (UGC) from an online travel community covering the cognitive stage, the cognitive subject and the emotional experience of the destination image. The aim is to help destination management make full use of UGC to maintain the destination image and improve tourist experiences.

In the third paper, Esra Sarac Essiz et al. propose an artificial bee colony (ABC)-based feature selection system for the cyberbullying detection problem. Information gain (IG), chi-square (CHI2) and ReliefF are used as the traditional baseline feature selection methods, and classifications are obtained with the Weka data mining tool. The proposed ABC-based feature selection method is coded on the Java NetBeans platform. The main aim of the study is to examine the effect of ABC-based feature selection on classification performance for cyberbullying, which has become a significant worldwide social issue in recent years. To this end, the classification performance of the proposed method is compared with that of the three traditional methods (a generic ABC-style search is sketched below).

The fourth paper, by Ai Wang et al., studies the scale transformation mechanism of management objects. The observation scale hierarchy (management scale) with clear management objectives can be recognized automatically by changing observation scales, improving practical management efficiency. First, an intelligent computing framework based on scale transformation is established, which reduces the over-dependency on human involvement of traditional scale transformation methods. Then, scale characteristic reasoning is put forward to improve the knowledge acquisition mechanism of scale transformation. Finally, a knowledge acquisition algorithm based on variable-scale clustering is proposed.
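As a rough illustration of the encoder–decoder pairing described in Gao et al.'s summary, the following PyTorch sketch wires a bi-directional GRU encoder to a uni-directional GRU decoder. The vocabulary size, dimensions and the omission of the attention mechanism and sentence-vector conditioning are simplifying assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CoupletSeq2Seq(nn.Module):
    """Minimal encoder-decoder sketch: bi-GRU encoder, uni-GRU decoder."""
    def __init__(self, vocab_size=5000, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Bi-directional GRU reads the first line of the couplet.
        self.encoder = nn.GRU(emb_dim, hid_dim, bidirectional=True, batch_first=True)
        # Uni-directional GRU generates the second line.
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        # Project the concatenated 2*hid_dim encoder state down to decoder size.
        self.bridge = nn.Linear(2 * hid_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, src, tgt):
        _, h = self.encoder(self.embed(src))           # h: (2, batch, hid)
        h = torch.tanh(self.bridge(torch.cat([h[0], h[1]], dim=-1))).unsqueeze(0)
        dec_out, _ = self.decoder(self.embed(tgt), h)  # teacher forcing
        return self.out(dec_out)                       # (batch, len, vocab)

model = CoupletSeq2Seq()
logits = model(torch.randint(0, 5000, (4, 7)), torch.randint(0, 5000, (4, 7)))
print(logits.shape)  # torch.Size([4, 7, 5000])
```

In the paper's full model, an attention mechanism augmented with the pre-trained sentence vector of the first line would sit between the encoder outputs and each decoding step.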
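Similarly, the ABC-based feature selection of Sarac Essiz et al. can be viewed as a search over binary feature masks scored by cross-validated accuracy. The sketch below is a generic, heavily simplified ABC-style loop (employed-bee and scout phases only) on synthetic data, not the authors' Java implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

def fitness(mask):
    # Score a feature subset by cross-validated accuracy; empty masks score 0.
    if not mask.any():
        return 0.0
    return cross_val_score(DecisionTreeClassifier(random_state=0),
                           X[:, mask], y, cv=3).mean()

n_bees, limit = 10, 5
foods = rng.random((n_bees, X.shape[1])) < 0.5       # one feature subset per food source
scores = np.array([fitness(f) for f in foods])
trials = np.zeros(n_bees, dtype=int)

for _ in range(30):
    for i in range(n_bees):
        neighbour = foods[i].copy()
        neighbour[rng.integers(X.shape[1])] ^= True  # flip one feature bit
        s = fitness(neighbour)
        if s > scores[i]:
            foods[i], scores[i], trials[i] = neighbour, s, 0
        else:
            trials[i] += 1
        if trials[i] > limit:                        # scout bee: abandon stale source
            foods[i] = rng.random(X.shape[1]) < 0.5
            scores[i], trials[i] = fitness(foods[i]), 0

best = foods[scores.argmax()]
print("selected features:", np.flatnonzero(best), "accuracy:", round(scores.max(), 3))
```

The full ABC algorithm also includes an onlooker-bee phase that resamples food sources in proportion to their fitness; it is omitted here for brevity.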
The paper entitled ‘Effective Link Prediction with Topological and Temporal Information using Wavelet Neural Network Embedding’ by Xian Mo et al. proposes an effective framework, TT-GWNN (Topological and Temporal Graph Wavelet Neural Network), for link prediction in temporal networks, which captures both the topological and the temporal evolution features of a network. At its core, the authors propose the SWRW (Second-order Weighted Random Walk) algorithm to extract both topological and temporal features from each snapshot and model network evolution. The algorithm combines the first-order and second-order weights of previous snapshots into a weighted graph and uses a damping factor to assign greater weight to more recent snapshots (a sketch of this snapshot-combination idea appears below). In this way, SWRW better preserves both the topological structure and the temporal evolution features of the network. Experiments demonstrate the effectiveness of the TT-GWNN model, which achieves significant performance gains over the baseline models.

Gang Han et al. use the Google GDELT database as a data source and obtain full-text data of English news from 25 countries along the Belt and Road. The paper introduces topic models, combining an unsupervised method (latent Dirichlet allocation, LDA) with a supervised method (labeled LDA), to mine the topics contained in the news data. It constructs a transportation development model and analyzes the development trend of transportation in the various countries. The study finds that the development of transportation in the countries along the route is unbalanced and can be divided into four types: rapid development, stable development, slow development and lagging development. The method can effectively extract the temporal and spatial variation of news events, discover potential risks in the various countries, support real-time, dynamic monitoring of their social development and provide auxiliary decision support for the implementation of the Belt and Road initiative, which gives it important application value.

The seventh paper, by Niyati Baliyan et al., focuses on the generation of new blog entries containing a summary of multiple related input blogs. Having information about a specific topic in one place eases reading, as one need not juggle and shuffle among multiple blogs, and hence reduces manual effort. Multiple blogs on the same topic are fed to the system in PDF format, after which the content is extracted into Python strings for further processing. Tokenization follows, in which tokens are formed by splitting on the space delimiter to extract individual words. These tokens are then filtered by removing unwanted stop words of low relevance, using the Natural Language Toolkit; stemming is then performed on the filtered tokens, grouping words of similar meaning into one cluster.
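The preprocessing pipeline of Baliyan et al.'s summarizer, tokenization, stop-word filtering and stemming with the Natural Language Toolkit, can be illustrated directly; the sample sentence below is invented.

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

# One-time downloads of the tokenizer models and stop-word list.
nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

text = "Multiple blogs on the same topic are merged into a single summary."
tokens = nltk.word_tokenize(text)                    # split into word tokens
stops = set(stopwords.words("english"))
filtered = [t for t in tokens if t.lower() not in stops and t.isalpha()]
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in filtered]          # group variants of a word
print(stems)
```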
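For Mo et al.'s SWRW, the exact weighting scheme is not reproduced here, but the underlying idea of merging snapshots into one weighted graph, with a damping factor that favours recent snapshots, can be sketched with networkx; the geometric decay and graph sizes are assumptions.

```python
import networkx as nx

def combine_snapshots(snapshots, damping=0.8):
    """Merge temporal snapshots into one weighted graph, weighting
    recent snapshots more heavily (most recent snapshot weight = 1)."""
    combined = nx.Graph()
    latest = len(snapshots) - 1
    for t, g in enumerate(snapshots):
        w = damping ** (latest - t)      # older snapshots decay geometrically
        for u, v in g.edges():
            old = combined.edges[u, v]["weight"] if combined.has_edge(u, v) else 0.0
            combined.add_edge(u, v, weight=old + w)
    return combined

snaps = [nx.gnp_random_graph(20, 0.15, seed=s) for s in range(3)]
g = combine_snapshots(snaps)
# Weighted random walks over `g` would then favour recently active edges.
```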
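The LDA topic mining step in Han et al.'s study can likewise be sketched with scikit-learn on a few toy transportation headlines; the labeled-LDA supervision step is omitted, and the documents are invented.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "new railway line opens connecting port and capital",
    "highway construction delayed by funding shortfall",
    "airport expansion boosts regional air cargo traffic",
]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)                              # document-term counts
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [terms[i] for i in comp.argsort()[-4:][::-1]]  # top words per topic
    print(f"topic {k}: {top}")
```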
In this study, Ahlem Drif et al. present a deep learning-based model to predict the dissemination of information about Algeria's Hirak on Twitter and to analyze the emotions conveyed by Twitter users in posts about the Hirak. User characteristics are found to have the greatest influence on information diffusion on online social networks (OSNs). Sentiment analysis is also performed on the Twitter dataset, using machine learning to develop an in-depth understanding of the emotions motivating Algerians to participate in the Hirak. The type of emotion associated with posts is found to affect opinion diffusion on OSNs. Before the Hirak, Algerians were afraid to express their political views; a fortnight after it began, they were overjoyed that they had managed to break their longstanding silence on political matters. The Algerian people were strengthened by their hope, which fortified them even in their darkest hour and guided them to continue their protest in the belief that their circumstances would improve. The feelings uncovered by the study's analysis are found to be an excellent representation of Algerian public opinion toward the political context of the country. The mobilization of these citizens through social networks reinforced their conviction that positive change can be achieved through peaceful activism. Emotions are thus shown to be a significant motivator of civic engagement.

Abir Troudi et al. identify the challenges and problems faced by existing work on recommendation systems. Events are usually characterized by several dimensions, and using only one dimension can create ambiguity that negatively influences the recommendation result and limits its performance and accuracy. To overcome these problems, the authors propose an approach for multi-dimensional event recommendation, which recommends real-world events based on various dimensions: topic (based on users' and social interests), location, engagement, freshness and popularity. It is composed of five main phases. The first phase elaborates the construction of a global user profile together with event detection and analysis. The second phase extracts the dimensions that describe both user profiles and real-world events. The third phase computes similarities along three dimensions: engagement, location and content. In the fourth phase, a score correlation process assigns each dimension a weight to vary its importance and strengthen the contribution of the most dominant dimension (a weighted-combination sketch appears below). Finally, before offering events to users, the fifth phase ranks relevant events by their freshness and popularity, giving more significance to new events that are widely discussed in social media sources.

The authors Ghazi Abdalla et al. apply three deep learning techniques, CNN, Bi-LSTM and CNN-Bi-LSTM, to the analysis of customer reviews of fast-food restaurants. The experimental datasets consist of English tweets collected automatically with Tweepy from the Twitter API. In the preprocessing step, the tweets are cleaned, which helps the proposed models increase performance and decrease training time. A pre-trained word2vec model is used in the word-embedding process to convert the text into a machine-readable format before it is fed into the proposed models for training and testing.
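A minimal Keras sketch of the CNN-Bi-LSTM architecture from Abdalla et al.'s summary follows; the layer sizes are illustrative, and the embedding layer is randomly initialised where the paper would load pre-trained word2vec vectors.

```python
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, seq_len, emb_dim = 10_000, 50, 100   # illustrative sizes

model = tf.keras.Sequential([
    # In the paper's setting this layer would be initialised with
    # pre-trained word2vec vectors; random initialisation is used here.
    layers.Embedding(vocab_size, emb_dim),
    layers.Conv1D(64, 3, activation="relu"),     # local n-gram features
    layers.MaxPooling1D(2),
    layers.Bidirectional(layers.LSTM(64)),       # long-range context, both directions
    layers.Dense(1, activation="sigmoid"),       # positive vs. negative review
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.build(input_shape=(None, seq_len))
model.summary()
```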
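The score-correlation phase of Troudi et al.'s recommender can be read as a weighted combination of per-dimension similarities; the dimension names and weights below are invented for illustration.

```python
def event_score(sims, weights):
    """Combine per-dimension similarities into one recommendation score.
    `sims` and `weights` map dimension name -> value in [0, 1]."""
    total = sum(weights.values())
    return sum(weights[d] * sims[d] for d in sims) / total

sims = {"content": 0.8, "location": 0.4, "engagement": 0.6}
weights = {"content": 0.5, "location": 0.2, "engagement": 0.3}  # content dominates
print(round(event_score(sims, weights), 3))  # 0.66
# The ranked list of events would then be reordered by freshness and popularity.
```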
The paper by Kumaran et al. implements a monitor placement approach for identifying misinformation and determining its provenance. The misinformation handled in the paper includes sarcasm, false information and rumors. The proposed work identifies the top suspected sources of misinformation and optimally places monitors for detecting misinformation tweets using a PageRank approach (sketched below). Other state-of-the-art methods use random selection algorithms to choose the nodes on which monitors are placed for misinformation detection. Unlike those approaches, the proposed approach selects the influential nodes as control nodes alongside the top suspected sources of misinformation. From those nodes, the monitoring process watches the paths between them for misinformation propagation, reducing the number of unnecessary paths to monitor.

In the article entitled ‘Proposed ABC Algorithm as Feature Selector to Predict the Leadership Perception of Site Managers’, Mumine Keles et al. propose a method for selecting relevant features while maintaining acceptable accuracy. The method implements the ABC algorithm as a feature selector to predict the leadership perception of construction employees. Three classifiers were used with the ABC method, Random Forest, Sequential Minimal Optimization and K-Nearest Neighbors, with a highest accuracy of 84.1584%. The results show that a nature-inspired optimization algorithm such as ABC is satisfactory as a feature selector for predicting construction employees' leadership perception.

In the article entitled ‘Stable Communities Detection Method for Temporal Multiplex Graphs: Heterogeneous Social Network Case Study’, Wala Rebhi et al. propose a two-step method to detect stable communities in the temporal multiplex graphs of heterogeneous social networks. The first step finds the best static graph partition at each instant by applying a new hybrid community detection algorithm that considers both relation heterogeneity and node similarity. The second step then considers the temporal dimension in order to find the final stable communities. Experiments on synthetic graphs and a real social network show that the proposed method is capable of extracting high-quality communities.

In the article entitled ‘Detecting Spam Product Reviews in Roman Urdu Script’, Naveed Hussain et al. propose a method for spam review detection in Roman Urdu reviews using classification models built on linguistic and behavioral features. The performance of each classifier is evaluated from several perspectives: (i) linguistic features alone are used to compute the accuracy (F1 score) of each classifier; (ii) behavioral features, combined with distributional and non-distributional aspects, are used to evaluate the accuracy (F1 score) of each classifier and (iii) the combination of linguistic and behavioral features (with distributional and non-distributional aspects) is used to evaluate the accuracy of each classifier (this comparison pattern is sketched below). The outcome of this research can be used to increase customers' confidence in the South Asian region.
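Hussain et al.'s three-way comparison of feature groups can be sketched as follows; the features and labels are synthetic placeholders, so the scores are meaningless beyond showing the evaluation pattern.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 400
linguistic = rng.random((n, 5))     # e.g. n-gram / stylometric scores (synthetic)
behavioral = rng.random((n, 3))     # e.g. rating deviation, burstiness (synthetic)
y = rng.integers(0, 2, n)           # spam vs. genuine (synthetic labels)

# Evaluate each feature group alone, then their combination.
for name, feats in [("linguistic", linguistic),
                    ("behavioral", behavioral),
                    ("combined", np.hstack([linguistic, behavioral]))]:
    Xtr, Xte, ytr, yte = train_test_split(feats, y, random_state=0)
    clf = LogisticRegression().fit(Xtr, ytr)
    print(name, "F1:", round(f1_score(yte, clf.predict(Xte)), 3))
```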
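Kumaran et al.'s PageRank-driven monitor placement can be approximated by ranking the nodes of a follower graph and monitoring the top-k; the synthetic graph model and the choice of k are assumptions.

```python
import networkx as nx

g = nx.scale_free_graph(100, seed=0)   # synthetic directed follower graph
g = nx.DiGraph(g)                      # collapse parallel edges

# Rank nodes by PageRank and place monitors on the top-k influential nodes.
rank = nx.pagerank(g, alpha=0.85)
monitors = sorted(rank, key=rank.get, reverse=True)[:5]
print("monitor nodes:", monitors)

# Paths between the monitored nodes would then be watched for
# misinformation propagation, instead of monitoring every path.
```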
In the article entitled ‘Positioning and Categorizing Mass Media Using Reaction Emojis on Facebook’, Ming-Hung Wang investigates a method to leverage the reaction emojis that users send to media fan pages on Facebook, studying how users react to media organizations and the implications of selective exposure. Using a year-long observation of user activities on mass media pages, the investigation applies a series of quantitative approaches to locate media agencies, measure the distances between them and cluster organizations into groups. A total of 30 fan pages of mass media organizations in Taiwan are investigated. The outcomes suggest that report genres and topics are the key factors in categorizing media groups through the reaction emojis of the online audience.

In the article entitled ‘A Generic Analogy Centered Software Cost Estimation based on Differential Evolution Exploration Process’, Wani Zahid et al. propose an analogy-centered model based on a differential evolution exploration process to estimate the development effort and calendar time required to develop any software project. The proposed model is assessed on 676 projects from five different datasets, and the results are significantly better than other benchmark analogy-based estimation studies, at a lower computational cost (a minimal sketch of the search appears below).

In the article entitled ‘Deep Learning-based Sentiment Analysis of Facebook Data: The Case of Turkish Users’, Önder Çoban et al. propose a sentiment analysis method that works on Facebook data collected from Turkish user accounts. The method differs from existing studies in utilizing both machine learning and deep learning techniques, and it provides not only sentiment measures but also statistical indicators of user activities across various user attributes (e.g. gender and age). Experiments based on this method indicate that recurrent neural networks achieve the best accuracy (0.916) with word embeddings for Facebook data in the context of the Turkish language.

In the article entitled ‘A Random Forest Classification Algorithm based Personal Thermal Sensation Model for Personalized Conditioning System in Office Buildings’, Quing Li et al. propose a personal thermal sensation model as the main component of a personalized conditioning system, an effective way to fulfill the thermal comfort requirements of occupants while considering energy consumption. The method achieves 70.2% accuracy on the ASHRAE RP-884 database, compared with 57.4% for a support vector machine and 67.7% for a neural network. The newly developed model can therefore be used in personalized thermal adjustment systems with intelligent control functions.
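A minimal scikit-learn sketch of a random-forest thermal sensation classifier in the spirit of Li et al.'s model follows; the input variables and synthetic labels are assumptions, not the ASHRAE RP-884 data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n = 500
# Hypothetical inputs: air temperature, humidity, clothing insulation, metabolic rate.
X = np.column_stack([rng.uniform(18, 32, n), rng.uniform(20, 80, n),
                     rng.uniform(0.3, 1.2, n), rng.uniform(0.8, 2.0, n)])
# Synthetic 7-point thermal sensation vote (-3 cold .. +3 hot), tied to temperature.
y = np.clip(np.round((X[:, 0] - 25) / 2 + rng.normal(0, 0.7, n)), -3, 3)

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xtr, ytr)
print("accuracy:", round(clf.score(Xte, yte), 3))
```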
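Zahid et al.'s differential evolution exploration can be sketched as optimizing the feature weights of a nearest-analogy effort estimator; the synthetic dataset and the use of the mean magnitude of relative error (MMRE) as the objective are assumptions, not the paper's exact setup.

```python
import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(3)
feats = rng.random((40, 4))                      # project features (synthetic)
effort = 10 + 50 * feats @ np.array([0.4, 0.3, 0.2, 0.1]) + rng.normal(0, 2, 40)

def mmre(w):
    """MMRE of 1-nearest-analogy estimates under weighted Euclidean distance."""
    errs = []
    for i in range(len(feats)):
        d = np.sqrt(((feats - feats[i]) ** 2 * w).sum(axis=1))
        d[i] = np.inf                            # exclude the project itself
        est = effort[d.argmin()]                 # closest analogue's effort
        errs.append(abs(est - effort[i]) / effort[i])
    return np.mean(errs)

# Differential evolution searches the weight space to minimize MMRE.
res = differential_evolution(mmre, bounds=[(0, 1)] * 4, seed=0, maxiter=30)
print("weights:", res.x.round(2), "MMRE:", round(res.fun, 3))
```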
DATA AVAILABILITY

No new data were generated or analysed in support of this editorial.

REFERENCES

1. Olshannikova, E., Olsson, T., Huhtamäki, J. and Kärkkäinen, H. (2017) Conceptualizing big social data. J. Big Data, 4, 3.
2. Manovich, L. (2011) Trending: The promises and the challenges of big social data. Debates in the Digital Humanities, 2, 460–475.
3. Fiaidhi, J. (2020) Envisioning insight-driven learning based on thick data analytics with focus on healthcare. IEEE Access, 8, 114998–115004.
4. Fiaidhi, J. and Mohammed, S. (2019) Thick data: A new qualitative analytics for identifying customer insights. IT Professional, 21, 4–13.

© The British Computer Society 2021. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model).
TI  - Advanced Data Mining Tools and Methods for Social Computing
JF  - The Computer Journal
DO  - 10.1093/comjnl/bxab032
DA  - 2021-03-19
UR  - https://www.deepdyve.com/lp/oxford-university-press/advanced-data-mining-tools-and-methods-for-social-computing-o9ejKPsHJa
SP  - 281
EP  - 285
VL  - 64
IS  - 3
DP  - DeepDyve
ER  -