TY - JOUR
AU - Pho, Kim-Hung
AB - Abstract
Big data has been widely regarded as an important subject in the information technology domain. Big data is a term for databases so large that traditional data processing methods struggle to analyze them. Recognizing and clustering emerging topics in this area will help researchers who aim to work on this subject. Text mining and social network analysis algorithms are used to identify the emerging trends in the big data domain. In this study, we first gathered all the papers relevant to the big data domain and then created a word co-occurrence network from the extracted keywords. Next, the best clusters were identified and the relationships between keywords were recognized using the association rules technique. Finally, some suggestions are offered for future studies.

1 Introduction
After the Internet became widespread among home users in the late 1990s (Google was founded in 1998), the growing use of social networks and Internet of things technologies led to a rapid increase in data. Storing and analyzing such huge amounts of data is one of the most debated challenges among managers of business organizations. This growth gave rise to the term big data, which has since developed considerably and has attracted attention from various scientific and industrial communities. Big data refers to ‘a huge collection of data gathered through customer behavior, social networks’ posts, tagging and output of sensors’ (Johnson, 2012). Accordingly, discovering knowledge from such datasets is not easy. On the other hand, with machine learning, data mining, or sentiment analysis techniques, extracting knowledge from large data resources becomes highly feasible.
In academia, researchers have made extensive use of big data theory and its applications to solve a large number of real-world problems in recent years. Journals such as Big Data Research, Journal of Big Data, and International Journal of Big Data Intelligence, as well as annual conferences such as the big data congress, focus on this emerging area, and interested researchers have published their achievements in these venues. See Park and Leydesdorff (2013) for a detailed history of big data in the academic literature. The present study is dedicated to identifying the most up-to-date and attractive topics in the big data field by analyzing the keywords of scientific articles. We used text clustering, the keyword co-occurrence network, and association rules algorithms to identify the emerging trends of the big data domain. We seek to answer the following questions:
Q1: Which topics are at the center of the clusters (the most important part) in the big data field? Which topics are allocated to the top clusters?
Q2: Which topics are related to the big data field through association rules?
Q3: What does the keyword co-occurrence network of the big data domain look like?

2 Literature Review
In this section, the previous literature on big data and on text mining techniques for identifying emerging topics is reviewed. The background of big data goes back to 1974, when Naur referred to it in his book (Naur, 1974). However, Rousseau named 1990 as the year of big data’s birth. As mentioned in the introduction, big data is concerned with analyzing huge databases, which is often a complex and difficult task. Each day, one quintillion bytes of data (the digit 1 followed by eighteen zeros) are generated, and extracting information from such huge data is challenging (Stimmel, 2014).
Social media such as Facebook or Twitter, as well as data received from meteorological stations, are cited as examples of sources that generate big data. Analyzing and extracting such data yields valuable knowledge for high-level managers, which can contribute greatly to decision-making in their organizations. In addition to decision-making, big data supports other industrial applications, including but not limited to supply planning, increasing energy efficiency, and product quality. Most academic studies in the field of big data belong to the domains of computer science, engineering, and telecommunications, as well as business economics (Huang et al., 2015; Singh et al., 2015). To date, more than 1,000 academic papers have been published on big data, most of them in the last 5 years. Recognizing big data trends and key subjects helps researchers, especially those seeking an overview of the field, as well as managers and industry practitioners who want to stay informed about current subjects in a scientific and industrial area. There have also been many attempts in the literature to cover the use of analytics and design patterns in facilitating the utilization of big data and the efficiency of decision-making (Raeesi Vanani, 2017; Sohrabi et al., 2018a,b). The use of various types of databases has encouraged big data streams in enterprise information systems, which can also help in developing better ideas and innovations in the future (Sohrabi et al., 2011; Sohrabi and Raeesi Vanani, 2011; Sohrabi et al., 2012, 2017).
In comparison with statistical methods (Shao and Mahmoudi, 2019), social network analysis and text clustering are two important text mining techniques that play a vital role in recognizing knowledge and hidden patterns and in extracting the emerging trends of a specified area (Khan and Wood, 2015; Jalali et al., 2017; Raeesi Vanani and Jalali, 2017; Raeesi Vanani and Jalali, 2018; Jalali et al., 2018). In the last few years, different studies have been conducted to recognize the topics of various scientific fields (Moral-Muñoz et al., 2014; Banshal et al., 2015; Fang, 2015; Murgado-Armenteros et al., 2015; Jalali, 2016; Jalali and Park, 2018; Moro et al., 2019), and big data has not been left out of this line of work. Xian and Madhavan (2014) investigated big data papers on engineering aspects published between 2009 and 2011, considering the scientific cooperation networks of researchers using bibliometric techniques. Surjandari et al. examined Indonesian researchers’ papers based on keyword co-analysis over a period of 5 years (2010–2014). The results revealed fifteen main topics. Halevi and Moed (2012) reviewed the evolution of big data as a research topic based on the ‘big data’ keyword. Isasi et al. (2015), integrating bibliometric methods and systematic analysis, identified key trends of big data in the supply chain. These studies show that previous research has not examined other big data areas and related technologies, even though the term big data has its own themes and has given rise to other emerging areas. The present research attempts to answer the questions raised in the introduction by applying the keywords related to the big data field to the major information systems journals.

3 Data Gathering
On 3 May 2017, a search was run on the Web of Science database using the query given below. First, the process of selecting the keywords needs to be explained.
Because no single source covering all the keywords in the big data domain could be found, all the relevant papers were first extracted using ‘big data’ as a keyword in the ‘Topic’ field of the Web of Science search. Next, the frequency of the keywords was checked and the co-occurrence network was generated. The results of these two steps were used as the search terms on the Web of Science database. In addition, the top information systems journals (high impact factor journals) were chosen as the baseline for obtaining the final query. The following query, which was run on the Web of Science database, is the result of applying the significant big data keywords to the forty major information systems journals:
(‘big data’ OR ‘cloud computing’ OR ‘computational science’ OR ‘cyber infrastructure’ OR ‘data integration’ OR ‘data mining’ OR ‘data warehouse’ OR ‘hadoop’ OR ‘mapreduce’ OR ‘nosql’ OR ‘Internet of things’ OR ‘deduce’ OR ‘smart grid’ OR ‘data quality’ OR ‘hadoop’ OR ‘libpcap’) AND (‘MIS Quarterly’ OR ‘Information and Organization’ OR ‘Decision Support Systems’ OR ‘Journal of Management Information Systems’ OR ‘Knowledge-Based Systems’ OR ‘Journal of Strategic Information Systems’ OR ‘Information and Management’ OR ‘Industrial Management and Data Systems’ OR ‘ACM Transactions on Management Information Systems’ OR ‘International Journal of Accounting Information Systems’ OR ‘Journal of Information Systems’ OR ‘Service Oriented Computing and Applications’ OR ‘Data Base for Advances in Information Systems’ OR ‘International Journal of Data Mining, Modeling and Management’ OR ‘International Journal of Grid and Utility Computing’ OR ‘Journal of Decision Systems’ OR ‘International Journal of Enterprise Information Systems’ OR ‘Journal of Research and Practice in Information Technology’ OR ‘International Journal of Information Systems and Supply Chain Management’ OR
‘International Journal of Business Information Systems’ OR ‘International Journal of Accounting and Information Management’ OR ‘Information Management and Computer Security’ OR ‘European Journal of Information Systems’ OR ‘Information Systems Journal’ OR ‘Information Systems Research’ OR ‘Information Systems’ OR ‘Expert Systems with Applications’ OR ‘Information and Management’ OR ‘Journal of Information Technology’ OR ‘International Journal of Systems Science’ OR ‘Journal of the American Society for Information Science and Technology’ OR ‘Knowledge and Information Systems’ OR ‘Journal of the Association of Information Systems’ OR ‘Business and Information Systems Engineering’ OR ‘International Journal of Information Technology and Decision Making’ OR ‘Information Processing and Management’ OR ‘Industrial Management and Data Systems’ OR ‘Information Systems Frontiers’ OR ‘Mobile Information Systems’ OR ‘Journal of Intelligent Information Systems’ OR ‘Science China Information Sciences’)
This query returned 2,367 papers. Because the keywords are the most important part of each scientific paper and give an overview of its content, this study focuses on analyzing all the keywords used in the big data area. The co-occurrence network, clustering, and association rules methods, which are among the main methods in text mining, were used to analyze the 2,367 big data papers obtained.

4 Keyword Co-occurrence Network
Building the keyword co-occurrence network is one of the initial steps in social network analysis. Such networks illustrate the relationships between nodes by means of edges (Wellman and Berkowitz, 1988). In the analysis of social networks, nodes and edges are the most important network members (Hanneman and Riddle, 2005).
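As a rough illustration of how a keyword co-occurrence network of this kind can be assembled, the sketch below counts how often two keywords appear in the same paper and derives the density of the resulting network. It is a minimal sketch under stated assumptions, not the authors' procedure: the keyword lists are hypothetical, the function names `build_cooccurrence` and `density` are introduced here for illustration, and an edge is assumed to mean that two keywords were listed by the same paper in a simple undirected network.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(papers):
    """Return (nodes, edge_weights) for an undirected keyword co-occurrence network.

    `papers` is an iterable of keyword lists, one list per paper (assumed input format).
    """
    nodes = set()
    edge_weights = Counter()
    for keywords in papers:
        # Normalise case and drop duplicate keywords within one paper.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        nodes.update(unique)
        # Every unordered pair of keywords in the same paper adds one co-occurrence.
        for a, b in combinations(unique, 2):
            edge_weights[(a, b)] += 1
    return nodes, edge_weights


def density(n_nodes, n_edges):
    """Density of a simple undirected graph: observed edges over possible edges."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# Hypothetical keyword lists, for illustration only.
papers = [
    ["Big Data", "Hadoop", "MapReduce"],
    ["big data", "Data Quality"],
    ["Hadoop", "MapReduce"],
]
nodes, edges = build_cooccurrence(papers)
print(len(nodes), len(edges), density(len(nodes), len(edges)))
```

In this toy input, the pair (hadoop, mapreduce) co-occurs in two papers, so its edge weight is 2, while the network has four nodes and four distinct edges.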
In the keyword co-occurrence network of this article, the nodes are the keywords used by the authors in their articles, and the edges are the conceptual or empirical connections between the keywords. We obtained the following values for our network: the number of edges is 5,848 and the number of nodes is 20,145. Furthermore, there are four isolated edges that are not connected to the rest of the network. Other important parameters are the network diameter, the weighted average of the clustering coefficient, and the density, which are explained in Table 1. The network diameter is equal to 9.6, the density is equal to 0.0011, and the weighted average of the clustering coefficient of the keyword co-occurrence network is equal to 8.87. It is worth noting that keywords such as data mining, data warehousing, and cloud computing, which are general terms related to the infrastructure or functions of big data management and analytics, were omitted after the initial pre-processing. Calculating the diameter, density, and weighted average of the clustering coefficient gives a comprehensive view of the keyword co-occurrence network in the big data domain. It also supports keyword clustering, since clustering is often meaningless without first creating this type of network.

Table 1 Keywords network correlation parameters
Parameter name | Explanation
Diameter | The longest shortest path between any two nodes in a network (Wasserman and Faust, 1994)
Density | The ratio of existing connections to the number of all possible connections in a network (Girvan and Newman, 2002)
Weighted average of clustering coefficient | An average of the degree to which the nodes of a network tend to cluster together (Nahapiet and Ghoshal, 1998)

5 Keyword Clustering
After forming the modified keyword co-occurrence network, we use the k-means clustering technique as a text mining algorithm to cluster the keywords. This algorithm has been widely used in the literature as one of the most efficient and fastest algorithms for segmentation and clustering. Its accuracy in finding the most relevant keywords convinced the authors to use it for the text clustering step (Mahmoudi and Abbasalizadeh, 2018a–d; Mahmoudi et al., 2018). By applying the k-means algorithm, 715 keywords were chosen among 5,536 nodes by setting the minimum intensity parameter between nodes to a value of 10.
Finally, these keywords were divided into twenty-two clusters. The keyword density is indicated by color: red marks the highest density and blue the lowest. The concentration of keywords at the center of the clustering illustrates their importance. VOSviewer is used to visualize the keyword network. The software provides effective visualization schemas that are easy to read and interpret, based on a long list of inter-related keywords and phrases. As shown in Fig. 1, topics such as the algorithms and tools of data mining (classification, clustering, association rules, decision trees, and feature selection), business intelligence, data streams, and the semantic web are more important, whereas keywords that are far from the center of the clusters, such as design patterns, social networks, and information technology trends, are less important. Importance is measured by how frequently words occur together as a phrase or near each other. The farther apart words or phrases are in the sentences and the less frequently they occur, the less important they are taken to be in the scientific field of study. In other words, the closer a keyword is to the clustering center, the more important it is. In Table 2, the top ten clusters are introduced and named based on their content.

Fig. 1 Keywords clustering

Table 2 Top clusters extracted from k-means clustering
Keywords | Number of keywords | Name
Classification, Six Sigma, Prediction, Sensitivity Analysis, Quality Improvement, Data Mining Software, Manufacturing | 8 | Supervised learning
Clustering, Change Mining, Customer Segmentation, k-means, Modeling, Spatial Data Warehouse, Model Driven Architect | 7 | Unsupervised learning
Data Streams, E-commerce, Framework, Fuzzy Systems, Web Usage Mining, Static Streams, Hoeffding Trees, Evolving Systems | 8 | Big data analysis tools
Business Intelligence, Bi-clustering, Design Science, Feature Construction, Features, Fraud Detection, Heuristics, Knowledge Representation, Knowledge-based Systems, Meta-Learning, Pattern Classification, Problem Solving | 12 | Business intelligence
Air Pollution, Air Quality, Environment Expert Systems, Forecasting, Neural Networks, Spatial Classification | 7 | Geographic forecast
Applied Artificial Intelligence, Autonomic DBMS, Autonomic Systems, DSS, OLTP, Performance Tuning | 6 | Automatic systems
Data Analytics, Evolutionary Fuzzy Systems, Fuzzy Association Rules, Genetic Algorithm, KDD, Pricing, Rule Discovery, Subgroup Discovery | 8 | Adaptive learning
Data Aggregation, Data Cube, Metadata, Online Analytic Process, Query Optimization, XML, Scientific Databases, Re-write Rules, Spatial Databases | 9 | Online analysis
Sequence Mining, Sequential Patterns, Ensemble Classification, Direct Marketing, Database Marketing, Customer Churn Prediction, Constrained Optimization | 7 | Market forecast
Stream Mining, Social Network, Smart Cities, Semantics, IPv6, GIS, Data Management | 7 | Social networks
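The clustering step named in this section can be illustrated with a minimal sketch of Lloyd's k-means algorithm. It is not the authors' pipeline or the VOSviewer implementation: the keyword vectors (each row is a made-up list of co-occurrence counts), the starting centroids, and the function name `kmeans` are hypothetical and chosen only to keep the example small and deterministic.

```python
import math


def kmeans(points, centroids, max_iter=50):
    """Plain Lloyd's k-means with given starting centroids; returns (labels, centroids)."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical co-occurrence vectors for four keywords, for illustration only.
keywords = ["classification", "prediction", "clustering", "customer segmentation"]
vectors = [
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
]
labels, centroids = kmeans(vectors, centroids=[vectors[0], vectors[2]])
for keyword, label in zip(keywords, labels):
    print(label, keyword)
```

With these toy vectors, the first two keywords fall into one cluster and the last two into the other. In practice the number of clusters and the starting centroids strongly affect the outcome, so both would need to be chosen and reported with care.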
Interpreting the results points to the most important areas: the interconnection of big data with social networks, data generation infrastructure, analytics algorithms such as clustering, classification, and other supervised and unsupervised learning mechanisms, and effective visualization of results. It follows that applying analytics to streams of big data is of great significance for the scientific community, and that visualizing results and giving managers effective alternatives to decide upon can shape the future of big data management and analytics. Table 2 is analyzed in Section 7.

6 Association Rules
Data mining is a popular technique for identifying hidden patterns in small or large amounts of data (Amiri et al., 2017; Jalali and Park, 2017; Jalali et al., 2019a,b,c,d; Jalali et al., 2020). Association rules, as a powerful data mining technique, can calculate and illustrate the correlations within huge collections of data (Jalali et al., 2017; Hasani et al., 2018). By applying association rules, we can identify relationships between keywords and analyze their behavior. In this article, the association rules method called Apriori is used to find the correlations between keywords. The algorithm was applied to the whole collection of keyword datasets. Propositional association rules are expressed as A ⇒ B. To verify the validity of the rules, two parameters called support and confidence are employed. In the collection of all the keywords in this article, the rule A ⇒ B has support S if S is the proportion of keyword sets that contain A ∪ B (that is, both A and B). The rule has confidence C if C is the proportion of the keyword sets containing A that also contain B.
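The two measures can be sketched in a few lines. This is a minimal illustration of the definitions only, not the Apriori search itself and not the RapidMiner workflow used in the study: the keyword sets are hypothetical, and the function names `support` and `confidence` are introduced here for the example.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of the antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical keyword sets, one per paper, for illustration only.
transactions = [
    {"pattern", "data"},
    {"pattern", "data", "mining"},
    {"pattern", "recognition"},
    {"data", "quality"},
    {"cloud"},
]
print(support(transactions, {"pattern", "data"}))
print(confidence(transactions, {"pattern"}, {"data"}))
```

In this toy collection, two of the five keyword sets contain both terms, so the rule pattern ⇒ data has support 2/5. Three sets contain pattern, so its confidence is 2/3.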
In other words:
Support (A ⇒ B) = P (A ∪ B)
Confidence (A ⇒ B) = P (B|A)
For example, the following rule states that about 79.6% of the keyword sets containing the term Pattern also contain the term Data, and that 6% of all keyword sets include both terms Pattern and Data.
Pattern ⇒ Data [Support = 0.06, Confidence = 0.796]
In Fig. 2, the keyword association rules obtained with the RapidMiner software are shown.

Fig. 2 Keyword association rules of big data management

When interpreting association rules, it should be noted that the higher the confidence and support values, the more likely the keywords are to occur together. Rules can therefore be extracted based on the occurrence of keywords. The rules with the highest confidence and support values best describe the interrelationships among the relevant keywords in the realm of big data management and analytics, as illustrated in Fig. 2. The support and confidence values of the interconnected rules are given in the figure.

7 Discussion
Table 2 and Fig. 2 present the results of the evaluation of the big data domain. The most important issues considered in recent studies in this area are as follows:
Advanced business analytics: data analysis using unsupervised, supervised, and adaptive algorithms to learn the behavior of the data and its regular patterns.
Business intelligence: providing analytical and illustrative reports based on the results of advanced analyses.
Trend forecasting: one of the most important issues in massive data research is the possibility of making predictions from a large volume of data. This is consistent with advanced business analytics and can, in some ways, be considered a subset of it.
Social network analyses: these analyses are based on the interaction between Internet social networks and mobile networks. Nowadays a huge amount of big data is produced by these networks, so analyzing big data is very important for them.
On the other hand, the most important rules that can be extracted from the keywords fall under the following key issues:
Data modeling and information extraction from big data.
Analyzing the association rules among various scientific fields in big data.
Management of the knowledge gathered from big data.
Neural networks and decision trees for forecasting data trends.
Clustering and classification of data in order to analyze the significant differences between different data sectors.
Decision-making based on systematic analysis and advanced business algorithms.
Given these results and analyses, it can be concluded that an important part of big data research focuses on advanced business analytics, and that mechanisms for storing and retrieving big data are of little value without advanced analysis and data mining of the data. The greatest advantage of massive data is the ability to find regular patterns and trends in a large amount of data and to base strategic business decisions on these patterns. As such, it is expected that in the near future the most important areas shaping big data research will include, but not be limited to, advanced analytics and data mining algorithms, text mining, e-commerce, web mining, mobile analytics, and social networks, as well as managing the organization and making intelligent decisions based on the outputs of such analyses. These findings follow from the clustering conducted on the most relevant keywords and from the association rules of keyword occurrences.

8 Conclusion
In this article, all the big data papers in the top information systems journals were first extracted. The keyword co-occurrence network was then formed, and the domains affecting the big data field were identified using text mining algorithms such as clustering and association rules. This research helps researchers, and especially managers, who want to understand the emerging issues in the big data field and to base decisions about the intelligent management of organizations and future investments on big data analytics. Future research approaches to big data in connection with other high-tech realms were also explored, so that later researchers can use them as a basis for their own work. It is suggested that future studies explore other emerging areas of information systems, such as cloud computing, green information technology, or open source technology, using the methods applied in this article.

References
Amiri M. J., Abedi-Koupai J., Jafar Jalali S. M., Mousavi S. F. (2017). Modeling of fixed-bed column system of Hg (II) ions on ostrich bone Ash/nZVI composite by artificial neural network. Journal of Environmental Engineering, 143(9): 1–8.
Banshal S. K., Uddin A., Singh V. K. (2015).
Identifying themes and trends in CS research output from India. In 2015 International Conference on Cognitive Computing and Information Processing (CCIP), Noida, India, March. IEEE, pp. 1–6.
Fang Y. (2015). Visualizing the structure and the evolving of digital medicine: a scientometrics review. Scientometrics, 105(1): 5–21.
Girvan M., Newman M. E. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12): 7821–6.
Halevi G., Moed H. F. (2012). The evolution of big data as a research and scientific topic: overview of the literature. Research Trends, 30(1): 3–6.
Hanneman R. A., Riddle M. (2005). Introduction to Social Network Methods. Riverside, CA: University of California.
Hasani H., Jalali S. M. J., Rezaei D., Maleki M. (2018). A data mining framework for classification of organisational performance based on rough set theory. Asian Journal of Management Science and Applications, 3(2): 156–80.
Huang Y., Schuehle J., Porter A. L., Youtie J. (2015). A systematic method to create search strategies for emerging technologies based on the Web of Science: illustrated for ‘Big Data’. Scientometrics, 105(3): 2005–22.
Isasi N. K. G., Frazzon E. M., Uriona M. (2015). Big data and business analytics in the supply chain: a review of the literature. IEEE Latin America Transactions, 13(10): 3382–91.
Jalali S. M. J. (2016). Visualizing e-government emerging and fading themes using SNA techniques. In 2016 10th International Conference on e-Commerce in Developing Countries: with focus on e-Tourism (ECDC), Isfahan, Iran, April. IEEE, pp. 1–4.
Jalali S. M. J.
, Hedjam R., Khosravi A., Heidari A. A., Mirjalili S., Nahavandi S. ( 2020 ). Autonomous robot navigation using moth-flame-based neuroevolution. In Mirjalili, S., Faris, H. and Aljarah, I. (eds), Evolutionary Machine Learning Techniques . Singapore: Springer , pp. 67 – 83 . Google Scholar Crossref Search ADS Google Preview WorldCat COPAC Jalali S. M. J. , Karimi M., Khosravi A., Nahavandi S. ( 2019a ). An efficient neuroevolution approach for heart disease detection. In 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), Bari,Italy, October. IEEE, pp. 3771 – 6 . Jalali S. J. , Khosravi A., Alizadehsani R. et al. . ( 2019c ). Parsimonious evolutionary-based model development for detecting artery disease. In The 20th IEEE International Conference on Industrial Technology, Melbourne, Australia. IEEE, pp. 800 – 5 . Jalali S. M. J. , Khosravi A., Kebria P. M., Hedjam R., Nahavandi S. ( 2019d ). Autonomous robot navigation system using the evolutionary multi-verse optimizer algorithm. In 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), Bari, Italy, October. IEEE, pp. 1221 – 6 . Jalali S. M. J. , Mahdizadeh E., Mahmoudi M. R., Moro S. ( 2018 ). Analytical assessment process of e-learning domain research between 1980 and 2014 . International Journal of Management in Education , 12( 1 ): 43 – 56 . Google Scholar OpenURL Placeholder Text WorldCat Jalali S. M. J. , Kebria P. M., Khosravi A., Saleh K., Nahavandi D., Nahavandi S. ( 2019b ). Optimal autonomous driving through deep imitation learning and neuroevolution. In 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), Bari, Italy, October. IEEE, pp. 1215 – 20 . Jalali S. M. J. , Mohammad S., Park H. W. ( 2017 ). Conversations about open data on Twitter . International Journal of Contents , 13 ( 1 ): 166 – 78 . Google Scholar OpenURL Placeholder Text WorldCat Jalali S. M. J. , Moro S., Mahmoudi M. R., Ghaffary K. A., Maleki M., Alidoostan A. ( 2017 ). 
A comparative analysis of classifiers in cancer prediction using multiple data mining techniques . International Journal of Business Intelligence and Systems Engineering , 1( 2 ): 166 – 78 . Google Scholar OpenURL Placeholder Text WorldCat Jalali S. M. J. , Park H. W. ( 2018 ). State of the art in business analytics: themes and collaborations . Quality and Quantity , 52 ( 2 ): 627 – 33 . Google Scholar Crossref Search ADS WorldCat Johnson B. D. ( 2012 ). The secret life of data . The Futurist , 46 ( 4 ): 20 – 3 . Google Scholar OpenURL Placeholder Text WorldCat Khan G. F. , Wood J. ( 2015 ). Information technology management domain: emerging themes and keyword analysis . Scientometrics , 105 ( 2 ): 959 – 72 . Google Scholar Crossref Search ADS WorldCat Mahmoudi M. R. , Abbasalizadeh A. ( 2018a ). On comparing and clustering the alternatives of love in Saadi’s lyric poems (Ghazals) . Digital Scholarship in the Humanities , 34(1): 146–51. Google Scholar OpenURL Placeholder Text WorldCat Mahmoudi M. R. , Abbasalizadeh A. ( 2018b ). Statistical analysis about the order of Quran’s revelation . Digital Scholarship in the Humanities , 34 ( 1 ): 152 – 8 . Google Scholar Crossref Search ADS WorldCat Mahmoudi M. R. , Abbasalizadeh A. ( 2018c ). How statistics and text mining can be applied to literary studies? . Digital Scholarship in the Humanities , 34(3): 536–41. Google Scholar OpenURL Placeholder Text WorldCat Mahmoudi M. R. , Abbasalizadeh A. ( 2018d ). Statistical analysis about the God’s traits in Quran . Digital Scholarship in the Humanities. In press . Google Scholar OpenURL Placeholder Text WorldCat Mahmoudi M. R. , Abbasalizadeh A.,, Rahmati M. ( 2018 ). An statistical approach to investigate the alternatives of love in Moulana’s Divan. International Journal of Business Intelligence and Data Mining. doi: 10.1504/IJBIDM.2019.10018197. Moral-Muñoz J. A., Cobo, M. J., Peis, E., Arroyo-Morales, M., and Herrera-Viedma, E. ( 2014) . 
Analyzing the research in integrative and complementary medicine by means of science mapping . Complementary Therapies in Medicine 22 ( 2 ): 409 – 18 . Google Scholar Crossref Search ADS PubMed WorldCat Moro S. , Ramos P., Esmerado J., Jalali S. M. J. ( 2019 ). Can we trace back hotel online reviews’ characteristics using gamification features? . International Journal of Information Management , 44 : 88 – 95 . Google Scholar Crossref Search ADS WorldCat Murgado-Armenteros Eva María et al. . 2015 . Analysing the conceptual evolution of qualitative marketing research through science mapping analysis . Scientometrics , 102 ( 1) : 519 – 57 . Google Scholar OpenURL Placeholder Text WorldCat Nahapiet J. , Ghoshal S. (1998). Social capital, intellectual capital, and the organizational advantage. Academy of Management Review , 23 (2): 242 – 66I . Naur P. ( 1974 ). Concise Survey of Computer Methods . Lund : Studentlitteratur AB . Google Scholar Google Preview OpenURL Placeholder Text WorldCat COPAC Park H. W. , Leydesdorff L. ( 2013 ). Decomposing social and semantic networks in emerging ‘big data’ research . Journal of Informetrics , 7 ( 3 ): 756 – 65 . Google Scholar Crossref Search ADS WorldCat Raeesi Vanani I. , ( 2017 ). Designing a predictive analytics for the formulation of intelligent decision making policies for VIP customers investing in the bank . Journal of Information Technology Management , 9 ( 3 ): 477 – 511 . Google Scholar OpenURL Placeholder Text WorldCat Raeesi Vanani I. , Jalali S. M. J. ( 2017 ). Analytical evaluation of emerging scientific trends in business intelligence through the utilisation of burst detection algorithm . International Journal of Bibliometrics in Business and Management , 1 ( 1 ): 70 – 9 . Google Scholar Crossref Search ADS WorldCat Raeesi Vanani I. , Jalali S. M. J. ( 2018 ). A comparative analysis of emerging scientific themes in business analytics . International Journal of Business Information Systems , 29 ( 2 ): 183 – 206 . 
Google Scholar Crossref Search ADS WorldCat Shao X. Y. , Mahmoudi M. R. ( 2019 ). Statistical comparison between the alternatives of love in the poems of Sa’adi and Moulana. Digital Scholarship in the Humanities. 10.1093/llc/fqz062 . Singh V. K. , Banshal S. K., Singhal K., Uddin A. ( 2015 ). Scientometric mapping of research on ‘Big Data’ . Scientometrics , 105 ( 2 ): 727 – 41 . Google Scholar Crossref Search ADS WorldCat Sohrabi B. , Raeesi Vanani I. ( 2011 ). Collaborative planning of ERP implementation: a design science approach . International Journal of Enterprise Information Systems , 7 ( 3 ): 58 – 67 . Google Scholar Crossref Search ADS WorldCat Sohrabi B. , Raeesi Vanani I., Abedin B. ( 2018 ). Human resources management and information systems trend analysis using text clustering . International Journal of Human Capital and Information Technology Professionals , 9 ( 3 ): 1 – 24 . Google Scholar Crossref Search ADS WorldCat Sohrabi B. , Raeesi Vanani I., Baranizade Shineh M. ( 2017 ). Designing a predictive analytics solution for evaluating the scientific trends in information systems domain . Webology , 14 ( 1 ): 32 – 52 . Google Scholar OpenURL Placeholder Text WorldCat Sohrabi B. , Raeesi Vanani I., Gooyavar A., Naderi N. ( 2019a ). Predicting the readmission of heart failure patients through data analytics . Journal of Information and Knowledge Management , 18 ( 1 ): 1950012-1–1950012-20. Google Scholar OpenURL Placeholder Text WorldCat Sohrabi B. , Raeesi Vanani I., Jalali S. M. J., Abedin E. ( 2019b ). Evaluation of research trends in knowledge management: a hybrid analysis through burst detection and text clustering . Journal of Information and Knowledge Management , 18 ( 04 ): 1950043. Google Scholar OpenURL Placeholder Text WorldCat Sohrabi B. , Raeesi Vanani I., Qorbani D., Forte P. ( 2012 ). An integrative view of knowledge sharing impact on e-learning quality: a model for higher education institutes . 
International Journal of Enterprise Information Systems , 8 ( 2 ): 14 – 29 . Google Scholar Crossref Search ADS WorldCat Sohrabi B. , Tahmasebipur K., Raeesi Vanani I. ( 2011 ). Designing a fuzzy expert system for ERP selection . Industrial Management Journal , 3 ( 6 ): 39 – 58 . Google Scholar OpenURL Placeholder Text WorldCat Stimmel Carol L. ( 2014 ). Big Data Analytics Strategies for the Smart Grid . CRC Press . Google Scholar Google Preview OpenURL Placeholder Text WorldCat COPAC Wasserman S. , Faust K. ( 1994 ). Social Network Analysis: Methods and Applications, Vol. 8. Cambridge University Press. Wellman B. , Berkowitz S. D. (eds) ( 1988 ). Social Structures: A Network Approach, Vol. 2. CUP Archive. Xian H. , Madhavan K. ( 2014 ). Anatomy of scholarly collaboration in engineering education: a big‐data bibliometric analysis . Journal of Engineering Education , 103 ( 3 ): 486 – 514 . Google Scholar Crossref Search ADS WorldCat © The Author(s) 2020. Published by Oxford University Press on behalf of EADH. All rights reserved. For permissions, please email: journals.permissions@oup.com This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model) TI - Research trends on big data domain using text mining algorithms JF - Digital Scholarship in the Humanities DO - 10.1093/llc/fqaa012 DA - 2021-09-29 UR - https://www.deepdyve.com/lp/oxford-university-press/research-trends-on-big-data-domain-using-text-mining-algorithms-0TOiaznhhX SP - 361 EP - 370 VL - 36 IS - 2 DP - DeepDyve ER -