Application of hyper-convergent platform for big data in exploring regional innovation systems
Finogeev, Alexey G.; Gamidullaeva, Leyla A.; Vasin, Sergey M.
doi: 10.1504/IJDMMM.2020.111395; pmid: N/A
The authors developed a decentralised hyper-convergent analytical platform for collecting and processing big data, in order to explore the monitoring of distributed objects in the regions using a multi-agent approach. The platform is intended for the modular integration of tools for searching, collecting, processing and mining big data from cyber-physical and cyber-social objects. The results of the intelligent analysis are used to assess integrated effectiveness criteria for innovation systems, to support distributed monitoring, and to forecast the dynamics of how various factors influence technological and socio-economic processes. The work analyses convergent and hyper-convergent systems and justifies the need for a multi-agent decentralised platform for big data collection and analytical processing. The article proposes the principles of a streaming architecture for integrated analytical data processing that addresses the problems of searching, parallel processing, data mining and uploading information into cloud storage. The paper also describes the main components of the hyper-convergent analytical platform and presents a new concept for a distributed extraction, transformation, loading and mining (ETLM) system.
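The abstract names the four ETLM stages but not their interfaces. As a rough, single-process sketch of the streaming idea (not the authors' distributed, multi-agent implementation), each stage below is a lazy generator, so records flow through extract, transform, load and mine one at a time instead of in batches. The "source,value" record format, the threshold filter and the in-memory list standing in for cloud storage are all hypothetical choices for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, float]  # hypothetical (source id, reading) pair

def extract(lines: Iterable[str]) -> Iterator[Record]:
    """Extract: parse "source,value" lines, skipping malformed ones."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue
        try:
            value = float(parts[1])
        except ValueError:
            continue
        yield parts[0], value

def transform(records: Iterable[Record], threshold: float) -> Iterator[Record]:
    """Transform: keep only readings at or above a (hypothetical) threshold."""
    for source, value in records:
        if value >= threshold:
            yield source, value

def load(records: Iterable[Record], store: List[Record]) -> Iterator[Record]:
    """Load: persist each record (a list stands in for cloud storage) and pass it on."""
    for rec in records:
        store.append(rec)
        yield rec

def mine(records: Iterable[Record]) -> Counter:
    """Mine: a trivial analysis step, counting high readings per source."""
    return Counter(source for source, _ in records)
```

Chaining the stages as `mine(load(transform(extract(raw), 4.0), store))` streams each record through all four steps. In a distributed setting, each stage would instead run as a separate agent or service connected by queues.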
Data modelling for large-scale social media analytics: design challenges and lessons learned
Aydin, Ahmet Arif; Anderson, Kenneth M.
doi: 10.1504/IJDMMM.2020.111409; pmid: N/A
We live in a world of big data; organisations collect, store, and analyse large volumes of data for various purposes. The five V's of big data introduce new challenges for developers to handle when performing data processing and analysis. Indeed, data modelling is one of the most challenging and critical aspects of big data because it determines how data will be structured and stored; these decisions then impact how that data can be processed and analysed. In this paper, we report on designing a data model for storing and analysing Twitter data in support of crisis informatics. In this work, we leverage the data model provided by columnar NoSQL data stores to design column families that can efficiently index, sort, store and analyse large Twitter datasets. In particular, our column families are designed to achieve efficient batch data processing. We evaluate these claims and discuss our future work.
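The abstract does not spell out the column families themselves. The sketch below is only an in-memory model of the general wide-row pattern that columnar NoSQL stores offer. A row key (here, a hypothetical crisis event name) groups tweets, and column keys (here, tweet IDs) are kept sorted, so a batch job can scan a contiguous ID range. The sketch assumes tweet IDs grow roughly with posting time, which makes an ID range approximate a time window. The class and method names are illustrative and do not belong to any real database API.

```python
import bisect
from collections import defaultdict

class ColumnFamily:
    """In-memory stand-in for a column family: row key -> columns sorted by key."""

    def __init__(self):
        self._keys = defaultdict(list)    # row key -> sorted list of column keys
        self._values = defaultdict(dict)  # row key -> {column key: value}

    def insert(self, row_key, col_key, value):
        """Upsert one cell, keeping column keys in sorted order."""
        if col_key not in self._values[row_key]:
            bisect.insort(self._keys[row_key], col_key)
        self._values[row_key][col_key] = value

    def scan(self, row_key, start=None, end=None):
        """Yield (column key, value) pairs with start <= key < end, in key order."""
        keys = self._keys.get(row_key, [])
        lo = 0 if start is None else bisect.bisect_left(keys, start)
        hi = len(keys) if end is None else bisect.bisect_left(keys, end)
        for k in keys[lo:hi]:
            yield k, self._values[row_key][k]
```

Because the columns are already sorted when they are written, a range scan is a binary search followed by a sequential read. This is the property that makes batch processing over one event's tweets cheap.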
A new quantitative method for simplifying complex fuzzy cognitive maps
Obiedat, Mamoon; Al-yousef, Ali; Banikhalaf, Mustafa; Talafha, Khairallah Al
doi: 10.1504/IJDMMM.2020.111402; pmid: N/A
A fuzzy cognitive map (FCM) is a qualitative soft computing approach that addresses uncertain human perceptions of diverse real-world problems. The map depicts a problem as nodes and the cause-effect relationships among them. Complex problems often produce complex maps that may be difficult to understand or predict, so such maps need to be simplified. Previous studies simplified (condensed) maps subjectively, grouping similar variables into one variable in a qualitative manner. This paper proposes a quantitative method for simplifying FCMs. It uses spectral clustering, a quantitative technique, to group related variables into new clusters without human intervention. First, the clustering technique was adapted to handle FCM matrix data properly. The proposed method was then tested on an application dataset to validate its suitability for FCM simplification. The results showed that the method successfully grouped the dataset into meaningful clusters.
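The abstract says only that the authors modified spectral clustering to handle FCM matrices. Their specific improvements are not described. The sketch below makes one common, assumed adaptation: FCM weights are signed and directed, but spectral clustering needs a symmetric, non-negative affinity, so it takes absolute weights and averages each matrix with its transpose. It then applies standard normalised spectral clustering in the style of Ng, Jordan and Weiss: a symmetric normalised Laplacian, the k smallest eigenvectors, row normalisation, and a plain k-means. The function name is hypothetical.

```python
import numpy as np

def spectral_clusters(fcm: np.ndarray, k: int, iters: int = 100) -> np.ndarray:
    """Group FCM concepts into k clusters (assumed adaptation, see lead-in)."""
    # Assumption: ignore sign and direction of causal weights, keep strength only.
    w = np.abs(fcm)
    w = (w + w.T) / 2.0
    np.fill_diagonal(w, 0.0)  # drop self-loops
    deg = w.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.where(deg > 0, deg, 1.0)), 0.0)
    # Symmetric normalised Laplacian: L = I - D^{-1/2} W D^{-1/2}
    lap = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the k smallest eigenvectors.
    _, vecs = np.linalg.eigh(lap)
    emb = vecs[:, :k]
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    emb = emb / np.where(norms > 0, norms, 1.0)  # row-normalise the embedding
    # k-means with deterministic farthest-point initialisation.
    centers = [emb[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((emb - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(emb[np.argmax(dist)])
    centers = np.array(centers)
    labels = np.zeros(len(emb), dtype=int)
    for _ in range(iters):
        dist = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([emb[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels
```

Each resulting cluster label would correspond to one condensed concept in a simplified map. How edge weights between clusters are aggregated is a separate design choice that this sketch does not cover.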
The bootstrap procedure in classification problems
Vrigazova, Borislava Petrova; Ivanov, Ivan Ganchev
doi: 10.1504/IJDMMM.2020.111400; pmid: N/A
In classification problems, cross-validation draws random samples from the dataset to improve the model's ability to assign new observations to the correct class. Research articles from various fields show that, when applied to regression problems, the bootstrap can improve either the predictive ability of the model or its ability to select features. The purpose of our research is to show that the bootstrap, used as a model selection procedure in classification problems, can outperform cross-validation. We compare performance measures of cross-validation and the bootstrap on a set of classification problems and analyse their practical advantages and disadvantages. We show that the bootstrap procedure can reduce execution time compared with cross-validation while preserving the accuracy of the classification model. This advantage is particularly important for big datasets, where the time needed to fit the model can be reduced without decreasing its performance.
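The abstract compares the two resampling procedures without specifying them in detail. As an illustration only, the sketch below estimates out-of-sample accuracy in two ways: with k-fold cross-validation, and with an out-of-bag bootstrap, a common variant that the paper may or may not use. In the bootstrap version, each replicate trains on n rows drawn with replacement and scores on the rows that were never drawn. The nearest-centroid classifier and all function names are hypothetical stand-ins that keep the example dependency-free apart from NumPy.

```python
import numpy as np

def fit_centroids(x, y):
    """Hypothetical stand-in classifier: one mean vector per class."""
    classes = np.unique(y)
    return classes, np.array([x[y == c].mean(axis=0) for c in classes])

def predict(model, x):
    classes, cents = model
    d = ((x[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

def kfold_accuracy(x, y, k=5, seed=0):
    """k-fold CV: each row is used for testing exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit_centroids(x[train], y[train])
        scores.append(np.mean(predict(model, x[test]) == y[test]))
    return float(np.mean(scores))

def bootstrap_oob_accuracy(x, y, n_boot=20, seed=0):
    """Out-of-bag bootstrap: train on a resample, test on rows never drawn."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(n_boot):
        train = rng.integers(0, n, size=n)       # n draws with replacement
        oob = np.setdiff1d(np.arange(n), train)  # roughly 37% of rows on average
        if oob.size == 0:
            continue
        model = fit_centroids(x[train], y[train])
        scores.append(np.mean(predict(model, x[oob]) == y[oob]))
    return float(np.mean(scores))
```

For model selection, you would compute one of these estimates for each candidate model and keep the best. Cost scales with `k` fits for cross-validation and `n_boot` fits for the bootstrap. This sketch makes no claim about which procedure is faster in practice.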
A quest for better anomaly detectors
Soleymani, Mehdi
doi: 10.1504/IJDMMM.2020.111399; pmid: N/A
Anomaly detection is a popular approach for finding exceptional observations, which are very rare. It is frequently used in medical diagnosis, fraud detection and similar applications. In this article, we revisit some popular anomaly detection algorithms and investigate why a better algorithm for identifying anomalies is still needed. We propose a new algorithm that, unlike other popular algorithms, does not look for outliers directly. Instead, it finds them by iteratively removing the inliers (the opposite of outliers). We present an extensive simulation study comparing the performance of the proposed algorithm with that of its competitors.
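The abstract does not give the algorithm's details. The sketch below is therefore not the proposed method. It is a hypothetical "inlier peeling" rule that illustrates the general idea of finding outliers by repeatedly discarding inliers. In each round, it removes the share `frac` of the remaining points that lie closest to the coordinate-wise median of those points. Each point's score is the round in which it was removed, so points that survive longest are the most anomalous. The median centre, the Euclidean distance and the default parameters are all assumptions.

```python
import numpy as np

def inlier_peeling_scores(x: np.ndarray, frac: float = 0.2, min_left: int = 3) -> np.ndarray:
    """Hypothetical anomaly score: the round in which a point is peeled away.

    Higher score = removed later = more anomalous. `min_left` points are never
    removed; they receive the highest score.
    """
    n = len(x)
    remaining = np.arange(n)
    scores = np.zeros(n, dtype=int)
    rnd = 0
    while len(remaining) > min_left:
        rnd += 1
        center = np.median(x[remaining], axis=0)  # robust centre of what is left
        dist = np.linalg.norm(x[remaining] - center, axis=1)
        n_drop = max(1, int(frac * len(remaining)))
        n_drop = min(n_drop, len(remaining) - min_left)
        drop = remaining[np.argsort(dist)[:n_drop]]  # the current inliers
        scores[drop] = rnd
        remaining = np.setdiff1d(remaining, drop)
    scores[remaining] = rnd + 1  # survivors are the outlier candidates
    return scores
```

Setting `min_left` above 2 avoids a tie in the final round. With only two points left, the median is their midpoint, so both points lie at the same distance from it.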