Purpose – This paper addresses the challenge of opinion mining in text documents as a basis for further analysis, such as community detection and consistency control. More specifically, we aim to identify and extract opinions from natural language documents and to represent them in a structured manner, so that communities of opinion holders can be identified from the opinions they share. A further goal is to rapidly identify similar or contradictory opinions that different holders express about the same target.
Design/methodology/approach – For opinion extraction, we adopt a supervised approach and focus on feature selection to improve classification results. For community detection, we rely on the Infomap algorithm and the multi-scale community detection framework, applied to a graph representation built from the available opinions and social data.
Findings – Classification performance, measured by precision and recall, improved significantly after adding a set of “meta-features” based on rules that group certain parts of speech (POS) rather than on the actual words. To evaluate community detection, we used two quality metrics: network modularity and normalized mutual information (NMI). We evaluated seven one-target similarity functions and ten multi-target aggregation functions and concluded that linear functions perform poorly on data sets with multiple targets, while functions that compute the average similarity are more resilient to noise.
Originality/value – Although our solution relies on existing approaches, we adapt and integrate them efficiently. Guided by the initial experimental results, we also introduce original enhancements that improve performance.
International Journal of Web Information Systems – Emerald Publishing
Published: Nov 11, 2014
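
The abstract names normalized mutual information (NMI) as one of the two quality metrics for the detected communities. As a rough illustration only, and not the authors' implementation, the sketch below computes NMI between two community assignments over the same opinion holders. The function name, the choice of arithmetic-mean normalization (the abstract does not say which NMI variant was used), and the toy labels in the usage note are all assumptions made for this example.

```python
from collections import Counter
from math import log


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two partitions of the same items.

    Assumption: normalization by the arithmetic mean of the two entropies,
    i.e. NMI = 2 * I(A; B) / (H(A) + H(B)). Other variants exist.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    n = len(labels_a)
    if n == 0:
        raise ValueError("partitions must not be empty")

    count_a = Counter(labels_a)               # community sizes in partition A
    count_b = Counter(labels_b)               # community sizes in partition B
    joint = Counter(zip(labels_a, labels_b))  # sizes of pairwise overlaps

    # Mutual information I(A; B), summed over non-empty overlaps only.
    mutual_info = sum(
        (n_ab / n) * log(n * n_ab / (count_a[a] * count_b[b]))
        for (a, b), n_ab in joint.items()
    )

    # Shannon entropies H(A) and H(B) of the two partitions.
    entropy_a = -sum((c / n) * log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * log(c / n) for c in count_b.values())

    # Both partitions put every item in one community: treat as identical.
    if entropy_a == 0 and entropy_b == 0:
        return 1.0
    return 2 * mutual_info / (entropy_a + entropy_b)
```

For example, with hypothetical reference communities `["c1", "c1", "c2", "c2"]` and detected communities `[7, 7, 9, 9]` for four holders, the two partitions agree up to a renaming of the communities, so the score is 1; a detected partition that is statistically independent of the reference scores 0.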