Purpose – Building a custom social data set is difficult because social media data is generally vast and noisy. This study specifies the design and implementation details of a Twitter data collecting tool with a rule‐based filtering module. It also examines, through a case study with rule‐based analysis, how people communicate with each other on social networks.
Design/methodology/approach – The authors developed a Java‐based data gathering tool with a rule‐based filtering module for collecting data from Twitter. This paper introduces the design specifications and explains the implementation details of the Twitter Data Collecting Tool with detailed Unified Modeling Language (UML) diagrams. The Model View Controller (MVC) framework is applied in the system to support various types of user interfaces.
Findings – The Twitter Data Collecting Tool is able to gather a huge amount of data from Twitter and filter that data with modest rules that capture complex logic. The case study shows that a historical event creates buzz on Twitter and that people's interest in the event is reflected in their Twitter activity.
Research limitations/implications – Applying data‐mining techniques to social network data has considerable potential. A possible improvement to the Twitter Data Collecting Tool would be the addition of a built‐in data‐mining module.
Originality/value – This paper focuses on designing a system that handles massive amounts of Twitter data. This is the first approach to embed a rule engine for filtering and analyzing social data. The paper will be valuable to those who want to build their own Twitter data set, apply customized filtering options to remove unnecessary, noisy data, and analyze social data to discover new knowledge.
International Journal of Web Information Systems – Emerald Publishing
Published: Aug 23, 2013
Keywords: Twitter; Crawling; Data‐mining; Social analysis; Super Bowl 2012; Rule engine; Social networks; Social media
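Illustrative sketch – The abstract describes a Java‐based tool that filters collected tweets through a rule‐based module. The following is a minimal sketch of that idea, not the authors' implementation: the abstract does not name the rule engine or the tweet schema, so plain `java.util.function.Predicate` objects stand in for the engine, and the `Tweet`, `Rule`, and `TweetRuleFilterSketch` classes, the two sample rules, and the sample tweets are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Hypothetical, minimal tweet model; the real schema is not given in the abstract.
final class Tweet {
    final String user;
    final String text;
    final boolean retweet;

    Tweet(String user, String text, boolean retweet) {
        this.user = user;
        this.text = text;
        this.retweet = retweet;
    }
}

// A named rule: a tweet satisfies the rule when the condition returns true.
final class Rule {
    final String name;
    final Predicate<Tweet> condition;

    Rule(String name, Predicate<Tweet> condition) {
        this.name = name;
        this.condition = condition;
    }
}

final class TweetRuleFilterSketch {

    // Keeps only the tweets that satisfy every rule (logical AND of all rules).
    static List<Tweet> filter(List<Tweet> tweets, List<Rule> rules) {
        List<Tweet> kept = new ArrayList<>();
        for (Tweet tweet : tweets) {
            boolean passes = true;
            for (Rule rule : rules) {
                if (!rule.condition.test(tweet)) {
                    passes = false;
                    break;
                }
            }
            if (passes) {
                kept.add(tweet);
            }
        }
        return kept;
    }

    // Assumed example rules and made-up sample tweets, for illustration only.
    static List<Tweet> demo() {
        List<Rule> rules = Arrays.asList(
            new Rule("mentions-super-bowl",
                t -> t.text.toLowerCase(Locale.ROOT).contains("super bowl")),
            new Rule("not-a-retweet", t -> !t.retweet));

        List<Tweet> tweets = Arrays.asList(
            new Tweet("alice", "Watching the Super Bowl tonight!", false),
            new Tweet("bob", "RT Super Bowl halftime show", true),
            new Tweet("carol", "Lunch was great", false),
            new Tweet("dave", "super bowl ads are the best part", false));

        return filter(tweets, rules);
    }

    public static void main(String[] args) {
        for (Tweet tweet : demo()) {
            System.out.println(tweet.user + ": " + tweet.text);
        }
    }
}
```

Keeping each rule as a small named object means that complex filtering logic can be composed from simple conditions, which is one plausible reading of how a rule‐based module keeps noisy data out of a collected data set.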