Purpose
Based on user-generated content from a Chinese social media platform, this paper investigates multiple methods of constructing user profiles and their effectiveness in predicting users’ gender, age and geographic location.

Design/methodology/approach
This investigation collected 331,634 posts from 4,440 users of Sina Weibo. The data were divided into two parts, one for training and one for testing. First, a vector space model and topic models were applied to construct user profiles. A support vector machine classifier was then trained on the training data set. Finally, the classifier was used to predict users’ gender, age and geographic location in the testing data set.

Findings
The results revealed that, among the profile construction methods, latent semantic analysis performed better at predicting gender and age. By contrast, the method based on a traditional vector space model worked better at predicting geographic location. When applying a topic model to construct user profiles, the authors found that different prediction tasks should use different numbers of topics.

Originality/value
This study explores different user profile construction methods to predict Chinese social media network users’ gender, age and geographic location. The results of this paper will help to improve the quality of personal information gathered from social media platforms, and thereby improve personalized recommendation systems and personalized marketing.
The Electronic Library – Emerald Publishing
Published: Aug 7, 2017
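
The pipeline described under Design/methodology/approach (build a vector space profile per user, train a support vector machine, predict on held-out users) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy tokens and location labels are invented, posts are assumed to be already word-segmented, TF-IDF weighting stands in for the unspecified vector space weighting, and the hand-written linear SVM trained by subgradient descent is not the authors' implementation. The abstract does not state the kernel, library or parameters used, and the topic model variants are not covered.

```python
import math
from collections import Counter

def fit_vocabulary(train_docs):
    """Map each training term to (column index, smoothed IDF weight)."""
    n_docs = len(train_docs)
    doc_freq = Counter(term for doc in train_docs for term in set(doc))
    return {
        term: (i, math.log((1 + n_docs) / (1 + freq)) + 1.0)
        for i, (term, freq) in enumerate(sorted(doc_freq.items()))
    }

def to_profile(doc, vocab):
    """Build an L2-normalized TF-IDF vector (the user profile) for one user."""
    vec = [0.0] * len(vocab)
    for term, count in Counter(doc).items():
        if term in vocab:  # terms unseen in training are ignored
            col, idf = vocab[term]
            vec[col] = count * idf
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=100):
    """Minimize the L2-regularized hinge loss by subgradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - eta * lam) for wj in w]  # regularization step
            if margin < 1.0:  # point violates the margin
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b

def predict(x, w, b):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical toy data: each user is the concatenation of their tokenized posts.
train_docs = [
    ["beijing", "hutong", "duck", "beijing"],
    ["hutong", "duck", "subway"],
    ["shanghai", "bund", "xiaolongbao"],
    ["bund", "shanghai", "subway", "shanghai"],
]
train_labels = [1, 1, -1, -1]          # +1 = Beijing, -1 = Shanghai (invented labels)
test_docs = [["beijing", "duck", "weekend"], ["xiaolongbao", "bund"]]
label_names = {1: "Beijing", -1: "Shanghai"}

vocab = fit_vocabulary(train_docs)
train_X = [to_profile(doc, vocab) for doc in train_docs]
w, b = train_linear_svm(train_X, train_labels)

predictions = [label_names[predict(to_profile(doc, vocab), w, b)] for doc in test_docs]
print(predictions)
```

The vocabulary and IDF weights are fitted on the training users only and then reused for the test users, which mirrors the train/test split described in the abstract. Gender and age prediction would follow the same shape with different labels; age would need either binning into two groups or a multi-class extension such as one-vs-rest.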