Cross-lingual event-centered news clustering based
on elements semantic correlations of different news
Received: 15 September 2016 / Revised: 10 March 2017 / Accepted: 17 May 2017 /
Published online: 7 July 2017
© Springer Science+Business Media New York 2017
Abstract Cross-lingual event-centered news clustering aims to group news documents written in
different languages into clusters of documents that describe the same event. To solve the
problem of similarity computation between bilingual documents, this paper proposes a new method
based on semantic correlations of news elements. First, we use a bilingual entity lexicon and
term co-occurrences in news to acquire the semantic correlations of news elements in different
languages. Then, on this basis, we compute the similarity between news documents in different
languages using the generalized vector space model (GVSM). Finally, spectral clustering is
applied to group the news stories. Experimental results show that our method achieves promising
results in terms of F value.
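As a rough illustration of the GVSM similarity step named in the abstract, the sketch below computes a correlation-weighted cosine between two term-weight vectors over a shared bilingual vocabulary. It is not the authors' implementation: the function names, the three-term toy vocabulary, and the correlation value 0.9 are assumptions made only for this example. In the method described here, the correlation entries would come from the bilingual entity lexicon and term co-occurrences.

```python
import math


def bilinear(x, g, y):
    """Return x^T G y for plain Python lists."""
    return sum(x[i] * g[i][j] * y[j]
               for i in range(len(x))
               for j in range(len(y)))


def gvsm_similarity(doc_a, doc_b, corr):
    """Correlation-weighted cosine similarity (GVSM-style).

    doc_a, doc_b: term-weight vectors over the same combined vocabulary.
    corr: symmetric term-term correlation matrix (hypothetical values here).
    """
    norm_a = math.sqrt(bilinear(doc_a, corr, doc_a))
    norm_b = math.sqrt(bilinear(doc_b, corr, doc_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return bilinear(doc_a, corr, doc_b) / (norm_a * norm_b)


# Toy vocabulary (assumed): index 0 = "earthquake" in language A,
# index 1 = its translation in language B, index 2 = an unrelated term.
CORR = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
IDENTITY = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

doc_lang_a = [1.0, 0.0, 0.0]   # mentions only the language-A term
doc_lang_b = [0.0, 1.0, 0.0]   # mentions only the language-B term
doc_other = [0.0, 0.0, 1.0]    # mentions only the unrelated term

sim_cross = gvsm_similarity(doc_lang_a, doc_lang_b, CORR)
sim_plain = gvsm_similarity(doc_lang_a, doc_lang_b, IDENTITY)
sim_unrelated = gvsm_similarity(doc_lang_a, doc_other, CORR)
```

With the identity matrix the measure reduces to the ordinary cosine, so two documents that share no surface terms score zero. The off-diagonal correlation is what lets documents in different languages be compared. A pairwise similarity matrix built this way could then serve as the affinity input to a spectral clustering routine.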
With the development of economic globalization and the internationalization of business, the
associations among different countries are becoming increasingly close, and more incidents and
topics attract attention in several countries at once. Cross-lingual event-centered news
clustering groups news written in different languages into coherent clusters such that all news
stories in the same cluster cover the same event. It can therefore help people grasp current
international and regional hot events and see how different countries view the same event.
Multimed Tools Appl (2017) 76:25129–25143
Supported by the National Natural Science Foundation of China (Grant No. 61175068).
* Zhengtao Yu
The School of Computer Science and Technology, Anhui University of Technology,
Maanshan 243002, China
The School of Information Engineering and Automation, Kunming University of Science and
Technology, Kunming 650500, China