Fighting hate speech from bilingual hinglish speaker’s perspective, a transformer- and translation-based approach.

Publisher
Springer Journals
Copyright
Copyright © The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature 2022
ISSN
1869-5450
eISSN
1869-5469
DOI
10.1007/s13278-022-00920-w

Abstract

Social media use has grown sharply over the past decade alongside wider Internet access. This brings many benefits but also risks, including hate speech. People in multilingual societies such as India frequently mix their native language with English, so detecting hateful content in such bilingual code-mixed data has drawn growing interest from the research community. Most previous work focuses on high-resource languages such as English; very few researchers have addressed code-mixed bilingual data such as Hinglish. In this study, we investigate the performance of transformer models such as IndicBERT and multilingual Bidirectional Encoder Representations from Transformers (mBERT), as well as transfer learning from pre-trained language models such as ULMFiT and Bidirectional Encoder Representations from Transformers (BERT), for detecting hateful content in Hinglish. We also propose a Transformer-based Interpreter and Feature extraction model on Deep Neural Network (TIF-DNN). Experimental results show that the proposed model outperforms existing state-of-the-art methods for hate speech identification in Hinglish, with an accuracy of 73%.

Journal

Social Network Analysis and Mining, Springer Journals

Published: Dec 1, 2022

Keywords: Hinglish; Code-mixed; mBERT; Transformers
