A set of parameters for automatically annotating a Sentiment Arabic Corpus

Guellil Imane; Darwish Kareem; Azouaou Faical

doi:10.1108/ijwis-03-2019-0008

Publisher: Emerald Publishing
Copyright: © Emerald Publishing Limited
ISSN: 1744-0084
DOI: 10.1108/ijwis-03-2019-0008

Abstract

Purpose

This paper aims to propose an approach to automatically annotate a large corpus in Arabic dialect. The corpus is used to analyse the sentiments of Arabic users on social media. It focuses on the Algerian dialect, which is a sub-dialect of Maghrebi Arabic. Although Algerian is spoken by roughly 40 million people, few studies address its automatic processing in general, or its sentiment analysis in particular.

Design/methodology/approach

The approach is based on the construction and use of a sentiment lexicon to automatically annotate a large corpus of Algerian text extracted from Facebook. This approach makes it possible to significantly increase the size of the training corpus without resorting to manual annotation. The annotated corpus is then vectorized using document embeddings (doc2vec), an extension of word embeddings (word2vec). For sentiment classification, the authors used several classifiers, including support vector machines (SVM), Naive Bayes (NB), logistic regression (LR) and a multilayer perceptron (MLP).

Findings

The results suggest that the NB and SVM classifiers generally led to the best results, and MLP generally had the worst. Further, the threshold that the authors used in selecting messages for the training set had a noticeable impact on recall and precision, with a threshold of 0.6 producing the best results. Using the distributed bag-of-words model (PV-DBOW) led to slightly higher results than using the distributed memory model (PV-DM). Combining PV-DBOW and PV-DM representations led to slightly lower results than using PV-DBOW alone. The best results were obtained by the NB classifier, with F1 up to 86.9 per cent.

Originality/value

The principal originality of this paper is to determine the right parameters for automatically annotating an Algerian dialect corpus. This annotation is based on a sentiment lexicon that was also constructed automatically.
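
The lexicon-based annotation step, and the role of the selection threshold, can be illustrated with a minimal Python sketch. The abstract does not state how a message's score is computed, so the rule used here is an assumption: a message is labelled only when the share of its lexicon words carrying one polarity reaches the threshold, and is otherwise left out of the training set. The lexicon entries, the messages, and the function names `annotate` and `build_training_set` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon (Algerian Arabic in Latin script): word -> polarity.
# The paper builds its lexicon automatically; these four entries are illustrative only.
TOY_LEXICON: Dict[str, int] = {
    "mlih": 1,     # "good"
    "zwin": 1,     # "beautiful"
    "khayeb": -1,  # "bad"
    "ghali": -1,   # "expensive"
}

# Hypothetical unlabelled messages standing in for the Facebook corpus.
MESSAGES: List[str] = [
    "telephone mlih w zwin",
    "service khayeb w ghali bezzaf",
    "mlih w zwin bessah ghali",
    "mlih bessah ghali",
    "salam",
]


def annotate(message: str, lexicon: Dict[str, int], threshold: float = 0.6) -> Optional[str]:
    """Label a message 'positive' or 'negative', or return None if it is too uncertain.

    ASSUMED scoring rule (not taken from the paper): among the tokens found in the
    lexicon, the share carrying one polarity must reach `threshold`. A threshold
    above 0.5 ensures a message cannot qualify for both labels.
    """
    polarities = [lexicon[tok] for tok in message.lower().split() if tok in lexicon]
    if not polarities:
        return None  # no lexicon words: the message stays unannotated
    pos_share = polarities.count(1) / len(polarities)
    if pos_share >= threshold:
        return "positive"
    if 1 - pos_share >= threshold:
        return "negative"
    return None  # mixed polarity below the threshold: discard


def build_training_set(
    messages: List[str], lexicon: Dict[str, int], threshold: float
) -> List[Tuple[str, str]]:
    """Keep only the messages the lexicon can label with enough confidence."""
    labelled = []
    for msg in messages:
        label = annotate(msg, lexicon, threshold)
        if label is not None:
            labelled.append((msg, label))
    return labelled


for t in (0.6, 0.7):
    kept = build_training_set(MESSAGES, TOY_LEXICON, t)
    print(t, len(kept))  # prints "0.6 3", then "0.7 2"
```

In this toy run, raising the threshold from 0.6 to 0.7 drops the mixed message "mlih w zwin bessah ghali" (two positive words, one negative), so the training set shrinks but its labels become more consistent. This only illustrates how a threshold can trade corpus size against label quality; it does not reproduce the authors' experiments.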

Journal

International Journal of Web Information Systems – Emerald Publishing

Published: Oct 15, 2019

Keywords: Arabic sentiment analysis; Algerian dialect; Sentiment lexicon; Sentiment corpus; Doc2vec
