Indexing Arabic texts using association rule data mining

Ramzi A. Haraty; Rouba Nasrallah

doi:10.1108/lht-07-2017-0147

Publisher: Emerald Publishing
Copyright: © Emerald Publishing Limited
ISSN: 0737-8831
DOI: 10.1108/lht-07-2017-0147

Abstract

Purpose: The purpose of this paper is to propose a new model that enhances the auto-indexing of Arabic texts. The model extracts new relevant words by using data mining rules to relate them to the words chosen by earlier classical methods.

Design/methodology/approach: The proposed model uses an association rule algorithm to extract frequent sets of related items. These sets capture relationships between words in the texts to be indexed and words from texts that belong to the same category. The extracted word associations are represented as sets of words that frequently appear together.

Findings: The proposed methodology shows significant enhancement in terms of accuracy, efficiency and reliability when compared to previous works.

Research limitations/implications: The stemming algorithm can be further enhanced. Arabic has many grammatical rules, and the more of them are integrated into the stemming algorithm, the better the stemming will be. The stop-list can also be improved by adding more words that should not be considered by the indexing mechanism. Numbers should be added to the list as well. A thesaurus system could also be used, because it links different phrases or words with the same meaning to each other, which improves the indexing mechanism. The authors also invite researchers to add more pre-requisite texts to obtain better results.

Originality/value: In this paper, the authors present a full text-based auto-indexing method for Arabic text documents. The method extracts new relevant words by using data mining rules, an approach that has not been investigated before. It uses an association rule mining algorithm to extract frequent sets of related items, which relate words in the texts to be indexed to words from texts that belong to the same category. The benefits of the method are demonstrated through empirical work involving several Arabic texts.
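The frequent-set idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the specific algorithm, thresholds or data, so the Apriori-style search, the function names `frequent_itemsets` and `expand_index_terms`, the `min_support` and `min_confidence` parameters, and the toy corpus (English placeholders standing in for stemmed Arabic words with stop words removed) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(docs, min_support, max_size=2):
    """Apriori-style search (assumed here, not confirmed by the paper).

    Returns {frozenset(words): support}, where support is the fraction
    of documents that contain every word in the set.
    """
    transactions = [set(doc) for doc in docs]
    n = len(transactions)

    # Level 1: single words that occur in enough documents.
    counts = Counter(word for t in transactions for word in t)
    frequent = {
        frozenset([word]): count / n
        for word, count in counts.items()
        if count / n >= min_support
    }
    current = set(frequent)

    size = 1
    while current and size < max_size:
        size += 1
        items = sorted(set().union(*current))
        next_level = set()
        for cand in combinations(items, size):
            # Prune: every smaller subset must already be frequent.
            if not all(frozenset(sub) in frequent
                       for sub in combinations(cand, size - 1)):
                continue
            cand_set = frozenset(cand)
            support = sum(1 for t in transactions if cand_set <= t) / n
            if support >= min_support:
                frequent[cand_set] = support
                next_level.add(cand_set)
        current = next_level
    return frequent


def expand_index_terms(seed_terms, frequent, min_confidence):
    """Add words implied by rules {seed} -> {word} with enough confidence."""
    expanded = set(seed_terms)
    for itemset, support in frequent.items():
        if len(itemset) != 2:
            continue
        for antecedent in itemset:
            if antecedent not in seed_terms:
                continue
            (consequent,) = itemset - {antecedent}
            confidence = support / frequent[frozenset([antecedent])]
            if confidence >= min_confidence:
                expanded.add(consequent)
    return expanded


# Hypothetical same-category texts, already stemmed and stop-word filtered.
category_docs = [
    ["bank", "loan", "interest"],
    ["bank", "loan", "market"],
    ["bank", "interest"],
    ["market", "trade"],
]
freq = frequent_itemsets(category_docs, min_support=0.5)
# Seed terms stand for words chosen by an earlier, classical indexing step.
index_terms = expand_index_terms({"loan"}, freq, min_confidence=0.8)
```

In this toy corpus, "loan" always co-occurs with "bank", so the rule from "loan" to "bank" passes the confidence threshold and "bank" is added to the index terms. The reverse rule does not pass, because "bank" also appears without "loan".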

Journal

Library Hi Tech – Emerald Publishing

Published: Mar 7, 2019

Keywords: Precision; Recall; Arabic text; Auto-indexing; Frequent sets; Rule-based data mining
