An Arabic text categorization approach using term weighting and multiple reducts

Soft Computing, Volume 23 (14) – Jun 5, 2018

Publisher: Springer Journals
Copyright: © 2018 by Springer-Verlag GmbH Germany, part of Springer Nature
Subject: Engineering; Computational Intelligence; Artificial Intelligence; Mathematical Logic and Foundations; Control, Robotics, Mechatronics
ISSN: 1432-7643
eISSN: 1433-7479
DOI: 10.1007/s00500-018-3249-z

Abstract

Text categorization is the process of assigning a predefined category label to an unlabeled document based on its content. One challenge in automatic text categorization is the high dimensionality of the data, which can degrade the performance of the categorization model. This paper proposes an approach to Arabic text categorization based on term weighting and the reduct concept from rough set theory, which reduces the number of terms used to generate the classification rules that form the classifier. It also proposes an algorithm for extracting multiple minimal reducts, obtained by improving the Quick reduct algorithm. The multiple reducts are used to generate the set of classification rules that constitutes the rough set classifier. The approach is evaluated on an Arabic corpus of 2700 documents in nine categories. The experiments compare the proposed approach using multiple minimal reducts against the same approach using a single minimal reduct. With multiple reducts the approach achieved an accuracy of 94%, outperforming the single-reduct method, which achieved 86%. The experiments also showed that the proposed approach outperforms both the K-NN and J48 algorithms in classification accuracy on this dataset.
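To make the reduct idea concrete, the following is a minimal sketch of the classic greedy Quick reduct heuristic on a toy decision table. It is not the paper's improved multiple-reduct algorithm, which the abstract does not spell out. The term names, the binary discretization of term weights, the labels, and the function names are all assumptions made up for illustration.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Rough-set dependency degree: share of rows in the positive region,
    i.e. rows whose equivalence class carries a single category label."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def quick_reduct(rows, labels, attrs):
    """Greedy Quick reduct: keep adding the attribute that yields the highest
    dependency until the subset matches the dependency of the full set.
    The best candidate is always added (ties go to the earliest attribute),
    so the loop terminates even when no single attribute helps on its own."""
    target = dependency(rows, labels, attrs)
    reduct = []
    best = dependency(rows, labels, reduct)
    while best < target:
        candidates = [a for a in attrs if a not in reduct]
        scores = {a: dependency(rows, labels, reduct + [a]) for a in candidates}
        chosen = max(candidates, key=lambda a: scores[a])
        reduct.append(chosen)
        best = scores[chosen]
    return reduct


# Hypothetical toy table: 1 means the term's weight is above some threshold.
rows = [
    {"news": 1, "goal": 1, "bank": 0},
    {"news": 0, "goal": 1, "bank": 0},
    {"news": 1, "goal": 0, "bank": 1},
    {"news": 0, "goal": 0, "bank": 1},
    {"news": 1, "goal": 0, "bank": 0},
]
labels = ["sports", "sports", "economy", "economy", "sports"]
attrs = ["news", "goal", "bank"]

print(quick_reduct(rows, labels, attrs))  # ['bank']
```

In this toy table the single term "bank" separates the two categories as well as all three terms together, so the reduct keeps one term and drops the other two. A greedy search of this kind returns one reduct and does not guarantee a minimal one, which is the kind of limitation an approach based on multiple minimal reducts aims to address.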

Journal

Soft Computing, Springer Journals

Published: Jun 5, 2018
