Arabic script language identification using letter frequency neural networks

Publisher: Emerald Publishing
Copyright: © 2008 Emerald Group Publishing Limited. All rights reserved.
ISSN: 1744-0084
DOI: 10.1108/17440080810919503

Abstract

Purpose – With the rapid growth of the internet and the trend toward globalization, a tremendous number of textual documents written in different languages are accessible online through the World Wide Web. Managing these multilingual documents efficiently and effectively is important to organizations and individuals. The purpose of this paper is therefore to propose letter frequency neural networks to enhance the performance of language identification.

Design/methodology/approach – The paper first analyzes the feasibility of using a windowing algorithm to select features for language identification of Arabic script documents with backpropagation neural networks. Earlier experiments had found that the sliding window and non-sliding window algorithms, used as feature selection methods, did not yield good results. The paper therefore proposes language identification of Arabic script documents based on letter frequency using a backpropagation neural network, and uses datasets of documents in Arabic, Persian, Urdu and Pashto, all of which are written in Arabic script.

Findings – The experiments show that the average root mean squared error of Arabic script document language identification is lower with the letter frequency feature selection algorithm than with the windowing algorithm.

Originality/value – The paper highlights that using neural networks with proper feature selection methods increases the performance of language identification.
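The two measurable ingredients named in the abstract, letter frequency features and root mean squared error, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the letter inventory in `ALPHABET`, the function names, and the one-hot target encoding over the four languages are all assumptions made for illustration, and the backpropagation network itself is omitted.

```python
from collections import Counter
import math

# Hypothetical letter inventory: the 28 basic Arabic letters plus a few letters
# used in Persian, Urdu and Pashto. The paper's actual inventory is not stated
# in the abstract.
ALPHABET = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي" + "پچژگ" + "ٹڈڑں" + "ټډړښ"

# Assumed output encoding: one network output per language.
LANGUAGES = ["Arabic", "Persian", "Urdu", "Pashto"]


def letter_frequencies(text, alphabet=ALPHABET):
    """Return the relative frequency of each alphabet letter in `text`.

    Characters outside `alphabet` (spaces, digits, punctuation) are ignored.
    The result is a list aligned with `alphabet`, suitable as an input vector.
    """
    counts = Counter(ch for ch in text if ch in alphabet)
    total = sum(counts.values())
    if total == 0:
        # No known letters: return an all-zero feature vector.
        return [0.0] * len(alphabet)
    return [counts[ch] / total for ch in alphabet]


def rmse(predicted, target):
    """Root mean squared error between two equal-length vectors."""
    if len(predicted) != len(target):
        raise ValueError("vectors must have the same length")
    squared = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared / len(target))


if __name__ == "__main__":
    features = letter_frequencies("سلام دنیا")
    print(len(features), round(sum(features), 6))

    # Made-up network output compared with a one-hot target for "Persian".
    output = [0.10, 0.80, 0.05, 0.05]
    target = [0.0, 1.0, 0.0, 0.0]
    print(rmse(output, target))
```

Under these assumptions, each document is reduced to a fixed-length vector regardless of its length, which is what distinguishes this feature from the window-based selection the abstract compares it against.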

Journal

International Journal of Web Information Systems, Emerald Publishing

Published: Nov 21, 2008

Keywords: Neural net; Programming and algorithm theory; Algorithmic languages
