Towards Kurdish Information Retrieval

KYUMARS SHEYKH ESMAILI, Technicolor, France
SHAHIN SALAVATI, University of Kurdistan, Iran
ANWITAMAN DATTA, Nanyang Technological University, Singapore

The Kurdish language is an Indo-European language spoken in Kurdistan, a large geographical region in the Middle East. Despite having a large number of speakers, Kurdish is among the less-resourced languages and has not seen much attention from the IR and NLP research communities. This article reports on the outcomes of a project aimed at providing essential resources for processing Kurdish texts. A principal output of this project is Pewan, the first standard Test Collection to evaluate Kurdish Information Retrieval systems. The other language resources that we have built include a lightweight stemmer and a list of stopwords. Our second principal contribution is using these newly-built resources to conduct a thorough experimental study on Kurdish documents. Our experimental results show that normalization, and to a lesser extent, stemming, can greatly improve the performance of Kurdish IR systems.

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval
General Terms: Design, Measurement, Experimentation, Performance
Additional Key Words and Phrases: Kurdish language, Sorani Kurdish, Kurmanji Kurdish, test collection, stemming, cross-lingual information retrieval
ACM Reference Format: Sheykh
ACM Transactions on Asian Language Information Processing (TALIP) – Association for Computing Machinery
Published: Jun 1, 2014