Design and development of Iberia: a corpus of scientific Spanish

Iberia is a synchronic corpus of scientific Spanish designed mainly for terminological studies. In this paper, we describe its design and the infrastructure for its acquisition, processing and exploitation, including mark-up, linguistic annotation, indexing and the user interface. Two preprocessing tasks affecting a large number of words are described in detail: dehyphenation and identification of text fragments in other languages. We also show how some of the reported statistics, namely, dispersion and association, are used for research on lexis.

1. Introduction

The Iberia project³ was launched to bridge the gap between corpus linguistics and linguistic research in scientific Spanish. The aim of the Iberia project was two-fold: (i) the creation of a synchronic representative corpus of scientific texts in Spanish and (ii) the creation of the infrastructure for its linguistic processing and exploitation. Iberia will be part of an observatory of neologisms in science whose purpose is to study terminological usage of words in a variety of fields, to detect term obsolescence and to track recent

¹ Departamento de Ingeniería Informática, Universidad Autónoma de Madrid, Campus de Cantoblanco, c/ Francisco Tomás y Valiente, 11, Madrid 28049, Spain. Correspondence to: Jordi Porta Zamorano, e-mail: jordi.porta@uam.es
² Centro de Ciencias

Corpora, Edinburgh University Press
