Abstract In this paper we discuss two simple statistical methods for analyzing spoken language as represented in transcription corpora. The methods are strictly data-driven, in the sense that neither a preset grammatical category system nor any lexical knowledge is assumed. The first method (‘Siblings’) is used for word type clustering within one language; the second method (‘Cousins’) is used for translation between two cognate languages. Exploiting two large Scandinavian speech corpora (one Danish and one Swedish), we show how bilingual dictionary entries can be derived directly from raw transcription data. Our investigations shed new light on the so-called ‘disfluencies’ typical of spoken language, showing them to be syntax errors only from the viewpoint of written-language grammar. Finally, we discuss how the proposed methods could be applied to languages that have neither a writing system nor any record of linguistic description.
Acta Linguistica Hafniensia: International – Taylor & Francis
Published: Jan 1, 2004