Benutzer:WritingLikeHell/Vorbereitung
The Karl Eberhards Corpus of spontaneously spoken southern German in dialogues[1] (KEC for short) is a corpus of spontaneously spoken German recorded between 2014 and 2016 at the Department of General Linguistics of the Eberhard Karls Universität Tübingen. The corpus is hosted at the BAS CLARIN repository and contains forty one-hour-long acoustic recordings of dialogues between two friends on various topics. The recordings were made in two isolated recording chambers. The corpus contains manual annotations of word boundaries as well as force-aligned segment and morphological annotations. It also contains electromagnetic articulography (EMA) recordings for thirty speakers. Annotations are provided as TextGrid files for the speech analysis software Praat.
Contents
The KEC contains a total of 79 hours of recorded speech with a total of 450,311 words (23,265 distinct word types). Speakers were allowed to choose the topic themselves so that a fluent, natural discussion could develop. As a result, the corpus contains vocabulary from a wide range of topics.
Electromagnetic Articulography
In addition to acoustic recordings, the corpus contains EMA recordings for 20 speakers with a duration of thirty minutes. The EMA recordings contain 51,762 words (5,364 distinct word types). EMA sensors were attached at the following locations: tongue back, tongue mid, tongue tip, upper teeth, lower teeth, upper lip, lower lip, left lip edge and jaw. Apart from the jaw and LL sensors, all sensors were attached along the midsagittal plane. In addition, three reference sensors were placed at the nasion and at the left and right mastoid.
Frequency distributions of words
Corpora of spoken language make it possible to estimate the frequency distributions of words in a given language. The following table lists the twenty most common words in the corpus[1] together with their relative frequencies.
Word | Relative Frequency |
---|---|
ja | 0.043 |
und | 0.038 |
ich | 0.026 |
so | 0.024 |
das | 0.022 |
die | 0.020 |
dann | 0.016 |
auch | 0.015 |
da | 0.013 |
aber | 0.012 |
also | 0.011 |
der | 0.011 |
halt | 0.011 |
ist | 0.011 |
nicht | 0.010 |
du | 0.009 |
war | 0.008 |
was | 0.009 |
hat | 0.007 |
'ne | 0.007 |
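Relative frequencies of this kind are obtained by dividing the number of occurrences of each word by the total number of word tokens. The following Python sketch illustrates the calculation. It is not the procedure used by the corpus authors; the function name `relative_frequencies` and the toy token list are hypothetical and serve only as an illustration.

```python
from collections import Counter


def relative_frequencies(tokens, top_n=20):
    """Return the top_n most common tokens with their relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())  # total number of word tokens
    return [(word, count / total) for word, count in counts.most_common(top_n)]


# Hypothetical toy transcript, not taken from the KEC.
tokens = "ja und ich ja so ja und das".split()

print("tokens:", len(tokens), "types:", len(set(tokens)))
for word, freq in relative_frequencies(tokens, top_n=3):
    print(f"{word}\t{freq:.3f}")
```

Applied to the full list of word tokens of a corpus, the same calculation would yield a table like the one above, and `len(set(tokens))` would give the number of distinct word types.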
See also
External links
References
- Arnold, D. and Tomaschek, F.: The Karl Eberhards Corpus of spontaneously spoken Southern German in dialogues - audio and articulatory recordings. In: Draxler, C.; Kleber, F. (eds.): Tagungsband der 12. Tagung Phonetik und Phonologie im deutschsprachigen Raum. Ludwig-Maximilians-Universität München 2016, pp. 9–11, urn:nbn:de:bvb:19-epub-29405-2.