Corpus of Spoken Greek

The Institute's Corpus of Spoken Greek is part of the Greek talk-in-interaction and Conversation Analysis research project, directed by Professor Emerita Th.-S. Pavlidou. It was originally designed for the qualitative analysis of language and linguistic communication, especially from the perspective of Conversation Analysis, which accounts for its special features. Part of the Corpus, however, is available online and can be used for quantitative analysis.

Features of the Institute's Corpus of Spoken Greek
The need for a corpus of spoken Greek arises primarily from the priority that modern linguistics gives to spoken over written discourse. According to the findings of sociolinguistics, however, the study of spoken discourse ought to be grounded in language material drawn from naturally occurring communicative situations, that is, situations that allow for its spontaneous and unconstrained production. Compiling a corpus of spoken discourse therefore poses a series of challenges for researchers, ranging from guaranteeing the 'naturalness' of the material (i.e., overcoming the so-called 'observer's paradox') to securing the participants' consent to the tape-recording, video-recording, etc. of their speech. These challenges do not arise for corpora of written discourse, which are made up of written and published texts.

The Corpus of Spoken Greek was originally designed for the qualitative analysis of language and linguistic communication, especially from the perspective of Conversation Analysis. Consequently, particular emphasis is placed on the transcription of tape- or video-recorded material as a detailed representation of sound reality.

For Conversation Analysis, transcription is neither a mechanistic procedure (as the transcription software available on the market might suggest) nor restricted to the presentation of content (as in interviews printed in the news press). On the contrary, the 'translation' of sound into writing presupposes theoretical processing and analysis as well as relevant training, and requires multiple 'corrections' by different individuals.

As a result, the transcribed texts of the Institute's Corpus of Spoken Greek depart from the standard orthographic representation of spoken discourse in that additional symbols are used to mark overlaps, pauses, intonational and other features of spoken discourse (see Transcription Symbols). The texts also differ in the degree of precision with which they have been transcribed.

Size and Discourse Genres of the Institute's Corpus of Spoken Greek
The digitized section of the Corpus (tape-/video-recorded material) amounts to about 190,000 MB, with the transcribed section approximating 1.8 million words. The material has been drawn from naturally occurring communicative situations and comprises the following discourse types:

  • everyday conversations among friends and relatives
  • telephone calls
  • classroom interaction
  • television news
  • other television broadcasts

The Institute's Corpus of Spoken Greek also incorporates the previous archive of spoken discourse GR-SPEECH (see Pavlidou, Th.-S. (2002) [in Greek]. GR-SPEECH: σώμα ελληνικών προφορικών κειμένων. Μελέτες για την Ελληνική Γλώσσα 22: 124-134. [GR-SPEECH: corpus of Greek spoken texts. Studies in Greek Linguistics 22: 124-134]).

Access and conditions of use
The Corpus of Spoken Greek of the Institute of Modern Greek Studies was originally compiled for the qualitative analysis of language and linguistic communication, especially from the perspective of Conversation Analysis. Part of the Corpus, however, can also be used for quantitative analysis and is available online (corpus-ins.lit.auth.gr/corpus/index.html) to anyone who registers on the website.

Moreover, the Corpus can be made available for qualitative research, the project's resources permitting. For the conditions pertaining to access and use, please contact the project by email.

See also:

Pavlidou, Th.-S. 2012. The Corpus of Spoken Greek: goals, challenges, perspectives. LREC Proceedings, Workshop 18 (Best Practices for Speech Corpora in Linguistic Research), 23-28.

Pavlidou, Th.-S., Kapellidi, Ch. & Karafoti, E. 2014. The Corpus of Spoken Greek (CSG). In: Best Practices for Spoken Corpora in Linguistic Research, Ş. Ruhi, M. Haugh, T. Schmidt & K. Wörner (eds), 56-74. Newcastle upon Tyne: Cambridge Scholars Publishing.

Pavlidou, Th.-S. (ed.). 2016. [in Greek] Making a Record of the Greek Language. Thessaloniki: Institute of Modern Greek Studies.

 

Doctoral theses completed within the research program
Greek talk-in-interaction and Conversation Analysis
(main supervisor: Th.-S. Pavlidou)

Kapellidi, Ch. 2011. Subjectivity and Self-presentation in Linguistic Interaction. [in Greek]. Unpublished PhD Thesis, Aristotle University of Thessaloniki.

Alvanoudi, A. 2013. The Social and Cognitive Dimensions of Grammatical Gender. [in Greek]. Unpublished PhD Thesis, Aristotle University of Thessaloniki.

Karafoti, E. 2014. Politeness, Impoliteness, and the Face of the Speaker. [in Greek]. Unpublished PhD Thesis, Aristotle University of Thessaloniki.