The emergence of words from iterated sound imitations

Julia Trzeciakowska and Elizabeth Qing Zhang

There is an ongoing debate concerning the modality in which language emerged (cf. Wacewicz & Żywiczyński, 2020). Gestural scenarios of language origins have recently gained prominence, with experimental studies typically focusing on the role of visual iconicity in bootstrapping the emergence of gestural communication and sign languages (cf. Fay et al., 2014; Goldin-Meadow, 2016; Klima & Bellugi, 1980). However, a newer line of research on spoken languages and sound symbolism suggests that iconic processes may also bootstrap the emergence of words, pointing to a greater role for the vocal-auditory modality in language evolution (cf. Ćwiek et al., 2019; Edmiston et al., 2018; Imai & Kita, 2014; Perlman & Cain, 2014; Perlman et al., 2015; Perniss & Vigliocco, 2014).
The current study contributes to this debate by addressing the empirically underexplored vocal-auditory modality and the power of imitation in the development of first words. It builds on recent research by Edmiston et al. (2018), which examined the uninstructed emergence of words from repeated vocal imitations of environmental sounds (e.g., glass breaking, water splashing). That study found that iterated vocal imitations stabilise in form and function, as indicated by increasing acoustic and orthographic similarity. However, the currently available data do not allow generalisation to other language groups.
Our project aims to verify whether the emergence of words through iterated vocal imitations of environmental sounds is a universal phenomenon. We test native speakers of Polish, German, and Chinese. The stimuli for imitation consist of 1) the inanimate environmental sounds used by Edmiston et al. (2018) and 2) additional sounds that have fully lexicalised onomatopoeias (e.g., dog barking, sneezing, clock ticking). Following Edmiston et al. (2018), we use an iterated learning paradigm with transmission chains of up to 8 participants each (8 generations). One group of participants imitates the vocal productions passed along the chain, with each participant imitating the previous one, as in the children’s “Telephone game”. A second group of participants then transcribes the vocal imitations from the 1st and 8th generations in their native orthography.
We expect that the imitations will become more word-like over generations (i.e., easier to imitate and transcribe). As a result, transcriptions of later-generation imitations should be more similar to one another, showing higher orthographic agreement as measured with the Ratcliff/Obershelp algorithm. We also assume that participants may know the conventional onomatopoeias that already exist for the lexicalised sounds and will draw on this lexical knowledge. Hence, we expect imitations of sounds with fully lexicalised onomatopoeias to stabilise faster than imitations of inanimate environmental sounds. We also expect them to turn into the onomatopoeias already existing in the tested language.
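The orthographic agreement measure can be illustrated with a minimal Python sketch of the Ratcliff/Obershelp ("gestalt pattern matching") similarity. This sketch is not the authors' analysis code. The function names `ratcliff_obershelp` and `orthographic_agreement` are hypothetical. The sketch also assumes three things:

- Agreement for one imitation is the mean pairwise similarity across all of its transcriptions.
- Transcriptions are lowercased and trimmed before comparison.
- Ties between equally long common substrings are broken by taking the first one found.

The similarity of two strings is 2 × (matched characters) / (total characters). Matched characters are found by anchoring on the longest common substring and recursing on the unmatched text to its left and right.

```python
from itertools import combinations
from statistics import mean


def _matching_chars(a: str, b: str) -> int:
    """Count characters matched by recursive longest-common-substring anchoring."""
    if not a or not b:
        return 0
    # Find the longest common substring via dynamic programming over
    # lengths of common suffixes; keep the first maximum encountered.
    best_len, best_i, best_j = 0, 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_i, best_j = curr[j], i, j
        prev = curr
    if best_len == 0:
        return 0
    # best_i and best_j are the exclusive end positions of the match.
    a_start, b_start = best_i - best_len, best_j - best_len
    # Recurse on the unmatched parts to the left and right of the anchor.
    return (
        best_len
        + _matching_chars(a[:a_start], b[:b_start])
        + _matching_chars(a[best_i:], b[best_j:])
    )


def ratcliff_obershelp(a: str, b: str) -> float:
    """Similarity in [0, 1]: 2 * matched characters / total characters."""
    if not a and not b:
        return 1.0
    return 2 * _matching_chars(a, b) / (len(a) + len(b))


def orthographic_agreement(transcriptions: list[str]) -> float:
    """Hypothetical agreement score: mean pairwise similarity of transcriptions."""
    if len(transcriptions) < 2:
        raise ValueError("need at least two transcriptions to compare")
    # Assumed normalisation: ignore case and surrounding whitespace.
    normalised = [t.strip().lower() for t in transcriptions]
    return mean(ratcliff_obershelp(a, b) for a, b in combinations(normalised, 2))
```

Under the hypothesis stated above, the score for transcriptions of 8th-generation imitations should exceed the score for 1st-generation imitations of the same sound. Python's standard `difflib.SequenceMatcher(...).ratio()` computes a closely related score. However, its documentation describes its algorithm as a somewhat extended variant of Ratcliff/Obershelp, so its results need not match this sketch exactly.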


