Biomechanics as the primordial basis for the emergence of co-speech gesture

Wim Pouw & Susanne Fuchs

In this talk, we explicate a theoretical perspective that foregrounds the gesture-prosody link, rather than gesture's referential aspects, as a possibly more primordial basis for the phylogenetic emergence of co-speech gesture.

Humans have the universal habit of using their upper limbs for communicative expression while speaking. Leading theories hold that co-speech gestures occur because they visually enrich speech (Goldin-Meadow & Brentari, 2017). At odds with this visual-only view is the fact that speakers do not necessarily recruit gesture for visual enrichment of speech: humans, for example, gesture on the phone when their interlocutor cannot see them (Bavelas et al., 2008).

Perhaps, then, gestures should not be seen exclusively as signals transmitted via the visual channel. Indeed, another important aspect of co-speech gestures is that, no matter what they depict, they closely coordinate with speech prosody (Esteve-Gibert & Prieto, 2013; Rochet-Capellan et al., 2008). Specifically, gesture's salient expressions (e.g., sudden increases in acceleration or deceleration) tend to align with moments of prosodic emphasis in speech (Wagner et al., 2014). This gesture-prosody relation is so strong that deep neural networks trained on an individual's associated gesture and speech have succeeded in producing very natural-looking synthetic gestures when fed novel speech acoustics from that same individual (Ginosar et al., 2019). That gestures can be recreated from acoustics alone shows that there is a very tight relation between prosodic-acoustic information and gestural movement information. Such research dovetails with findings that speakers in conversation who cannot see but only hear each other tend to synchronize their postural sway, i.e., the slight and nearly imperceptible movement needed to keep a person upright (Shockley, Baker, Richardson, & Fowler, 2007; Shockley, Santana, & Fowler, 2003). This social resonance effect suggests that speech acoustics can be a medium through which bodies synchronize to a shared social rhythm.
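
To make this alignment concrete, below is a minimal Python sketch, under illustrative assumptions (synthetic toy signals, a shared 100 Hz sampling rate, hypothetical function names), of one common way to quantify gesture-prosody coupling: cross-correlating the smoothed speech amplitude envelope with the speed of a tracked wrist. It is not the pipeline of any study cited here.

```python
# Minimal sketch of a gesture-prosody coupling analysis. All names and
# parameters are illustrative assumptions, not a published pipeline.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

FS = 100  # assumed shared sampling rate (Hz) for envelope and motion

def smoothed_envelope(audio, fs, cutoff_hz=12.0):
    """Amplitude envelope via the Hilbert transform, low-pass filtered."""
    env = np.abs(hilbert(audio))
    b, a = butter(2, cutoff_hz / (fs / 2), btype="low")
    return filtfilt(b, a, env)

def wrist_speed(xyz, fs):
    """Speed (magnitude of velocity) from an (n_samples, 3) position trace."""
    return np.linalg.norm(np.gradient(xyz, 1 / fs, axis=0), axis=1)

def peak_coupling_lag(env, speed, fs, max_lag_s=0.5):
    """Lag (s) at which envelope and wrist speed correlate most strongly.

    Positive lag means the acoustic envelope trails the wrist movement.
    Both inputs must share length and sampling rate fs.
    """
    env = (env - env.mean()) / env.std()
    speed = (speed - speed.mean()) / speed.std()
    max_lag = int(max_lag_s * fs)
    lags = np.arange(-max_lag, max_lag + 1)
    rs = [np.corrcoef(env[max(0, l):len(env) + min(0, l)],
                      speed[max(0, -l):len(speed) + min(0, -l)])[0, 1]
          for l in lags]
    best = int(np.argmax(rs))
    return lags[best] / fs, rs[best]

# Toy demo: an acoustic pulse trailing a gesture "beat" by 40 ms, so the
# function should report a lag of about +0.04 s. With real recordings one
# would instead use smoothed_envelope(audio, FS) and wrist_speed(xyz, FS).
t = np.arange(0, 10, 1 / FS)
speed = np.exp(-((t - 5.00) ** 2) / 0.02)  # wrist speed peak at 5.00 s
env = np.exp(-((t - 5.04) ** 2) / 0.02)    # envelope peak at 5.04 s
lag, r = peak_coupling_lag(env, speed, FS)
print(f"envelope trails wrist by {lag:.2f} s (r = {r:.2f})")
```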

Hand gesture and speech prosody may emerge straightforwardly from the physics of the body. Upper limb movements affect vocal expression through biomechanical effects on the lower vocal tract (Pouw et al., 2019, 2020). Vocalizations are acoustically imprinted by peripheral upper limb movements because such movements recruit a wider set of muscles around the trunk, including respiration-related muscles crucial for vocal production. This suggests that some aspects of the gesture-prosody link have a certain simplicity: gesture-speech synchronization can at times emerge straightforwardly from biomechanics.
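
One way to look for such an imprint, sketched below under assumed parameters (and not as the exact analysis of the cited studies), is to average the speech amplitude envelope in windows centered on peaks in arm deceleration, when impulses transferred to the trunk should be largest; a consistent movement-locked bump in that average would be in line with the biomechanical account.

```python
# Illustrative sketch: probe a biomechanical imprint of arm movement on the
# voice by epoch-averaging the amplitude envelope around deceleration peaks.
# Window sizes and thresholds are assumptions for demonstration.
import numpy as np
from scipy.signal import find_peaks

def deceleration_peaks(speed, fs, min_height=None):
    """Indices of local maxima in deceleration (negative acceleration)."""
    decel = -np.gradient(speed, 1 / fs)
    peaks, _ = find_peaks(decel, height=min_height,
                          distance=int(0.2 * fs))  # peaks >= 200 ms apart
    return peaks

def movement_locked_envelope(env, peaks, fs, half_window_s=0.3):
    """Mean envelope in +/- half_window_s around each deceleration peak.

    env and the speed trace used to find peaks must share sampling rate fs.
    """
    w = int(half_window_s * fs)
    epochs = [env[p - w:p + w] for p in peaks
              if p - w >= 0 and p + w <= len(env)]
    return np.mean(np.stack(epochs), axis=0)  # time-locked grand average
```

A flat movement-locked average would speak against an imprint; a reliable excursion time-locked to peak deceleration, compared against averages around randomly chosen time points, would support it.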

Any successful evolutionary explanation of the origins of language must theorize from a set of more basic competences from which new feats can emerge (Sheets-Johnstone, 2011). Reverse engineering how humans' language-like representational capabilities came about is, in this respect, a daunting project, one that has received much discussion (Kendon, 2017) but little theoretical convergence. If we focus, however, on the possibly more basic link between gesture and prosody, there are clear connections between how gesture couples to prosodic aspects of speech and how animals' bodies affect voice qualities that are informative for social interaction (Pisanski et al., 2016).

References

Bavelas, J., Gerwing, J., Sutton, C., & Prevost, D. (2008). Gesturing on the telephone: Independent effects of dialogue and visibility. Journal of Memory and Language, 58(2), 495–520.
https://doi.org/10.1016/j.jml.2007.02.004

Esteve-Gibert, N., & Prieto, P. (2013). Prosodic structure shapes the temporal realization of intonation and manual gesture movements. Journal of Speech, Language, and Hearing Research, 56(3), 850–864.
https://doi.org/10.1044/1092-4388(2012/12-0049)

Ginosar, S., Bar, A., Kohavi, G., Chan, C., Owens, A., & Malik, J. (2019). Learning individual styles of conversational gesture. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3497–3506.
https://arxiv.org/abs/1906.04160

Goldin-Meadow, S., & Brentari, D. (2017). Gesture, sign, and language: The coming of age of sign language and gesture studies. The Behavioral and Brain Sciences, 40, e46.
https://doi.org/10.1017/S0140525X15001247

Kendon, A. (2017). Reflections on the “gesture-first” hypothesis of language origins. Psychonomic Bulletin & Review, 24(1), 163–170.
https://doi.org/10.3758/s13423-016-1117-3

Pisanski, K., Cartei, V., McGettigan, C., Raine, J., & Reby, D. (2016). Voice modulation: A window into the origins of human vocal control? Trends in Cognitive Sciences, 20(4), 304–318.
https://doi.org/10.1016/j.tics.2016.01.002

Pouw, W., Harrison, S. J., & Dixon, J. A. (2019). Gesture-speech physics: The biomechanical basis of the emergence of gesture-speech synchrony. Journal of Experimental Psychology: General, 149(2), 391–404.
https://doi.org/10.1037/xge0000646

Pouw, W., Paxton, A., Harrison, S. J., & Dixon, J. A. (2020). Acoustic information about upper limb movement in voicing. Proceedings of the National Academy of Sciences, 117(12), 11364–11367.
https://doi.org/10.1073/pnas.2004163117

Rochet-Capellan, A., Laboissière, R., Galván, A., & Schwartz, J. (2008). The speech focus position effect on jaw–finger coordination in a pointing task. Journal of Speech, Language, and Hearing Research, 51(6), 1507–1521.
https://doi.org/10.1044/1092-4388(2008/07-0173)

Sheets-Johnstone, M. (2011). The primacy of movement. John Benjamins.
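
Shockley, K., Baker, A. A., Richardson, M. J., & Fowler, C. A. (2007). Articulatory constraints on interpersonal postural coordination. Journal of Experimental Psychology: Human Perception and Performance, 33(1), 201–208.
https://doi.org/10.1037/0096-1523.33.1.201

Shockley, K., Santana, M. V., & Fowler, C. A. (2003). Mutual interpersonal postural constraints are involved in cooperative conversation. Journal of Experimental Psychology: Human Perception and Performance, 29(2), 326–332.
https://doi.org/10.1037/0096-1523.29.2.326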

Wagner, P., Malisz, Z., & Kopp, S. (2014). Gesture and speech in interaction: An overview. Speech Communication, 57, 209–232.
https://doi.org/10.1016/j.specom.2013.09.008