Tim Sainburg, Anna Mai and Timothy Gentner
To convey meaning, human language relies on hierarchically organized, long-range relationships spanning words, phrases, sentences, and discourse. As the distances between elements (e.g., phonemes, characters, words) in human language sequences increase, the strength of the long range relationships between those elements decays following a power law. This power-law relationship has been attributed variously to long-range sequential organization present in human language syntax, semantics, and discourse structure. However, non-linguistic behaviors in numerous phylogenetically distant species, ranging from humpback whale song to fruit fly motility, also demonstrate similar long-range statistical dependencies. Therefore, we hypothesized that long-range statistical dependencies in human speech may occur independently of linguistic structure. To test this hypothesis, we measured long-range dependencies in several speech corpora from children (aged 6 months — 12 years). We find that adult-like power-law statistical dependencies are present in human vocalizations at the earliest detectable ages, prior to the production of complex linguistic structure. These linguistic structures cannot, therefore, be the sole cause of long-range statistical dependencies in language.