Data for: The learnability consequences of Zipfian distributions: Word Segmentation is Facilitated in More Predictable Distributions

Lavi-Rotbain, Ori; Arnon, Inbal

Research Data

Author(s) / Creator(s)

Lavi-Rotbain, Ori

Arnon, Inbal

Abstract / Description

Data from Hebrew-speaking children and adults in an auditory statistical learning experiment examining the effect of distribution predictability on segmentation.

While the languages of the world differ in many respects, they share certain commonalities, which can provide insight into our shared cognition. Here, we explore the learnability consequences of one of the striking commonalities between languages. Across languages, word frequencies follow a Zipfian distribution, showing a power law relation between a word's frequency and its rank. While the source of these distributions in language has been studied extensively, less work has explored their learnability consequences for language learners. We propose that the greater predictability of words in this distribution (relative to less skewed distributions) can facilitate word segmentation, a crucial aspect of early language acquisition. To explore this, we quantify word predictability using unigram entropy, assess it across languages using naturalistic corpora of child-directed speech, and then ask whether similar unigram predictability facilitates word segmentation in the lab. We find similar unigram entropy in child-directed speech across 15 languages. We then use an auditory word segmentation task to show that the unigram predictability levels found in natural language are uniquely facilitative for word segmentation for both children and adults. These findings illustrate the facilitative impact of skewed input distributions on learning and raise questions about the possible role of cognitive pressures in the prevalence of Zipfian distributions in language.
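
The abstract quantifies word predictability as unigram entropy estimated from the word frequencies in a corpus. The following is a minimal Python sketch of that quantity. It assumes plain Shannon entropy in bits over word-type counts, and the paper's exact estimator may differ. It applies the measure to two hypothetical toy token streams over the same four word types: one uniform, and one with counts proportional to 1/rank. The token lists and the function name `unigram_entropy` are illustrative only and are not taken from the dataset.

```python
import math
from collections import Counter


def unigram_entropy(tokens):
    """Shannon entropy (bits) of the word-frequency distribution of a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy streams (not the study's stimuli), four word types each.
uniform_stream = ["word_a", "word_b", "word_c", "word_d"] * 6
# Zipf-like counts proportional to 1/rank: 12, 6, 4, 3.
zipfian_stream = ["word_a"] * 12 + ["word_b"] * 6 + ["word_c"] * 4 + ["word_d"] * 3

if __name__ == "__main__":
    print(f"uniform: {unigram_entropy(uniform_stream):.3f} bits")  # log2(4) = 2 bits
    print(f"zipfian: {unigram_entropy(zipfian_stream):.3f} bits")  # lower: more predictable
```

Lower entropy means the next word is easier to guess. This is the sense in which the skewed stream is "more predictable" than the uniform one.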

Dataset for: Lavi-Rotbain, O. & Arnon, I. (2022). The learnability consequences of Zipfian distributions in language. Cognition, 223. https://doi.org/10.1016/j.cognition.2022.105038

Keyword(s)

Language acquisition Distributional learning Information theory Zipf's law Word segmentation

Persistent Identifier

https://doi.org/10.23668/psycharchives.3009

Date of first publication

2020-05-29

Publisher

PsychArchives

Is referenced by

https://doi.org/10.1016/j.cognition.2022.105038

Citation

Lavi-Rotbain, O., & Arnon, I. (2020). Data for: The learnability consequences of Zipfian distributions: Word Segmentation is Facilitated in More Predictable Distributions [Data set]. PsychArchives. https://doi.org/10.23668/PSYCHARCHIVES.3009

children_data.csv

CSV - 3.72MB

MD5 : 073f80bd93180bbd6d3ab9ae0d99c988

Sharing Level 0 (Public Use) CC BY-SA 4.0

Description: Children's data from an auditory statistical learning experiment examining the effect of distribution predictability on segmentation.

adults_data.csv

CSV - 7.61MB

MD5 : c69c5156581e3bf9b766ff54af9c4820

Sharing Level 0 (Public Use) CC BY-SA 4.0

Description: Adults' data from an auditory statistical learning experiment examining the effect of distribution predictability on segmentation.
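
The following short Python sketch uses the standard-library `hashlib` module to check a download against the MD5 checksums listed above. It assumes both CSV files were saved under their listed names in a single folder. The helper names `md5_of_file` and `verify_downloads` are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksums as listed on this record.
EXPECTED_MD5 = {
    "children_data.csv": "073f80bd93180bbd6d3ab9ae0d99c988",
    "adults_data.csv": "c69c5156581e3bf9b766ff54af9c4820",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in chunks so large files need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(folder="."):
    """Map each expected file to True (match), False (mismatch) or None (not found)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(folder) / name
        results[name] = md5_of_file(path) == expected if path.exists() else None
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        status = "missing" if ok is None else ("OK" if ok else "MISMATCH")
        print(f"{name}: {status}")
```

A mismatch usually means an incomplete or altered download, so the file should be fetched again before analysis.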

Is related to

Preprint
The learnability consequences of Zipfian distributions: Word Segmentation is Facilitated in More Predictable Distributions

Lavi-Rotbain, Ori & Arnon, Inbal, 2020-06, PsychArchives

One of the striking commonalities between languages is the way word frequencies are distributed. Across languages, word frequencies follow a Zipfian distribution, showing a power law relation between a word's frequency and its rank (Zipf, 1949). Intuitively, this means that languages have relatively few high-frequency words and many low-frequency ones. While such distributions have been studied extensively, little work has explored the learnability consequences of the greater predictability of words in them. Here, we propose that such distributions confer a learnability advantage for word segmentation, a foundational aspect of language acquisition. We capture the greater predictability of words using the information-theoretic notion of efficiency, which tells us how predictable a distribution is relative to a uniform one. We first use corpus analyses to show that child-directed speech is similarly predictable across fifteen different languages. We then experimentally investigate the impact of distribution predictability on children and adults. We show that word segmentation is uniquely facilitated at the predictability levels found in language, compared both with uniform distributions and with skewed distributions that are less predictable than those of natural language. We further show that distribution predictability impacts learning more than distribution shape does, and that learning does not improve further in distributions more predictable than natural language. These novel findings illustrate learners' sensitivity to the overall predictability of the linguistic environment, suggest that the predictability levels found in language provide an optimal environment for learning, and point to the possible role of cognitive pressures in the emergence and prevalence of such distributions in language.
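
The preprint abstract describes efficiency as how predictable a distribution is relative to a uniform one. The following is a minimal sketch that assumes the common normalized-entropy definition: entropy divided by log2 of the number of word types. This may not match the paper's exact computation. `zipf_probs` (probabilities proportional to rank^-exponent), the vocabulary size, and the exponent values are hypothetical illustrations and do not correspond to the study's conditions.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability types contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def efficiency(probs):
    """Entropy relative to the maximum (uniform) entropy for the same number of types."""
    n = len(probs)
    if n < 2:
        raise ValueError("efficiency needs at least two word types")
    return entropy_bits(probs) / math.log2(n)


def zipf_probs(n_types, exponent=1.0):
    """Hypothetical helper: probabilities proportional to rank**-exponent, ranks 1..n_types."""
    weights = [rank ** -exponent for rank in range(1, n_types + 1)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    n = 10  # hypothetical number of word types
    for label, probs in [
        ("uniform", [1 / n] * n),
        ("mildly skewed (exponent 0.5)", zipf_probs(n, 0.5)),
        ("Zipfian (exponent 1.0)", zipf_probs(n, 1.0)),
    ]:
        print(f"{label}: efficiency = {efficiency(probs):.3f}")
```

Under this definition, efficiency is 1 for a uniform distribution and falls as the distribution becomes more skewed, so lower values mean more predictable input. Dividing by log2 N makes the values comparable across vocabularies of different sizes.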

There are no other versions of this object.

PsychArchives acquisition timestamp

2020-05-29T07:22:03Z
Made available on

2020-05-29T07:22:03Z
Review status

unknown
Persistent Identifier

https://hdl.handle.net/20.500.12034/2628
Language of content

eng
Is related to

https://doi.org/10.23668/psycharchives.3075
Is related to

https://doi.org/10.1016/j.cognition.2022.105038
Dewey Decimal Classification number(s)

150
DRO type

researchData