Data for: The learnability consequences of Zipfian distributions: Word Segmentation is Facilitated in More Predictable Distributions
Author(s) / Creator(s)
Lavi-Rotbain, Ori
Arnon, Inbal
Abstract / Description
Data from Hebrew-speaking children and adults in an auditory statistical learning experiment examining the effect of distribution predictability on word segmentation.
While the languages of the world differ in many respects, they share certain commonalities, which can provide insight into our shared cognition. Here, we explore the learnability consequences of one of the striking commonalities between languages. Across languages, word frequencies follow a Zipfian distribution, showing a power-law relation between a word's frequency and its rank. While the source of such distributions in language has been studied extensively, less work has explored their learnability consequences for language learners. We propose that the greater predictability of words in this distribution (relative to less skewed distributions) can facilitate word segmentation, a crucial aspect of early language acquisition. To explore this, we quantify word predictability using unigram entropy, assess it across languages using naturalistic corpora of child-directed speech, and then ask whether similar unigram predictability facilitates word segmentation in the lab. We find similar unigram entropy in child-directed speech across 15 languages. We then use an auditory word segmentation task to show that the unigram predictability levels found in natural language are uniquely facilitative for word segmentation for both children and adults. These findings illustrate the facilitative impact of skewed input distributions on learning and raise questions about the possible role of cognitive pressures in the prevalence of Zipfian distributions in language.
Dataset for: Lavi-Rotbain, O. & Arnon, I. (2022). The learnability consequences of Zipfian distributions in language. Cognition, 223. https://doi.org/10.1016/j.cognition.2022.105038
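As an illustration of the unigram entropy measure mentioned in the abstract, the following minimal Python sketch computes Shannon entropy over word frequencies for two toy streams. The four syllable-like words and their counts are hypothetical and are not the experiment's stimuli; the sketch only shows that a skewed (Zipf-like) distribution has lower entropy, i.e. more predictable words, than a uniform one over the same vocabulary.

```python
import math
from collections import Counter


def unigram_entropy(tokens):
    """Shannon entropy (in bits) of the word distribution in a token sequence."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy streams over four made-up words (not the study's stimuli).
# Uniform: every word occurs equally often.
uniform_stream = ["ba", "du", "ki", "mo"] * 12
# Zipf-like: counts proportional to 1/rank (24, 12, 8, 6).
zipf_like_stream = ["ba"] * 24 + ["du"] * 12 + ["ki"] * 8 + ["mo"] * 6

h_uniform = unigram_entropy(uniform_stream)
h_skewed = unigram_entropy(zipf_like_stream)

print(f"uniform:   {h_uniform:.3f} bits")
print(f"zipf-like: {h_skewed:.3f} bits")
```

Word order does not matter for this measure: unigram entropy depends only on how often each word occurs, not on where it occurs in the stream.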
Keyword(s)
Language acquisition; Distributional learning; Information theory; Zipf's law; Word segmentation
Date of first publication
2020-05-29
Publisher
PsychArchives
Citation
Lavi-Rotbain, O., & Arnon, I. (2020). Data for: The learnability consequences of Zipfian distributions: Word Segmentation is Facilitated in More Predictable Distributions [Data set]. PsychArchives. https://doi.org/10.23668/PSYCHARCHIVES.3009
Files
children_data.csv (CSV, 3.72 MB)
MD5: 073f80bd93180bbd6d3ab9ae0d99c988
Description: Children's data from an auditory statistical learning experiment examining the effect of distribution predictability on segmentation.
adults_data.csv (CSV, 7.61 MB)
MD5: c69c5156581e3bf9b766ff54af9c4820
Description: Adults' data from an auditory statistical learning experiment examining the effect of distribution predictability on segmentation.
There are no other versions of this object.
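The MD5 values listed for the two files can be used to check a download for corruption. A minimal sketch, assuming the files were saved under their listed names in a local directory (the helper names md5_of_file and check_downloads and the default directory are illustrative, not part of the record):

```python
import hashlib
from pathlib import Path

# Checksums as listed in this record.
EXPECTED_MD5 = {
    "children_data.csv": "073f80bd93180bbd6d3ab9ae0d99c988",
    "adults_data.csv": "c69c5156581e3bf9b766ff54af9c4820",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory="."):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    # Assumes the CSV files sit in the current working directory.
    for name, ok in check_downloads().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

A mismatch usually indicates an incomplete or altered download; MD5 is suitable for this kind of integrity check but not for security purposes.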
PsychArchives acquisition timestamp
2020-05-29T07:22:03Z
Made available on
2020-05-29T07:22:03Z
Review status
unknown
Persistent Identifier
https://hdl.handle.net/20.500.12034/2628
https://doi.org/10.23668/psycharchives.3009
Language of content
eng
Is referenced by
https://doi.org/10.1016/j.cognition.2022.105038
Is related to
https://doi.org/10.23668/psycharchives.3075
https://doi.org/10.1016/j.cognition.2022.105038
Dewey Decimal Classification number(s)
150
DRO type
researchData