Research Data

Data for: The learnability consequences of Zipfian distributions: Word Segmentation is Facilitated in More Predictable Distributions

Author(s) / Creator(s)

Lavi-Rotbain, Ori
Arnon, Inbal

Abstract / Description

Data from Hebrew-speaking children and adults in an auditory statistical learning experiment examining the effect of distribution predictability on word segmentation.
While the languages of the world differ in many respects, they share certain commonalities, which can provide insight into our shared cognition. Here, we explore the learnability consequences of one of the striking commonalities between languages. Across languages, word frequencies follow a Zipfian distribution, showing a power law relation between a word's frequency and its rank. While the source of such distributions in language has been studied extensively, less work has explored their learnability consequences for language learners. We propose that the greater predictability of words in this distribution (relative to less skewed distributions) can facilitate word segmentation, a crucial aspect of early language acquisition. To explore this, we quantify word predictability using unigram entropy, assess it across languages using naturalistic corpora of child-directed speech, and then ask whether similar unigram predictability facilitates word segmentation in the lab. We find similar unigram entropy in child-directed speech across 15 languages. We then use an auditory word segmentation task to show that the unigram predictability levels found in natural language are uniquely facilitative for word segmentation for both children and adults. These findings illustrate the facilitative impact of skewed input distributions on learning and raise questions about the possible role of cognitive pressures in the prevalence of Zipfian distributions in language.
Dataset for: Lavi-Rotbain, O. & Arnon, I. (2022). The learnability consequences of Zipfian distributions in language. Cognition, 223. https://doi.org/10.1016/j.cognition.2022.105038
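As an illustration of the measure named in the abstract, the following minimal sketch compares the unigram entropy of a uniform and a Zipfian word distribution. It assumes that unigram entropy means Shannon entropy (in bits) over word probabilities; the four-word vocabulary, the exponent of 1, and the function names are illustrative assumptions and are not taken from the dataset or the study design.

```python
import math


def zipf_probabilities(n_words, exponent=1.0):
    """Word probabilities proportional to 1 / rank**exponent (illustrative)."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, n_words + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def unigram_entropy(probabilities):
    """Shannon entropy in bits: H = -sum(p * log2(p)), skipping zero entries."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical four-word vocabulary, chosen only to keep the example small.
uniform = [1.0 / 4] * 4
zipfian = zipf_probabilities(4)

print("uniform entropy (bits):", unigram_entropy(uniform))
print("zipfian entropy (bits):", unigram_entropy(zipfian))
```

Under these assumptions the skewed distribution has lower entropy than the uniform one, which is the sense in which its words are more predictable.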

Keyword(s)

Language acquisition; Distributional learning; Information theory; Zipf's law; Word segmentation

Persistent Identifier

https://hdl.handle.net/20.500.12034/2628
https://doi.org/10.23668/psycharchives.3009

Date of first publication

2020-05-29

Publisher

PsychArchives

Is referenced by

https://doi.org/10.1016/j.cognition.2022.105038

Citation

Lavi-Rotbain, O., & Arnon, I. (2020). Data for: The learnability consequences of Zipfian distributions: Word Segmentation is Facilitated in More Predictable Distributions [Data set]. PsychArchives. https://doi.org/10.23668/PSYCHARCHIVES.3009
  • Author(s) / Creator(s)
    Lavi-Rotbain, Ori
    Arnon, Inbal
  • PsychArchives acquisition timestamp
    2020-05-29T07:22:03Z
  • Made available on
    2020-05-29T07:22:03Z
  • Date of first publication
    2020-05-29
  • Review status
    unknown
  • Persistent Identifier
    https://hdl.handle.net/20.500.12034/2628
  • Persistent Identifier
    https://doi.org/10.23668/psycharchives.3009
  • Language of content
    eng
  • Publisher
    PsychArchives
  • Is referenced by
    https://doi.org/10.1016/j.cognition.2022.105038
  • Is related to
    https://doi.org/10.23668/psycharchives.3075
  • Is related to
    https://doi.org/10.1016/j.cognition.2022.105038
  • Keyword(s)
    Language acquisition
    Distributional learning
    Information theory
    Zipf's law
    Word segmentation
  • Dewey Decimal Classification number(s)
    150
  • Title
    Data for: The learnability consequences of Zipfian distributions: Word Segmentation is Facilitated in More Predictable Distributions
  • DRO type
    researchData