Conference Object

Balancing Errors as an Approach Towards Better Use of Larger Samples in Psychological Research

Author(s) / Creator(s)

Thielmann, Isabel
Hilbig, Benjamin E.

Abstract / Description

A well-documented concern about psychological research, stressed frequently during the current replicability crisis, is that studies are often based on samples that are too small, such that tests yield insufficient statistical power. Correspondingly, it has been repeatedly emphasized that a vital step in overcoming the low replicability of psychological studies is to recruit larger samples whenever possible. However, despite the indubitable importance of larger samples (and the value of corresponding policy changes by editors and reviewers), increasing power will, all else being equal, only reduce one type of error (namely, β) while α is held constant at 5%. As a consequence, errors may become severely imbalanced, which is problematic for at least two reasons. First, retaining imbalanced errors implicitly assigns greater importance to one error over the other by affecting the “relative seriousness” of errors. In the extreme, increasing sample sizes and thus statistical power will inadvertently assign greater seriousness to β than to α if the latter is held constant at .05. Second, and more importantly, fixing α at .05 ultimately means that the statistical test cannot achieve consistency: the test will not point to the true state of the world even in the large-sample limit. By implication, the conclusiveness of (non-)significant results will remain limited despite large samples and high power. To demonstrate this, we conducted two simulations comparing the Positive Predictive Value (PPV) and the proportion of correct inferences implied by a fixed α versus balanced errors (i.e., α = β). For PPV, simulations showed that once the sample size is sufficiently large to render β < α (i.e., 1 − β > .95), adjusting α to match β results in a higher PPV than holding α fixed at .05, irrespective of the probability p(H1) that the alternative hypothesis is true. For the proportion of correct inferences, in turn, results imply that balanced errors are to be preferred over a fixed α in two situations: (1) whenever β < α (i.e., as soon as the sample size yields β < .05), which holds practically independently of p(H1), and (2) whenever p(H1) > .50, practically irrespective of the absolute magnitudes of α and β. Fixing α = .05, by contrast, is only superior whenever β > α and p(H1) < .50, that is, whenever statistical power is not entirely satisfactory and the alternative hypothesis is known to be less likely to hold than the null. Overall, we therefore advocate extending the calls for higher statistical power by also calling for balanced errors, based on straightforward compromise power analyses, whenever samples are large. In other words, to fully exploit the advantages of large samples and to render statistical tests consistent, researchers should not blindly replace a general lack of power with increasingly imbalanced errors, but instead strive for smaller error probabilities in general.
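The trade-off described above can be made concrete in a few lines. The sketch below is not the authors' simulation code; it assumes a one-sided one-sample z-test with effect size d as a simple stand-in design. It computes β when α is held fixed at .05, the balanced compromise α = β for the same design, and the resulting PPV via the standard formula PPV = (1 − β)·p(H1) / [(1 − β)·p(H1) + α·(1 − p(H1))]:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def ppv(alpha, beta, p_h1):
    """Positive Predictive Value: P(H1 true | significant result)."""
    power = 1.0 - beta
    return power * p_h1 / (power * p_h1 + alpha * (1.0 - p_h1))

def beta_fixed_alpha(n, d, alpha=0.05):
    """Type II error of a one-sided one-sample z-test (effect size d,
    sample size n) when alpha is held fixed."""
    z_crit = Z.inv_cdf(1.0 - alpha)   # critical value under H0
    ncp = d * n ** 0.5                # noncentrality (mean of z under H1)
    return Z.cdf(z_crit - ncp)

def balanced_error(n, d):
    """Compromise solution alpha = beta for the same test: by symmetry,
    the critical value sits halfway between 0 and the noncentrality."""
    ncp = d * n ** 0.5
    return 1.0 - Z.cdf(ncp / 2.0)

n, d, p_h1 = 200, 0.3, 0.3
beta_f = beta_fixed_alpha(n, d)   # beta under fixed alpha = .05
ab = balanced_error(n, d)         # alpha = beta under balancing
print(f"fixed:    alpha=.050, beta={beta_f:.4f}, PPV={ppv(.05, beta_f, p_h1):.3f}")
print(f"balanced: alpha=beta={ab:.4f}, PPV={ppv(ab, ab, p_h1):.3f}")
```

With n = 200 and d = 0.3, β under fixed α falls well below .05, and the balanced solution (α = β ≈ .017) yields a higher PPV than fixed α, in line with the pattern reported above.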

Persistent Identifier

https://hdl.handle.net/20.500.12034/2028
https://doi.org/10.23668/psycharchives.2396

Date of first publication

2019-03-13

Is part of

Open Science 2019, Trier, Germany

Publisher

ZPID (Leibniz Institute for Psychology Information)

Citation

Thielmann, I., & Hilbig, B. E. (2019, March 13). Balancing Errors as an Approach Towards Better Use of Larger Samples in Psychological Research. ZPID (Leibniz Institute for Psychology Information). https://doi.org/10.23668/psycharchives.2396
  • PsychArchives acquisition timestamp
    2019-04-03T12:29:04Z
  • Made available on
    2019-04-03T12:29:04Z
  • Language of content
    eng
  • Dewey Decimal Classification number(s)
    150
  • DRO type
    conferenceObject
  • Visible tag(s)
    ZPID Conferences and Workshops