Conference Object

How do researchers interpret the results of multiple experiments?

Author(s) / Creator(s)

van den Akker, Olmo

Abstract / Description

Introduction: In both social and experimental psychology, a single study is typically not considered sufficient to test a theory, and multiple-study papers are the norm. In this project, we consider how researchers assess the validity of a theory when they are presented with the results of multiple studies that all test that theory. More specifically, we consider what researchers’ beliefs in the theory are as a function of the number of significant vs. nonsignificant studies, and whether this relationship depends on the type of study (direct or conceptual replication) and the role of the respondent (researcher or reviewer). This information is especially relevant in the context of the current replication crisis in psychology, which has prompted a discussion on what evidence sufficiently corroborates a phenomenon. In addition, we carry out a preregistered secondary analysis in which we look at individual researcher data to find out which heuristics researchers use when assessing the outcomes of multiple studies. We classify each researcher into one of six categories: those who use Bayesian inference (i.e., the normative approach using Bayes' rule, incorporating information about statistical power and the significance level), those who use deterministic vote counting (i.e., those who believe the theory is true if the proportion of significant results is higher than 0.5, believe it is false if that proportion is lower than 0.5, and hold a 50/50 belief if the proportion is exactly 0.5), those who use proportional vote counting (i.e., those who equate their belief in the theory to the proportion of significant results), those who average their prior belief with the proportion of significant results, those with irrational response patterns, and those whose response patterns are inconsistent with any of the previous categories.

Method: Sample: Our sample consisted of 505 participants from social and experimental psychology who commonly conduct (as researchers) or judge (as reviewers or editors) experimental research consisting of multiple studies. Procedure: Our vignette study involved eight different scenarios, each presenting the results of four experiments. All scenarios stated that other researchers had previously published the results of one experiment, A, and had found a statistically significant effect in line with a given theory. The vignette then stated that the participant had conducted (in the ‘author’ version of the vignette) or was asked to review (in the ‘reviewer’ version) four experiments that replicated the findings of the original study. The first new experiment, A’, was a direct replication of the earlier experiment, whereas the other three experiments (B, C, and D) were conceptual replications. All participants were told to imagine that their prior belief in the theory was 50% and that the number of participants, the costs of all experiments, the nominal significance level, and the statistical power in all five experiments (including the original experiment A) were typical for experimental studies in psychology. After each scenario, we presented participants with several questions, starting with a set of three general questions regarding the theory and following up with a set of questions concerning the participants’ behavior as either author or reviewer.
First, participants indicated their belief in the theory on the basis of the presented evidence by means of a slider bar running from a low probability (0%) to a high probability (100%) of the theory being correct. Second, we asked participants to indicate whether they thought the theory was correct based on the outcomes in the scenario (‘yes’ or ‘no’). Third, we asked those in the role of author whether they would submit a paper based on at least one of the experiments to a journal, and those in the role of reviewer whether they would recommend such a paper for publication. Fourth, we asked ‘authors’ whether they would want to conduct an additional conceptual replication, E, given the results of the earlier experiments, and we asked ‘reviewers’ whether they would recommend that the authors conduct an additional conceptual replication, E.

Results of the main analysis: We found that participants’ belief in the theory increased with the number of significant results, and that direct replications were considered more important than conceptual replications for participants’ beliefs in the underlying theory. We found no difference between authors and reviewers in their propensity to submit or recommend sets of results for publication, but we did find that authors were generally more likely to desire an additional experiment.

Results of the secondary analysis: Only 6 of the 505 participants used the normative method of Bayesian inference, and the majority used vote-counting approaches that tend to undervalue the evidence for the underlying theory when two or more results are statistically significant.

Conclusions and Discussion: The main results of our study are that:
- researchers valued direct replications more than conceptual replications when deciding on the validity of a theory, which is perhaps not surprising in light of the current popularity of large-scale direct replication efforts like the Many Labs Replication Project and the Reproducibility Project: Psychology;
- authors and reviewers contribute equally to publication bias, even though previous research mostly pointed to authors not submitting nonsignificant results as the main cause of publication bias;
- researchers make structural errors when assessing scientific papers with multiple outcomes. Most notably, they use simple heuristics to make sense of this complex situation, which often leads them to undervalue the evidence in favor of a theory.
Hopefully, this information can be used to develop methods that educate current and future researchers to avoid these errors.
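
To make the contrast between the belief-updating rules concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than material from the study: the function names are mine, the studies are treated as independent, a significant result is assumed to occur with probability equal to the statistical power if the theory is true and to the significance level if it is false, and the values power = 0.8 and alpha = 0.05 stand in for the ‘typical’ values the vignettes mention without quantifying.

def bayesian_posterior(k, n, prior=0.5, power=0.8, alpha=0.05):
    """Normative belief via Bayes' rule: P(theory | k of n studies significant).

    Illustrative assumptions: independent studies; each is significant with
    probability `power` if the theory is true and `alpha` if it is false.
    """
    like_true = power ** k * (1 - power) ** (n - k)
    like_false = alpha ** k * (1 - alpha) ** (n - k)
    return prior * like_true / (prior * like_true + (1 - prior) * like_false)

def deterministic_vote_count(k, n):
    # Belief jumps to 1 or 0 depending on whether most results are significant.
    if k / n > 0.5:
        return 1.0
    if k / n < 0.5:
        return 0.0
    return 0.5

def proportional_vote_count(k, n):
    # Belief equals the proportion of significant results.
    return k / n

def prior_average(k, n, prior=0.5):
    # Belief averages the prior with the proportion of significant results.
    return (prior + k / n) / 2

if __name__ == "__main__":
    n = 4  # the four replication experiments (A', B, C, D) in each vignette
    for k in range(n + 1):
        print(f"{k}/{n} significant: "
              f"Bayes={bayesian_posterior(k, n):.3f}, "
              f"deterministic={deterministic_vote_count(k, n):.2f}, "
              f"proportional={proportional_vote_count(k, n):.2f}, "
              f"averaging={prior_average(k, n):.2f}")

Under these illustrative numbers, two significant results out of four already give a Bayesian posterior above .9, while both vote-counting rules sit at .5, which is the undervaluing pattern described in the secondary analysis.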

Persistent Identifier

https://hdl.handle.net/20.500.12034/2035
https://doi.org/10.23668/psycharchives.2403

Date of first publication

2019-03-14

Is part of

Open Science 2019, Trier, Germany

Publisher

ZPID (Leibniz Institute for Psychology Information)

Citation

Van Den Akker, O. (2019, March 14). How do researchers interpret the results of multiple experiments? ZPID (Leibniz Institute for Psychology Information). https://doi.org/10.23668/psycharchives.2403

PsychArchives acquisition timestamp

2019-04-03T13:25:19Z

Made available on

2019-04-03T13:25:19Z

Language of content

eng

Dewey Decimal Classification number(s)

150

Visible tag(s)

ZPID Conferences and Workshops