Other

Supplementary materials to: Assessing Replicability of Machine Learning Results: An Introduction to Methods on Predictive Accuracy in Social Sciences

Author(s) / Creator(s)

Vijayakumar, Ranjith
Cheung, Mike

Abstract / Description

Machine learning methods have become very popular in diverse fields due to their focus on predictive accuracy, but little work has been conducted on how to assess the replicability of their findings. We introduce and adapt replication methods advocated in psychology to the aims and procedural needs of machine learning research. In Study 1, we illustrate these methods using an empirical data set, assessing the replication success of a predictive accuracy measure, namely, R² on the cross-validated and test sets of the samples. We introduce three replication aims. First, tests of inconsistency examine whether single replications have successfully rejected the original study. Rejection will be supported if the 95% confidence interval (CI) of R² difference estimates between replication and original does not contain zero. Second, tests of consistency help support claims of successful replication. We can decide a priori on a region of equivalence, where population values of the difference estimates are considered equivalent for substantive reasons. The 90% CI of a difference estimate lying fully within this region supports replication. Third, we show how to combine replications to construct meta-analytic intervals for better precision of predictive accuracy measures. In Study 2, R² is reduced from the original in a subset of replication studies to examine the ability of the replication procedures to distinguish true replications from nonreplications. We find that when combining studies sampled from the same population to form meta-analytic intervals, random-effects methods perform best for cross-validated measures while fixed-effects methods work best for test measures. Among machine learning methods, regression was comparable to many complex methods, while support vector machine performed most reliably across a variety of scenarios. Social scientists who use machine learning to model empirical data can use these methods to enhance the reliability of their findings.
Supplementary materials to: Vijayakumar, R., & Cheung, M. W.-L. (2019). Assessing Replicability of Machine Learning Results: An Introduction to Methods on Predictive Accuracy in Social Sciences. Social Science Computer Review, 089443931988844. https://doi.org/10.1177/0894439319888445

Keyword(s)

machine learning; model comparison; predictive accuracy; psychological research; replicability

Persistent Identifier

https://hdl.handle.net/20.500.12034/2220.2
https://doi.org/10.23668/psycharchives.2637

Date of first publication

2019-11-07

Publisher

PsychArchives

Is referenced by

https://doi.org/10.1177/0894439319888445

Citation

Vijayakumar, R., & Cheung, M. (2019, September 20). Supplementary materials to: Assessing Replicability of Machine Learning Results: An Introduction to Methods on Predictive Accuracy in Social Sciences. PsychArchives. https://doi.org/10.23668/psycharchives.2597
  • Version 2 (2019-11-07): corrected R code in Appendix
  • Version 1 (2019-09-20)
  • PsychArchives acquisition timestamp
    2019-11-07T11:59:10Z
  • Made available on
    2019-09-20T12:51:32Z
  • Made available on
    2019-11-07T11:59:10Z
  • Publication status
    acceptedVersion
  • Review status
    notReviewed
  • Persistent Identifier
    https://hdl.handle.net/20.500.12034/2220.2
  • Persistent Identifier
    https://doi.org/10.23668/psycharchives.2637
  • Language of content
    eng
  • Is referenced by
    https://doi.org/10.1177/0894439319888445
  • Is related to
    https://doi.org/10.1177/0894439319888445
  • Dewey Decimal Classification number(s)
    150
  • DRO type
    other