Supplementary materials to: Assessing Replicability of Machine Learning Results: An Introduction to Methods on Predictive Accuracy in Social Sciences
Author(s) / Creator(s)
Vijayakumar, Ranjith
Cheung, Mike
Abstract / Description
Machine learning methods have become very popular in diverse fields due to their focus on predictive accuracy, but little work has been conducted on how to assess the replicability of their findings. We introduce and adapt replication methods advocated in psychology to the aims and procedural needs of machine learning research. In Study 1, we illustrate these methods using an empirical data set, assessing the replication success of a predictive accuracy measure, namely R², on the cross-validated and test sets of the samples. We introduce three replication aims. First, tests of inconsistency examine whether single replications have successfully rejected the original study; rejection is supported if the 95% confidence interval (CI) of the R² difference estimate between replication and original does not contain zero. Second, tests of consistency help support claims of successful replication. We can decide a priori on a region of equivalence, where population values of the difference estimates are considered equivalent for substantive reasons; a 90% CI of a difference estimate lying fully within this region supports replication. Third, we show how to combine replications to construct meta-analytic intervals for better precision of predictive accuracy measures. In Study 2, R² is reduced from the original in a subset of replication studies to examine the ability of the replication procedures to distinguish true replications from nonreplications. We find that when combining studies sampled from the same population to form meta-analytic intervals, random-effects methods perform best for cross-validated measures, while fixed-effects methods work best for test measures. Among machine learning methods, regression was comparable to many complex methods, while the support vector machine performed most reliably across a variety of scenarios. Social scientists who use machine learning to model empirical data can use these methods to enhance the reliability of their findings.
Supplementary materials to: Vijayakumar, R., & Cheung, M. W.-L. (2019). Assessing Replicability of Machine Learning Results: An Introduction to Methods on Predictive Accuracy in Social Sciences. Social Science Computer Review, 089443931988844. https://doi.org/10.1177/0894439319888445
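The two interval-based replication tests described in the abstract can be sketched as follows. This is an illustrative percentile-bootstrap implementation in Python, not the authors' R code (the corrected R code is in the APPENDIX.pdf supplement); the ±0.05 equivalence margin, the function names, and the bootstrap settings are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)

def r2_diff_ci(orig_scores, rep_scores, level=0.95, n_boot=2000):
    """Percentile-bootstrap CI for the difference in mean R^2
    (replication minus original), given per-resample R^2 values,
    e.g. from repeated cross-validation."""
    orig = np.asarray(orig_scores, dtype=float)
    rep = np.asarray(rep_scores, dtype=float)
    diffs = [
        rng.choice(rep, rep.size).mean() - rng.choice(orig, orig.size).mean()
        for _ in range(n_boot)
    ]
    alpha = 1.0 - level
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return lo, hi

def inconsistent(orig_scores, rep_scores):
    """Inconsistency test: the 95% CI of the R^2 difference
    excludes zero, supporting rejection of the original study."""
    lo, hi = r2_diff_ci(orig_scores, rep_scores, level=0.95)
    return not (lo <= 0.0 <= hi)

def consistent(orig_scores, rep_scores, margin=0.05):
    """Consistency (equivalence) test: the 90% CI of the R^2
    difference lies fully inside a region of equivalence
    (-margin, margin) chosen a priori."""
    lo, hi = r2_diff_ci(orig_scores, rep_scores, level=0.90)
    return -margin < lo and hi < margin
```

A replication whose R² scores track the original's would pass `consistent`, while one with a clearly lower R² would trigger `inconsistent`; the equivalence margin must be justified substantively before looking at the data.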
Keyword(s)
machine learning; model comparison; predictive accuracy; psychological research; replicability
Persistent Identifier
https://doi.org/10.23668/psycharchives.2637
Date of first publication
2019-11-07
Publisher
PsychArchives
Is referenced by
https://doi.org/10.1177/0894439319888445
Citation
Vijayakumar, R., & Cheung, M. (2019, September 20). Supplementary materials to: Assessing Replicability of Machine Learning Results: An Introduction to Methods on Predictive Accuracy in Social Sciences. PsychArchives. https://doi.org/10.23668/psycharchives.2597
APPENDIX.pdf (Adobe PDF, 470.41 KB; MD5: 017d1a98df8edaafde90681aa9cc73c1)
Version 2 (2019-11-07): corrected R code in Appendix
PsychArchives acquisition timestamp
2019-11-07T11:59:10Z
Made available on
2019-09-20T12:51:32Z
2019-11-07T11:59:10Z
Publication status
acceptedVersion
Review status
notReviewed
Persistent Identifier
https://hdl.handle.net/20.500.12034/2220.2
https://doi.org/10.23668/psycharchives.2637
Language of content
eng
Is referenced by
https://doi.org/10.1177/0894439319888445
Is related to
https://doi.org/10.1177/0894439319888445
Dewey Decimal Classification number(s)
150
DRO type
other