Supplementary materials to: Assessing Replicability of Machine Learning Results: An Introduction to Methods on Predictive Accuracy in Social Sciences
Author(s) / Creator(s)
Vijayakumar, Ranjith
Cheung, Mike
Abstract / Description
Machine learning methods have become very popular in diverse fields due to their focus on predictive accuracy, but little work has been conducted on how to assess the replicability of their findings. We introduce and adapt replication methods advocated in psychology to the aims and procedural needs of machine learning research. In Study 1, we illustrate these methods using an empirical data set, assessing the replication success of a predictive accuracy measure, namely R², on the cross-validated and test sets of the samples. We introduce three replication aims. First, tests of inconsistency examine whether single replications have successfully rejected the original study; rejection is supported if the 95% confidence interval (CI) of the R² difference estimate between replication and original does not contain zero. Second, tests of consistency help support claims of successful replication. We can decide a priori on a region of equivalence, where population values of the difference estimates are considered equivalent for substantive reasons; a 90% CI of a difference estimate lying fully within this region supports replication. Third, we show how to combine replications to construct meta-analytic intervals for better precision of predictive accuracy measures. In Study 2, R² is reduced from the original in a subset of replication studies to examine the ability of the replication procedures to distinguish true replications from nonreplications. We find that when combining studies sampled from the same population to form meta-analytic intervals, random-effects methods perform best for cross-validated measures, while fixed-effects methods work best for test measures. Among machine learning methods, regression was comparable to many complex methods, while the support vector machine performed most reliably across a variety of scenarios. Social scientists who use machine learning to model empirical data can use these methods to enhance the reliability of their findings.
Supplementary materials to: Vijayakumar, R., & Cheung, M. W.-L. (2019). Assessing Replicability of Machine Learning Results: An Introduction to Methods on Predictive Accuracy in Social Sciences. Social Science Computer Review, 089443931988844. https://doi.org/10.1177/0894439319888445
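The two interval-based replication tests described in the abstract can be sketched as follows. This is an illustrative percentile-bootstrap implementation in Python, not the authors' R code (the corrected R code is in the APPENDIX.pdf supplement); the ±0.05 equivalence margin, the function names, and the bootstrap settings are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)

def r2_diff_ci(orig_scores, rep_scores, level=0.95, n_boot=2000):
    """Percentile-bootstrap CI for the difference in mean R^2
    (replication minus original), given per-resample R^2 values,
    e.g. from repeated cross-validation."""
    orig = np.asarray(orig_scores, dtype=float)
    rep = np.asarray(rep_scores, dtype=float)
    diffs = [
        rng.choice(rep, rep.size).mean() - rng.choice(orig, orig.size).mean()
        for _ in range(n_boot)
    ]
    alpha = 1.0 - level
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return lo, hi

def inconsistent(orig_scores, rep_scores):
    """Inconsistency test: the 95% CI of the R^2 difference
    excludes zero, supporting rejection of the original study."""
    lo, hi = r2_diff_ci(orig_scores, rep_scores, level=0.95)
    return not (lo <= 0.0 <= hi)

def consistent(orig_scores, rep_scores, margin=0.05):
    """Consistency (equivalence) test: the 90% CI of the R^2
    difference lies fully inside a region of equivalence
    (-margin, margin) chosen a priori."""
    lo, hi = r2_diff_ci(orig_scores, rep_scores, level=0.90)
    return -margin < lo and hi < margin
```

A replication whose R² scores track the original's would pass `consistent`, while one with a clearly lower R² would trigger `inconsistent`; the equivalence margin must be justified substantively before looking at the data.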
Keyword(s)
machine learning; model comparison; predictive accuracy; psychological research; replicability
Persistent Identifier
https://doi.org/10.23668/psycharchives.2637
Date of first publication
2019-11-07
Publisher
PsychArchives
Is referenced by
https://doi.org/10.1177/0894439319888445
Citation
Vijayakumar, R., & Cheung, M. (2019, September 20). Supplementary materials to: Assessing Replicability of Machine Learning Results: An Introduction to Methods on Predictive Accuracy in Social Sciences. PsychArchives. https://doi.org/10.23668/psycharchives.2597
APPENDIX.pdf (Adobe PDF, 470.41 KB; MD5: 017d1a98df8edaafde90681aa9cc73c1)
Version 2 (2019-11-07): corrected R code in Appendix
PsychArchives acquisition timestamp
2019-11-07T11:59:10Z
Made available on
2019-09-20T12:51:32Z
2019-11-07T11:59:10Z
Publication status
acceptedVersion
Review status
notReviewed
Persistent Identifier
https://hdl.handle.net/20.500.12034/2220.2
https://doi.org/10.23668/psycharchives.2637
Language of content
eng
Is referenced by
https://doi.org/10.1177/0894439319888445
Is related to
https://doi.org/10.1177/0894439319888445
Dewey Decimal Classification number(s)
150
DRO type
other