REGEMA: Guidelines for Conducting and Reporting Reliability Generalization Meta-analyses

Sánchez-Meca, Julio; López-Pina, José A.; Rubio-Aparicio, María; Marín-Martínez, Fulgencio; Núñez-Núñez, Rosa Mª; López-García, Juan J.; López-López, José A.

Conference Object

REGEMA: Guidelines for Conducting and Reporting Reliability Generalization Meta-analyses

Author(s) / Creator(s)

Sánchez-Meca, Julio

López-Pina, José A.

Rubio-Aparicio, María

Marín-Martínez, Fulgencio

Núñez-Núñez, Rosa Mª

López-García, Juan J.

López-López, José A.

Abstract / Description

Background: Reliability is one of the most important properties to assess psychometric quality of psychological measurement instruments. There is a mistaken idea, very extended among researchers, that reliability is an immutable property of a measurement instrument. However, reliability is not a property inherent to the test, but of the scores obtained when the test is applied to a given sample of participants in specific conditions (Gronlund & Linn, 1990). Inducing reliability from previous applications of a test is a phenomenon very extended among researchers that is appropriate only if the previous and the current study have samples of participants similar in composition and variability (Vacha-Haase et al., 2000). As it is very infrequent that studies use similar samples, then reliability induction becomes a malpractice that must be dismissed from research. Fortunately, not all of the primary studies induce reliability from previous studies, but they report reliability coefficients with their own sample. If reliability varies from an application of a test to the next, then meta-analysis becomes a very useful methodology to statistically integrate the reliability estimates. With this purpose, Vacha-Haase (1998) coined the term ‘reliability generalization’ (RG) to refer to this kind of meta-analysis. An RG meta-analysis aims to investigate how measurement error of a test scores varies among different contexts, samples, and target populations. In particular, an RG meta-analysis enables: (a) to estimate the average reliability of a test scores, (b) to assess whether reliability coefficients are heterogeneous and, (c) in case of heterogeneity, to find characteristics of the studies that can explain, at least, part of the variability of the reliability coefficients (Henson & Thompson, 2002; Sánchez-Meca et al., 2013; Vacha-Haase et al., 2002). From its inception in 1998, more than 120 RG meta-analyses have been published in psychology. This kind of meta-analysis presents distinctive characteristics that make it different in some aspects from typical meta-analyses to integrate effect sizes. In an RG meta-analysis the ‘effect size’ are the reliability coefficients reported in the primary studies. This circumstance makes that the typical guidelines proposed in the meta-analytic arena for reporting meta-analyses does not adapt well to RG meta-analyses. Such guidelines as PRISMA (Moher et al., 2009), MARS (APA Publications and Communications Board Working Group on Journal Article Reporting Standards, 2008), AMSTAR-2 (Shea et al., 2017), MOOSE (Stroup et al., 2000), or the recent recommendations of the American Psychological Association (Appelbaum et al., 2018) include items that are not applicable to RG meta-analyses, and do not contain important items to be considered in RG meta-analyses. Objectives: Up our knowledge, there have not been proposed specific guidelines for conducting and reporting RG meta-analyses that take into account their special features. Therefore, the purpose of this investigation was to elaborate a checklist specifically devised to help meta-analysts to conduct and report RG meta-analyses. The name for this checklist is REGEMA (REliability GEneralization Meta-Analysis). Method: A first step consisted in a sound review of the items and criteria included in the most usually applied guidelines for systematic reviews and meta-analyses proposed in the meta-analytic literature: PRISMA, MARS, AMSTAR-2, and MOOSE. Based on this review, a second step consisted in elaborating a set of items or criteria that might be useful for REGEMA checklist. With this purpose, brainstorming meetings were held among the members of the Meta-analysis Unit team (University of Murcia) to obtain a first version of REGEMA checklist. Once elaborated a tentative REGEMA checklist, the third step consisted in sending the list to 30 researchers experts in meta-analysis. The criteria for selecting the researchers were: (a) to have large expertise in the methodology of meta-analysis, and/or (b) to have published several RG meta-analyses in psychology. Once received the comments, suggestions, and criticisms of the experts, the final step consisted in elaborating the definitive REGEMA checklist. Results: The revision of PRISMA, MARS, AMSTAR-2, and MOOSE guidelines confirmed that none of them adapted well to be applied to RG meta-analyses. Once revised the items and criteria included in these guidelines, our research team carried out more than 20 brainstorming meetings to elaborate a first version of REGEMA checklist composed by 30 items. The tentative REGEMA checklist was electronically sent to 30 researchers with expertise in meta-analysis in order to obtain feedback on the adequacy of the checklist. Out of them, 12 experts answered and their interesting and useful comments and suggestions were added to the checklist. Finally, the REGEMA checklist was composed by 29 items structured shown in Table 1: one item for the Title, one for the Abstract, two for the Introduction, 14 for the Method, six for the Results, four for the Discussion, and one for Funding. Table 1. REGEMA checklist. Cluster Item Title/Abstract 1. Title 2. Abstract Introduction 3. Background 4. Objectives Method 5. Selection criteria 6. Search strategies 7. Data extraction 8. Reported reliability 9. Type of reliability induction 10. Data extraction of inducing studies 11. Reliability of data extraction 12. Transformation method 13. Statistical model 14. Weighting method 15. Heterogeneity assessment 16. Moderator analyses 17. Additional analyses 18. Software Results 19. Results of the study selection process 20. Mean reliability and heterogeneity 21. Moderator analyses 22. Sensitivity analyses 23. Comparison of inducing and reporting studies 24. Data set Discussion 25. Summary of results 26. Limitations 27. Implications for practice 28. Implications for future research Funding 29. Funding. Conclusions: In order to bridging a gap in the meta-analytic literature, we have elaborated the REGEMA checklist, a list of guidelines for conducting and reporting RG meta-analyses that is adapted to the special characteristics of this kind of meta-analysis. Based on the experience of Meta-analysis Unit’s research team carrying out meta-analyses for more than 30 years, the REGEMA checklist have good construct validity. Future research must to assess its inter-coder reliability by applying it to RG meta-analyses already published. REGEMA checklist can be useful for meta-analysts interested in conducting RG meta-analysis, as well as for readers of these meta-analyses and even for editors of journals that may use it to assess the reporting quality of RG meta-analyses sent to publish. References: APA Publications and Communications Board Working Group on Journal Article Reporting Standards (2008). Reporting standards for research in psychology: Why do we need them? What might they be? American Psychologist, 63, 839-851. Appelbaum, M., Cooper, H., Kline, R.B., Mayo-Wilson, E., Nezu, A.M., & Rao, S.M. (2018). Journal article reporting standards for quantitative research in psychology: The APA Publications and Communications Board Task Force report. American Psychologist, 73, 3-25. Gronlund, N.E. y Linn, R.L. (1990). Measurement and evaluation in teaching (6ª ed.). Nueva York: Macmillan. Henson, R.K. y Thompson, B. (2002). Characterizing measurement error in scores across studies: Some recommendations for conducting “reliability generalization” studies. Measurement and Evaluation in Counseling and Development, 35, 113-126. Moher, D., Liberati, A., Tetzlaff, J., Altman, D.G., The PRISMA Group (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Journal of Clinical Epidemiology, 62, 1006-1012. Sánchez-Meca, J., López-López, J.A. y López-Pina, J.A. (2013). Some recommended statistical analytic practices when reliability generalization studies are conducted. British Journal of Mathematical and Statistical Psychology, 66, 402-425. Shea, B.J., Reeves, B.C., Wells, G., Thuku, M., Hamel, C., Moran, J., …, & Henry, D.A. (2017). AMSTAR 2: A critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. British Medical Journal, 358:j4008. http://dx.doi.org/10.1136/bmj.j4008. Stroup, D.F., Berlin, J.A., Morton, S.C., Olkin, I., Williamson, G.D., et al. (2000). Journal of the American Medical Association, 283, 2008-2012. Vacha-Haase, T. (1998). Reliability generalization: Exploring variance in measurement error affecting score reliability across studies. Educational and Psychological Measurement, 58, 6-20. Vacha-Haase, T., Henson, R.K. y Caruso, J.C. (2002). Reliability generalization: Moving toward improved understanding and use of score reliability. Educational and Psychological Measurement, 62, 562-569. Vacha-Haase, T., Kogan, L.R. y Thompson, B. (2000). Sample compositions and variabilities in published studies versus those in test manuals. Educational and Psychological Measurement, 60, 509-522.

Persistent Identifier

https://doi.org/10.23668/psycharchives.2476

Date of first publication

2019-05-30

Is part of

Research Synthesis 2019 incl. Pre-Conference Symposium Big Data in Psychology, Dubrovnik, Croatia

Publisher

ZPID (Leibniz Institute for Psychology Information)

Citation

Sánchez-Meca, J., López-Pina, J. A., Rubio-Aparicio, M., Marín-Martínez, F., Núñez-Núñez, R. M., López-García, J. J., & López-López, J. A. (2019, May 30). REGEMA: Guidelines for Conducting and Reporting Reliability Generalization Meta-analyses. ZPID (Leibniz Institute for Psychology Information). https://doi.org/10.23668/psycharchives.2476

1_RS 2019 REGEMA.pdf

Adobe PDF - 3.78MB

MD5: 2aea6ffa02b4cbd56639def56bb61961

Sharing Level 0 (Public Use) CC-BY-SA 4.0

Download

Description: Conference Talk

There are no other versions of this object.

Author(s) / Creator(s)

Sánchez-Meca, Julio
Author(s) / Creator(s)

López-Pina, José A.
Author(s) / Creator(s)

Rubio-Aparicio, María
Author(s) / Creator(s)

Marín-Martínez, Fulgencio
Author(s) / Creator(s)

Núñez-Núñez, Rosa Mª
Author(s) / Creator(s)

López-García, Juan J.
Author(s) / Creator(s)

López-López, José A.
PsychArchives acquisition timestamp

2019-06-11T14:38:28Z
Made available on

2019-06-11T14:38:28Z
Date of first publication

2019-05-30
Abstract / Description

Background: Reliability is one of the most important properties to assess psychometric quality of psychological measurement instruments. There is a mistaken idea, very extended among researchers, that reliability is an immutable property of a measurement instrument. However, reliability is not a property inherent to the test, but of the scores obtained when the test is applied to a given sample of participants in specific conditions (Gronlund & Linn, 1990). Inducing reliability from previous applications of a test is a phenomenon very extended among researchers that is appropriate only if the previous and the current study have samples of participants similar in composition and variability (Vacha-Haase et al., 2000). As it is very infrequent that studies use similar samples, then reliability induction becomes a malpractice that must be dismissed from research. Fortunately, not all of the primary studies induce reliability from previous studies, but they report reliability coefficients with their own sample. If reliability varies from an application of a test to the next, then meta-analysis becomes a very useful methodology to statistically integrate the reliability estimates. With this purpose, Vacha-Haase (1998) coined the term ‘reliability generalization’ (RG) to refer to this kind of meta-analysis. An RG meta-analysis aims to investigate how measurement error of a test scores varies among different contexts, samples, and target populations. In particular, an RG meta-analysis enables: (a) to estimate the average reliability of a test scores, (b) to assess whether reliability coefficients are heterogeneous and, (c) in case of heterogeneity, to find characteristics of the studies that can explain, at least, part of the variability of the reliability coefficients (Henson & Thompson, 2002; Sánchez-Meca et al., 2013; Vacha-Haase et al., 2002). From its inception in 1998, more than 120 RG meta-analyses have been published in psychology. This kind of meta-analysis presents distinctive characteristics that make it different in some aspects from typical meta-analyses to integrate effect sizes. In an RG meta-analysis the ‘effect size’ are the reliability coefficients reported in the primary studies. This circumstance makes that the typical guidelines proposed in the meta-analytic arena for reporting meta-analyses does not adapt well to RG meta-analyses. Such guidelines as PRISMA (Moher et al., 2009), MARS (APA Publications and Communications Board Working Group on Journal Article Reporting Standards, 2008), AMSTAR-2 (Shea et al., 2017), MOOSE (Stroup et al., 2000), or the recent recommendations of the American Psychological Association (Appelbaum et al., 2018) include items that are not applicable to RG meta-analyses, and do not contain important items to be considered in RG meta-analyses. Objectives: Up our knowledge, there have not been proposed specific guidelines for conducting and reporting RG meta-analyses that take into account their special features. Therefore, the purpose of this investigation was to elaborate a checklist specifically devised to help meta-analysts to conduct and report RG meta-analyses. The name for this checklist is REGEMA (REliability GEneralization Meta-Analysis). Method: A first step consisted in a sound review of the items and criteria included in the most usually applied guidelines for systematic reviews and meta-analyses proposed in the meta-analytic literature: PRISMA, MARS, AMSTAR-2, and MOOSE. Based on this review, a second step consisted in elaborating a set of items or criteria that might be useful for REGEMA checklist. With this purpose, brainstorming meetings were held among the members of the Meta-analysis Unit team (University of Murcia) to obtain a first version of REGEMA checklist. Once elaborated a tentative REGEMA checklist, the third step consisted in sending the list to 30 researchers experts in meta-analysis. The criteria for selecting the researchers were: (a) to have large expertise in the methodology of meta-analysis, and/or (b) to have published several RG meta-analyses in psychology. Once received the comments, suggestions, and criticisms of the experts, the final step consisted in elaborating the definitive REGEMA checklist. Results: The revision of PRISMA, MARS, AMSTAR-2, and MOOSE guidelines confirmed that none of them adapted well to be applied to RG meta-analyses. Once revised the items and criteria included in these guidelines, our research team carried out more than 20 brainstorming meetings to elaborate a first version of REGEMA checklist composed by 30 items. The tentative REGEMA checklist was electronically sent to 30 researchers with expertise in meta-analysis in order to obtain feedback on the adequacy of the checklist. Out of them, 12 experts answered and their interesting and useful comments and suggestions were added to the checklist. Finally, the REGEMA checklist was composed by 29 items structured shown in Table 1: one item for the Title, one for the Abstract, two for the Introduction, 14 for the Method, six for the Results, four for the Discussion, and one for Funding. Table 1. REGEMA checklist. Cluster Item Title/Abstract 1. Title 2. Abstract Introduction 3. Background 4. Objectives Method 5. Selection criteria 6. Search strategies 7. Data extraction 8. Reported reliability 9. Type of reliability induction 10. Data extraction of inducing studies 11. Reliability of data extraction 12. Transformation method 13. Statistical model 14. Weighting method 15. Heterogeneity assessment 16. Moderator analyses 17. Additional analyses 18. Software Results 19. Results of the study selection process 20. Mean reliability and heterogeneity 21. Moderator analyses 22. Sensitivity analyses 23. Comparison of inducing and reporting studies 24. Data set Discussion 25. Summary of results 26. Limitations 27. Implications for practice 28. Implications for future research Funding 29. Funding. Conclusions: In order to bridging a gap in the meta-analytic literature, we have elaborated the REGEMA checklist, a list of guidelines for conducting and reporting RG meta-analyses that is adapted to the special characteristics of this kind of meta-analysis. Based on the experience of Meta-analysis Unit’s research team carrying out meta-analyses for more than 30 years, the REGEMA checklist have good construct validity. Future research must to assess its inter-coder reliability by applying it to RG meta-analyses already published. REGEMA checklist can be useful for meta-analysts interested in conducting RG meta-analysis, as well as for readers of these meta-analyses and even for editors of journals that may use it to assess the reporting quality of RG meta-analyses sent to publish. References: APA Publications and Communications Board Working Group on Journal Article Reporting Standards (2008). Reporting standards for research in psychology: Why do we need them? What might they be? American Psychologist, 63, 839-851. Appelbaum, M., Cooper, H., Kline, R.B., Mayo-Wilson, E., Nezu, A.M., & Rao, S.M. (2018). Journal article reporting standards for quantitative research in psychology: The APA Publications and Communications Board Task Force report. American Psychologist, 73, 3-25. Gronlund, N.E. y Linn, R.L. (1990). Measurement and evaluation in teaching (6ª ed.). Nueva York: Macmillan. Henson, R.K. y Thompson, B. (2002). Characterizing measurement error in scores across studies: Some recommendations for conducting “reliability generalization” studies. Measurement and Evaluation in Counseling and Development, 35, 113-126. Moher, D., Liberati, A., Tetzlaff, J., Altman, D.G., The PRISMA Group (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Journal of Clinical Epidemiology, 62, 1006-1012. Sánchez-Meca, J., López-López, J.A. y López-Pina, J.A. (2013). Some recommended statistical analytic practices when reliability generalization studies are conducted. British Journal of Mathematical and Statistical Psychology, 66, 402-425. Shea, B.J., Reeves, B.C., Wells, G., Thuku, M., Hamel, C., Moran, J., …, & Henry, D.A. (2017). AMSTAR 2: A critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. British Medical Journal, 358:j4008. http://dx.doi.org/10.1136/bmj.j4008. Stroup, D.F., Berlin, J.A., Morton, S.C., Olkin, I., Williamson, G.D., et al. (2000). Journal of the American Medical Association, 283, 2008-2012. Vacha-Haase, T. (1998). Reliability generalization: Exploring variance in measurement error affecting score reliability across studies. Educational and Psychological Measurement, 58, 6-20. Vacha-Haase, T., Henson, R.K. y Caruso, J.C. (2002). Reliability generalization: Moving toward improved understanding and use of score reliability. Educational and Psychological Measurement, 62, 562-569. Vacha-Haase, T., Kogan, L.R. y Thompson, B. (2000). Sample compositions and variabilities in published studies versus those in test manuals. Educational and Psychological Measurement, 60, 509-522.

en_US
Citation

Sánchez-Meca, J., López-Pina, J. A., Rubio-Aparicio, M., Marín-Martínez, F., Núñez-Núñez, R. M., López-García, J. J., & López-López, J. A. (2019, May 30). REGEMA: Guidelines for Conducting and Reporting Reliability Generalization Meta-analyses. ZPID (Leibniz Institute for Psychology Information). https://doi.org/10.23668/psycharchives.2476

en
Persistent Identifier

https://hdl.handle.net/20.500.12034/2102
Persistent Identifier

https://doi.org/10.23668/psycharchives.2476
Language of content

eng

en_US
Publisher

ZPID (Leibniz Institute for Psychology Information)

en_US
Is part of

Research Synthesis 2019 incl. Pre-Conference Symposium Big Data in Psychology, Dubrovnik, Croatia

en_US
Dewey Decimal Classification number(s)

150
Title

REGEMA: Guidelines for Conducting and Reporting Reliability Generalization Meta-analyses

en_US
DRO type

conferenceObject

en_US
Visible tag(s)

ZPID Conferences and Workshops