|dc.description.abstract||Background: Reliability is one of the most important properties for assessing the psychometric quality of psychological measurement instruments. There is a widespread misconception among researchers that reliability is an immutable property of a measurement instrument. However, reliability is not a property inherent to the test, but of the scores obtained when the test is applied to a given sample of participants under specific conditions (Gronlund & Linn, 1990). Inducing reliability from previous applications of a test is a widespread practice among researchers, but it is appropriate only if the previous and the current study have samples of participants similar in composition and variability (Vacha-Haase et al., 2000). Because studies very rarely use similar samples, reliability induction becomes a malpractice that should be abandoned in research. Fortunately, not all primary studies induce reliability from previous studies; some report reliability coefficients obtained with their own sample. If reliability varies from one application of a test to the next, then meta-analysis becomes a very useful methodology to statistically integrate the reliability estimates. With this purpose, Vacha-Haase (1998) coined the term ‘reliability generalization’ (RG) to refer to this kind of meta-analysis. An RG meta-analysis aims to investigate how the measurement error of test scores varies across different contexts, samples, and target populations. In particular, an RG meta-analysis makes it possible: (a) to estimate the average reliability of test scores, (b) to assess whether reliability coefficients are heterogeneous, and (c) in case of heterogeneity, to find characteristics of the studies that can explain at least part of the variability of the reliability coefficients (Henson & Thompson, 2002; Sánchez-Meca et al., 2013; Vacha-Haase et al., 2002). Since its inception in 1998, more than 120 RG meta-analyses have been published in psychology.
This kind of meta-analysis presents distinctive characteristics that make it different in some respects from typical meta-analyses that integrate effect sizes. In an RG meta-analysis, the ‘effect sizes’ are the reliability coefficients reported in the primary studies. As a consequence, the typical guidelines proposed in the meta-analytic arena for reporting meta-analyses do not adapt well to RG meta-analyses. Guidelines such as PRISMA (Moher et al., 2009), MARS (APA Publications and Communications Board Working Group on Journal Article Reporting Standards, 2008), AMSTAR-2 (Shea et al., 2017), MOOSE (Stroup et al., 2000), or the recent recommendations of the American Psychological Association (Appelbaum et al., 2018) include items that are not applicable to RG meta-analyses and lack important items that should be considered in RG meta-analyses.
Objectives: To our knowledge, no specific guidelines have been proposed for conducting and reporting RG meta-analyses that take into account their special features. Therefore, the purpose of this investigation was to develop a checklist specifically devised to help meta-analysts conduct and report RG meta-analyses. The name of this checklist is REGEMA (REliability GEneralization Meta-Analysis).
Method: The first step consisted of a thorough review of the items and criteria included in the guidelines most commonly applied to systematic reviews and meta-analyses in the meta-analytic literature: PRISMA, MARS, AMSTAR-2, and MOOSE. Based on this review, the second step consisted of drafting a set of items or criteria that might be useful for the REGEMA checklist. To this end, brainstorming meetings were held among the members of the Meta-analysis Unit team (University of Murcia) to obtain a first version of the REGEMA checklist. Once a tentative REGEMA checklist had been drafted, the third step consisted of sending the list to 30 researchers with expertise in meta-analysis.
The criteria for selecting the researchers were: (a) to have extensive expertise in the methodology of meta-analysis, and/or (b) to have published several RG meta-analyses in psychology. Once the comments, suggestions, and criticisms of the experts had been received, the final step consisted of preparing the definitive REGEMA checklist.
Results: The review of the PRISMA, MARS, AMSTAR-2, and MOOSE guidelines confirmed that none of them adapted well to RG meta-analyses. After reviewing the items and criteria included in these guidelines, our research team held more than 20 brainstorming meetings to develop a first version of the REGEMA checklist, composed of 30 items. The tentative REGEMA checklist was sent electronically to 30 researchers with expertise in meta-analysis in order to obtain feedback on the adequacy of the checklist. Of these, 12 experts responded, and their useful comments and suggestions were incorporated into the checklist. The final REGEMA checklist was composed of 29 items, structured as shown in Table 1: one item for the Title, one for the Abstract, two for the Introduction, 14 for the Method, six for the Results, four for the Discussion, and one for Funding.
Table 1. REGEMA checklist.
5. Selection criteria
6. Search strategies
7. Data extraction
8. Reported reliability
9. Type of reliability induction
10. Data extraction of inducing studies
11. Reliability of data extraction
12. Transformation method
13. Statistical model
14. Weighting method
15. Heterogeneity assessment
16. Moderator analyses
17. Additional analyses
19. Results of the study selection process
20. Mean reliability and heterogeneity
21. Moderator analyses
22. Sensitivity analyses
23. Comparison of inducing and reporting studies
24. Data set
25. Summary of results
27. Implications for practice
28. Implications for future research
29. Funding
Conclusions: In order to bridge a gap in the meta-analytic literature, we have developed the REGEMA checklist, a list of guidelines for conducting and reporting RG meta-analyses that is adapted to the special characteristics of this kind of meta-analysis. Based on the experience of the Meta-analysis Unit’s research team, which has been carrying out meta-analyses for more than 30 years, the REGEMA checklist has good construct validity. Future research must assess its inter-coder reliability by applying it to already published RG meta-analyses. The REGEMA checklist can be useful for meta-analysts interested in conducting RG meta-analyses, as well as for readers of these meta-analyses and even for journal editors, who may use it to assess the reporting quality of RG meta-analyses submitted for publication.
References:
APA Publications and Communications Board Working Group on Journal Article Reporting Standards (2008). Reporting standards for research in psychology: Why do we need them? What might they be? American Psychologist, 63, 839-851.
Appelbaum, M., Cooper, H., Kline, R.B., Mayo-Wilson, E., Nezu, A.M., & Rao, S.M. (2018). Journal article reporting standards for quantitative research in psychology: The APA Publications and Communications Board Task Force report. American Psychologist, 73, 3-25.
Gronlund, N.E., & Linn, R.L. (1990). Measurement and evaluation in teaching (6th ed.). New York: Macmillan.
Henson, R.K., & Thompson, B. (2002). Characterizing measurement error in scores across studies: Some recommendations for conducting “reliability generalization” studies. Measurement and Evaluation in Counseling and Development, 35, 113-126.
Moher, D., Liberati, A., Tetzlaff, J., Altman, D.G., & The PRISMA Group (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Journal of Clinical Epidemiology, 62, 1006-1012.
Sánchez-Meca, J., López-López, J.A., & López-Pina, J.A. (2013). Some recommended statistical analytic practices when reliability generalization studies are conducted. British Journal of Mathematical and Statistical Psychology, 66, 402-425.
Shea, B.J., Reeves, B.C., Wells, G., Thuku, M., Hamel, C., Moran, J., …, & Henry, D.A. (2017). AMSTAR 2: A critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. British Medical Journal, 358:j4008. http://dx.doi.org/10.1136/bmj.j4008.
Stroup, D.F., Berlin, J.A., Morton, S.C., Olkin, I., Williamson, G.D., et al. (2000). Meta-analysis of observational studies in epidemiology: A proposal for reporting. Journal of the American Medical Association, 283, 2008-2012.
Vacha-Haase, T. (1998). Reliability generalization: Exploring variance in measurement error affecting score reliability across studies. Educational and Psychological Measurement, 58, 6-20.
Vacha-Haase, T., Henson, R.K., & Caruso, J.C. (2002). Reliability generalization: Moving toward improved understanding and use of score reliability. Educational and Psychological Measurement, 62, 562-569.
Vacha-Haase, T., Kogan, L.R., & Thompson, B. (2000). Sample compositions and variabilities in published studies versus those in test manuals. Educational and Psychological Measurement, 60, 509-522.||en_US|