Conference Object

Dealing with Artificially Dichotomized Variables in Meta-Analytic Structural Equation Modeling

Author(s) / Creator(s)

de Jonge, Hannelies
Jak, Suzanne
Kan, Kees-Jan

Abstract / Description

Background: Meta-analysis (Glass, 1976) is a commonly used statistical technique for aggregating effect sizes from independent primary studies in order to draw inferences about population effects. To extend the range of research questions that can be answered, new meta-analytic models have been developed, such as meta-analytic structural equation modeling (MASEM; Becker, 1992, 1995; Cheung, 2014, 2015a; Cheung & Chan, 2005; Jak, 2015; Viswesvaran & Ones, 1995). In primary studies, an effect size may represent the strength and direction of the association between any two variables of interest. Such an effect size can be expressed in different ways, for example as a Pearson product-moment correlation, Cohen’s d, a biserial correlation, or a point-biserial correlation. How an effect size is expressed depends on the nature of the variables (e.g., continuous or dichotomous), but also on how the variables are measured or analyzed. If one of two continuous variables is artificially dichotomized, the effect size may be expressed as a point-biserial correlation. However, this typically provides a negatively biased estimate of the true underlying Pearson product-moment correlation (e.g., Cohen, 1983; MacCallum, Zhang, Preacher, & Rucker, 2002). The biserial correlation, on the other hand, should generally provide an approximately unbiased estimate, assuming that the underlying continuous variable is normally distributed (Soper, 1914; Tate, 1955). Bias in the effect sizes of primary studies may bias meta-analytic results in the same direction (Jacobs & Viechtbauer, 2017). We may therefore expect that using the point-biserial correlation for the relationship between an artificially dichotomized variable and a continuous variable also biases MASEM parameters. In the current study, we evaluate how using point-biserial versus biserial correlations from primary studies affects path coefficients, their standard errors, and model fit in MASEM.
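The attenuation can be made concrete with a short calculation. Suppose X is standard normal and is split at a threshold z so that a proportion p of cases falls in the upper group (q = 1 − p). Then the population point-biserial correlation equals ρ·φ(z)/√(pq), where φ is the standard normal density. The classical biserial correlation reverses this by multiplying r_pb by √(pq)/φ(z). The Python sketch below is an illustration, not the authors' code. It assumes X and M are bivariate normal, and its function names are hypothetical. It does not implement the refined biserial estimators or sampling variances discussed by Jacobs and Viechtbauer (2017).

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def attenuation_factor(p: float) -> float:
    """phi(z) / sqrt(p * q): ratio of the population point-biserial to the
    Pearson correlation when a standard-normal X is split so that a
    proportion p lies above the threshold z (assumes bivariate normality)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = 1.0 - p
    z = STD_NORMAL.inv_cdf(q)  # threshold leaving proportion p in the upper group
    return STD_NORMAL.pdf(z) / sqrt(p * q)


def expected_point_biserial(rho: float, p: float) -> float:
    """Hypothetical helper: population r_pb after dichotomizing X at proportion p."""
    return rho * attenuation_factor(p)


def biserial_from_point_biserial(r_pb: float, p: float) -> float:
    """Classical conversion r_b = r_pb * sqrt(p * q) / phi(z)."""
    return r_pb / attenuation_factor(p)


if __name__ == "__main__":
    r_pb = expected_point_biserial(0.33, 0.50)                  # median split
    print(round(r_pb, 3))                                       # 0.263
    print(round(biserial_from_point_biserial(r_pb, 0.50), 3))   # 0.33
```

For a median split the factor is √(2/π) ≈ .80, so a point-biserial correlation recovers only about 80% of the underlying correlation. The factor shrinks further as the groups become more unbalanced.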
Based on the results, we expect to be able to inform researchers about which of the two effect sizes is most appropriate for MASEM applications, and under which conditions.

Aim: Our aim is to investigate how using (1) the point-biserial correlation or (2) the biserial correlation for the relationship between an artificially dichotomized variable and a continuous variable affects MASEM parameters and model fit. Specifically, we focus on path coefficients, the standard errors of these coefficients, and model fit.

Method: We simulated meta-analytic data according to a full mediation (hence overidentified) population model (see Figure 1), with a continuous predictor X, a continuous mediator M, and a continuous outcome Y. Depending on the condition, the predictor X was artificially dichotomized in all or a given percentage of the primary studies. We chose this population model because the median number of variables in a ‘typical’ meta-analysis in educational research is three (de Jonge & Jak, 2018), and because mediation is a popular research topic.

Figure 1. Population model with fixed parameter values.

Under this population model, random meta-analytic datasets were generated under different conditions. We systematically varied (1) the size of the (standardized) path coefficient between X and M (.16, .23, .33), (2) the percentage of primary studies in which X was artificially dichotomized (25%, 75%, 100%), and (3) the cut-off point at which X was artificially dichotomized (at the median, i.e., a proportion of .50, or with unbalanced groups, at a proportion of .01). These choices were mainly based on typical situations in educational research. The three path-coefficient sizes reflect the minimum, mean/median, and maximum pooled Pearson product-moment correlations in a ‘typical’ meta-analysis in educational research (de Jonge & Jak, 2018).
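The design crosses three factors, giving 3 × 3 × 2 = 18 conditions. The Python sketch below enumerates that grid. For the median split only, it also shows how the share of dichotomized studies would pull down an unweighted average of the population r_XM. This relies on the fact that corr(X, M) equals βMX under a standardized full-mediation model. The helper name is hypothetical. The average ignores the sample-size weighting, heterogeneity, and sampling error that random-effects TSSEM accounts for, so it illustrates the direction of the expected bias rather than reproducing the study's estimates.

```python
from itertools import product
from math import pi, sqrt

# Factor levels as listed in the abstract.
BETA_MX = (0.16, 0.23, 0.33)            # standardized path X -> M
PCT_DICHOTOMIZED = (25, 75, 100)        # % of primary studies with X dichotomized
CUT_OFF = ("median", "unbalanced")      # where X is split

# Full factorial design: every combination of the three factors.
CONDITIONS = list(product(BETA_MX, PCT_DICHOTOMIZED, CUT_OFF))

# Median split of a standard-normal X: r_pb = rho * phi(0) / sqrt(.5 * .5)
# = rho * sqrt(2 / pi), roughly 80% of the continuous correlation.
MEDIAN_ATTENUATION = sqrt(2 / pi)


def naive_mean_r_xm(beta_mx: float, pct_dichotomized: float) -> float:
    """Hypothetical helper: unweighted mean population r_XM across studies when
    pct_dichotomized % of them report a median-split point-biserial correlation
    and the rest report Pearson's r (corr(X, M) = beta_MX under full mediation).
    Not what random-effects TSSEM estimates."""
    share = pct_dichotomized / 100
    return beta_mx * ((1 - share) + share * MEDIAN_ATTENUATION)


if __name__ == "__main__":
    print(len(CONDITIONS))  # 18
    for beta, pct in product(BETA_MX, PCT_DICHOTOMIZED):
        r = naive_mean_r_xm(beta, pct)
        print(f"beta_MX={beta:.2f}, {pct:3d}% dichotomized: "
              f"mean r_XM={r:.3f} ({100 * (r - beta) / beta:+.1f}% relative)")
```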
The condition in which 75% of the primary studies artificially dichotomize X is based on a comparable meta-analysis in educational research (Jansen, Elffers, & Jak, 2019). We used between-study variances of .01. The number of primary studies per meta-analysis was fixed at 44, the median number in a ‘typical’ meta-analysis (de Jonge & Jak, 2018). Because we used a random-effects MASEM method, we assumed that the population comprises 44 subpopulations from which the 44 samples are drawn, and that the weighted mean of the subpopulation parameters equals the population parameter. Given a specific condition and the fixed number of 44 primary studies, we randomly sampled the within-study sample sizes in every iteration from a positively skewed distribution with a mean of 421.75, as used in Hafdahl (2007), yielding ‘typical’ sample sizes (de Jonge & Jak, 2018). We imposed 39% missing correlations (Sheng, Kong, Cortina, & Hou, 2016) by (pseudo)randomly deleting either variable M or Y from 26 of the 44 studies. In each condition, we generated 2,000 meta-analytic datasets drawn from the 44 subpopulations and analyzed each one using (1) the point-biserial and (2) the biserial correlation as the effect size between the artificially dichotomized predictor X and the continuous mediator M. The full mediation model was fitted using random-effects two-stage structural equation modeling (TSSEM; Cheung, 2014) with the R package ‘metaSEM’ (Cheung, 2015b). As recommended (Becker, 2009; Hafdahl, 2007), we used the weighted mean correlation across the included primary studies to estimate the sampling variances and covariances of the correlation coefficients in the primary studies.
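Two quantities in this paragraph can be checked by hand. With three variables, each study has three correlations. Deleting M or Y from a study removes two of them, so the 26 incomplete studies lose 52 of the 44 × 3 = 132 correlations, about 39%. The Python sketch below also illustrates the weighted-mean idea: every study's sampling variance is computed from the sample-size-weighted mean correlation r̄ instead of from its own r. The formula (1 − r̄²)²/n is the textbook large-sample variance of a Pearson correlation and is an assumption here. It is not metaSEM's implementation, and biserial and point-biserial correlations have sampling variances of their own (Jacobs & Viechtbauer, 2017). The helper names and toy inputs are hypothetical.

```python
from fractions import Fraction

N_STUDIES = 44        # primary studies per meta-analysis
N_VARIABLES = 3       # X, M, Y
N_INCOMPLETE = 26     # studies from which M or Y was deleted

corr_per_study = N_VARIABLES * (N_VARIABLES - 1) // 2   # 3 correlations per study
corr_lost_per_deletion = N_VARIABLES - 1                 # dropping one variable removes 2

# 26 * 2 = 52 missing out of 44 * 3 = 132 correlations.
missing_share = Fraction(N_INCOMPLETE * corr_lost_per_deletion,
                         N_STUDIES * corr_per_study)


def weighted_mean_r(rs, ns):
    """Hypothetical helper: sample-size-weighted mean correlation across studies."""
    return sum(r * n for r, n in zip(rs, ns)) / sum(ns)


def approx_var_r(r_bar, n):
    """Assumed large-sample variance of a Pearson r, (1 - rho^2)^2 / n,
    with rho replaced by the pooled r_bar rather than the study's own r."""
    return (1 - r_bar ** 2) ** 2 / n


if __name__ == "__main__":
    print(f"missing correlations: {float(missing_share):.1%}")  # 39.4%
    rs, ns = [0.20, 0.30], [100, 300]   # toy values, not study data
    r_bar = weighted_mean_r(rs, ns)
    print([round(approx_var_r(r_bar, n), 5) for n in ns])
```

Using r̄ for every study avoids tying each study's weight to its own sampling error in r. Because the variance then depends only on n, larger studies still receive more weight.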
Next, over the converged simulated datasets, we (1) estimated the relative percentage bias in both path coefficients (less than 5% bias was considered negligible; Hoogland & Boomsma, 1998), (2) calculated the relative percentage bias in the standard errors of these path coefficients (less than 10% bias was considered acceptable; Hoogland & Boomsma, 1998), (3) calculated the rejection rates of the chi-square test of the Stage 2 model (df = 1, α = .05) and tested with a proportion test whether the rejection rate differed significantly from the nominal α-level, and (4) compared the empirical chi-square distribution with the theoretical chi-square distribution (df = 1) by means of QQ plots and the Kolmogorov-Smirnov test.

Main Results: When the point-biserial correlation was used for the relation between the artificially dichotomized predictor and the continuous mediator, the estimated path coefficient for this relationship (βMX) appeared to systematically underestimate its population value. When the biserial correlation was used instead, this path coefficient could be considered unbiased in every condition. The estimated path coefficient between the two continuous variables (βYM) could also be considered unbiased in all conditions, regardless of whether the biserial or point-biserial correlation was used. The relative percentage bias in the standard errors of all path coefficients was not substantial according to the criteria applied. However, the standard error of the path coefficient between the predictor and the mediator (βMX) appeared to be systematically negatively biased when the biserial correlation was used. Likewise, the standard error of the path coefficient between the continuous variables Y and M (βYM) appeared to be systematically negatively biased, regardless of whether the point-biserial or biserial correlation was used.
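The four evaluation criteria in the analysis step can be written compactly. The Python sketch below is one hypothetical way to compute them from a condition's per-replication output (estimates, standard errors, and Stage 2 chi-square statistics). It is not the authors' code. It assumes the definitions commonly paired with the Hoogland and Boomsma (1998) criteria: relative bias = 100·(mean estimate − true value)/true value, and SE bias = 100·(mean SE − SD of the estimates)/SD of the estimates. The df = 1 follows from three observed correlations minus two free path coefficients. The proportion test is approximated by a one-sample z-test without continuity correction. R's `prop.test` applies one by default, so its p-values will differ slightly.

```python
import random
from math import erf, sqrt
from statistics import NormalDist, mean, stdev

# Stage 2 df: 3 observed correlations minus 2 free paths (beta_MX, beta_YM).
DF = 3 - 2


def relative_bias(estimates, true_value):
    """Relative percentage bias of a path coefficient (|value| < 5: negligible)."""
    return 100 * (mean(estimates) - true_value) / true_value


def relative_se_bias(standard_errors, estimates):
    """Mean estimated SE versus the empirical SD of the estimates (|value| < 10: acceptable)."""
    empirical_sd = stdev(estimates)
    return 100 * (mean(standard_errors) - empirical_sd) / empirical_sd


def chi2_df1_critical(alpha):
    """For df = 1 the chi-square critical value is the squared two-sided normal quantile."""
    return NormalDist().inv_cdf(1 - alpha / 2) ** 2


def chi2_df1_cdf(x):
    """CDF of the chi-square distribution with 1 df."""
    return erf(sqrt(x / 2)) if x > 0 else 0.0


def rejection_rate_test(chi2_stats, alpha=0.05):
    """Rejection rate and two-sided z-test p-value for H0: rate == alpha."""
    r = len(chi2_stats)
    crit = chi2_df1_critical(alpha)
    rate = sum(x > crit for x in chi2_stats) / r
    z = (rate - alpha) / sqrt(alpha * (1 - alpha) / r)
    return rate, 2 * (1 - NormalDist().cdf(abs(z)))


def ks_statistic(chi2_stats):
    """One-sample Kolmogorov-Smirnov distance between the empirical CDF and chi-square(1)."""
    xs = sorted(chi2_stats)
    n = len(xs)
    return max(max(i / n - chi2_df1_cdf(x), chi2_df1_cdf(x) - (i - 1) / n)
               for i, x in enumerate(xs, start=1))


if __name__ == "__main__":
    # Toy check: 2,000 draws from the true chi-square(1) distribution (not study data).
    rng = random.Random(2019)
    stats = [rng.gauss(0, 1) ** 2 for _ in range(2000)]
    rate, p = rejection_rate_test(stats)
    print(f"rejection rate={rate:.3f}, p={p:.3f}, KS D={ks_statistic(stats):.4f}")
```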
In most conditions, the rejection rate of the chi-square test of model fit at Stage 2 of the random-effects TSSEM was slightly above the nominal α-level, regardless of whether the point-biserial or biserial correlation was used. The Kolmogorov-Smirnov tests and QQ plots showed that, when the biserial correlation was used, the empirical chi-square distribution differed significantly from the theoretical chi-square distribution in five of the 18 conditions. When the point-biserial correlation was used, the distributions differed significantly in the same five conditions plus three others. There was no clear pattern in which conditions the distributions differed significantly.

Expected Conclusions and Implications: We advise researchers who apply MASEM to investigate mediation to convert the effect size between an artificially dichotomized predictor and a continuous variable to a biserial correlation, not to a point-biserial correlation.

References:
Becker, B. J. (1992). Using results from replicated studies to estimate linear models. Journal of Educational Statistics, 17, 341–362. doi:10.2307/1165128
Becker, B. J. (1995). Corrections to “Using results from replicated studies to estimate linear models”. Journal of Educational and Behavioral Statistics, 20, 100–102. doi:10.2307/1165390
Becker, B. J. (2009). Model-based meta-analysis. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (2nd ed., pp. 377–395). New York: Russell Sage Foundation.
Cheung, M. W.-L. (2014). Fixed- and random-effects meta-analytic structural equation modeling: Examples and analyses in R. Behavior Research Methods, 46, 29–40. doi:10.3758/s13428-013-0361-y
Cheung, M. W.-L. (2015a). Meta-analysis: A structural equation modeling approach. Chichester, United Kingdom: John Wiley & Sons.
Cheung, M. W.-L. (2015b). metaSEM: An R package for meta-analysis using structural equation modeling. Frontiers in Psychology, 5, 1521. doi:10.3389/fpsyg.2014.01521
Cheung, M. W.-L., & Chan, W. (2005). Meta-analytic structural equation modeling: A two-stage approach. Psychological Methods, 10, 40–64. doi:10.1037/1082-989X.10.1.40
Cohen, J. (1983). The cost of dichotomization. Applied Psychological Measurement, 7, 249–253. doi:10.1177/014662168300700301
de Jonge, H., & Jak, S. (2018, June). A meta-meta-analysis: Identifying typical conditions of meta-analyses in educational research. Paper presented at the conference Research Synthesis 2018 of the Leibniz Institute for Psychology Information, Trier, Germany. doi:10.23668/psycharchives.853
Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5(10), 3–8. doi:10.3102/0013189X005010003
Hafdahl, A. R. (2007). Combining correlation matrices: Simulation analysis of improved fixed-effects methods. Journal of Educational and Behavioral Statistics, 32, 180–205. doi:10.3102/1076998606298041
Hoogland, J. J., & Boomsma, A. (1998). Robustness studies in covariance structure modeling: An overview and a meta-analysis. Sociological Methods & Research, 26, 329–367. doi:10.1177/0049124198026003003
Jacobs, P., & Viechtbauer, W. (2017). Estimation of the biserial correlation and its sampling variance for use in meta-analysis. Research Synthesis Methods, 8, 161–180. doi:10.1002/jrsm.1218
Jak, S. (2015). Meta-analytic structural equation modelling. Springer International Publishing.
Jansen, D., Elffers, L., & Jak, S. (2019). The functions of shadow education in school careers: A systematic review. Manuscript submitted for publication.
MacCallum, R. C., Zhang, S., Preacher, K. J., & Rucker, D. D. (2002). On the practice of dichotomization of quantitative variables. Psychological Methods, 7, 19–40. doi:10.1037//1082-989X.7.1.19
Sheng, Z., Kong, W., Cortina, J. M., & Hou, S. (2016). Analyzing matrices of meta-analytic correlations: Current practices and recommendations. Research Synthesis Methods, 7, 187–208. doi:10.1002/jrsm.1206
Soper, H. E. (1914). On the probable error of the bi-serial expression for the correlation coefficient. Biometrika, 10, 384–390. doi:10.2307/2331789
Tate, R. F. (1955). The theory of correlation between two continuous variables when one is dichotomized. Biometrika, 42, 205–216. doi:10.2307/2333437
Viswesvaran, C., & Ones, D. (1995). Theory testing: Combining psychometric meta-analysis and structural equations modeling. Personnel Psychology, 48, 865–885. doi:10.1111/j.1744-6570.1995.tb01784.x

Persistent Identifier

https://hdl.handle.net/20.500.12034/2114
https://doi.org/10.23668/psycharchives.2490

Date of first publication

2019-05-29

Is part of

Research Synthesis 2019 incl. Pre-Conference Symposium Big Data in Psychology, Dubrovnik, Croatia

Publisher

ZPID (Leibniz Institute for Psychology Information)

Citation

De Jonge, H., Jak, S., & Kan, K.-J. (2019). Dealing with Artificially Dichotomized Variables in Meta-Analytic Structural Equation Modeling. ZPID (Leibniz Institute for Psychology Information). https://doi.org/10.23668/psycharchives.2490
  • PsychArchives acquisition timestamp
    2019-06-18T09:58:48Z
  • Made available on
    2019-06-18T09:58:48Z
  • Persistent Identifier
    https://hdl.handle.net/20.500.12034/2114
  • Persistent Identifier
    https://doi.org/10.23668/psycharchives.2490
  • Language of content
    eng
    en_US
  • Dewey Decimal Classification number(s)
    150
  • DRO type
    conferenceObject
  • Visible tag(s)
    ZPID Conferences and Workshops