Conference Object

Distance correlation: Discovering meta-analytic relationships between variables when other correlation coefficients fail

Author(s) / Creator(s)

Stasielowicz, Lukasz
Suck, Reinhard

Abstract / Description

Background: Many meta-analysts use correlation coefficients to assess the strength of the relationship between selected variables across studies. Usually the Pearson product-moment correlation is chosen. It is implemented in most meta-analytic packages (Polanin, Hennessy, & Tanner-Smith, 2017), and it is relatively easy to interpret: it ranges from -1 to 1, and researchers have proposed benchmarks that facilitate a quick assessment of the practical relevance of findings based on Pearson correlations (Bosco, Aguinis, Singh, Field, & Pierce, 2015; Gignac & Szodorai, 2016). Despite these advantages, the Pearson correlation has several limitations that meta-analysts need to consider. For example, outliers can bias the correlation estimates. Furthermore, not every type of bivariate relationship can be discovered with Pearson correlations: only linear relationships can be detected. This can lead to false conclusions when non-linear rather than linear relationships are present. To illustrate, it is well known that certain cognitive abilities, e.g. processing speed and memory (Li et al., 2004), improve during childhood and decline during (late) adulthood. Due to this inverted-U relationship between age and cognitive abilities, the Pearson correlation computed across the life span can be close to zero, implying that there is no linear relationship. Unfortunately, people may be inclined to think that the lack of a linear relationship means that there is no relationship whatsoever, which in turn may lead to abandoning fruitful research questions. Although well-established alternative correlation coefficients are available (e.g. Kendall’s tau, Spearman’s rho), they are not adequate for assessing non-monotonic relationships. Recently, however, other measures of dependence have emerged, e.g. distance correlation (Rizzo & Székely, 2016; Székely, Rizzo, & Bakirov, 2007), which are not restricted to monotonic relationships. In contrast to the previously mentioned correlation coefficients, the distance correlation ranges from 0 to 1. A value of zero implies lack of dependence.

Objectives: Although it has been suggested that distance correlations could be used in the meta-analytic context (Székely et al., 2007) to gauge the strength of the relationship between variables, such attempts have not been undertaken. Thus, the main objective of the present study was to compare distance correlation with other correlation coefficients (Pearson correlation, Kendall’s tau, Spearman’s rho) by conducting separate meta-analyses for each effect size.

Research questions: We hypothesized that only the distance correlation would consistently detect meta-analytic dependence between the variables across several scenarios (e.g. linear relationship, non-linear monotonic relationship, non-linear non-monotonic relationship). In contrast, Kendall’s tau and Spearman’s rho would fail in the non-monotonic scenario, and the Pearson correlation would fail even in the non-linear scenario.

Method: For each scenario (e.g. non-linear monotonic relationship), many samples of participants were simulated in order to mimic the meta-analytic procedure of reviewing different studies. Distance correlation, Pearson correlation, Kendall’s tau, and Spearman’s rho were computed for each sample. Subsequently, the mean effect size across the samples was calculated separately for each type of correlation coefficient. Finally, the respective mean effect sizes were compared. The analyses were conducted using several R packages. The distance correlation was computed using the energy package. To compute the meta-analytic weight of each sample, the variance of the distance correlation estimate was calculated by applying the jackknife technique within each sample (bootstrap package).
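
The per-sample step described in the Method (a distance correlation estimate plus a jackknife variance for weighting) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' R code, which relies on the energy and bootstrap packages. The function names, the uniform distribution of x, the sample size of 50, and the seed are assumptions made for the example only.

```python
import math
import random


def _centered_distances(v):
    """Double-centered matrix of pairwise absolute distances for a 1-D sample."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row_mean = [sum(row) / n for row in d]
    grand_mean = sum(row_mean) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation in the sense of Székely et al. (2007)."""
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in pairs) / n ** 2
    dvar_x = sum(a[i][j] ** 2 for i, j in pairs) / n ** 2
    dvar_y = sum(b[i][j] ** 2 for i, j in pairs) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0  # a constant variable carries no dependence information
    return math.sqrt(max(dcov2, 0.0) / denom)


def jackknife_variance(x, y, stat):
    """Leave-one-out jackknife variance of stat(x, y)."""
    n = len(x)
    loo = [stat(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    return (n - 1) / n * sum((t - mean_loo) ** 2 for t in loo)


# Hypothetical single "study": inverted-U relationship y = -x*x without noise.
rng = random.Random(1)  # assumed seed, for reproducibility of the sketch only
x = [rng.uniform(-1.0, 1.0) for _ in range(50)]
y = [-xi * xi for xi in x]

dcor = distance_correlation(x, y)
dcor_var = jackknife_variance(x, y, distance_correlation)
print(f"distance correlation = {dcor:.3f}, jackknife variance = {dcor_var:.5f}")
```

The estimate and its variance form one (effect size, sampling variance) pair, which is the input a random-effects model expects for each study.
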
The respective random-effects meta-analyses (REML estimator) were carried out using the metafor package.

Results: In general, the expected pattern of results was confirmed. To illustrate, an inverted-U relationship, y = -x*x, which could reflect the relationship between age and cognitive abilities, led to the following meta-analytic correlation estimates (k = 40, N = 2000): .01 (Pearson correlation), .03 (Kendall’s tau), .02 (Spearman’s rho), and .33 (distance correlation). The reproducible R code will be made available upon publication.

Conclusions: Among the considered correlation coefficients, only the distance correlation consistently yielded evidence for the existing relationship between two variables (e.g. age and cognitive abilities). Thus, it could be fruitful to use distance correlations as the effect size in future meta-analyses, as this would reduce the risk of wrongly concluding that there is no relationship when a non-linear non-monotonic relationship is present. Providing evidence for the usefulness of distance correlations in the meta-analytic context is the main contribution of the current study. One important drawback that could hamper meta-analytic research based on distance correlations is that they cannot be derived from other correlation coefficients. Thus, meta-analysts cannot compute them from the summary statistics reported in relevant studies; instead, they need access to the raw data. However, considering the advances made by the open science movement (e.g. data repositories), it seems plausible that access to the raw data of new studies will be granted in future meta-analyses. Even today, small meta-analyses based on distance correlations could be feasible thanks to replication initiatives or multi-lab studies in which several laboratories examine the same research question, conduct a mini meta-analysis, and make their raw data available.
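
The inverted-U illustration from the Results can be approximated with a small simulation that pools Pearson correlations across studies. The sketch below is plain Python and rests on several assumptions that are not taken from the abstract: x is drawn uniformly from [-1, 1], no noise is added to y, each of the k = 40 studies has n = 50 participants, the sampling variance of r uses the common large-sample approximation (1 - r^2)^2 / (n - 1), and the DerSimonian-Laird estimator stands in for the REML estimator that the authors ran in metafor. All function names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def dersimonian_laird(yi, vi):
    """Random-effects pooling; returns (pooled estimate, standard error, tau^2)."""
    k = len(yi)
    w = [1.0 / v for v in vi]
    sum_w = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, yi)) / sum_w
    # Cochran's Q and the method-of-moments estimate of between-study variance.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, yi))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in vi]
    pooled = sum(wi * y for wi, y in zip(w_star, yi)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2


rng = random.Random(2019)  # assumed seed, for reproducibility of the sketch only
k, n = 40, 50              # 40 simulated studies of 50 participants (N = 2000)
r_values, r_variances = [], []
for _ in range(k):
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    y = [-xi * xi for xi in x]  # inverted-U relationship, no noise (assumption)
    r = pearson_r(x, y)
    r_values.append(r)
    r_variances.append((1.0 - r * r) ** 2 / (n - 1))

pooled_r, se_r, tau2_r = dersimonian_laird(r_values, r_variances)
print(f"pooled Pearson r = {pooled_r:.3f} (SE = {se_r:.3f}, tau^2 = {tau2_r:.4f})")
```

The pooling function only needs effect sizes and sampling variances, so distance correlation estimates with jackknife variances could be passed to it in the same way.
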
Nevertheless, further work on the use of dependence measures in meta-analyses is needed. Future studies could examine the meta-analytic performance of distance correlations within the Bayesian framework (Bhattacharjee, 2014). Furthermore, one could simulate meta-analyses based on alternative measures of dependence within both the frequentist and the Bayesian framework, e.g. the maximal information coefficient or the Heller-Heller-Gorfine measure (de Siqueira Santos, Takahashi, Nakata, & Fujita, 2014).

References:
Bhattacharjee, A. (2014). Distance correlation coefficient: An application with Bayesian approach in clinical data analysis. Journal of Modern Applied Statistical Methods, 13(1), 354–366. http://doi.org/10.22237/jmasm/1398918120
Bosco, F. A., Aguinis, H., Singh, K., Field, J. G., & Pierce, C. A. (2015). Correlational effect size benchmarks. Journal of Applied Psychology, 100(2), 431–449. http://doi.org/10.1037/a0038047
de Siqueira Santos, S., Takahashi, D. Y., Nakata, A., & Fujita, A. (2014). A comparative study of statistical methods used to identify dependencies between gene expression signals. Briefings in Bioinformatics, 15(6), 906–918. http://doi.org/10.1093/bib/bbt051
Gignac, G. E., & Szodorai, E. T. (2016). Effect size guidelines for individual differences researchers. Personality and Individual Differences, 102, 74–78. http://doi.org/10.1016/j.paid.2016.06.069
Li, S.-C., Lindenberger, U., Hommel, B., Aschersleben, G., Prinz, W., & Baltes, P. B. (2004). Transformations in the couplings among intellectual abilities and constituent cognitive processes across the life span. Psychological Science, 15(3), 155–163. http://doi.org/10.1111/j.0956-7976.2004.01503003.x
Polanin, J. R., Hennessy, E. A., & Tanner-Smith, E. E. (2017). A review of meta-analysis packages in R. Journal of Educational and Behavioral Statistics, 42(2), 206–242. http://doi.org/10.3102/1076998616674315
Rizzo, M. L., & Székely, G. J. (2016). Energy distance. Wiley Interdisciplinary Reviews: Computational Statistics, 8(1), 27–38. http://doi.org/10.1002/wics.1375
Székely, G. J., Rizzo, M. L., & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35(6), 2769–2794. http://doi.org/10.1214/009053607000000505

Persistent Identifier

https://hdl.handle.net/20.500.12034/2097
https://doi.org/10.23668/psycharchives.2471

Date of first publication

2019-05-29

Is part of

Research Synthesis 2019 incl. Pre-Conference Symposium Big Data in Psychology, Dubrovnik, Croatia

Publisher

ZPID (Leibniz Institute for Psychology Information)

Citation

Stasielowicz, L., & Suck, R. (2019). Distance correlation: Discovering meta-analytic relationships between variables when other correlation coefficients fail. ZPID (Leibniz Institute for Psychology Information). https://doi.org/10.23668/psycharchives.2471
  • Author(s) / Creator(s)
    Stasielowicz, Lukasz
  • Author(s) / Creator(s)
    Suck, Reinhard
  • PsychArchives acquisition timestamp
    2019-06-11T13:34:17Z
  • Made available on
    2019-06-11T13:34:17Z
  • Date of first publication
    2019-05-29
  • Persistent Identifier
    https://hdl.handle.net/20.500.12034/2097
  • Persistent Identifier
    https://doi.org/10.23668/psycharchives.2471
  • Language of content
    eng
    en_US
  • Publisher
    ZPID (Leibniz Institute for Psychology Information)
    en_US
  • Is part of
    Research Synthesis 2019 incl. Pre-Conference Symposium Big Data in Psychology, Dubrovnik, Croatia
    en_US
  • Dewey Decimal Classification number(s)
    150
  • Title
    Distance correlation: Discovering meta-analytic relationships between variables when other correlation coefficients fail
    en_US
  • DRO type
    conferenceObject
    en_US
  • Visible tag(s)
    ZPID Conferences and Workshops