Preprint

Leveraging machine learning for bibliometric analysis of emerging fields

This article is a preprint and has not been certified by peer review.

Author(s) / Creator(s)

Petrule, Claudiu
Bittermann, André
Ritter, Viktoria
Haberkamp, Anke
Rief, Winfried

Abstract / Description

Bibliometric analyses of emerging fields with inconsistent terminology and porous boundaries are challenging: when precise search terms are not available, compiling a comprehensive dataset requires screening a large number of database records to prevent false positives. In this study, we leverage machine learning (ML) to identify and include publications that are relevant to the field but differ in their terminology. ML is employed to semi-automate the necessary screening process, with the emerging research landscape of translational psychotherapy as a use case. Compared to a typical database search restricted to known terminology, the dataset generated by the ML-augmented approach differs clearly in various bibliometrically relevant aspects, such as top authors, journals, countries, and impact. Our study emphasizes the importance of consistent terminology in research fields and, in its absence, the merits of ML.

Keyword(s)

bibliometrics, machine learning, screening automation, translational psychotherapy, publications

Persistent Identifier

https://hdl.handle.net/20.500.12034/8752
https://doi.org/10.23668/psycharchives.13262

Date of first publication

2023-09-21

Publisher

PsychArchives

Citation

  • Author(s) / Creator(s)
    Petrule, Claudiu
    Bittermann, André
    Ritter, Viktoria
    Haberkamp, Anke
    Rief, Winfried
  • PsychArchives acquisition timestamp
    2023-09-21T13:58:21Z
  • Made available on
    2023-09-21T13:58:21Z
  • Date of first publication
    2023-09-21
  • Submission date
    2023-01-16
  • Publication status
    other
  • Review status
    notReviewed
  • Persistent Identifier
    https://hdl.handle.net/20.500.12034/8752
  • Persistent Identifier
    https://doi.org/10.23668/psycharchives.13262
  • Language of content
    eng
  • Publisher
    PsychArchives
  • Is referenced by
    http://dx.doi.org/10.23668/psycharchives.13261
  • Is related to
    https://doi.org/10.1027/2151-2604/a000509
  • Is related to
    https://doi.org/10.23668/psycharchives.15204
  • Is related to
    https://www.psycharchives.org/handle/20.500.12034/8764
  • Keyword(s)
    bibliometrics
    machine learning
    screening automation
    translational psychotherapy
    publications
  • Dewey Decimal Classification number(s)
    150
  • Title
    Leveraging machine learning for bibliometric analysis of emerging fields
  • DRO type
    preprint