Preprint

Multi-source item content, face validity and ecological-digital validity: Pretesting with experts, novices, and ChatGPT-4o in the development of the preliminary Observing Parent-Child Interaction Inventory (OPCII)

Measuring content validity, face validity and ecological-digital validity

This article is a preprint and has not been certified by peer review.

Author(s) / Creator(s)

Müller, Jörg Michael

Abstract / Description

Background: Many observational instruments have been developed to measure aspects of the parent-child relationship or interaction patterns. However, given the flood of terms and scale labels, these instruments often suffer from a lack of factorial, content, convergent or discriminant validity. Research Question: We examine how one can assess content validity during item pretesting and test development using multi-source information from experts, novices and ChatGPT-4o. We ask: Are ratings on the suitability of items consistent within and between these groups? Does content validity depend on scale characteristics such as construct popularity or category breadth? Method: The newly developed parent-child interaction inventory, namely the Observing Parent-Child Interaction Inventory (OPCII-1.0), comprises 19 scales with a total of 460 items. Each item was rated independently by six experts, six novices and six ChatGPT-4o prompts. A linear mixed model was applied to analyze the influence of group membership and item pool characteristics, with repeated measures nested within raters. Results: Mean score differences emerged across groups, with experts rating items most conservatively, ChatGPT-4o most liberally and novices falling in between. Additionally, item pools differed significantly in terms of their average suitability scores. An exploratory factor analysis of rater agreement revealed that ChatGPT-4o ratings showed the highest and most consistent loading on a common factor of item suitability. Discussion: Our multi-source evaluation provides evidence for content, face and ecological-digital validity. By implementing a transparent methodology—including detailed item generation instructions—we enhance the replicability of content validity assessments. This approach aims to initiate a convergent development process following decades of divergent construction of parent-child interaction instruments.
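
The rating design described in the abstract (460 items, each rated by six experts, six novices and six ChatGPT-4o prompts) can be laid out in long format, with repeated ratings nested within raters. The following is a minimal sketch of that layout only: the data are synthetic, the 1-5 rating scale and all function names are assumptions for illustration, and the code computes descriptive rater and group means rather than fitting the linear mixed model reported in the preprint.

```python
import random
from statistics import mean

N_ITEMS = 460            # total number of items reported in the abstract
RATERS_PER_GROUP = 6     # six raters (or prompts) per group
GROUPS = ["expert", "novice", "chatgpt4o"]


def simulate_ratings(seed=0):
    """Return long-format rows (group, rater_id, item_id, rating).

    The ratings are SYNTHETIC and drawn uniformly from a hypothetical
    1-5 scale; they are not the preprint's data.
    """
    rng = random.Random(seed)
    rows = []
    for group in GROUPS:
        for r in range(RATERS_PER_GROUP):
            rater_id = f"{group}_{r + 1}"
            for item_id in range(1, N_ITEMS + 1):
                rows.append((group, rater_id, item_id, rng.randint(1, 5)))
    return rows


def rater_means(rows):
    """Average the repeated ratings within each rater (the nesting level)."""
    ratings = {}
    for group, rater_id, _item_id, rating in rows:
        ratings.setdefault((group, rater_id), []).append(rating)
    return {key: mean(values) for key, values in ratings.items()}


def group_means(rows):
    """Average the rater means within each group."""
    by_group = {}
    for (group, _rater_id), rater_mean in rater_means(rows).items():
        by_group.setdefault(group, []).append(rater_mean)
    return {group: mean(values) for group, values in by_group.items()}


rows = simulate_ratings()
print(len(rows))  # 460 items x 18 raters = 8280 rows
for group, value in group_means(rows).items():
    print(group, round(value, 2))
```

Averaging within raters before averaging within groups mirrors the nesting in the design; a mixed model would instead estimate group effects while treating rater-level variation as a random effect.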

Keyword(s)

Content validity; Face validity; Ecological-digital validity; Pretesting; Observing Parent-Child Interaction Inventory; novices; experts; ChatGPT-4o; OPCII; LLM; Large Language Models

Persistent Identifier

https://doi.org/10.23668/psycharchives.21272

Date of first publication

2025-10-01

Publisher

PsychArchives

Citation

Müller, J. M. (2025). Multi-source item content, face validity and ecological-digital validity: Pretesting with experts, novices, and ChatGPT-4o in the development of the preliminary Observing Parent-Child Interaction Inventory (OPCII). PsychArchives. https://doi.org/10.23668/psycharchives.21272
  • Müller_2025_OPCII_Multi_source_pretesting_content_validity_V2.pdf
    Adobe PDF - 550.6 KB
    MD5: ac89e6217ca7b4ff38a06eb0084a11b0
    Rationale for choice of sharing level: The understanding and measurement of content validity has been expanded, and comments are welcome.
  • 2
    2025-10-01
    The original submission was sent to the PCI Psychology Portal. They requested the TOP Checklist and Disclosures Form, which must be included as an appendix at the end of the preprint. This has now been added to the appendix of the document.
  • 1
    2025-09-16
  • Author(s) / Creator(s)
    Müller, Jörg Michael
  • PsychArchives acquisition timestamp
    2025-10-01T11:34:51Z
  • Made available on
    2025-09-16T07:14:23Z
  • Made available on
    2025-10-01T11:34:51Z
  • Date of first publication
    2025-10-01
  • Publication status
    other
  • Review status
    notReviewed
  • Persistent Identifier
    https://hdl.handle.net/20.500.12034/16625.2
  • Persistent Identifier
    https://doi.org/10.23668/psycharchives.21272
  • Language of content
    eng
  • Publisher
    PsychArchives
  • Is related to
    https://www.psycharchives.org/handle/20.500.12034/16627
  • Is related to
    https://www.psycharchives.org/handle/20.500.12034/16624
  • Is related to
    https://www.psycharchives.org/handle/20.500.12034/16626
  • Dewey Decimal Classification number(s)
    150
  • Alternative title
    Measuring content validity, face validity and ecological-digital validity
    en
  • DRO type
    preprint
  • Visible tag(s)
    Content validity
  • Visible tag(s)
    Face validity
  • Visible tag(s)
    Ecological-digital validity
  • Visible tag(s)
    Pretesting
  • Visible tag(s)
    ChatGPT-4o
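
The MD5 checksum listed for the PDF in this record can be used to check that a downloaded copy is intact. The following is a minimal sketch using Python's standard `hashlib` module; the function names and the local file path are hypothetical, and the expected value is the one shown in the file entry above.

```python
import hashlib

# MD5 value as listed for the PDF in this record.
EXPECTED_MD5 = "ac89e6217ca7b4ff38a06eb0084a11b0"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest equals the checksum from the record."""
    return md5_of_file(path) == expected.lower()
```

Assuming the PDF was saved to the working directory under its listed name, `matches_record("Müller_2025_OPCII_Multi_source_pretesting_content_validity_V2.pdf")` would return `True` for an uncorrupted download. MD5 is suitable for detecting accidental corruption, not for guarding against deliberate tampering.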