Multi-source item content, face validity and ecological-digital validity: Pretesting with experts, novices, and ChatGPT-4o in the development of the preliminary Observing Parent-Child Interaction Inventory (OPCII)

Müller, Jörg Michael

Preprint


Measuring content validity, face validity and ecological-digital validity

This article is a preprint and has not been certified by peer review.

Author(s) / Creator(s)

Müller, Jörg Michael

Abstract / Description

Background: Many observational instruments have been developed to measure aspects of the parent-child relationship or of parent-child interaction patterns. However, amid the proliferation of terms and scale labels, these instruments often lack factorial, content, convergent or discriminant validity.

Research Question: We examine how content validity can be assessed during item pretesting and test development using multi-source information from experts, novices and ChatGPT-4o. We ask: Are ratings of item suitability consistent within and between these groups? Does content validity depend on scale characteristics such as construct popularity or category breadth?

Method: The newly developed Observing Parent-Child Interaction Inventory (OPCII-1.0) comprises 19 scales with a total of 460 items. Each item was rated independently by six experts, six novices and six ChatGPT-4o prompts. A linear mixed model, with repeated measures nested within raters, was applied to analyze the influence of group membership and item pool characteristics.

Results: Mean scores differed across groups: experts rated items most conservatively, ChatGPT-4o most liberally, and novices fell in between. Item pools also differed significantly in their average suitability scores. An exploratory factor analysis of rater agreement revealed that ChatGPT-4o ratings showed the highest and most consistent loadings on a common factor of item suitability.

Discussion: Our multi-source evaluation provides evidence for content, face and ecological-digital validity. By implementing a transparent methodology, including detailed item generation instructions, we enhance the replicability of content validity assessments. This approach aims to initiate a convergent development process after decades of divergent construction of parent-child interaction instruments.
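The Method section describes suitability ratings as repeated measures nested within raters. As a minimal descriptive sketch, and not the author's linear mixed model (which was fitted with the SAS syntax listed under "Is related to"), the Python below assumes a hypothetical long-format layout of (group, rater, item, suitability) records. It averages each rater's item ratings first and then averages the rater means within each group, so that every rater carries equal weight. The record layout, the helper names `rater_means` and `group_means`, the rating scale and all scores are invented for illustration and say nothing about the OPCII results.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL long-format records: (group, rater_id, item_id, suitability).
# Scores are invented for illustration only; they are not OPCII data.
RATINGS = [
    ("expert", "E1", "item01", 4), ("expert", "E1", "item02", 3),
    ("expert", "E2", "item01", 3), ("expert", "E2", "item02", 4),
    ("novice", "N1", "item01", 2), ("novice", "N1", "item02", 3),
    ("novice", "N2", "item01", 3), ("novice", "N2", "item02", 2),
    ("chatgpt", "G1", "item01", 3), ("chatgpt", "G1", "item02", 3),
    ("chatgpt", "G2", "item01", 4), ("chatgpt", "G2", "item02", 2),
]


def rater_means(records):
    """Average each rater's repeated item ratings (the measures nested in a rater)."""
    scores = defaultdict(list)
    group_of = {}
    for group, rater, _item, score in records:
        scores[rater].append(score)
        group_of[rater] = group
    return {rater: (group_of[rater], mean(vals)) for rater, vals in scores.items()}


def group_means(records):
    """Average rater means within each group so that every rater counts once."""
    by_group = defaultdict(list)
    for group, rater_mean in rater_means(records).values():
        by_group[group].append(rater_mean)
    return {group: mean(vals) for group, vals in by_group.items()}


if __name__ == "__main__":
    print(group_means(RATINGS))
```

A mixed model goes further by estimating rater-level variance and testing group and item-pool effects jointly; this sketch only shows why the rater, rather than the single rating, is the natural unit when summarizing groups.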

Keyword(s)

Content validity; Face validity; Ecological-digital validity; Pretesting; Observing Parent-Child Interaction Inventory; novices; experts; ChatGPT-4o; OPCII; LLM; Large Language Models

Persistent Identifier

https://doi.org/10.23668/psycharchives.21272

Date of first publication

2025-10-01

Publisher

PsychArchives

Citation

Müller, J. M. (2025). Multi-source item content, face validity and ecological-digital validity: Pretesting with experts, novices, and ChatGPT-4o in the development of the preliminary Observing Parent-Child Interaction Inventory (OPCII). PsychArchives. https://doi.org/10.23668/psycharchives.21272

Müller_2025_OPCII_Multi_source_pretesting_content_validity_V2.pdf

Adobe PDF, 550.6 KB

MD5: ac89e6217ca7b4ff38a06eb0084a11b0

Sharing level 0 (Public Use), CC BY-SA 4.0


Rationale for choice of sharing level: The understanding and measurement of content validity has been expanded, and comments are welcome.
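The record lists an MD5 checksum for the PDF. The following is a minimal Python sketch for checking a downloaded copy against it, assuming the file was saved under the listed filename in the current working directory; `md5_of_file` and `matches_record` are hypothetical helper names, and only the standard-library `hashlib` module is used. A match indicates that the download is intact, not that it is authentic, since MD5 is no longer collision-resistant.

```python
import hashlib
from pathlib import Path

# Checksum as listed on this record page for the PDF.
EXPECTED_MD5 = "ac89e6217ca7b4ff38a06eb0084a11b0"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """True if the local file's MD5 equals the published checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # ASSUMPTION: the PDF was downloaded under its listed name into the working directory.
    pdf = Path("Müller_2025_OPCII_Multi_source_pretesting_content_validity_V2.pdf")
    if pdf.exists():
        print("checksum OK" if matches_record(pdf) else "checksum MISMATCH")
    else:
        print(f"{pdf} not found in the current directory")
```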

Is related to

Code
Code for: Multi-source item content, face validity and ecological-digital validity: Pretesting with experts, novices, and ChatGPT-4o in the development of the preliminary Observing Parent-Child Interaction Inventory (OPCII)

Müller, Jörg Michael, 2025-09-16, PsychArchives

SAS syntax to replicate the results.

Research Data
Data for: Multi-source item content, face validity and ecological-digital validity: Pretesting with experts, novices, and ChatGPT-4o in the development of the preliminary Observing Parent-Child Interaction Inventory (OPCII)

Müller, Jörg Michael, 2025-09-16, PsychArchives

The file contains the data for Müller, J. M. (2025). Multi-source item content, face validity and ecological-digital validity: Pretesting with experts, novices, and ChatGPT-4o in the development of the preliminary Observing Parent-Child Interaction Inventory (OPCII). PsychArchives.

Preprint
Test Conception and Item Generation for the Observing Parent-Child Interaction Inventory OPCII-1.0

Müller, Jörg Michael, 2025-09-16, PsychArchives

Current observational measures of parent-child interaction often lack factorial validity and theoretical connections between the proposed scales. We present the Observing Parent-Child Interaction Inventory (OPCII-1.0), which is designed to capture interaction patterns by integrating parenting behaviors with child temperament. The OPCII-1.0 assesses two neurobiological temperament dimensions, namely behavioral activation (BAS) and behavioral inhibition (BIS), which influence attention, emotional reactivity, needs, and behavior. These first-order dimensions shape six second-order traits: need for activity, autonomy, security, emotional expression (quiet/loud × positive/negative), and effortful control. Parenting is conceptualized within two higher-order domains. Positive parenting includes sensitivity/responsiveness, social behavior regulation, cognitive support/stimulation, and regulation of one's own positive and negative emotions. Negative parenting encompasses denial of autonomy, passive and active rejection, intrusiveness, and rejection sensitivity. Additional indices capture over- and underinvolvement as well as impaired predictability. Crossing four parenting constellations with four BIS/BAS-based child profiles yields 16 theoretically defined interaction patterns, which define parenting-temperament misfit and enable risk prediction. We advance transparent and replicable test development through explicit item generation instructions (IGIs) and through multi-source pretesting of item content, face and ecological validity with experts, novices and ChatGPT-4o. The planned analyses and validation criteria for the next steps of test development are presented.
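The crossing described above is a full factorial design of 4 × 4 = 16 cells. The abstract does not name the four parenting constellations or the four child profiles, so the Python sketch below ASSUMES, purely for illustration, that the parenting constellations are high/low splits of positive and negative parenting and that the child profiles are high/low splits of BIS and BAS. These labels and the helper `interaction_patterns` are hypothetical and may differ from the OPCII-1.0 definitions.

```python
from itertools import product

# HYPOTHETICAL labels: the abstract states only that four parenting
# constellations are crossed with four BIS/BAS-based child profiles.
# Here both factors are ASSUMED to be high/low splits of two dimensions.
LEVELS = ("high", "low")
PARENTING = [f"positive {p} / negative {n}" for p, n in product(LEVELS, repeat=2)]
CHILD = [f"BIS {i} / BAS {a}" for i, a in product(LEVELS, repeat=2)]


def interaction_patterns(parenting, child):
    """Return every (parenting constellation, child profile) cell of the crossing."""
    return list(product(parenting, child))


if __name__ == "__main__":
    patterns = interaction_patterns(PARENTING, CHILD)
    print(len(patterns))  # 4 x 4 crossing
```

Running the script prints 16; each tuple is one candidate interaction pattern to which misfit and risk criteria could then be attached.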

Version history

Version 2 (2025-10-01): The original submission was sent to the PCI Psychology Portal, which requested that the TOP Checklist and Disclosures Form be included as an appendix at the end of the preprint. This has now been added to the appendix of the document.

Version 1 (2025-09-16)


PsychArchives acquisition timestamp

2025-10-01T11:34:51Z
Made available on

2025-09-16T07:14:23Z
Made available on

2025-10-01T11:34:51Z
Publication status

other
Review status

notReviewed
Persistent Identifier

https://hdl.handle.net/20.500.12034/16625.2
Language of content

eng
Is related to

https://www.psycharchives.org/handle/20.500.12034/16627
Is related to

https://www.psycharchives.org/handle/20.500.12034/16624
Is related to

https://www.psycharchives.org/handle/20.500.12034/16626
Dewey Decimal Classification number(s)

150
DRO type

preprint