Multi-source item content, face validity and ecological-digital validity: Pretesting with experts, novices, and ChatGPT-4o in the development of the preliminary Observing Parent-Child Interaction Inventory (OPCII)
Measuring content validity, face validity and ecological-digital validity
This article is a preprint and has not been certified by peer review.
Author(s) / Creator(s)
Müller, Jörg Michael
Abstract / Description
Background: Many observational instruments have been developed to measure aspects of the parent-child relationship or interaction patterns. However, given the flood of terms and scale labels, these instruments often suffer from a lack of factorial, content, convergent or discriminant validity.
Research Question: We examine how one can assess content validity during item pretesting and test development using multi-source information from experts, novices and ChatGPT-4o. We ask: Are ratings on the suitability of items consistent within and between these groups? Does content validity depend on scale characteristics such as construct popularity or category breadth?
Method: The newly developed parent-child interaction inventory, namely the Observing Parent-Child Interaction Inventory (OPCII-1.0), comprises 19 scales with a total of 460 items. Each item was rated independently by six experts, six novices and six ChatGPT-4o prompts. A linear mixed model was applied to analyze the influence of group membership and item pool characteristics, with repeated measures nested within raters.
Results: Mean score differences emerged across groups, with experts rating items most conservatively, ChatGPT-4o most liberally and novices falling in between. Additionally, item pools differed significantly in terms of their average suitability scores. An exploratory factor analysis of rater agreement revealed that ChatGPT-4o ratings showed the highest and most consistent loading on a common factor of item suitability.
Discussion: Our multi-source evaluation provides evidence for content, face and ecological-digital validity. By implementing a transparent methodology, including detailed item generation instructions, we enhance the replicability of content validity assessments. This approach aims to initiate a convergent development process following decades of divergent construction of parent-child interaction instruments.
Keyword(s)
Content validity; Face validity; Ecological-digital validity; Pretesting; Observing Parent-Child Interaction Inventory; novices; experts; ChatGPT-4o; OPCII; LLM; Large Language Models
Date of first publication
2025-10-01
Publisher
PsychArchives
Citation
Müller, J. M. (2025). Multi-source item content, face validity and ecological-digital validity: Pretesting with experts, novices, and ChatGPT-4o in the development of the preliminary Observing Parent-Child Interaction Inventory (OPCII). PsychArchives.
https://doi.org/10.23668/psycharchives.21272
Müller_2025_OPCII_Multi_source_pretesting_content_validity_V2.pdf (Adobe PDF, 550.6 KB)
MD5: ac89e6217ca7b4ff38a06eb0084a11b0
Rationale for choice of sharing level: The understanding and measurement of content validity have been expanded, and comments are welcome.
Version 2 (2025-10-01): The original submission was sent to the PCI Psychology Portal, which requested the TOP Checklist and Disclosures Form as an appendix at the end of the preprint. Both have now been added to the appendix of the document.
PsychArchives acquisition timestamp: 2025-10-01T11:34:51Z
Made available on: 2025-09-16T07:14:23Z
Made available on: 2025-10-01T11:34:51Z
Publication status: other
Review status: notReviewed
Persistent Identifier: https://hdl.handle.net/20.500.12034/16625.2
Persistent Identifier: https://doi.org/10.23668/psycharchives.21272
Language of content: eng
Is related to: https://www.psycharchives.org/handle/20.500.12034/16627
Is related to: https://www.psycharchives.org/handle/20.500.12034/16624
Is related to: https://www.psycharchives.org/handle/20.500.12034/16626
Dewey Decimal Classification number(s): 150
DRO type: preprint
Visible tag(s): Content validity; Face validity; Ecological-digital validity; Pretesting; ChatGPT-4o