Conference Object

BREAKING NEWS: Psycholinguistic and Behavioural Differences in (Un)‐Trustworthy Online News Source Interaction (Poster)

Author(s) / Creator(s)

Aluffi, Pietro Alessandro
Meynent, Léo
Stachl, Clemens

Abstract / Description

The psychological need to share information, and its impact, has been amplified by the advent of online activities and online social media. The pervasiveness of unfiltered and non-curated information on online social media leads to exposure to untrustworthy information (i.e., information from low-factuality sources). This exposure makes it increasingly challenging for users to distinguish trustworthy from untrustworthy online news sources. Moreover, the social character of these platforms raises questions about the diffusion of untrustworthy news sources. Understanding the characteristics of users who share untrustworthy online news sources could help identify and prevent the diffusion of untrustworthy content. Past work has mainly focused on the experimental investigation of specific individual differences in the interaction with untrustworthy online news sources. To comprehensively analyse the human factors related to the online dissemination of untrustworthy news, a better understanding of users' characteristics in the complexity of a real-world setting is called for. Here, we aim to answer the following research question: What are the systematic differences in users' demographic, psycholinguistic, and online posting behaviour characteristics with respect to their interaction with (un)trustworthy news sources? To address this question, we use open-access social media data from the Reddit platform (Pushshift) and independent data on the political bias and trustworthiness of online media (e.g., Media Bias/Fact Check) to identify groups of users who tend to share less factual content on these platforms. We assign users to two groups: trustworthy news sharers and untrustworthy news sharers. For group assignment, we use community embeddings to ensure that users in the two groups belong to similar communities. For predictive modelling, we extract three types of features: psycholinguistic, online posting behaviour, and demographic.
First, we quantify psycholinguistic characteristics by calculating Linguistic Inquiry and Word Count (LIWC) scores from the text of comments that users write on posts containing a news article. We complement the LIWC scores with general text-based characteristics such as lexical diversity and readability scores. Second, we extract online posting behaviour features based on user popularity, posting frequency, and network size. Lastly, we use a Bidirectional Encoder Representations from Transformers (BERT) model trained on the RedDust dataset to infer users' demographic characteristics such as age and gender. We use the combined set of features in a machine learning approach (e.g., random forest) to predict group membership from Redditors' characteristics. To gain insight into which of these factors has a predominant impact on group membership, we perform a Shapley value analysis to extract and interpret feature importance. Insights from this project will contribute to a more nuanced understanding of how users' characteristics are associated with different consumption patterns of (un)trustworthy news sources, and may help predict how misinformation spreads. We anticipate that our findings can be used to identify groups who are more susceptible to consuming and spreading disinformation. This, in turn, could help individuals make informed decisions and avoid exposure to false information. Finally, we will discuss the implications of our findings for the development of interventions and policies to debunk disinformation and to increase media literacy.
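The group-assignment step described in the abstract (labelling users by the factuality of the news domains they share, then comparing users from similar communities) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the domain ratings, the 0.5 threshold, the cosine similarity over community embeddings, the greedy one-to-one nearest-neighbour matching, and all function and domain names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical factuality ratings per news domain (in the study these would
# come from an external source such as Media Bias/Fact Check).
# True means the domain is rated as low-factuality.
LOW_FACTUALITY: Dict[str, bool] = {
    "reliable-news.example": False,
    "wire-service.example": False,
    "dubious-daily.example": True,
    "clickbait-times.example": True,
}


def label_user(shared_domains: List[str], threshold: float = 0.5) -> str:
    """Label a user by the share of their rated links pointing to low-factuality domains."""
    rated = [d for d in shared_domains if d in LOW_FACTUALITY]
    if not rated:
        return "unrated"  # no rated domains: leave the user out of both groups
    low_share = sum(LOW_FACTUALITY[d] for d in rated) / len(rated)
    return "untrustworthy" if low_share >= threshold else "trustworthy"


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two community-embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_groups(
    embeddings: Dict[str, List[float]], labels: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Greedily pair each untrustworthy sharer with the most similar unused trustworthy sharer."""
    treated = [u for u, lab in labels.items() if lab == "untrustworthy"]
    pool = [u for u, lab in labels.items() if lab == "trustworthy"]
    pairs: List[Tuple[str, str]] = []
    for user in treated:
        if not pool:
            break  # no trustworthy sharers left to match
        best = max(pool, key=lambda c: cosine(embeddings[user], embeddings[c]))
        pairs.append((user, best))
        pool.remove(best)  # one-to-one matching without replacement
    return pairs


labels = {
    "u1": label_user(["dubious-daily.example", "clickbait-times.example"]),
    "u2": label_user(["reliable-news.example", "wire-service.example"]),
    "u3": label_user(["reliable-news.example"]),
}
embeddings = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "u3": [0.9, 0.1]}
print(match_groups(embeddings, labels))  # [('u1', 'u3')]
```

Greedy matching is used here only because it is short; optimal assignment or caliper-based matching would be equally plausible choices for keeping the two groups comparable.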
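The general text-based characteristics that complement the LIWC scores can be illustrated with two common measures: the type-token ratio as a lexical-diversity index and the Flesch Reading Ease score as a readability index. This sketch assumes these two particular measures, since the abstract does not name the exact indices; the regex tokenizer and the vowel-group syllable counter are crude approximations, and the function names are hypothetical.

```python
import re
from typing import Dict, List

# Words made of letters, optionally with one internal apostrophe ("don't").
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def tokenize(text: str) -> List[str]:
    return [w.lower() for w in WORD_RE.findall(text)]


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels, discounting a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def type_token_ratio(text: str) -> float:
    """Lexical diversity: unique tokens divided by total tokens."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def flesch_reading_ease(text: str) -> float:
    """206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    sentences = [s for s in re.split(r"[.!?\n]+", text) if s.strip()]
    words = tokenize(text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


def text_features(comments: List[str]) -> Dict[str, float]:
    """Pool one user's comments (one per line) and compute per-user text features."""
    text = "\n".join(comments)
    return {
        "type_token_ratio": type_token_ratio(text),
        "flesch_reading_ease": flesch_reading_ease(text),
    }


print(text_features(["The cat sat.", "The cat sat."]))
```

Note that the type-token ratio depends strongly on text length, so length-corrected variants may be preferable when users contribute very different amounts of text.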
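To make the Shapley value step concrete, the sketch below computes exact Shapley values for a single prediction by enumerating all feature coalitions and setting "absent" features to a baseline value (e.g., a feature mean). The toy scoring function, the feature names, and the baseline are hypothetical stand-ins for the trained random forest. Exact enumeration is only tractable for a handful of features; libraries such as SHAP use faster algorithms (e.g., TreeSHAP for tree ensembles) that may define the value of a missing feature differently.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, List

Features = Dict[str, float]


def shapley_values(
    model: Callable[[Features], float], x: Features, baseline: Features
) -> Features:
    """Exact Shapley values of model(x), with absent features set to baseline."""
    names: List[str] = list(x)
    n = len(names)

    def value(coalition: frozenset) -> float:
        # Features in the coalition keep their observed value; the rest use the baseline.
        point = {f: (x[f] if f in coalition else baseline[f]) for f in names}
        return model(point)

    phi: Features = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without f.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))  # marginal contribution of f
        phi[f] = total
    return phi


# Hypothetical scoring function standing in for the trained classifier.
def toy_model(p: Features) -> float:
    return (
        0.5 * p["negative_emotion"]
        + 0.3 * p["posting_frequency"]
        + 0.2 * p["negative_emotion"] * p["posting_frequency"]
    )


x = {"negative_emotion": 1.0, "posting_frequency": 1.0}
baseline = {"negative_emotion": 0.0, "posting_frequency": 0.0}
phi = shapley_values(toy_model, x, baseline)
print({k: round(v, 6) for k, v in phi.items()})
# {'negative_emotion': 0.6, 'posting_frequency': 0.4}
```

The interaction term is split equally between the two features, and the values sum to `toy_model(x) - toy_model(baseline)`, which is the efficiency property that makes Shapley values a natural way to attribute a prediction to individual features.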

Persistent Identifier

https://doi.org/10.23668/psycharchives.13024

Date of first publication

2023-07-24

Is part of

Big Data & Research Syntheses 2023, Frankfurt, Germany

Publisher

ZPID (Leibniz Institute for Psychology)

Citation

  • Author(s) / Creator(s)
    Aluffi, Pietro Alessandro
  • Author(s) / Creator(s)
    Meynent, Léo
  • Author(s) / Creator(s)
    Stachl, Clemens
  • PsychArchives acquisition timestamp
    2023-07-24T10:51:52Z
  • Made available on
    2023-07-24T10:51:52Z
  • Date of first publication
    2023-07-24
  • Publication status
    unknown
  • Review status
    unknown
  • External description on another website
    http://www.ressyn-bigdata.org
  • Persistent Identifier
    https://hdl.handle.net/20.500.12034/8523
  • Persistent Identifier
    https://doi.org/10.23668/psycharchives.13024
  • Language of content
    eng
  • Publisher
    ZPID (Leibniz Institute for Psychology)
  • Is part of
    Big Data & Research Syntheses 2023, Frankfurt, Germany
  • Is related to
    https://hdl.handle.net/20.500.12034/8510
  • Dewey Decimal Classification number(s)
    150
  • Title
    BREAKING NEWS: Psycholinguistic and Behavioural Differences in (Un)‐Trustworthy Online News Source Interaction (Poster)
    en
  • DRO type
    conferenceObject
  • Visible tag(s)
    ZPID Conferences and Workshops