BREAKING NEWS: Psycholinguistic and Behavioural Differences in (Un)‐Trustworthy Online News Source Interaction (Poster)
Author(s) / Creator(s)
Aluffi, Pietro Alessandro
Meynent, Léo
Stachl, Clemens
Abstract / Description
The psychological need to share information, and the impact of doing so, has been amplified by the advent of online activities and online social media. The pervasiveness of unfiltered and non-curated information on social media exposes users to untrustworthy information (i.e., low-factual information sources). This exposure makes it increasingly challenging for users to distinguish trustworthy from untrustworthy online news sources. Moreover, the social character of these platforms raises questions about the diffusion of untrustworthy news sources. Understanding the characteristics of users who share untrustworthy online news sources could help identify and prevent the diffusion of untrustworthy content. Past work has mainly focused on the experimental investigation of specific individual differences in the interaction with untrustworthy online news sources. To comprehensively analyse the human factors related to the online dissemination of untrustworthy news, a better understanding of users’ characteristics in the complexity of a real-world setting is called for. Here, we aim to answer the following research question: What are the systematic differences in users’ demographic characteristics, psycholinguistic characteristics, and online posting behaviour with regard to interaction with (un)trustworthy news sources? To address this question, we use open-access social media data from the Reddit platform (Pushshift) and independent data on the political bias and trustworthiness of online media (e.g., Media Bias/Fact Check) to identify groups of users who tend to share less factual content on the platform. We assign users to two groups: trustworthy news sharers and untrustworthy news sharers. For group assignment, we use community embeddings to ensure that users in the two groups belong to similar communities. For predictive modelling, we extract three types of features: psycholinguistic, online posting behaviour, and demographic.
First, we quantify psycholinguistic characteristics by calculating Linguistic Inquiry and Word Count (LIWC) scores from the text of comments that users leave on posts containing a news article. We complement the LIWC scores with general text-based characteristics such as lexical diversity and readability scores. Second, we extract online posting behaviour features based on the popularity of the user, posting frequency, and network size. Lastly, we use a Bidirectional Encoder Representations from Transformers (BERT) model trained on the RedDust dataset to infer users’ demographic characteristics such as age and gender. We use the combined set of features in a machine learning approach (e.g., random forest) to predict group membership from Redditors’ characteristics. Furthermore, to gain insight into which of these factors has a predominant impact on group membership, we perform a Shapley value analysis to extract and interpret feature importance. Insights from this project will contribute to a more nuanced understanding of how users’ characteristics are associated with different consumption patterns of (un)trustworthy news sources, and may help predict how misinformation spreads. We anticipate that our findings can be used to identify groups that are more susceptible to consuming and spreading disinformation. This in turn could help individuals make informed decisions and avoid exposure to false information. Finally, we will discuss the implications of our findings for the development of interventions and policies to debunk disinformation and to increase media literacy.
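As an illustration of the Shapley value idea mentioned in the abstract, the following is a minimal sketch rather than the authors' pipeline. It computes exact Shapley values by averaging each feature group's marginal contribution over all orderings. The function names, the three feature-group labels, and every number in `toy_value` are hypothetical and chosen only for illustration; they are not results from the poster. In practice, a model-based approximation would typically replace this brute-force enumeration.

```python
from itertools import permutations


def shapley_values(features, value):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        coalition = frozenset()
        for f in order:
            with_f = coalition | {f}
            # Marginal gain from adding f to the features already present
            totals[f] += value(with_f) - value(coalition)
            coalition = with_f
    return {f: t / len(orderings) for f, t in totals.items()}


def toy_value(coalition):
    """Hypothetical gain in predictive score for a subset of feature groups.

    All numbers are made up for illustration only.
    """
    score = 0.0
    if "psycholinguistic" in coalition:
        score += 0.10
    if "posting_behaviour" in coalition:
        score += 0.06
    if "demographic" in coalition:
        score += 0.02
    # Assumed interaction: the two groups are worth more together
    if {"psycholinguistic", "posting_behaviour"} <= coalition:
        score += 0.04
    return score


FEATURES = ["psycholinguistic", "posting_behaviour", "demographic"]
phi = shapley_values(FEATURES, toy_value)

for name in FEATURES:
    print(f"{name}: {phi[name]:.3f}")
```

In this toy setting the interaction term is split equally between the two groups that produce it, and the attributions sum to the score of the full feature set, which is the property that makes Shapley values convenient for interpreting feature importance.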
Date of first publication
2023-07-24
Is part of
Big Data & Research Syntheses 2023, Frankfurt, Germany
Publisher
ZPID (Leibniz Institute for Psychology)
Aluffi_Poster.pdf (Adobe PDF, 419.71 KB, MD5: e109489b737a332bd9f2a0b10bef4ea7)
PsychArchives acquisition timestamp
2023-07-24T10:51:52Z
Made available on
2023-07-24T10:51:52Z
Publication status
unknown
Review status
unknown
External description on another website
http://www.ressyn-bigdata.org
Persistent Identifier
https://hdl.handle.net/20.500.12034/8523
https://doi.org/10.23668/psycharchives.13024
Language of content
eng
Is related to
https://hdl.handle.net/20.500.12034/8510
Dewey Decimal Classification number(s)
150
DRO type
conferenceObject
Visible tag(s)
ZPID Conferences and Workshops