Conference Object

Usability of web scraping of open-source discussions for identifying key beliefs

Author(s) / Creator(s)

Gordoni, Galit
Steinmetz, Holger
Schmidt, Peter

Abstract / Description

Background: Recent years have brought tremendous interest in the collection and use of Big Data. While the discussion initially focused largely on practical and societal issues, researchers have begun to consider the use of Big Data for scientific purposes. In psychology, there is increasing interest in the usability of user-generated data for addressing psychological research questions (Adjerid & Kelley, 2018; Harlow & Oswald, 2016). As a prominent data collection method, web scraping (i.e., the automated finding and extraction of data from online sources) has been used for research on eating disorders (Moessner et al., 2018), mental toughness (Gucciardi, 2017), and personality (Farnadi et al., 2016). Big Data analytics is frequently exploratory in nature. In contrast, researchers increasingly call for its use in theory-relevant research (e.g., Shmueli, 2010). Although web scraping is increasingly applied, it is still unclear whether user-generated posts can serve as a valuable data source in theory-driven empirical studies. In this study we address the lack of knowledge on the usability of user-generated data for assessing research questions concerning people's beliefs (Eagly & Chaiken, 1993). As a relevant theoretical framework that focuses on the fundamental role of beliefs in interventions, we draw on a well-replicated social psychological theory, the Theory of Planned Behavior (TPB; Ajzen, 1991). The theory integrates the cognitive foundation of motivational and decision processes (i.e., the beliefs) with attitudes, perceptions of social legitimization, efficacy, and feasibility of the behavior in question (Fishbein & Ajzen, 2010). Briefly, the theory claims that deliberate behavior is mainly determined by the intention to perform the behavior.
The intention, in turn, is a function of the attitude towards the behavior (i.e., the perceived attractiveness of the behavior), the subjective norm (i.e., the perceived expectations of important others towards conducting the behavior), and the perceived behavioral control (i.e., the perceived feasibility and control with regard to the behavior). Furthermore, the theory claims that these motivationally relevant factors are based on beliefs about positive and negative consequences of the behavior, the opinions of specific others, and barriers and facilitators. The TPB serves as a central theoretical framework for understanding and changing behaviors. Since changing beliefs is the essence of intervention approaches, knowledge about potent beliefs regarding potential benefits, costs, social expectations, barriers, and facilitators of the behavior is not only of theoretical value but also provides the basis for practical endeavours to change behaviors (Steinmetz et al., 2016). The initial stage of a TPB-driven study includes identifying motivationally relevant key beliefs via a qualitative pilot study. While this procedure (Ajzen & Fishbein, 1980; Fishbein & Ajzen, 2010) has been fruitful for identifying relevant beliefs over decades of TPB research, it has two limitations: the number of respondents is very small, and the approach runs the danger of reactive responses. Especially for an unfamiliar behavior, the comments may lack validity and may not concern the beliefs that occur in a natural decision process. In this study we focus on the potential of open-source discussions to serve as an additional data source that avoids the pitfalls of self-reported answers.
User comments are produced by individuals concerned with the consequences of the behavior in question or with expected difficulties in conducting it. They are formulated in a natural setting, with no potential response bias due to factors such as interviewer effects, topic complexity, and topic sensitivity.

Objectives: We aim to advance knowledge on the usability of integrating web scraping of online discussions into the initial stage of a theory-driven belief study, for identifying key beliefs underlying behaviors of interest.

Research questions: We use the behavior of Big Data adoption in organizations as an illustrative case for testing the following questions:
1. What are the key beliefs concerning Big Data adoption (behavioral beliefs, normative beliefs, and control beliefs)?
2. Do key behavioral, normative, and control beliefs concerning Big Data adoption identified in user-generated posts differ from those identified in self-report surveys?

Method: We conducted a web scraping study of discussion boards on Big Data usage in Israel, generated between June and August 2018. Discussions appeared mainly after online articles (41%), in social networks (25%), and in forums (19%). The unit of analysis was the complete discussion, from the opening post to the closing one. A total of 353 authentic discussions (i.e., containing at least two comments) were scraped. Content analysis was conducted manually for a sample of 148 authentic discussions. First, we applied the methodology used for identifying key beliefs in TPB-driven studies (de Leeuw et al., 2015), counting the number of times a given category of comment content appeared across discussions. Second, following Landers et al. (2016), we compared the beliefs found via web scraping with representative surveys in French companies (Raguseo, 2018) and in German companies (Commerzbank AG, 2018). These external data sources serve as a base rate for testing the replicability of key beliefs found in the web scraping data.
For comparison we used, for example, the response distribution of the multiple-response question “What are the benefits to companies from the systematic use of digital data?”, asked in the German company survey (n = 2004) conducted in 2017.

Results: Initial and descriptive results will be presented. Content analysis resulted in classification of the 148 discussions into semantic units representing the advantages and disadvantages of Big Data adoption, a list of potential stakeholders, and factors that could impede or facilitate adoption. Initial results show similarity in the content of beliefs and in their frequency rank across the independent data sources. For example, the most frequently cited advantage in both data sources (the German survey and the web scraping data) was better decision making (cited by 58% of survey participants and in 41% of scraped discussions that cited advantages).

Conclusions and expected implications: Drawing upon web scraping of open-source discussions, we present initial results supporting the usefulness of web scraping as an observational data collection method in the first stages of identifying key beliefs underlying specific behaviors for theory-driven belief-scale development.

References:
Adjerid, I., & Kelley, K. (2018). Big data in psychology: A framework for research advancement. American Psychologist, 73(7), 899–917. http://dx.doi.org/10.1037/amp0000190
Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50(2), 179–211.
Ajzen, I., & Fishbein, M. (1980). Understanding attitudes and predicting social behavior. Englewood Cliffs, NJ: Prentice-Hall.
Commerzbank Initiative Unternehmerperspektiven (2017). The Raw Material of the 21st century: Big Data, Smart Data – Lost Data? Retrieved from https://www.unternehmerperspektiven.de/portal/media/unternehmerperspektiven/up-startseite/2018_04_18_FL_UP_Studie_online_2018_EN.pdf
De Leeuw, A., Valois, P., Ajzen, I., & Schmidt, P. (2015). Using the theory of planned behavior to identify key beliefs underlying pro-environmental behavior in high-school students: Implications for educational interventions. Journal of Environmental Psychology, 42, 128–138.
Eagly, A. H., & Chaiken, S. (1993). The psychology of attitudes. Harcourt Brace Jovanovich College Publishers.
Farnadi, G., Sitaraman, G., Sushmita, S., Celli, F., Kosinski, M., Stillwell, D., ... & De Cock, M. (2016). Computational personality recognition in social media. User Modeling and User-Adapted Interaction, 26(2-3), 109–142.
Fishbein, M., & Ajzen, I. (2010). Predicting and changing behavior: The reasoned action approach. Psychology Press.
Gucciardi, D. F. (2017). Mental toughness: Progress and prospects. Current Opinion in Psychology, 16, 17–23.
Harlow, L. L., & Oswald, F. L. (2016). Big data in psychology: Introduction to special issue. Psychological Methods, 21(4), 447–457. http://doi.org/10.1037/met0000120
Landers, R. N., Brusso, R. C., Cavanaugh, K. J., & Collmus, A. B. (2016). A primer on theory-driven web scraping: Automatic extraction of big data from the Internet for use in psychological research. Psychological Methods, 21(4), 475–492.
Moessner, M., Feldhege, J., Wolf, M., & Bauer, S. (2018). Analyzing big data in social media: Text and network analyses of an eating disorder forum. International Journal of Eating Disorders, 51(7), 656–667.
Raguseo, E. (2018). Big data technologies: An empirical investigation on their adoption, benefits and risks for companies. International Journal of Information Management, 38(1), 187–195.
Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289–310.
Steinmetz, H., Knappstein, M., Ajzen, I., Schmidt, P., & Kabst, R. (2016). How effective are behavior change interventions based on the theory of planned behavior? Zeitschrift für Psychologie, 224(3), 216–233.
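
Illustrative sketch: The two analysis steps named in the Method section (counting how often a belief category appears across manually coded discussions, then comparing frequency ranks with an external survey) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the abstract does not say how rank similarity was assessed, so the Spearman rank correlation is an assumption, all function names are hypothetical, and the data are invented toy values rather than the study's results.

```python
from collections import Counter


def category_frequencies(coded_discussions):
    """Count in how many discussions each belief category appears.

    Each discussion is the collection of category labels a human coder
    assigned to it; a category counts at most once per discussion.
    """
    counts = Counter()
    for categories in coded_discussions:
        counts.update(set(categories))
    return counts


def share_of_discussions(counts, n_discussions):
    """Convert raw counts to the share of discussions citing each category."""
    return {category: n / n_discussions for category, n in counts.items()}


def ranks(scores):
    """Rank categories from most (1) to least frequent, averaging ties."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    result = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for category in ordered[i:j + 1]:
            result[category] = average_rank
        i = j + 1
    return result


def spearman(scores_a, scores_b):
    """Spearman rank correlation over the categories both sources share."""
    shared = sorted(set(scores_a) & set(scores_b))
    ranks_a = ranks({c: scores_a[c] for c in shared})
    ranks_b = ranks({c: scores_b[c] for c in shared})
    xs = [ranks_a[c] for c in shared]
    ys = [ranks_b[c] for c in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one source has no rank variation
    return cov / (var_x * var_y) ** 0.5


# Toy, invented codings (NOT the study's data): one list of labels per discussion.
toy_discussions = [
    ["better decisions", "cost savings"],
    ["better decisions", "better decisions", "new products"],
    ["cost savings"],
    ["better decisions"],
]
toy_counts = category_frequencies(toy_discussions)
toy_shares = share_of_discussions(toy_counts, len(toy_discussions))

# Toy, invented survey shares for the same categories (NOT the survey's data).
toy_survey = {"better decisions": 0.6, "cost savings": 0.4, "new products": 0.3}

print(toy_counts.most_common())
print(spearman(toy_shares, toy_survey))
```

Counting each category once per discussion mirrors the stated unit of analysis (the complete discussion); a per-comment count would be a different design choice.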

Date of first publication

2019-05-28

Is part of

Research Synthesis 2019 incl. Pre-Conference Symposium Big Data in Psychology, Dubrovnik, Croatia

Publisher

ZPID (Leibniz Institute for Psychology Information)

Citation

Gordoni, G., Steinmetz, H., & Schmidt, P. (2019). Usability of web scraping of open-source discussions for identifying key beliefs. ZPID (Leibniz Institute for Psychology Information). https://doi.org/10.23668/psycharchives.2469
  • PsychArchives acquisition timestamp
    2019-06-11T13:06:07Z
  • Made available on
    2019-06-11T13:06:07Z
  • Persistent Identifier
    https://hdl.handle.net/20.500.12034/2095
  • Persistent Identifier
    https://doi.org/10.23668/psycharchives.2469
  • Language of content
    eng
  • Dewey Decimal Classification number(s)
    150
  • DRO type
    conferenceObject
  • Leibniz institute name(s) / abbreviation(s)
    ZPID
  • Visible tag(s)
    ZPID Conferences and Workshops