Launched in 2005, Reddit is a popular social news website widely known as ‘the front page of the internet.’ Researchers have studied Reddit for more than a decade, with considerable impact in fields including public health (Park and Conway, 2018b), politics (Soliman et al., 2019), religion (Lundmark, 2013), cybersecurity (Jasser et al., 2021), mental health (Choudhury and De, 2014), and the humanities (Willaert et al., 2021), among others (Amaya et al., 2019; Proferes et al., 2021). To date, however, no systematic work details the methodological practices that investigators follow when using Reddit data. This chapter makes three main contributions. First, it endorses Reddit’s suitability as a solid corpus for research, while raising awareness of the ethical responsibilities that come with handling such rich, diverse, and unrestricted data. Second, it provides a detailed manual of the data science process to guide prospective researchers and practitioners working with Reddit data. Third, it compiles state-of-the-art analytical methods for which the platform shows substantial potential.
The Reddit Data Analysis Pipeline for Researchers