Authors:

(1) Yi-Ling Chung, The Alan Turing Institute ([email protected]);

(2) Gavin Abercrombie, The Interaction Lab, Heriot-Watt University ([email protected]);

(3) Florence Enock, The Alan Turing Institute ([email protected]);

(4) Jonathan Bright, The Alan Turing Institute ([email protected]);

(5) Verena Rieser, The Interaction Lab, Heriot-Watt University and now at Google DeepMind ([email protected]).

Abstract and 1 Introduction

2 Background

3 Review Methodology

4 Defining counterspeech

4.1 Classifying counterspeech

5 The Impact of Counterspeech

6 Computational Approaches to Counterspeech and 6.1 Counterspeech Datasets

6.2 Approaches to Counterspeech Detection and 6.3 Approaches to Counterspeech Generation

7 Future Perspectives

8 Conclusion, Acknowledgements, and References

6 Computational Approaches to Counterspeech

In this section, we shift focus to the literature on counterspeech emerging from computer science. We cover three topics in particular: the datasets used in these studies, approaches to counterspeech detection, and approaches to counterspeech generation.

6.1 Counterspeech Datasets

Approaches to counterspeech collection focus on gathering two different kinds of datasets: spontaneously produced comments crawled from social media platforms, and responses deliberately written to counter hate speech. In the first case, content is retrieved based on keywords/hashtags related to targets of interest (Mathew et al., 2018; Vidgen et al., 2020; He et al., 2022; Vidgen et al., 2021) or from pre-defined counterspeech accounts (Garland et al., 2020). Largely because of its easily accessible API for data retrieval, the majority of datasets are collected from Twitter (Mathew et al., 2018; Procter et al., 2019; Garland et al., 2020; Kennedy et al., 2020; Vidgen et al., 2020; He et al., 2022; Goffredo et al., 2022; Toliyat et al., 2022; Lin et al., 2022), and only a few are retrieved from YouTube (Mathew et al., 2019; Kennedy et al., 2020; Priyadharshini et al., 2022) and Reddit (Kennedy et al., 2020; Vidgen et al., 2021; Lee et al., 2022; Yu et al., 2022) (though it is worth noting that at the time of writing the Twitter API was becoming considerably less accessible).
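The keyword/hashtag and account-based retrieval step described above can be sketched in a few lines. This is an illustrative filter applied to already-retrieved posts; the field names, hashtag list, and account set are hypothetical, not drawn from any cited dataset.

```python
# Illustrative sketch of keyword/hashtag- and account-based candidate
# selection when crawling social media for counterspeech data.
# Field names ("text", "author") and the keyword/account lists are
# hypothetical placeholders.

COUNTER_HASHTAGS = {"#nohate", "#counterspeech"}
KNOWN_COUNTER_ACCOUNTS = {"counter_account_example"}  # pre-defined accounts (cf. Garland et al., 2020)

def is_candidate(post: dict) -> bool:
    """Flag a post as a counterspeech candidate if it comes from a known
    counterspeech account or contains a tracked hashtag."""
    if post.get("author") in KNOWN_COUNTER_ACCOUNTS:
        return True
    text = post.get("text", "").lower()
    return any(tag in text for tag in COUNTER_HASHTAGS)

posts = [
    {"author": "user1", "text": "Hate has no place here #NoHate"},
    {"author": "user2", "text": "Just a regular post"},
    {"author": "counter_account_example", "text": "Responding to abuse calmly."},
]
candidates = [p for p in posts if is_candidate(p)]
```

Candidates selected this way would still need human annotation, since keyword matches are noisy proxies for actual counterspeech.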

In the second category, counterspeech is written by crowd workers (Qian et al., 2019) or by operators expert in counterspeech writing (Chung et al., 2019, 2021b). While this approach is expected to yield relatively controlled and tailored responses, writing counterspeech from scratch is time-consuming and requires substantial human effort. To address this issue, advanced generative language models have been adopted to automatically produce counterspeech (Tekiroğlu et al., 2020; Fanton et al., 2021; Bonaldi et al., 2022), as we discuss further below.
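The expert-in-the-loop pipeline described above, in which a generative model drafts counterspeech that a human operator then reviews or edits before it enters the dataset, can be sketched as follows. The `draft_counterspeech` generator here is a trivial stand-in, not the actual models used in the cited work.

```python
# Hypothetical sketch of a generate-then-review data collection loop
# (cf. Tekiroğlu et al., 2020; Fanton et al., 2021): a model drafts a
# response and a human expert accepts, edits, or rewrites it before it
# is added to the dataset. The generator is a placeholder.

def draft_counterspeech(hate_message: str) -> str:
    # Stand-in for a fine-tuned generative language model.
    return "Generalisations like this are unfair; every person deserves respect."

def collect_pair(hate_message: str, review) -> dict:
    """Pair a hate message with a human-approved counterspeech response.

    `review` is a callable representing the expert's edit step; it
    receives the machine draft and returns the final text.
    """
    draft = draft_counterspeech(hate_message)
    final = review(draft)  # the expert may keep, edit, or rewrite the draft
    return {"hate_speech": hate_message, "counterspeech": final}

# Example: the reviewer keeps the draft unchanged.
pair = collect_pair("Example of an abusive message.", review=lambda draft: draft)
```

The design point is that machine generation reduces the writing burden while the human review step preserves the quality control that made expert-authored datasets attractive in the first place.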

Regarding the granularity of annotation taxonomies, most existing datasets provide binary labels (counterspeech/non-counterspeech) (Garland et al., 2020; Vidgen et al., 2020; He et al., 2022; Vidgen et al., 2021), while three datasets additionally annotate the type of counterspeech (Mathew et al., 2018, 2019; Chung et al., 2019). In terms of hate incidents, datasets are available for several hate phenomena, such as Islamophobia (Chung et al., 2019) and East Asian prejudice during the COVID-19 pandemic (Vidgen et al., 2020; He et al., 2022). The aforementioned datasets are mostly collected and analyzed at the level of individual texts rather than discourse or conversation (e.g., multi-turn dialogues (Bonaldi et al., 2022)). Most of the datasets are in English; only a few cover other languages, including Italian (Chung et al., 2019; Goffredo et al., 2022), French (Chung et al., 2019), German (Garland et al., 2020), and Tamil (Priyadharshini et al., 2022).
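The difference between binary and type-level annotation can be illustrated with a minimal schema. The label set below is indicative of the kinds of counterspeech strategies found in typed taxonomies; the exact labels of each cited dataset differ, so treat this as an assumed, illustrative vocabulary.

```python
# Minimal illustration of binary vs. typed counterspeech annotation.
# The type taxonomy here is indicative only; each dataset defines its own.

from dataclasses import dataclass
from typing import Optional

COUNTERSPEECH_TYPES = {"facts", "humour", "denouncing", "empathy", "questioning"}

@dataclass
class Annotation:
    text: str
    is_counterspeech: bool          # binary layer, common to most datasets
    cs_type: Optional[str] = None   # finer-grained layer (cf. Mathew et al., 2019)

    def __post_init__(self):
        if self.cs_type is not None:
            if not self.is_counterspeech:
                raise ValueError("a counterspeech type implies is_counterspeech=True")
            if self.cs_type not in COUNTERSPEECH_TYPES:
                raise ValueError(f"unknown counterspeech type: {self.cs_type!r}")

# Binary-only annotation, as in most datasets:
binary_ann = Annotation("That's not acceptable here.", is_counterspeech=True)
# Typed annotation, as in the finer-grained datasets:
typed_ann = Annotation("Actually, the statistics show the opposite.",
                       is_counterspeech=True, cs_type="facts")
```

Keeping the type field optional mirrors the situation in practice: typed datasets can be collapsed to binary labels for cross-dataset comparison, but not vice versa.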

This paper is available on arxiv under CC BY-SA 4.0 DEED license.