HaSpeeDe3@Evalita 2023
The HaSpeeDe 3 (Hate Speech Detection) shared task will be organized within Evalita 2023, the 8th evaluation campaign of Natural Language Processing and Speech tools for Italian, which will be held in Parma on the 7th-8th September 2023.
Introduction and Motivation
Online hateful content, or Hate Speech (HS), is characterised by some key aspects (such as virality, or presumed anonymity) which distinguish it from offline communication and make it potentially more dangerous and hurtful. Therefore, its identification has become a crucial mission in many fields.
From an NLP perspective, much attention has been paid to HS and its identification, together with its many facets and related phenomena such as offensive and abusive language. This is shown by the proliferation, especially in the last few years, of contributions on this matter ([9], [3], [11], [12], [16] to name a few), corpora and lexica (e.g. [13], [15], [2]), dedicated workshops, and shared tasks within national (GermEval, HASOC, IberLEF) and international (SemEval) evaluation campaigns (see in particular [1]).
Indeed, this shared task has reached its third edition, attracting a steadily growing and highly diverse audience, many of whom participated in both previous editions, organised within Evalita 2018 [6] and Evalita 2020 [17].
The new edition differs from the previous ones while maintaining continuity in analysing and countering HS on social media. HaSpeeDe [4] and HaSpeeDe 2 [17] focused on HS against immigrants, Muslims and Roma, whereas HaSpeeDe 3 explores hate speech in strongly polarised debates, in particular on political and religious topics. This focus leads the task to explore contextual information about the authors of the tweets through their social media networks, since political and religious self-identification may lead to sharp conflict with members of other political affiliations or faiths.
Two datasets called PolicyCorpusXL and ReligiousHate are available:
- PolicyCorpusXL [18], [19] contains 7,000 tweets about political debates.
- ReligiousHate [20] is composed of 3,000 tweets about the three main monotheistic religions, namely Christianity, Islam and Judaism.
Task description
This third edition focuses on Hate Speech on Twitter and proposes two tasks:
- Task A: Political Hate Speech Detection: a binary classification task aimed at determining whether the message contains Hate Speech or not. It is run in two settings:
  - Textual: participants can only use the provided textual content of the tweets from PolicyCorpusXL for development.
  - Contextual: participants can employ for development the textual content of the tweets plus contextual information that will be given to them (i.e., metadata of the tweet and author, friends, retweets, and reply relations).
- Task B: Cross-domain Hate Speech Detection: a binary classification task with test data from different domains. The main objective is to explore cross-domain hate speech detection in two evaluation settings:
  - XPoliticalHate: the test set will consist of tweets from PolicyCorpusXL (the same as in Task A), but participants can use any kind of external data from other hate domains.
  - XReligiousHate: the task here is to recognise religious hate, so the test set will consist of tweets from the ReligiousHate corpus.
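The four evaluation settings above can be summarised in a small lookup structure, which may help when organising experiments. The following is only an illustrative sketch: the class name `Setting`, its fields, and the helper `settings_tested_on` are hypothetical and not part of any official task material, and the permitted inputs for XReligiousHate are left unset because the description above does not state them.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Setting:
    """One evaluation setting of the shared task (hypothetical helper type)."""
    task: str                                   # "A" or "B"
    name: str                                   # setting name as used in the task description
    test_corpus: str                            # corpus the test tweets are drawn from
    allowed_inputs: Optional[Tuple[str, ...]]   # None = not stated in the description


# Values below restate the task description; nothing beyond it is assumed.
SETTINGS = (
    Setting("A", "Textual", "PolicyCorpusXL", ("tweet text",)),
    Setting("A", "Contextual", "PolicyCorpusXL",
            ("tweet text", "provided contextual information")),
    Setting("B", "XPoliticalHate", "PolicyCorpusXL",
            ("external data from other hate domains",)),
    Setting("B", "XReligiousHate", "ReligiousHate", None),
)


def settings_tested_on(corpus: str) -> List[str]:
    """Return the names of all settings whose test set comes from `corpus`."""
    return [s.name for s in SETTINGS if s.test_corpus == corpus]


if __name__ == "__main__":
    for s in SETTINGS:
        inputs = ", ".join(s.allowed_inputs) if s.allowed_inputs else "not stated"
        print(f"Task {s.task} / {s.name}: test on {s.test_corpus}; inputs: {inputs}")
```

All four settings are binary classification problems; three of them are tested on PolicyCorpusXL, and only XReligiousHate is tested on the ReligiousHate corpus.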
Important dates
- 7th February 2023: development data available to participants
- 30th April 2023: registration closes
- 7th May 2023: test data available to participants
- 2nd-19th May 2023: evaluation windows
- 30th May 2023: assessment returned to participants
- 14th June 2023: final reports due to task organizers
- 10th July 2023: review deadline
- 25th July 2023: camera ready version deadline
- 7th-8th September 2023: final workshop in Parma
References
[1] Valerio Basile, Cristina Bosco, Elisabetta Fersini, Debora Nozza, Viviana Patti, Francisco Rangel, Paolo Rosso, and Manuela Sanguinetti. SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter. In Proceedings of SemEval 2019, pages 54–63. Association for Computational Linguistics, 2019.
[2] Elisa Bassignana, Valerio Basile, and Viviana Patti. Hurtlex: A Multilingual Lexicon of Words to Hurt. In Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), pages 1–6. CEUR.org, 2018.
[3] Aditya Bohra, Deepanshu Vijay, Vinay Singh, Syed S Akhtar, and Manish Shrivastava. A Dataset of Hindi-English Code-Mixed Social Media Text for Hate Speech Detection. In Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media, pages 36–41. Association for Computational Linguistics (ACL), 2018.
[4] Cristina Bosco, Felice Dell’Orletta, Fabio Poletto, Manuela Sanguinetti, and Maurizio Tesconi. Overview of the EVALITA 2018 Hate Speech Detection Task. In Proceedings of the 6th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA’18), 2018.
[5] Arthur TE Capozzi, Mirko Lai, Valerio Basile, Fabio Poletto, Manuela Sanguinetti, Cristina Bosco, Viviana Patti, Giancarlo Ruffo, Cataldo Musto, Marco Polignano, et al. Computational linguistics against hate: Hate speech detection and visualization on social media in the "Contro L’Odio" project. In 6th Italian Conference on Computational Linguistics, CLiC-it 2019, volume 2481, pages 1–6. CEUR-WS, 2019.
[6] Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso. EVALITA 2018: Overview of the 6th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. In Proceedings of Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018). CEUR.org, 2018.
[7] Gloria Comandini and Viviana Patti. An Impossible Dialogue! Nominal Utterances and Populist Rhetoric in an Italian Twitter Corpus of Hate Speech against Immigrants. In Proceedings of the Third Workshop on Abusive Language Online, pages 163–171. Association for Computational Linguistics, 2019.
[8] Gloria Comandini, Manuela Speranza, and Bernardo Magnini. Effective Communication without Verbs? Sure! Identification of Nominal Utterances in Italian Social Media Texts. In Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), Torino, Italy, December 10-12, 2018, volume 2253 of CEUR Workshop Proceedings. CEUR-WS.org, 2018.
[9] Paula Fortuna, João Rocha da Silva, Juan Soler-Company, Leo Wanner, and Sérgio Nunes. A Hierarchically-Labeled Portuguese Hate Speech Dataset. In Proceedings of the Third Workshop on Abusive Language Online, pages 94–104, Florence, Italy, August 2019. Association for Computational Linguistics.
[10] Chiara Francesconi, Cristina Bosco, Fabio Poletto, and Manuela Sanguinetti. Error Analysis in a Hate Speech Detection Task: The case of HaSpeeDe-TW at EVALITA 2018. In CLiC-it, volume 2481 of CEUR Workshop Proceedings. CEUR-WS.org, 2019.
[11] Lei Gao, Alexis Kuppersmith, and Ruihong Huang. Recognizing Explicit and Implicit Hate Speech Using a Weakly Supervised Two-path Bootstrapping Approach. CoRR, abs/1710.07394:774–782, 2017.
[12] David Jurgens, Eshwar Chandrasekharan, and Libby Hemphill. A Just and Comprehensive Strategy for Using NLP to Address Online Abuse. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3658–3666. Association for Computational Linguistics (ACL), 2019.
[13] Rogers De Pelle and Viviane P Moreira. Offensive Comments in the Brazilian Web: a Dataset and Baseline Results. In Proceedings of the Fifth Brazilian Workshop on Social Network Analysis and Mining (BraSNAM 2016), pages 510–519, 2016.
[14] Fabio Poletto, Marco Stranisci, Manuela Sanguinetti, Viviana Patti, and Cristina Bosco. Hate Speech Annotation: Analysis of an Italian Twitter Corpus. In Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017). CEUR, December 2017.
[15] Manuela Sanguinetti, Fabio Poletto, Cristina Bosco, Viviana Patti, and Marco Stranisci. An Italian Twitter Corpus of Hate Speech against Immigrants. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC’18), pages 2798–2895. European Language Resources Association (ELRA), 2018.
[16] Tommaso Caselli, Valerio Basile, Jelena Mitrović, Inga Kartoziya, and Michael Granitzer. I Feel Offended, Don’t Be Abusive! Implicit/Explicit Messages in Offensive and Abusive Language. In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020).
[17] Manuela Sanguinetti, Gloria Comandini, Elisa Di Nuovo, Simona Frenda, Marco Antonio Stranisci, Cristina Bosco, Tommaso Caselli, Viviana Patti, and Irene Russo. HaSpeeDe 2 @ EVALITA 2020: Overview of the EVALITA 2020 Hate Speech Detection Task. In Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020). CEUR.org, 2020.
[18] Fabio Celli, Mirko Lai, Armend Duzha, Cristina Bosco, and Viviana Patti. Policycorpus XL: An Italian Corpus for the Detection of Hate Speech Against Politics. In Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021), volume 3033 of CEUR Workshop Proceedings, Aachen, Germany. CEUR-WS.org, 2022.
[19] Armend Duzha, Cristiano Casadei, Michael Tosi, and Fabio Celli. Hate versus Politics: Detection of Hate Against Policy Makers in Italian Tweets. SN Social Sciences, 1(9):1–15, 2021.
[20] Alan Ramponi, Benedetta Testa, Sara Tonelli, and Elisabetta Jezek. Addressing religious hate online: From taxonomy creation to automated detection. PeerJ Computer Science (in press).