Mirko Lai

Assistant Professor (RTD-A) in the Department of Computer Science at the University of Turin (Italy).

I obtained my Ph.D. in Computer Science in 2019 and was a postdoctoral researcher in the Department of Computer Science at the University of Turin until 2022. I currently hold an Assistant Professor position (RTD-A) in the same department, where I work on the project Quantifying access to services and facilities in inclusive urban spaces.
My skills and interests include network and social media analysis, natural language processing, and data visualisation. These interests led me to investigate group formation and segregation in communications among users about political debates. I then found myself dealing with stance and hate speech detection, with the aim of exploring communications where people discuss their different points of view.


ORCID iD: orcid.org/0000-0003-1042-0861


Education

Università degli studi di Torino
Universitat Politècnica de València

Ph.D. in Computer Science
Stance detection consists in automatically determining whether the author of a post is in favour of or against a target of interest, or whether the opinion toward the given target cannot be inferred. We address the problem of stance detection in social media, focusing on polarised political debates on Twitter. We then explore the communications that take place in these polarised debates, shedding some light on the dynamics of communication among people with concordant or contrasting opinions, with a particular focus on observing opinion shifts. We deal with stance detection from a multilingual perspective, exploring tweets in English, Spanish, Catalan, French and Italian. Finally, we propose machine learning models that address stance detection as a classification problem. We explore features based on the textual content of the tweet as well as features based on contextual information that does not emerge directly from the text.
Grade: Cum Laude
This thesis, developed in co-tutelle between Università degli Studi di Torino and Universitat Politècnica de València, was awarded the International Doctorate mention.
November 2014 - February 2019

Università degli studi di Torino

Master's degree in Computer Science
The thesis describes the research activities I carried out during an internship supervised by Prof. Giancarlo Ruffo at the Telecom Italia Lab of Turin. The purpose of this research is the development of a Web application for supporting Data Journalism. The main idea is to collect heterogeneous data from different sources (such as social media, Open Data and newspaper articles) in order to show advanced information visualisations (infovis) that might support journalists in their work. The infovis could suggest the emergence of a news story from social media; analyse patterns and occurrences; highlight correlations and co-occurrences between entities present in articles and entities present in tweets; and help in formulating considerations about public reaction when important events with a strong geographical connotation happen.
October 2011 - July 2014

Università degli studi di Cagliari

Bachelor's degree in Computer Science
The thesis presents the fields of application of Wearable Sensor Networks, focusing on athletic performance evaluation. Advanced detection techniques such as photo finish, Scan'O'Vision and FinishLynx are currently used, but the need for high accuracy requires increasingly precise instruments. The miniaturisation of sensors leads us to believe that Wearable Sensor Networks could be used for evaluating athletic performance in the future. After a theoretical background, we offer some suggestions for the creation of a system for evaluating athletic gestures. Although we use dummy data, we judge the idea feasible and propose different algorithms.
September 2006 - April 2010

Postdoctoral Fellowship

Università degli studi di Torino

Postdoctoral Fellowship: Approcci computazionali basati su NLP per la comprensione e il contrasto dei discorsi di incitamento all’odio nei social media (NLP-based computational approaches for understanding and countering hate speech on social media)
Research fellow in the project Be Positive! (Google.org Impact Challenge on Safety)
The project is aimed at automatically collecting and identifying online hate speech in order to increase positive content addressed to groups vulnerable to discrimination and to promote their active presence on social media. It involves the improvement of the Hate maps developed within the project Contro l'odio; the creation of an Automatic Writing Assistant, a tool that automatically suggests positive content against hate speech; and the organisation of training courses addressed to schools, journalists, communication experts, health workers, minorities, and activists. I mainly deal with the development of automatic models for detecting hate speech and for creating the automatic writing assistant (2021).
August 2020 - February 2022

Università degli studi di Torino

Postdoctoral Fellowship: HOME - Hierarchical Open Manufacturing Europe: data analysis in the energy consumption and production field
Research fellow in the project HOME: Hierarchical Open Manufacturing Europe, funded by Regione Piemonte (POR-FESR 2014/2020)
The project is aimed at creating an open system for innovating enterprises using a resource already present in the company: the data generated by the company's activity. Data-driven innovation will take advantage of modern technologies without being bound to particular proprietary protocols. I mainly deal with data analysis in the energy consumption and production field (2018).
January 2019 - May 2020

Teaching

Academic Year 2021/2022
Academic Year 2020/2021
Academic Year 2019/2020
Academic Year 2018/2019

Publications

  1. Cignarella Alessandra Teresa, Lai Mirko, Marra Andrea, Sanguinetti Manuela (2021)
    "La ministro è incinta": A Twitter Account of Women’s Job Titles in Italian
    Italian Conference on Computational Linguistics 2021. CLiC-it 2021.
  2. Frenda Simona, Cignarella Alessandra Teresa, Stranisci Marco Antonio, Lai Mirko, Bosco Cristina, Patti Viviana (2021)
    Recognizing Hate with NLP: The Teaching Experience of the #DeactivHate Lab in Italian High Schools
    Italian Conference on Computational Linguistics 2021. CLiC-it 2021.
  3. Celli Fabio, Lai Mirko, Duzha Armend, Bosco Cristina, Patti Viviana (2021)
    Policycorpus XL: An Italian Corpus for the Detection of Hate Speech Against Politics
    Italian Conference on Computational Linguistics 2021. CLiC-it 2021.
  4. Lai Mirko, Cignarella Alessandra Teresa, Finos Livio, Sciandra Andrea (2021)
    WordUp! at VaxxStance 2021: Combining Contextual Information with Textual and Dependency-Based Syntactic Features for Stance Detection.
    XXXVII International Conference of the Spanish Society for Natural Language Processing. IBERLEF 2021.
  5. Best of PAN 2021 Lab.
    Lai Mirko, Stranisci Marco Antonio, Bosco Cristina, Damiano Rossana, Patti Viviana (2021)
    HaMor at the Profiling Hate Speech Spreaders on Twitter
    Working Notes of CLEF 2021-Conference and Labs of the Evaluation Forum. CLEF 2021.
  6. Vilella Salvatore, Lai Mirko, Paolotti Daniela, Ruffo Giancarlo (2020)
    Immigration as a Divisive Topic: Clusters and Content Diffusion in the Italian Twitter Debate.
    Future Internet
  7. Lai Mirko, Cignarella Alessandra Teresa, Hernández Farías Delia Irazú, Bosco Cristina, Patti Viviana, Rosso Paolo (2020)
    Multilingual stance detection in social media political debates.
    Computer Speech & Language
  8. Lai Mirko, Patti Viviana, Ruffo Giancarlo, Rosso Paolo (2020)
    #Brexit: Leave or Remain? the Role of User’s Community and Diachronic Evolution on Stance Detection.
    Journal of Intelligent & Fuzzy Systems
  9. Capozzi Lupi Arthur Thomas Edward, Lai Mirko, Basile Valerio, Poletto Fabio, Sanguinetti Manuela, Bosco Cristina, Patti Viviana, Ruffo Giancarlo, Musto Cataldo, Polignano Marco, Semeraro Giovanni, Stranisci Marco Antonio (2020)
    “Contro L’Odio”: A Platform for Detecting, Monitoring and Visualizing Hate Speech against Immigrants in Italian Social Media
    IJCoL. Italian Journal of Computational Linguistics
  10. Benedetto Federico, Brunetti Davide, Gena Cristina, Lai Mirko, Meo Rosa, Vernero Fabiana (2020)
    Intelligent monitoring applications for Industry 4.0.
    IUI 2020: Proceedings of the 25th International Conference on Intelligent User Interfaces Companion
  11. Cignarella Alessandra Teresa, Lai Mirko, Bosco Cristina, Patti Viviana, Rosso Paolo (2020)
    SardiStance@EVALITA2020: Overview of the Task on Stance Detection in Italian Tweets.
    EVALITA 2020: Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian
  12. Lai Mirko, Tambuscio Marcella, Patti Viviana, Ruffo Giancarlo, Rosso Paolo (2019)
    Stance polarity in political debates: A diachronic perspective of network homophily and conversations on Twitter.
    Data & Knowledge Engineering
  13. Florio Komal, Basile Valerio, Lai Mirko, Patti Viviana (2019)
    Leveraging Hate Speech Detection to Investigate Immigration-related Phenomena in Italy.
    ACIIW 2019: Proceedings of the 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos
  14. Lai Mirko (2019)
    On Language and Structure in Polarized Communities
    Ph.D. Thesis
  15. Capozzi Lupi Arthur Thomas Edward, Lai Mirko, Basile Valerio, Musto Cataldo, Polignano Marco, Poletto Fabio, Sanguinetti Manuela, Bosco Cristina, Patti Viviana, Ruffo Giancarlo, Semeraro Giovanni, Stranisci Marco Antonio (2019)
    Computational Linguistics Against Hate: Hate Speech Detection and Visualization on Social Media in the "Contro L’Odio" Project
    CLiC-it 2019: Proceedings of the 6th Italian Conference on Computational Linguistics
  16. Mencarini Letizia, Hernández Farías Delia Irazú, Lai Mirko, Patti Viviana, Sulis Emilio, Vignoli Daniele (2019)
    Happy parents’ tweets: An exploration of Italian Twitter data using sentiment analysis.
    Demographic Research
  17. Lai Mirko, Meo Rosa, Schifanella Rossano, Sulis Emilio (2018)
    The role of the network of matches on predicting success: An example applied to table tennis.
    Journal of Sports Sciences
  18. Cignarella Alessandra Teresa, Bosco Cristina, Patti Viviana, Lai Mirko (2018)
    TWITTIRÒ – an Italian Twitter Corpus with a Multi-layered Annotation for Irony
    Italian Journal of Computational Linguistics
  19. Lai Mirko, Patti Viviana, Ruffo Giancarlo, Rosso Paolo (2018)
    Stance Evolution and Twitter Interactions in an Italian Political Debate
    NLDB 2018: Proceedings of the 23rd International Conference on Applications of Natural Language to Information Systems
  20. Cignarella Alessandra Teresa, Bosco Cristina, Patti Viviana, Lai Mirko (2018)
    Application and Analysis of a Multi-layered Scheme for Irony on the Italian Twitter Corpus TWITTIRÒ
    LREC 2018: Proceedings of the 11th edition of the Language Resources and Evaluation Conference
  21. Rapp Amon, Marcengo Alessandro, Buriano Luca, Ruffo Giancarlo, Lai Mirko, Cena Federica (2018)
    Designing a personal informatics system for users without experience in self-tracking: a case study
    Behaviour & Information Technology
  22. Basile Valerio, Lai Mirko, Sanguinetti Manuela (2018)
    Long-term Social Media Data Collection at the University of Turin.
    CLiC-it 2018: Proceedings of the 5th Italian Conference on Computational Linguistics
  23. Lai Mirko, Tambuscio Marcella, Patti Viviana, Ruffo Giancarlo, Rosso Paolo (2017)
    Extracting Graph Topological Information and Users’ Opinion.
    CLEF 2017: Proceedings of the 8th Conference and Labs of the Evaluation Forum
  24. Hernandez-Farias Delia Irazu, Lai Mirko, Mencarini Letizia, Mozzachiodi Michele, Patti Viviana, Sulis Emilio, Vignoli Daniele (2017)
    Happy Parents’ Tweet? An exploration of 3 Million Italian Twitter Data.
    The 2017 International Population Conference: IPC 2017.
  25. Lai Mirko, Cignarella Alessandra Teresa, Hernandez-Farias Delia Irazu (2017)
    iTACOS at IberEval2017: Detecting stance in catalan and spanish tweets
    IberEval 2017: Proceedings of the 2nd Workshop on Evaluation of Human Language Technologies for Iberian Languages
  26. Lai Mirko, Hernández Farías Delia Irazú, Patti Viviana, Rosso Paolo (2016)
    Friends and Enemies of Clinton and Trump: Using Context for Detecting Stance in Political Tweets
    MICAI 2016: Proceedings of the 15th Mexican International Conference on Artificial Intelligence
  27. Bosco Cristina, Lai Mirko, Patti Viviana, Rangel Pardo Francisco M., Rosso Paolo (2016)
    Tweeting in the Debate about Catalan Elections.
    LREC 2016: Proceedings of the LREC 2016 Workshop “Emotion and Sentiment Analysis” at the 10th International Conference on Language Resources and Evaluation
  28. Sulis Emilio, Bosco Cristina, Patti Viviana, Lai Mirko, Hernández Farías Delia Irazú, Mencarini Letizia, Mozzachiodi Michele, Vignoli Daniele (2016)
    Subjective Well-Being and Social Media. A Semantically Annotated Twitter Corpus on Fertility and Parenthood
    CLiC-it 2016: Proceedings of the 3rd Italian Conference on Computational Linguistics
  29. Bosco Cristina, Lai Mirko, Patti Viviana, Virone Daniela (2016)
    Tweeting and Being Ironic in the Debate about a Political Reform: the French Annotated Corpus TWitter-MariagePourTous
    LREC 2016: Proceedings of the 10th International Conference on Language Resources and Evaluation
  30. Lai Mirko, Bosco Cristina, Patti Viviana, Virone Daniela (2015)
    Debate on political reforms in Twitter: A hashtag-driven analysis of political polarization
    DSAA 2015: IEEE International Conference on Data Science and Advanced Analytics
  31. Virone Daniela, Lai Mirko (2015)
    Dans un corpus hybride : les messages twittés, l’intertextualité et la formule
    ICODOC 2015: Colloque Jeunes Chercheurs du Laboratoire ICAR
  32. Lai Mirko, Virone Daniela, Bosco Cristina, Patti Viviana (2015)
    Building a Corpus on a Debate on Political Reform in Twitter.
    CLiC-it 2015: Proceedings of the Second Italian Conference on Computational Linguistics CLiC-it 2015
  33. Sulis Emilio, Lai Mirko, Vinai Manuela, Sanguinetti Manuela (2015)
    Exploring sentiment in social media and official statistics: a general framework
    ESSEM 2015: Proceedings of the 2nd International Conference on Emotion and Sentiment in Social and Expressive Media: Opportunities and Challenges for Emotion-aware Multiagent Systems

Recognitions

  • VaxxStance: Going Beyond Text in Crosslingual Stance Detection at IberLEF 2021: The goal of the competition is to determine the author’s stance from tweets written in both Spanish and Basque on the topic of the Antivaxxers movement. We ranked 1st in stance detection in all proposed sub-tasks for both languages among 3 participating teams.
    Lai Mirko, Cignarella Alessandra Teresa, Finos Livio, Sciandra Andrea (2021)
    WordUp! at VaxxStance 2021: Combining Contextual Information with Textual and Dependency-Based Syntactic Features for Stance Detection.
    XXXVII International Conference of the Spanish Society for Natural Language Processing. IBERLEF 2021.
  • Profiling Hate Speech Spreaders on Twitter at CLEF 2021: Teams are invited to develop a model that, given a Twitter feed of 200 messages (written in English and Spanish), determines whether its author spreads hateful content. We ranked 19th out of 66 participating teams according to the averaged accuracy over the two languages. We obtained the 43rd highest accuracy for English and the 2nd highest accuracy for Spanish.
    Best of PAN 2021 Lab.
    Lai Mirko, Stranisci Marco Antonio, Bosco Cristina, Damiano Rossana, Patti Viviana (2021)
    HaMor at the Profiling Hate Speech Spreaders on Twitter
    Working Notes of CLEF 2021-Conference and Labs of the Evaluation Forum. CLEF 2021.
  • Stance and Gender Detection in Tweets on Catalan Independence shared task at IberEval 2017: The task is articulated into two subtasks about information contained in Twitter messages written in both Catalan and Spanish: the first subtask is related to detecting the author’s stance towards the independence of Catalonia, while the second one aims at identifying their gender. We ranked 1st in the stance detection task for both languages among 10 participating teams, while in gender detection we ranked fourth and third for Catalan and Spanish, respectively.
    Lai Mirko, Cignarella Alessandra Teresa, Hernandez-Farias Delia Irazu (2017)
    iTACOS at IberEval2017: Detecting stance in catalan and spanish tweets
    IberEval 2017: Proceedings of the 2nd Workshop on Evaluation of Human Language Technologies for Iberian Languages

Other

  • Organizer of the first shared task regarding Stance Detection in Italian Tweets: SardiStance 2020.
    The SardiStance shared task was organised within EVALITA 2020, the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian, which was held online (due to the state of emergency regarding the COVID-19 pandemic) on December 16th and 17th, 2020.
  • In the development team of Il Barometro Dell'Odio Nello Sport:
    the Barometro Dell’Odio Nello Sport (within the project Odiare non è uno sport. Percorsi educativi per prevenire e contrastare l’hate speech razziale nello sport - AID 011797) is the first Italian research that analyses the social pages of the five main Italian sports magazines with the aim of monitoring hate speech in sport. I contributed to developing an unsupervised model for detecting hate speech in Italian tweets about sports. I also developed a domain-specific retrieval-based chatterbot. The chatterbot returns an automatic response formulated with the main purpose of softening the tone while emphasising the offensiveness of a message within a social media thread (2019-2021).
  • In the development team of Covid-19 Semantic Browser:
    COVID-19 Semantic Browser is an innovative search engine, based on advanced artificial intelligence models, that allows fast and precise "semantic searches" across the myriad of existing publications on the subject. I mainly worked on developing the front end using Angular.js (2019).
  • Researcher in the project Contro L’Odio: Ministero del Lavoro e delle Politiche Sociali. (Contro l’odio: tecnologie informatiche, percorsi formativi e storytelling partecipativo per combattere l’intolleranza, avviso n.1/2017 per il finanziamento di iniziative e progetti di rilevanza nazionale ai sensi dell’art. 72 del decreto legislativo 3 luglio 2017, n. 117 - anno 2017)
    The project aims to address the problem of hatred on the web. I’m mainly involved in automatically detecting hate speech by building natural language models. I’m also involved in developing and maintaining the web application called Mappa dell’Odio (Hate Speech Map) (2018-Today).
  • In the development team of Mappa Contro L’Odio:
    the Mappa dell’Odio (Hate Speech Map) is a choropleth map that shows the quantity of hate speech present in Italy using a colour scale that goes from a shade of white (absence of hatred) to one of red (strong presence of hatred). The data are collected and automatically analysed by a supervised classifier that recognises hate speech against vulnerable groups. I dealt with data collection, data analysis, and data transfer to the front end (2018-Today).
  • Researcher in the project SWELL-FER: Subjective Well-being and Fertility – ERC Grant Agreement
    The project is about subjective well-being (SWB) and demographic behaviour, with a particular focus on fertility in advanced societies across time and space, explored through social media posts. I participated in the data collection, data analysis, and data visualisation. I also helped to create an automatic tool for sentiment-related classification tasks in this domain (2013-2018).
  • Researcher in the project Visualization tools for complex information and heterogeneous data analysis, funded by Telecom Italia S.p.A.:
    The project was articulated in two parts. The first part consisted in developing a toolkit for supporting Data Journalism by making available, through usable information visualisations, aggregated data collected from social media, news feeds, and linked open data sources. I participated in the data collection, data storage, data analysis, and data visualisation. The second part aimed at designing a personal informatics self-tracking system that allows users to quantify themselves. I was involved in the collection of data from heterogeneous wearable sensors, in data storage, data harmonisation, and data analysis (2013-2015).