SENTIPOLC@Evalita 2014

SENTIPOLC (SENTIment POLarity Classification) will be organised within Evalita 2014, the fourth evaluation campaign of Natural Language Processing and Speech tools for Italian, which will be held in Pisa, Italy, on December 11, 2014.

Introduction and Motivation

The huge amount of information streaming from online social networking and micro-blogging platforms such as Twitter is increasingly attracting the attention of many kinds of researchers and practitioners. In particular, the linguistic analysis of social media has become a relevant research topic in different languages. Several frameworks for detecting sentiments and opinions in social media have been developed for different application purposes, and Sentiment Analysis (SA) is recognized as a crucial tool in social media monitoring platforms providing business services. Indeed, user-generated content on social media constitutes a valuable asset for firms to tap directly into customers’ needs and preferences. For instance, tweets are a precious mine for grasping the opinions of groups of people, possibly about a specific topic or product. Overall, extracting the sentiment expressed in tweets has been used for several purposes: to monitor political sentiment (Tumasjan et al., 2010), to extract critical information during times of mass emergency (Verma et al., 2011), to detect moods and happiness in a given geographical area from geotagged tweets (Mitchell et al., 2013), and in several social media monitoring services.

In this shared task, we will focus on Italian texts from Twitter by launching a battery of related tasks with an increasing level of complexity. The main task concerns sentiment polarity classification at the message level. Sentiments expressed in tweets are typically categorized as positive, negative or neutral, but a message can contain parts expressing both positive and negative sentiment (mixed sentiment), a feature that systems should be able to handle.

The fact that Twitter communications include a high percentage of ironic messages cannot be neglected (González-Ibáñez et al., 2011; Reyes et al., 2013; Reyes et al., 2012; Davidov et al., 2010; Hao and Veale, 2010), and platforms monitoring the sentiment in Twitter messages have experienced the phenomenon of wrong polarity classification in ironic messages (Bosco et al., 2013). Indeed, the presence of ironic devices in a text can work as an unexpected “polarity reverser” (one says something “good” to mean something “bad”), thus undermining systems’ accuracy. In order to investigate this issue, our dataset will include ironic messages, and we will define a pilot ancillary subtask concerning irony detection.

At SemEval 2013, Task 2 focussed on Sentiment Analysis in English tweets (Nakov et al., 2013), and attracted a great number of participants. We believe that a similar task for Italian, together with the development of a standard sentiment corpus to promote research, will lead to a better understanding of how sentiment is conveyed in tweets. Training and testing automatic systems requires the availability of several resources, which may consist of large datasets of annotated posts or of lexical databases where affective words are associated with polarity values (e.g. Baccianella et al., 2010; Strapparava and Valitutti, 2004). Considering that the availability of such resources for Italian is currently very limited (Basile and Nissim, 2013; Bosco et al., 2013), the organisation of this shared task aims at providing a contribution in this respect as well.

Target Audience

The task is open to everyone from industry and academia.

Task description

The main goal of SENTIPOLC is sentiment classification at message level on Italian tweets. The task is divided into three sub-tasks with an increasing level of complexity. Participants may choose to participate in one or more sub-tasks. The first two are standard SA tasks, whereas the third one is a pilot task aimed at studying the presence of irony in tweets.

A) Subjectivity Classification

Given a message, decide whether the message is subjective or objective.

B) Polarity Classification

Given a message, decide whether the message is of positive, negative, neutral or mixed sentiment (i.e. conveying both a positive and negative sentiment).

C) Pilot Task: Irony Detection

Given a message, decide whether the message is ironic or not.


The dataset will include short documents taken from Twitter.

A detailed description of data (topics, annotation scheme applied, data format, etc.) can be found in the task guidelines.

The development and test datasets will be released in compliance with Twitter's terms. A web interface for releasing the development data, based on RESTful Web APIs, is available here. A readme file briefly explains the steps to follow.


Each participating team will initially have access to the training data only. Later, the unlabelled test data will be released (see the timeframe below). After the assessment, the labels for the test data will be released as well.

The evaluation will be performed according to the standard metrics known in the literature (precision, recall and F-measure). Details on the metrics applied to evaluate participants' results can be found in the Task guidelines.
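As an illustration only, the three metrics named above can be computed for a single class label as in the following minimal sketch. The function name, the label strings and the toy data are assumptions made for this example; the official scoring procedure (for instance, how per-class scores are combined) is the one defined in the Task guidelines, not this code.

```python
from collections import Counter


def precision_recall_f1(gold, predicted, target):
    """Return (precision, recall, F1) for one class label `target`.

    `gold` and `predicted` are equal-length sequences of labels.
    This is a generic per-class computation, not the official scorer.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    counts = Counter()
    for g, p in zip(gold, predicted):
        if p == target and g == target:
            counts["tp"] += 1  # correctly predicted as target
        elif p == target:
            counts["fp"] += 1  # predicted as target, but gold differs
        elif g == target:
            counts["fn"] += 1  # gold is target, but system missed it

    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gold and system labels for five tweets (sub-task B style).
gold = ["positive", "negative", "neutral", "positive", "mixed"]
predicted = ["positive", "positive", "neutral", "negative", "mixed"]

p, r, f = precision_recall_f1(gold, predicted, "positive")
print(f"positive: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

When a label is never predicted or never occurs in the gold data, the sketch returns 0.0 for the undefined ratio rather than raising an error; this is a convention chosen here for simplicity.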

How to participate

Register your team by using the registration web form at (available soon, see timeframe below).

Information about the submission of results and their format is available in the Task guidelines.

We invite potential participants to subscribe to our mailing list in order to be kept up to date with the latest news related to the task. Please share comments and questions with the mailing list. The organizers will assist you with any issues that may arise.

Participants will be required to provide an abstract and a technical report that includes a brief description of their approach, an illustration of their experiments (in particular the techniques and resources used), and an analysis of their results, for publication in the proceedings of the contest; guidelines are outlined on the Evalita 2014 web site.

Papers must be submitted in PDF format, following the CLIC-it conference style mentioned here and not exceeding 6 pages in length.
Submission of abstracts and technical reports is to be done electronically through the Easychair system:


Timeframe

  • 15th March 2014: on-line registration opens at Evalita registration 2014 web page
  • (moved to) 6th June 2014: development data available to participants
  • 8th September 2014: test data available, registration closes
  • 15th September 2014, 6pm CEST: system results due to organizers
  • 22nd September 2014: assessment returned to participants
  • 23rd October 2014: technical reports due to organizers
  • 11th December 2014: final workshop


Baccianella, S., Esuli, A. and Sebastiani, F. (2010). SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In Proceedings of LREC.

Basile, V. and Nissim, M. (2013). Sentiment Analysis on Italian Tweets. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 100–107. Association for Computational Linguistics.

Bosco, C., Patti, V. and Bolioli, A. (2013). Developing Corpora for Sentiment Analysis: The Case of Irony and Senti-TUT. IEEE Intelligent Systems 28(2): 55-63.

Davidov, D., Tsur, O. and Rappoport, A. (2010). Semi-supervised recognition of sarcastic sentences in Twitter and Amazon. In Proceedings of the 14th Conference on Computational Natural Language Learning (CoNLL '10). Association for Computational Linguistics, 107-116.

González-Ibáñez, R., Muresan, S. and Wacholder, N. (2011). Identifying sarcasm in Twitter: a closer look. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2 (HLT '11), 581-586. ACL.

Hao, Y. and Veale, T. (2010). An Ironic Fist in a Velvet Glove: Creative Mis-Representation in the Construction of Ironic Similes. Minds and Machines 20(4): 635–650.

Mitchell, L., Frank, M. R., Harris, K. D., Dodds, P. S. and Danforth, C. M. (2013). The Geography of Happiness: Connecting Twitter Sentiment and Expression, Demographics, and Objective Characteristics of Place. PLoS ONE, 8(5).

Nakov, P., Kozareva, Z., Ritter, A., Rosenthal, S., Stoyanov, V. and Wilson, T. (2013). SemEval-2013 Task 2: Sentiment Analysis in Twitter. In Proceedings of the 7th International Workshop on Semantic Evaluation. Association for Computational Linguistics.

Reyes, A., Rosso, P. and Veale, T. (2013). A Multidimensional Approach for Detecting Irony in Twitter. Language Resources and Evaluation 47(1): 239-268.

Reyes, A., Rosso, P. and Buscaldi, D. (2012). From Humor Recognition to Irony Detection: The Figurative Language of Social Media. Data & Knowledge Engineering 74: 1-12.

Reyes, A. and Rosso, P. (2014). On the Difficulty of Automatically Detecting Irony: Beyond a Simple Case of Negation. Knowledge and Information Systems 40(3): 595-614.

Strapparava, C. and Valitutti, A. (2004). WordNet-Affect: an affective extension of WordNet. In Proceedings of LREC.

Tumasjan, A., Sprenger, T. O., Sandner, P. and Welpe, I. (2010). Predicting elections with Twitter: What 140 characters reveal about political sentiment. In Proceedings of ICWSM.

Verma, S., Vieweg, S., Corvey, W. J., Palen, L., Martin, J. H., Palmer, M., Schram, A. and Anderson, K. M. (2011). Natural language processing to the rescue? Extracting situational awareness tweets during mass emergency. In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, pages 385–392. AAAI.



Valerio Basile, University of Groningen, The Netherlands

Andrea Bolioli, CELI, Torino, Italy

Malvina Nissim, FICLIT, University of Bologna, Italy

Viviana Patti, Dipartimento di Informatica, University of Torino, Italy

Paolo Rosso, NLE Lab - PRHLT, Universitat Politècnica de València, Spain


Sergio Rabellino, ICT Staff, Dipartimento di Informatica, University of Torino, Italy