SENTIPOLC: Sentiment Polarity Classification

sentipolc@Evalita 2016

Task description paper and official results available here (4/12/2016):
Francesco Barbieri, Valerio Basile, Danilo Croce, Malvina Nissim, Nicole Novielli, and Viviana Patti (2016) Overview of the Evalita 2016 SENTIment POLarity Classification Task. In Proceedings of CLiC-it 2016 and EVALITA 2016, Napoli, Italy, December 5-7, 2016. Volume 1749 of CEUR Workshop Proceedings. CEUR-WS.org.
All participants' reports are included in the CLiC-it & EVALITA 2016 Proceedings (4/12/2016)
Presentation of the task at Evalita 2016: slides (Napoli, 7/12/2016)
Test data with gold labels available through the 'Data' page
Task guidelines UPDATED! (13/9/2016)
Test data have been released! (12/9/2016: credentials to access the data sent to registered participants)
Please register: http://www.evalita.it/2016/registration (12/9/2016: registration closed)
Development data: web interface based on the use of RESTful Web API technology available here!
Development data have been released! (3/6/2016)
Task guidelines available (3/6/2016)
Join the Google group: sentipolc-evalita2016[at]googlegroups.com

SENTIPOLC (SENTIment POLarity Classification) will be organised within Evalita 2016, the fifth evaluation campaign of Natural Language Processing and Speech tools for Italian, whose final workshop will be held in Naples, Italy, on December 7, 2016.

Summary

This is a rerun of Sentipolc@Evalita2014 (Basile et al., 2014) with new training and test data from Twitter.

Introduction and Motivation

The huge amount of information streaming from online social networking and micro-blogging platforms such as Twitter is increasingly attracting the attention of many kinds of researchers and practitioners. In particular, the linguistic analysis of social media has become a relevant topic of research in different languages. Several frameworks for detecting sentiments and opinions in social media have been developed for different application purposes, and Sentiment Analysis (SA) is recognized as a crucial tool in social media monitoring platforms providing business services. Indeed, user-generated content on social media is a valuable asset for firms that want to tap directly into their customers’ needs and preferences. For instance, tweets are a rich source for grasping the opinions of groups of people, possibly about a specific topic or product. Overall, extracting sentiments expressed in tweets has been used for several purposes: to monitor political sentiment (Tumasjan et al., 2010), to extract critical information during times of mass emergency (Verma et al., 2011), to detect moods and happiness in a given geographical area from geotagged tweets (Mitchell et al., 2013), and in several social media monitoring services.

In this shared task, we will focus on Italian texts from Twitter by launching a battery of related tasks with an increasing level of complexity. The main task concerns sentiment polarity classification at the message level. Sentiments expressed in tweets are typically categorized as positive, negative or neutral, but a message can contain parts expressing both positive and negative sentiment (mixed sentiment), a case that systems should also handle.

The fact that Twitter communications include a high percentage of ironic messages cannot be neglected (González-Ibáñez et al., 2011; Reyes et al., 2013; Reyes et al., 2012; Davidov et al., 2010; Hao and Veale, 2010). The issue has been addressed recently in the SemEval-2015 Task 11 shared task on Sentiment Analysis of Figurative Language in Twitter (Ghosh et al., 2015). Platforms monitoring the sentiment in Twitter messages have been found to misclassify the polarity of ironic messages (Bosco et al., 2013). Indeed, the presence of ironic devices in a text can work as an unexpected “polarity reverser” (one says something “good” to mean something “bad”), thus undermining systems’ accuracy. In order to investigate this issue, our dataset will include ironic messages, and we will define an ancillary subtask concerning irony detection.

Since its first edition at SemEval 2013 (Task 2, Nakov et al., 2013), the SemEval task on Sentiment Analysis in English tweets has attracted a great number of participants. Similarly, the first edition of Sentipolc@Evalita2014 was the Evalita task with the highest participation, with a total of 35 submitted runs from 11 different teams, reflecting the great interest of the NLP community in sentiment analysis in social media, in Italy as well. We believe that the re-run of a Sentiment Analysis task for Italian and the development of a standard sentiment corpus to promote research will lead to a better understanding of how sentiment is conveyed in tweets. Training and testing automatic systems requires several resources, which may consist of large datasets of annotated posts or of lexical databases where affective words are associated with polarity values (e.g. Baccianella et al., 2010; Strapparava and Valitutti, 2004). Considering that the availability of such resources for Italian is currently very limited (Basile and Nissim, 2013; Bosco et al., 2013; Stranisci et al., 2016), the organisation of this shared task also aims to contribute in this respect.

Target Audience

The task is open to everyone from industry and academia.

Task description

The main goal of SENTIPOLC is sentiment classification at message level on Italian tweets. The task is divided into three sub-tasks with an increasing level of complexity. Participants may choose to participate in one or more sub-tasks. The first two are standard SA tasks, whereas the third one is a task aimed at studying the presence of irony in tweets.

A) Subjectivity Classification

Given a message, decide whether the message is subjective or objective.

B) Polarity Classification

Given a message, decide whether the message is of positive, negative, neutral or mixed sentiment (i.e. conveying both a positive and negative sentiment).

C) Irony Detection

Given a message, decide whether the message is ironic or not.
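
As an illustration only, the three sub-tasks can be viewed as decisions over a handful of binary labels per tweet. The sketch below assumes a hypothetical `TweetLabels` record with separate positive and negative flags, so that the four polarity classes of sub-task B fall out of their combinations. The class name, field names and mapping are assumptions made for this example; the official annotation scheme and data format are those described in the task guidelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TweetLabels:
    """Hypothetical per-tweet labels (not the official data format)."""

    subjective: bool  # sub-task A: subjective vs. objective
    positive: bool    # sub-task B: message conveys positive sentiment
    negative: bool    # sub-task B: message conveys negative sentiment
    ironic: bool      # sub-task C: ironic vs. not ironic

    def polarity(self) -> str:
        """Map the two polarity flags to one of the four classes of sub-task B."""
        if self.positive and self.negative:
            return "mixed"
        if self.positive:
            return "positive"
        if self.negative:
            return "negative"
        return "neutral"


# Example: a subjective tweet conveying both positive and negative sentiment.
labels = TweetLabels(subjective=True, positive=True, negative=True, ironic=False)
print(labels.polarity())
```

Keeping two independent polarity flags, rather than a single four-valued field, is one simple way to make the mixed class explicit in this sketch.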

Relation with NEEL-IT@Evalita2016

It is the intention of the organizers to promote the construction of a shared dataset for both Sentipolc and the Named Entity rEcognition and Linking in Italian Tweets (NEEL-IT) Evalita 2016 task. Indeed, entity linking in Twitter is attracting increasing attention, as is aspect-based sentiment analysis. In a world where e-commerce is part of our everyday life and social media platforms are regarded as new channels for marketing and for fostering the trust of potential customers, such great interest in opinion mining from Twitter isn’t surprising. In this scenario, it is crucial to be able to mine opinions about specific aspects of objects and named entities. Therefore, we believe that besides the traditional task on message-level polarity classification, future editions of Evalita should give special focus to entity-based sentiment analysis.
The use of common data for Sentipolc and NEEL-IT is a first step towards the long-term goal of enabling participants to develop end-to-end systems, from entity linking to entity-based sentiment analysis.

Data

The dataset will include short documents taken from Twitter.
In collaboration with the organizers of the Entity Linking task at Evalita 2016, we will put together the resources that are already annotated for the two tasks and fill the gaps in order to create a common dataset for evaluation. However, the two tasks are organized independently this year.

A detailed description of data (topics, annotation scheme applied, data format, etc.) can be found in the task guidelines.

The development and test datasets will be released in compliance with Twitter's terms. A web interface for releasing the development data, based on RESTful Web APIs, will be available.

Evaluation

Each participating team will initially have access to the training data only. Later, the unlabelled test data will be released (see the timeframe below). After the assessment, the labels for the test data will be released as well.

The evaluation will be performed according to the standard metrics known in the literature (precision, recall and F-measure). Details on the metrics used to evaluate participants' results will be published in the Task guidelines.
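
As a reminder of how these standard metrics are defined, the following minimal sketch computes precision, recall and the balanced F-measure (F1) for one target class from parallel lists of gold and predicted labels. It is not the official scorer: the function name, the label strings and the choice of F1 are assumptions made for this example, and the exact way scores are computed and combined for each sub-task is the one specified in the Task guidelines.

```python
def precision_recall_f1(gold, predicted, target):
    """Return (precision, recall, F1) for the class `target`.

    `gold` and `predicted` are equal-length sequences of labels.
    Illustrative helper only, not the official evaluation script.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)  # true positives
    fp = sum(1 for g, p in pairs if g != target and p == target)  # false positives
    fn = sum(1 for g, p in pairs if g == target and p != target)  # false negatives

    # Each score is defined as 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Example with made-up labels for the subjectivity sub-task.
gold = ["subj", "obj", "subj", "subj"]
predicted = ["subj", "subj", "obj", "subj"]
precision, recall, f1 = precision_recall_f1(gold, predicted, target="subj")
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```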

How to participate

Register your team by using the registration web form at http://www.evalita.it/2016 (available soon, see timeframe below).

Information about the submission of results and their format will be available in the Task guidelines.

We invite potential participants to subscribe to our mailing list in order to be kept up to date with the latest news related to the task. Please share comments and questions with the mailing list. The organizers will assist you with any issues that may arise.

Participants will be required to provide an abstract and a technical report for publication in the proceedings of the contest. The report should include a brief description of their approach, an illustration of their experiments (in particular the techniques and resources used), and an analysis of their results.

Papers must be submitted in PDF format, following the CLiC-it conference style (details will be published soon). Abstracts and technical reports are to be submitted electronically through the Easychair system.

June 6th 2016: on-line registration opens!
MOVED TO 3rd June 2016: development data available to participants
12th September 2016: test data available, registration closes
19th September 2016, 6pm CEST: system results due to organizers
26th September 2016: assessment returned to participants
19th October 2016: technical reports due to organizers
7th December 2016: final workshop

References

Baccianella, S., Esuli, A. and Sebastiani, F. (2010). Sentiwordnet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In Nicoletta Calzolari et al., editor, Proceedings of LREC.

Basile, V., Bolioli, A., Nissim, M., Patti, V., and Rosso, P. (2014). Overview of the Evalita 2014 SENTIment POLarity Classification Task. In Proceedings of the 4th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA’14), pages 50–57, Pisa, Italy. Pisa University Press.

Basile, V. and Nissim, M. (2013). Sentiment Analysis on Italian Tweets. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 100–107. Association for Computational Linguistics.

Bosco, C., Patti, V. and Bolioli, A. (2013). Developing Corpora for Sentiment Analysis: The Case of Irony and Senti-TUT. IEEE Intelligent Systems 28(2): 55-63.

Davidov, D., Tsur, O. and Rappoport, A. (2010). Semi-supervised recognition of sarcastic sentences in Twitter and Amazon. In Proceedings of the 14th Conference on Computational Natural Language Learning (CoNLL '10). Association for Computational Linguistics, 107-116.

Ghosh, A., Li, G., Veale, T., Rosso, P., Shutova, E., Barnden, J., and Reyes, A. (2015). Semeval-2015 task 11: Sentiment analysis of figurative language in Twitter. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 470–478, Denver, Colorado, June. Association for Computational Linguistics.

González-Ibáñez, R., Muresan, S., and Wacholder, N. (2011). Identifying sarcasm in Twitter: a closer look. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2 (HLT '11), 581-586. ACL.

Hao, Y., Veale, T. (2010). An Ironic Fist in a Velvet Glove: Creative Mis-Representation in the Construction of Ironic Similes. Minds and Machines 20(4):635–650.

Mitchell, L., Frank, M. R., Harris, K. D., Dodds, P. S. and Danforth, C. M. (2013). The Geography of Happiness: Connecting Twitter Sentiment and Expression, Demographics, and Objective Characteristics of Place. PLoS ONE, 8(5), 05.

Nakov, P., Kozareva, Z., Ritter, A., Rosenthal, S., Stoyanov, V. and Wilson, T. (2013). Semeval-2013 Task 2: Sentiment Analysis in Twitter. In Proceedings of the 7th International Workshop on Semantic Evaluation. Association for Computational Linguistics.

Reyes A., Rosso P., and Veale T. (2013). A Multidimensional Approach for Detecting Irony in Twitter. In: Language Resources and Evaluation, vol. 47, issue 1, pp. 239-268.

Reyes A., Rosso P., and Buscaldi D. (2012). From Humor Recognition to Irony Detection: The Figurative Language of Social Media. In: Data & Knowledge Engineering, vol. 74, pp.1-12.

Reyes A. and Rosso P. (2014). On the Difficulty of Automatically Detecting Irony: Beyond a Simple Case of Negation. In: Knowledge and Information Systems, 40(3), pp. 595-614.

Stranisci, M., Bosco, C., Hernàndez Farias, D.I., and Patti, V. (2016). Annotating sentiment and irony in the online Italian political debate on #labuonascuola. In Proceedings of LREC 2016.

Strapparava C. and Valitutti, A. (2004). Wordnet-affect: an affective extension of wordnet. In Proceedings of LREC.

Tumasjan, A., Sprenger, T.O., Sandner, P., and Welpe, I. (2010). Predicting elections with twitter: What 140 characters reveal about political sentiment. Proceedings of ICWSM.

Verma, S., Vieweg, S., Corvey, W.J., Palen, L., Martin, J.H., Palmer, M., Schram, A. and Anderson, K. M. (2011). Natural language processing to the rescue? extracting situational awareness tweets during mass emergency. In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, pages 385–392. AAAI.
