SentiTUT: the sentiment Turin University Treebank (TUT)


SentiTUT is a project for the development of a novel Italian corpus for sentiment analysis. The corpus is a collection of texts from Twitter and includes annotations for both sentiment and morpho-syntax, which opens up several possibilities for further exploitation in sentiment analysis. At the sentiment level we focus on irony, and we therefore selected texts on politics from a social medium, Twitter, where irony is frequently used. Our aim is to add a new sentiment dimension, which explicitly accounts for irony, to a sentiment analysis classification framework based on polarity annotation.


The data set is organized in two subcorpora, TWNEWS and TWSPINO. The former currently contains around three thousand tweets published in the weeks after the new Italian prime minister Mario Monti announced his Cabinet (from 16 October 2011 to 3 February 2012). The latter contains more than one thousand tweets extracted from the Twitter section of Spinoza, published from July 2009 to February 2012.

Spinoza is a very popular collective Italian blog that includes a high percentage of posts with sharp political satire and has been published on Twitter since 2009. This subcorpus was therefore added to enlarge our data set with texts where various forms of irony are involved. All the data were collected using a collaborative annotation tool that is part of the Blogmeter social media monitoring platform.

Download the corpus

The TWNEWS corpus will be available soon for research purposes.

Tweets from Spinoza can be accessed here.


The development of Senti-TUT involves annotating the linguistic data at two distinct levels. The first includes morphological and syntactic tags, as is usual in treebanks; the second refers instead to concepts typical of sentiment analysis.

Morphological and syntactic annotation

For syntax and morphology, the annotation guidelines are the same as those used for TUT:

Syntactic categories
the Part of Speech tagset of the TUT corpus
Labels of the edges
the list of the grammatical relations labelling the dependency edges of the second release of TUT corpus

Manual sentiment annotation

Sentiment Labels

Annotation for sentiment analysis

The data are currently annotated at the tweet level: one sentiment tag is applied to each tweet, even though a tweet can be composed of more than one sentence. The table below describes the sentiment tags used for the annotation of Senti-TUT.

Sentiment tag	Meaning
POS	positive
NEG	negative
HUM	ironic
NONE	objective
MIXED	POS and NEG both
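
As a minimal sketch of how the tag set above could be handled in code, the Python example below models the five tags as an enumeration and counts how often each one occurs. The input format (one tweet per line, with the tag after a final tab) and the names `SentimentTag`, `parse_annotated_line` and `tag_distribution` are assumptions made for illustration only; the actual distribution format of the corpus is not described here, and the sample lines are invented placeholders.

```python
from collections import Counter
from enum import Enum


class SentimentTag(Enum):
    """The five tweet-level tags listed in the table above."""
    POS = "positive"
    NEG = "negative"
    HUM = "ironic"
    NONE = "objective"
    MIXED = "POS and NEG both"


def parse_annotated_line(line):
    """Split a hypothetical 'tweet text<TAB>tag' line into (text, SentimentTag).

    Splitting on the last tab keeps any tabs inside the tweet text intact.
    An unknown tag name raises KeyError.
    """
    text, tag_name = line.rstrip("\n").rsplit("\t", 1)
    return text, SentimentTag[tag_name.strip()]


def tag_distribution(lines):
    """Count how many tweets carry each tag (exactly one tag per tweet)."""
    return Counter(parse_annotated_line(line)[1] for line in lines)


# Invented placeholder lines, not taken from the corpus.
sample = [
    "placeholder tweet one\tPOS",
    "placeholder tweet two\tHUM",
    "placeholder tweet three\tHUM",
    "placeholder tweet four\tNONE",
]
counts = tag_distribution(sample)
for tag in SentimentTag:
    print(tag.name, counts[tag])
```

Because each tweet carries exactly one tag, the counts always sum to the number of input lines, which gives a quick sanity check when loading annotated data.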

Although the focus of Senti-TUT is, for the present, mainly on annotation at the tweet level, the resource we are developing should be seen in the wider framework of a project for sentiment analysis and opinion mining. Within this context, the availability of morpho-syntactic annotation on the same data is also relevant, since it will allow other, more fine-grained annotations and analyses related to sentiment analysis to be applied in the future.

The sentiment tags were applied manually at the tweet level using a collaborative annotation tool that is part of the Blogmeter social media monitoring platform. Among the utilities made available by Blogmeter we used, in particular, those for filtering out non-relevant data.


Andrea Gianti, Cristina Bosco, Viviana Patti, Andrea Bolioli and Luigi Di Caro. Annotating Irony in a Novel Italian Corpus for Sentiment Analysis. In Proceedings of the 4th International Workshop on Corpora for Research on Emotion Sentiment & Social Signals (ES3) at LREC'12 (2012).
Cristina Bosco, Viviana Patti and Andrea Bolioli. Developing Corpora for Sentiment Analysis: The Case of Irony and Senti-TUT. IEEE Intelligent Systems, special issue on Knowledge-based approaches to content-level sentiment analysis, vol. 28, no. 2 (2013).


Workshops and Conferences:

Emotion and Sentiment in Social and Expressive Media: approaches and perspectives from AI, ESSEM 2013, Satellite of AI*IA 2013, Full Day Workshop on 3 December, Turin, Italy.
ES³ 2012 4th International Workshop on Corpora for Research on EMOTION SENTIMENT & SOCIAL SIGNALS, Satellite of LREC 2012, ELRA, Full Day Workshop on 26 May, Istanbul, Turkey.
International Conference on Language Resources and Evaluation (LREC)


Last updated: July 8th, 2013 by bosco[at]di.unito.it
