Corpora

  • SentiTUT: Developed for Sentiment Analysis, this Italian corpus includes 4,447 posts from Twitter and from the Twitter section of the Spinoza blog. It is part of the data set used in the Sentiment Polarity Classification task held at Evalita 2014 (see the SentiPolC website for download).

  • Felicittà: Developed for Sentiment Analysis, this Italian corpus includes 1,500 posts from Twitter.


  • TwitterMariagePourTous: Developed for Sentiment Analysis, this French corpus includes 2,872 posts from Twitter collected between 16th December 2010 and 20th July 2013.
    Soon available for download.

  • LaBuonaScuola: Developed for Sentiment Analysis, this Italian corpus comprises the WEB-BS corpus (4,129 posts from the online discussion platform made available by the Italian government, from September 15th, 2014, to November 15th, 2014) and the TW-BS corpus (8,594 posts published on Twitter, from February 22nd, 2014, to December 31st, 2014).
    Soon available for download.

See publications for more information about each corpus, and annotations for a description of the annotation format applied in each corpus.