*******************************************
* Sentipolc 2016 @ EVALITA Development Data
*
* http://di.unito.it/sentipolc16
*
* Task co-organizers
* Valerio Basile (valerio.basile[at]inria.fr), Inria Sophia Antipolis Mediterranée, France
* Francesco Barbieri (francesco.barbieri[at]upf.edu), Pompeu Fabra University, Barcelona, Spain
* Danilo Croce (croce[at]info.uniroma2.it), University of Rome "Tor Vergata", Rome, Italy
* Malvina Nissim (m.nissim[at]rug.nl), University of Groningen, Groningen, The Netherlands
* Nicole Novielli (nicole.novielli[at]uniba.it), University of Bari "A. Moro", Bari, Italy
* Viviana Patti (patti[at]di.unito.it), Dipartimento di Informatica, University of Torino, Italy
*
*
* Task Guidelines: http://www.di.unito.it/~tutreeb/sentipolc-evalita16/sentipolc-guidelines2016.pdf
*
*******************************************

A single development set will be provided. The distribution consists of a set of 7,410 tweets, with annotations for all three Sentipolc subtasks: subjectivity classification, polarity classification and irony detection.

The data format is as follows:

"idtwitter","subj","opos","oneg","iro","lpos","lneg", "top", "text"

Note that, with respect to the annotation adopted in Sentipolc 2014, two additional fields (lpos and lneg) are reported to capture the literal polarity of a tweet: lpos encodes the literal positive polarity and lneg the literal negative polarity. Sentipolc does not include any task that requires classifying literal polarity. The fields are provided so that participants can reason about the possible polarity inversion caused by figurative language in ironic tweets. The literal polarity given by lpos and lneg may therefore differ from the intended overall polarity of the text, which is expressed by opos and oneg.
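The relation between literal and overall polarity can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in this README: that the file starts with a header row matching the format line above, that the annotation fields hold "0"/"1" flags, and that a tweet counts as "inverted" whenever the pair (lpos, lneg) differs from (opos, oneg). The sample rows and the helper name polarity_inverted are invented for illustration and are not taken from the dataset.

```python
import csv
import io

# Invented sample rows, for illustration only (not taken from the dataset).
# Assumption: a header row is present and annotation fields are "0"/"1" flags.
SAMPLE = '''"idtwitter","subj","opos","oneg","iro","lpos","lneg", "top", "text"
"1","1","0","1","1","1","0","1","Che bella giornata, treno in ritardo di due ore"
"2","1","1","0","0","1","0","0","Ottimo concerto ieri sera"
"3","0","0","0","0","0","0","2","Domani sciopero dei trasporti"
'''


def read_rows(handle):
    """Return one dict per tweet, keyed by the column names in the header."""
    # skipinitialspace=True tolerates the blank after some commas in the header.
    return list(csv.DictReader(handle, skipinitialspace=True))


def polarity_inverted(row):
    """True when literal polarity (lpos, lneg) differs from overall (opos, oneg)."""
    return (row["lpos"], row["lneg"]) != (row["opos"], row["oneg"])


rows = read_rows(io.StringIO(SAMPLE))
inverted_ids = [row["idtwitter"] for row in rows if polarity_inverted(row)]
print(inverted_ids)
```

If the distributed file turns out to have no header row, the column names can be passed to csv.DictReader through its fieldnames argument instead.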
It will be possible to download the data at the following address:
http://www.di.unito.it/~tutreeb/sentipolc-evalita16/data.html

We also provide a web interface, based on a RESTful Web API, for downloading the dataset:
http://www.di.unito.it/~tutreeb/sentipolc-evalita16/tweet.html

The interface works as follows:

1. Click on the button labelled "Step 1. Get Corpus Items". The 7,410 items of the Sentipolc development dataset will be loaded. A pop-up window will notify you when loading is complete.
2. Click on the button labelled "Step 3. Export the corpus". Once all tweets have been downloaded, the dataset can be stored in different formats, e.g. comma-separated values, Excel, PDF. Select your preferred format by clicking one of the buttons on the right.
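After exporting to comma-separated values, a quick sanity check on the file may be useful. The sketch below is only an illustration: it assumes the export is UTF-8 encoded, has a header row with the column names listed in the data format, and contains one row per tweet. The function name check_export and the temporary demo file are hypothetical, and the demo uses two invented rows rather than the real 7,410.

```python
import csv
import os
import tempfile
from collections import Counter

EXPECTED_FIELDS = ["idtwitter", "subj", "opos", "oneg", "iro",
                   "lpos", "lneg", "top", "text"]


def check_export(path, expected_rows=7410):
    """Report missing columns, the row count, and any repeated tweet ids."""
    # Assumption: the exported file is UTF-8 and starts with a header row.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, skipinitialspace=True)
        present = reader.fieldnames or []
        missing = [name for name in EXPECTED_FIELDS if name not in present]
        ids = [] if missing else [row["idtwitter"] for row in reader]
    duplicates = sorted(i for i, n in Counter(ids).items() if n > 1)
    return {
        "missing_fields": missing,
        "rows": len(ids),
        "rows_ok": len(ids) == expected_rows,
        "duplicate_ids": duplicates,
    }


# Demo on a tiny invented file; with the real export, call
# check_export("<path to your exported file>") and keep the default of 7410.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 newline="", encoding="utf-8") as tmp:
    writer = csv.writer(tmp, quoting=csv.QUOTE_ALL)
    writer.writerow(EXPECTED_FIELDS)
    writer.writerow(["1", "1", "1", "0", "0", "1", "0", "0", "testo di prova"])
    writer.writerow(["2", "0", "0", "0", "0", "0", "0", "1", "altro testo"])
    demo_path = tmp.name

report = check_export(demo_path, expected_rows=2)
os.remove(demo_path)
print(report)
```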