An Image is Worth More than a Thousand Favorites: Surfacing the Hidden Beauty of Flickr Pictures

The dynamics of attention in social media tend to obey power laws. Attention concentrates on a relatively small number of popular items while neglecting the vast majority of content produced by the crowd. Although popularity can be an indication of the perceived value of an item within its community, previous research has suggested that popularity is distinct from intrinsic quality. As a result, content with low visibility but high quality lurks in the tail of the popularity distribution. This phenomenon can be particularly evident in photo-sharing communities, where valuable photographers who are not highly engaged in online social interactions contribute high-quality pictures that remain unseen. We propose to use a computer vision method to surface beautiful pictures from the immense pool of near-zero-popularity items, and we test it on a large dataset of Creative Commons photos on Flickr. By gathering a large crowdsourced ground truth of aesthetics scores for Flickr images, we show that our method retrieves photos whose median perceived beauty score is equal to that of the most popular ones, and whose average is lower by only 1.5%.

We make the dataset collected for this study available to the scientific community to guarantee the reproducibility of the results and to support future work in the field of computational aesthetics.

The dataset contains no personal information about the annotators or the creators of the pictures; it exposes only the public URLs of the pictures on Flickr and the corresponding annotations. All the photos used in this study are part of the public One Hundred Million Creative Commons Flickr Images dataset, released to the research community in 2014.

If you use the dataset in your research, please cite this paper as:

@inproceedings{schifanella:beauty:icwsm15,
  author = {Schifanella, Rossano and Redi, Miriam and Aiello, Luca Maria},
  title = {An Image is Worth More than a Thousand Favorites: Surfacing the Hidden Beauty of Flickr Pictures},
  booktitle = {ICWSM'15: Proceedings of the 9th AAAI International Conference on Weblogs and Social Media},
  year = {2015},
  location = {Oxford, UK},
  publisher = {AAAI}
}

Dataset

The dataset is composed of approximately 15K Flickr pictures, each with the set of aesthetics scores assigned by the annotators.

Each line in the dataset represents a picture and is formatted as follows:

flickr_photo_id category beauty_scores

where flickr_photo_id is the ID of the picture on Flickr; category takes one of the values people, animals, urban, or nature, according to the subject of the picture (refer to the paper for more details); and beauty_scores is a comma-separated list of the beauty scores (range 1-5) assigned to the picture by the annotators. Fields are tab-separated.

Example:

5866730468 urban 3,3,3,3,4
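A minimal Python sketch (3.9+) for reading one line of the dataset into a record, following the field layout described above. The names Picture, parse_line, and VALID_CATEGORIES are hypothetical and not part of the released files. The fallback to generic whitespace splitting is an assumption meant to cope with copies in which tabs have been turned into spaces, for instance when the example line is copied from a web page.

```python
from dataclasses import dataclass
from statistics import mean, median

# Subject categories listed in the dataset description.
VALID_CATEGORIES = {"people", "animals", "urban", "nature"}


@dataclass
class Picture:
    flickr_photo_id: str  # kept as a string: it is only used as an identifier
    category: str
    beauty_scores: list[int]

    @property
    def mean_score(self) -> float:
        return float(mean(self.beauty_scores))

    @property
    def median_score(self) -> float:
        return float(median(self.beauty_scores))


def parse_line(line: str) -> Picture:
    """Parse 'flickr_photo_id<TAB>category<TAB>s1,s2,...' into a Picture."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 3:
        # Assumption: tolerate copies where tabs became runs of spaces.
        fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    photo_id, category, raw_scores = fields
    if category not in VALID_CATEGORIES:
        raise ValueError(f"unknown category {category!r}")
    scores = [int(s) for s in raw_scores.split(",")]
    if not all(1 <= s <= 5 for s in scores):
        raise ValueError(f"score outside the 1-5 range in {raw_scores!r}")
    return Picture(photo_id, category, scores)


pic = parse_line("5866730468\turban\t3,3,3,3,4")
print(pic.category, pic.mean_score, pic.median_score)  # prints: urban 3.2 3.0
```

Keeping the ID as a string avoids treating it as a quantity and matches how it is passed to the Flickr API; the per-picture mean and median are just two possible ways of aggregating the annotators' scores.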

Refer to the Flickr API to obtain more information about a picture given its flickr_photo_id (e.g., the flickr.photos.getInfo method).
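As an illustration, flickr.photos.getInfo can be called through Flickr's REST endpoint. This sketch uses only the Python standard library and assumes you have your own Flickr API key (YOUR_API_KEY is a placeholder); build_get_info_url and fetch_photo_info are hypothetical helper names, and only fetch_photo_info touches the network.

```python
import json
import urllib.parse
import urllib.request

# Public REST endpoint of the Flickr API.
FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_get_info_url(photo_id: str, api_key: str) -> str:
    """Return the REST URL that calls flickr.photos.getInfo for one photo."""
    params = {
        "method": "flickr.photos.getInfo",
        "api_key": api_key,
        "photo_id": photo_id,
        "format": "json",
        "nojsoncallback": "1",  # plain JSON instead of a JSONP wrapper
    }
    return FLICKR_REST_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_photo_info(photo_id: str, api_key: str, timeout: float = 10.0) -> dict:
    """Perform the request and decode the JSON body (requires network access)."""
    url = build_get_info_url(photo_id, api_key)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only builds the URL; no request is sent here.
    print(build_get_info_url("5866730468", "YOUR_API_KEY"))
```

Separating URL construction from the request makes it easy to inspect or log calls, and to throttle them when querying many of the ~15K IDs.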

Note that the validity of the flickr_photo_ids was checked at the time of the crowdsourcing experiment (August 2014); some pictures may therefore no longer be available on the platform.
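One possible way to find out which IDs are still reachable is to look at the stat field of each flickr.photos.getInfo JSON response, which Flickr sets to "ok" on success and "fail" on error. The sketch below assumes lookup is any function returning that decoded response for a photo ID; is_available and split_by_availability are hypothetical names. Keep in mind that "fail" can also stem from causes other than a removed photo (for example an invalid API key), so checking the error message before discarding an ID is advisable.

```python
from collections.abc import Callable, Iterable


def is_available(response: dict) -> bool:
    """True if a decoded Flickr REST response reports success."""
    return response.get("stat") == "ok"


def split_by_availability(
    photo_ids: Iterable[str], lookup: Callable[[str], dict]
) -> tuple[list[str], list[str]]:
    """Partition IDs into (available, unavailable), one lookup per ID.

    `lookup` is assumed to return the decoded JSON of flickr.photos.getInfo.
    """
    available: list[str] = []
    unavailable: list[str] = []
    for photo_id in photo_ids:
        target = available if is_available(lookup(photo_id)) else unavailable
        target.append(photo_id)
    return available, unavailable


# Offline demo: canned responses stand in for real API calls.
canned = {
    "5866730468": {"stat": "ok", "photo": {"id": "5866730468"}},
    "123": {"stat": "fail"},
}
print(split_by_availability(["5866730468", "123"], canned.__getitem__))
```

Passing the lookup as a function keeps the availability logic testable offline and lets you plug in caching or rate limiting without changing it.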

Download

The dataset files, organized in a zip archive, can be downloaded here.
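Since the names of the files inside the archive are not listed here, the following sketch simply scans every member of the zip and keeps only the lines that match the tab-separated format described in the Dataset section, silently skipping anything else (e.g., a README). iter_records_from_zip is a hypothetical helper, and archive may be either a path or a binary file object.

```python
import io
import zipfile
from collections.abc import Iterator
from typing import BinaryIO, Optional, Union

# (flickr_photo_id, category, beauty_scores)
Record = tuple[str, str, list[int]]


def _parse(line: str) -> Optional[Record]:
    """Return a record, or None if the line does not match the dataset format."""
    fields = line.strip().split("\t")
    if len(fields) != 3:
        return None
    photo_id, category, raw_scores = fields
    scores = raw_scores.split(",")
    if not photo_id.isdigit() or not all(s.isdigit() for s in scores):
        return None
    return photo_id, category, [int(s) for s in scores]


def iter_records_from_zip(archive: Union[str, BinaryIO]) -> Iterator[Record]:
    """Yield dataset records from every file member of the zip archive."""
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # directory entry, nothing to read
            with zf.open(name) as raw:
                # errors="replace" keeps non-text members from raising.
                text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace")
                for line in text:
                    record = _parse(line)
                    if record is not None:
                        yield record
```

For example, `records = list(iter_records_from_zip("dataset.zip"))`, where dataset.zip stands for whatever name the downloaded archive has. Reading directly from the archive avoids extracting it to disk first.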

Contact

For any information about the dataset, the paper, or our ongoing research, please contact the authors: