6 Seconds of Sound and Vision: Creativity in Micro-Videos

The notion of creativity, as opposed to related concepts such as beauty or interestingness, has not been studied from the perspective of automatic analysis of multimedia content. Meanwhile, short online videos shared on social media platforms, or micro-videos, have arisen as a new medium for creative expression. In this paper we study creative micro-videos in an effort to understand the features that make a video creative, and to address the problem of automatic detection of creative content. Defining creative videos as those that are novel and have aesthetic value, we conduct a crowdsourcing experiment to create a dataset of 4,000 micro-videos labelled as creative and non-creative. We propose a set of computational features that we map to the components of our definition of creativity, and conduct an analysis to determine which of these features correlate most with creative video. Finally, we evaluate a supervised approach to automatically detect creative video, with promising results, showing that it is necessary to model both aesthetic value and novelty to achieve optimal classification accuracy.

We make the dataset collected for this study available to the scientific community in order to guarantee the reproducibility of our results and to support future work in the field of computational creativity.

The dataset does not contain any personal information about the annotators or the creators of the videos: it exposes only the public identifiers of the videos in the Vine system and the corresponding annotations in aggregated form.

If you use the dataset in your research, please cite the paper as:

@inproceedings{Redi2014Creativity,
   author = {Redi, Miriam and O'Hare, Neil and Schifanella, Rossano and Trevisiol, Michele and Jaimes, Alejandro},
   title = {6 Seconds of Sound and Vision: Creativity in Micro-Videos},
   booktitle = {Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on},
   year = {2014},
   location = {Columbus, Ohio, USA},
   isbn = {978-1-4799-5118-5},
   pages = {4272--4279},
   numpages = {8},
   url = {http://dx.doi.org/10.1109/CVPR.2014.544},
   doi = {10.1109/CVPR.2014.544},
   publisher = {IEEE Computer Society}
}


The dataset is composed of the set of Vine videos with the corresponding creative/non-creative annotations. The list of videos is partitioned according to the agreement score between the annotators. The files are named as follows:

annotated_videos_<datasetid>.txt

where <datasetid> can be:

  • D_60: videos with at least 60% agreement
  • D_80: videos with at least 80% agreement
  • D_100: videos with full agreement

For example, the file annotated_videos_D_80.txt contains the set of annotated videos that show at least 80% agreement between annotators.

In these files, each line represents one annotation in the following tab-separated format:

<annotation> <videoid>

where <annotation> takes the value 0 for non-creative videos and 1 for creative videos, and <videoid> is the unique Vine identifier. To compose the URL of a video on the Vine web portal, follow this scheme:

https://vine.co/v/<videoid>

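The format above can be parsed with a few lines of code. Below is a minimal sketch in Python; the helper names and the sample video identifiers are hypothetical, and in practice you would pass the lines of annotated_videos_D_80.txt (or another dataset file) instead of the inline sample.

```python
def parse_annotations(lines):
    """Map each <videoid> to its label: 0 (non-creative) or 1 (creative)."""
    annotations = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        annotation, video_id = line.split("\t")
        annotations[video_id] = int(annotation)
    return annotations

def vine_url(video_id):
    """Compose the Vine web portal URL for a video identifier."""
    return "https://vine.co/v/" + video_id

# Hypothetical sample lines in the documented tab-separated format:
sample = ["1\tbQx5VVxyz\n", "0\tbQx5WWabc\n"]
annotations = parse_annotations(sample)
```

With a real file, the call becomes `parse_annotations(open("annotated_videos_D_80.txt"))`.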
We also provide the composition of the training and test sets for the three datasets D_60, D_80, and D_100. The files are named as follows:


where <partitionid> is train or test, referring respectively to the training or test set of a given dataset. The format of each line is the same as in the annotated_videos_<datasetid>.txt files.
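Since the partition files share the annotation format, a quick sanity check after loading a split is to count the label balance of each partition. A minimal Python sketch, with hypothetical sample lines standing in for a real train or test file:

```python
from collections import Counter

def label_balance(lines):
    """Count creative (1) vs. non-creative (0) labels in a partition file."""
    return Counter(int(line.split("\t")[0]) for line in lines if line.strip())

# Hypothetical sample lines; in practice, pass the lines of the
# train/test partition files for D_60, D_80, or D_100.
train_sample = ["1\taaa\n", "0\tbbb\n", "1\tccc\n"]
train_counts = label_balance(train_sample)
```

Comparing the balance of the train and test partitions of the same dataset is a cheap way to verify that a download or parsing step did not silently drop lines.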

Refer to the paper for additional details on the annotation experiment.


The dataset files, organized in a zip archive, can be downloaded here.


For any questions about the dataset, the paper, or the research we are carrying out, please contact the authors: