I am particularly attracted to scientific challenges that span from theoretical modeling to the application of findings in real-world scenarios. I have participated in large and complex projects, and co-authored peer-reviewed papers with scientists active in many different disciplines, such as physics, sociology, philosophy, and communication science. I believe that multidisciplinarity and interdisciplinary collaborations are fundamental assets for groundbreaking societal innovations.
One of the drawbacks of multidisciplinarity is that it is difficult to find the perfect fit between what you do and how others define what you do. However, I am comfortable being classified as someone operating in the following areas: computational social science, complex networks, network science, network analysis, data science, and social media analysis.
Currently, I am investigating the following problems: diffusion of misinformation, opinion polarization, hate speech in social media, dynamics of intellectual migration, and visualization and application of signed networks.
During the earliest years of my scientific career, I was involved in research projects focused on other, "purer" computer science problems, such as computer and network security, distributed applications, peer-to-peer systems, micro-payment systems, proactive password checking, anomaly detection, machine learning, inductive logic programming, web and data mining, and recommendation systems.
The ISI Fellow program selects top scholars to be external members of the ISI Foundation research community, in order to collaborate on scientific problems that span across disciplines (math, physics, biology, social sciences, humanities) in the pursuit of breaking new ground at the forefront of complex systems and network science, data and computational science, information technologies, computational epidemiology & public health, and statistical physics. As part of the celebrations of the Premio Lagrange – Fondazione CRT 2015, ISI Foundation announced the scientists who joined the third round of the ISI Fellow program.
Coordinator of the Master's Degree Program in "Networks and Computational Systems" (Reti e Sistemi Informatici) within the Computer Science Master's Degree Programs at the University of Turin.
SAA - Scuola di Amministrazione Aziendale is the MBA School at the University of Turin, and one of the most renowned MBA Schools in Italy.
NetAtlas is a tech company specializing in the development of business intelligence, machine learning, big data, data analysis, and information visualization solutions.
The dataset was publicly distributed with our paper presented at ICWSM'12: LM Aiello, M Deplano, R Schifanella, and G Ruffo. People are Strange when you're a Stranger: Impact and Influence of Bots on Social Networks. 2012, AAAI
My profile at IU's Web Site.
For a complete list of ARC2S publications, please check here
Contro l'odio (Hate Speech Map)
We are working on a data visualization platform designed to support Natural Language Processing (NLP) scholars in studying and analyzing different corpora collected in order to understand the hate speech phenomenon in social media. The project started with the creation of a corpus which collects tweets addressed to specific groups of ethnic minorities considered very controversial in the Italian public debate. Each tweet has been manually tagged with a series of attributes in order to capture the different features used to characterize the hate speech phenomenon. This corpus is mainly built for training an automatic classifier and for supporting its testing and validation, before the classifier is adopted to detect hate speech tweets in larger-scale datasets. Unlike many other traditional machine learning tasks, building a classifier that achieves high accuracy is very challenging in this scenario, because of the intrinsic ambiguity of the language, the lack of a proper and explicable context in social media, and the tendency of online users to be sarcastic and ironic. Therefore, in order to properly validate an effective feature selection process, correlations between selected attributes must be studied and analyzed. This motivated us to build an interactive platform to explore data in our corpora across the dimensions that have been used to characterize collected tweets.
Stance Detection and Social Polarization
In the last decade, social media have gained a very significant role in public debates. Despite the many intrinsic difficulties of analyzing data streaming from on-line platforms that are poisoned by bots, trolls, and low-quality information, it is undeniable that such data can still be used to gauge public opinion and overall mood and to investigate how individuals communicate with each other.
With the aim of analyzing the debate on Twitter about the 2016 referendum on the reform of the Italian Constitution, we created an annotated Italian corpus for stance detection, in order to automatically estimate the stance of a relevant number of users. We take a diachronic perspective to shed light on users' opinion dynamics.
Furthermore, different types of social network communities, based on friendships, retweets, quotes, and replies were investigated, in order to analyze the communication among users with similar and divergent viewpoints.
We observe particular aspects of users' behaviour. First, our analysis suggests that users tend to be less explicit in expressing their stances after the outcome of the vote; simultaneously, users who exhibit a high number of cross-stance relations tend to become less polarized or to adopt a more neutral style in the following phase of the debate.
Second, although social media networks are generally aggregated in homogeneous communities, we highlight that the structure of the network can change strongly when different types of social relations are considered.
In particular, networks defined by means of reply-to messages exhibit inverse homophily by stance: users use replies more often than other forms of communication to express diverging opinions. Interestingly, we also observe that political polarization increases as the election approaches and decreases after election day.
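One simple way to make "homophily by stance" concrete is to compare the observed share of same-stance edges with the share expected if edge endpoints were paired at random. The sketch below is only an illustration of that idea: the function name, the toy users, and the toy edges are all hypothetical, and it is not the measure used in the study described above.

```python
from collections import Counter


def stance_homophily(edges, stance):
    """Return (observed, expected) shares of edges joining same-stance users.

    `expected` is the share obtained if edge endpoints were paired at random,
    i.e. the sum of squared endpoint shares of each stance.
    """
    same = sum(1 for u, v in edges if stance[u] == stance[v])
    observed = same / len(edges)
    endpoint_counts = Counter(stance[node] for edge in edges for node in edge)
    total = sum(endpoint_counts.values())
    expected = sum((count / total) ** 2 for count in endpoint_counts.values())
    return observed, expected


# Hypothetical toy data: six users, two stances, two relation types.
stance = {"a": "yes", "b": "yes", "c": "yes", "d": "no", "e": "no", "f": "no"}
retweets = [("a", "b"), ("b", "c"), ("d", "e"), ("e", "f"), ("a", "d")]
replies = [("a", "d"), ("b", "e"), ("c", "f"), ("a", "e"), ("b", "c")]

for name, edges in (("retweets", retweets), ("replies", replies)):
    observed, expected = stance_homophily(edges, stance)
    if observed > expected:
        label = "homophily"
    elif observed < expected:
        label = "inverse homophily"
    else:
        label = "neutral mixing"
    print(f"{name}: observed={observed:.2f} expected={expected:.2f} ({label})")
```

In this toy example the retweet edges mostly join users with the same stance, while the reply edges mostly cross stances, which mirrors the qualitative pattern reported above.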
Signed networks represent networked data where edge annotations express whether each edge interaction is friendly (positive) or antagonistic (negative). The model is simple yet powerful and, while it enriches the standard network representation with a single bit of information per edge, it can lead to capturing novel and interesting structural properties of real-world networks. Analysis of signed graphs has many applications, from modeling discussions in social media to mining user reviews and recommending products in e-commerce sites, as well as assessing the structural balance of a physical complex system. We consider the problem of discovering polarized communities in signed networks. In particular, we are searching for two communities (subsets of the network vertices) where within communities there are mostly positive edges while across communities there are mostly negative edges. While similar problem settings have been studied before, our formulation offers important novelties: firstly, it seeks only two opposing communities; secondly, these communities can be hidden within the whole network, meaning that our formulation allows for vertices not belonging to either of the two communities. We formulate this novel problem as a discrete eigenvector problem, which we show to be NP-hard. We then develop two intuitive spectral algorithms: one deterministic with quality guarantee n, and one randomized with quality guarantee √n, where n is the number of vertices in the graph. The analysis of the algorithms is based on majorization theory. We finally validate our algorithms against natural baselines on a large collection of real-world signed networks.
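To give a feel for what a spectral approach to this problem can look like, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the algorithms from the work described above: it assumes the objective is x^T A x / x^T x for a vector x with entries in {-1, 0, +1}, it approximates the leading eigenvector of the signed adjacency matrix A by power iteration, and it rounds that eigenvector by thresholding. All function names and the toy graph are hypothetical.

```python
def build_adjacency(n, signed_edges):
    """Dense symmetric signed adjacency matrix from (u, v, sign) triples."""
    adj = [[0] * n for _ in range(n)]
    for u, v, sign in signed_edges:
        adj[u][v] = adj[v][u] = sign
    return adj


def leading_eigenvector(adj, iterations=200):
    """Power iteration on adj + shift*I; the shift makes all eigenvalues positive."""
    n = len(adj)
    shift = 1 + max(sum(abs(w) for w in row) for row in adj)
    # Fixed non-uniform start for reproducibility; a random start is the usual choice.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [shift * v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def polarity(adj, x):
    """Assumed objective: x^T A x divided by the number of non-neutral vertices."""
    size = sum(1 for s in x if s != 0)
    if size == 0:
        return 0.0
    n = len(adj)
    return sum(adj[i][j] * x[i] * x[j] for i in range(n) for j in range(n)) / size


def polarized_communities(adj):
    """Round the leading eigenvector at every threshold and keep the best rounding."""
    v = leading_eigenvector(adj)
    best_x, best_score = [0] * len(v), float("-inf")
    for tau in sorted({abs(c) for c in v}, reverse=True):
        x = [(1 if c > 0 else -1) if (c != 0 and abs(c) >= tau) else 0 for c in v]
        score = polarity(adj, x)
        if score > best_score:
            best_x, best_score = x, score
    side_a = {i for i, s in enumerate(best_x) if s == 1}
    side_b = {i for i, s in enumerate(best_x) if s == -1}
    return side_a, side_b, best_score


# Toy graph: vertices 0-2 and 3-5 form two opposing groups; 6 and 7 are bystanders.
edges = [(u, v, 1 if (u < 3) == (v < 3) else -1)
         for u in range(6) for v in range(u + 1, 6)]
edges.append((6, 7, 1))
side_a, side_b, score = polarized_communities(build_adjacency(8, edges))
print(side_a, side_b, score)
```

On this toy graph the two opposing groups are recovered and vertices 6 and 7 are left neutral, which illustrates the "hidden communities" aspect of the formulation. The sketch makes no claim about approximation quality.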
Learning Real Estate Automated Valuation Models from Heterogeneous Data
Real estate appraisal is a complex and important task that can be made more precise and faster with the help of automated valuation tools. Usually the value of a property is determined by taking into account both structural and geographical characteristics. However, while geographical information is easily found, obtaining significant structural information requires the intervention of a real estate expert, a professional appraiser. In this paper we propose a Web data acquisition methodology, and a Machine Learning model, that can be used to automatically evaluate real estate properties. This method uses data from previous appraisal documents, from the advertised prices of similar properties found via Web crawling, and from open data describing the characteristics of the corresponding geographical area. We describe a case study, applicable to the whole Italian territory and initially trained on a data set of individual homes located in the city of Turin, and analyze prediction and practical applicability.
We propose a framework to study the spreading of urban legends, false stories that become persistent in a local popular culture. Following the traditional approach in the study of information diffusion, we consider an epidemic SIS-like (Susceptible-Infected-Susceptible) model where the agents can be infected by the legend or its debunking depending on the belief of their neighborhood. Simulating the spreading process on several segregation levels of the underlying network, we look for the best strategies to locate persistent fact-checkers.
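The flavor of such a model can be sketched in a few lines of Python. This is a simplified illustration and not the framework's actual model: the three states (S for susceptible, B for believer, F for fact-checker), the parameters `beta` and `p_forget`, and the rule that a susceptible agent adopts the legend or its debunking in proportion to the share of believing or fact-checking neighbors are all assumptions made for the example. Network segregation is not modeled here.

```python
import random


def simulate_legend(neighbors, believers, fact_checkers, steps,
                    beta=0.4, p_forget=0.1, seed=0):
    """Synchronous SIS-like dynamics with persistent fact-checkers.

    neighbors: dict mapping each node to a list of adjacent nodes.
    believers: nodes that initially believe the legend.
    fact_checkers: nodes that stay in state "F" forever (persistent).
    """
    rng = random.Random(seed)
    persistent = set(fact_checkers)
    state = {node: "S" for node in neighbors}
    for node in believers:
        state[node] = "B"
    for node in persistent:
        state[node] = "F"

    for _ in range(steps):
        new_state = {}
        for node, nbrs in neighbors.items():
            current = state[node]
            if node in persistent:
                new_state[node] = "F"
            elif current == "S":
                degree = len(nbrs) or 1  # avoid division by zero for isolated nodes
                share_b = sum(state[m] == "B" for m in nbrs) / degree
                share_f = sum(state[m] == "F" for m in nbrs) / degree
                r = rng.random()
                if r < beta * share_b:
                    new_state[node] = "B"
                elif r < beta * (share_b + share_f):
                    new_state[node] = "F"
                else:
                    new_state[node] = "S"
            else:
                # Believers and ordinary fact-checkers may forget and become susceptible.
                new_state[node] = "S" if rng.random() < p_forget else current
        state = new_state
    return state


# Toy ring of 10 agents with one initial believer and one persistent fact-checker.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
final = simulate_legend(ring, believers=[5], fact_checkers=[0], steps=50, seed=42)
print(final)
```

Comparing runs that place the persistent fact-checkers on different nodes is one way to explore, in miniature, the question of where such agents are most effective.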
This course introduces the fundamental concepts, principles, and methods of the interdisciplinary field of network science, with a particular focus on analysis techniques, modeling, and applications for the World Wide Web and online social media. Topics covered include graph structures of networks, mathematical models of networks, common network topologies, the structure of large-scale graphs, community structures, epidemic spreading, centrality measures, dynamic processes in networks, and graph visualization.
Course's language: English
This course is an extension of the 'Complex Network' course described above. An additional learning objective of this course falls in the field of scientific data visualization. Students will learn basic visualization design and evaluation principles, and learn how to acquire, parse, and analyze large datasets. Students will also learn techniques for visualizing multivariate, temporal, text-based, geospatial, hierarchical, and (above all) network/graph-based data. Additionally, students will use Gephi, D3, Python, networkx, plot.ly, and many other tools to prototype many of these techniques on existing datasets.
Course's language: English (Complex Network), Italian (Data Viz)
The objectives of this course are the following:
Course's language: Italian