The Lifecycle of Geotagged Social Media Data tutorial covers the four stages in the lifecycle of geotagged social media data in research: representing, processing, analyzing, and visualizing. The tutorial aims to equip participants with both theoretical and practical knowledge of how to make sense of geospatial data in applications ranging from computational social science and social media analysis to behavioral studies on digital platforms. We provide the basics of how to obtain, represent, and combine different spatial data sources, with an emphasis on how to efficiently store, index, and query a location-based dataset. We further discuss the main techniques for deriving insights from spatial data, how to avoid common pitfalls, and how to exploit social media signals (e.g. user interests, user movements) to gain a deeper understanding of the phenomenon under study. The tutorial ends with an overview of the main libraries and paradigms for building interactive and dynamic visualizations of geographical data on a map.
We present common ways of representing geographic data, from simple coordinates to polygons with holes, in different data formats. We discuss several platforms that offer geotagged social media and show how their APIs can be used to obtain this data. We further illustrate how geotagged data can be used to acquire auxiliary information, such as historical weather conditions, that complements the original data.
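As a minimal sketch of the representations mentioned above, the following Python snippet encodes a polygon with one hole in two widely used formats, GeoJSON and WKT. The coordinates and the helper name `polygon_to_wkt` are made up for illustration and are not part of the tutorial material. Both formats are written here with longitude first and latitude second, and each ring repeats its first vertex to close itself.

```python
import json

# Hypothetical square (exterior ring) with a smaller square hole (interior ring).
# Each ring is a list of [longitude, latitude] pairs; the first and last
# vertices are identical so that the ring is closed.
exterior = [[7.60, 45.00], [7.70, 45.00], [7.70, 45.10], [7.60, 45.10], [7.60, 45.00]]
hole = [[7.64, 45.04], [7.64, 45.06], [7.66, 45.06], [7.66, 45.04], [7.64, 45.04]]

# GeoJSON: a Polygon is a list of rings, exterior first, then any holes.
geojson_polygon = {"type": "Polygon", "coordinates": [exterior, hole]}


def polygon_to_wkt(rings):
    """Serialize a list of rings to a WKT POLYGON string (illustrative helper)."""
    ring_texts = []
    for ring in rings:
        points = ", ".join(f"{lon} {lat}" for lon, lat in ring)
        ring_texts.append(f"({points})")
    return "POLYGON (" + ", ".join(ring_texts) + ")"


geojson_text = json.dumps(geojson_polygon)
wkt_text = polygon_to_wkt(geojson_polygon["coordinates"])
print(geojson_text)
print(wkt_text)
```

The same geometry can then be exchanged as JSON with web APIs or loaded as text into a spatial database, depending on which format the receiving tool expects.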
We cover a variety of techniques you can use to derive actionable insights from geotagged social media data. We present methods such as clustering, prediction, and recommendation. We show how geotagged data differs from traditional data and often requires special considerations in order to obtain reliable output.
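To illustrate one such special consideration, here is a rough sketch of a Gaussian kernel density estimate over geotagged points in plain Python. It assumes an equirectangular distance approximation, which accounts for a degree of longitude covering less ground as latitude increases and is only reasonable for nearby points. The function names, the sample points, and the 1 km bandwidth are hypothetical choices, not the tutorial's own method.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def approx_distance_km(lon1, lat1, lon2, lat2):
    """Equirectangular approximation; reasonable only for nearby points."""
    mean_lat = math.radians((lat1 + lat2) / 2.0)
    # Scale the longitude difference by cos(latitude): meridians converge
    # towards the poles, so raw degree differences overstate east-west distance.
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat)
    dy = math.radians(lat2 - lat1)
    return EARTH_RADIUS_KM * math.hypot(dx, dy)


def kde(query, points, bandwidth_km):
    """Gaussian kernel density estimate (per square km) at query = (lon, lat)."""
    norm = 1.0 / (2.0 * math.pi * bandwidth_km ** 2)
    total = 0.0
    for lon, lat in points:
        d = approx_distance_km(query[0], query[1], lon, lat)
        total += norm * math.exp(-(d * d) / (2.0 * bandwidth_km ** 2))
    return total / len(points)


# Made-up geotagged posts: three close together, one far away.
posts = [(7.68, 45.07), (7.69, 45.07), (7.68, 45.08), (7.90, 45.30)]
dense = kde((7.685, 45.072), posts, bandwidth_km=1.0)
sparse = kde((7.80, 45.20), posts, bandwidth_km=1.0)
print(dense, sparse)
```

Treating longitude and latitude as plain Cartesian coordinates would distort the kernel in the east-west direction, which is one way geotagged data can yield unreliable output if handled like ordinary tabular data.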
We cover techniques for storing spatial data and for working with spatial primitives, such as computing the distance between geographic coordinates and the areas of and overlaps between polygons, and we discuss how the data representation influences which techniques should (or should not) be used. We discuss different strategies, such as indexes, for efficiently storing, reading, and querying geotagged data, as well as for efficiently processing it.
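As a small worked example of one such primitive, the distance between two geographic coordinates can be computed with the spherical law of cosines. The sketch below assumes a perfectly spherical Earth with a mean radius of 6371 km, and the function name `great_circle_km` is chosen here for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km using the spherical law of cosines."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_lambda = math.radians(lon2 - lon1)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(delta_lambda))
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS_KM * math.acos(cos_angle)


# One degree of longitude covers very different ground distances
# depending on the latitude at which it is measured.
at_equator = great_circle_km(0.0, 0.0, 0.0, 1.0)
at_60_north = great_circle_km(60.0, 0.0, 60.0, 1.0)
quarter = great_circle_km(0.0, 0.0, 0.0, 90.0)
print(at_equator, at_60_north, quarter)
```

The comparison shows why raw coordinate differences are a poor substitute for a proper distance function: the same one-degree step is roughly half as long at 60 degrees north as at the equator. Note that the law of cosines loses numerical precision for points that are very close together, where the haversine formula is often preferred.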
The world is not flat, so visualizing geographic data is not straightforward. We present several tools that can help you better understand your data. A hands-on session will show you how to use the right tool at the right time to maximize the knowledge you can extract from the data.
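To make the "not flat" point concrete, the sketch below projects longitude and latitude onto a plane with the spherical Web Mercator formulas. The text above does not name a projection; Web Mercator is an assumption picked here because many web mapping libraries use it, and the function name `to_web_mercator` is illustrative.

```python
import math

WGS84_RADIUS_M = 6378137.0  # equatorial radius used by Web Mercator


def to_web_mercator(lon, lat):
    """Project degrees of longitude/latitude to Web Mercator metres (x, y)."""
    x = WGS84_RADIUS_M * math.radians(lon)
    y = WGS84_RADIUS_M * math.log(math.tan(math.pi / 4.0 + math.radians(lat) / 2.0))
    return x, y


# Height on the map of a one-degree latitude band at two different latitudes.
band_at_equator = to_web_mercator(0.0, 1.0)[1] - to_web_mercator(0.0, 0.0)[1]
band_at_60_north = to_web_mercator(0.0, 61.0)[1] - to_web_mercator(0.0, 60.0)[1]
stretch = band_at_60_north / band_at_equator
print(stretch)
```

The two bands cover nearly the same distance on the ground, yet the northern one is drawn about twice as tall. This kind of distortion is worth keeping in mind when reading densities or areas off a projected map.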
Spherical law of cosines
GPS errors and accuracy
Introduction to analyzing geotagged social media data
What is so special about geographic data?
Kernel density estimation
Points and Polygons
Region connection calculus
Space and Time
Incompatibility of dimensions
Introduction to spatial databases
Creating a spatial database
Loading external spatial data
Batch insert of textual data (WKT)
Load ESRI shapefiles
Load OpenStreetMap data
Querying spatial databases
Intersections and differences
Intersect relationship types
Python for Spatial Computation
Connect a PostGIS database to a Python script
Create and manipulate geometries with Shapely
Load external data
Load ESRI shapefiles with Fiona
Load textual data in WKT
Indexing and prepared geometries
Using PostGIS in a desktop environment (introduction)
Spatial data on large-scale computational framework (introduction)
The tutorial material will be provided as a virtual machine in which we have set up the environment and the exercises that we will present during the practical sessions.
The virtual machine has been created with the general-purpose full virtualizer VirtualBox.
To be ready for the tutorial, follow these steps:
1. Download the VirtualBox installation package for your platform at this link.
2. Install VirtualBox (follow these instructions for more details).
3. Download the GeoCycle virtual machine at this link.
4. Import the virtual machine by launching VirtualBox and selecting "File -> Import Appliance..." from the main menu.
5. Start the virtual machine called geocycle.
Assistant Professor in Computer Science at the University of Turin, Italy. His research embraces the creative energy of a range of disciplines across technology, social media, data visualization, and urban informatics.
Senior Research Scientist at Yahoo Labs/Flickr in San Francisco, CA, USA. His research focuses on the visual and spatiotemporal dimensions of media, in order to better understand how people experience the world and to better assist them with exploring the planet.