GeoCycle Tutorial

@ICWSM, Cologne, 17 May 2016

Abstract

The Lifecycle of Geotagged Social Media Data tutorial covers the four stages in the lifecycle of geotagged social media data in research: representing, processing, analyzing, and visualizing. The tutorial aims to arm participants with both theoretical and practical knowledge about how to make sense of geospatial data for use in applications that range from computational social science and social media analysis to behavioral studies on digital platforms. We provide the basics on how to obtain, represent and combine different spatial data sources, with an emphasis on how to efficiently store, index and query a location-based dataset. We further discuss the main techniques for deriving insights from spatial data, how to avoid common pitfalls and how to exploit social media (e.g. user interests, user movements) to gain a deeper understanding of the phenomenon under study. The tutorial will end with an overview of the main libraries and paradigms for building interactive and dynamic visualizations of geographical data on a map.

Modules

Represent

We present common ways of representing geographic data, from simple coordinates to polygons with holes, and using different data formats. We discuss several platforms that offer geotagged social media and show how their APIs can be used to obtain this data. We further illustrate how geotagged data can be used to acquire auxiliary information, such as historic weather conditions, that complements the original data.
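
As a minimal sketch of one such representation, the snippet below builds a GeoJSON polygon with a hole using only the Python standard library. The helper name `polygon_feature` and the coordinates are hypothetical and chosen for illustration; they are not part of the tutorial material.

```python
import json


def polygon_feature(exterior, holes=(), properties=None):
    """Build a GeoJSON Feature for a polygon with optional holes.

    Rings are sequences of (longitude, latitude) pairs; GeoJSON puts
    longitude first. The first ring is the outer boundary, every
    following ring is a hole. Open rings are closed by repeating the
    first position at the end.
    """
    def close(ring):
        ring = [list(position) for position in ring]
        if ring[0] != ring[-1]:
            ring.append(list(ring[0]))
        return ring

    rings = [close(exterior)] + [close(hole) for hole in holes]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": rings},
        "properties": dict(properties or {}),
    }


# Made-up square with a square hole. The outer ring runs counterclockwise
# and the hole clockwise, as the GeoJSON specification (RFC 7946) recommends.
outer = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
inner = [(4.0, 4.0), (4.0, 6.0), (6.0, 6.0), (6.0, 4.0)]
feature = polygon_feature(outer, holes=[inner], properties={"name": "example"})
print(json.dumps(feature))
```

The same nesting (a list of rings, each a list of positions) carries over to multiple polygons, where a further outer list holds one such polygon per element.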

Analyze

We cover a variety of techniques you can use to derive actionable insights from geotagged social media data. We present methods like clustering, predicting and recommending. We show you how geotagged data differs from traditional data and often requires special considerations in order to obtain reliable output.
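
One example of such a special consideration, sketched under our own assumptions rather than taken from the tutorial slides: averaging longitudes arithmetically breaks down near the antimeridian, whereas averaging them as angles on a circle does not. The function names are hypothetical.

```python
import math


def naive_mean_longitude(lons):
    """Arithmetic mean, which treats longitude as an ordinary number."""
    return sum(lons) / len(lons)


def circular_mean_longitude(lons):
    """Mean direction of the longitudes treated as angles on a circle.

    Each longitude becomes a unit vector; the angle of the summed vector
    is the mean. The result is undefined when the vectors cancel out
    (for example, longitudes 0 and 180).
    """
    x = sum(math.cos(math.radians(lon)) for lon in lons)
    y = sum(math.sin(math.radians(lon)) for lon in lons)
    return math.degrees(math.atan2(y, x))


# Two points two degrees apart, on either side of the antimeridian.
lons = [179.0, -179.0]
print(naive_mean_longitude(lons))     # 0.0, on the opposite side of the globe
print(circular_mean_longitude(lons))  # on the antimeridian (+/-180 degrees)
```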

Process

We cover techniques for storing spatial data and for using spatial primitives, such as computing the distance between geographic coordinates and the areas of and overlaps between polygons, and we show how the data representation influences which techniques should (or should not) be used. We discuss different strategies for efficiently storing, reading and querying geotagged data, such as indexes, as well as for efficiently processing the data.
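
As an illustration of one such primitive, here is a minimal sketch of the haversine distance between two coordinates. It assumes a spherical Earth with the mean radius given below, so it is an approximation; ellipsoidal methods are more accurate. The function name and the example coordinates are our own.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # min() guards against rounding pushing the square root above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Approximate city-centre coordinates, for illustration only.
cologne = (50.94, 6.96)
turin = (45.07, 7.69)
print(round(haversine_km(*cologne, *turin), 1), "km")
```

Because the formula works on the longitude difference through a squared sine, it handles pairs of points on either side of the dateline without special casing.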

Visualize

The world is not flat, so visualizing geographic data is not straightforward. We present several tools that can help you better understand your data. A hands-on session will show you how to use the right tool at the right time to maximize the knowledge you can extract from the data.
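
To make the flattening step concrete, the sketch below converts a longitude and latitude into the column and row of a map tile in the XYZ tiling scheme on the Web Mercator projection, which many tile-based web maps use. That this scheme matches the maps shown in the tutorial is our assumption, and the function name is hypothetical.

```python
import math

# Web Mercator cuts the map off at this latitude so that the world is square.
MAX_MERCATOR_LAT = 85.05112878


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) tile containing a point at the given zoom level.

    At zoom z the world is split into 2**z by 2**z tiles; x grows
    eastwards from longitude -180 and y grows southwards from the
    northern edge of the map.
    """
    n = 2 ** zoom
    lat = max(-MAX_MERCATOR_LAT, min(MAX_MERCATOR_LAT, lat))
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that the map edges (e.g. longitude 180) fall in a valid tile.
    return min(n - 1, max(0, x)), min(n - 1, max(0, y))


# Approximate coordinates of Cologne, for illustration only.
print(lonlat_to_tile(6.96, 50.94, 10))
```

Tile rows are not evenly spaced in latitude: the projection stretches areas near the poles, which is one of the distortions to keep in mind when reading a web map.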

Syllabus

Represent

  • Geometries
    • Coordinates
    • Boundaries
    • Single polygons
    • Multiple polygons
    • Holes
  • Data formats
    • Shapefiles
    • GeoJSON
    • OpenStreetMap
    • WKT
  • Applications
  • Projections
    • Examples
    • Distortion
  • World
    • Plane
      • Equirectangular projection
      • Shortest distance
        • Euclidean
    • Sphere
      • Geodesics
      • Shortest distance
        • Great circle
        • Spherical law of cosines
        • Haversine
    • Ellipsoid
      • Geoid
      • Reference datum
      • Geodetic datums
        • Global datums
        • Local datums
        • Conversions
      • Shortest distance
        • Vincenty
        • Karney
  • Pitfalls
    • Dateline crossing
    • Timezones
    • GPS errors and accuracy
  • Obtaining data
    • Twitter
    • Flickr
    • Instagram
    • Weather
  • Exercises

Analyze

  • Introduction to analyzing geotagged social media data
    • What is so special about geographic data?
  • Clustering
    • Hard clustering
    • Soft clustering
  • Density modeling
    • Mixture Models
    • Kernel density estimation
  • Points and Polygons
    • Point-in-polygon
    • Region boundaries
      • Thresholding
      • Voronoi
    • Region connection calculus
      • Equivalence
      • Adjacency
      • Proximity
      • Overlap
      • Containment
  • Space and Time
    • Incompatibility of dimensions
  • Spatial Modeling
    • Attributes
    • Features
    • Topics
    • Languages
  • Temporal modeling
    • Trajectories
    • Evolution
  • Applications
    • Similarity
    • Search
    • Ranking
    • Recommendation
    • Prediction
  • Exercises
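
As a small illustration of the point-in-polygon topic listed above, the following sketch uses the ray-casting (even-odd) rule. It treats coordinates as planar, which is an assumption that only holds for small regions away from the poles and the dateline; the function names and test shapes are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Even-odd test: does a ray from (x, y) towards +x cross the ring
    an odd number of times? Points exactly on an edge are not handled
    consistently by this simple version."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the horizontal line through y can cross.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def point_in_polygon(x, y, exterior, holes=()):
    """True if the point lies inside the exterior ring and in no hole."""
    if not point_in_ring(x, y, exterior):
        return False
    return not any(point_in_ring(x, y, hole) for hole in holes)


# Made-up square with a square hole in the middle.
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (4, 6), (6, 6), (6, 4)]
print(point_in_polygon(2, 2, square, [hole]))  # True: inside, outside the hole
print(point_in_polygon(5, 5, square, [hole]))  # False: inside the hole
```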

Process

  • Introduction to spatial databases
  • PostGIS
    • Creating a spatial database
    • Loading external spatial data
      • Batch insert of textual data (WKT)
      • Load ESRI shapefiles
      • Load OpenStreetMap data
    • Querying spatial databases
    • Geometries
    • Geography
    • Spatial functions
      • Constructors
      • Outputs
      • Accessors
      • Measurement
      • Decomposition
      • Composition
      • Simplification
      • Spatial Relationships
        • Intersections and differences
        • Intersect relationship types
        • Equality
    • Nearest-Neighbour Searching
    • Spatial Indexing
    • Spatial Joins
    • Projecting Data
  • Python for Spatial Computation
    • Connect a PostGIS database to a Python script
    • Create and manipulate geometries with Shapely
    • Load external data
      • Load ESRI shapefiles with Fiona
      • Load textual data in WKT
    • Spatial functions
    • Indexing and prepared geometries
  • Using PostGIS in a desktop environment (introduction)
  • Spatial data on large-scale computational frameworks (introduction)
  • Exercises

Visualize

  • Build interactive web maps
    • Tile-based vs. vector maps
    • Client-side JavaScript libraries: Leaflet, OpenLayers 3
    • Create a simple map
    • Load GeoJSON data in Leaflet
      • Show Points
        • Customize markers
        • Clustering
        • Heatmaps
      • Show Linestrings
      • Show Polygons
        • Choropleth Map
        • Deal with a large number of polygons
    • Compose different geometries in layers
    • Create data-driven custom style
    • Add interactivity
    • Link d3.js to spatial visualizations
  • Desktop tools to visualize spatial data
    • Introduction to QGIS
    • The little Java brother: OpenJUMP
  • Build static spatial visualizations programmatically
    • Visualize spatial data in Python
  • Exercises

Material

The tutorial material will be provided as a virtual machine where we set up the environment and exercises we will present during the practical sessions.

The virtual machine has been created with the general-purpose full virtualizer VirtualBox.

To be ready for the tutorial, follow these steps:

1. Download the VirtualBox installation package for your platform at this link.
2. Install VirtualBox (follow these instructions for more details).
3. Download the GeoCycle virtual machine at this link.
4. Import the virtual machine by launching VirtualBox and choosing "File -> Import Appliance..." from the main menu.
5. Start the virtual machine called geocycle.

Slides:
1. Represent
2. Process
3. Analyze
4. Visualize

Additional material:
Exercises

Speakers

Rossano Schifanella

University of Turin

Assistant Professor in Computer Science at the University of Turin, Italy. His research embraces the creative energy of a range of disciplines across technology, social media, data visualization, and urban informatics.

Bart Thomee

Yahoo Labs

Senior Research Scientist at Yahoo Labs/Flickr in San Francisco, CA, USA. His research focuses on the visual and spatiotemporal dimensions of media, in order to better understand how people experience the world and to better assist them with exploring the planet.