Exploring Massive Volunteered Geographic Information for Geographic Knowledge Discovery

University dissertation from Stockholm: KTH

Abstract: Conventionally, geographic data produced and disseminated by national mapping agencies are used to study various urban issues. These data are not commonly available or accessible, and they are also criticized for being expensive. However, this is changing with the rise of Volunteered Geographic Information (VGI). VGI, a form of user-generated content, is geographic data collected and disseminated by individuals on a voluntary basis. A huge amount of geographic data has already been collected, owing to the growing number of contributors and volunteers. More importantly, these data are free and accessible to anyone. VGI comes in many forms, such as Wikimapia, Flickr, GeoNames and OpenStreetMap (OSM). OSM is a new mapping project built by volunteers through wiki-like collaboration, with the aim of creating a free, editable map of the entire world. This thesis adopts OSM as its main data source to uncover hidden patterns in urban systems. We investigated some fundamental issues, such as the rank-size law of cities and the measurement of urban sprawl, which have conventionally been studied using census or satellite imagery data. We define the concept of natural cities in order to assess city size distribution. Natural cities are generated in a bottom-up manner by agglomerating individual street nodes. This clustering process depends on a single parameter, the clustering resolution, and different clustering resolutions yield different levels of natural cities. In this respect, natural cities show little bias compared with city boundaries imposed by the Census Bureau or extracted from satellite imagery. Based on this investigation, we made two findings about rank-size distributions. The first is that all the natural cities in the US strictly follow Zipf’s law regardless of the clustering resolution, a result that differs from other studies, which investigate only a few of the largest cities.
The second is that Zipf’s law is not universal at the state level; that is, Zipf’s law does not hold for the natural cities within individual states. The thesis then uses natural cities to detect urban sprawl. Urban sprawl devours a large amount of open space each year and subsequently leads to many environmental problems. To curb urban sprawl with proper policies, a major problem is how to measure it objectively. In this thesis, a new approach is proposed to measure urban sprawl based on street nodes. The approach rests on the fact that the number of street nodes is significantly correlated with city population. Specifically, the thesis reports that street nodes have a linear relationship with city size, with a correlation coefficient of up to 0.97. This linear regression line, called the sprawl ruler, partitions all cities into sprawling, compact and normal cities. The approach is verified with US census data and US natural cities. Based on this verification, the thesis further applies it to three European countries (France, Germany and the UK) and categorizes all of their natural cities into three classes: sprawling, compact and normal. This categorization provides new insight into sprawl detection and sets a uniform standard for comparing levels of sprawl across an entire country.
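The abstract describes natural cities as clusters of street nodes governed by a single clustering resolution, but it does not spell out the clustering rule. The minimal Python sketch below assumes one plausible reading: two street nodes are linked when they lie within `resolution` of each other (in projected units such as metres), and each connected group of linked nodes forms one natural city. The function name `natural_cities` and the grid-plus-union-find implementation are illustrative assumptions, not the thesis's own procedure.

```python
import math
from collections import defaultdict


def natural_cities(nodes, resolution):
    """Cluster street nodes into hypothetical "natural cities".

    Assumption: nodes within `resolution` of each other are linked, and each
    connected component of linked nodes is one natural city (single linkage).
    `nodes` is a list of (x, y) tuples in projected units, e.g. metres.
    Returns the clusters as lists of node indices, largest first.
    """
    parent = list(range(len(nodes)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Bucket nodes into square cells of side `resolution`: any two nodes
    # within `resolution` of each other fall in the same or adjacent cells.
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(nodes):
        grid[(math.floor(x / resolution), math.floor(y / resolution))].append(idx)

    # Link every close pair found in a cell and its eight neighbours.
    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j and math.dist(nodes[i], nodes[j]) <= resolution:
                            parent[find(j)] = find(i)

    clusters = defaultdict(list)
    for idx in range(len(nodes)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values(), key=len, reverse=True)


# Toy coordinates in metres (made up for illustration).
nodes = [(0, 0), (50, 0), (100, 0), (1000, 1000), (1040, 1000), (5000, 5000)]
print([len(c) for c in natural_cities(nodes, resolution=60)])  # [3, 2, 1]
```

Lowering `resolution` splits clusters apart and raising it merges them, which illustrates how different clustering resolutions yield different levels of natural cities.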
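Zipf’s law states that a city's size is roughly inversely proportional to its rank, i.e. size ≈ C × rank^(−alpha) with alpha close to 1. The sketch below uses a hypothetical helper, `zipf_exponent`, that estimates alpha by ordinary least squares on log(rank) against log(size). This is a common quick check, not the fitting method stated in the thesis; log-log regression is known to be a biased estimator of power-law exponents, and maximum-likelihood fitting is generally preferred for rigorous tests.

```python
import math


def zipf_exponent(sizes):
    """Estimate alpha in size ~ C * rank**(-alpha) by OLS on log-log axes.

    Zipf's law corresponds to alpha close to 1. Returns (alpha, r_squared).
    Requires at least two cities with differing sizes.
    """
    ranked = sorted(sizes, reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two cities")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(size) for size in ranked]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if syy == 0:
        raise ValueError("all city sizes are equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return -slope, r_squared


# Perfectly Zipfian toy sizes: the k-th largest city is 1/k of the largest.
sizes = [10000 / k for k in range(1, 101)]
alpha, r2 = zipf_exponent(sizes)
print(round(alpha, 3), round(r2, 3))  # 1.0 1.0
```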
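The sprawl ruler is described only as a linear regression line between street nodes and city size that separates sprawling, compact and normal cities. The sketch below is a minimal illustration under several stated assumptions: street nodes are the predictor and population the response; a city whose population falls well below the line (many street nodes per resident) is labelled sprawling and one well above it compact; and the "normal" band is taken as plus or minus `k` residual standard errors. The function name `sprawl_ruler`, the band rule and the toy numbers are hypothetical, not taken from the thesis.

```python
import math


def sprawl_ruler(street_nodes, populations, k=1.0):
    """Fit a hypothetical 'sprawl ruler' and label each city.

    Assumptions: OLS line population = intercept + slope * street_nodes;
    residual below -k * s -> "sprawling", above +k * s -> "compact",
    otherwise "normal", where s is the residual standard error.
    Returns ((intercept, slope, pearson_r), labels).
    """
    n = len(street_nodes)
    if n < 3:
        raise ValueError("need at least three cities")
    mx = sum(street_nodes) / n
    my = sum(populations) / n
    sxx = sum((x - mx) ** 2 for x in street_nodes)
    syy = sum((y - my) ** 2 for y in populations)
    sxy = sum((x - mx) * (y - my) for x, y in zip(street_nodes, populations))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient

    residuals = [y - (intercept + slope * x) for x, y in zip(street_nodes, populations)]
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))  # residual std. error

    labels = []
    for e in residuals:
        if e < -k * s:
            labels.append("sprawling")
        elif e > k * s:
            labels.append("compact")
        else:
            labels.append("normal")
    return (intercept, slope, r), labels


# Toy, made-up numbers (not data from the thesis).
nodes = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
pops = [10 * x for x in nodes]
pops[2] -= 1500  # fewer residents than its street network suggests
pops[7] += 1500  # more residents than its street network suggests
(intercept, slope, r), labels = sprawl_ruler(nodes, pops)
print(labels[2], labels[7], labels[0])  # sprawling compact normal
```

Using residual standard errors keeps the band independent of each city's absolute size; a relative band (percentage above or below the line) would be an equally plausible alternative.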