Understanding Mobility and Transport Modal Disparities Using Emerging Data Sources: Modelling Potentials and Limitations

Abstract: Transportation poses a major challenge to curbing climate change, due in part to its ever-increasing travel demand. Better-informed policy-making requires up-to-date empirical mobility data to model viable mitigation options for reducing emissions from the transport sector. On the one hand, the prevalence of digital technologies enables the large-scale collection of human mobility traces, offering great potential for improving the understanding of mobility patterns and transport modal disparities. On the other hand, advances in data science continue to push the boundaries of what is possible for new uses of big data in transport, while also exposing their limitations. This thesis uses emerging data sources, including Twitter data, traffic data, OpenStreetMap (OSM), and trip data from new transport modes, to enhance the understanding of mobility and transport modal disparities, e.g., how cars and public transit support mobility differently. Specifically, this thesis aims to answer two research questions: (1) What are the potentials and limitations of using these emerging data sources for modelling mobility? (2) How can these new data sources be properly modelled for characterising transport modal disparities? Papers I-III model mobility mainly using geotagged social media data, and reveal the potentials and limitations of this data source by validating it against established sources (Q1). Papers IV-V combine multiple data sources to characterise transport modal disparities (Q2), which further demonstrates the modelling potentials of the emerging data sources (Q1). Despite a biased population representation and low and irregular sampling of actual mobility, the geolocations of Twitter data can be used in models to produce results that agree well with the other data sources on the fundamental characteristics of individual and population mobility. However, the feasibility of Twitter data for estimating travel demand depends on spatial scale, sparsity, sampling method, and sample size.
To extend the use of social media data, this thesis develops two novel approaches to address the sparsity issue: (1) an individual-based mobility model that fills the gaps in sparse mobility traces to synthesise travel demand; and (2) a population-based model that uses Twitter geolocations as attractions instead of trips for estimating the flows of people between regions. This thesis also presents two reproducible data fusion frameworks for characterising transport modal disparities. They demonstrate the power of combining different data sources to gain new insights into the spatiotemporal patterns of travel time disparities between car and public transit, and into the competition between ride-sourcing and public transport.