Data-driven GPS Travel Survey Data Quality Control

Smartphones are increasingly being used to collect detailed data on people’s daily activities and trips. However, these smartphone-based travel surveys introduce new data quality issues compared to traditional paper or phone surveys. In this project, I developed a framework to systematically detect and address various quality issues in smartphone survey data.

The key data quality issues I focused on were attribute completeness (whether all required information is present for each trip/activity) and logical consistency (whether consecutive trips and activities align properly in time and space).
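To make the two checks concrete, here is a minimal sketch in Python. It is not the code from my repository: the field names (`start_time`, `end_time`, `start_xy`, `end_xy`, `purpose`), the thresholds, and the issue labels are all hypothetical, and times are assumed to be plain seconds.

```python
from math import radians, sin, cos, asin, sqrt

# Hypothetical list of attributes every trip/activity record must carry.
REQUIRED_FIELDS = ("start_time", "end_time", "start_xy", "end_xy", "purpose")


def is_complete(record):
    """Attribute completeness: every required field is present and not None."""
    return all(record.get(field) is not None for field in REQUIRED_FIELDS)


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(h))


def consistency_issues(prev, nxt, max_gap_s=60, max_dist_m=100):
    """Logical consistency between two consecutive records.

    The thresholds are illustrative assumptions, not values from the study.
    """
    issues = []
    gap = nxt["start_time"] - prev["end_time"]
    if gap < 0:
        issues.append("temporal_overlap")
    elif gap > max_gap_s:
        issues.append("temporal_gap")
    if haversine_m(prev["end_xy"], nxt["start_xy"]) > max_dist_m:
        issues.append("spatial_gap")
    return issues
```

A record that fails `is_complete` has an attribute problem, while a non-empty list from `consistency_issues` points to a temporal or spatial mismatch between neighboring records.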

My framework takes a data-driven approach: it classifies invalid records into groups using statistical methods such as mixture models, clustering, and thematic transition matrices, and then proposes a tailored solution for each group, such as splitting short temporal gaps or extending spatial trajectories. I demonstrated the framework on a smartphone survey dataset from the Twin Cities, where it improved data quality significantly.
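As an illustration of what a tailored fix can look like, the sketch below shows one possible reading of "splitting a short temporal gap": the gap between two consecutive records is divided at its midpoint and each half is absorbed by the neighboring record. The function name, the field names, and the fixed 300-second cutoff are assumptions made for this example. In the framework itself the groups come from the statistical methods above, not from a hand-picked threshold.

```python
def split_short_gap(prev, nxt, short_gap_s=300):
    """Close a short temporal gap by splitting it at the midpoint.

    Returns updated copies of both records; the inputs are left unchanged.
    Gaps longer than short_gap_s (a hypothetical cutoff) are not touched.
    """
    gap = nxt["start_time"] - prev["end_time"]
    if 0 < gap <= short_gap_s:
        midpoint = prev["end_time"] + gap / 2
        return {**prev, "end_time": midpoint}, {**nxt, "start_time": midpoint}
    return prev, nxt
```

Returning copies rather than editing in place keeps the raw survey records available for comparison after the fix.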

For example, over 5,000 invalid records were systematically handled, and the maximum speed recorded for car trips fell from nearly 25,000 m/s to reasonable values. Overall, this research enables more accurate analysis of daily mobility behavior from emerging smartphone-based surveys.
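A speed check of this kind can be sketched in a few lines. Average speed is distance divided by duration, so a trip with a realistic distance but a near-zero recorded duration can produce a value in the thousands of meters per second. The function names and the 70 m/s ceiling (about 250 km/h) below are assumptions for illustration, not thresholds taken from the study.

```python
def average_speed_mps(distance_m, duration_s):
    """Average speed in meters per second; non-positive durations give infinity."""
    if duration_s <= 0:
        return float("inf")
    return distance_m / duration_s


def is_implausible_car_trip(distance_m, duration_s, max_speed_mps=70.0):
    """Flag a car trip whose average speed exceeds a hypothetical ceiling."""
    return average_speed_mps(distance_m, duration_s) > max_speed_mps
```

For instance, `is_implausible_car_trip(25000, 1)` flags a 25 km trip logged as lasting one second, while a 10 km trip over ten minutes passes.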

The code and detailed documentation are available in my GitHub repo.

Click to access the full-text article

Yaxuan (Sean) Zhang
PhD Candidate at UMN | MGIS Student | Computer Science Minor

My research interests include geospatial data science, transportation planning, and GeoAI.