2/01/2020

Travel Time Estimation in the Age of Big Data

Dimitris Bertsimas, Arthur Delarue, Patrick Jaillet, and Sébastien Martin, Travel Time Estimation in the Age of Big Data, Operations Research, Vol. 67, No. 2, 2019.
Twenty-first century urban planners have identified the understanding of complex city traffic patterns as a major priority, leading to a sharp increase in the amount and the diversity of traffic data being collected. For instance, taxi companies in an increasing number of major cities have started recording metadata for every individual car ride, such as its origin, destination, and travel time. In this paper, we show that we can leverage network optimization insights to extract accurate travel time estimations from such origin–destination data, using information from a large number of taxi trips to reconstruct the traffic patterns in an entire city. We develop a method that tractably exploits origin–destination data, which, because of its optimization framework, could also take advantage of other sources of traffic information. Using synthetic data, we establish the robustness of our algorithm to high variance data, and the interpretability of its results. We then use hundreds of thousands of taxi travel time observations in Manhattan to show that our algorithm can provide insights about urban traffic patterns on different scales and accurate travel time estimations throughout the network.
Data.
We consider a road network, represented as a directed graph. On this graph we are given a data set of origin–destination travel time values in the network. 
Probabilistic Setting.
We would like to estimate the times of trips that are “similar” to the trips that are given in the data set but may not have been observed in the data.... Each observation of the data set is assumed to be independently sampled from the same probability distribution.
Model.
In practice, we do not observe all the possible (o, d) pairs, which makes it hard to estimate the geometric mean of Dod using only the data that have origin o and destination d.... Because our data set provides no information as to which path was followed to realize a given travel time, we assume that drivers use the fastest paths available.
Parameter Estimation:
Minimize the mean squared log error (MSLE) of the estimated travel times with a regularization term.
Optimization problem:
Mixed-integer optimization formulation with linear constraints and a nonlinear objective 
Solving Large-Scale Problems with a tractable algorithm
Adapting the shortest-path constraint to discard some binary variables 
Find a surrogate convex objective and reformulate the problem as a solvable second-order cone programming 
A Large-Scale Data Framework
For the island of Manhattan, to which we restrict our problem, we obtain a strongly connected graph with 4,324 nodes and 9,518 arcs. 
 Evaluating Results at the Scale of the City Accuracy
It turns out that between 9 and 11 a.m., the out-of-sample RMSLE of our estimations is just over 0.36.  This result means that our travel time estimation error is barely worse than the inherent noise in the data, and our estimated travel times must therefore be very close to the geometric expectation of the travel times throughout the network.
Nearest-Neighbor Travel Time Estimation.
Of course, we do not expect to produce more accurate estimates than a k-nearest neighbors scheme when a wealth of taxi trips is available. With a good method, however, we should be able to obtain more accurate travel times than k-nearest neighbors in zones without much data and only slightly less accurate in zones where data are plentiful. 

沒有留言:

張貼留言