AI, Data / ML

Building a Backtesting Service to Measure Model Performance at Uber-scale

February 13, 2020 / Global
Figure 1. The many dimensions we need to consider, such as the number of cities and machine learning models, make backtesting compute-intensive. Our backtesting framework addresses this by running multiple jobs on a large-scale distributed system.
Figure 2. The service supports two major backtesting methods: expanding window (left) and sliding window (right).
Figure 3. Our backtesting system consists of both a Python library and a Go service.
Figure 4. In a typical data science query workflow, data scientists use Uber’s Data Science Workbench, a platform for data science with a managed RStudio Server and Jupyter Notebooks. Then, data scientists leverage an ML platform, such as Uber’s Michelangelo, for model training and prediction.
Figure 5. Our online backtesting workflow is composed of four stages. In Stage 1, the model is either written locally in Data Science Workbench (DSW) or uploaded to an ML platform, which returns a unique model ID. DSW triggers a backtest through our Go service, which then returns a UUID to DSW. In Stage 2, the Go service fetches training and testing data, stores it in a datastore, and returns a data set. In Stage 3, the ML platform trains the model on the backtesting data set, generates prediction results, and returns them to the Go service. In Stage 4, the backtesting results are stored in a datastore, where users can fetch them from DSW.
Figure 6. Results from our Uber trips prediction model for three cities and four backtesting windows suggest that, for the Uber ridesharing apps during a given year, prediction error is highest in April for City 1, in March for City 2, and in February for City 3 (the error numbers are made up for demonstration purposes and are not real). The y-axis shows the MAPE, while the x-axis shows the start date of the prediction window.
Figure 7. Our backtest service has the potential to provide actionable recommendations for improvement based on historical data.
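The two backtesting methods named in Figure 2 and the MAPE metric plotted in Figure 6 can be illustrated with a short Python sketch. This is not the service's actual code: the function names (`expanding_windows`, `sliding_windows`, `mape`) and the parameters (`initial_train`, `train_size`, `horizon`) are hypothetical and chosen only for illustration. The sketch assumes that each window advances by one forecast horizon and that observations are indexed 0 to n-1 in time order.

```python
from typing import Iterator, Sequence, Tuple

# A split is a pair of index ranges: (training indices, testing indices).
Split = Tuple[range, range]


def expanding_windows(n: int, initial_train: int, horizon: int) -> Iterator[Split]:
    """Expanding window: training always starts at index 0 and grows each step."""
    train_end = initial_train
    while train_end + horizon <= n:
        yield range(0, train_end), range(train_end, train_end + horizon)
        train_end += horizon  # assumption: step size equals the forecast horizon


def sliding_windows(n: int, train_size: int, horizon: int) -> Iterator[Split]:
    """Sliding window: training keeps a fixed length and shifts forward each step."""
    start = 0
    while start + train_size + horizon <= n:
        train_end = start + train_size
        yield range(start, train_end), range(train_end, train_end + horizon)
        start += horizon  # assumption: step size equals the forecast horizon


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equally long")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

With 12 monthly observations, `expanding_windows(12, initial_train=6, horizon=2)` and `sliding_windows(12, train_size=6, horizon=2)` each yield three train/test splits. The difference is that the expanding variant always trains from the first observation, while the sliding variant drops the oldest observations as it moves forward. Computing `mape` on each test range gives one error value per window, which is the kind of per-window error Figure 6 plots.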
Sam Xiao

Sam Xiao is a senior software engineer on Uber’s Finance Intelligence team who leads the development and strategy behind backtesting services and other engineering initiatives to bring intelligent insights to financial forecasting and other areas.

Haoyu He

Haoyu He is a software engineer on Uber’s Finance Intelligence team who builds and operates our backtesting platform to run financial models and optimize our ML-based financial planning solutions.

Pooja Kodavanti

Pooja Kodavanti is a software engineer on Uber's Finance Intelligence team who works on the solutions that help us integrate forecast models into Uber’s financial planning tools.

Ethan Meng

Ethan Meng is a software engineer on Uber’s Finance Intelligence team who works on the backtesting service architecture and on engineering excellence to improve system performance.

Posted by Sam Xiao, Haoyu He, Pooja Kodavanti, Ethan Meng

Category: