Across the globe, nearly 1,250,000 people die in road crashes each year¹. At Uber, we’re determined to reduce this number by making our driver partners more aware of their driving patterns.

In fact, an entire team at Uber focuses on building technology to encourage safer driving. On Uber Engineering’s Driving Safety team, we write code to measure indicators of unsafe driving and help driver partners stay safe on the road. We measure our success by how much we can decrease car crashes, driving-related complaints, and trips during which we detect unsafe driving.

Today we use harsh braking and acceleration as indicators of unsafe driving behavior. Harsh braking is highly correlated with unsafe behaviors like tailgating, aggressive driving, and losing focus on the road. For example, research from Progressive, a car insurance provider, has shown that harsh braking is a leading indicator for predicting future crashes. To detect these indicators of unsafe driving in the first place, we start with a few simple engineering problems.

### How Do We Measure Speed?

Before we can measure abrupt vehicle movement, we need to measure speed. And to understand speed, we have to understand how GPS works. Put simply, GPS is a system of 24 active satellites that orbit the Earth. The GPS receiver derives its position by determining its distance from at least four satellites.

The simplest way of deriving speed from position is by measuring the difference between two consecutive positions. If you know a vehicle is at location $x_1$ at time $t_1$ and at location $x_2$ at time $t_2$, the average speed between those locations is $\frac{x_2 - x_1}{t_2 - t_1}$. This value will approach the true speed as the frequency of measurements increases. While simple to implement, this method depends on GPS positional accuracy, which can be unreliable in urban environments, particularly around tall buildings.
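As a minimal sketch of the position-difference approach, the Python below computes the average speed between two GPS fixes. The fix layout `(t_seconds, lat, lon)`, the function names, and the use of the haversine great-circle distance are assumptions made for illustration; they are not taken from Uber's code.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def average_speed_mps(fix1, fix2):
    """Average speed (x2 - x1) / (t2 - t1) between two fixes.

    Each fix is a hypothetical (t_seconds, lat_deg, lon_deg) tuple.
    """
    t1, lat1, lon1 = fix1
    t2, lat2, lon2 = fix2
    if t2 <= t1:
        raise ValueError("fixes must be in strictly increasing time order")
    return haversine_m(lat1, lon1, lat2, lon2) / (t2 - t1)
```

Any error in the reported positions feeds directly into the distance, so a few meters of positional noise over a one-second interval can dominate the estimate.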

We get a more accurate measurement of speed by using the Doppler shift, which occurs when a signal’s transmitter moves relative to its receiver. Fire truck sirens often illustrate the Doppler shift; the transmitter is the siren, and the receiver is your eardrum. The perceived pitch of the siren increases as the truck moves toward you, and decreases as the vehicle moves away.

GPS receivers on driver partner phones work in a similar way. The receiver (that is, the phone) is either moving toward or away from a satellite. The receiver’s velocity can be accurately derived from the difference between the signal’s expected frequency and the frequency actually received. GPS can also measure speed by looking at the rate at which the waves that carry the GPS signal change (this is called time-difference carrier positioning).
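To make the Doppler relationship concrete, here is a simplified sketch for a single satellite. It assumes the non-relativistic approximation $v \approx c \, \Delta f / f_0$ and the GPS L1 carrier frequency of 1575.42 MHz. The function name is hypothetical, and a real receiver would also have to account for satellite motion and clock drift and combine measurements from several satellites.

```python
SPEED_OF_LIGHT_MPS = 299_792_458.0
GPS_L1_HZ = 1_575_420_000.0  # GPS L1 carrier frequency


def closing_speed_mps(expected_hz, observed_hz, carrier_hz=GPS_L1_HZ):
    """Speed along the line of sight to one satellite, from the Doppler shift.

    Positive means receiver and satellite are closing (observed frequency is
    higher than expected); negative means they are moving apart.
    Simplification: non-relativistic, one satellite, no clock-drift correction.
    """
    doppler_shift_hz = observed_hz - expected_hz
    return SPEED_OF_LIGHT_MPS * doppler_shift_hz / carrier_hz
```

At the L1 frequency, a shift of about 5.25 Hz corresponds to roughly 1 m/s along the line of sight.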

### Measuring Brakes and Accelerations

The next step is to derive brakes and accelerations from vehicle speed. Acceleration is defined as the rate of change of velocity. Therefore, once we measure the vehicle’s speed, we can determine the magnitude of the acceleration by calculating the derivative of speed with respect to time.
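With sampled data, the derivative can be approximated by a finite difference between consecutive speed readings. The following is a minimal sketch of that idea; the function name and list-based inputs are assumptions, and a production system would likely smooth the signal first.

```python
def accelerations(times_s, speeds_mps):
    """Finite-difference acceleration (m/s^2) between consecutive samples.

    Negative values are decelerations (braking). Returns len(times_s) - 1 values.
    """
    if len(times_s) != len(speeds_mps):
        raise ValueError("times and speeds must have the same length")
    result = []
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        result.append((speeds_mps[i] - speeds_mps[i - 1]) / dt)
    return result
```

For example, speeds of 10, 10, 7, and 8 m/s sampled once per second give accelerations of 0, -3, and 1 m/s².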

To start, the Uber Data team extracted a large set of GPS data into a hosted Jupyter notebook with the goal of transforming any arbitrary window of GPS data into a feature vector comprised of summary statistics. We did this by first extracting acceleration and braking events from the time series GPS data, and then by computing summary statistics on the observed events.

The process of extracting braking events from time series speed data is easiest to understand visually:

Once we had a series of braking and acceleration events, we computed a variety of descriptive statistics such as:

• The fraction of braking events exceeding 2 m/s².
• The fraction of braking events exceeding just over 3 m/s² (7 mph per second), the threshold set by Progressive for a “hard brake” event.
• The maximum, 90th percentile, and median magnitude of all accelerations.
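The statistics above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the notebook code: a braking event is taken to be a maximal run of consecutive samples decelerating past a small noise floor, its magnitude is the peak deceleration in that run, and the percentile uses the nearest-rank method. The function names, dictionary keys, and the 0.5 m/s² noise floor are all hypothetical.

```python
import math
from statistics import median

MPH_TO_MPS = 0.44704
HARD_BRAKE_MPS2 = 7 * MPH_TO_MPS  # 7 mph per second, about 3.13 m/s^2


def braking_events(accels_mps2, noise_floor=0.5):
    """Peak deceleration (positive, m/s^2) of each run of braking samples."""
    events, peak = [], None
    for a in accels_mps2:
        if a < -noise_floor:
            peak = -a if peak is None else max(peak, -a)
        elif peak is not None:
            events.append(peak)
            peak = None
    if peak is not None:  # trip ended while still braking
        events.append(peak)
    return events


def fraction_exceeding(values, threshold):
    """Share of values strictly above threshold (0.0 for an empty list)."""
    return sum(v > threshold for v in values) / len(values) if values else 0.0


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(accels_mps2):
    """Feature vector of summary statistics for one window of accelerations."""
    if not accels_mps2:
        raise ValueError("need at least one acceleration sample")
    events = braking_events(accels_mps2)
    magnitudes = [abs(a) for a in accels_mps2]
    return {
        "frac_brakes_over_2": fraction_exceeding(events, 2.0),
        "frac_hard_brakes": fraction_exceeding(events, HARD_BRAKE_MPS2),
        "max_accel": max(magnitudes),
        "p90_accel": percentile_nearest_rank(magnitudes, 90),
        "median_accel": median(magnitudes),
    }
```

Grouping consecutive samples into one event keeps a single long stop from being counted as many separate brakes.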

### Feature Engineering

So, how do we make sense of these features? In particular, we wanted to know which ones reliably indicated unsafe driver behavior. Fortunately, we had a large corpus of training data: rider feedback. We obtained a set of positive labels from rider feedback (trips with low passenger ratings indicating dangerous driving behavior) and with this set trained a basic machine learning model to validate that our feature set had predictive power on bad rider experiences. Specifically, we used scikit-learn’s logistic regression with L2 regularization and evaluated the model’s ROC-AUC, iterating on the feature set until we had a satisfactory model.
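The validation loop described here might look roughly like the sketch below. The data is synthetic and generated on the spot purely so the example runs; the two feature columns, the label-generating formula, and every parameter value are assumptions, not Uber's features or results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for per-trip feature vectors (hypothetical features).
n_trips = 2000
X = np.column_stack([
    rng.uniform(0.0, 1.0, n_trips),  # fraction of hard brakes
    rng.uniform(0.5, 5.0, n_trips),  # peak acceleration magnitude, m/s^2
])

# Synthetic labels: 1 = rider reported dangerous driving. The relationship
# below is invented so that the features carry some signal.
logits = 3.0 * (X[:, 0] - 0.5) + 0.8 * (X[:, 1] - 2.75)
y = (rng.uniform(size=n_trips) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# scikit-learn's LogisticRegression applies an L2 penalty by default;
# C is the inverse regularization strength.
model = LogisticRegression(C=1.0, max_iter=1000)
model.fit(X_train, y_train)

# ROC-AUC on held-out trips: 0.5 is chance, 1.0 is a perfect ranking.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"ROC-AUC on held-out trips: {auc:.3f}")
```

Iterating on the feature set then amounts to changing the columns of `X` and checking whether the held-out ROC-AUC improves.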

### Processing Data at Scale

Finally, we had to put these learnings into production and efficiently process GPS data at Uber’s scale:

The above diagram shows how partner phone data flows through our architecture. We process and store GPS data from trips in our Trips Service. Trip data is then published to a Kafka topic and consumed by many other internal services, one of which is our Vehicle Movement Processor. This service produces a feature vector of driving behavior (num_hard_brakes, peak_accel_magnitude, etc.) to yet another Kafka topic to be consumed by more services. All data from Kafka eventually lands in HDFS for long-term storage. We can run batch analysis from our HDFS cluster using tools like Hive and Spark. For example, we can compute daily city-level averages for hard brakes. Then, we index this data in our Elasticsearch cluster for low-latency reads and expose a simple API through the Vehicle Movement Gateway.
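The data flow can be illustrated with a small in-memory sketch in which plain Python lists of JSON strings stand in for the Kafka topics. The message schema (`trip_id`, `city`, `date`, `times_s`, `speeds_mps`) and both function names are hypothetical; only the output field names `num_hard_brakes` and `peak_accel_magnitude` come from the description above. No real Kafka, Spark, or Elasticsearch API is used.

```python
import json
from collections import defaultdict

HARD_BRAKE_MPS2 = 7 * 0.44704  # 7 mph per second, about 3.13 m/s^2


def process_trip(message):
    """Consume one trip message and produce one feature-vector message."""
    trip = json.loads(message)
    times, speeds = trip["times_s"], trip["speeds_mps"]
    accels = [
        (speeds[i] - speeds[i - 1]) / (times[i] - times[i - 1])
        for i in range(1, len(times))
    ]
    return json.dumps({
        "trip_id": trip["trip_id"],
        "city": trip["city"],
        "date": trip["date"],
        # Simplification: counts samples past the threshold, not grouped events.
        "num_hard_brakes": sum(a < -HARD_BRAKE_MPS2 for a in accels),
        "peak_accel_magnitude": max((abs(a) for a in accels), default=0.0),
    })


def daily_city_hard_brake_average(feature_messages):
    """Batch step: mean hard brakes per trip, keyed by (city, date)."""
    totals = defaultdict(lambda: [0, 0])  # (city, date) -> [brakes, trips]
    for message in feature_messages:
        features = json.loads(message)
        bucket = totals[(features["city"], features["date"])]
        bucket[0] += features["num_hard_brakes"]
        bucket[1] += 1
    return {key: brakes / trips for key, (brakes, trips) in totals.items()}


# Usage: one list plays the role of the trips topic, another the features topic.
trips_topic = [
    json.dumps({"trip_id": "a", "city": "sf", "date": "2024-01-01",
                "times_s": [0, 1, 2, 3], "speeds_mps": [12, 8, 8, 9]}),
    json.dumps({"trip_id": "b", "city": "sf", "date": "2024-01-01",
                "times_s": [0, 1, 2], "speeds_mps": [5, 6, 7]}),
]
features_topic = [process_trip(m) for m in trips_topic]
city_averages = daily_city_hard_brake_average(features_topic)
```

Because the processor only reads one message and writes another, additional consumers can subscribe to either topic without the processor needing to know about them.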

This architecture has a number of advantages:

• The architecture is fault tolerant. Each service is deployed across multiple hosts in multiple datacenters, and each data store is distributed in nature.
• The architecture scales horizontally. We monitor the performance of each component in the architecture and can easily add more nodes if we experience high load.
• The architecture is flexible. Any service can start consuming from one of our Kafka topics without compromising the health of the system.

This article shows one example of the work we are doing to improve safety through engineering. Curious to know more about what we are up to? If you’re interested in machine learning and signal processing, distributed systems, or building delightful products, look at Uber’s engineering openings in safety and telematics.