By Ting Chen

Last week, we highlighted some presentations delivered during our second annual Uber Technology Day. In this article, engineering data science manager and Uber Tech Day presenter Ting Chen discusses how we leverage cutting edge systems to tackle fraud on our platform. 

 

Fraud has a direct impact on Uber as well as on user experiences on the platform. To combat bad actors, Uber has a dedicated team of anti-fraud analysts, data scientists, and UX experts who work collaboratively on this issue. As part of this effort, we build internal services that help us continually monitor and respond to changes in the ever-evolving fraud landscape.

These services look for errant behaviors: actions that would not be taken by legitimate users. Using our fraud fighting technologies, we can, for instance, differentiate between actual trips and those created by GPS spoofing, or analyze how our apps are being used to reveal fraudsters.

In this article, we detail some types of fraud and the technologies we use to counter them.

 

Types of fraud

At Uber, we deal with multiple types of fraud, such as payment fraud, incentive abuse, and compromised accounts. We outline these categories below:

Figure 1: A fraudster offers discounted trips via WeChat.

Payment fraud

Payment fraud happens when bad actors use stolen credit cards to pay for Uber trips. Typically, when credit card owners discover unauthorized transactions on their accounts, they call the bank or credit card company to dispute them, and Uber refunds the charge. To maximize profit from stolen credit cards, fraudsters don’t take these trips themselves. Instead, acting as an agent service, they advertise discounted trips to other people on websites and chat forums.

Incentive abuse

Uber frequently offers new users a credit for signing up or referring friends, as well as bonuses for drivers who complete a certain number of trips within a given time period. Fraudsters try to take advantage of these incentives by creating fake accounts to earn new user and referral credits or by simulating fake trips to earn a driver bonus.

Figure 2: Uber’s driver-partners earn cash incentives by completing set numbers of trips.

Compromised accounts

Fraudsters also use phishing techniques to access rider and driver accounts. With a rider account, a fraudster can offer agent services, selling rides to other people. Access to a driver account might let a fraudster withdraw money. Phishing techniques usually include emails, text messages, or phone calls to trick users into giving up their passwords and two-factor authentication codes.

Figure 3: A scam message with a link to a fake payment site asks for a rider’s banking details.

 

Detection systems

Fraud fighting at scale is a challenging task. We are not fighting against a small number of individuals, but large, well-organized criminal communities equipped with advanced technologies and excellent customer service. However, we have developed even more advanced technologies to help combat this problem.

GPS spoofing detection

We have seen bad actors use GPS spoofing apps to fake a phone’s location in order to simulate a real trip and get paid through a driver account. The standard technique involves a fraudster creating a new rider account, adding a stolen credit card, and using that account to pay for a spoofed trip taken with their driver account. The stolen card is charged and the payment goes to the fraudulent driver account. Sometimes fraudsters create multiple fake trips to boost their trip totals, so they can earn an incentive bonus from Uber. What they don’t know is that we can detect GPS spoofing and block these fraudulent payments.

Figure 4: This spoofing app sets default parameters, such as altitude and speed.

For example, take a look at the popular GPS spoofing app shown in Figure 4, called Mock Locations. In this app, the default configuration for altitude is 120 ± 10 meters. However, if we plot the altitude distribution of all the trips from a city whose average altitude is around 800 meters, we find a number of trips showing an altitude of 120 meters: fake trips created with the Mock Locations app. We do not rely on this ad hoc rule for spoofing detection, however, since the configuration can easily be changed and more sophisticated spoofing apps exist.

Figure 5: Altitude distribution of trips in a city with a high degree of fraudulent behavior.

 

Instead, we developed an altitude profile for geographic locations around the world by aggregating historical trip data. We then compare each trip’s altitude with the profile altitude. In the example shown below in Figure 6, you can easily see that a real trip’s altitude aligns closely with the earth’s surface, while the fake trips appear to fly through the air or travel underground.

Figure 6: Real trips on Uber follow the geographical altitude of a city or region, while spoofed trips look like they are flying through the air or tunneling underground.
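As a sketch, this profile comparison amounts to a simple deviation check per GPS ping. The function name, tolerance, and sample values below are illustrative, not Uber’s production code:

```python
import numpy as np

def altitude_anomaly_fraction(trip_altitudes, profile_altitudes, tolerance_m=30.0):
    """Fraction of GPS pings whose altitude deviates from the terrain profile.

    trip_altitudes: altitudes reported by the app, one per GPS ping.
    profile_altitudes: expected terrain altitude at each ping's location,
    looked up from aggregated historical trip data (names are illustrative).
    """
    trip = np.asarray(trip_altitudes, dtype=float)
    profile = np.asarray(profile_altitudes, dtype=float)
    deviations = np.abs(trip - profile)
    return float(np.mean(deviations > tolerance_m))

# A real trip hugs the terrain; a spoofed trip sits at the app's ~120 m default
# even in a city whose average altitude is around 800 meters.
terrain = [805, 810, 798, 802, 815]
real_trip = [807, 812, 800, 805, 816]
spoofed_trip = [120, 121, 119, 120, 122]

print(altitude_anomaly_fraction(real_trip, terrain))     # → 0.0
print(altitude_anomaly_fraction(spoofed_trip, terrain))  # → 1.0
```

A high anomaly fraction alone is only one signal; as described below, it is combined with other features before any enforcement decision.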

 

We used a similar computation for speed matching, shown below in Figure 7, with hourly speed profiles for global road segments on each day of the week. By comparing a trip’s speed to the speed profile, we can see what percentage of a trip travels at an abnormal speed, indicating the likelihood of it being fake.

Figure 7: By developing a speed profile for roads in an area, we can see which trips deviate substantially from that profile, likely indicating GPS spoofing.
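The same idea can be sketched for speed, assuming per-segment speed profiles for the relevant hour and weekday are already available; the function name and threshold are illustrative:

```python
import numpy as np

def abnormal_speed_fraction(trip_speeds, profile_speeds, max_ratio=2.0):
    """Share of a trip's road segments traveled at an abnormal speed.

    trip_speeds: observed speed (km/h) on each road segment of the trip.
    profile_speeds: historical speed for the same segment, hour, and weekday.
    max_ratio is a hypothetical threshold, not Uber's actual parameter.
    """
    trip = np.asarray(trip_speeds, dtype=float)
    profile = np.asarray(profile_speeds, dtype=float)
    return float(np.mean(trip > max_ratio * profile))

profile = [40, 35, 50, 45]       # typical speeds for these segments at this hour
real = [38, 30, 55, 47]
spoofed = [120, 118, 125, 119]   # a constant spoofed speed ignores road context

print(abnormal_speed_fraction(real, profile))     # → 0.0
print(abnormal_speed_fraction(spoofed, profile))  # → 1.0
```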

 

Location integrity as a defense strategy is a complex task and suffers from limitations in regions with few Uber trips. To bolster our fraud prevention, we identify suspicious regions by correlating low trip volume with high rates of fraudulent account sign-ups. Combining these parameters with other signals, such as financial loss, device information, and trip-level or user-level features, we sample trips for manual review. A dedicated manual review team labels trips as legitimate or fraudulent with a high degree of confidence and discovers new fraud patterns. Finally, we built a high-precision machine learning model to detect trips created by GPS spoofing. In parallel, we use deep learning models for anomaly detection, reducing the effort of engineering new features.

Figure 8: Our fraud prevention workflow combines automated processes, deep learning, and manual review.
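One way to picture the risk-weighted sampling step is the toy sketch below; the signal names and weighting are hypothetical stand-ins for the signals described above, not Uber’s actual scoring:

```python
import random

def sample_for_review(trips, k, seed=0):
    """Risk-weighted sampling of trips for manual review (illustrative).

    Trips carrying stronger fraud signals are sampled more often, so
    reviewers spend time where labels are most valuable.
    """
    def risk(trip):
        # Hypothetical combination of signals; weights are arbitrary here.
        return (trip["spoof_score"]
                + trip["signup_fraud_rate"]
                + trip["loss_usd"] / 100.0)

    rng = random.Random(seed)
    return rng.choices(trips, weights=[risk(t) for t in trips], k=k)

trips = [
    {"id": 1, "spoof_score": 0.9, "signup_fraud_rate": 0.4, "loss_usd": 55.0},
    {"id": 2, "spoof_score": 0.1, "signup_fraud_rate": 0.0, "loss_usd": 0.0},
    {"id": 3, "spoof_score": 0.7, "signup_fraud_rate": 0.3, "loss_usd": 30.0},
]
reviewed = sample_for_review(trips, k=2)
```

The reviewers’ labels then feed back into training the high-precision detection model.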

 

Sequence modeling to classify user behavior

Interaction patterns with Uber apps differ between normal users and fraudsters. When requesting a ride, most users follow a sequence of editing the drop-off location, moving the pin around on the map, viewing the prices of different product types, and tapping the trip request button. Fraudsters follow a different pattern, optimized to make the most money as quickly as possible. These distinct usage patterns let us use long short-term memory (LSTM) deep learning models to differentiate between the two.

Figure 9: Users interact with Uber’s app through different tap sequences. This GIF shows a legitimate user editing the drop-off address, clicking different product types, and requesting a trip.

 

For example, a good user who is new to Uber typically spends time reviewing the product types, comparing the differences between uberPOOL, uberX, and UberBLACK. However, a fraudster who is offering agent services to other people will spend more time editing addresses, moving pins, and changing payment methods.  

Figure 10: On the left we model a tap-stream from a legitimate user, while the right side shows a fraudulent user’s tap-stream on the Uber app.

 

We view the tap-stream data as a time series and use one-hot encoding to represent each tap. We also append the timestamp to the vector to preserve duration information. These vectors are the input to our LSTM model. The final activation layer outputs a probability score predicting whether a sequence of taps comes from a bad user. The second-to-last layer is a 64-dimensional dense vector that can also be viewed as an encoded feature for the tap-stream.

Figure 11: Each tap in the tap-stream is represented as a one-hot encoded feature vector.
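A minimal sketch of this encoding, assuming a small hypothetical tap vocabulary (the real app logs many more event types):

```python
import numpy as np

# Hypothetical tap vocabulary for illustration only.
TAP_TYPES = ["edit_dropoff", "move_pin", "view_product",
             "change_payment", "request_trip"]
TAP_INDEX = {name: i for i, name in enumerate(TAP_TYPES)}

def encode_tap_stream(taps):
    """Turn a list of (tap_name, timestamp_in_seconds) pairs into a matrix.

    Each row is a one-hot vector over tap types with the tap's timestamp
    appended as the last element, preserving duration information.
    """
    rows = []
    for name, ts in taps:
        row = np.zeros(len(TAP_TYPES) + 1)
        row[TAP_INDEX[name]] = 1.0
        row[-1] = ts
        rows.append(row)
    return np.stack(rows)

session = [("edit_dropoff", 0.0), ("view_product", 2.5), ("request_trip", 4.0)]
X = encode_tap_stream(session)
print(X.shape)  # → (3, 6)
```

Each session becomes a variable-length sequence of such rows, which is exactly the input shape an LSTM consumes.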


Figure 12: The LSTM model ingests the one-hot encoding representation as input.
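To illustrate the shape of the model rather than its production implementation, here is a toy LSTM cell in NumPy with random, untrained weights; a real system would use a deep learning framework:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTM:
    """Toy LSTM cell with a sigmoid read-out (random, untrained weights).

    Illustrates the data flow only: a sequence of encoded tap vectors goes
    in; a fraud-probability score and a 64-dimensional encoding come out.
    """

    def __init__(self, input_dim, hidden_dim=64, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {g: rng.normal(0.0, 0.1, (hidden_dim, input_dim + hidden_dim))
                  for g in "ifoc"}
        self.b = {g: np.zeros(hidden_dim) for g in "ifoc"}
        self.w_out = rng.normal(0.0, 0.1, hidden_dim)
        self.hidden_dim = hidden_dim

    def forward(self, sequence):
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        for x in sequence:
            z = np.concatenate([x, h])
            i = sigmoid(self.W["i"] @ z + self.b["i"])   # input gate
            f = sigmoid(self.W["f"] @ z + self.b["f"])   # forget gate
            o = sigmoid(self.W["o"] @ z + self.b["o"])   # output gate
            g = np.tanh(self.W["c"] @ z + self.b["c"])   # candidate state
            c = f * c + i * g
            h = o * np.tanh(c)
        score = sigmoid(self.w_out @ h)  # fraud-probability score
        return score, h                  # h doubles as the encoded feature

model = TinyLSTM(input_dim=6)
taps = np.random.default_rng(1).normal(size=(3, 6))  # three encoded taps
score, encoding = model.forward(taps)
print(encoding.shape)  # → (64,)
```

Here the final hidden state plays the role of the 64-dimensional second-to-last layer, and the sigmoid read-out stands in for the final activation layer’s probability score.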

 

We added both the probability score and the encoded feature to our baseline business model and demonstrated the effectiveness of using LSTM for tap-stream analysis. Our experiments showed that the scores and encoded features learned by the LSTM rank among the most important features in the traditional models, and model recall improved by up to 67 percent compared to models without LSTM features, as depicted in Figure 13, below:

Figure 13: We applied the scores and encoded features in our baseline model, and lifted the recall of the model by up to 67 percent for new users.

 

Moving forward

Fraud fighting is a long-term, continuous effort, since the fraud black market itself is very sophisticated and adapts to new products and new services over time. We need to build systems that maintain a good balance between providing a quick turnaround for fraud detection and offering a robust, stable, and scalable infrastructure.

 

Acknowledgments

Ting Chen, Xiao Cai, Marjan Baghaie, and Chenliang Yang were the key contributors from Uber Engineering who built the two fraud prevention technologies outlined in this article.

If solving complex problems through data science and machine learning interests you, consider joining our team.

Subscribe to our newsletter to keep up with the latest innovations from Uber Engineering.