Detecting Abuse at Scale: Locality Sensitive Hashing at Uber Engineering

May 9, 2017 / Global
Figure 1: Wikipedia articles are represented as titles and content in Spark.
Figure 2: After feature engineering, the contents of Wikipedia articles are converted to binary sparse vectors.
Figure 3: MinHashLSH adds a new column to store hashes, with each hash represented as an array of vectors.
Figure 4: An approximate nearest neighbor search finds Wikipedia articles related to the “united states.”
Figure 5: With the number of hash tables set, an approximate similarity join lists similar Wikipedia articles.
Figure 6: With numHashTables=5, approximate nearest neighbor ran 2x faster than full scan (as shown on right). With numHashTables=3, approximate similarity join ran 3x-5x faster than full join and filter (as shown on left).
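
The captions above refer to binary sparse vectors, MinHash hashes, and a numHashTables setting. As a rough illustration of how those pieces fit together, here is a minimal pure-Python sketch of MinHash over the non-zero indices of a binary vector. It is not the Spark implementation: the function names, the modulus, the seed, and the example documents are all assumptions made for this sketch.

```python
import random

# A large prime used as the hash modulus (an assumption of this sketch).
PRIME = 2038074743


def make_hash_params(num_hash_tables, seed=42):
    """Draw one (a, b) coefficient pair per hash table (hypothetical helper)."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hash_tables)]


def minhash_signature(active_indices, params):
    """Return one min-hash value per hash table.

    `active_indices` is the set of non-zero positions of a binary sparse
    vector; each index must be smaller than PRIME - 1.
    """
    if not active_indices:
        raise ValueError("MinHash needs at least one non-zero index")
    return [min(((1 + i) * a + b) % PRIME for i in active_indices)
            for a, b in params]


def jaccard_distance(x, y):
    """Exact Jaccard distance between two index sets."""
    x, y = set(x), set(y)
    return 1.0 - len(x & y) / len(x | y)


def is_candidate_pair(sig_x, sig_y):
    """Treat two items as candidates if any hash table agrees."""
    return any(hx == hy for hx, hy in zip(sig_x, sig_y))


# Two made-up "documents", each given as its set of non-zero feature indices.
params = make_hash_params(num_hash_tables=5)
doc_a = {0, 3, 7, 12}
doc_b = {0, 3, 7, 15}
sig_a = minhash_signature(doc_a, params)
sig_b = minhash_signature(doc_b, params)
print(is_candidate_pair(sig_a, sig_b), jaccard_distance(doc_a, doc_b))
```

In this sketch, more hash tables give a similar pair more chances to collide in at least one table, at the cost of computing more hashes per item. The exact Jaccard distance is only needed for pairs that survive the candidate check, which is where an approximate search can save work over a full scan.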

Posted by Yun Ni, Kelvin Chu, ryanreynolds@uber.com
