In this article, we discuss how Uber Engineering uses Locality Sensitive Hashing on Apache Spark to reliably detect fraudulent trips at scale.
Uber Engineering's data processing platform team recently built and open sourced Hoodie, an incremental processing framework that supports our business-critical data pipelines. In this article, we show how Hoodie powers a rich data ecosystem where external sources can be ingested into Hadoop in near real-time.
Imagine you have to store data whose massive influx increases by the hour. Your first priority, after making sure you can easily add storage capacity, is to reduce the data’s footprint to save space. But how? This is the story of Uber Engineering’s comprehensive test of encoding protocols and compression algorithms, and of how that discipline saved space in our Schemaless datastores.
This article covers the details of Schemaless triggers, with examples. Triggers are a key feature of the datastore that has kept Uber Engineering scaling since October 2014. This is the third installment of a three-part series on Schemaless; the first part is a design overview and the second part is a discussion of architecture.