Interested in accelerating your career by tackling some of Uber’s most challenging AI problems? Apply for the Uber AI Residency, a research fellowship dedicated to fostering the next generation of AI talent.
Uber Engineering's data processing platform team recently built and open-sourced Hoodie, an incremental processing framework that supports our business-critical data pipelines. In this article, we show how Hoodie powers a rich data ecosystem in which external sources can be ingested into Hadoop in near real time.
Imagine you have to store a massive influx of data that grows by the hour. Your first priority, after making sure you can easily add storage capacity, is to try to reduce the data’s footprint to save space. But how? This is the story of Uber Engineering’s comprehensive test of encoding protocols and compression algorithms, and how this discipline saved space in our Schemaless datastores.
This article presents the details and examples of Schemaless triggers, a key feature of the datastore that’s kept Uber Engineering scaling since October 2014. It is the third installment of a three-part series on Schemaless; the first part is a design overview and the second part is a discussion of architecture.