Today we introduce Marmaray, an open source framework for data ingestion and dispersal on Apache Hadoop, realizing our vision of any-source-to-any-sink functionality, including data format validation.
Databook, Uber's in-house platform for surfacing and exploring contextual metadata, makes dataset discovery and exploration easier for teams across the company.
Uber's Data Infrastructure team overhauled our approach to scaling our storage infrastructure by incorporating several new features and functionalities, including ViewFs, NameNode garbage collection tuning, and an HDFS load management service.
Written in Haskell, Queryparser is Uber Engineering's open source tool for parsing and analyzing SQL queries that makes it easy to identify foreign-key relationships in large data warehouses.
Uber Engineering's data science workbench (DSW) is an all-in-one toolbox that leverages aggregate data for interactive analytics and machine learning.
The UberEATS Restaurant Manager gives restaurant partners insight into their business by measuring customer satisfaction, sales, and service quality.
Uber Engineering architected a real-time trip features prediction system using the open source ELK stack: Elasticsearch (a RESTful search engine), Logstash, and Kibana.
Snap your fingers and presto! How Uber Engineering built a fast, efficient data analytics system with Presto and Parquet.
Composed of a staged rollout and intelligent analytics tool, Uber Engineering's experimentation platform is capable of stably deploying new features at scale across our apps. In this article, we discuss the challenges and opportunities we faced when building this product.
How Uber Engineering re-architected the content delivery feed and backend ecosystem of our new driver app to deliver an enhanced user experience.
Uber Engineering's data processing platform team recently built and open sourced Hudi, an incremental processing framework that supports our business-critical data pipelines. In this article, we see how Hudi powers a rich data ecosystem where external sources can be ingested into Hadoop in near real-time.
In this article, we take a look at Euclid, Uber Engineering's in-house marketing platform built on Hadoop and Spark.
The end of a two-part series on the tech stack that Uber Engineering uses to make transportation as reliable as running water, everywhere, for everyone, as of spring 2016.
Here we look at Hadoop data ingestion, and how Uber Engineering streams diverse data into a cohesive layer for querying in near real-time using Streamific, a tool we developed in-house.