Tag: HIVE
Marmaray: An Open Source Generic Data Ingestion and Dispersal Framework and Library for Apache...
Today we introduce Marmaray, an open source framework for data ingestion and dispersal on Apache Hadoop, realizing our vision of any-source-to-any-sink functionality, including data format validation.
Databook: Turning Big Data into Knowledge with Metadata at Uber
Databook, Uber's in-house platform for surfacing and exploring contextual metadata, makes dataset discovery and exploration easier for teams across the company.
Scaling Uber’s Apache Hadoop Distributed File System for Growth
Uber's Data Infrastructure team overhauled our approach to scaling our storage infrastructure by incorporating several new features and functionalities, including ViewFs, NameNode garbage collection tuning, and an HDFS load management service.
Queryparser, an Open Source Tool for Parsing and Analyzing SQL
Written in Haskell, Queryparser is Uber Engineering's open source tool for parsing and analyzing SQL queries that makes it easy to identify foreign-key relationships in large data warehouses.
Turbocharging Analytics at Uber with our Data Science Workbench
Uber Engineering's Data Science Workbench (DSW) is an all-in-one toolbox that leverages aggregate data for interactive analytics and machine learning.
Engineering Restaurant Manager, our UberEATS Analytics Dashboard
The UberEATS Restaurant Manager gives restaurant partners insight into their business by measuring customer satisfaction, sales, and service quality.
Engineering Uber Predictions in Real Time with ELK
Uber Engineering architected a real-time trip feature prediction system on the open source ELK stack: Elasticsearch (a RESTful search engine), Logstash, and Kibana.
Engineering Data Analytics with Presto and Apache Parquet at Uber
Snap your fingers and presto! How Uber Engineering built a fast, efficient data analytics system with Presto and Parquet.
Building an Intelligent Experimentation Platform with Uber Engineering
Composed of staged rollout and intelligent analytics tools, Uber Engineering's experimentation platform can stably deploy new features at scale across our apps. In this article, we discuss the challenges and opportunities we faced when building this product.
Redesigning Uber Engineering’s Mobile Content Delivery Ecosystem
How Uber Engineering re-architected the content delivery feed and backend ecosystem of our new driver app to deliver an enhanced user experience.
Hudi: Uber Engineering’s Incremental Processing Framework on Apache Hadoop
Uber Engineering's data processing platform team recently built and open sourced Hudi, an incremental processing framework that supports our business-critical data pipelines. In this article, we see how Hudi powers a rich data ecosystem where external sources can be ingested into Hadoop in near real time.
Designing Euclid to Make Uber Engineering Marketing Savvy
In this article, we take a look at Euclid, Uber Engineering's in-house marketing platform built on Hadoop and Spark.
The Uber Engineering Tech Stack, Part II: The Edge and Beyond
The end of a two-part series on the tech stack that Uber Engineering uses to make transportation as reliable as running water, everywhere, for everyone, as of spring 2016.
Streamific, the Ingestion Service for Hadoop Big Data at Uber Engineering
Here we look at Hadoop data ingestion and how Uber Engineering streams diverse data into a cohesive layer for querying in near real time using Streamific, a service developed in-house.