Data Analytics - search results

Engineering Data Analytics with Presto and Apache Parquet at Uber


From determining the most convenient rider pickup points to predicting the fastest routes, Uber uses data-driven analytics to create seamless trip experiences. Within engineering, analytics inform decision-making processes across the board. As we expand to new markets, the ability to

Turbocharging Analytics at Uber with our Data Science Workbench


Millions of Uber trips take place each day across nearly 80 countries, generating information on traffic, preferred routes, estimated times of arrival/delivery, drop-off locations, and more that enables us to facilitate better experiences for users.

To make our data exploration

Automating Merchant Live Monitoring with Real-Time Analytics: Charon


At Uber, live monitoring and automation of Ops is critical to preserve marketplace health, maintain reliability, and gain efficiency in markets. By virtue of the word “live”, this monitoring needs to show what is happening now, with prompt access

Uber’s Journey Toward Better Data Culture From First Principles

Data powers Uber

Uber has revolutionized how the world moves by powering billions of rides and deliveries connecting millions of riders, businesses, restaurants, drivers, and couriers. At the heart of this massive transportation platform is Big Data and Data Science

Fast and Reliable Schema-Agnostic Log Analytics Platform


At Uber, we provide a centralized, reliable, and interactive logging platform that empowers engineers to work quickly and confidently at scale. The logs are tagged with a rich set of contextual key value pairs, with which engineers can slice and

Turning Metadata Into Insights with Databook

Every day in over 10,000 cities around the world, millions of people rely on Uber to travel, order food, and ship cargo. Our apps and services are available in over 69 countries and run 24 hours a day. At our

Building a Large-scale Transactional Data Lake at Uber Using Apache Hudi

From ensuring accurate ETAs to predicting optimal traffic routes, providing safe, seamless transportation and delivery experiences on the Uber platform requires reliable, performant large-scale data storage and analysis. In 2016, Uber developed Apache Hudi, an incremental processing framework, to power

Inside Uber ATG’s Data Mining Operation: Identifying Real Road Scenarios at Scale for Machine Learning


How did the pedestrian cross the road?

Contrary to popular belief, sometimes the answer isn’t as simple as “to get to the other side.” To bring safe, reliable self-driving vehicles (SDVs) to the streets at Uber Advanced Technologies Group (ATG)

Introducing Athenadriver: An Open Source Amazon Athena Database Driver for Go

Data analytics play a critical part in Uber’s decision making, driving and shaping all aspects of the company, from improving our products to generating insights that inform our business. To ensure timely and accurate analytics, the aggregated, anonymous data that

Designing a Production-Ready Kappa Architecture for Timely Data Stream Processing


At Uber, we use robust data processing systems such as Apache Flink and Apache Spark to power the streaming applications that help us calculate up-to-date pricing, enhance driver dispatching, and fight fraud on our platform. Such solutions can process

Uber’s Data Platform in 2019: Transforming Information to Intelligence

Uber’s busy 2019 included our billionth delivery of an Uber Eats order, 24 million miles covered by bike and scooter riders on our platform, and trips to top destinations such as the Empire State Building, the Eiffel Tower, and the

Building a Better Big Data Architecture: Meet Uber’s Presto Team

Uber’s daily operations generate data, such as the number of trip requests or food orders at any given time, that can show us how and where to improve our services. However, this information is only truly useful if we can

Uber Joins LF Presto Foundation to Advance Open Source Analytics

From determining the most efficient routes to predicting rider demand, aggregated data-driven analytics are foundational to Uber’s ability to deliver seamless mobility experiences on our platform. 

Powering many of our analytics services, Presto, an open source, distributed SQL query

Less is More: Engineering Data Warehouse Efficiency with Minimalist Design

Maintaining Uber’s large-scale data warehouse comes with an operational cost in terms of ETL functions and storage. In our experience, optimizing for operational efficiency requires answering one key question: for which tables does the maintenance cost supersede utility? Once identified,

Science at Uber: Building a Data Science Platform at Uber

At Uber, we take advanced research work and use it to solve real world problems. In our Science at Uber video series, Uber employees talk about how we apply data science, artificial intelligence, machine learning, and other innovative technologies in

Consistent Data Partitioning through Global Indexing for Large Apache Hadoop Tables at Uber


Data serves little purpose if we cannot find it. Looking up individual records in the 100-plus petabytes of data accumulated at Uber lets us perform updates and gather useful insights to help improve our services, such as delivering more accurate

Solving Big Data Challenges with Data Science at Uber


The data involved in serving millions of rides and food deliveries on Uber’s platform doesn’t just facilitate transactions, it also helps teams at Uber continually analyze and improve our services. When we launch new services, we can quickly measure success,

Accessible Machine Learning through Data Workflow Management


Machine learning (ML) pervades many aspects of Uber’s business. From responding to customer support tickets to optimizing queries and forecasting demand, ML provides critical insights for many of our teams.

Our teams encountered many different challenges while incorporating

Introducing AresDB: Uber’s GPU-Powered Open Source, Real-time Analytics Engine


At Uber, real-time analytics allow us to attain business insights and operational efficiency, enabling us to make data-driven decisions to improve experiences on the Uber platform. For example, our operations team relies on data to monitor the market health and

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Uber is committed to delivering safer and more reliable transportation across our global markets. To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying and addressing bottlenecks