Applying Machine Learning in Internal Audit with Sparsely Labeled Data

As machine learning continues to evolve, transforming the various industries it touches, it has only begun to inform the world of audit. As a data scientist and former CPA Auditor, I can understand why this is the case. By nature, auditing is a field that focuses on the fine details and investigates any exceptions, while machine learning typically seeks...

How Uber Deals with Large iOS App Size

The App Size Problem: Uber’s iOS mobile apps for Rider, Driver, and Eats are large. The choice of Swift as our primary programming language, our fast-paced development environment and feature additions, layered software and its dependencies, and statically linked platform libraries result in large app binaries. Reducing application size is critical to our customer experience. Moreover, Apple’s app-download-size...

Evolving Schemaless into a Distributed SQL Database

Introduction: In 2016 we published blog posts (I, II) about Schemaless - Uber Engineering’s Scalable Datastore. We went over the design of Schemaless and explained the reasoning behind developing it. In this post, we discuss the evolution of Schemaless into a general-purpose transactional database called Docstore. Docstore is a general-purpose multi-model database that provides...

Fast and Reliable Schema-Agnostic Log Analytics Platform

At Uber, we provide a centralized, reliable, and interactive logging platform that empowers engineers to work quickly and confidently at scale. The logs are tagged with a rich set of contextual key-value pairs, with which engineers can slice and dice their data to surface abnormal or interesting patterns that can guide product improvement. Right now, the platform is...

Uber’s Real-time Data Intelligence Platform At Scale: Improving Gairos Scalability/Reliability

Background: Real-time data (number of ride requests, number of drivers available, weather, game) enables operations teams to make informed decisions about our services, such as surge pricing, maximum dispatch ETA calculation, and demand/supply forecasting, that improve user experiences on the Uber platform. While batched data can provide powerful insights by identifying medium-term and long-term trends, Uber services can combine streaming data...

The Journey Towards Metric Standardization

At Uber, business metrics are vital for discovering insights about how we perform, gauging the impact of new products, and optimizing the decision-making process. The use cases for metrics can range from an operations member diagnosing a fares issue at the trip level to a machine learning model for dynamic pricing that shapes a balanced and robust marketplace...

Disaster Recovery for Multi-Region Kafka at Uber

Apache Kafka at Uber: Uber has one of the largest deployments of Apache Kafka in the world, processing trillions of messages and multiple petabytes of data per day. As Figure 1 shows, today we position Apache Kafka as a cornerstone of Uber’s technology stack and build a complex ecosystem on top of it to empower a large number of different...

Uber’s Real-Time Push Platform

Uber builds multi-sided marketplaces handling millions of trips every day across the globe. We strive to build real-time experiences for all our users. The nature of real-time marketplaces makes them very lively. Over the course of a trip, there are multiple participants that can modify and view the state of an ongoing trip and need real-time updates. This creates...

No Code Workflow Orchestrator for Building Batch & Streaming Pipelines at Scale

Data-In-Motion @ Uber: At Uber, several petabytes of data move across and within various platforms every day. We power this data movement with a strong backbone of data pipelines. Whether it’s ingesting the data from millions of Uber trips or transforming the ingested data for analytical and machine learning models, it all runs through these pipelines. To put it in...

Horovod v0.21: Optimizing Network Utilization with Local Gradient Aggregation and Grouped Allreduce

We originally open-sourced Horovod in 2017, and since then it has grown to become the standard solution in industry for scaling deep learning training to hundreds of GPUs. With Horovod, you can reduce training times from days or weeks to hours or minutes by adding just a few lines of Python code to an existing TensorFlow, PyTorch, or Apache...

Turning Metadata Into Insights with Databook

Every day in over 10,000 cities around the world, millions of people rely on Uber to travel, order food, and ship cargo. Our apps and services are available in over 69 countries and run 24 hours a day. At our global scale, these activities generate large amounts of logging & operational data that runs through our systems in real-time....

Meet the 2020 Safety Engineering Interns: COVID Edition

About the Safety team & What we do: Uber is dedicated to keeping people safe on the road. The Safety and Insurance Engineering team is at the core of Uber’s business. We work to redefine what it takes to be safe on the roads at a global scale. Our technology enables us to focus on rider safety before, during, and...

Operating Apache Pinot @ Uber Scale

Introduction: Uber has a complex marketplace consisting of riders, drivers, eaters, restaurants, and so on. Operating that marketplace at a global scale requires real-time intelligence and decision making. For instance, identifying delayed Uber Eats orders or abandoned carts helps our community operations team take corrective action. Having a real-time dashboard of different events such as consumer demand,...

Building from the Baltics: Meet the Uber Engineering Team in Vilnius, Lithuania

The Uber Vilnius office is home to members of our Production Engineering, Infrastructure, Storage Platform, and Developer Tools teams.

Ludwig v0.3 Introduces Hyperparameter Optimization, Transformers and TensorFlow 2 support

In February 2019, Uber released Ludwig, an open source, code-free deep learning (DL) toolbox that gives non-programmers and advanced machine learning (ML) practitioners alike the power to develop models for a variety of DL tasks. With use cases spanning text classification, natural language understanding, image classification, and time series forecasting, among many others, Ludwig gives users the ability to...

Revolutionizing Money Movements at Scale with Strong Data Consistency

Uber as a platform invites its users to leverage it, earn from it, and be delighted by it. Serving more than 18 million requests per day, in 10,000+ cities, has enabled people to move freely and to think broadly while earning a livelihood on it. As one of the underlying engines, Uber Money fulfills some of the most important...

Spearheading Open Source: A Conversation with Jim Jagielski, Staff Technical Program Manager with the Uber Open Source Program Office

Jim Jagielski's fascination with open source software began out of necessity. He was working at NASA Goddard in the 1980s, and the agency had just received fancy new Macintosh computers loaded with Apple's new A/UX operating system. There was only one problem: None of the tools Jagielski needed ran on A/UX. It fell to Jagielski to port everything himself. "That’s...

Designing Edge Gateway, Uber’s API Lifecycle Management Platform

The making of Edge Gateway, the highly-available and scalable self-serve gateway to configure, manage, and monitor APIs of every business domain at Uber. Evolution of Uber's API gateway: In October 2014, Uber started its journey of scale in what would eventually turn out to be one of the most impressive growth phases in the company. Over time we were scaling...

Standing for Safety: Meet the Uber Sao Paulo Tech Team

Located in the heart of Latin America’s largest city, the Uber Sao Paulo Tech Center was founded in late 2018 as a company-wide hub for Safety Tech. The team is composed of product managers, UX designers, engineers and data scientists. As part of Uber’s mission to put the safety of our users first, our Sao Paulo-based Tech team is...

Introducing Domain-Oriented Microservice Architecture

Introduction: Recently there has been substantial discussion around the downsides of service-oriented architectures, and microservice architectures in particular. While only a few years ago many people readily adopted microservice architectures due to the numerous benefits they provide, such as flexibility in the form of independent deployments, clear ownership, improvements in system stability, and better separation of concerns, in recent...

Popular Articles