The Uber Engineering Tech Stack, Part II: The Edge and Beyond
Uber’s mission is transportation as reliable as running water, everywhere, for everyone. Last time, we talked about the foundation that powers Uber Engineering. Now, we’ll explore the parts of the stack that face riders and drivers, starting with the world of Marketplace and moving up the stack through web and mobile.
Marketplace, the frontmost end of the Uber engine, funnels the real-world, real-time requests and locations into the engineering chutes and ladders of Uber. The persistence layer, matching system, and real-time transaction pieces live here. It also houses much of the logic for products like UberRUSH and UberEATS. Marketplace has the highest availability requirements at Uber.
To understand Marketplace, it’s important to remember the flexible influence all parts of Uber Engineering have on each other. The infrastructure at the bottom supports everything above it, but direction and features from the very top trickle down into the base. Marketplace builds for itself, but its technologies get picked up by layers above and below.
Marketplace has a mini Uber stack, as do many other teams at Uber. Within Marketplace itself, engineers build infrastructure and data solutions just for Marketplace. There’s a data team, an integrations team, front-end engineers, back-end engineers, and services written in all four of our programming languages (Python, Node, Go, Java). This tiered set of systems ensures that Uber is highly available and largely immune to failure.
Uber’s core trip execution engine was originally written in Node.js because of its asynchronous primitives and simple, single-threaded processing. (In fact, we were one of the first two companies to deploy Node.js in production.) Node.js gives us the ability to manage large quantities of concurrent connections. We’ve now written many services in Go, and this number continues to increase. We like Go for its concurrency, efficiency, and type-safe operations.
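Those asynchronous primitives are what let one single-threaded process keep thousands of requests in flight. Here’s a toy sketch of the pattern — the `lookupDriver` function, its latency, and the ids are invented for illustration, not Uber code:

```javascript
// Toy sketch of why an event loop suits this workload: many concurrent,
// I/O-bound lookups share one thread instead of each blocking it.
function lookupDriver(riderId, callback) {
  // Simulate a non-blocking I/O call, e.g. a geospatial index query.
  setImmediate(() => callback(null, { riderId, driverId: 'driver-' + riderId }));
}

function dispatchAll(riderIds, done) {
  const matches = [];
  let pending = riderIds.length;
  // Every lookup is in flight at once; the single thread never blocks.
  riderIds.forEach((riderId) => {
    lookupDriver(riderId, (err, match) => {
      matches.push(match);
      if (--pending === 0) done(matches);
    });
  });
}
```

Whether the simulated I/O takes one millisecond or one hundred, the total wall time stays close to the slowest single call rather than the sum of all of them.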
The frontline API for our mobile apps consists of over 600 stateless endpoints that join together multiple services. It routes incoming requests from our mobile clients to other APIs or services. It’s all written in Node.js, except at the edge, where our NGINX front end does SSL termination and some authentication. The NGINX front end also proxies to our frontline API through an HAProxy load balancer.
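A frontline endpoint’s job is mostly this join step: fan out to several backing services concurrently, then merge their answers into one mobile-facing payload. A minimal sketch, where `callService` is a stand-in for an RPC client and the service names and fields are hypothetical:

```javascript
// Stand-in for an RPC call to a backing service; returns a promise.
function callService(name, fields) {
  return Promise.resolve(Object.assign({ service: name }, fields));
}

function riderHome(riderId) {
  // Issue both calls concurrently, then join the results into one payload.
  return Promise.all([
    callService('profiles', { name: 'A. Rider' }),
    callService('geo', { nearbyDrivers: 3 }),
  ]).then(([profile, geo]) => ({
    riderId,
    name: profile.name,
    nearbyDrivers: geo.nearbyDrivers,
  }));
}
```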
This part of Marketplace integrates with a number of internal infrastructure initiatives. Engineers on this team use the open source module logtron to log to disk and Kafka. We generate stats using the uber-statsd-client module (the Node.js client for statsd), which talks to our in-house M3 (previously described).
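Under the hood, statsd clients like uber-statsd-client speak a very small plain-text protocol over UDP. A minimal sketch of that wire format — the metric names below are made up:

```javascript
// One statsd line: "<name>:<value>|<type>", where type is
// 'c' for a counter, 'g' for a gauge, or 'ms' for a timer.
function formatStatsd(name, value, type) {
  return name + ':' + value + '|' + type;
}

// e.g. count a request and record a handler timing
const lines = [
  formatStatsd('frontline.requests', 1, 'c'),
  formatStatsd('frontline.latency', 42, 'ms'),
];
```

A real client batches lines like these into UDP datagrams bound for the statsd (here, M3) host.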
Highly Available, Self-Healing, Persistent
Because it must meet the highest availability demands, the Marketplace stack has to receive and execute requests in real time. Even brief interruptions in this area have major consequences for our users and our business. Much of Marketplace’s stack was built by and for Marketplace engineers first.
Ringpop, a library for building cooperative distributed systems, solved some of Marketplace’s problems before its adoption in other teams at Uber and beyond. It gives the high-availability, partition-tolerant properties of distributed databases like DynamoDB or Riak to developers at the application level.
The systems that handle pings from riders and drivers in real time and then match them are written in Node.js and Go. These teams use Ringpop and Sevnup to cooperate and to hand off object ownership when a node in the hashring goes down or when another node takes over its portion of the keyspace. Riak is their distributed database, and Redis provides caching.
Speed and Throughput
The engineers who build cross-functional tools for adoption across the organization use Cassandra and Go more heavily than other teams at Uber, mainly for speed: Cassandra scales well out of the box, and Go compiles extremely fast.
Throughput is also crucial for Marketplace teams. They must be able to handle the largest quantities of traffic, since all requests go through Marketplace. Inundated with queries on our busiest nights of the year, Marketplace systems must take the hit; otherwise, requests don’t even get the chance to hit other parts of Uber.
Optimizing and Balancing
Marketplace teams control optimization and balance through dynamic pricing, supply positioning, intelligent matching, and health. Much of this stack was built in Python with Flask and uWSGI, but we’re rewriting most of the Python in Go for higher performance. Blocking network calls and I/O slowed our services in unpredictable ways, forcing us to provision more capacity and more service instances for the same request throughput. Python was useful for talking to a MySQL back end, but we move further away from the MySQL primary-secondary setup with every new Riak and Cassandra cluster.
Seeing and Using Data
Data engineers within Marketplace use a flow of databases, homegrown solutions, and open external technologies for data processing, streaming, querying, machine learning, and graph processing.
For data streaming, we use Kafka and Uber’s production databases. Hive, MapReduce, HDFS, Elasticsearch, and file storage web services all contribute to the purposeful data storage and operations we perform. We developed a different kind of LIDAR from what you’re used to. The Ledger of Interactive Data Analysis Records, only shared internally for now, runs JupyterHub for multiuser Jupyter (IPython) Notebooks, integrated with Apache Spark and our data platform. Marketplace Data works independently from the Data Engineering team, but much of their stacks overlap.
Above Marketplace, the web and mobile sides are a different world.
Top: Web and Mobile
Our web and mobile engineers share many elements with the lower parts of the stack, but many technologies are unique to the top. Engineers along these branches build the app you know, along with libraries and frameworks for all web and mobile engineers to use. Teams in this part of the org prioritize user experience and accessibility.
Web and product teams collaborate to create and promote modular, separately-deployed web apps with shared user interfaces and a unified user experience.
Our base web server, Bedrock, is built on top of the widely popular web framework Express.js, which has a set of default middleware to provide security, internationalization, and other Uber-specific pieces that handle infrastructure integration.
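The middleware layering is the interesting part: each piece decorates the request or response, then hands control onward via `next()`. The mini-chain below mimics Express’s `(req, res, next)` contract, and the two default middlewares are hypothetical stand-ins for Bedrock’s, not its real code:

```javascript
// Run middlewares in order; each must call next() to pass control on.
function runChain(middlewares, req, res) {
  let i = 0;
  (function next() {
    const mw = middlewares[i++];
    if (mw) mw(req, res, next);
  })();
}

// A typical security default: forbid embedding the page in frames.
const securityHeaders = (req, res, next) => {
  res.headers['x-frame-options'] = 'DENY';
  next();
};

// A typical internationalization default: pick a locale for the request.
const i18n = (req, res, next) => {
  req.locale = req.headers['accept-language'] || 'en';
  next();
};
```

In real Express, the same shape is expressed as `app.use(securityHeaders)` followed by `app.use(i18n)`.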
Our in-house service communication layer, Atreyu, handles communication with backend services and integrates with Bedrock. Atreyu lets us make requests to our SOA service APIs easily, similar to Falcor or Relay.
Rendering, State Handling, and Building
We use React.js and standard Flux for our application rendering and state handling, though a few teams have been experimenting with Redux as our future state container. We are also migrating our existing OOCSS/BEM-style CSS toolkit, called Superfine, into a set of CSS-encapsulated React.js UI components built using style objects—think Radium.
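What makes Redux attractive as a state container fits in a few lines: state is never mutated, only replaced by a pure reducer. A minimal sketch — the trip states and action names here are invented, not our app’s real ones:

```javascript
// A pure reducer: (state, action) -> new state, old state untouched.
function tripReducer(state, action) {
  state = state || { status: 'idle' };
  switch (action.type) {
    case 'REQUEST_RIDE':
      return Object.assign({}, state, { status: 'requesting' });
    case 'DRIVER_MATCHED':
      return Object.assign({}, state, { status: 'matched', driverId: action.driverId });
    default:
      return state;
  }
}
```

Because every transition returns a fresh object, a React view can decide whether to re-render with a cheap reference comparison.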
Our build system, Core Tasks, is a standard set of scripts for compiling and versioning front-end assets, built on top of Gulp.js; it publishes to the file storage web service so we can take advantage of a CDN.
Finally, we use an internal NPM registry to access the huge collection of public-registry packages as well as publish internal-only packages. Any engineer can publish to it, which lets us share modules and React.js components between teams with ease.
Uber once had a strictly mobile org. Now, we have a cross-functional organization for what we call program teams. Each interdisciplinary program team has members from back end to design and data science.
Uber’s iOS engineers, as expected, write in Objective-C and Swift, while Android engineers write in Java. Some work with React components as well. Swift enables more static analysis and compile-time safety; it makes it harder to write incorrect code. We’re excited about protocol-oriented programming, and we’re moving toward a modular, library-based system in mobile.
At these top branches, we use third-party libraries or build our own to fit specific needs. Many open source libraries available are general-purpose, which can create binary bloat. For mobile engineering, every kilobyte matters.
The Espresso extension enables us to write a lot of native automation code using familiar Android SDKs within the IDE (we use Android Studio). Regarding architecture, we use RxJava to simplify how we do asynchronous and event-based programming. For logging, we use Timber.
All of our iOS code lives in a monorepo that we build with Buck. Masonry and SnapKit with Auto Layout help with component placement and sizing. For crash detection, we use KSCrash and report the crashes using our internal reporting framework. For testing in Objective-C, we use OCMock to mock and stub classes. For testing in Swift, we generate mocks via protocols.
We have four primary apps: Android rider, Android driver, iOS rider, and iOS driver. That means hundreds of engineers on each platform land code into a monolithic code base that ships once a week, with no ability to roll out a fix quickly if anything goes wrong. Instead, we have to build systems that make this type of development reliable.
Mobile development is one hundred percent trunk-based development and train releases. We still use Git to store our software versions over time, but all engineers working on the apps commit directly to master; with so many people, branching and landing creates too much risk. Instead, we use an in-house service and application configuration platform that’s easy to use and build on top of, enabling stakeholders to effect change in Uber’s services and businesses. The platform uses feature flags to enable and disable code paths from the server side. We launch dark, turn features on, and monitor the rollout closely.
Engineers don’t need to think about the build train; they just land incrementally behind the feature flag. Rather than a manual QA process, we invest heavily in automation and monitoring. Our continuous integration and enforcement keeps us scaling fast, and monitoring lets auto-responsive systems catch and correct any flawed commits.
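Server-driven gating of this kind reduces to a small check. A sketch, assuming a hypothetical flag shape and bucketing rule rather than our platform’s real API:

```javascript
// Code ships dark behind a flag; the server flips it on gradually.
function isEnabled(flags, name, userId) {
  const flag = flags[name];
  if (!flag || !flag.enabled) return false;
  // Deterministic percentage rollout: the same user always gets the
  // same answer, so a widening rollout never flaps for anyone.
  return userId % 100 < flag.rolloutPercent;
}
```

The app asks this question at each gated code path; flipping `enabled` or raising `rolloutPercent` server-side changes behavior without shipping a new binary.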
Stacks on Stacks on Stacks
Part of what makes Uber’s tech stack difficult to cover is that there’s no definitive set of rules. When many people think of a tech stack, they picture one totem pole, with the infrastructure at the ground and the user-facing feature tools at the top. There are clear layers and boundaries.
Uber, however, has microcosms of a full stack at almost every level—mobile feature teams have front-end and back-end engineers working together, and teams choose whichever data storage solutions best meet a project’s unique needs. Some teams, like Money in finance engineering (which we’ll cover in its own article), have stacks that warrant standalone explanations.
Overall, what makes our work interesting is not the stack we use; it’s operating with great speed at scale as we handle real-life transactions in real time. The technologies that make Uber happen are subject to change. Our speed, flexibility, sense of urgency, and drive to overcome challenges will persist. Sound fun? Join us.
Photo Credit: “Chapman’s Baobab” by Conor Myhrvold, Botswana.
Header Explanation: Baobab trees are renowned for their resilience, longevity, and thick trunk and branches. Chapman’s Baobab in the Kalahari desert is one of Africa’s oldest trees.
Correction, Tuesday, August 2, 2016: An earlier version of this article referred incorrectly to mobile tools and libraries. We use Masonry, not Mantle, for layout. We also updated the section with a more accurate overview of our mobile stack.