How Uber Engineering Massively Scaled Global Driver Onboarding
Here’s the behind-the-scenes story about how Uber Engineering’s Driver Team continues to develop our virtual onboarding funnel to get hundreds of thousands of driver-partners on the road earning money with Uber.
The Consequences of Scale for Driver-Partners
Our team cares about growth. We built the first partner web onboarding process to solve our scaling issues, and we had to completely revise that initial version for the same reason. Uber grows fast.
Signing up as a rider with Uber might be straightforward, but partnering with Uber is inherently a more complex process. As late as 2013, onboarding was purely manual. Potential driver-partners had to go to a local Uber office—before we created regional Greenlight locations—and complete a series of forms with an operations manager to create an account.
After six years of operation, we recently reached 2 billion trips on our platform. Uber is now in 400+ cities and 70+ countries. Delivering rides in all these places means we need to attract and approve driver-partners all over the world. Each week, hundreds of thousands of new drivers take their first trip and start earning money on our platform. At this scale, doing everything in person is not a solution. We had to make a better experience for our driver-partners, so we took this on as an engineering challenge.
The Web Onboarding MVP
We went into our driver-partner onboarding centers (what are now known as our local Greenlight hubs) and analyzed in detail how the process worked. In most cases, when someone wanted to become a driver-partner, the routine was short:
- What is your vehicle?
- Can you go through a screening process?
- Provide your driver documents (driver’s license, vehicle registration, and vehicle insurance).
- Watch a video on how the Uber app works.
- Wait a few days for the screening process to finish. If it clears, you are activated.
To allow driver-partners to go through these steps online, we built an MVP of what we called Web Onboarding with several components:
- Monolithic app
- Flask, a Python microframework
- Server-side template rendering with Jinja2
- Simple HTTP calls (no AJAX) to submit forms and transitions between steps
- jQuery when additional design and DOM manipulation was needed
- onboarding_steps represented a step (e.g., vehicle or screening_process) for a pair of city and product (e.g., uberX in San Francisco)
- onboarding_step_instances represented a user’s status in a step (i.e., incomplete or complete).
When a user successfully submitted a step, we marked it as complete and found the next incomplete step. The steps varied slightly by city, but overall users had the same experience. Very simply, Web Onboarding allowed people to become driver-partners from any location.
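That progression logic can be sketched in a few lines of Python. This is a minimal illustration of the step model described above; the table contents and function names are hypothetical, not Uber's actual schema:

```python
# Hypothetical sketch of the MVP's step-progression logic.
# Table contents and function names are illustrative only.

# onboarding_steps: ordered steps configured per (city, product) pair
ONBOARDING_STEPS = {
    ("san_francisco", "uberX"): [
        "vehicle", "screening_process", "documents", "video",
    ],
}

def next_incomplete_step(city, product, step_instances):
    """Return the first step the user has not completed.

    step_instances maps step name -> "complete" | "incomplete",
    mirroring the onboarding_step_instances records.
    """
    for step in ONBOARDING_STEPS[(city, product)]:
        if step_instances.get(step) != "complete":
            return step
    return None  # all steps complete: the user can be activated

def submit_step(city, product, step_instances, step):
    """Mark a step complete and return the next step to show."""
    step_instances[step] = "complete"
    return next_incomplete_step(city, product, step_instances)
```

Because the configuration is keyed by city and product, each market could vary its steps slightly while users everywhere got the same overall experience.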
Scalability Challenges for Our Service
Soon enough, Uber outgrew its Web Onboarding.
Web Onboarding served us well, but as we expanded into more cities and countries, new challenges emerged:
- Screening processes and regulations vary across different states and countries.
- Some countries require a very long list of documents from driver-partners, some of which take multiple days or weeks to obtain.
- Onboarding flows vary. Adding new products (UberEATS, uberCOMMUTE, Xchange) challenged our architecture, since we originally designed our system to solve one specific problem: onboarding ridesharing driver-partners.
- Mobile app support wasn’t scaling. Because our server-side rendering coupled business logic with display logic, we couldn’t reuse the back end for mobile clients. We ended up building every feature twice, creating a huge amount of code duplication.
As an example, consider the Paris onboarding flow chart. Each teal diamond shows a possible alternative to the regular onboarding process:
Keeping high engineering productivity and freedom to build our product the way we needed became very hard. Rather than continuing to hack new features onto the service, we paused to think of a longer-term solution.
Rethinking for an Optimized, Scalable Solution
When we built our original iOS and Android onboarding apps simultaneously, we tried to avoid as much client work duplication as possible.
Our initial approach to reduce client-side work duplication was a flexible framework to render a form-based UI on both iOS and Android. We built a full protocol that allowed us to translate a set of components defined in a JSON schema into native widgets that were then rendered by mobile clients. The goal was to get the best of both web and native worlds: a UI that feels fast and integrated into the system while being completely, remotely, and instantly modifiable. It let us update or add steps on a weekly basis to keep up with changing requirements without slowing users down.
This approach also allowed very fast iterations and could support as many clients as we wanted, but as the challenges of growth mounted, the form schema became a constraint: its UI declarations were not flexible enough.
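The schema-to-widget idea can be sketched as a toy example. Component types, field names, and the string "widgets" below are all hypothetical stand-ins; real clients would map each component type to a native view instead:

```python
# Toy illustration of translating JSON-schema components into
# native widgets. Component and widget names are hypothetical.

SCHEMA = [
    {"type": "text_input", "id": "license_plate", "label": "License plate"},
    {"type": "select", "id": "year", "label": "Vehicle year",
     "options": ["2014", "2015", "2016"]},
]

# Each client registers one renderer per component type; on iOS and
# Android these would construct native views rather than strings.
RENDERERS = {
    "text_input": lambda c: f"<TextInput id={c['id']}>",
    "select": lambda c: f"<Select id={c['id']} n={len(c['options'])}>",
}

def render(schema):
    """Translate a list of schema components into widgets."""
    return [RENDERERS[c["type"]](c) for c in schema]
```

The strength of the protocol was that the server could add or reorder components remotely; the weakness was that every UI concept had to fit into the fixed component vocabulary.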
Almost two years after our original work, it was time to get the team in a room and completely rethink our onboarding systems around three goals:
- A robust, flexible, and scalable back end to support every client (web, iOS, Android)
- A state machine to easily branch out users from one onboarding engine to another
- Stateless clients that are responsible for their UI
We built our back end in two parts:
- An onboarding state machine (OSM) that allows us to store the requirements for each city and product and derive what a user needs to do during that unique onboarding process
- An improved onboarding API that maps a client to a step
The OSM lets us configure a set of steps for each onboarding process at any level of granularity we need (country, state, city, or product), coupled with an event system that moves users from one step to another depending on their actions or input. The onboarding API can then query the OSM to learn which step of the process a user is on.
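In a minimal sketch, such a state machine is just a per-market mapping from (step, event) to the next step. The step and event names below are illustrative assumptions, not Uber's internal definitions:

```python
# Hypothetical sketch of an onboarding state machine (OSM) config.
# Step and event names are illustrative, not Uber's internal ones.

# Per-(city, product) configuration: each step maps events to the
# step the user should move to next.
OSM_CONFIG = {
    ("paris", "uberX"): {
        "vehicle": {
            "vehicle_added": "documents",
            "no_vehicle": "vehicle_solutions",  # branch to a side flow
        },
        "vehicle_solutions": {
            "vehicle_acquired": "documents",
        },
        "documents": {
            "documents_approved": "complete",
        },
    },
}

def transition(city, product, current_step, event):
    """Return the user's next step for an event, or raise if the
    event is not valid in the current step."""
    steps = OSM_CONFIG[(city, product)]
    try:
        return steps[current_step][event]
    except KeyError:
        raise ValueError(
            f"event {event!r} not allowed in step {current_step!r}"
        )
```

Branching a user into an alternative flow, such as vehicle solutions, is then just another event in the table rather than a special case in code.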
For example, here’s an additional state machine for the vehicle solutions step, which enables people without vehicles to apply for new vehicles instead of getting stuck at the vehicle step.
We ended up with a more flexible architecture, the ability to serve more needs, and a system that both scaled and allowed engineers to move faster.
The JSON for the vehicle step would look like this:
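As a hedged reconstruction (every field name here is hypothetical), a data-only payload for the vehicle step might look like:

```json
{
  "step": "vehicle",
  "status": "incomplete",
  "fields": {
    "make": null,
    "model": null,
    "year": null,
    "license_plate": null
  },
  "allowed_events": ["vehicle_added", "no_vehicle"]
}
```

Note that the payload carries only data and state, leaving each client free to decide how to render it.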
We went from Flask to Tornado, an async Python framework, which gave us better scalability and lower latency. The onboarding API uses a lighter version of our initial JSON schema architecture, where only data is passed to the client (not UI definitions). That decision allows us to keep 100% of our business logic in the shared back end while benefiting from the flexibility of each client’s native UI frameworks.
When a potential driver loads a client (Android, iOS, or web), the client makes a request to the onboarding API, which pings the OSM to find the user’s status and consequently sends back all the data required to render the page. When the user submits information, the same thing happens. In the case of success, the API returns the data for the next step. In the case of failure, it returns an error.
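The round trip between a client, the onboarding API, and the OSM can be sketched as follows. The FakeOSM class and the payload shapes are illustrative assumptions, not Uber's internal interfaces:

```python
# Hypothetical sketch of the client / onboarding API / OSM round trip.
# FakeOSM and the payload shapes are illustrative assumptions.

class FakeOSM:
    """Minimal in-memory stand-in for the onboarding state machine."""
    STEPS = ["vehicle", "documents", "complete"]

    def __init__(self):
        self.position = {}  # user_id -> index into STEPS

    def current_step(self, user_id):
        return self.STEPS[self.position.get(user_id, 0)]

    def advance(self, user_id):
        idx = min(self.position.get(user_id, 0) + 1, len(self.STEPS) - 1)
        self.position[user_id] = idx
        return self.STEPS[idx]

def handle_load(osm, user_id):
    """Client loads a page: the API asks the OSM for the user's
    status and returns the data needed to render that step."""
    return {"step": osm.current_step(user_id), "data": {}}

def handle_submit(osm, user_id, form):
    """Client submits a step: return the next step's data on
    success, or an error on failure."""
    if not form:  # trivial stand-in for real validation
        return {"errors": ["empty submission"]}
    return {"step": osm.advance(user_id), "data": {}}
```

The same two entry points serve web, iOS, and Android, which is what keeps the clients stateless.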
This new partner onboarding system is a platform used across the whole company—for example, the China Growth team built a completely different experience on the web. With our onboarding API, teams can focus on what matters most to them: the client experience.
Lessons in Hypergrowth
Here’s what we’ve learned through engineering the most robust, scalable driver-partner web onboarding process that Uber has had to date:
Ship early. Do not optimize too soon.
The way we built the initial web onboarding process allowed us to quickly prove our idea, solve the problem at hand, and start getting feedback on how the tool was being used. Without doing this first, Uber wouldn’t have been able to grow as fast as it did.
There are no silver bullets.
When we were looking at frameworks to make sure we’d build something that could scale as fast as our business, we were sold on Tornado. Python and Tornado support already existed at Uber, and its asynchronous model promised lower latency.
That said, writing async applications took some adjustment. To take advantage of Tornado’s event loop, we had to invest more time into optimizing the way we wrote our code—speed didn’t come for free just because we were using Tornado. We also faced hard-to-debug memory leaks that we wouldn’t have faced with Flask.
We still face hard problems that we need to solve.
Despite having built a better solution, offering a virtual onboarding experience everywhere, so that as many people as possible can help provide safe, reliable transportation for everyone, remains a challenge that will take us longer to fully solve.
We have some distance until we achieve our end goal, that driver-partners complete the signup process to the Uber platform in the morning, drive that same day, and then use the money earned the same night. In the meantime, we tweak our code to move toward perfection.