By Cory McDowell
Finishing a bachelor’s in Computer Science gives college graduates their first sustained opportunity to leave the ranks of academic learning and apply their skills in the broader marketplace.
Before graduating from the University of California, Berkeley last May, I aimed to find a company with interesting problems that offered a new grad the opportunity to make an immediate impact toward solving them. While interning at a large company the previous summer, I saw important problems being solved, but felt my hunger for personal responsibility would not be satisfied at a mature company of that size. I longed for one closer to its infancy, in need of engineers and rich with opportunity for learning and career development.
After interviewing for a software engineering position on Uber’s Money Team, I knew I’d met my match. Here was a disruptive company innovating in an age-old industry, offering a position on a team building and scaling a payments platform handling millions of transactions every day.
The First Task
Soon after my onboarding I was given an opportunity to work on Uber’s most important project at the time: transforming Uber’s largest PostgreSQL table into a service providing a sharded, schema-less datastore.
Why was this necessary? High trip volume caused Uber’s corresponding table to outgrow the limitations of a single PostgreSQL database. A new design was needed to accommodate more information. This complete redesign of Uber’s trip model (how we define what a trip is) meant refactoring the existing codebase. The effort lasted several weeks and included engineers across multiple teams, including involvement from the company’s hands-on CTO & Head of Engineering, Thuan Pham.
However, a huge undertaking like this didn’t come without challenges or risks. The more services depend on each other, the greater the risk that a change in one will adversely affect another. You can try to account for as many dependencies as possible, but a sizable codebase has hundreds (if not thousands) of them to keep track of, and they can change on a weekly basis with each additional update.
Keeping that in mind, I made my biggest code change at the company to date. Our new datastore for trips is immutable, so we needed a new way to store changes to a trip’s fare. We decided to move fare adjustment information to a new Postgres table. This involved designing its schema in addition to refactoring the table to read from and write to the new datastore. After two weeks of work, I pushed my change to production on a Friday afternoon. Friday deploys at Uber should contain only critical changes, because the high traffic sustained over a weekend magnifies any bugs.
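To make the idea concrete, here is a minimal sketch of keeping an immutable trip record while storing fare changes as separate, append-only rows. It is not Uber's actual schema or code: the `Trip` and `FareAdjustment` classes, their field names, and the `current_fare` helper are all hypothetical, and plain in-memory objects stand in for the datastore and the Postgres table.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Trip:
    """Hypothetical immutable trip record; the base fare is never edited in place."""
    trip_id: str
    base_fare: Decimal


@dataclass(frozen=True)
class FareAdjustment:
    """Hypothetical row in an append-only fare adjustments table."""
    trip_id: str
    delta: Decimal
    reason: str


def current_fare(trip, adjustments):
    """Derive the fare from the base fare plus every adjustment for this trip."""
    deltas = (a.delta for a in adjustments if a.trip_id == trip.trip_id)
    return trip.base_fare + sum(deltas, Decimal("0"))


trip = Trip(trip_id="t1", base_fare=Decimal("12.50"))
adjustments = [
    FareAdjustment("t1", Decimal("-2.50"), "route correction"),
    FareAdjustment("t2", Decimal("5.00"), "unrelated trip"),
]
print(current_fare(trip, adjustments))  # 10.00
```

The point of the split is that a fare change becomes a new row rather than an update to the trip itself, so the trip record can stay immutable while the displayed fare is still derived on read.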
About an hour after my change went live, an on-call engineer determined that it caused all trip fares to display as $0.00 in the driver app receipt. How did this occur? I correctly modified the behavior of our internal API service, but I didn’t account for the change affecting the behavior of downstream services. Thus, my code created a display bug that would make drivers think they weren’t making any money from the trips they took, right after they completed each trip.
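The post does not say what the downstream code looked like, so the following is purely an illustration of how this class of bug can arise, not a reconstruction of the actual incident. It assumes a hypothetical receipt renderer that reads a `fare` field and silently falls back to zero when the field is missing, plus a small contract check that would flag the missing field before a deploy. All names and payload shapes here are invented.

```python
from decimal import Decimal

# Fields a hypothetical receipt renderer expects from the trip API (assumed).
RECEIPT_REQUIRED_FIELDS = {"trip_id", "fare"}


def render_receipt(payload):
    """Downstream display code that silently falls back to 0 if 'fare' is absent."""
    fare = Decimal(str(payload.get("fare", "0")))
    return f"${fare:.2f}"


def missing_fields(payload):
    """Contract check: list the required fields the payload no longer provides."""
    return sorted(RECEIPT_REQUIRED_FIELDS - set(payload))


old_payload = {"trip_id": "t1", "fare": "12.50"}
new_payload = {"trip_id": "t1", "fare_adjustments": []}  # 'fare' moved elsewhere

print(render_receipt(old_payload))  # $12.50
print(render_receipt(new_payload))  # $0.00
print(missing_fields(new_payload))  # ['fare']
```

A check like `missing_fields`, run against each consumer's expectations in the producer's test suite, turns a silent default into a failing test.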
With Uber’s global reach, change spreads quickly. I was mortified to have caused an outage reported by drivers from as far away as Sydney, Australia. Having only been at the company for a few weeks, I was inexperienced with handling problems of that size. How would the whole event play out?
Shortly after the problem was diagnosed, multiple engineers reached out to help. They walked me through coordinating and communicating an emergency deploy to release the fix. After identifying the error through debugging and adding more unit tests, I deployed the fix the next morning. With the receipts properly displaying the right amounts, business returned to a normal state.
Through this event I was able to learn about Uber as an engineering organization. As a recent college graduate starting a software engineering career, my biggest fear was causing a company-wide outage. I did exactly that, but instead of drawing negative repercussions from my fellow engineers, the incident was seen as a long-term learning experience with a short-term cost. After my coworkers helped me out, many of them shared similar stories of company-wide outages they had caused themselves.
This mishap not only demonstrated the significance of the code I was working on, but also exposed me to the true character of our engineering organization. I was able to turn a mistake into a learning opportunity and grow as an engineer.
I learned the importance of communicating the effects of a code change at a rapidly changing company, and ultimately that working with complex systems will inevitably result in breakages. I’m lucky I had engineers to lend a helping hand, and I’m happy to pay it forward to the next nUber I find in need.