Lau Skorstengaard is a Ph.D. student at Aarhus University who pursued a 2018 internship with Uber Engineering’s Aarhus, Denmark office. In this article, Lau discusses his path to Uber and the technical challenges faced while building his internship project as a member of our Infrastructure team.
At some point, every computer science student stands at a crossroads, having to decide whether to pursue a career in industry or academia. At the end of my master’s program, I found myself at this crossroads and decided to blaze through and pursue a Ph.D. at Aarhus University.
In early 2018, the end of my Ph.D. studies was on the horizon, and I once again faced the same question: industry or academia? This time I stopped to consider my options. I had spent seven and a half years studying at my university without setting foot outside academia, so all I really knew about working in industry was based on stories from my friends and colleagues. I decided that I needed to create my own experiences, so I chose to pursue a software engineering internship.
I applied at a number of places, but one of the possible internship projects at Uber Aarhus caught my interest: the infrastructure team needed help building additional features for YQL, the graph query language for Grail, Uber’s in-house infrastructure state aggregation platform. I had invested a substantial part of my studies in programming languages, and it seemed like the perfect opportunity to put my skills to work on a language being used in a production environment.
Internship project: aggregation support in YQL
YQL is the query language used to retrieve data from Uber’s infrastructure state aggregation platform, Grail. Basically, Grail is a huge cache that stores the latest information about Uber’s infrastructure, including host hardware information and disk usage for each instance of Uber’s storage solutions. Other Uber infrastructure services leverage Grail for, among other uses, identifying issues and automatically remediating them. Grail started as an internal component used for issue identification in an internal storage solution management system, but due to its general usefulness, it became its own service so it could benefit other systems.
Grail’s internal user base is growing quickly as both new and existing services adopt it. However, the success of Grail hinges on how easy it is to use, which in part means how easy it is to fetch data from it. This is where YQL comes into the picture.
YQL is Uber’s homegrown graph query language that acts as the user interface to Grail. The usefulness of Grail is limited by whether YQL can express queries that fetch the requested information. For instance, say you want to know the amount of disk space used across all of Uber’s storage solutions. Grail contains disk usage information for all disks on all hosts, so the necessary information is there, and YQL allows you to fetch the data for each of the disks.
However, suppose the information you wanted was not the disk usage for each of the disks but the total disk usage. In other words, you wanted an aggregation of the information. In this instance, the client could aggregate all of the relevant data, but with data from a host fleet the size of Uber’s, it takes a significant amount of time just to transfer the data from Grail to the client, an issue that prevented other infrastructure components from onboarding to Grail. This was the motivation for my internship project: building aggregation support in YQL.
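To make the transfer cost concrete, here is a minimal Python sketch that compares the two approaches. It is not Grail or YQL code: the record layout, the field names, and the use of JSON as the wire format are all assumptions made for illustration.

```python
import json

# Hypothetical per-disk records; the field names are illustrative only
# and do not reflect Grail's actual data model.
def make_disk_records(num_hosts, disks_per_host):
    return [
        {"host": f"host-{h}", "disk": f"disk-{d}", "used_bytes": 1_000 * (h + d + 1)}
        for h in range(num_hosts)
        for d in range(disks_per_host)
    ]

def client_side_total(records):
    """Ship every record to the client, then sum there."""
    payload = json.dumps(records)  # everything crosses the network
    total = sum(r["used_bytes"] for r in json.loads(payload))
    return total, len(payload)

def server_side_total(records):
    """Sum next to the data and ship only the result."""
    total = sum(r["used_bytes"] for r in records)
    payload = json.dumps({"total_used_bytes": total})  # one small object
    return total, len(payload)

records = make_disk_records(num_hosts=1000, disks_per_host=4)
client_total, client_bytes = client_side_total(records)
server_total, server_bytes = server_side_total(records)
```

Both functions compute the same total, but the client-side payload grows with the number of disks while the server-side payload stays the same size.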
Extending YQL with aggregations required three steps: design, implementation, and evaluation.
In the first step, the language extension design, I had to figure out what the syntax and semantics of the extension should be. Without a doubt, this was the most important and difficult step in the process because the design shapes the entire language feature. If you get the design wrong, implement it, and find out you want to change it, then you have to go through the tedious task of migrating users away from the feature before you can remove it. In other words, it is better to spend some time getting the design right in the first place.
The design process is difficult simply because it is easy to get the design wrong. In the case of YQL aggregations, I first needed to figure out what kinds of aggregations users needed. As it turned out, most of the Grail users I talked to hadn’t even considered whether they needed aggregation support. I did, however, find a few use cases. One of them was a UI that displays frequencies of storage solution issues. The frequencies were calculated client side, resulting in a long loading time (in the ballpark of 30 seconds). This is actually a good example of what happens when developers meet technical limitations. They simply work around them to accomplish what they need, which may result in technical debt.
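The frequency use case boils down to a group-and-count aggregation. The Python sketch below shows the kind of computation the UI was doing client side. The issue records, their field names, and the issue types are invented for illustration and are not taken from Grail.

```python
from collections import Counter

# Hypothetical issue records; field names and issue types are made up.
issues = [
    {"cluster": "a", "type": "disk_full"},
    {"cluster": "a", "type": "node_down"},
    {"cluster": "b", "type": "disk_full"},
    {"cluster": "c", "type": "disk_full"},
]

def issue_frequencies(records):
    """Count how often each issue type occurs."""
    return Counter(r["type"] for r in records)

frequencies = issue_frequencies(issues)
```

The counting itself is cheap. In a setup like this one, the expensive part is fetching every issue record before the count can start.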
Having identified a few use cases, I had something to base the design on. Next, I had to design the syntax and semantics of the extension. This design had to meet three requirements: 1) it should be able to solve the identified use cases, 2) it should be consistent with existing YQL, and 3) it should be feasible to implement in the existing YQL query evaluation engine. I discussed possible syntax and semantics with the other Grail team members and ended up with three design suggestions for my RFC. I shared the RFC with Grail users so they could provide their feedback before we settled on a design. In the end, we decided to implement aggregations as functions, as this made the feature purely an extension of existing YQL and seemed to satisfy all three design requirements.
Implementation and evaluation
The next step was to actually implement the aggregations in YQL. This step went very smoothly. I had spent a fair amount of time on the design step, so I knew exactly what to implement. Further, when I started to second-guess the design, the design document reminded me of the answers to my concerns.
After the YQL aggregation extension was implemented and put into production, the third and final step was to evaluate it against our use cases. It turned out that the aggregation extension did not quite solve the use cases. The aggregations themselves worked as they should, but we couldn’t specify them in the places we needed. Loosely speaking, the issue was that we couldn’t aggregate the final query result, only subparts of the query. The aggregation extension was a step on the way, but it was clear that it did not suffice on its own. Having identified a new limitation in YQL that we needed to remove, the entire process started over again with design, implementation, and evaluation.
In the end, YQL was also extended with transformations that not only allowed aggregations to be performed on the entire query result as needed but also more general transformations of the output result. With both transformations and aggregations implemented, it was finally possible to solve the use cases we had initially identified.
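The difference between aggregating subparts of a query and transforming the whole result can be modeled in a few lines of Python. This is only a sketch of the idea, not YQL syntax or semantics: the nested result shape and the helper names `aggregate_subparts` and `transform_result` are hypothetical.

```python
# Hypothetical nested query result: hosts, each with its disks.
result = [
    {"host": "host-1", "disks": [{"used_gb": 10}, {"used_gb": 20}]},
    {"host": "host-2", "disks": [{"used_gb": 5}]},
]

def aggregate_subparts(rows):
    """Aggregate inside each row: one total per host."""
    return [
        {"host": row["host"], "used_gb": sum(d["used_gb"] for d in row["disks"])}
        for row in rows
    ]

def transform_result(rows, fn):
    """Apply a function to the query result as a whole."""
    return fn(rows)

# Aggregation alone yields one number per host ...
per_host = aggregate_subparts(result)
# ... while a transformation over the final result yields a single fleet total.
fleet_total = transform_result(per_host, lambda rows: sum(r["used_gb"] for r in rows))
```

In this model, the first step corresponds to what aggregation functions alone could express, and the second step to what transformations added.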
Intern experience takeaways
Looking back at my internship, I wish I had spent a summer or two interning at a company like Uber during my undergraduate studies. I had a lot of fun, and I got to learn what it is like to be an engineer, building new technologies and deploying systems into production outside the insularity of a university. Having only done this one internship, I have no other experience to compare it with, but I want to highlight the following benefits of interning at Uber Engineering Aarhus:
Being part of a team
On the Grail team, I did not feel like the intern; I just felt like another member of the team. I wasn’t just placed in a corner with my internship project. I was assigned tasks in almost all parts of the system (I got to feel the rush of finding the cause of a huge memory leak that caused periodic out-of-memory kills), but my manager and tech lead also made sure that I had enough time to finish the internship project. I was invited to participate in all team design discussions and planning meetings, and my opinion was taken into consideration when I voiced it, even for things that I would not be around to see through.
Working on a real language
It was exciting to work on a query language with actual users. At times, it was also frustrating that I couldn’t just make any change I liked to the language, as it might break a service. This meant that I couldn’t straighten out some of the quirks of YQL, which just goes to show the importance of thinking through extensions to a language like YQL to minimize the number of quirks in the first place. To some degree, it is also different from academia, where there is more freedom to play around with things because there are no dependents. However, overall it was fun to get to discuss and implement the direction of a real language, and in the end, I felt that I left YQL in a better state than when I started working on it.
Having an immediate and lasting impact
The changes I made to YQL had an immediate impact on other engineers at Uber Engineering Aarhus, and they were not shy about showing their appreciation and suggesting other possible YQL improvements. This positive feedback loop was very motivating and encouraged me to make YQL better, even though I had to return to university the following year.
At Uber Engineering Aarhus, I saw firsthand that industry has exciting and challenging work to offer, and I feel a lot better prepared to make a decision about my future career path.
If tackling engineering challenges for large-scale infrastructure systems like Grail interests you, consider applying for a role on our team or as an intern at our Aarhus Engineering office!