By Ferras Hamad, Isaac Liu, Xian Xing Zhang
Choice is fundamental to the Uber Eats experience. At any given location, there could be thousands of restaurants and even more individual menu items for an eater to choose from. Many factors can influence their choice. For example, the time of day, their cuisine preference, and current mood can all play a role. At Uber Eats, we strive to help eaters find the exact food they want as effortlessly as possible.
We approach this task through search and recommendation technologies, and recent advances in machine learning. From the moment an eater enters a query, we try to understand their intent based on our knowledge of food organized as a graph, and then use a learned representation of eater intent to expand on this query, with the idea of surfacing the most relevant results. A greater part of the Uber Eats discovery process comes from the recommender system we built, designed to be both engaging to eaters and helpful to our restaurant partners. Through frameworks such as multi-objective optimization and multi-armed bandit, we balance the needs of both restaurants and eaters in the Uber Eats marketplace.
In this two-part article series, we will look under the hood of the Uber Eats app and walk through the efforts we take to aid eaters in their decision-making process. This first part of the series focuses on building a query understanding engine for Uber Eats through an in-house food knowledge graph, and the related work done to help understand explicit eater intent through representation learning-based query expansion. The second article in this series describes how we used multi-objective optimization to build a recommendation engine, showing eaters new restaurants based on their order history.
The first question we try to understand when helping our customers discover the perfect meal is: what is the eater looking for? For example, some engineers in our office order bubble tea for a midday pick-me-up. When they open the Uber Eats app, their intent is clear and they know they only want bubble tea. However, when eaters open the app to order lunch or dinner, their intentions may not be as clear. An eater might have a general cuisine type in mind, like Asian or American, but need help deciding whether to go with Chinese or Japanese, sandwiches or barbecue.
Alternatively, an eater might have a certain type of food in mind, but choose something else while browsing the app. For example, an eater might search for udon, but end up ordering soba. In this case, the eater may have been looking for something similar to udon, such as soba and ramen, instead of only being interested in udon. As humans, it might seem obvious; Udon and soba are somewhat similar, Chinese and Japanese are both Asian cuisines. However, machines have a more difficult time understanding these similarities only based on the textual information. In fact, a lot of work goes into training them to make these types of intelligent decisions on the semantic level.
Typically, an eater specifies their intent through text in the form of a search query in the Uber Eats app. Consequently, we use query understanding to figure out eater intent. Although query understanding is a common problem for different types of search engines, it poses unique challenges and additional opportunities when faced with food and restaurants. A restaurant may be categorized for a specific cuisine type, but may also include other types of cuisine on its menu. Individual food items can share enough similarities to make them relevant as results in a search query, although their names may be completely different. And the geographical bounding for the set of potential results creates a limitation, which can lead to no obvious responses to a query.
Building a food knowledge graph
The classic approach of query understanding through text matching with natural language processing (NLP) works if the eater intent is clear and specific. But where the intent is ambiguous, such as the scenarios outlined above, applying classic NLP approaches alone is not sufficient. Of the alternative approaches we can take, most require establishing an intelligent understanding of the entities within the food domain by building a knowledge base. Many companies spend considerable time building up knowledge bases across several domains, with Google’s Knowledge Vault being one of the most well-known.
At Uber, we are building a food-focused knowledge base to enable better understanding of food-related queries.
In the food domain, we deal with heterogeneous entities such as restaurants, cuisines, and menu items. Since these entities have natural relationships, we can model them as a graph. A graph is the most common form used to express complex and intricate relationships between entities in a knowledge base. This graph makes modeling and linking data much more intuitive.
Establishing a knowledge base can be a very challenging process. To effectively leverage data, a knowledge base needs to be in a semi-structured form: generic and flexible enough to easily add more facts, but specific enough to be more than just a blob of data. Achieving this balance requires building an ontology, or language, to describe the graph, including properties of different entities and the relationships between them.
With this ontology, our offline pipelines can transform data consumed from multiple sources to conform with its definitions. Once the ingest component of the offline pipeline transforms the data to fit our ontology, a set of classifiers are then run to de-duplicate the data and make cross-source connections in order to leverage the graph’s abstraction power. For example, this stage involves establishing that a given restaurant on Foursquare, one of our data sources, is the same as the restaurant in our internal account data. Finally, once all these stages are complete, the data is stored in such a way that makes it queryable in real time with low latency.
With an established graph, our next task involves leveraging it to optimize an eater’s search results. Offline, we can extensively annotate restaurants and menu items with very descriptive tags. Online, we can rewrite the eater’s query in real time to optimize the quality of the returned results. We combine both approaches to guarantee high accuracy and low latency.
Both offline tagging and online query rewriting require a semantic understanding of entities in our data and how they relate to one another. To build a rich set of tags and annotations at the restaurant and item level, we need to understand the difference between a cuisine and a dish type and how certain cuisines have associations with countries and sub-geographies. Our graph enables us to better understand how these different dishes and cuisines relate. For instance, if an eater queries for “Asian” cuisine, the graph can infer that “Chinese” and “Japanese” are a subset of “Asian,” and surface the restaurants appropriately.
If it were handling a query for “udon,” the graph would use online query rewriting to expand the search to also include related terms such as “ramen,” “soba,” and “Japanese”; however, if restaurants are not properly tagged in our underlying database, it will be very difficult to ensure total recall of all relevant restaurants. This is where the offline tagging comes into play and surfaces restaurants that sell udon or udon-related dishes to the eater as quickly as possible.
Additionally, we use the graph to solve another common problem of search: the zero result problem. A lack of results commonly occurs when an eater searches for a restaurant that is not on our platform, or any matching restaurants are out of the eater’s delivery radius. For example, “Shanghai Dumpling King” is available for eaters with a delivery address set to San Francisco, but this restaurant would become unavailable to an eater with a delivery address in San Jose.
Instead of simply returning no results, we leverage the cross-domain nature of the graph to surface similar restaurants in the area. Our graph not only maps nodes to cuisines, but also to restaurants in the area. These restaurant nodes are linked to their respective cuisines, which lets us surface other restaurants in the area with similar cuisines as suggested results.
Learning from data
Besides creating a graph to understand eater intent through search queries, as discussed above, an alternative is to leverage eater search behavior data directly. In the following section, we discuss a representation learning-based approach.
A representation learning algorithm learns the features of entities, usually through latent vector space embedding. One good example is the GloVe algorithm, which embeds words into a latent vector space by exploiting the word co-occurrence in a corpus. Importantly, distance between words in the learned latent vector space makes semantic sense. Inspired by GloVe and other related work, we designed the following query2vec model with our eater search behavior data.
To understand how our query2vec model works, we first explain two key concepts adopted from the GloVe paper: word and context. For word, instead of modeling each token of a query as a word, we model the whole query as one single word. And for context, we say two queries appeared in the same context if they lead to an order from the same restaurant. Or, think of each restaurant as the context and each query as a word, and two queries appeared in a restaurant if it leads to an order from that restaurant. Figure 4, below, illustrates this concept:
When leveraging query2vec, queries leading to orders from similar restaurants should be close in the embedded latent space. Note that in Figure 4, we only discussed word and context without specifying a concrete model. This is because, given differing product requirements, we can explore different types of advancements in representation learning with online A/B tests. In the query expansion use case, for example, we found a variant of GloVe to be effective.
More concretely, we first construct a pointwise mutual information (PMI) matrix with the search queries as follows:
Where a and b are two tuning parameters, p(q1,q2) is the joint distribution of query q1 and q2, and
is the marginal distribution of p1. Note that in our case p(q1,q2) can be approximated with the co-occurrence frequency of query q1 and q2 within the same context, as discussed above.
With the PMI matrix, we employ a GloVe-based factorization model to learn the vector representations of all queries. Such query vector representation can be used as features in downstream machine learning models.
One challenge we face at Uber Eats is that, not only do we want to give an eater the best possible results for their query, but we also must contend with the fact that the world of potential results is confined to the restaurants and menu items available in a specific area. Consequently, as a direct match to the query may not exist, we use query expansion to find related matches, which can help our customers find food options they had not considered, but may find satisfying. In the following we use query expansion as an example to illustrate how representation learning is applied to improve eater’s search experience.
Query expansion (QE) is a commonly used technique in search engines to improve recall coverage and search quality in general. QE is especially useful in the following two cases: 1) the eater’s search intent is ambiguous and can’t be fully captured by the query itself; and 2) the quality of results retrieved from the query is limited. For example, when an eater searches for “tan tan noodles,” it’s likely that the eater might also be interested in spicy Chinese food or other Szechuan restaurants; moreover, there might not be many restaurants that sell “tan tan noodles.”
A key component in query expansion involves finding similar queries based on the original query. With the vector representation of queries, we can measure the similarity between two queries in their vector space. However, one challenge still remains: how to efficiently find queries similar to the original query in real time. Nearest neighbor search is a default option, and a few fast/approximate variants are available. For example, clustering the query vectors to reduce the search space achieves the efficiency we want.
In Figure 5, below, we illustrate how representation learning-based query expansion works in practice. An eater searches for “tan tan noodles,” but there is no restaurant selling tan tan noodles near the eater’s delivery address. Instead of returning zero results, query expansion enables us to retrieve and rank results associated with expanded queries, such as “Szechuan,” “Chinese,” and “Spicy food.” On the backend, the original query is expanded to similar queries and relevant restaurants from the expanded queries are first retrieved, and then combined and ranked before being surfaced to the eater.
Achieving the best of both worlds
While representation learning can determine how closely related two terms are, it cannot deduce how they are related. This can be especially problematic when dealing with dietary restrictions, such as Halal. This term may end up expanding to “Middle Eastern,” “Mediterranean,” or certain South Asian cuisines, since many restaurants categorized as “Halal” also happen to fall under one of these other categories. However, the inverse is not necessarily true. Surfacing non-Halal restaurants and dishes for eaters specifically trying to filter for Halal can be very misleading and will result in a poor eater experience.
The above representation learning also requires the presence of eater behavioral data, which becomes an issue when we launch Uber Eats in a new city where we have little to no data. Given the rapid expansion of the platform, we must account for this scenario. Even in cities where we have been operating for over a year, we still see cases of missing data when eaters search for restaurants that are not on the platform. In such cases, these searches will often yield zero results, meaning that there is nothing to feed into our representation learning model.
Given this limitation, both the knowledge graph and representation learning approaches are complementary, and help us effectively rewrite the query. With the increased recall, we can then use learning-to-rank models to blend the expanded results with the original results, which helps us deliver the best from both worlds to Uber Eats customers.
The Uber Eats Discovery team is always striving to build the best app experience possible to help our customers discover delicious meals. In this article, we introduced how we use a knowledge graph and representation learning to understand explicit eater intent. Given the complexity of our platform, there are still many challenges that we are solving for in this space. For example, we are working on further growing our knowledge graph base to refine our semantic understanding of eater intent. In our next article, we will discuss our recommender system, and explain how we deliver more personalized Uber Eats experiences.
To further our vision, we are looking for talented data scientists and engineers to join our team. If you are interested in helping us solve these types of problems, consider applying for a role on the team!
We want to thank the whole Eater Discovery team (engineering, data science, design, product, and product ops) for the hard work that made this article possible.
Subscribe to our newsletter to keep up with the latest innovations from Uber Engineering.