Choice is fundamental to the Uber Eats experience. At any given location, there could be thousands of restaurants and even more individual menu items for an eater to choose from. Many factors can influence their choice. For example, the time of day, their cuisine preference, and current mood can all play a role. At Uber Eats, we strive to help eaters find the exact food they want as effortlessly as possible....
A representation learning algorithm learns the features of entities, usually through latent vector space embedding. One good example is the GloVe algorithm, which embeds words into a latent vector space by exploiting the word co-occurrence in a corpus. Importantly, distance between words in the learned latent vector space makes semantic sense. Inspired by GloVe and other related work, we designed the following query2vec model with our eater search behavior data.
To understand how our query2vec model works, we first explain two key concepts adopted from the GloVe paper: word and context. For word, instead of modeling each token of a query as a word, we model the whole query as one single word. And for context, we say two queries appeared in the same context if they lead to an order from the same restaurant. Or, think of each restaurant as the context and each query as a word, and two queries appeared in a restaurant if it leads to an order from that restaurant. Figure 4, below, illustrates this concept:
Yuyan Wang, Yuanchi Ning, Isaac Liu, and Xian Xing Zhang, Food Discovery with Uber Eats: Recommending for the Marketplace, Uber, September 10, 2018.
Maintaining the Uber Eats marketplace also means paying attention to the performance of our restaurant-partners. The popularity of restaurants is, of course, completely up to the eaters on our platform. However, we do want to give restaurants the opportunity to be seen on our platform.
The relevance ranking model for a plain list of restaurants outlined above relies heavily on historic data. Without special treatment, a new restaurant on our platform may have a low ranking since it has no impressions and no historic orders. To give new restaurants a fair opportunity to rank high and gather exposure, we used the multi-armed bandit (MAB) framework.
In this case, we applied the upper confidence bound (UCB, one of the methodologies of MAB) approach to facilitate the exploration of new restaurants or restaurants with low impressions. We calculate a UCB score for each restaurant based on metrics such as its historic impression number, total click number, and boosting factor. The UCB score, along with other objectives discussed above, decides the ranking order among restaurants. A new restaurant will have a relatively high UCB score initially and hence rank highly, increasing its exposure. As the new restaurant gathers more impressions, the UCB score will smoothly decrease and gradually transfer ranking weight back to other objective scores such as relevance....
If we only optimize for eater conversion, we would probably only surface a few very popular and well-established restaurants to almost all eaters. New or recently onboarded restaurants would not show up on the homescreen for eaters and would get fewer orders, hurting marketplace fairness. On the other hand, if we simply optimize for marketplace fairness, ensuring every restaurant gets equal exposure to every eater, we would probably be recommending some irrelevant restaurants, affecting eater conversation rate.
Multi-objective optimization gives us a mathematically principled solution for the trade-off among competing objectives. More specifically, we developed an optimization framework with quadratic programming. We set constraints on the sacrifice we are willing to make on each objective while optimizing others. Here, we illustrate the optimization methodology with two objectives: total number of orders and gross bookings....
Using Lagrangian duality and KKT conditions, this constrained optimization problem effectively yields the ultimate ranking function for every eater as a combination of different objectives with analytical formula. Moreover, each objective is associated with a coefficient that is a function of the constraint parameters (i.e. in the above example) and can be further tuned.
With tunable coefficients for each objective as free parameters in the multi-objective optimization framework, we then need to efficiently find the optimal weight combination. An intuitive way to accomplish this is through online A/B testing: generate different sets of coefficient values and observe their performance online. However, the exploration space for the coefficients grows exponentially with the number of objectives, and online A/B testing takes too long to iterate and is costly especially given the rapid growth of the business.
In this case, it’s a typical exploitation vs. exploration trade-off, which fits nicely in a MAB framework. We want to exploit the current best coefficients that have the highest payoff so far, while at the same time explore other coefficients that potentially have a higher return. Bayesian optimization with contextual multi-armed bandit can be applied here. Given the current best set of coefficient values, we also have a holdout of production sessions running with the epsilon-greedy method that generates new sets of coefficients for less biased training data, which is then used for subsequent model training and online optimization iterations.
沒有留言:
張貼留言