Before we talk about voice technology, the UberEats platform, or the $40 billion market, let’s first look at Uber itself. Here is a quote from Uber on how the company got started and where it wants to go.

It started as a simple idea: What if you could request a ride from your phone? More than 5 billion trips later, we’re working to make transportation safer and more accessible, helping people order food quickly and affordably, reducing congestion in cities by getting more people into fewer cars, and creating opportunities for people to work on their own terms.

These are the big problems Uber wants to tackle in the coming years. One of them is helping people order food quickly and affordably.

The product opportunity for voice lies in helping people order food quickly.

Here is the current user flow for ordering food on UberEats.

The Two Major Pain Points

Users run into two main problems during the UberEats food ordering process: indecisiveness and inconvenience.

Time lost to indecisiveness and inconvenience may seem like a small, trivial issue on the surface, but it isn’t. Compared with what a voice interface could offer, there is significant friction between what users want to do and actually getting it done.

Think About It: if every YouTube video took 10 more seconds to load, you would get very annoyed. Objectively, it’s only 10 seconds, but once people get used to speed and convenience, it’s very hard to go back to options that suddenly seem ancient.

How Voice Will Enable People to Make Decisions Faster and Do Things Faster

Here is a series of example voice use cases on the UberEats mobile app.

Why Voice as an Interface is Inevitable

Humans have evolved to take the path of least resistance.

To illustrate how much humans love speed, let’s look at how consumer usage has historically evolved toward the most reliable and frictionless interface.

Failing to adjust to this reality could cost UberEats market share in the coming years as competitors begin to adjust.

$40 Billion Market

According to a study from OC&C Strategy Consultants, voice shopping in the US is projected to reach $40 billion in sales by 2022.

Link to Report

Given its large platform, available resources, and product-market fit, UberEats is in an ideal position to invest in a long-term moonshot project that could deliver disproportionate business value: enough to significantly narrow the gap between itself and GrubHub (the leader in food delivery market share), if not overtake it.

August 2017 — March 2018. Click Here for Source

By being a first mover, UberEats will gain significant market share in the voice food delivery market, which is currently “uncontested”. This will be made possible by the following:

Looking at the current market leaders, UberEats could easily take the #1 spot among food delivery Alexa Skills, and honestly, that’s no offence to pizza source or Denny’s.

Voice User Flow

The voice user flow for ordering food on UberEats will look like this.

The Nitty Gritty

Let’s dive into how to create this voice experience for the UberEats app.

Draw on similar elements from Shazam when designing the voice ordering screen.

Interpretation Logic

A user will say their order, and the item(s) will automatically be added to the ordering cart.

Finally, allergy warnings, expected ordering time, and total price should still be prompted and displayed on the final checkout screen before confirmation of payment.
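A minimal sketch of that final checkout step, assuming each cart item carries a price and a set of allergen tags. The `CartItem` type, its field names, and the `checkout_summary` helper are hypothetical illustrations, not part of any UberEats API; the point is that a voice-built cart still ends in an explicit confirmation screen that surfaces allergens, ETA, and total before payment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CartItem:
    # Hypothetical cart line; prices in cents avoid floating-point rounding.
    name: str
    price_cents: int
    quantity: int = 1
    allergens: frozenset = frozenset()


def checkout_summary(items: list[CartItem], eta_minutes: int) -> dict:
    """Collect what the checkout screen must show before payment is confirmed."""
    total_cents = sum(item.price_cents * item.quantity for item in items)
    # Union of every item's allergen tags, sorted for stable display.
    allergens = sorted(set().union(*(item.allergens for item in items)))
    return {
        "allergens": allergens,
        "eta_minutes": eta_minutes,
        "total_cents": total_cents,
        # A voice-built cart is never charged without an explicit tap to confirm.
        "requires_confirmation": True,
    }


cart = [
    CartItem("Pad Thai", 1250, 1, frozenset({"peanuts", "shellfish"})),
    CartItem("Spring Rolls", 600, 2, frozenset({"gluten"})),
]
summary = checkout_summary(cart, eta_minutes=35)
```

Keeping confirmation as a hard requirement means a misheard order costs the user a correction, never an unwanted charge.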

System Architecture

  1. Use the iOS and Android speech-to-text APIs (Apple’s Speech framework on iOS and SpeechRecognizer on Android) to convert the user’s voice input into text.
  2. Design a linguistic interpreter that maps patterns or sequences of words to the different forms of the food-ordering intent, and expose it through API endpoints. Contextual information (type of food, restaurant name, etc.) should be stored as slots. Check out Amazon’s intent schemas for Alexa Skills to see how Amazon solves this problem.
  3. Send the transcribed text to the intent interpreter through its API endpoint. The recognized intent then calls the existing UberEats API endpoints, passing the extracted slots (food to order, restaurant to order from) as parameters, to add the item(s) to the cart.
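The interpreter in steps 2 and 3 can be sketched as a small pattern matcher that turns a transcript into an intent with named slots, then hands those slots to a cart call. This is a minimal sketch, assuming the platform speech-to-text step has already produced a plain-text transcript. The `OrderFood` intent name, the regular expressions, and the `add_to_cart` callback are hypothetical stand-ins (the callback represents the existing UberEats cart endpoint), and the intent-plus-slots shape mirrors the idea behind Amazon’s intent schemas rather than any actual Amazon or UberEats API.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Intent:
    name: str
    slots: dict


# Hypothetical utterance patterns for an "OrderFood" intent.
# Named groups become slots; the more specific pattern is tried first.
ORDER_PATTERNS = [
    re.compile(r"^(?:i want|i'd like|order|get me)\s+(?:an?\s+|some\s+)?"
               r"(?P<food>.+?)\s+from\s+(?P<restaurant>.+)$"),
    re.compile(r"^(?:i want|i'd like|order|get me)\s+(?:an?\s+|some\s+)?"
               r"(?P<food>.+)$"),
]


def interpret(transcript: str) -> Optional[Intent]:
    """Map a transcript to an OrderFood intent with slots, or None."""
    text = transcript.lower().strip().rstrip(".!?")
    for pattern in ORDER_PATTERNS:
        match = pattern.match(text)
        if match:
            slots = {k: v for k, v in match.groupdict().items() if v}
            return Intent("OrderFood", slots)
    return None


def handle(intent: Optional[Intent], add_to_cart: Callable[..., object]) -> bool:
    """Pass extracted slots to the (hypothetical) cart endpoint."""
    if intent is None or intent.name != "OrderFood":
        return False
    add_to_cart(food=intent.slots["food"], restaurant=intent.slots.get("restaurant"))
    return True
```

A production interpreter would more likely replace the hand-written patterns with a trained language-understanding model, but the contract stays the same: text in, intent and slots out.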

Defining Success 6 Months Post-Launch

Focus on task success and customer utility for the first 6 months.

Focus on growth (acquisition and engagement) after task success and utility metrics have been hit.

Key Objective

High-level Hypothesis: Users who utilize the voice feature should be able to complete their order faster than a user who does not utilize the feature.

Orders completed via voice should be put into cohort A.

Orders completed normally should be put into cohort B.

Session length at the time of order for cohort A should be at least 10% shorter than for cohort B before we can meaningfully say the hypothesis holds.

We need to measure time from the beginning of the app session because UberEats users open the app looking for a solution. Our hypothesis is that voice gets users from opening the app to placing an order in less time.

Make sure that fewer than 10% of cohort B orders come from sessions in which the user initiated a voice search.
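The cohort check can be expressed as a short calculation. This sketch assumes session lengths are recorded in seconds per completed order and compares cohort medians; whether to use the mean or the median is an assumption, and the median is chosen here because session lengths tend to have long tails. The function names and the boolean `voice_search_flags_b` input are hypothetical, and a real analysis would also want a significance test before drawing conclusions.

```python
from statistics import median


def session_length_reduction(cohort_a: list[float], cohort_b: list[float]) -> float:
    """Fraction by which cohort A's median session length is shorter than cohort B's."""
    return 1 - median(cohort_a) / median(cohort_b)


def hypothesis_holds(
    cohort_a: list[float],
    cohort_b: list[float],
    voice_search_flags_b: list[bool],
    min_reduction: float = 0.10,
    max_contamination: float = 0.10,
) -> bool:
    """Check the 10% session-length target after validating cohort B."""
    # Share of cohort B orders whose session still included a voice search.
    contamination = sum(voice_search_flags_b) / len(voice_search_flags_b)
    if contamination >= max_contamination:
        raise ValueError(
            f"cohort B contamination {contamination:.1%} is not below {max_contamination:.0%}"
        )
    return session_length_reduction(cohort_a, cohort_b) >= min_reduction


a = [90, 100, 110]   # cohort A session lengths in seconds (illustrative)
b = [120, 125, 130]  # cohort B session lengths in seconds (illustrative)
```

Here cohort A’s median (100 s) is 20% below cohort B’s (125 s), which would clear the 10% bar as long as cohort B is not contaminated.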

Objective #2

High-level Hypothesis: Users who utilize the voice feature should be able to find what they are looking for.

The primary focus for now is not to increase A, but we need to make sure that A, measured over 30 days, does not dip below 0.1% of unique active users over the same 30 days.
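The floor on A is a single ratio check. A minimal sketch, assuming A and the unique-active-user count are both already aggregated over the same 30-day window; the function name is hypothetical, and integer cross-multiplication is used so the 0.1% boundary is compared exactly rather than through floating-point division.

```python
def a_above_floor(a_count_30d: int, unique_active_users_30d: int) -> bool:
    """True if A over 30 days is at least 0.1% (1/1000) of 30-day unique active users."""
    if unique_active_users_30d <= 0:
        raise ValueError("unique_active_users_30d must be positive")
    # a / users >= 1/1000  <=>  a * 1000 >= users, with no rounding error.
    return a_count_30d * 1000 >= unique_active_users_30d
```

For example, with 2,000,000 unique active users in the window, A needs to be at least 2,000.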

Objective #3

High-level Hypothesis: The user experience of the voice feature is seamless and leaves the user satisfied.

Use a lightweight satisfaction prompt in the spirit of a Net Promoter Score: after a voice-initiated order has been completed successfully, show a modal asking whether the user would use voice to order again, with thumbs-up and thumbs-down options.

Aim for at least 85% thumbs-up responses.
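Tracking the 85% target is a simple proportion over the modal’s responses. This sketch assumes each response is stored as a boolean (`True` for thumbs up) and that dismissed modals are not recorded at all; the function name and the integer-percent parameter are hypothetical choices made so the boundary is compared exactly.

```python
def meets_thumbs_up_target(responses: list[bool], target_percent: int = 85) -> bool:
    """True if at least target_percent of recorded responses are thumbs up."""
    if not responses:
        raise ValueError("no responses recorded yet")
    thumbs_up = sum(responses)
    # thumbs_up / total >= target / 100, rearranged to avoid float rounding.
    return 100 * thumbs_up >= target_percent * len(responses)
```

With 17 thumbs up out of 20 responses, the rate is exactly 85% and the target is met; 16 out of 20 (80%) falls short.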

Launch Plan for the First 6 Months

Start roll out with a small segment of users in large, technologically-progressive cities.

Spend a minimum of 2 weeks on each stage.

Progress to the next stage only under the following conditions:
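A staged rollout like this is often implemented with deterministic bucketing, so that a user who is in the pilot at 5% stays in it at 25%, plus a gate that enforces the two-week minimum per stage. This is a minimal sketch under stated assumptions: the city allowlist, the `voice-ordering` salt, and both function names are hypothetical, and `can_advance` takes the combined outcome of the advancement conditions as a single boolean rather than modeling each one.

```python
import hashlib
from datetime import date, timedelta

# Hypothetical pilot cities; the real list would come from the launch plan.
PILOT_CITIES = {"san francisco", "new york", "seattle"}
MIN_STAGE_DURATION = timedelta(weeks=2)


def in_rollout(user_id: str, city: str, percent: int) -> bool:
    """Deterministically place a user in the voice-ordering pilot."""
    if city.strip().lower() not in PILOT_CITIES:
        return False
    # Hash a salted user id into a stable bucket from 0 to 99.
    digest = hashlib.sha256(f"voice-ordering:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    # Raising percent only adds users; nobody already in the pilot drops out.
    return bucket < percent


def can_advance(stage_start: date, today: date, conditions_met: bool) -> bool:
    """Advance only after the 2-week minimum and once the stage conditions are met."""
    return today - stage_start >= MIN_STAGE_DURATION and conditions_met
```

Hashing instead of random sampling keeps each user’s experience stable across sessions, which matters when the metrics above compare behavior over weeks.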

Conclusion

If you’ve got any questions, feel free to drop them down below!

Drop a 👏 if you’ve made it this far! More coming soon.

Are you a student? Working in tech in your 20s? I create videos over here.