☕ The Next Gen Virtual Assistant (EP 1)

Ep1: Virtual assistants are the future of computing, the next big thing since mobile. Still, trying to duplicate the App Store paradigm within virtual assistants is a mistake. Here is why, and here is how to fix it.


Captain Picard is famously known for his “Tea, Earl Grey, Hot” voice command to the Star Trek Computer.

I love Virtual Assistants. They communicate like us, they are not bound to a single object, they can live everywhere, and they are always there to help. Virtual assistants are the most effective and most natural way of interacting. If anything, Virtual Assistants will be a major component of our future.

Still, in 2017 we are in the Assistants’ infancy. Their conversations are limited to deterministic dialog trees, and regarding the Turing test… well, we are not there yet. Assistants have great ears and pretty good voices, but really poor brains: a 3-year-old’s brain at best.

Having said that, some of the best (human) brains in the world are currently working on this issue, so we should hit the Singularity before 2050 and have fun remembering how basic the initial versions of Assistants were back in 2017.

But as “the future has to be built to exist”, what should we forge next? What is the next milestone for Assistants? My opinion is that we should focus on a better way of providing voice services, and that this will happen through a reinvention of the Skills paradigm.

Today: Atomic Skills 🤖

The agent is the core of the assistant.

An agent is composed of:

  • Personality: the Assistant’s name, a voice, a personality, and lots of predefined answers.
  • Core Skills: Music, Weather, Q&A, Calculation, Translation, and of course Jokes…
  • Third-party Skills: additional features created by brands and developers (see the sketch after this list).
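
To make this structure more concrete, here is a minimal sketch in TypeScript of what such an agent definition could look like. The interfaces and field names below are my own illustrative assumptions, not the actual API of any existing assistant platform.

```typescript
// Illustrative sketch only: these interfaces are assumptions,
// not the API of any real assistant platform.

interface AgentPersonality {
  name: string;                           // e.g. "Computer"
  voice: string;                          // identifier of the text-to-speech voice
  cannedAnswers: Record<string, string>;  // predefined answers for small talk
}

interface Skill {
  invocationName: string;                      // how the user addresses the skill
  handle(utterance: string): Promise<string>;  // returns the spoken reply
}

interface Agent {
  personality: AgentPersonality;
  coreSkills: Skill[];        // music, weather, Q&A, calculation, translation, jokes…
  thirdPartySkills: Skill[];  // additional features from brands and developers
}
```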

With the core skills, the user experience is pretty straightforward. Let’s look at some examples with an assistant that, for the sake of neutrality, I will call Computer.

  • Computer, set the temperature to 23.
  • Computer, what is the weather?
  • Computer, play me some French Touch Electro.

Things become clunky when it comes to the third-party skills, which are supposed to represent the majority:

  • Computer, ask Capital One how much I spent on restaurants last month.
  • Computer, ask Uber where my driver is.
  • Computer, ask the Financial Times for the last stock quotes.
  • {Assistant’s Name}, ask {App Name} for {Whatever you are looking for}

So as a user, if I want to make use of third-party skills, I have to:

  • Be aware of and remember the invocation names of the third-party skills I may need
  • Choose the right one to use among the many options available
  • Formulate my request in a very unnatural way

Furthermore, all of this happens in closed conversation bubbles that are unable to communicate with each other. This completely breaks the “Speak Naturally” argument.
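
A hypothetical sketch of how this routing works today, reusing the Agent and Skill interfaces from the earlier sketch, makes the isolation obvious. Nothing here is a real platform API; it is just an illustration of the “{Assistant}, ask {App Name}…” pattern.

```typescript
// Hypothetical dispatcher illustrating today's model: the user has to name
// the skill explicitly, and each skill answers inside its own isolated bubble.

async function dispatch(agent: Agent, utterance: string): Promise<string> {
  // The user is forced into a rigid pattern: "ask <invocation name> <request>".
  const match = /^ask\s+([\w-]+)\s+(.+)$/i.exec(utterance);
  if (!match) {
    return "Sorry, I did not get that.";
  }
  const [, invocationName, request] = match;

  // Routing happens purely by invocation name, and the chosen skill replies
  // in its own bubble: no state is shared with any other skill.
  const skill = agent.thirdPartySkills.find(
    (s) => s.invocationName.toLowerCase() === invocationName.toLowerCase()
  );
  if (!skill) {
    return `I could not find a skill called ${invocationName}.`;
  }
  return skill.handle(request);
}

// dispatch(computer, "ask Uber where my driver is");
// -> routed to the Uber skill only; the weather or bank skills never see it.
```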

For example, here is how you can plan a weekend on a limited budget today:

  • Computer, ask Acme Bank for my account balance.
  • You have 138 euros left.
  • ….
  • Computer, what is the weather like in Marseille this weekend?
  • The weather will be sunny with a maximum of ….
  • Computer, ask National-Rail for a Paris – Marseille ticket this Friday.
  • The next train to Marseille is 98 euros and leaves at 8:00 pm from Paris. Would you like to book it?
  • OK, book it.
  • OK, booked!

Hmm, three different skills just to book one train ticket. There is certainly room for improvement there!

The Future: Interconnected Microskills

Wouldn’t it be better like this?

  • Computer, what is the weather like in Marseille this weekend?
  • The weather will be sunny with a maximum of ….
  • OK, I am in! Find me a cheap train ticket for tomorrow morning.
  • The next train to Marseille is 98 euros and leaves at 8:00 pm from Paris. Would you like to book it?
  • Wait, how much do I have left in the bank?
  • You have 138 euros left.
  • OK, book it!
  • All done, and I just sent the ticket to your inbox.

Enabling such human-like conversations will lead to more natural and appealing user experiences, which is the Holy Grail every designer is looking for!

To make these natural conversations happen, we should rethink the way the Assistant, and in particular the third-party skills, works. My suggestion to fix this is a shared context between skills and a deeper integration of skills, so that they can simply disappear into the back end and let great conversations shine!
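
As a rough sketch of what that shared context could look like, here is one possible shape in TypeScript. All names and types here are hypothetical, chosen only to illustrate the proposal, not to describe any existing assistant.

```typescript
// Rough sketch of the shared-context idea; everything here is hypothetical.

interface ConversationContext {
  slots: Map<string, unknown>; // facts accumulated across skills (destination, budget…)
  history: string[];           // previous utterances and replies
}

interface ContextAwareSkill {
  canHandle(utterance: string, ctx: ConversationContext): boolean;
  handle(utterance: string, ctx: ConversationContext): Promise<string>;
}

async function converse(
  skills: ContextAwareSkill[],
  utterance: string,
  ctx: ConversationContext
): Promise<string> {
  // The assistant, not the user, picks the skill: no "ask X" invocation needed.
  const skill = skills.find((s) => s.canHandle(utterance, ctx));
  if (!skill) {
    return "Sorry, I did not get that.";
  }
  const reply = await skill.handle(utterance, ctx);
  ctx.history.push(utterance, reply);
  return reply;
}
```

In the weekend example above, the weather skill could write the destination (Marseille) into the shared slots, and the rail skill could later read it to resolve “find me a cheap train ticket for tomorrow morning” without the user ever naming a skill.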

PS: See you soon for ep2 😉!