☕The Next Gen Virtual Assistant (EP 1)

Ep1: Virtual assistants are the future of computing, the next big thing since mobile. Still, trying to duplicate the App Store paradigm within virtual assistants is a mistake. Here is why, and here is how to fix it.


Captain Picard is famous for his “Tea, Earl Grey, Hot” voice command to the Star Trek Computer.

 

I love Virtual Assistants. They communicate like us, they are not bound to a single object, they can live everywhere, and they are always there to help. Virtual assistants are the most effective and most natural way of interacting. If anything, Virtual Assistants will be a major component of our future.

Still, in 2017 we are in the Assistants’ infancy. Their conversations are limited to deterministic dialog trees, and as for the Turing test… well, we are not there yet. Assistants have great ears and pretty good voices, but really poor brains. A three-year-old’s brain at best.

Having said that, some of the best (human) brains in the world are currently working on this issue; we should hit the Singularity before 2050 and have fun remembering how basic the initial versions of Assistants were back in 2017.

But as “the future has to be built to exist”, what should we forge next? What’s the next milestone for Assistants? My opinion is that we should focus on a better way of providing voice services, and that this will happen through a reinvention of the Skills paradigm.

Today: Atomic Skills 🤖

The agent is the core of the assistant.

An agent is composed of:

  • Personality: the Assistant’s name, a voice, a personality, and lots of predefined answers.
  • Core Skills: Music, Weather, Q&A, Calculation, Translation, and of course Jokes…
  • Third party Skills: Additional features created by brands and developers

With the core skills, the user experience is pretty straightforward. Let’s look at some examples with an assistant that, for the sake of neutrality, I will call Computer.

  • Computer, set the temperature to 23.
  • Computer, what is the weather?
  • Computer, play me some French Touch Electro.

Things become clunky when it comes to the third party skills which are supposed to represent the majority:

  • Computer, ask Capital One how much I spent on restaurants last month.
  • Computer, ask Uber where my driver is.
  • Computer, ask the Financial Times for the last stock quotes.
  • {Assistant’s Name}, ask {App Name} for {whatever you are looking for}

So as a user, if I want to make use of third party skills, I have to:

  • Be aware of and remember the invocation name of the third party skills I may need
  • Choose the right one to use among the many options available
  • Formulate my request in a very unnatural way

Furthermore, all of this happens in closed conversation bubbles that are unable to communicate with each other. This completely breaks the “Speak Naturally” argument.

For example, here is how, today, you can plan a weekend on a limited budget:

  • Computer, ask Acme Bank for my account balance.
  • You have 138 € left.
  • ….
  • Computer, what is the weather like in Marseille this weekend?
  • The weather will be sunny with a maximum of ….
  • Computer, ask National-Rail for a Paris – Marseille ticket this Friday.
  • The next train to Marseille is 98 euros and leaves at 8:00pm from Paris. Would you like to book it?
  • Ok, book it.
  • Ok, booked!

Hmm, three different skills used just to book a train ticket. There is certainly room for improvement there!

The Future: Interconnected microskills

Wouldn’t it be better like this:

  • Computer, what is the weather like in Marseille this weekend?
  • The weather will be sunny with a maximum of ….
  • Ok, I’m in! Find me a cheap train ticket for tomorrow morning.
  • The next train to Marseille is 98 euros and leaves at 8:00pm from Paris. Would you like to book it?
  • Wait, how much is left in the bank?
  • You have 138 euros left.
  • Ok, book it!
  • All done, and I just sent the ticket to your inbox.

Enabling such human-like conversations will lead to more natural and appealing user experiences, which is the holy grail every designer is looking for!

To make these natural conversations happen, we should rethink the way the Assistant, and in particular the third party skills, work. My suggestion to fix this is a shared context between skills and a deeper integration, so skills can simply disappear into the back end and let great conversations shine!
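To make this more concrete, here is a minimal sketch, in TypeScript, of what a shared context passed between skills could look like. All of the interfaces and skill names below are hypothetical; this is not an existing assistant API.

// A minimal sketch of a shared conversation context passed between skills.
// All names here are hypothetical; this is not an existing assistant API.
interface ConversationContext {
  userId: string;
  slots: Record<string, unknown>;   // e.g. destination, date, budget
  history: { skill: string; utterance: string; answer: string }[];
}

interface Skill {
  name: string;
  canHandle(utterance: string, ctx: ConversationContext): boolean;
  handle(utterance: string, ctx: ConversationContext): string;
}

// The assistant routes each utterance to the first skill that accepts it and
// lets the skill enrich the shared context, so the next skill can reuse it.
function dispatch(utterance: string, skills: Skill[], ctx: ConversationContext): string {
  const skill = skills.find((s) => s.canHandle(utterance, ctx));
  if (!skill) return "Sorry, I can't help with that yet.";
  const answer = skill.handle(utterance, ctx);
  ctx.history.push({ skill: skill.name, utterance, answer });
  return answer;
}

// Example: a weather skill remembers the destination, so a train skill asked
// for "a cheap ticket for tomorrow" can reuse ctx.slots.destination.
const weatherSkill: Skill = {
  name: "weather",
  canHandle: (u) => u.toLowerCase().includes("weather"),
  handle: (u, ctx) => {
    ctx.slots.destination = "Marseille"; // extracted from the utterance in a real system
    return "The weather will be sunny in Marseille this weekend.";
  },
};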

PS: See you soon for ep2 😉!

 

🛫The Smart Speaker Market is About to get Noisy

China’s Alibaba, Samsung and Facebook are reportedly the latest tech giants to announce their intentions of joining the smart speaker market. The market, originally pioneered by Amazon, is expected to hit 13bn by 2024.

The smart speaker timeline

Let’s take a closer look at some of the major players in the smart speaker market.

 


Amazon was the first player on the market, introducing Alexa and the Amazon Echo back in November 2014, and it enjoyed a monopoly for quite some time. Here are the main milestones since then:

  • November 2014 – Amazon introduces Alexa and the Amazon Echo.
  • March 2016 – Amazon unveils the Echo Dot.
  • November 2016 – Google unveils Google Home, its smart speaker to rival Amazon’s; Chinese company LingLong launches its smart speaker DingDong the same month.
  • April 2017 – Amazon reveals the Echo Look.
  • June 2017 – Amazon reveals the Echo Show.
  • 2017 – Apple announces that its smart speaker, the HomePod, will be available at the end of the year.
  • 2018 – Orange, teaming up with Deutsche Telekom, is due to release Djingo, the first French smart speaker.
  • Reports reveal that Samsung is working on a Bixby-powered smart speaker, and Facebook and Alibaba are also reportedly planning to join the market.

Amazon vs Google

To win over developers, the Internet giants are waging a merciless war. Despite its head start, Alexa is being caught up by Assistant, which offers pretty much equivalent functionality today.

Comparing Google Assistant and Alexa

In terms of geographical coverage, we can see in the map below that Assistant is already ahead.

Geographical coverage – Assistant vs Alexa

However, Amazon remains far ahead in terms of the number of voice applications available on its store: more than 15,000 vs 378.

Number of voice applications – Amazon Alexa vs Google Home

Finally, in terms of product range, Amazon is in front with a solid base of innovative products like the Echo Show and Echo Look. Interestingly, these last two devices are more voice-first than voice-only, opening up new UX opportunities.

Accelerators and brakes

The rise of smart homes, along with companies wanting to improve consumer experience and convenience, is among the major factors driving the rise of smart speakers. Indeed, today smart speakers are a lot more than a gadget for amusement, and they can do a lot more than order a Hawaiian pizza! Amazon Alexa, for example, has official skills from the banking, tourism and connected home sectors. Privacy concerns, owing to the fact that the devices are connected to the internet and can store voice data, as well as connectivity range and compatibility, are all potential brakes on this otherwise fast-growing market.

This is just the beginning

Voice is one of our primary and most natural methods of communication. Now, thanks to technological advancements, it has become a major interface, transforming how we interact with technology. Touchscreens represented the last major shift in the way humans interact with machines; however, the leap to vocal interactions with machines is far more significant, particularly thanks to all the possibilities opened up by third party applications. With an increasing number of players announcing their intentions of joining the smart speaker race, it is clear that this market isn’t going to slow down anytime soon, and it will be fascinating to watch everything unfold. Smartly.AI has been an advocate of vocal technology since 2012, and we are delighted to see its mainstream adoption, which encourages us to work even harder to accompany companies in this voice-first revolution.

👀 Are smart speakers putting your privacy at risk?

Voice is becoming a primary interface. In our home appliances, cars, mobile apps… voice is everywhere. We can turn off the lights, order takeout, buy our weekly groceries or listen to our favorite album, all by using one of the most natural interfaces of all: our voice! This is made possible by smart speakers such as Amazon Echo and Google Home! The convenience and fun these devices can bring is boundless. However, smart speakers and privacy are a hot topic at the moment… just how safe is it to sit these unassuming devices on our bedside table or in our living room, listening to our every word?

What are smart speakers?

Voice recognition technology, like Apple’s Siri, has been around for a while. However, smart speakers such as Amazon’s Echo and Google’s Google Home are game changers. These speakers want to be your virtual assistant and transform the way you interact with your home, other devices, even your favorite brands. Based on voice-activated artificial intelligence, smart speakers can be connected to third party Internet of Things devices, such as your thermostat or car doors, enabling you to order and control things using your voice!

Smart speakers are equipped with a web-connected microphone that is constantly listening for its trigger word. When a user activates a smart speaker to make a request, the device sends a recorded or streamed audio clip of the command to a server, where the request is processed and a response is formulated. The audio clips are stored remotely, and with both Amazon’s and Google’s devices you can review and delete them online. However, it is not clear whether the data stays on servers after being deleted from the account. Furthermore, at the moment devices only record requests; however, as they advance and we are able to do more with them, such as dictating emails, where will this data be stored?

Your voice is only cloud-processed if you say a specific trigger word
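As a rough illustration of that principle, here is a conceptual TypeScript sketch of a wake-word gate. It is not the actual Echo or Home firmware, just the general idea: audio captured before the trigger word never leaves the device.

// Conceptual sketch only (not actual Echo or Home firmware): the wake word is
// spotted locally, and only the audio that follows it is sent to the cloud.
type AudioFrame = { samples: number[]; containsWakeWord?: boolean };

function spottedLocally(frame: AudioFrame): boolean {
  // Placeholder for on-device keyword spotting ("Alexa", "OK Google", ...).
  return frame.containsWakeWord === true;
}

function selectFramesToUpload(frames: AudioFrame[]): AudioFrame[] {
  const toCloud: AudioFrame[] = [];
  let awake = false;
  for (const frame of frames) {
    if (!awake) {
      awake = spottedLocally(frame); // frames before the wake word never leave the device
      continue;
    }
    toCloud.push(frame);             // only the spoken command is uploaded for processing
  }
  return toCloud;
}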

Your privacy at risk?

So, can hackers exploit the backdoor coding of these devices and listen to what you’re saying? Well, nothing is impossible, but both Google and Amazon have taken the necessary precautions to stop wiretapping. Furthermore, the audio file that is sent to their data centers is encrypted, meaning that even if your network were compromised, it is unlikely that smart speakers could be used as listening devices. Someone getting hold of your Amazon or Google password and seeing your interactions is the biggest risk, so make sure you use a strong password; you could even consider two-factor authentication!

What can you do?

If the thought of the smart speaker being able to listen in at any moment makes you uneasy, you can mute it manually or change your account settings to make your device even more secure, such as password-protecting the purchase options available with the speaker or making the device play an audible tone when it is active and recording. You can also log in to your Amazon or Google account and delete your voice history (either individually or in bulk). To do this for your Google device, head over to myactivity.google.com, click the three vertical dots in the “My Activity” bar, and hit “Delete activity by” in the drop-down menu. Click the “All Products” drop-down menu, choose “Voice & Audio,” and click delete. For Amazon’s speaker, go to amazon.com/myx, click the “Your Devices” tab, select your Alexa device, and click “Manage voice recordings.” A pop-up message will appear, and all you need to do is click “Delete”. However, please note that deleting your history on your smart speaker may affect the personalisation of your experience. Check out this handy screencast for further instructions on deleting your Amazon Alexa account history.

Developers could also use privacy-by-design assistants, such as Snips. However, their use may be limited, as these kinds of assistants have no internet connection.

The privacy / convenience tradeoff

At the rate the smart speaker and IoT industries are evolving, it looks like they are going to become more and more present in our daily lives; therefore, it is essential to understand how they work and what you can do to prevent them from breaching your privacy. In conclusion, yes, theoretically smart speakers could pose a threat to privacy. However, they are not terribly intrusive, as they only record when woken by a trigger word, and the likelihood of them picking up a conversation they aren’t supposed to, and then someone intercepting it, is very slight. Google, Amazon and other sites have been logging our web activity for years; now it is starting to happen with voice snippets. In the pursuit of convenience, privacy is sometimes sacrificed, and in this particular trade-off, convenience comes out on top for us!

🔔Toward a fully Context-aware Conversational Agent

I was recently asked by my friend Bret Kinsella from voicebot.ai for my predictions on AI and Voice. You can find my two cents in the post 2017 Predictions From Voice-first Industry Leaders.

In this contribution, I mentioned the concept of speech metadata that I want to detail with you here.

As a voice app developer, when you have to deal with voice inputs coming from an Amazon Echo or a Google Home, the best you can get today is the transcription of the text pronounced by the user.

While it’s cool to finally have access to efficient speech-to-text engines, it’s a bit sad that so much valuable information is lost in the process!

The reality of a conversational input is much more than just a sequence of words. It’s also about:

  • the people — is it John or Emma speaking?
  • the emotions — is Emma happy? Angry? Excited? Tired? Laughing?
  • the environment — is she walking on a beach or stuck in a traffic jam?
  • local sounds — a door slam? A fire alarm? Some birds tweeting?

Imagine now the possibilities, the intelligence of the conversations, if we could have access to all this information: huge!
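For illustration only, here is a minimal TypeScript sketch of what such an enriched input could look like. None of these fields exist in today’s Alexa or Google Home payloads; the names are invented.

// Hypothetical enriched voice input: the transcript plus the speech metadata
// that today's speech-to-text layer throws away. None of these fields exist
// in current Alexa or Google Home request payloads.
interface SpeechInput {
  transcript: string;                       // what you already get today
  speaker?: { id: string; name?: string };  // is it John or Emma speaking?
  emotion?: "happy" | "angry" | "excited" | "tired" | "laughing";
  environment?: "home" | "car" | "beach" | "traffic_jam";
  sounds?: string[];                        // e.g. ["door_slam", "fire_alarm", "birds"]
}

// A skill could then adapt its answer to the context, not just to the words.
function greet(input: SpeechInput): string {
  const name = input.speaker?.name ?? "there";
  if (input.emotion === "tired") return `Hi ${name}, you sound tired, I'll keep it short!`;
  return `Hi ${name}!`;
}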

But we could go even further.

It’s a known fact in communication that while interacting with someone, non-verbal communication is as important as verbal communication.

So why are we sticking to the verbal side of the conversation while interacting with voice apps?

Speech metadata is all about this non-verbal information, which is, in my opinion, the submerged part of the iceberg and thus the more interesting part to explore!

A good example of speech metadata is the combination of vision and voice processing in the movie Her.

With the addition of the camera, new conversations can happen, such as discussing the beauty of a sunset, the origin of an artwork or the composition of a chocolate bar!

Asteria is one of the many startups starting to offer this kind of rich interaction.

I think this is the way to go, and that a tremendous number of innovative apps will be unleashed by the availability of conversational metadata.

In particular, I hope Amazon, Google & Microsoft will release some of this data in 2017 so that we developers can work on fully context-aware conversational agents.

🔊Introducing Audicons™

The way we interact with our digital world will be completely changed by the rise of voice assistants such as Alexa or Assistant.

We created Smartly.AI to make this transition easier for developers while pushing the horizons of Conversational AI.

The Problem
Currently, if you want to build a rich message for your bot, you can use a language called SSML to mix voice synthesis and audio sounds.
With SSML you can do pretty amazing things (change the pitch and tone of the voice, add silences, …). You can check Alexa’s SSML documentation for the details. But the issue is that SSML has a tricky syntax that makes it quite hard for a new developer to master.
As an illustration, let’s see what I have to do to build an answer to this question with SSML:

“Alexa, ask PlaneWatcher: Where is the plane DC-132?”

<speak>
    <audio src="https://server.com/audio/plane.mp3"/>
    <s>Welcome to Plane Watcher,</s>
    <audio src="https://server.com/audio/sad.mp3"/>
    <s>The plane DC-132 is currently delayed by 30 minutes!</s>
</speak>

Wait, another XML-like grammar to deal with… 🤔
Come on, this has to be fixed!

Our solution
As we overuse emoticons in our Slack channel, we couldn’t resist trying to transpose this awesome language to the voice world!
After a few experiments, we are happy to present our latest creation:
the Audicons!

✈ Welcome to Plane Watcher ☹ The plane DC-132 is currently delayed by 30 minutes!

Audicons are a set of standardized audio files that can be easily recognized and associated with specific meanings. Audicons will soon be open sourced so you can reuse them in your own projects. Stay tuned 😀
In most cases, we think Audicons can replace SSML.
Audicons have the potential to evolve into a standardized audio set used in ALL voice interfaces.
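To give a rough idea of how an Audicon-to-SSML generator could work, here is a hypothetical TypeScript sketch. It is not our actual implementation, and the audio URLs are placeholders.

// Hypothetical mapping from Audicon symbols to hosted audio files.
// The URLs are placeholders, not the real Audicon assets.
const AUDICONS: Record<string, string> = {
  "✈": "https://example.com/audicons/plane.mp3",
  "☹": "https://example.com/audicons/sad.mp3",
  "😃": "https://example.com/audicons/happy.mp3",
  "☀": "https://example.com/audicons/sunny.mp3",
};

// Turns "✈ Welcome to Plane Watcher ☹ The plane DC-132 is delayed!" into SSML
// with <audio> tags, so the developer never has to write SSML by hand.
function audiconsToSsml(message: string): string {
  const body = Array.from(message) // iterate by code points so emoji are kept intact
    .map((ch) => (AUDICONS[ch] ? `<audio src="${AUDICONS[ch]}"/>` : ch))
    .join("");
  return `<speak>${body}</speak>`;
}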

Some short examples you may want to create for weather forecasts:

😃 Tomorrow is gonna be sunny ☀🕶!
😩Tomorrow is gonna be rainy ☔⛈!

Hear our first Audicons in the demo below.

Cool, isn’t it? 😃
Which ones do you prefer?

You can already use Audicons in your Alexa skill if you build it with Smartly.AI but we plan to open source them soon along with our SSML generator.

Now it’s up to you to make your beloved Alexa more expressive!

💪🏻 Are you ready to hire an AI?

Let’s face it – the super-intelligent AI takeover that many are fearing is not for today.

We may all lose against Watson at Jeopardy, and AlphaGo is the champion when it comes to Go, but… those cool marketing campaigns are far from the holy grail of so-called General AI.

According to the most authoritative voices in the space,
the singularity will probably occur at some point in the 2040s.
Until then, I can’t imagine having a smart, meaningful and pleasant conversation with Siri, Alexa or Cortana for more than 5 minutes.
When it comes to open conversations, there is no match to humans.

 Human > general AI

Things get less contrasted when we narrow the conversation to a specific topic. A specialized AI can be much better at managing user requests because it has been designed for a unique purpose.
The Turing test is easier to pass for those AIs.
To illustrate this, you can try Amy, the virtual assistant created by x.ai to perform a single task: scheduling meetings for you. She does so by email: demo here.
Amy is so good at doing this that most people think she is a real assistant.
When it comes to narrowed conversations, AI has the advantage of handling big data volumes, while humans are more accurate. 1-1 here.

Human ~  specialized AI

How is Amy doing such a great job?
Well, as explained here, a key part of the process relies on supervised machine learning. AI trainers teach Amy how humans express time, locations, contact names… Amy then uses this knowledge to work better.
It’s a virtuous circle. 🙂

Facebook M is relying even more on humans to teach the AI how to complete tasks. “M can purchase items, get gifts delivered to your loved ones, book restaurants, make travel arrangements, appointments and way more.”

In a recent project at Smartly.ai, we tried this “hybrid AI” approach.
The results were stunning – the AI was able to manage 80% of the requests!
While the AI successfully dealt with the simple questions,
the operator had more time to engage in a qualitative way with the customers who had complex requests.
Our AI excelled at narrow and repetitive requests; humans excelled at complex and particular ones.
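Here is a minimal sketch, in TypeScript, of how such hybrid routing could work. The threshold and type names are illustrative, not our production code.

// Hypothetical hybrid routing: the AI answers when it is confident enough,
// otherwise the request is escalated to a human operator.
interface NluResult {
  intent: string;
  confidence: number; // 0..1, as returned by the NLU engine
  answer?: string;    // automated answer for well-known intents
}

const CONFIDENCE_THRESHOLD = 0.8; // illustrative value, tuned per project

function route(result: NluResult, escalate: (intent: string) => string): string {
  if (result.confidence >= CONFIDENCE_THRESHOLD && result.answer) {
    return result.answer;           // simple, repetitive requests stay automated
  }
  return escalate(result.intent);   // complex or ambiguous ones go to the operator
}

// Usage:
// route({ intent: "refund_dispute", confidence: 0.42 }, (i) => `Routing "${i}" to an agent…`);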

The magic of hybrid AI is that it just works and scales at the same time!
Peter Thiel’s Palantir is another powerful demonstration of the power of hybrid AI in solving big challenges of today’s world: fraud and terrorism.

So, basically:

Human + specialized AI > Human

At Smartly.ai,
we are committed to empowering humans with AI assistants.
We have awesome demos;
book yours now by dropping us an email! 😉

 

 

 

👍 Congratulations on your chatbot, Mr President!

Yesterday,
the White House released a brand new chatbot. 🙂

Why?

One remarkable habit of @POTUS has been reading 10 letters a day since he was elected.
This allows him to take the pulse of the nation from the inside.
I did the math, and if that is true, it’s quite an impressive number of letters:

As of today, Obama has been President for 7 years and 204 days.
(7*365 + 204) * 10 = 27,590… That’s a lot of letters!

But wait, who writes letters anymore?
Are those letters representative of generations X, Y and Z?
They are probably more used to email, SMS or Facebook Messenger.

So, according to the White House, 2016 is the year of Messenger!
With 60B daily messages and 900M users worldwide, it’s probably a safe bet.

How?

Now, let’s see how the bot experience is delivered.


The experience is focused on getting your message to Obama, getting your name and email address… and that’s it until Mr. President decides to answer you. 🙂

The purpose is simple, and the edge cases are well managed.
At the end, you get an emoji and a cool video.

Still, we may regret that the bot doesn’t show any kind of intelligence.
In fact, you could have sent your message to the President 10 times faster using the contact form…

It might have been more fun if the bot had been an automated version of Obama! Some gamification around his job, or even some interactive polls on his next actions, travels, outfits…

You can try this bot by yourself here.
The operation is also described on the White House website, here.

We hope to see more bots used by political figures, but they should be aware that a poorly designed bot will inevitably flop.

And if President Hollande needs a bot,
we’ll be happy to build one for him. 😉

 

 

🍏 Testing out the Siri SDK

 

Finally, it is live!
Last week at the WWDC, Apple released the new Siri SDK.

At VocalApps, we were dying to try it out and find out the pros and cons of this new Apple feature.

The following video demonstrates how we successfully created an iOS app that can be launched directly within Siri.

http://www.youtube.com/watch?v=MJUoWJIfUAM

Although this first version is limited to a predefined list of domains, it still allows developers to create some interesting use cases:

  • starting audio or video calls in your VoIP app
  • sending or searching text messages in your messaging app
  • searching pictures and displaying slideshows in your photo app
  • sending or requesting a payment in your payment app
  • starting, pausing, ending a workout in your sports app
  • listing and booking rides in your ride-sharing app
  • controlling your car music if your app is CarPlay-compatible


The SDK was designed so that Siri listens to the user, tries to understand what they mean, and, if all goes well, transfers the user’s request to your app.
Then you can engage in a conversation, display some custom data and process the request with your own web services.

This is really nice, since it supports out of the box all of Siri’s languages and all the stuff Siri knows about you (where you are, who your sister is, your name…).

For instance, if you want to send money to Sarah using your VocalApps app, you just have to tell Siri:

“Hey Siri, send $10 to Sarah using VocalApps.”

Siri understands you want to send money, that the amount is $10, that the recipient bears the name “Sarah” and that the app you want to use is “VocalApps.” So it calls a sendPayment method in your app with all these arguments.
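Conceptually, the hand-off looks like the sketch below. It is written in TypeScript for consistency with the other examples on this blog, and the type names are invented; a real SiriKit integration is an iOS app extension written in Swift or Objective-C.

// Invented types for illustration: a resolved intent as Siri might hand it off.
interface SendPaymentIntent {
  kind: "sendPayment";
  amount: { value: number; currency: string };
  payee: string;
  appName: string;
}

// "Hey Siri, send $10 to Sarah using VocalApps."
const parsedBySiri: SendPaymentIntent = {
  kind: "sendPayment",
  amount: { value: 10, currency: "USD" },
  payee: "Sarah",
  appName: "VocalApps",
};

// The app only implements the business logic; the parsing was done by Siri.
function sendPayment(intent: SendPaymentIntent): string {
  // call your own payment web service here
  return `Sent ${intent.amount.value} ${intent.amount.currency} to ${intent.payee}.`;
}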

Currently, Siri is included in iPhone 4S, iPhone 5, iPhone 5C, iPhone 5S, iPhone 6, iPhone 6 Plus, iPhone 6s, iPhone 6s Plus, iPhone SE, 5th generation iPod Touch, 6th generation iPod Touch, 3rd generation iPad, 4th generation iPad, iPad Air, iPad Air 2, all iPad Minis, iPad Pro, Apple Watch, and Apple TV.

It’s gigantic, it’s the future and it’s only the beginning.

Do you have a mobile app that you would like to connect to Siri?
If so, we can definitely help you, just start chatting with us 🙂 !

 

 

 

🎵 Alexa Skill update – Blind Test

Hey Alexa fans!

Introducing the Blind Test
Blind Test is a fun game that you can play to test your musical knowledge.
The concept is simple. Listen to the song extract and find the name of the artist!


So, what’s in this updated version? Well…

New songs 
Hundreds of new tracks to discover!
How many will you recognize?

New features 
Blind Test calculates your score so you can challenge your friends. 🙂

So if you don’t have it yet on your Alexa,
grab it now and enjoy the music!

📈 A new monitoring tool for Alexa Skills!

Hi there,

Once we published Music Quiz, our first Alexa skill, we wanted to see how it was performing. We quickly discovered that we had to put a logging system in place, and then that navigating through all the data generated by an Alexa skill was a nightmare.

To get more transparency and true actionable insights,
we decided to build a tool that would allow us to:

⇒ know exactly what’s going on between our skills and our users,
⇒ find and fix bugs, and
⇒ enhance the user experience.

After weeks of work, here is the dashboard we have finally built:

 

The Logs section, which allows you to search for specific sessions.

User logs

We are also bringing out specialized analytics for conversational apps.
You can see it as “Google Analytics” for Alexa.



Awesome, but… Can I use it for my skills?
Sure! All you have to do is log in to Alexa Designer and install a small piece of tracker code in your Lambda function.
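For illustration, a tracker typically looks something like the sketch below: the skill’s Lambda handler forwards each request/response pair to an analytics endpoint before replying. The endpoint URL and payload shape are hypothetical, not the actual Smartly.AI tracker code.

// Hypothetical tracker: forward each request/response pair to an analytics
// endpoint, then return the response as usual. The URL and payload shape are
// placeholders, not the actual tracker.
const ANALYTICS_ENDPOINT = "https://example.com/collect";

async function track(request: unknown, response: unknown): Promise<void> {
  try {
    await fetch(ANALYTICS_ENDPOINT, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ request, response }),
    });
  } catch {
    // Never let analytics failures break the skill itself.
  }
}

// Wrap your existing Lambda handler so every exchange is logged.
export const handler = async (event: unknown): Promise<unknown> => {
  const response = await skillLogic(event); // your existing skill code
  await track(event, response);
  return response;
};

// Placeholder for the skill's real logic.
async function skillLogic(event: unknown): Promise<unknown> {
  return { version: "1.0", response: { outputSpeech: { type: "PlainText", text: "Hello!" } } };
}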

Cheers
The Vocal Apps Team 

PS: If you have privacy concerns, contact us so you can have everything installed on your own server.