Back in February I was invited to participate in a pre-beta release of the Amazon Echo SDK. I was under NDA, so I couldn’t share any of my findings here. But that NDA has now expired, and I can share some of the integrations I did with this interesting device.
First of all, I want to comment on the fact that none of the OS-level voice assistants on the market quite gets it right when it comes to interacting with third-party integrations. Let me explain: neither Google Now, Siri, nor Amazon Echo will let you interact with a voice “app” unless you “open” or “start” that app first. For example, to start an app on any of the platforms mentioned above, I have to do the following:
“[Ok Google], [Hey Siri], or [Alexa] open [name of application]”…”close” or “exit” [name of application]
Then I can start interacting with that application. This interaction paradigm belongs to a desktop model, where you are used to opening and closing programs. Furthermore, these actions are not even part of the mobile experience.
My proposed solution to this problem would be for these systems to create an “intent” model, where a user could decide what to do with certain defined utterances. For example:
“[Ok Google], [Hey Siri], or [Alexa] do I have any new mail?”
In this case, the user should have the option to decide which application will be the default handler for “mail”, either through settings or through a first program run.
When you install an app for the first time, the system should ask:
“Would you like to use this app to handle your voice command for mail?”
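To make the proposal concrete, here is a minimal sketch in Java of how such a system-level intent registry might work. Everything here is a hypothetical illustration, not a real Google, Apple, or Amazon API: the user picks a default app per intent (on first run or in settings), and later utterances route to that app without any explicit “open”:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a system-level voice intent registry.
// Class and method names are invented for illustration only.
class IntentRegistry {
    // Maps an intent keyword (e.g. "mail") to the user's default app.
    private final Map<String, String> defaults = new HashMap<>();

    // Called when the user picks a default app for an intent,
    // e.g. on first run or from a settings screen.
    public void setDefaultHandler(String intent, String appName) {
        defaults.put(intent.toLowerCase(), appName);
    }

    // Resolve an utterance like "do I have any new mail?" to the
    // app that should handle it, or null if no default is set.
    public String resolve(String utterance) {
        String normalized = utterance.toLowerCase();
        for (Map.Entry<String, String> e : defaults.entrySet()) {
            if (normalized.contains(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        IntentRegistry registry = new IntentRegistry();
        registry.setDefaultHandler("mail", "MyMailApp");
        System.out.println(registry.resolve("Do I have any new mail?"));
    }
}
```

A real implementation would need far smarter utterance matching (and conflict handling between apps), but the user-facing idea is just this: one question at install time, then no more “open [name of application]”.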
Voice as the next user interface
Voice recognition and natural language processing (NLP) algorithms have advanced dramatically. These systems are finally getting ready for prime time. The use cases are limited only by our futuristic view of interacting with our systems using just our voice.
This is where the Amazon Echo shines. The idea of picking up my phone and commanding it with my voice feels unnatural to me. The Amazon Echo just sits there on my desk, always ready for my commands. One could argue that Google Now and Siri could do the same, but they lack the Echo’s rich sound presence and visual cues (the RGB ring around the top), which make for a better experience.
Demos
Without further ado, here are two demos of service integrations I did with the Echo. I used Temboo libraries for the Facebook, Twitter, and Uber integrations. For IMAP mail, iCal, and Philips Hue, I created my own. All of this, of course, was done in Java.
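To give a flavor of what the Hue side involves: the Philips Hue bridge exposes a simple local REST API, where changing a light boils down to an HTTP PUT of a small JSON body to the bridge. Here is a minimal sketch; the bridge IP, user key, and light number are hypothetical placeholders (a real integration first registers a user with the bridge and discovers its address):

```java
// Minimal sketch of building a Philips Hue bridge REST call.
// Bridge IP, user key, and light id are hypothetical placeholders.
class HueSketch {
    // The Hue bridge expects a PUT to /api/<user>/lights/<id>/state
    public static String stateUrl(String bridgeIp, String userKey, int lightId) {
        return "http://" + bridgeIp + "/api/" + userKey
                + "/lights/" + lightId + "/state";
    }

    // JSON body for turning a light on or off, with brightness 0-254.
    public static String stateBody(boolean on, int brightness) {
        return "{\"on\": " + on + ", \"bri\": " + brightness + "}";
    }

    public static void main(String[] args) {
        System.out.println(stateUrl("192.168.1.2", "myUserKey", 1));
        System.out.println(stateBody(true, 200));
        // Actually sending the request is a plain HTTP PUT, e.g. via
        // java.net.HttpURLConnection; omitted here to keep the sketch
        // self-contained and offline.
    }
}
```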
Office Automation
Internet of Things demo
So would you get an Amazon Echo?
This is awesome. I can’t wait until the API is public.
So, where will the Echo “apps” reside once they start being developed?
Right now I’ve got a server running with the various modules to get Echo to interact with the Hue lights, Gcal, Evernote, Uber, etc. (see Zach Feldman’s awesome repo at alexaho.me). The result is very close to what you’ve developed (but without the benefit of Alexa being able to respond appropriately).
Is this still going to be the basic setup? I noticed a call for Alexa to open up “control center.”
Great work!
Hi Steve,
To answer your first question: the app is basically a REST service that the Amazon servers contact whenever you “open” your application. You have to host your own code 🙁 It’s also a bit of a pain since it needs to be SSL-enabled (for the beta you could submit a self-signed cert to them).
For your second question: the “control center” was the application name. Once I was inside the application, I was able to use my pre-defined utterances for that app. Then, when I’m done, I can quit, or it will time out after a while.
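For readers curious what the service actually exchanges: the endpoint receives a JSON intent request from Amazon and replies with JSON containing the speech for Alexa to render. A stripped-down response builder is sketched below; the field names follow the general shape of the public request/response format, but treat them as illustrative rather than an authoritative description of the beta API:

```java
// Illustrative sketch of building a skill's JSON response. The field
// names approximate the public "output speech" response shape and
// should not be read as the exact beta API contract.
class AlexaResponseSketch {
    public static String plainTextResponse(String text, boolean endSession) {
        return "{\"version\": \"1.0\", \"response\": {"
                + "\"outputSpeech\": {\"type\": \"PlainText\", \"text\": \""
                + text + "\"}, "
                + "\"shouldEndSession\": " + endSession + "}}";
    }

    public static void main(String[] args) {
        // e.g. a hypothetical reply to "do I have any new mail?"
        System.out.println(plainTextResponse("You have 3 new messages.", false));
    }
}
```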
Thanks for asking!
Thanks!
I suspected that’s how it was going to work. Any idea when the API will go public? (Are you allowed to say?)
Best,
Steve
Impressive work, Noel!
I’m still not entirely sold on voice interfaces, though. When I walk into a room I can flip on the lights faster with my hand than I can say “turn on the lights.” Using my hand to flip the switch keeps my mind free to think about other things and lets me do both at once without disturbing others. I also often find it faster and easier to scan information on a screen than to listen while a robot slowly reads each sentence aloud.
That said, I do appreciate hearing spoken information when I’m driving. And I am using voice dictation to write this on my iPad. (My wife and daughter are out shopping so I am only disturbing the cats.)
So for me the goal is multi modal interfaces that let me use my eyes and hands and voice and ears in whatever combination is most convenient at the moment. Something like Echo might be a part of that future as long as it is aware of and plays nicely with other forms of interaction.
I am most curious about what it was really like using Echo on a day-to-day basis.
Did your use of Echo change when other people were in the room with you? Did you refrain from using it when your wife or kids were asleep or trying to work in the next room? Did you still use physical light switches (and if so, when)? Did everyone in your family start using Echo? Did Echo ever annoy you?
Thanks again for a great post!
John
Nice demos Noel 🙂
I’m also experimenting with the Alexa SDK and from what little I’ve done so far, I agree – I think this is a very natural interface for voice; definitely more fluid than fiddling with your phone.
I’ve a couple demo videos of my work so far:
https://jchivers.wistia.com/medias/kxx5ygwhu1
https://jchivers.wistia.com/medias/47nbym1ni8
What are you working on next?!
Since you’re in the Echo SDK private beta, you should let them know how much you dislike having to open up an app. I agree with you: having to open up an app is not the way to go. But I can see why they did this, since it would be hard to tell which app should be opened based on the utterance alone. You might have apps that have the same or duplicate utterances. Although this could be fixed by having the system check for duplicate utterances; when a duplicate is detected, either the Echo/Amazon or the app developer could recommend other utterances for the app to respond to.
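The duplicate-utterance check suggested above could be sketched roughly like this (app and utterance names are made up; this illustrates the idea, not any actual Echo mechanism):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the suggested duplicate-utterance check: when an app
// registers its utterances, the system flags any that are already
// claimed by another app, so alternatives can be recommended.
class UtteranceConflictChecker {
    // Maps a normalized utterance to the app that owns it.
    private final Map<String, String> claimed = new HashMap<>();

    // Registers non-conflicting utterances to the app and returns
    // the list of utterances that collide with other apps.
    public List<String> register(String app, List<String> utterances) {
        List<String> conflicts = new ArrayList<>();
        for (String u : utterances) {
            String normalized = u.toLowerCase().trim();
            String owner = claimed.get(normalized);
            if (owner != null && !owner.equals(app)) {
                conflicts.add(u);
            } else {
                claimed.put(normalized, app);
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        UtteranceConflictChecker checker = new UtteranceConflictChecker();
        checker.register("MailApp", List.of("check my mail"));
        // "check my mail" is already claimed, so it comes back flagged.
        System.out.println(checker.register("OtherApp", List.of("check my mail")));
    }
}
```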
@John: We can talk about experiences w the Echo. Noel and I each have had one for a few months. The jokes get a little old 🙂
To your point about light switches, I see that as a single solution, which on its own isn’t valuable. See WeMo or Hue bulbs.
However, when the house is automated and has a single hub for control, services like IFTTT can trigger actions based on conditions, which *should* leave your mind free to do even more things.
That’s the big picture anyway, YMMV.
Good post.
I’d agree with @John about multimodal solutions. Voice is great for fast entry of data into the Cloud, but we will always need other options for security and accessibility reasons as well as broader context of use. For example, a sales rep standing in line in Starbucks might be wiser to enter that killer sales lead into the cloud using a soft keyboard than speaking out loud.
Voice capability is getting a lot better. Google Voice for example can make sense of my Irish name and pronounce it correctly. Flattering and scary at the same time (most people even in Ireland cannot).
Re: Echo – whither the enterprise use cases, though? Perhaps integration with supply chain, customer experience, a totally immersive IoT Cloud opportunity?
Do you see any places in the SDK for the Echo to wake up and act upon receiving a signal or message of any kind?
Presently it does not function until the wake word is spoken, with the exception of an alarm/timer.
How about waking up to give me a reminder when my phone comes into a certain proximity? Or what if I wanted to flag super important senders in my email, and when mail arrives from them, let me know (again, only if I’m nearby)?
The proximity thing is something that needs to be solved, obviously. Bluetooth, GPS, integration with Nest devices, etc. are possibilities.
Thanks for your posts, very interesting stuff!
@chad, I did suggest that to the Amazon Echo team during the private trial. I hope it becomes a feature in the future. I even suggested adding the ability to use the color ring as an indicator (think of the new-message light on an answering machine).
It would be amazing to trigger interactions with Bluetooth proximity as well.
They keep adding features so I don’t see why not in the future.
Thanks for the article. It saved me the cost of an Echo. When I contemplate that much conversation to accomplish such simple tasks, I’m unimpressed. Think about it: I can flip my desk light switch while scanning my email. Further, what good was the exchange re: Uber? Were you ready to depart? What if the delay had been beyond your check-in?
On the plus side: if you have a 5th grader, I think these gadgets might actually bring some conversation back to the dinner table. For instance, letting Alexa settle an argument about the capital of Louisiana might engage a student who would otherwise be unavailable.