title | author | tags | |
---|---|---|---|
Nuance Brings Pebble The Freedom Of Speech | jonb | | |
In October, we gave developers access to the microphone in the Pebble Time via our new Dictation API, and almost instantly we began seeing awesome projects utilising speech input. Voice recognition is an exciting new method of interaction for Pebble and it has created an opportunity for developers to enhance their existing applications, or create highly engaging new applications designed around the spoken word.
Speech-to-text has been integrated with Pebble by using the Recognizer cloud service from Nuance, a leading provider of voice and language solutions.
The Pebble Dictation Process
The Dictation API has been made incredibly easy for developers to integrate into their watchapps. It’s also intuitive and simple for users to interact with. Here’s an overview of the dictation process:
- The user begins by pressing a designated button or by triggering an event within the watchapp to indicate they want to start dictating. The watchapp’s UI should make this obvious.
- The watchapp initiates a dictation session, assigning a callback function to handle the response from the system. The response either succeeds and returns the dictated string, or fails with an error code.
- The system Dictation UI appears and guides the user through recording their voice. The stages in the image below illustrate:
  a. The system prepares its buffers and checks connectivity with the cloud service.
  b. The system begins listening for speech and automatically stops listening when the user finishes talking.
  c. The audio is compressed and sent to the cloud service via the mobile application.
  d. The audio is transcribed by the cloud service and the transcribed text is returned and displayed for the user to accept or reject (this behaviour can be programmatically overridden).
- Once the process has completed, the registered callback function is invoked and the watchapp can deal with the response.
But How Does It Actually Work?
Let’s take a closer look at what’s happening behind the scenes.
- To capture audio, Pebble Time (including Time Steel and Time Round) has a single MEMS microphone. This device produces output at 1 MHz in a PDM format.
- This 1-bit PDM signal needs to be converted into 16-bit PCM data at 16 kHz before it can be compressed.
- Compression is performed using the Speex encoder, which was specifically designed for speech compression. Compression reduces the overall size of the data before it’s transferred via Bluetooth to the mobile application. Speex also has some additional advantages like tuneable quality/compression and recovery from dropped frames.
- The mobile application sends the compressed data to Nuance Recognizer, along with some additional information like the user’s selected language.
- Nuance performs its magic and returns the textual representation of the spoken phrase to the mobile application, which is then automatically passed back to the watchapp.
- The Dictation UI presents the transcribed text back to the user, who can choose to accept or reject it.
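The PDM-to-PCM step in the list above can be illustrated with a deliberately simple boxcar decimator. This is a minimal sketch, not Pebble's firmware implementation: the function name `pdm_to_pcm_boxcar` is hypothetical, and it assumes a decimation ratio of 64 (that is, a 1.024 MHz PDM clock, because 1 MHz does not divide evenly by 16 kHz). A production converter would more likely use a proper low-pass filter stage for better noise rejection.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64 PDM bits per PCM sample (1.024 MHz / 16 kHz). */
#define DECIMATION 64

/* Convert packed 1-bit PDM data into 16-bit PCM by counting the ones in
 * each window of DECIMATION bits. Returns the number of samples written. */
size_t pdm_to_pcm_boxcar(const uint8_t *pdm, size_t pdm_bytes,
                         int16_t *pcm, size_t pcm_capacity) {
  const size_t bytes_per_sample = DECIMATION / 8;
  size_t samples = pdm_bytes / bytes_per_sample;
  if (samples > pcm_capacity) {
    samples = pcm_capacity;
  }

  for (size_t i = 0; i < samples; i++) {
    int ones = 0;
    for (size_t b = 0; b < bytes_per_sample; b++) {
      uint8_t byte = pdm[i * bytes_per_sample + b];
      while (byte != 0) {
        ones += byte & 1;
        byte >>= 1;
      }
    }

    /* Map the pulse density (0..64 ones) onto the signed 16-bit range. */
    int32_t centered = 2 * ones - DECIMATION;  /* -64..64 */
    int32_t scaled = centered * 512;           /* -32768..32768 */
    if (scaled > INT16_MAX) {
      scaled = INT16_MAX;                      /* clamp the all-ones case */
    }
    pcm[i] = (int16_t)scaled;
  }
  return samples;
}
```

At 16 kHz and 16 bits per sample, the resulting PCM stream is 256 kbit/s before compression, which gives a sense of why the audio is compressed before it is sent over Bluetooth.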
About the Dictation API
Behind the scenes there’s a lot going on, but let’s take a look at how minimal the code needs to be in order to use the API.
- Create a static variable as a reference to the dictation session:
    static DictationSession *s_dictation_session;
- Create a callback function to receive the dictation response:
    static void dictation_session_callback(DictationSession *session,
                                           DictationSessionStatus status,
                                           char *transcription, void *context) {
      if(status == DictationSessionStatusSuccess) {
        // The user accepted the transcription
        APP_LOG(APP_LOG_LEVEL_DEBUG, "Transcription:\n\n%s", transcription);
      } else {
        // Dictation failed or was rejected; status holds the error code
        APP_LOG(APP_LOG_LEVEL_DEBUG, "Transcription failed.\n\nError ID:\n%d", (int)status);
      }
    }
- Within a button click or other event, create a DictationSession and start it to begin the process:
    s_dictation_session = dictation_session_create(512, dictation_session_callback, NULL);
    dictation_session_start(s_dictation_session);
- Before your app exits, don’t forget to destroy the session:
    dictation_session_destroy(s_dictation_session);
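In practice the callback usually needs to keep the text around, for example to show it in a text layer. The helper below is a hedged sketch in plain C rather than part of the Dictation API: `copy_transcription` is a hypothetical name, and it assumes the `transcription` pointer should not be relied on after the callback returns, so the text is copied into a buffer the app owns.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Copy a transcription into an app-owned buffer, always leaving the
 * buffer NUL-terminated. Returns true only if the whole string fit;
 * returns false for a NULL input or a truncated copy. */
bool copy_transcription(char *dest, size_t dest_size,
                        const char *transcription) {
  if (dest == NULL || dest_size == 0) {
    return false;
  }
  if (transcription == NULL) {
    dest[0] = '\0';
    return false;
  }
  int written = snprintf(dest, dest_size, "%s", transcription);
  return written >= 0 && (size_t)written < dest_size;
}
```

Inside the success branch of the callback, an app might call `copy_transcription(s_last_text, sizeof(s_last_text), transcription)`, where `s_last_text` is a static `char` array of the app's choosing (also a hypothetical name).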
Voice-enabled Watchapps
Here’s a small sample of the watchapps already available in the Pebble appstore that utilise the Dictation API:
- Voice2Timeline is a handy tool for quickly creating pins on your timeline by using your voice. It already works in 6 different languages. You can leave notes in the past, or even create reminders for the future (e.g. “Don’t forget the milk in 1 hour”).
- Translate (Vox Populi) allows a user to translate short phrases and words into a different language. It uses the Yandex machine translator API which supports more than 60 different languages.
- Checklist is a really simple tool which generates a list of items using your voice. It even allows you to enter multiple items at once, by specifying a comma or period. You can easily mark them as completed by pressing the ‘select’ button on each item.
- Smartwatch Pro (iOS) has been updated to let users control music playback, create reminders and even compose tweets by using their voice.
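Splitting one dictated phrase into several entries, as Checklist is described as doing with commas and periods, can be sketched in a few lines of C. This is an illustration only, not Checklist's actual code: `split_items`, `MAX_ITEMS` and the in-place splitting approach are all assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ITEMS 8  /* hypothetical limit on items per dictation */

/* Split `text` in place on ',' and '.', trimming surrounding spaces.
 * Stores a pointer to each non-empty item in `items` (up to max_items)
 * and returns the number of items stored. */
size_t split_items(char *text, char *items[], size_t max_items) {
  size_t count = 0;
  char *start = text;

  for (char *p = text; ; p++) {
    bool at_end = (*p == '\0');
    if (at_end || *p == ',' || *p == '.') {
      char *end = p;  /* one past the last character of this item */
      while (start < end && *start == ' ') {
        start++;      /* skip leading spaces */
      }
      while (end > start && end[-1] == ' ') {
        end--;        /* drop trailing spaces */
      }
      *end = '\0';    /* terminate the item in place */
      if (end > start && count < max_items) {
        items[count++] = start;
      }
      if (at_end) {
        break;
      }
      start = p + 1;
    }
  }
  return count;
}
```

Given a writable copy of the transcription such as "milk, eggs. bread", the function yields three items. Because it modifies the buffer in place, it should be run on the app's own copy of the text rather than on the pointer passed to the callback.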
Final Thoughts
Why not also check out this video of Andrew Stapleton (Embedded Developer) as he dives deep into the internals of the Pebble Dictation API during his presentation at the Pebble Developer Retreat 2015?
We hope you’ve seen how flexible and easy it is to use the new Dictation API, and perhaps it will inspire you to integrate voice into your own application or watchface. If you create a voice-enabled watchapp, let us know by tweeting @pebbledev.
If you’re looking to find out more about voice integration, check out our developer guide, API documentation and our simple example app. We also have a very friendly and helpful Pebble community on Discord; why not [join us]({{ site.links.discord_invite }})?