By Oleksii Rudenko December 7, 2014 5:00 PM
Ember.js and the Web Speech API: Example of a Speech Recognition Component

On Friday I stumbled upon a good article about the Web Speech API and decided to play with this API, as I had never used it before.

The Web Speech API has Unofficial Draft status and is implemented only in Chrome and Safari. The API consists of two parts: speech recognition and speech synthesis.

Speech Recognition (speech-to-text)

The speech recognition service is provided by the SpeechRecognition interface. In Google Chrome, you need to use the prefixed version, webkitSpeechRecognition. Basic usage of the API is quite trivial:


// creating an instance of the SpeechRecognition interface
var recognition = new webkitSpeechRecognition();
recognition.onresult = function(event) {
  // event.results contains the results of recognition;
  // each result has the transcript property,
  // which is a textual representation of the speech
  if (event.results.length > 0) {
    alert(event.results[0][0].transcript);
  }
};
// start listening to the microphone
recognition.start();

The methods of the webkitSpeechRecognition interface are:

  • start() - starts the recognition
  • stop() - stops the recognition
  • abort() - aborts the recognition immediately

Three important handlers:

  • onresult - the recognition API passes the results (both interim and final) to this handler. Final results have the isFinal attribute set to true.
  • onerror - is called when an error happens.
  • onend - is called when the recognition ends.
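Putting the three handlers together might look like the sketch below. The wiring is pulled out into a helper function of my own (wireHandlers and the log callback are illustrative, not part of the Web Speech API):

```javascript
// Sketch: wiring the three handlers onto a recognition instance.
// wireHandlers and log are illustrative helpers, not part of the API.
function wireHandlers(recognition, log) {
  recognition.onresult = function(event) {
    // event.results is a list of SpeechRecognitionResult objects;
    // final results have isFinal === true
    var last = event.results[event.results.length - 1];
    if (last.isFinal) {
      log('final: ' + last[0].transcript);
    } else {
      log('interim: ' + last[0].transcript);
    }
  };
  recognition.onerror = function(event) {
    log('error: ' + event.error);
  };
  recognition.onend = function() {
    log('end');
  };
}

// In the browser:
// wireHandlers(new webkitSpeechRecognition(), console.log);
```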

The webkitSpeechRecognition interface has several interesting parameters:

  • lang - a string, the language used for recognition. If not set, the language of the HTML document root is used.
  • continuous - a boolean that defines how many results are provided. If continuous is false, only one result is provided; to get more results, the recognition must be started again.
  • interimResults - a boolean that defines whether interim results are returned. Interim results are not final and may not be accurate.

More event handlers and methods are described in the specification.

Speech Synthesis (text-to-speech)

Another half of the API provides the text-to-speech feature. It works like this:


speechSynthesis.speak(new SpeechSynthesisUtterance('Javascript for Ninja'));
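Beyond the text itself, an utterance exposes a few tunable properties such as lang, rate, pitch and volume. A sketch with illustrative values (configureUtterance is my own helper, not part of the API):

```javascript
// Sketch: configuring a speech synthesis utterance. The values below
// are illustrative; configureUtterance is not part of the API.
function configureUtterance(utterance) {
  utterance.lang = 'en-US'; // language of the spoken text
  utterance.rate = 1.0;     // speaking speed, 0.1 to 10 (1 is normal)
  utterance.pitch = 1.0;    // voice pitch, 0 to 2 (1 is normal)
  utterance.volume = 1.0;   // 0 to 1
  return utterance;
}

// In the browser:
if (typeof speechSynthesis !== 'undefined') {
  speechSynthesis.speak(
    configureUtterance(new SpeechSynthesisUtterance('Javascript for Ninja'))
  );
}
```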

Ember.js Speech Recognition Component

I think speech recognition should be disabled by default: it requires the user's approval and, if activated in the background, may look suspicious to the user.

Still, I would like to give users the option of using speech recognition if they want it. I see this as a widget with a control to activate recognition.

First, I will write a simple app:


App = Ember.Application.create();
App.Router.map(function() {
  // put your routes here
});
App.IndexController = Ember.ObjectController.extend({
  speakBack: '', // a property to hold the recognized speech
  actions: {
    // this will speak the recognized text
    // using another part of the Web Speech API
    // https://dvcs.w3.org/hg/speech-api/raw-file/tip/webspeechapi.html
    speak: function() {
      var utterance = new SpeechSynthesisUtterance(this.get('speakBack'));
      utterance.onerror = function() {
        console.log(arguments);
      };
      speechSynthesis.speak(utterance);
    },
    onResult: function(result) {
      alert('Search this: ' + result);
      this.set('speakBack', result);
    }
  }
});

The HTML for the app:


<!DOCTYPE html>
<html>
    <body>
        <script type="text/x-handlebars">
            <h2>Voice Control Component Demo</h2>
            {{outlet}}
        </script>
        <script type="text/x-handlebars" data-template-name="index">
            <div class="container-fluid">
              <div class="row">
                <div class="col-xs-6 col-md-4">
                  {{voice-control onResult="onResult"}}
                </div>
              </div>
              <br>
              {{#if speakBack}}
                <div class="row">
                  <div class="col-xs-6 col-md-4">
                    <button class="btn btn-primary" {{action 'speak'}}>
                        Speak Back
                    </button>
                  </div>
                </div>
              {{/if}}
            </div>
        </script>
    </body>
</html>

Now the implementation of the component to control the Web Speech API:


/**
* VoiceControlComponent uses Web Speech API to recognize speech
* Usage:
*   {{voice-control onResult="onResult"}}
*/
App.VoiceControlComponent = Ember.Component.extend({
  enabled: false, // whether recognition is enabled
  speechRecognition: null, // the instance of webkitSpeechRecognition
  language: 'en', // language to recognise
  startRecognition: function() {
    // prefixed SpeechRecognition object because it only works in Chrome
    var speechRecognition = new webkitSpeechRecognition();
    // not continuous to avoid delays
    speechRecognition.continuous = false;
    // only the final result
    speechRecognition.interimResults = false;
    // the recognition language
    speechRecognition.lang = this.get('language');
    // binding various handlers
    speechRecognition.onresult = Ember.run.bind(this, this.onRecognitionResult);
    speechRecognition.onerror = Ember.run.bind(this, this.onRecognitionError);
    speechRecognition.onend = Ember.run.bind(this, this.onRecognitionEnd);
    // starting the recognition
    speechRecognition.start();
  },
  onRecognitionEnd: function() {
    this.set('enabled', false);
  },
  onRecognitionError: function() {
    alert('Recognition error');
  },
  /**
  * e is a SpeechRecognitionEvent
  * https://dvcs.w3.org/hg/speech-api/raw-file/tip/webspeechapi.html#speechreco-event
  */
  onRecognitionResult: function(e) {
    var result = '';
    var resultNo = 0;
    var alternativeNo = 0;
    // we get the first alternative of the first result
    result = e.results[resultNo][alternativeNo].transcript;
    // report the result to the outside
    this.sendAction('onResult', result);
  },
  onEnabledChange: function() {
    if (this.get('enabled')) {
      this.startRecognition();
    }
  }.observes('enabled'),
  actions: {
    toggle: function() {
      this.toggleProperty('enabled');
    }
  }
});

And a simple template for the component:


<!-- VoiceControlComponent's template -->
<script type="text/x-handlebars" data-template-name="components/voice-control">
    <button class="btn btn-primary" {{action 'toggle'}}>
            <i {{bind-attr class=":fa :fa-lg enabled:fa-microphone:fa-microphone-slash"}}></i>
            {{#if enabled}}
                Please speak!
            {{else}}
                Click to enable voice control!
            {{/if}}
    </button>
</script>

The result looks like this:

[Live demo: Voice Control Component Demo]

The full source code is available on Github.

Additional remarks

  • since the Web Speech API is available in WebKit only, it's a good idea to check whether the user's browser supports the API
  • when the website is accessed over HTTP, the browser will not remember the microphone permission, and the user will be asked to allow microphone access every time.
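The first remark can be addressed with a small feature check; a sketch (getSpeechRecognition is my own helper, and it also checks the unprefixed name in case a browser ships it):

```javascript
// Sketch: detect the speech recognition API, prefixed or unprefixed.
// getSpeechRecognition is an illustrative helper, not part of the API.
function getSpeechRecognition(root) {
  return root.SpeechRecognition || root.webkitSpeechRecognition || null;
}

var Ctor = getSpeechRecognition(typeof window !== 'undefined' ? window : {});
if (Ctor) {
  var recognition = new Ctor();
  // ... configure the instance, wire up handlers, call recognition.start()
} else {
  console.log('Speech recognition is not supported in this browser.');
}
```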

Thanks for reading. Please comment and subscribe!