Saturday 20 May 2017

Google Speech API gives poor voice recognition with Node.js

I am trying to create a very simple voice recognition program with Node.js. I have hooked up the Google Speech API, and when I send it a correctly recorded .wav file (recorded with Audacity), I get back a transcription and the recognition is very good.

BUT I have issues getting voice recognition "on the fly", i.e. sending the audio stream directly from the microphone to the Google Speech API.

Here is my main function, which records audio and sends it to Google:

function recognize(encoding, sampleRateHertz, languageCode) {
  // Imports the Google Cloud client library and the modules used below
  const Speech = require('@google-cloud/speech');
  const fs = require('fs');
  const Mic = require('node-microphone');

  const request = {
    config: {
      encoding: encoding,
      sampleRateHertz: sampleRateHertz,
      languageCode: languageCode
    },
    interimResults: true // If you want interim results, set this to true
  };

  // Instantiates a client
  const speech = Speech();

  // Create a recognize stream that prints each result as it arrives
  const recognizeStream = speech.createRecognizeStream(request)
    .on('error', console.error)
    .on('data', (data) => process.stdout.write(data.results + ', '));

  // Record 16 kHz, mono, 16-bit audio from the microphone (via SoX on Windows)
  const mic = new Mic({ rate: '16000', channels: '1', debug: true, exitOnSilence: 6, bitwidth: '16' });
  const micStream = mic.startRecording();

  // Send the audio to Google and, at the same time, save a copy to disk
  micStream.pipe(recognizeStream);
  micStream.pipe(fs.createWriteStream('test.wav'));

  // Stop recording after 10 seconds
  setTimeout(() => {
    // logger.info('stopped recording');
    console.log('stopped writing');
    mic.stopRecording();
  }, 10000);

  mic.on('info', (info) => {
    console.log('INFO ' + info);
  });
  mic.on('error', (error) => {
    console.log(error);
  });
}

And here are the command-line options that supply the config values I pass to the function:

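// Command-line options parsed with yargs; the parsed values are passed to recognize()
const argv = require('yargs')
  .options({
    encoding: {
      alias: 'e',
      default: 'LINEAR16',
      global: true,
      requiresArg: true,
      type: 'string'
    },
    sampleRateHertz: {
      alias: 'r',
      default: 16000,
      global: true,
      requiresArg: true,
      type: 'number'
    },
    languageCode: {
      alias: 'l',
      default: 'en-US',
      global: true,
      requiresArg: true,
      type: 'string'
    }
  })
  .argv;

recognize(argv.encoding, argv.sampleRateHertz, argv.languageCode);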
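The microphone settings inside recognize() are hard-coded strings, while the API config comes from these command-line options, so passing a different `-r` value would make the two disagree without any error. A minimal sketch of a consistency check, assuming LINEAR16 implies 16-bit samples and that a mono recording is intended; `findConfigMismatches` is a hypothetical helper, not part of node-microphone or the Google client library:

```javascript
// Hypothetical helper: compares the node-microphone options used in recognize()
// with the Speech API config built from the command-line options.
function findConfigMismatches(micOptions, apiConfig) {
  const problems = [];
  // Mic options are strings ('16000'); the API config uses a number (16000).
  if (Number(micOptions.rate) !== Number(apiConfig.sampleRateHertz)) {
    problems.push(`sample rate: mic ${micOptions.rate} vs API ${apiConfig.sampleRateHertz}`);
  }
  // Assumption: LINEAR16 means 16-bit signed PCM, so the mic bit width must be 16.
  if (apiConfig.encoding === 'LINEAR16' && Number(micOptions.bitwidth) !== 16) {
    problems.push(`bit depth: mic ${micOptions.bitwidth} bits, LINEAR16 expects 16`);
  }
  // Assumption for this sketch: a single-channel recording is intended.
  if (Number(micOptions.channels) !== 1) {
    problems.push(`channels: mic records ${micOptions.channels}, expected 1`);
  }
  return problems;
}

const micOptions = { rate: '16000', channels: '1', bitwidth: '16' };
console.log(findConfigMismatches(micOptions, { encoding: 'LINEAR16', sampleRateHertz: 16000 }));
// → []
console.log(findConfigMismatches(micOptions, { encoding: 'LINEAR16', sampleRateHertz: 44100 }));
// → [ 'sample rate: mic 16000 vs API 44100' ]
```

With the defaults shown here the nominal settings agree, so a check like this only rules out one class of mistake; it cannot tell whether SoX actually delivers the format it was asked for.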

So I use 'node-microphone' for recording; I am on Windows and SoX is installed. The audio is sent to Google and I don't get any errors, but the recognition is VERY bad. I only get transcriptions for very easy words or phrases such as "who", "food" or "call". Mostly, if I speak normally, nothing is returned.
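Since speech at a normal volume mostly returns nothing, it may be worth ruling out a level problem (a recording that is far too quiet, or clipped) before changing the API settings. A minimal sketch that measures peak and RMS level in dBFS, assuming raw 16-bit signed little-endian PCM with no header; `measureLevels` is a hypothetical helper, not part of node-microphone:

```javascript
// Hypothetical helper: peak and RMS level of 16-bit signed little-endian PCM,
// reported in dBFS (0 dBFS = full scale, i.e. 32768).
function measureLevels(pcm) {
  const sampleCount = Math.floor(pcm.length / 2);
  if (sampleCount === 0) return { peakDbfs: -Infinity, rmsDbfs: -Infinity };
  let peak = 0;
  let sumSquares = 0;
  for (let i = 0; i < sampleCount; i++) {
    const s = pcm.readInt16LE(i * 2);
    peak = Math.max(peak, Math.abs(s));
    sumSquares += s * s;
  }
  const rms = Math.sqrt(sumSquares / sampleCount);
  // Silence gives log10(0) = -Infinity, which is reported as -Infinity dBFS.
  const toDbfs = (v) => 20 * Math.log10(v / 32768);
  return { peakDbfs: toDbfs(peak), rmsDbfs: toDbfs(rms) };
}

// Usage: a full-scale square wave (alternating +32767 / -32768).
const square = Buffer.alloc(2 * 1000);
for (let i = 0; i < 1000; i++) square.writeInt16LE(i % 2 ? -32768 : 32767, i * 2);
console.log(measureLevels(square)); // peak at 0 dBFS, RMS just below 0 dBFS
```

Run over the saved recording (after removing any WAV header), levels far below those of the Audacity file that works, or a peak stuck at 0 dBFS, would point at the recording rather than at the request config.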

I have a feeling that something is wrong with the encoding or the recording rate (as if the recording were "too fast" for Google to understand), but I can't see my error.
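The "too fast" suspicion can be checked with simple arithmetic. With the configured format, LINEAR16 at 16000 Hz mono, each second of audio is 16000 samples × 2 bytes × 1 channel = 32,000 bytes, so the 10-second recording should carry roughly 320,000 bytes of sample data (less if recording stops early). A stream that is really 44.1 kHz would deliver about 2.76 times that, and a stereo one twice that. A minimal sketch; `byteRate` and `impliedSampleRate` are hypothetical helper names, and collecting the byte count assumes an extra 'data' listener on `micStream`:

```javascript
// Hypothetical helpers for checking whether a PCM stream matches its config.

// Bytes per second of uncompressed PCM: rate × (bits / 8) × channels.
function byteRate(sampleRateHertz, bitsPerSample, channels) {
  return sampleRateHertz * (bitsPerSample / 8) * channels;
}

// Works back from a measured byte count to the sample rate the data implies,
// assuming the bit depth and channel count really are as configured.
function impliedSampleRate(byteCount, seconds, bitsPerSample, channels) {
  return byteCount / seconds / (bitsPerSample / 8) / channels;
}

// Inside recognize(), the byte count could be collected with an extra listener:
//   let total = 0;
//   micStream.on('data', (chunk) => { total += chunk.length; });
// and logged next to mic.stopRecording().

console.log(byteRate(16000, 16, 1));               // 32000
console.log(byteRate(44100, 16, 2));               // 176400
console.log(impliedSampleRate(882000, 10, 16, 1)); // 44100
```

If the measured count over 10 seconds lands near 882,000 bytes instead of 320,000, for example, the data would be arriving at 44.1 kHz while the API is told 16000 Hz.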

I also added saving to a file. When I open the file and listen to it, it sounds normal. But when I send THIS file for recognition, I get almost nothing back. So something must be wrong with the way the audio stream is recorded.
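Since the saved file plays back normally, one next step is to compare what its header says against the request config (LINEAR16, 16000 Hz, one channel). A minimal sketch using only Node's `Buffer` API, assuming a RIFF/WAVE file; `readWavFormat` is a hypothetical helper. It walks the chunks to find `fmt `, so extra chunks before the format chunk are skipped. If the file does not start with `RIFF`/`WAVE`, it returns null, meaning the file is not a WAV file at all despite its name, which would itself be worth knowing:

```javascript
// Hypothetical helper: returns the audio format stored in a RIFF/WAVE header,
// or null if the buffer does not look like a WAV file.
function readWavFormat(buf) {
  if (buf.length < 12 || buf.toString('ascii', 0, 4) !== 'RIFF' ||
      buf.toString('ascii', 8, 12) !== 'WAVE') {
    return null;
  }
  let offset = 12;
  // Walk the chunks until the 'fmt ' chunk is found.
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    if (id === 'fmt ' && offset + 24 <= buf.length) {
      return {
        audioFormat: buf.readUInt16LE(offset + 8), // 1 = uncompressed PCM
        channels: buf.readUInt16LE(offset + 10),
        sampleRate: buf.readUInt32LE(offset + 12),
        bitsPerSample: buf.readUInt16LE(offset + 22),
      };
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  return null;
}

// Usage on the recorded file:
//   console.log(readWavFormat(require('fs').readFileSync('test.wav')));
// Self-contained demo with a 16 kHz, mono, 16-bit header built in memory:
const header = Buffer.alloc(44);
header.write('RIFF', 0);
header.writeUInt32LE(36, 4);     // RIFF chunk size for an empty data chunk
header.write('WAVE', 8);
header.write('fmt ', 12);
header.writeUInt32LE(16, 16);    // fmt chunk size
header.writeUInt16LE(1, 20);     // PCM
header.writeUInt16LE(1, 22);     // mono
header.writeUInt32LE(16000, 24); // sample rate
header.writeUInt32LE(32000, 28); // byte rate
header.writeUInt16LE(2, 32);     // block align
header.writeUInt16LE(16, 34);    // bits per sample
header.write('data', 36);
header.writeUInt32LE(0, 40);
console.log(readWavFormat(header)); // PCM, 1 channel, 16000 Hz, 16 bits
```

A header that reports a different rate, channel count or bit depth than the request config would explain poor results from both the live stream and the saved file.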



via Gerda
