I am trying to build a very simple voice recognition program with Node.js. I have hooked up the Google Speech API: I can send it a correctly recorded .wav file (recorded with Audacity) and get back a transcription, and the recognition is very good.
BUT I have issues getting voice recognition "on the fly", that is, sending the audio stream directly from the mic to the Google Speech API.
Here is my main method, which records the voice and sends it to Google.
function recognize(encoding, sampleRateHertz, languageCode)
{
    const request = {
        config: {
            encoding: encoding,
            sampleRateHertz: sampleRateHertz,
            languageCode: languageCode
        },
        interimResults: true // If you want interim results, set this to true
    };

    // Imports the Google Cloud client library
    const Speech = require('@google-cloud/speech');
    // Instantiates a client
    const speech = Speech();

    // Create a recognize stream
    const recognizeStream = speech.createRecognizeStream(request)
        .on('error', console.error)
        .on('data', (data) => process.stdout.write(data.results + ', '));

    let fs = require('fs');
    let Mic = require('node-microphone');
    let mic = new Mic({ 'rate': '16000', 'channels': '1', 'debug': true, 'exitOnSilence': 6, 'bitwidth': '16' });
    let micStream = mic.startRecording();

    // Send the mic audio to Google and also save a copy to disk
    micStream.pipe(recognizeStream);
    micStream.pipe(fs.createWriteStream('test.wav'));

    // Stop recording after 10 seconds
    setTimeout(() => {
        //logger.info('stopped recording');
        console.log('stopped writing');
        mic.stopRecording();
    }, 10000);

    mic.on('info', (info) => {
        console.log('INFO ' + info);
    });
    mic.on('error', (error) => {
        console.log(error);
    });
}
And here is the config data I pass to the method:
options({
    encoding: {
        alias: 'e',
        default: 'LINEAR16',
        global: true,
        requiresArg: true,
        type: 'string'
    },
    sampleRateHertz: {
        alias: 'r',
        default: 16000,
        global: true,
        requiresArg: true,
        type: 'number'
    },
    languageCode: {
        alias: 'l',
        default: 'en-US',
        global: true,
        requiresArg: true,
        type: 'string'
    }
})
So I use 'node-microphone' for recording; I am on Windows and SoX is installed. The audio is sent to Google. I don't get any errors, but recognition is VERY bad. I only get transcriptions for very simple words or phrases like "who", "food", "call". Mostly, if I speak normally, nothing is returned.
I have a feeling that something is wrong with the encoding or the recording rate (as if the recording were "too fast" for Google to understand), but I don't see my error.
I also added file saving. When I open the file and listen to it, it sounds normal. When I send THIS file for recognition, I get almost nothing back. So there is something wrong with the way the audio stream is recorded.
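One way to narrow down the encoding and sample rate suspicion is to check what format the saved test.wav actually declares and compare it with the LINEAR16 / 16000 Hz / mono settings sent to Google. Below is a minimal sketch, assuming the file starts with a canonical 44-byte RIFF/WAVE header (the "fmt " chunk directly after the RIFF header). `parseWavHeader` and `expectedPcmBytes` are hypothetical helper names, not part of any library, and a file with extra chunks before "fmt " would need a more careful parser.

```javascript
// Hypothetical helper (not from any library): reads the format fields
// from a canonical 44-byte RIFF/WAVE header.
// Assumption: the "fmt " chunk starts at byte offset 12.
function parseWavHeader(buffer) {
    if (buffer.length < 44) {
        throw new Error('Buffer too short for a canonical WAV header');
    }
    if (buffer.toString('ascii', 0, 4) !== 'RIFF' ||
        buffer.toString('ascii', 8, 12) !== 'WAVE') {
        // No RIFF/WAVE magic: the data may be headerless (raw) PCM
        throw new Error('Not a RIFF/WAVE file');
    }
    return {
        audioFormat: buffer.readUInt16LE(20),   // 1 means uncompressed PCM
        channels: buffer.readUInt16LE(22),
        sampleRate: buffer.readUInt32LE(24),
        bitsPerSample: buffer.readUInt16LE(34)
    };
}

// Hypothetical helper: number of PCM data bytes a recording of the given
// length should contain if it really has the stated format.
function expectedPcmBytes(seconds, sampleRate, channels, bitsPerSample) {
    return seconds * sampleRate * channels * (bitsPerSample / 8);
}

// Example usage against the file written by the recording code:
// const fs = require('fs');
// console.log(parseWavHeader(fs.readFileSync('test.wav')));
// console.log(expectedPcmBytes(10, 16000, 1, 16));
```

If the header reports a sample rate, channel count, or bit depth different from what the request config says, or if the file size is far from the expected byte count for a 10 second recording, that would point to a mismatch between what the microphone delivers and what the API is told to expect.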
via Gerda