JavaScript Speech Recognition

Technology, 2022-07-11

Speech recognition software is becoming more and more important; it started (for me) with Siri on iOS, then Amazon's Echo, then my new Apple TV, and so on. Speech recognition is useful not just for us tech superstars but for people who either want to work "hands free" or just want the convenience of shouting orders at a moment's notice. Browser tech sometimes lags behind native technology, but not for speech recognition: the technology is in the browser today and it's time to use it, through the SpeechRecognition API.

Basic Video Demo

SpeechRecognition

For as advanced a concept as speech recognition is, the API to use it is fairly simple:

    var recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition || window.mozSpeechRecognition || window.msSpeechRecognition)();
    recognition.lang = 'en-US';
    recognition.interimResults = false;
    recognition.maxAlternatives = 5;
    recognition.start();

    recognition.onresult = function(event) {
        console.log('You said: ', event.results[0][0].transcript);
    };
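The snippet above throws if the browser exposes none of those constructors, because `new` is then applied to `undefined`. A minimal sketch of a guarded lookup follows; the helper name `getSpeechRecognition` and its `scope` parameter are assumptions made for illustration and are not part of the API.

```javascript
// Hypothetical helper: return the first available constructor, or null.
// `scope` is normally `window`; it is a parameter so the lookup can be
// exercised with a plain object as well.
function getSpeechRecognition(scope) {
  return scope.SpeechRecognition ||
    scope.webkitSpeechRecognition ||
    scope.mozSpeechRecognition ||
    scope.msSpeechRecognition ||
    null;
}

// Only touch `window` when running in a browser.
if (typeof window !== 'undefined') {
  var RecognitionCtor = getSpeechRecognition(window);
  if (RecognitionCtor) {
    var guardedRecognition = new RecognitionCtor();
    guardedRecognition.lang = 'en-US';
  } else {
    console.log('Speech recognition is not supported in this browser.');
  }
}
```

Checking for the constructor first lets a page fall back to a text input or a plain message instead of failing with a TypeError.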

The first match is at the event.results[0][0].transcript path; you can also set the number of alternatives in the case that what you're listening for could be ambiguous.
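To make the alternatives concrete, here is a small sketch that flattens one result into plain objects. The helper name `listAlternatives` is an assumption for illustration; it relies only on the result being array-like, with each entry exposing `transcript` and `confidence`.

```javascript
// Hypothetical helper: collect every alternative of a single result.
// A recognition result is array-like (length + numeric indices),
// so a plain for loop is used instead of Array methods.
function listAlternatives(result) {
  var alternatives = [];
  for (var i = 0; i < result.length; i++) {
    alternatives.push({
      transcript: result[i].transcript,
      confidence: result[i].confidence
    });
  }
  return alternatives;
}

// Usage inside a handler (assumes `recognition` was created as shown earlier):
// recognition.onresult = function(event) {
//   listAlternatives(event.results[0]).forEach(function(alt) {
//     console.log(alt.transcript, alt.confidence);
//   });
// };
```

With `maxAlternatives` set to 5, the returned array holds at most five entries, which makes it easy to check whether any of them matches an expected command.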

You can even add your own terms using the SpeechGrammarList object:

    // JSGF grammar listing the extra terms to recognize
    var grammar = '#JSGF V1.0; grammar colors; public <color> = aqua | azure | beige ... ;';
    var recognition = new SpeechRecognition();
    var speechRecognitionList = new SpeechGrammarList();
    // The second argument is the weight of this grammar (0 to 1)
    speechRecognitionList.addFromString(grammar, 1);
    recognition.grammars = speechRecognitionList;
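Writing the grammar string by hand gets tedious once the word list grows. As a rough sketch, the string can be assembled from an array; `buildGrammar` is a hypothetical helper, and the grammar name `colors` and rule name `color` are assumptions chosen to match the example (JSGF writes rule names in angle brackets).

```javascript
// Hypothetical helper: build a JSGF grammar string from a list of words.
function buildGrammar(grammarName, ruleName, words) {
  return '#JSGF V1.0; grammar ' + grammarName + '; public <' + ruleName + '> = ' +
    words.join(' | ') + ' ;';
}

var colorGrammar = buildGrammar('colors', 'color', ['aqua', 'azure', 'beige']);
// colorGrammar is now:
// '#JSGF V1.0; grammar colors; public <color> = aqua | azure | beige ;'

// In the browser, the string is passed on exactly as a hand-written one would be:
// speechRecognitionList.addFromString(colorGrammar, 1);
```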

There are several events emitted during the speech recognition process, so you can use the following snippet to follow the event timeline:

    [
        'onaudiostart',
        'onaudioend',
        'onend',
        'onerror',
        'onnomatch',
        'onresult',
        'onsoundstart',
        'onsoundend',
        'onspeechend',
        'onstart'
    ].forEach(function(eventName) {
        recognition[eventName] = function(e) {
            console.log(eventName, e);
        };
    });

A few caveats about using speech recognition:

Chrome ends the listener after a given amount of time, so you'll need to hook into the end event to restart the speech listener

If you have multiple tabs using the speech listener API, you may experience the listener ending quickly
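The restart mentioned in the first caveat can be sketched as follows. `keepListening` is a hypothetical helper, not part of the API; it assumes only that the object passed in has `start()` and `stop()` methods and accepts an `onend` handler, as a SpeechRecognition instance does.

```javascript
// Hypothetical helper: restart recognition whenever it ends, until stopped.
function keepListening(recognition) {
  var active = true;

  recognition.onend = function() {
    // The browser may end the session on its own; start again
    // unless the caller has asked us to stop.
    if (active) {
      recognition.start();
    }
  };

  recognition.start();

  // Return a function that stops listening for good.
  return function stopListening() {
    active = false;
    recognition.stop();
  };
}

// Usage (assumes `recognition` was created as shown earlier):
// var stop = keepListening(recognition);
// ...later, when the page no longer needs voice input:
// stop();
```

The `active` flag matters because `stop()` itself triggers the end event; without the flag, the listener would restart right after being stopped.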

annyang!

The excellent annyang library provides a neat API for listening for desired commands, all in an awesome 2KB package. The following is a sample usage of annyang:

    // Let's define our first command. First the text we expect, and then the function it should call
    var commands = {
        'play video': function() {
            document.querySelector('video').play();
        },
        'pause video': function() {
            document.querySelector('video').pause();
        },
        '* video': function(word) {
            if(word === 'play') {
                document.querySelector('video').play();
            }
            else if(word === 'pause' || word === 'stop') {
                document.querySelector('video').pause();
            }
        }
    };

    // Add our commands to annyang
    annyang.addCommands(commands);

    // Start listening. You can call this here, or attach this call to an event, button, etc.
    annyang.start();

Note that not only can you provide an exact phrase to listen for, but you can also provide a wildcard string; the wildcard string is useful in cases where you want to prefix your commands, much like saying "Siri: (instructions)" or "Echo: (instructions)".
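A prefixed command of that kind might look like the sketch below, which uses annyang's named wildcard form (`*name`). The prefix word "computer" and the helper `handleInstruction` are assumptions made for illustration; the decision logic is kept in a plain function so it can be checked without a microphone.

```javascript
// Hypothetical handler: decide what to do with the words spoken after the prefix.
function handleInstruction(instruction) {
  var text = instruction.trim().toLowerCase();
  if (text === 'play video') {
    return 'play';
  }
  if (text === 'pause video' || text === 'stop video') {
    return 'pause';
  }
  return 'unknown';
}

// Register the prefixed command only when annyang is loaded on the page.
if (typeof annyang !== 'undefined' && annyang) {
  annyang.addCommands({
    // "computer" is an assumed prefix; *instruction captures everything after it.
    'computer *instruction': function(instruction) {
      var action = handleInstruction(instruction);
      var video = document.querySelector('video');
      if (video && action === 'play') { video.play(); }
      if (video && action === 'pause') { video.pause(); }
    }
  });
  annyang.start();
}
```

Saying "computer play video" would then reach the handler with "play video" as its argument, while speech that does not begin with the prefix is ignored by this command.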

It's so cool that speech recognition is available within the browser today. If you want to see an awesome application of this feature, check out Mozilla VR's Kevin Ngo's amazing demo: Speech Recognition + A-Frame VR + Spotify. You could even use this API to listen for "wtf" when someone reviews your code! Take some time to play with this API and create something innovative!

Translated from: https://davidwalsh.name/speech-recognition