How to capture generated audio from window.speechSynthesis.speak() call?


Problem description



      Previous questions have presented the same or a similar inquiry, yet no workarounds appear to have been created using window.speechSynthesis(). There are, though, workarounds using espeak or meSpeak (see How to create or convert text to audio at chromium browser?) or making requests to external servers.

      How can the audio output of a window.speechSynthesis.speak() call be captured and recorded, and the result returned as a Blob, ArrayBuffer, AudioBuffer or other object type?

Solution

      The Web Speech API Specification does not presently provide a means, or even a hint, for returning or capturing and recording the audio output of a window.speechSynthesis.speak() call.

      See also

      • MediaStream, ArrayBuffer, Blob audio result from speak() for recording?

      • Re: MediaStream, ArrayBuffer, Blob audio result from speak() for recording?

      • Re: MediaStream, ArrayBuffer, Blob audio result from speak() for recording? In pertinent part, use cases include, but are not limited to:

        1. Persons who have issues speaking; e.g., persons who have suffered a stroke or other communication-inhibiting afflictions. They could convert text to an audio file and send the file to another individual or group. This feature would go towards helping them communicate with other persons, similar to the technologies which assisted Stephen Hawking in communicating;

        2. Presently, the only person who can hear the audio output is the person in front of the browser; in essence, not utilizing the full potential of the text-to-speech functionality. The audio result could be used as an attachment within an email; a media stream; a chat system; or other communication application. That is, it provides control over the generated audio output;

        3. Another application would be to provide a free, libre, open source audio dictionary and translation service - client to client, client to server, and server to client.

      It is possible to capture the audio output of a window.speechSynthesis.speak() call utilizing navigator.mediaDevices.getUserMedia() and MediaRecorder(). The expected result is returned in Chromium; the implementation in Firefox has issues. Select Monitor of Built-in Audio Analog Stereo at the navigator.mediaDevices.getUserMedia() prompt; a sketch of that approach follows.
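
      A minimal sketch of that approach, assuming the user selects a monitor (loopback) input device at the permission prompt; device labels such as "Monitor of Built-in Audio Analog Stereo" vary by operating system and audio configuration:

      // Sketch: record speech synthesis output through a monitor (loopback) capture device.
      // Assumes the device chosen at the getUserMedia() prompt carries the system audio output.
      navigator.mediaDevices.getUserMedia({ audio: true })
        .then(stream => {
          const recorder = new MediaRecorder(stream);
          const chunks = [];
          recorder.ondataavailable = e => {
            if (e.data.size > 0) chunks.push(e.data);
          };
          recorder.onstop = () => {
            // stop the capture track and expose the result as a Blob
            stream.getAudioTracks().forEach(track => track.stop());
            const blob = new Blob(chunks, { type: recorder.mimeType });
            console.log("recorded speech synthesis output", blob);
          };
          const utterance = new SpeechSynthesisUtterance("hello world");
          // stop recording when synthesis finishes
          utterance.onend = () => recorder.stop();
          recorder.start();
          window.speechSynthesis.speak(utterance);
        })
        .catch(err => console.error(err));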

      The workaround is cumbersome. We should be able to get the generated audio, at least as a Blob, without navigator.mediaDevices.getUserMedia() and MediaRecorder().

      Further input is evidently needed from users of browsers, JavaScript and C++ developers, browser implementers, and specification authors, both to create a proper specification for the feature and to implement it consistently in browsers' source code; see How to implement option to return Blob, ArrayBuffer, or AudioBuffer from window.speechSynthesis.speak() call.

      In Chromium, a speech dispatcher program should be installed and the browser instance launched with the --enable-speech-dispatcher flag set, as window.speechSynthesis.getVoices() otherwise returns an empty array; see How to use Web Speech API at chromium?.
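
      A quick way to verify that voices are exposed at all, as a minimal sketch (getVoices() is typically polled again from the voiceschanged event):

      // Log the available synthesis voices; an empty list on Chromium/Linux usually means
      // speech-dispatcher is not installed or the flag above was not passed.
      const logVoices = () => {
        const voices = window.speechSynthesis.getVoices();
        console.log(voices.length ? voices.map(v => v.name) : "no voices available");
      };
      window.speechSynthesis.addEventListener("voiceschanged", logVoices);
      logVoices();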

      Proof of concept

      // SpeechSynthesisRecorder.js guest271314 6-17-2017
      // Motivation: Get audio output from `window.speechSynthesis.speak()` call
      // as `ArrayBuffer`, `AudioBuffer`, `Blob`, `MediaSource`, `MediaStream`, `ReadableStream`, or other object or data types
      // See https://lists.w3.org/Archives/Public/public-speech-api/2017Jun/0000.html
      // https://github.com/guest271314/SpeechSynthesisRecorder
      
      // Configuration: Analog Stereo Duplex
      // Input Devices: Monitor of Built-in Audio Analog Stereo, Built-in Audio Analog Stereo
      
      class SpeechSynthesisRecorder {
        constructor({text = "", utteranceOptions = {}, recorderOptions = {}, dataType = ""}) {
          if (text === "") throw new Error("no words to synthesize");
          this.dataType = dataType;
          this.text = text;
          this.mimeType = MediaRecorder.isTypeSupported("audio/webm; codecs=opus") 
                          ? "audio/webm; codecs=opus" : "audio/ogg; codecs=opus";
          this.utterance = new SpeechSynthesisUtterance(this.text);
          this.speechSynthesis = window.speechSynthesis;
          this.mediaStream_ = new MediaStream();
          this.mediaSource_ = new MediaSource();
          this.mediaRecorder = new MediaRecorder(this.mediaStream_, {
            mimeType: this.mimeType,
            bitsPerSecond: 256 * 8 * 1024
          });
          this.audioContext = new AudioContext();
          this.audioNode = new Audio();
          this.chunks = Array();
          if (utteranceOptions) {
            if (utteranceOptions.voice) {
              this.speechSynthesis.onvoiceschanged = e => {
                const voice = this.speechSynthesis.getVoices().find(({
                  name: _name
                }) => _name === utteranceOptions.voice);
                this.utterance.voice = voice;
                console.log(voice, this.utterance);
              }
              this.speechSynthesis.getVoices();
            }
            let {
              lang, rate, pitch
            } = utteranceOptions;
            Object.assign(this.utterance, {
              lang, rate, pitch
            });
          }
          this.audioNode.controls = "controls";
          document.body.appendChild(this.audioNode);
        }
        start(text = "") {
          if (text) this.text = text;
          if (this.text === "") throw new Error("no words to synthesize");
          return navigator.mediaDevices.getUserMedia({
              audio: true
            })
            .then(stream => new Promise(resolve => {
              const track = stream.getAudioTracks()[0];
              this.mediaStream_.addTrack(track);
              // return the current `MediaStream`
              if (this.dataType && this.dataType === "mediaStream") {
                resolve({tts:this, data:this.mediaStream_});
              };
              this.mediaRecorder.ondataavailable = event => {
                if (event.data.size > 0) {
                  this.chunks.push(event.data);
                };
              };
              this.mediaRecorder.onstop = () => {
                track.stop();
                this.mediaStream_.getAudioTracks()[0].stop();
                this.mediaStream_.removeTrack(track);
                console.log(`Completed recording ${this.utterance.text}`, this.chunks);
                resolve(this);
              }
              this.mediaRecorder.start();
              this.utterance.onstart = () => {
                console.log(`Starting recording SpeechSynthesisUtterance ${this.utterance.text}`);
              }
              this.utterance.onend = () => {
                this.mediaRecorder.stop();
                console.log(`Ending recording SpeechSynthesisUtterance ${this.utterance.text}`);
              }
              this.speechSynthesis.speak(this.utterance);
            }));
        }
        blob() {
          if (!this.chunks.length) throw new Error("no data to return");
          return Promise.resolve({
            tts: this,
            data: this.chunks.length === 1 ? this.chunks[0] : new Blob(this.chunks, {
              type: this.mimeType
            })
          });
        }
        arrayBuffer(blob) {
          if (!this.chunks.length) throw new Error("no data to return");
          return new Promise(resolve => {
            const reader = new FileReader;
            reader.onload = e => resolve(({
              tts: this,
              data: reader.result
            }));
            reader.readAsArrayBuffer(blob ? new Blob([blob], {
              type: blob.type
            }) : this.chunks.length === 1 ? this.chunks[0] : new Blob(this.chunks, {
              type: this.mimeType
            }));
          });
        }
        audioBuffer() {
          if (!this.chunks.length) throw new Error("no data to return");
          // `arrayBuffer()` resolves with `{tts, data}`; decode the `data` `ArrayBuffer`
          return this.arrayBuffer()
            .then(({data: ab}) => this.audioContext.decodeAudioData(ab))
            .then(buffer => ({
              tts: this,
              data: buffer
            }));
        }
        mediaSource() {
          if (!this.chunks.length) throw new Error("no data to return");
          return this.arrayBuffer()
            .then(({
              data: ab
            }) => new Promise((resolve, reject) => {
              this.mediaSource_.onsourceended = () => resolve({
                tts: this,
                data: this.mediaSource_
              });
              this.mediaSource_.onsourceopen = () => {
                if (MediaSource.isTypeSupported(this.mimeType)) {
                  const sourceBuffer = this.mediaSource_.addSourceBuffer(this.mimeType);
                  sourceBuffer.mode = "sequence"
                  sourceBuffer.onupdateend = () =>
                    this.mediaSource_.endOfStream();
                  sourceBuffer.appendBuffer(ab);
                } else {
                  reject(`${this.mimeType} is not supported`)
                }
              }
              this.audioNode.src = URL.createObjectURL(this.mediaSource_);
            }));
        }
        readableStream({size = 1024, controllerOptions = {}, rsOptions = {}} = {}) {
          if (!this.chunks.length) throw new Error("no data to return");
          const src = this.chunks.slice(0);
          const chunk = size;
          return Promise.resolve({
            tts: this,
            // use the caller-supplied underlying source if provided, else a default pull-based source
            data: new ReadableStream(Object.keys(controllerOptions).length ? controllerOptions : {
              start(controller) {
                console.log(src.length);
                controller.enqueue(src.splice(0, chunk));
              },
              pull(controller) {
                if (!src.length) {
                  controller.close();
                  return;
                }
                controller.enqueue(src.splice(0, chunk));
              }
            }, rsOptions)
          });
        }
      }
      

      Usage

      let ttsRecorder = new SpeechSynthesisRecorder({
         text: "The revolution will not be televised", 
         utteranceOptions: {
           voice: "english-us espeak",
           lang: "en-US",
           pitch: .75,
           rate: 1
         }
       });
      
       // ArrayBuffer
       ttsRecorder.start()
       // `tts` : `SpeechSynthesisRecorder` instance, `data` : audio as `dataType` or method call result
       .then(tts => tts.arrayBuffer())
       .then(({tts, data}) => {
         // do stuff with `ArrayBuffer`, `AudioBuffer`, `Blob`,
         // `MediaSource`, `MediaStream`, `ReadableStream`
         // `data` : `ArrayBuffer`
         tts.audioNode.src = URL.createObjectURL(new Blob([data], {type:tts.mimeType}));
         tts.audioNode.title = tts.utterance.text;
         tts.audioNode.onloadedmetadata = () => {
           console.log(tts.audioNode.duration);
           tts.audioNode.play();
         }
       })
       // AudioBuffer     
       ttsRecorder.start()
       .then(tts => tts.audioBuffer())
       .then(({tts, data}) => {
         // `data` : `AudioBuffer`
         let source = tts.audioContext.createBufferSource();
         source.buffer = data;
         source.connect(tts.audioContext.destination);
         source.start()
       })
       // Blob
       ttsRecorder.start()
       .then(tts => tts.blob())
       .then(({tts, data}) => {
         // `data` : `Blob`
         tts.audioNode.src = URL.createObjectURL(data);
         tts.audioNode.title = tts.utterance.text;
         tts.audioNode.onloadedmetadata = () => {
           console.log(tts.audioNode.duration);
           tts.audioNode.play();
         }
       })
       // ReadableStream
       ttsRecorder.start()
       .then(tts => tts.readableStream())
       .then(({tts, data}) => {
         // `data` : `ReadableStream`
         console.log(tts, data);
         data.getReader().read().then(({value, done}) => {
           tts.audioNode.src = URL.createObjectURL(value[0]);
           tts.audioNode.title = tts.utterance.text;
           tts.audioNode.onloadedmetadata = () => {
             console.log(tts.audioNode.duration);
             tts.audioNode.play();
           }
         })
       })
       // MediaSource
       ttsRecorder.start()
       .then(tts => tts.mediaSource())
       .then(({tts, data}) => {
         console.log(tts, data);
         // `data` : `MediaSource`
         tts.audioNode.srcObject = data;
         tts.audioNode.title = tts.utterance.text;
         tts.audioNode.onloadedmetadata = () => {
           console.log(tts.audioNode.duration);
           tts.audioNode.play();
         }
       })
       // MediaStream
       // reuse the existing variable; re-declaring `ttsRecorder` with `let` in the same scope would throw
       ttsRecorder = new SpeechSynthesisRecorder({
         text: "The revolution will not be televised", 
         utteranceOptions: {
           voice: "english-us espeak",
           lang: "en-US",
           pitch: .75,
           rate: 1
         }, 
         dataType:"mediaStream"
       });
       ttsRecorder.start()
       .then(({tts, data}) => {
         // `data` : `MediaStream`
         // do stuff with active `MediaStream`
       })
       .catch(err => console.log(err))
      

      plnkr
