How to access audio result from Speech Synthesis API?


Problem description


The Speech Synthesis API allows text-to-speech functionality in Chrome Beta. However, results from TTS requests are automatically played by the browser. How do I access the audio results for post-processing and disable the default behavior of the API?

Solution

There is no standard audio output for the TTS system, and that seems quite intentional, so it is unlikely to change anytime soon.

To understand why, you can look at the other side of this interface where a browser extension can act as a TTS Engine and provide the voices the client can use:

Being a valid TTS Engine accessible by this API in chrome is about supporting starting/pausing/canceling and resuming of TTS requests and sending updates on the progress as events of the following types:

https://developer.chrome.com/extensions/tts#type-TtsEvent
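
To make the engine side concrete, here is a minimal sketch of a speech engine's `onSpeak` handler in a Chrome extension. It assumes the extension declares the `ttsEngine` permission and its voices in the manifest. `synthesizeAndPlay` is a hypothetical placeholder for whatever the engine really does to make sound; it is not part of any Chrome API. Note that the only thing handed back to the browser is a stream of events.

```javascript
// Hypothetical stand-in for the engine's real synthesis and playback.
// A real engine would produce sound here; the browser never sees the audio.
function synthesizeAndPlay(utterance, onDone) {
  onDone();
}

// Handler with the signature Chrome passes to ttsEngine.onSpeak listeners.
function handleSpeak(utterance, options, sendTtsEvent) {
  // Report that speech has started at the first character.
  sendTtsEvent({ type: 'start', charIndex: 0 });
  synthesizeAndPlay(utterance, () => {
    // Report that the whole utterance has been spoken.
    sendTtsEvent({ type: 'end', charIndex: utterance.length });
  });
}

// Register only when running inside an extension that exposes ttsEngine.
if (typeof chrome !== 'undefined' && chrome.ttsEngine) {
  chrome.ttsEngine.onSpeak.addListener(handleSpeak);
  chrome.ttsEngine.onStop.addListener(() => {
    // A real engine would stop its playback here.
  });
}
```

Nothing in this contract carries audio data: the engine reports progress, and how the sound is produced stays private to the engine.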

As such, there is no standard way for a TTS engine to indicate the resulting audio aside from actually playing it. Depending on the specific TTS engine, it may not use a standard audio format or even the browser's normal audio device access. (For example, it may be forwarding the text to the platform's accessibility system.)
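
Seen from the page, the same limitation looks roughly like the sketch below. It uses the standard `speechSynthesis.speak()` call and `SpeechSynthesisUtterance` events; the `synth` and `Utterance` parameters are an addition of this sketch (not part of the API) so that it can be exercised outside a browser.

```javascript
// Speak some text and record everything the page can observe about it.
// `synth` and `Utterance` default to the browser globals; they are
// parameters only so the sketch can be run with stand-ins.
function speakAndLog(text,
                     synth = globalThis.speechSynthesis,
                     Utterance = globalThis.SpeechSynthesisUtterance) {
  const log = [];
  const utterance = new Utterance(text);
  // Progress events are the only feedback the page receives.
  utterance.onstart = () => log.push('start');
  utterance.onend = () => log.push('end');
  // speak() queues the utterance for playback and returns nothing.
  const result = synth.speak(utterance);
  log.push('speak returned ' + String(result));
  return log;
}
```

In a browser, `speakAndLog('Hello')` starts playback immediately; there is no return value, stream, or buffer from which the audio could be read.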

If you know something about a specific TTS Engine (or create your own), then you can build your own interface (see note 1) to retrieve the audio file. But that TTS Engine must then be installed in every client's browser where you want to use it. This is why, if you want to control the playback beyond adjusting the valid inputs to a TTS Engine request (relative pitch, relative volume, relative rate, sex), any solution must point you to a specific TTS Engine or an outside TTS solution.
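
One way such a private interface might look, assuming you control the engine extension, is a message handler registered on `chrome.runtime.onMessageExternal`. The message shape (`{ type: 'synthesize', text }`) and the `renderToWav` function are hypothetical and invented for this sketch; a web page would also need to be listed under `externally_connectable` in the extension manifest before it could send messages.

```javascript
// Hypothetical engine-specific renderer, stubbed with an empty buffer.
function renderToWav(text) {
  return new Uint8Array(0);
}

// Answers requests of the assumed shape { type: 'synthesize', text: '...' }.
function handleExternalMessage(request, sender, sendResponse) {
  const valid = request &&
    request.type === 'synthesize' &&
    typeof request.text === 'string';
  if (!valid) {
    sendResponse({ ok: false, error: 'unsupported request' });
    return;
  }
  const bytes = renderToWav(request.text);
  // Send a plain array of byte values so the reply survives JSON
  // serialization of the message.
  sendResponse({ ok: true, audio: Array.from(bytes) });
}

// Register only when running inside an extension.
if (typeof chrome !== 'undefined' && chrome.runtime &&
    chrome.runtime.onMessageExternal) {
  chrome.runtime.onMessageExternal.addListener(handleExternalMessage);
}
```

A client would then call `chrome.runtime.sendMessage(extensionId, { type: 'synthesize', text: 'Hello' }, callback)` with the engine's extension ID, which is exactly the dependency on a specific installed engine described above.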

Notes

1 If you give a TTS Engine such an interface, it cannot simply piggyback on the existing TTS event API, since the browser validates every event the engine sends:

// attempt to add properties to an otherwise legal event in an Engine:
sendTTSev({'type': 'end', 'charIndex': len, foo:'george'});
...
Uncaught Error: Invalid value for argument 2. Property 'foo': Unexpected property.
    at validate (extensions::schemaUtils:34:13)
    at Object.normalizeArgumentsAndValidate  (extensions::schemaUtils:117:3)
    at Object.<anonymous> (extensions::binding:361:30)
    at sendTtsEvent (extensions::ttsEngine:17:22)
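
Given that validation, an engine that tracks extra data would have to strip it before reporting an event and deliver it some other way. The sketch below shows the idea. The allow-list is an assumption made for illustration and should be checked against the ttsEngine documentation for your Chrome version; `toValidTtsEvent` is a hypothetical helper, not a Chrome function.

```javascript
// Assumed allow-list of event properties; verify against the Chrome docs.
const ALLOWED_EVENT_KEYS = ['type', 'charIndex', 'errorMessage'];

// Return a copy of `event` containing only the allowed properties.
function toValidTtsEvent(event) {
  const clean = {};
  for (const key of ALLOWED_EVENT_KEYS) {
    if (key in event) {
      clean[key] = event[key];
    }
  }
  return clean;
}

// The extra 'foo' property from the failing call above is dropped.
const safeEvent = toValidTtsEvent({ type: 'end', charIndex: 5, foo: 'george' });
```

Anything removed this way, such as a reference to the generated audio, has to travel over a separate channel that the engine defines itself.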
