Difference among Microsoft Speech products/platforms

Problem Description

It seems Microsoft offers quite a few speech recognition products; I'd like to know the differences among all of them.

  • There is the Microsoft Speech API, or SAPI. But somehow the Microsoft Cognitive Services Speech API has the same name.

OK, now Microsoft Cognitive Services on Azure offers a Speech Service API and a Bing Speech API. I assume that for speech-to-text, both APIs are the same.

And then there are System.Speech.Recognition (or Desktop SAPI), Microsoft.Speech.Recognition (or Server SAPI) and Windows.Media.SpeechRecognition. Here and here have some explanations of the differences among the three. But my guess is that they are old speech recognition models based on HMMs, i.e. not neural-network models, and that all three can be used offline without an internet connection, right?
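
For reference, the oldest of these stacks is easy to try. Below is a minimal System.Speech.Recognition dictation sketch; it assumes a Windows machine with the .NET System.Speech assembly referenced, and free-form dictation is just one illustrative grammar choice:

    using System;
    using System.Globalization;
    using System.Speech.Recognition; // desktop SAPI wrapper; runs fully offline

    class DictationDemo
    {
        static void Main()
        {
            // In-process recognizer bound to the installed en-US engine.
            using (var recognizer = new SpeechRecognitionEngine(new CultureInfo("en-US")))
            {
                // Free-form dictation; a GrammarBuilder could constrain it instead.
                recognizer.LoadGrammar(new DictationGrammar());
                recognizer.SetInputToDefaultAudioDevice();

                // Blocking single-utterance recognition; returns null on silence/timeout.
                RecognitionResult result = recognizer.Recognize();
                Console.WriteLine(result?.Text ?? "(nothing recognized)");
            }
        }
    }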

As for the Azure Speech Service and Bing Speech APIs, they are more advanced speech models, right? But I assume there is no way to use them offline on my local machine, as they all require subscription verification. (Even though it seems the Bing API has a C# desktop library...)

Essentially I want an offline model which does speech-to-text transcription of my conversation data (5-10 minutes per audio recording), recognises multiple speakers, and outputs timestamps (or timecoded output). I am a bit confused now by all the options. I would greatly appreciate it if someone could explain this to me, many thanks!
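
For what it's worth, the offline System.Speech engine mentioned above can already get partway there: it can transcribe a WAV file locally and report rough timecodes from the recognized audio's position in the stream, though it offers nothing for separating multiple speakers. A sketch under those assumptions (the file path is a placeholder):

    using System;
    using System.Globalization;
    using System.Speech.Recognition;

    class TimecodedTranscript
    {
        static void Main()
        {
            using (var recognizer = new SpeechRecognitionEngine(new CultureInfo("en-US")))
            {
                recognizer.LoadGrammar(new DictationGrammar());
                recognizer.SetInputToWaveFile(@"C:\data\conversation.wav"); // placeholder path

                // Recognize() returns one phrase at a time and null at end of file.
                RecognitionResult result;
                while ((result = recognizer.Recognize()) != null)
                {
                    // Offset of the phrase within the input stream, if audio was retained.
                    TimeSpan start = result.Audio?.AudioPosition ?? TimeSpan.Zero;
                    Console.WriteLine("[{0:hh\\:mm\\:ss}] {1}", start, result.Text);
                }
            }
        }
    }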

Recommended Answer

A difficult question - and part of the reason why it is so difficult: we (Microsoft) seem to present an incoherent story about 'speech' and 'speech APIs'. Although I work for Microsoft, the following is my own view on this. I try to give some insight into what is being planned in my team (Cognitive Services Speech - Client SDK), but I can't predict all facets of the not-so-near future.

Early on, Microsoft recognized that speech is an important medium, so Microsoft has an extensive and long-running history of enabling speech in its products. There are really good speech solutions (with local recognition) available; you listed some of them.

We are working on unifying this and presenting one place for you to find the state-of-the-art speech solution at Microsoft. This is the 'Microsoft Speech Service' (https://docs.microsoft.com/de-de/azure/cognitive-services/speech-service/) - currently in preview.

On the service side it will combine our major speech technologies - like speech-to-text, text-to-speech, intent, and translation (and future services) - under one umbrella. Speech and language models are constantly improved and updated. We are developing a client SDK for this service. Over time (later this year) this SDK will be available on all major operating systems (Windows, Linux, Android, iOS) and will have support for the major programming languages. We will continue to enhance/improve platform and language support for the SDK.
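
To give a feel for what a call against this unified service can look like from C#, here is a sketch assuming the API surface of the Microsoft.CognitiveServices.Speech NuGet package (the SpeechConfig/SpeechRecognizer style); the key and region strings are placeholders, and exact names may differ in preview builds:

    using System;
    using System.Threading.Tasks;
    using Microsoft.CognitiveServices.Speech; // NuGet: Microsoft.CognitiveServices.Speech

    class SpeechServiceDemo
    {
        static async Task Main()
        {
            // Placeholder key and region; the service is online-only and
            // validates the subscription on every connection.
            var config = SpeechConfig.FromSubscription("<your-key>", "<your-region>");

            using (var recognizer = new SpeechRecognizer(config))
            {
                // Single-utterance speech-to-text from the default microphone.
                SpeechRecognitionResult result = await recognizer.RecognizeOnceAsync();

                if (result.Reason == ResultReason.RecognizedSpeech)
                    Console.WriteLine("Recognized: " + result.Text);
                else
                    Console.WriteLine("No speech recognized: " + result.Reason);
            }
        }
    }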

This combination of online service and client SDK will leave the preview state later this year.

We understand the desire to have local recognition capabilities. It will not be available 'out of the box' in our first SDK release (it is also not part of the current preview). One goal for the SDK is parity (functionality and API) across platforms and languages. This needs a lot of work. Offline is not part of this right now, and I can't make any predictions here, in either features or timeline ...

So from my point of view, the new Speech Service and its SDK are the way forward. The goal is a unified API on all platforms with easy access to all Microsoft speech services. It requires a subscription key, and it requires you to be 'connected'. We are working hard to get both (server and client) out of preview status later this year.

Hope this helps ...

Wolfgang
