How to generate timestamps in speech recognition?

Question

I am working on a speech recognition system project and have used a deep neural network for the recognition itself. But I also need the start and end times of the words occurring in the given speech. Can you suggest, or point me towards, resources for generating timestamps in speech recognition? I know the Amazon Transcribe service also generates timestamps, but I haven't been able to find any papers on how it does this.

Recommended answer

If you're interested in trying Microsoft's speech service (https://aka.ms/speech/sdk), we do support word-level timestamps as well. You can start with one of our quickstart samples (available in many programming languages) and add a couple more lines of code to get the word-level timing information.

Basically, after trying out the default microphone quickstart or file quickstart, you can add a couple of lines of code to request word-level timestamps, plus one more line to retrieve the service-provided JSON response, which contains the word-level timing information.

For example, in C#, you'd do this for your SpeechConfig object:

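config.OutputFormat = OutputFormat.Detailed;   // return the detailed JSON result, not just the display text
config.RequestWordLevelTimestamps();           // a method call: ask the service for per-word timing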

And once you've received your SpeechRecognitionResult object, you'd do this:

var json = result.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult);
Console.WriteLine(json);

If you're using another supported programming language (C++, Java, JavaScript, Objective-C, Swift, Python, etc.), the code would be slightly different.
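Once you have the JSON string, pulling the word timings out of it looks much the same in any language. Below is a minimal Python sketch. It assumes the detailed result has an `NBest` list whose first entry carries a `Words` list of items with `Word`, `Offset` and `Duration` fields, and that times are expressed in 100-nanosecond ticks; check both assumptions against the JSON your service actually prints. The function name `word_timestamps` and the sample string are hypothetical, for illustration only, and are not real service output.

```python
import json
from typing import List, Tuple

# Assumption: Offset/Duration are in 100-nanosecond ticks (10,000,000 per second).
TICKS_PER_SECOND = 10_000_000


def word_timestamps(json_result: str) -> List[Tuple[str, float, float]]:
    """Return (word, start_seconds, end_seconds) for the top hypothesis.

    Hypothetical helper: assumes the detailed JSON has an "NBest" list whose
    first entry holds a "Words" list of {"Word", "Offset", "Duration"} items.
    """
    result = json.loads(json_result)
    nbest = result.get("NBest") or []
    if not nbest:
        return []  # e.g. nothing was recognized, so there are no word timings
    words = []
    for item in nbest[0].get("Words", []):
        start = item["Offset"] / TICKS_PER_SECOND
        end = (item["Offset"] + item["Duration"]) / TICKS_PER_SECOND
        words.append((item["Word"], start, end))
    return words


# Illustrative input only: shape assumed, values made up.
sample = json.dumps({
    "RecognitionStatus": "Success",
    "NBest": [{
        "Display": "Hello world.",
        "Words": [
            {"Word": "hello", "Offset": 5000000, "Duration": 4000000},
            {"Word": "world", "Offset": 9500000, "Duration": 5000000},
        ],
    }],
})

for word, start, end in word_timestamps(sample):
    print(f"{word}: {start:.2f}s - {end:.2f}s")
```

The end time is computed as `Offset + Duration` because, under the assumed format, each word reports where it starts and how long it lasts rather than an explicit end time.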

Good luck.

Rob Chambers, Microsoft
Architect and Engineering Manager
