C# Speech Recognition from System Audio (Speaker Sound)

Question

I've seen speech recognition from input devices (obviously) and I've seen speech recognition from files (http://gotspeech.net/forums/thread/6835.aspx). However, I was wondering whether it would be possible to run speech recognition on system audio in real time. By system audio, I mean the sound that comes out of your speakers.

It would be a great tool for people who are hard of hearing: while they watch YouTube videos, the C# application could transcribe what's being said.

How could I go about doing this?

Recommended answer

Very easily - go to the sound mixer, choose the input devices, and enable/unmute "Stereo Mix". You should, of course, mute the mic if you don't want to record that too. Then just start recording the same way you'd record the mic - you'll now get the same feed as the speakers, at digital quality.

This can be done programmatically (http://www.codeproject.com/KB/cs/sync_volumecontrol.aspx), although it can be fiddly - especially if you want to support WinXP as well as Vista/Win7. (Sound was overhauled in Vista and I believe the APIs are significantly different, although I haven't had to use them yet.)

You're almost certainly going to need to filter the sound before attempting recognition. Unless the speech recognition library you're using is designed to work in adverse conditions, music and special effects will interfere with proper recognition, as will multiple people speaking at the same time.

If you haven't got a super-robust library, filters to attenuate non-vocal frequencies are going to be a must. You may also need to apply volume normalisation to account for loud/quiet scenes - there are hundreds of filters that could potentially improve matching.
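As a rough illustration of the two filters mentioned above, here is a minimal sketch in plain C#. It assumes mono samples already converted to floats in the range -1..1. The class and method names are hypothetical, the band-pass uses the widely published "Audio EQ Cookbook" biquad coefficients, and the suggested centre frequency and Q are only a guess at a telephone-style speech band, not tuned values.

```csharp
using System;

public static class SpeechPreprocessing
{
    // Second-order band-pass (biquad, constant 0 dB peak gain form).
    // Attenuates content far from centerHz; a lower q gives a wider pass band.
    public static float[] BandPass(float[] input, double sampleRate, double centerHz, double q)
    {
        double w0 = 2.0 * Math.PI * centerHz / sampleRate;
        double alpha = Math.Sin(w0) / (2.0 * q);
        double a0 = 1.0 + alpha;

        // Coefficients normalised by a0; b1 is zero for this filter type.
        double b0 = alpha / a0;
        double b2 = -alpha / a0;
        double a1 = -2.0 * Math.Cos(w0) / a0;
        double a2 = (1.0 - alpha) / a0;

        var output = new float[input.Length];
        double x1 = 0, x2 = 0, y1 = 0, y2 = 0; // previous inputs and outputs
        for (int i = 0; i < input.Length; i++)
        {
            double x0 = input[i];
            double y0 = b0 * x0 + b2 * x2 - a1 * y1 - a2 * y2;
            output[i] = (float)y0;
            x2 = x1; x1 = x0;
            y2 = y1; y1 = y0;
        }
        return output;
    }

    // Scales the buffer so its loudest sample reaches targetPeak.
    public static float[] NormalizePeak(float[] input, float targetPeak = 0.9f)
    {
        float peak = 0f;
        foreach (float s in input) peak = Math.Max(peak, Math.Abs(s));

        var output = new float[input.Length];
        if (peak == 0f) return output; // pure silence: nothing to scale

        float gain = targetPeak / peak;
        for (int i = 0; i < input.Length; i++) output[i] = input[i] * gain;
        return output;
    }
}
```

A call such as `SpeechPreprocessing.NormalizePeak(SpeechPreprocessing.BandPass(samples, 16000, 1000, 0.33))` would roughly keep the 300-3400 Hz region and bring the result up to a consistent level. Normalising a whole buffer at once does not even out loud and quiet scenes within it; applying it per short block (or using a proper compressor) would be closer to what the answer describes.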

You may want to access the recognition API at the lowest level to get as much control as possible - you'll need to tweak it to cope with people shouting, being breathless, crying, and so on. If you design for flexible low-level access from the start, it will probably save you weeks compared with finding out later that you need it and having to re-architect.
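The answer does not name a recognition library. Purely as an assumption, the sketch below uses the Windows-only `System.Speech.Recognition` API to show what keeping some control looks like in practice: subscribing to per-phrase confidence and rejection events instead of only reading the final text. The file name is hypothetical, and the default recognizer installed on the machine is used.

```csharp
using System;
using System.Speech.Recognition;
using System.Threading;

class WavTranscriber
{
    static void Main(string[] args)
    {
        // Hypothetical default path; pass a real WAV file as the first argument.
        string wavPath = args.Length > 0 ? args[0] : "system-audio.wav";

        using (var engine = new SpeechRecognitionEngine())
        using (var done = new ManualResetEventSlim(false))
        {
            engine.LoadGrammar(new DictationGrammar()); // free-form dictation
            engine.SetInputToWaveFile(wavPath);

            // Each recognised phrase carries a confidence score (0..1).
            engine.SpeechRecognized += (s, e) =>
                Console.WriteLine($"{e.Result.Confidence:F2}  {e.Result.Text}");

            // Audio the engine heard but could not match to the grammar.
            engine.SpeechRecognitionRejected += (s, e) =>
                Console.WriteLine("[unrecognised audio]");

            // Raised when the input ends or recognition is cancelled.
            engine.RecognizeCompleted += (s, e) => done.Set();

            engine.RecognizeAsync(RecognizeMode.Multiple);
            done.Wait();
        }
    }
}
```

Logging confidence and rejections like this makes it easier to see which scenes (shouting, music, overlapping speakers) are causing trouble before deciding what to tune.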

I'd suggest you look into NAudio as a starting point for audio processing.
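As a starting point with NAudio, the following sketch records whatever is playing through the default output device to a WAV file. Note that it swaps in a different technique from the "Stereo Mix" approach described earlier: it uses NAudio's `WasapiLoopbackCapture`, which relies on WASAPI and therefore needs Vista or later. The output file name is hypothetical.

```csharp
using System;
using System.Threading;
using NAudio.Wave;

class LoopbackRecorder
{
    static void Main()
    {
        const string outputPath = "system-audio.wav"; // hypothetical file name

        var capture = new WasapiLoopbackCapture();
        var writer = new WaveFileWriter(outputPath, capture.WaveFormat);
        var stopped = new ManualResetEventSlim(false);

        // Append each captured buffer to the WAV file as it arrives.
        capture.DataAvailable += (sender, e) =>
            writer.Write(e.Buffer, 0, e.BytesRecorded);

        // Clean up only after the capture thread has really finished.
        capture.RecordingStopped += (sender, e) =>
        {
            writer.Dispose();
            capture.Dispose();
            stopped.Set();
        };

        capture.StartRecording();
        Console.WriteLine("Recording system audio. Press Enter to stop.");
        Console.ReadLine();

        capture.StopRecording();
        stopped.Wait();
    }
}
```

Loopback capture normally delivers audio in the device's mix format (often 32-bit float stereo), so it will likely need converting to whatever format your recognition library expects before it can be fed in.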

I suspect you'll be able to get something that works under ideal conditions without too much effort - but tweaking it to work well in all eventualities may be a mammoth task. That said, it sounds like a fun project.

You could improve the chance of correct recognition considerably by creating genre-, user- or show-specific dictionaries. These could either be pre-generated or built automatically using a weighted feedback loop - perhaps also allowing the user to correct mistakes.
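One possible shape for such a weighted feedback loop is sketched below in plain C#. Everything here is an illustrative assumption rather than part of any library: the class name, the decay factor, and the reward and penalty values would all need tuning against real transcripts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class WeightedVocabulary
{
    private readonly Dictionary<string, double> weights =
        new Dictionary<string, double>(StringComparer.OrdinalIgnoreCase);
    private readonly double decay;

    // decay < 1 makes words that stop appearing fade out over time.
    public WeightedVocabulary(double decay = 0.95)
    {
        this.decay = decay;
    }

    // Record that a word was recognised (or confirmed): age every existing
    // weight, then add the reward to the observed word.
    public void Observe(string word, double reward = 1.0)
    {
        foreach (string key in weights.Keys.ToList())
            weights[key] *= decay;

        weights.TryGetValue(word, out double current);
        weights[word] = current + reward;
    }

    // User feedback: penalise the misrecognised word, boost the correction.
    public void Correct(string wrong, string right)
    {
        if (weights.ContainsKey(wrong))
            weights[wrong] *= 0.5;
        Observe(right, 2.0);
    }

    // The highest-weighted words, e.g. to build a show-specific word list.
    public IEnumerable<string> Top(int count)
    {
        return weights.OrderByDescending(p => p.Value)
                      .Take(count)
                      .Select(p => p.Key);
    }
}
```

One instance could be kept per genre, user, or show and saved between sessions. How the resulting word list is handed to the recognizer depends on the library; with `System.Speech`, for instance, one option would be to build a `Choices` grammar from `Top(n)`.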
