Matlab: Finding dominant frequencies in a frame of audio data

Problem description

I am pretty new to Matlab and I am trying to write a simple frequency-based speech detection algorithm. The end goal is to run the script on a wav file and have it output start/end times for each speech segment. If I use the code:

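fr = 128;                                      % window length and FFT size, in samples
[audio, fs] = audioread(audioPath);            % audioread replaces wavread, which newer MATLAB releases no longer provide
spectrogram(audio, fr, 120, fr, fs, 'yaxis')   % 120-sample overlap between consecutive windows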

I get a useful frequency intensity vs. time graph like this:

[Figure: spectrogram of the recording, frequency vs. time, colored by intensity]

By looking at it, it is very easy to see when speech occurs. I could write an algorithm to automate the detection process by looking at each x-axis frame, figuring out which frequencies are dominant (have the highest intensity), testing the dominant frequencies to see if enough of them are above a certain intensity threshold (the difference between yellow and red on the graph), and then labeling that frame as either speech or non-speech. Once the frames are labeled, it would be simple to get start/end times for each speech segment.

My problem is that I don't know how to access that data. I can use the code:

[S,F,T,P] = spectrogram(audio,fr,120,fr,fs);

to get all the features of the spectrogram, but the results of that code don't make any sense to me. The bounds of the S,F,T,P arrays and matrices don't correlate to anything I see on the graph. I've looked through the help files and the API, but I get confused when they start throwing around algorithm names and acronyms - my DSP background is pretty limited.

How could I get an array of the frequency intensity values for each frame of this spectrogram analysis? I can figure the rest out from there, I just need to know how to get the appropriate data.

Solution

What you are trying to do is called speech activity detection. There are many approaches to this. The simplest might be a band-pass filter that passes the frequencies where speech is strongest, which is between 1 kHz and 8 kHz. You could then compare the total signal energy with the energy of the band-limited signal and, if the majority of the energy is in the speech band, classify the frame as speech. That's one option, but there are others too.
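A minimal sketch of that band-energy comparison, which also shows how the spectrogram outputs are laid out: each column of P is the power spectrum of one frame, F(i) is the frequency in Hz of row i, and T(k) is the time in seconds of column k (S holds the complex short-time Fourier transform in the same layout). The synthetic signal, the 0.5 threshold and the variable names are illustrative assumptions, not values from the answer, and spectrogram needs the Signal Processing Toolbox.

```matlab
% Synthetic stand-in for a recording (assumption: swap in real audio data).
fs = 16000;                               % sample rate in Hz
t  = (0:fs-1)' / fs;                      % one second of time stamps
hum  = 0.1 * sin(2*pi*100*t);             % low-frequency background only
tone = hum + sin(2*pi*2000*t);            % background plus energy inside the band
audio = [hum; tone];                      % 1 s "non-speech", then 1 s "speech"

fr = 128;                                 % window length and FFT size
[~, F, T, P] = spectrogram(audio, fr, 120, fr, fs);

% P is (fr/2+1)-by-numFrames; P(:,k) is the power spectrum of frame k.
inBand      = F >= 1000 & F <= 8000;      % band quoted in the answer
bandEnergy  = sum(P(inBand, :), 1);       % per-frame energy inside the band
totalEnergy = sum(P, 1);                  % per-frame energy over all rows
ratio       = bandEnergy ./ totalEnergy;

ratioThreshold = 0.5;                     % "majority of the energy" (assumed value)
isSpeech = ratio > ratioThreshold;        % 1-by-numFrames logical vector
firstSpeechTime = T(find(isSpeech, 1));   % time of the first frame flagged as speech
```

Note that a plain energy ratio over such a wide band will also flag broadband noise, so the threshold and band edges would need tuning on real recordings.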

To get the frequencies at the peaks, you could use an FFT to get the spectrum and then use peakdetect.m. But this is a very naive approach, as you will get a lot of peaks that belong to harmonics of the fundamental frequency.
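The harmonic problem can be seen in a small sketch. peakdetect.m is a third-party file that is not reproduced here, so this example swaps in findpeaks from the Signal Processing Toolbox instead; the test frame, its amplitudes and the 10% height threshold are assumptions chosen for illustration.

```matlab
fs = 8000;                                % sample rate in Hz (assumed)
N  = 400;                                 % frame length; bin spacing fs/N = 20 Hz
t  = (0:N-1)' / fs;

% One frame with a 200 Hz fundamental and two harmonics (made-up test data).
frame = sin(2*pi*200*t) + 0.8*sin(2*pi*400*t) + 0.6*sin(2*pi*600*t);

X = abs(fft(frame .* hamming(N)));        % magnitude spectrum of the windowed frame
X = X(1:N/2+1);                           % keep the non-negative frequencies
f = (0:N/2)' * fs / N;                    % frequency in Hz of each bin

% Local maxima that reach at least 10% of the largest bin.
[~, loc] = findpeaks(X, 'MinPeakHeight', 0.1 * max(X));
peakFreqs = f(loc);                       % one entry per detected peak
```

Every harmonic shows up as its own peak, so the peak list alone does not say which entry is the fundamental.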

In theory you should use some sort of cepstrum (loosely, a spectrum of the spectrum), which collapses the periodic spacing of the harmonics in the spectrum into a single peak at the fundamental period, and then use peakdetect on that. Or you could use existing tools that do this, such as Praat.
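A rough sketch of the cepstrum idea, using the real cepstrum (inverse FFT of the log magnitude spectrum) and a plain max in place of peakdetect. The pulse-train test signal, the 80 to 400 Hz search range and all variable names are assumptions made for this example; only base MATLAB functions are used.

```matlab
fs = 8000;                                % sample rate in Hz (assumed)
f0 = 200;                                 % pitch of the synthetic signal (assumed)
period = fs / f0;                         % 40 samples per pitch period

% Crude voiced-like signal: one pulse per period through a one-pole filter,
% which gives harmonics at multiples of f0 with a falling spectral tilt.
excitation = zeros(fs, 1);
excitation(1:period:end) = 1;
voiced = filter(1, [1 -0.9], excitation);

N = 240;                                  % 30 ms frame
w = 0.54 - 0.46 * cos(2*pi*(0:N-1)' / (N-1));   % Hamming window, written out
frame = voiced(1021:1020+N) .* w;

% Real cepstrum: cep(n+1) is the value at a quefrency of n samples.
cep = real(ifft(log(abs(fft(frame)) + eps)));

qMin = floor(fs / 400);                   % shortest period searched (400 Hz)
qMax = ceil(fs / 80);                     % longest period searched (80 Hz)
[~, idx] = max(cep(qMin+1 : qMax+1));     % strongest peak in the pitch range
pitchLag = qMin + idx - 1;                % peak quefrency in samples
f0Est = fs / pitchLag;                    % estimated fundamental in Hz
```

The search range has to stay below half the frame length, because the real cepstrum of a real frame is mirrored around N/2.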

Be aware that speech analysis is usually done on frames of around 30 ms, stepping in 10 ms increments. You could further filter out false detections by requiring that a formant be detected in N sequential frames.
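As a sketch of how those numbers translate into code: the first lines convert 30 ms frames and 10 ms steps into the window and overlap arguments that spectrogram expects, and the rest drops runs of speech frames shorter than N before reporting start/end times. The sample rate, the per-frame decisions in isSpeech and the choice N = 3 are hypothetical values for illustration.

```matlab
fs = 16000;                               % sample rate in Hz (assumed)
winLen   = round(0.030 * fs);             % 30 ms frame
hop      = round(0.010 * fs);             % 10 ms step
noverlap = winLen - hop;                  % overlap argument for spectrogram

% Hypothetical per-frame decisions (1 = speech), e.g. from a band-energy test.
isSpeech = [0 1 0 0 1 1 1 1 0 0 1 1 1 0];
frameTimes = ((0:numel(isSpeech)-1) * hop + winLen/2) / fs;   % frame centres, s

minRun = 3;                               % N sequential frames required (assumed)
d = diff([0, double(isSpeech), 0]);       % +1 where a run starts, -1 just after it ends
runStart = find(d == 1);                  % first frame of each run
runEnd   = find(d == -1) - 1;             % last frame of each run
keep = (runEnd - runStart + 1) >= minRun; % discard runs shorter than minRun

% One row per surviving segment: [startTime endTime] in seconds.
segments = [frameTimes(runStart(keep))', frameTimes(runEnd(keep))'];
```

With the example vector, the isolated speech frame at position 2 is discarded and the two longer runs are kept.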
