Given an audio stream, find when a door slams (sound pressure level calculation?)


Problem description

Not unlike a clap detector ("Clap on! clap clap Clap off! clap clap Clap on, clap off, the Clapper! clap clap"), I need to detect when a door closes. This is in a vehicle, which is easier than a room or household door:

Listen: http://ubasics.com/so/van_driver_door_closing.wav

Look: [waveform screenshot]

It's sampling at 16 bits, 4 kHz, and I'd like to avoid heavy processing or storing lots of samples.

When you look at it in Audacity or another waveform tool it's quite distinctive, and it almost always clips due to the increase in sound pressure in the vehicle, even when the windows and other doors are open:

Listen: http://ubasics.com/so/van_driverdoorclosing_slidingdoorsopen_windowsopen_engineon.wav

Look: [waveform screenshot]

I expect there's a relatively simple algorithm that would take readings at 4kHz, 8 bits, and keep track of the 'steady state'. When the algorithm detects a significant increase in the sound level it would mark the spot.
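The "steady state" tracking described above could be sketched as an exponential moving average (EMA) of the per-window mean-square level, flagging a window whose level jumps well above that baseline. This is only an illustration of the idea, not the asker's code. The function name and the values of `window_size`, `alpha` (smoothing factor) and `jump_factor` are hypothetical and would need tuning against real recordings.

```python
def steady_state_onsets(samples, window_size=100, alpha=0.05, jump_factor=8.0):
    """Flag windows whose mean-square level jumps well above a slowly
    tracked 'steady state' baseline (an exponential moving average).

    window_size, alpha and jump_factor are illustrative guesses, not tuned values.
    Returns the start index of each window where a loud event begins.
    """
    baseline = None
    loud = False
    onsets = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        level = sum(s * s for s in window) / window_size  # mean-square level
        if baseline is None:
            baseline = level  # seed the baseline with the first window
            continue
        is_loud = level > jump_factor * baseline
        if is_loud and not loud:
            onsets.append(start)  # report only the rising edge of an event
        loud = is_loud
        # Keep adapting, so a lasting change (engine, radio) becomes the new normal.
        baseline = (1 - alpha) * baseline + alpha * level
    return onsets


quiet = [10, -10] * 1000   # 2000 samples of low-level background
slam = [1000, -1000] * 200  # 400 samples of a much louder burst
print(steady_state_onsets(quiet + slam + [10, -10] * 800))  # [2000]
```

Because the baseline keeps adapting, a permanent rise in background noise stops triggering after a short while; the price is that a very long loud event is eventually absorbed into the baseline too.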

  • What are your thoughts?
  • How would you detect this event?
  • Are there code examples of sound pressure level calculations that might help?
  • Can I get away with less frequent sampling (1 kHz or even slower)?

Update: I'm playing with Octave (open-source numerical analysis software, similar to MATLAB) to see whether the root mean square (RMS) gives me what I need; it produces something very similar to the SPL.
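A per-window RMS expressed in decibels gives a level curve of the kind the update describes. This is a minimal sketch assuming signed 16-bit samples; the function name and window size are hypothetical. The result is in dB relative to digital full scale, not a calibrated SPL: turning it into true SPL would need a calibrated microphone, so it tracks SPL only up to an unknown offset.

```python
import math

FULL_SCALE = 32768  # magnitude of the most negative signed 16-bit sample


def window_rms_db(samples, window_size=100):
    """Return one RMS level per non-overlapping window, in dB relative to
    full scale (uncalibrated, so only relative changes are meaningful)."""
    levels = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        rms = math.sqrt(sum(s * s for s in window) / window_size)
        # Clamp to avoid log10(0) on digital silence.
        levels.append(20 * math.log10(max(rms, 1e-9) / FULL_SCALE))
    return levels


print(window_rms_db([16384, -16384] * 50))  # half of full scale, about -6.02 dB
```

Since energy is proportional to RMS squared, a jump of 20 dB between windows corresponds to an energy ratio of 100, so a dB threshold and an energy-ratio threshold are interchangeable.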

Update 2: Computing the RMS finds the door closing easily in the simple case.

Now I just need to look at the difficult cases (radio on, heat/air on high, etc.). CFAR (constant false alarm rate) detection looks really interesting: I know I'm going to have to use an adaptive algorithm, and CFAR certainly fits the bill.
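One common CFAR variant, cell-averaging CFAR, estimates the local noise level from a run of "training" cells near the cell under test, skips a few "guard" cells right next to it, and flags the cell if it exceeds that estimate times a scale factor. The sketch below applies it to a sequence of per-window energies. It is a simplification: it uses only training cells *before* the cell under test so it can run causally on a live stream, whereas textbook CA-CFAR usually averages both sides. The `train`, `guard` and `scale` values are hypothetical.

```python
def ca_cfar(energies, train=16, guard=2, scale=6.0):
    """Causal cell-averaging CFAR over a sequence of window energies.

    For each index i, the noise estimate is the mean of the `train` cells
    that precede the `guard` cells immediately before i. Returns the
    indices whose energy exceeds scale * noise estimate.
    """
    hits = []
    for i in range(train + guard, len(energies)):
        training = energies[i - guard - train:i - guard]
        noise = sum(training) / train
        if energies[i] > scale * noise:
            hits.append(i)
    return hits


energies = [100] * 30 + [5000] * 3 + [100] * 10
print(ca_cfar(energies))  # [30, 31, 32]
```

Because the threshold follows the recent background, a louder cabin (radio, fan) raises the threshold with it, which is what makes CFAR attractive for the difficult cases.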

-Adam

Solution

Looking at the screenshots of the source audio files, one simple way to detect a change in sound level would be to do a numerical integration of the squared samples to find out the "energy" of the wave at a specific time.

A rough algorithm would be:

  1. Divide the samples up into sections
  2. Calculate the energy of each section
  3. Take the ratio of the energies between the previous window and the current window
  4. If the ratio exceeds some threshold, determine that there was a sudden loud noise.

Pseudocode

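WINDOW_SIZE = 1000                 # Sample window of 1000 samples (example)
THRESHOLD = 4.0                    # Energy ratio that counts as "sudden" (example; tune empirically)

samples = load_audio_samples()     # Placeholder: list of signed audio samples
last_energy = None                 # No previous window yet

for i in range(0, len(samples) - WINDOW_SIZE + 1, WINDOW_SIZE):
    # Perform a numerical integration of the current window by summing the
    # squared samples; squaring keeps the positive and negative swings of
    # the waveform from cancelling each other out.
    energy = 0
    for j in range(WINDOW_SIZE):
        energy += samples[i + j] ** 2

    # Take the ratio of the current window's energy to the last window's and
    # see if there is a big difference. If so, there is a sudden loud noise.
    # (The first window is skipped, and so is division by zero after silence.)
    if last_energy and energy / last_energy > THRESHOLD:
        sudden_sound_detected()    # Placeholder callback

    last_energy = energy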

I should add a disclaimer that I haven't tried this.

This approach can run without recording all the samples first. As long as there is a buffer of some length (WINDOW_SIZE in the example), a numerical integration can be performed to calculate the energy of that section of sound. It does mean, however, that there will be a processing delay that depends on WINDOW_SIZE. Determining a good length for a section of sound is another concern.
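In fact the sum can be accumulated sample by sample, so only a running total, a counter and the previous window's energy need to be stored; memory stays constant whatever WINDOW_SIZE is, while the one-window delay remains. A minimal sketch of that idea follows; the class name, method name and defaults are hypothetical. It compares `energy > threshold * last_energy` instead of dividing, which also avoids a division by zero after digital silence.

```python
class WindowedEnergyDetector:
    """Feed samples one at a time; no sample buffer is kept.

    Only the running sum of squares, a counter, and the previous
    window's energy are stored. Names and defaults are illustrative.
    """

    def __init__(self, window_size=100, threshold=4.0):
        self.window_size = window_size
        self.threshold = threshold
        self.energy = 0
        self.count = 0
        self.last_energy = None

    def push(self, sample):
        """Add one sample; return True when a window has just completed and
        its energy exceeds threshold times the previous window's energy."""
        self.energy += sample * sample
        self.count += 1
        if self.count < self.window_size:
            return False
        detected = (self.last_energy is not None
                    and self.energy > self.threshold * self.last_energy)
        self.last_energy = self.energy
        self.energy = 0
        self.count = 0
        return detected


stream = [1, -1] * 150 + [50, -50] * 100  # 300 quiet samples, then 200 loud ones
detector = WindowedEnergyDetector(window_size=100)
hits = [n for n, s in enumerate(stream) if detector.push(s)]
print(hits)  # [399]: the loud window is reported when it completes
```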

How to Split into Sections

In the first audio file, the door-closing sound appears to last about 0.25 seconds, so the window used for numerical integration should probably be at most half of that, or even closer to a tenth, so that the difference between the silence and the sudden sound can still be noticed even when a window straddles the silent section and the noisy section.

For example, if the integration window were 0.5 seconds, with the first window covering 0.25 seconds of silence and 0.25 seconds of the door closing, and the second window covering 0.25 seconds of the door closing and 0.25 seconds of silence, the two sections could appear to have the same noise level, so the detection would not trigger. I imagine a shorter window would alleviate this problem somewhat.
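The straddling scenario can be checked with a little arithmetic on a synthetic signal. This sketch uses a hypothetical alternating ±amplitude test signal at 4 kHz (not real audio): 0.25 s quiet (amplitude 1), 0.5 s loud (amplitude 100), 0.25 s quiet, which lays out exactly the two 0.5 s windows described above. It then compares them with 0.025 s (100-sample) windows.

```python
def tone(amplitude, n):
    """Hypothetical test signal: n samples alternating +amplitude / -amplitude."""
    return [amplitude if k % 2 == 0 else -amplitude for k in range(n)]


def window_energies(samples, window_size):
    """Sum of squared samples for each non-overlapping window."""
    return [sum(s * s for s in samples[i:i + window_size])
            for i in range(0, len(samples) - window_size + 1, window_size)]


def max_ratio(energies):
    """Largest energy ratio between a window and the one before it."""
    return max(b / a for a, b in zip(energies, energies[1:]))


# 1000 quiet + 2000 loud + 1000 quiet samples = 0.25 s + 0.5 s + 0.25 s at 4 kHz
signal = tone(1, 1000) + tone(100, 2000) + tone(1, 1000)

long_e = window_energies(signal, 2000)   # 0.5 s windows
short_e = window_energies(signal, 100)   # 0.025 s windows

print(long_e)              # [10001000, 10001000]
print(max_ratio(long_e))   # 1.0: the jump is invisible
print(max_ratio(short_e))  # 10000.0: the onset stands out clearly
```

Each 0.5 s window holds 1000 quiet samples (energy 1000) and 1000 loud samples (energy 10,000,000), so their ratio is exactly 1. Here the onset falls on a 100-sample window boundary, which is the best case; an onset in mid-window gives a smaller but still large ratio.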

However, a window that is too short means the rise in the sound may not fit fully into one window, so adjacent sections may appear to differ only slightly in energy, which can cause the sound to be missed.

I believe the WINDOW_SIZE and THRESHOLD are both going to have to be determined empirically for the sound which is going to be detected.

To estimate how many samples this algorithm needs to keep in memory, let's say WINDOW_SIZE is 1/10 of the door-closing sound, which is about 0.025 seconds. At a sampling rate of 4 kHz, that is 100 samples, which is not much of a memory requirement: with 16-bit samples it's 200 bytes.

Advantages / Disadvantages

The advantage of this method is that processing can be performed with simple integer arithmetic if the source audio is fed in as integers. The catch is, as mentioned already, that real-time processing will have a delay, depending on the size of the section that is integrated.
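With integer arithmetic on a small microcontroller, the main thing to watch is the accumulator width. This sketch uses hypothetical helper names. It computes the worst-case energy of a 100-sample window and does the threshold test without division, writing the threshold as a ratio of integers (4/1 here, purely illustrative).

```python
def worst_case_energy(sample_bits, window_size):
    """Largest possible sum of squares for signed samples of the given width."""
    peak = 1 << (sample_bits - 1)  # e.g. 32768 for the most negative int16 value
    return window_size * peak * peak


def louder_than(energy, last_energy, ratio_num=4, ratio_den=1):
    """Integer-only test of energy / last_energy > ratio_num / ratio_den,
    rearranged as a cross-multiplication so no division is needed."""
    return energy * ratio_den > last_energy * ratio_num


for bits in (8, 16):
    e = worst_case_energy(bits, 100)
    print(bits, e, e.bit_length())
# 8 1638400 21
# 16 107374182400 37
```

So with 8-bit samples a 32-bit accumulator is enough for a 100-sample window, but 16-bit samples need up to 37 bits (a few more after multiplying by the threshold numerator), which means a 64-bit accumulator.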

I can think of a couple of problems with this approach:

  1. If the background noise is too loud, the energy difference between the background noise and the door closing will be hard to distinguish, and the door closing may go undetected.
  2. Any abrupt noise, such as a clap, could be mistaken for the door closing.

Perhaps this could be combined with suggestions from the other answers, such as analyzing the frequency signature of the door closing with Fourier analysis; that would require more processing but would make detection less prone to error.
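The Fourier-analysis idea could be sketched like this: compute the magnitude spectrum of a window the energy detector has flagged, then compare its shape with a template spectrum taken from a known door-slam recording. Cosine similarity is just one simple comparison measure chosen for illustration. The template, window length, function names and any acceptance threshold are assumptions, and nothing here is tuned to real door-slam audio. The naive DFT keeps the sketch dependency-free; in practice an FFT routine such as `numpy.fft.rfft` would be used.

```python
import cmath
import math


def magnitude_spectrum(window):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), fine for short windows)."""
    n = len(window)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(window)))
            for k in range(n // 2 + 1)]


def spectral_similarity(spectrum, template):
    """Cosine similarity between two magnitude spectra (1.0 = same shape)."""
    dot = sum(a * b for a, b in zip(spectrum, template))
    norm = (math.sqrt(sum(a * a for a in spectrum))
            * math.sqrt(sum(b * b for b in template)))
    return dot / norm if norm else 0.0


N = 64
# Stand-in "template" and test windows: pure tones, not real door recordings.
template = magnitude_spectrum([math.sin(2 * math.pi * 4 * t / N) for t in range(N)])
same = magnitude_spectrum([0.3 * math.sin(2 * math.pi * 4 * t / N) for t in range(N)])
other = magnitude_spectrum([math.sin(2 * math.pi * 10 * t / N) for t in range(N)])

print(round(spectral_similarity(same, template), 3))   # 1.0
print(round(spectral_similarity(other, template), 3))  # 0.0
```

Because cosine similarity ignores overall loudness, it complements the energy check rather than replacing it: the energy detector says "something loud happened", and the spectral comparison helps decide whether it sounded like the door.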

It's probably going to take some experimentation before finding a way to solve this problem.
