Find large number of consecutive values fulfilling condition in a numpy array


Question

I have some audio data loaded in a numpy array and I wish to segment the data by finding silent parts, i.e. parts where the audio amplitude stays below a certain threshold over a period of time.

An extremely simple way to do this is something like this:

import re

values = ''.join("1" if abs(x) < SILENCE_THRESHOLD else "0" for x in samples)
pattern = re.compile('1{%d,}' % int(MIN_SILENCE))
for match in pattern.finditer(values):
    # match.start() and match.end() bound one silent run; code goes here
    pass

The code above finds parts where there are at least MIN_SILENCE consecutive elements whose absolute value is smaller than SILENCE_THRESHOLD.
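
To make the behaviour concrete, here is a small worked run of the same regex approach. The sample values and the two constants are made up for illustration and are not from the original question.

```python
import re

# Hypothetical values chosen only to illustrate the approach
SILENCE_THRESHOLD = 0.05
MIN_SILENCE = 3
samples = [0.9, 0.01, 0.02, -0.01, 0.8, 0.02, 0.7]

# One character per sample: "1" if quiet, "0" otherwise
values = "".join("1" if abs(x) < SILENCE_THRESHOLD else "0" for x in samples)

# Match runs of at least MIN_SILENCE consecutive "1" characters
pattern = re.compile("1{%d,}" % MIN_SILENCE)
runs = [(m.start(), m.end()) for m in pattern.finditer(values)]

print(values)  # 0111010
print(runs)    # [(1, 4)]
```

The single quiet sample at index 5 is not reported because its run is shorter than MIN_SILENCE.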

Now, obviously, the above code is horribly inefficient and a terrible abuse of regular expressions. Is there some other method that is more efficient, but still results in equally simple and short code?

Recommended answer

Here's a numpy-based solution.

I think (?) it should be faster than the other options. Hopefully it's fairly clear.

However, it does require roughly twice as much memory as the various generator-based solutions. As long as you can hold a single temporary copy of your data in memory (for the diff), and a boolean array of the same length as your data (NumPy stores one byte per boolean element, not one bit), it should be pretty efficient...

import numpy as np

def main():
    # Generate some random data
    x = np.cumsum(np.random.random(1000) - 0.5)
    condition = np.abs(x) < 1
    
    # Print the start and stop indices of each region where the absolute 
    # values of x are below 1, and the min and max of each of these regions
    for start, stop in contiguous_regions(condition):
        segment = x[start:stop]
        print(start, stop)
        print(segment.min(), segment.max())

def contiguous_regions(condition):
    """Finds contiguous True regions of the boolean array "condition". Returns
    a 2D array where the first column is the start index of the region and the
    second column is the stop index (exclusive), so x[start:stop] is the region."""

    # Find the indices of changes in "condition"
    d = np.diff(condition)
    idx, = d.nonzero() 

    # We need to start things after the change in "condition". Therefore, 
    # we'll shift the index by 1 to the right.
    idx += 1

    if condition[0]:
        # If the start of condition is True prepend a 0
        idx = np.r_[0, idx]

    if condition[-1]:
        # If the end of condition is True, append the length of the array
        idx = np.r_[idx, condition.size]

    # Reshape the result into two columns: (start, stop)
    return idx.reshape(-1, 2)

if __name__ == "__main__":
    main()
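
The answer returns every contiguous region, while the question also asks for a minimum run length. A minimal sketch of that last step follows. The `silent_regions` helper and the sample values are hypothetical. This `contiguous_regions` is also a padded variant written for this example rather than the answer's exact code: it pads the mask with False on both sides so that runs touching either end need no special case.

```python
import numpy as np

def contiguous_regions(condition):
    """Return an (n, 2) array of [start, stop) index pairs, one per run of True."""
    condition = np.asarray(condition, dtype=bool)
    # Pad with False so every run has both a rising and a falling edge
    padded = np.concatenate(([False], condition, [False]))
    # Positions where the padded mask changes value, alternating start, stop
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return edges.reshape(-1, 2)

def silent_regions(samples, threshold, min_length):
    """Hypothetical helper: keep only quiet runs of at least min_length samples."""
    regions = contiguous_regions(np.abs(samples) < threshold)
    lengths = regions[:, 1] - regions[:, 0]
    return regions[lengths >= min_length]

# Made-up data for illustration only
samples = np.array([0.9, 0.01, 0.02, -0.01, 0.8, 0.02, 0.7, 0.0, 0.01, 0.0, -0.02])
print(silent_regions(samples, threshold=0.05, min_length=3).tolist())
# [[1, 4], [7, 11]]
```

Because the stop index is exclusive, `samples[start:stop]` gives each silent segment directly, and the run length is simply `stop - start`.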
