Find large number of consecutive values fulfilling condition in a numpy array
Problem description
I have some audio data loaded in a numpy array and I wish to segment the data by finding silent parts, i.e. parts where the audio amplitude stays below a certain threshold over a period of time.
An extremely simple way to do this is something like this:
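import re

# Encode each sample as "1" (below the threshold) or "0", then let a regex find long runs of "1"
values = ''.join("1" if abs(x) < SILENCE_THRESHOLD else "0" for x in samples)
pattern = re.compile('1{%d,}' % int(MIN_SILENCE))
for match in pattern.finditer(values):
    # code goes here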
The code above finds parts where at least MIN_SILENCE consecutive elements have an absolute value smaller than SILENCE_THRESHOLD.
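To make the behaviour of the regex approach concrete, here is a small run on made-up data. The threshold, minimum run length, and sample values are assumptions chosen only for illustration; they are not from the question.

```python
import re

# Hypothetical values, chosen only to illustrate the technique
SILENCE_THRESHOLD = 0.1
MIN_SILENCE = 3
samples = [0.5, 0.02, -0.03, 0.01, 0.6, 0.05, -0.04, 0.7, 0.0, 0.0, 0.0, 0.0]

# One character per sample: "1" if quiet, "0" otherwise
values = ''.join("1" if abs(x) < SILENCE_THRESHOLD else "0" for x in samples)
pattern = re.compile('1{%d,}' % int(MIN_SILENCE))

# Each match span is a half-open (start, stop) index range into samples
silent_spans = [match.span() for match in pattern.finditer(values)]

print(values)        # 011101101111
print(silent_spans)  # [(1, 4), (8, 12)]
```

The run of two quiet samples at indices 5 and 6 is skipped because it is shorter than MIN_SILENCE.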
Now, obviously, the above code is horribly inefficient and a terrible abuse of regular expressions. Is there some other method that is more efficient, but still results in equally simple and short code?
Recommended answer
Here's a numpy-based solution.
I think (?) it should be faster than the other options. Hopefully it's fairly clear.
However, it does require twice as much memory as the various generator-based solutions. As long as you can hold a single temporary copy of your data in memory (for the diff), and a boolean array of the same length as your data (numpy stores booleans as one byte per element, not one bit), it should be pretty efficient...
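As a rough check of the memory cost described above, the sizes of the arrays involved can be inspected with `nbytes`. This is only a sketch on an assumed 1000-element float64 array; `np.packbits` is shown as one possible way to get down to one bit per element, which the solution itself does not do.

```python
import numpy as np

# Assumed stand-in for the audio data: 1000 float64 samples
x = np.zeros(1000, dtype=np.float64)

# Boolean mask with the same length as the data
condition = np.abs(x) < 1

data_bytes = x.nbytes                    # 8 bytes per float64 element
mask_bytes = condition.nbytes            # 1 byte per boolean element
diff_bytes = np.diff(condition).nbytes   # one element shorter than the mask

# Optional: pack the mask into bits if memory is tight
packed_bytes = np.packbits(condition).nbytes

print(data_bytes, mask_bytes, diff_bytes, packed_bytes)  # 8000 1000 999 125
```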
import numpy as np

def main():
    # Generate some random data
    x = np.cumsum(np.random.random(1000) - 0.5)
    condition = np.abs(x) < 1

    # Print the start and stop indices of each region where the absolute
    # values of x are below 1, and the min and max of each of these regions
    for start, stop in contiguous_regions(condition):
        segment = x[start:stop]
        print(start, stop)
        print(segment.min(), segment.max())

def contiguous_regions(condition):
    """Finds contiguous True regions of the boolean array "condition". Returns
    a 2D array where the first column is the start index of the region and the
    second column is the end index (exclusive, so x[start:stop] is the region)."""

    # Find the indices of changes in "condition"
    d = np.diff(condition)
    idx, = d.nonzero()

    # We need to start things after the change in "condition". Therefore,
    # we'll shift the index by 1 to the right.
    idx += 1

    if condition[0]:
        # If the start of condition is True, prepend a 0
        idx = np.r_[0, idx]

    if condition[-1]:
        # If the end of condition is True, append the length of the array
        idx = np.r_[idx, condition.size]

    # Reshape the result into two columns
    idx.shape = (-1, 2)
    return idx

main()
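To connect this back to the original question, the region finder can be combined with a minimum-length filter. The following is a minimal sketch, not part of the answer: `silent_regions`, its parameter names, and the sample values are hypothetical, and `contiguous_regions` is repeated in compact form so the block runs on its own.

```python
import numpy as np

def contiguous_regions(condition):
    """Return an (n, 2) array of [start, stop) indices of True runs."""
    d = np.diff(condition)
    idx, = d.nonzero()
    idx += 1
    if condition[0]:
        idx = np.r_[0, idx]
    if condition[-1]:
        idx = np.r_[idx, condition.size]
    return idx.reshape(-1, 2)

def silent_regions(samples, threshold, min_length):
    """Hypothetical helper: keep only quiet runs of at least min_length samples."""
    regions = contiguous_regions(np.abs(samples) < threshold)
    lengths = regions[:, 1] - regions[:, 0]
    return regions[lengths >= min_length]

# Made-up data for illustration only
samples = np.array([0.5, 0.02, -0.03, 0.01, 0.6, 0.05, -0.04, 0.7, 0.0, 0.0, 0.0, 0.0])
regions = silent_regions(samples, threshold=0.1, min_length=3)

print(regions.tolist())  # [[1, 4], [8, 12]]
```

Filtering on `lengths` plays the role of MIN_SILENCE in the regex version, and each row can be used directly as a slice, e.g. `samples[start:stop]`.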