Efficient thresholding filter of an array with numpy


Problem description

I need to filter an array to remove the elements that are lower than a certain threshold. My current code is like this:

import numpy

threshold = 5
a = numpy.array(range(10))  # testing data
b = numpy.array(list(filter(lambda x: x >= threshold, a)))  # list() is needed on Python 3

The problem is that this creates a temporary list and calls a Python lambda function on every element, which is slow.

As this is a quite simple operation, maybe there is a numpy function that does it in an efficient way, but I've been unable to find it.

I've thought that another way to achieve this could be sorting the array, finding the index of the threshold and returning a slice from that index onwards. But even if this were faster for small inputs (where the difference won't be noticeable anyway), it's definitely asymptotically less efficient as the input size grows.
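The sort-and-slice idea described above could be sketched roughly as follows. This is only an illustration: the helper name threshold_by_sorting is made up here, and the sketch assumes the order of the surviving elements does not matter, since sorting discards the original order.

```python
import numpy as np

def threshold_by_sorting(a, threshold):
    """Hypothetical helper: keep elements >= threshold via sort + slice."""
    s = np.sort(a)  # O(n log n); returns a sorted copy, a itself is untouched
    # side="left" gives the first index i with s[i] >= threshold
    start = np.searchsorted(s, threshold, side="left")
    return s[start:]

a = np.array(range(10))
b = threshold_by_sorting(a, 5)
print(b)
```

The sort dominates the cost at O(n log n), whereas a single pass over the array is O(n), which is why this approach is expected to lose as the input grows.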

Any ideas? Thanks!

Update: I took some measurements too, and sorting + slicing was still twice as fast as the pure Python filter when the input had 100,000,000 entries.

In [321]: r = numpy.random.uniform(0, 1, 100000000)

In [322]: %timeit test1(r) # filter
1 loops, best of 3: 21.3 s per loop

In [323]: %timeit test2(r) # sort and slice
1 loops, best of 3: 11.1 s per loop

In [324]: %timeit test3(r) # boolean indexing
1 loops, best of 3: 1.26 s per loop

Solution

b = a[a >= threshold] should do it (>= matches the filter in the question, which keeps elements equal to the threshold).
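Spelled out in two steps on the testing data from the question, boolean indexing looks roughly like this. The variable name mask is only illustrative, and >= is used to match the filter in the question.

```python
import numpy as np

threshold = 5
a = np.array(range(10))  # same testing data as in the question

mask = a >= threshold    # boolean array, True where the element is kept
b = a[mask]              # selected elements, in their original order

print(mask)
print(b)
```

The comparison runs element-wise in compiled code, so no Python-level function is called per element. The result of boolean indexing is a new array, not a view, so modifying b does not change a.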

I tested as follows:

import datetime
import numpy as np

# array of 1,000,000 zeros followed by 1,000,000 ones
lrg = np.arange(2).reshape((2, -1)).repeat(1000000, -1).flatten()

# boolean indexing
t0 = datetime.datetime.now()
flt = lrg[lrg == 0]
print(datetime.datetime.now() - t0)

# pure Python filter (list() is needed on Python 3, where filter returns an iterator)
t0 = datetime.datetime.now()
flt = np.array(list(filter(lambda x: x == 0, lrg)))
print(datetime.datetime.now() - t0)

I got

$ python test.py
0:00:00.028000
0:00:02.461000
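For anyone who wants to repeat the comparison, a variant based on the standard timeit module might look like the sketch below. The function names with_mask and with_filter are made up for this example, and the array is smaller than in the test above so that it finishes quickly. Absolute numbers depend on the machine and the NumPy version, so no expected output is shown.

```python
import timeit
import numpy as np

# 100,000 zeros followed by 100,000 ones (smaller than the test above)
lrg = np.repeat(np.arange(2), 100_000)

def with_mask():
    # boolean indexing: the comparison and the selection run in compiled code
    return lrg[lrg == 0]

def with_filter():
    # pure Python filter: the lambda is called once per element
    return np.array(list(filter(lambda x: x == 0, lrg)))

# best of 3 single runs, similar in spirit to %timeit's "best of 3"
t_mask = min(timeit.repeat(with_mask, number=1, repeat=3))
t_filter = min(timeit.repeat(with_filter, number=1, repeat=3))

print(f"boolean mask: {t_mask:.4f} s")
print(f"filter:       {t_filter:.4f} s")
```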

http://docs.scipy.org/doc/numpy/user/basics.indexing.html#boolean-or-mask-index-arrays
