Best way to count Greater Than in numpy 2d array


Question

results is a 2D NumPy array of size 300000.

count = 0
for i in range(np.size(results, 0)):
    if results[i][0] >= 0.7:
        count += 1

This Python code takes me 0.7 seconds, but the same loop in C++ takes less than 0.07 seconds.
So how can I make this Python code as fast as possible?

Recommended answer

When doing numerical computation where speed matters, especially in Python, you want to avoid for loops wherever possible. NumPy is optimized for "vectorized" computation, so you want to hand off the work you'd typically do in a for loop to NumPy indexing and functions like where.
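As a minimal sketch of what "vectorized" means here, the loop and a boolean-mask version can be compared side by side. The tiny array and the names count_loop, mask and count_vectorized are illustrative assumptions, not taken from the question:

```python
import numpy as np

# Small illustrative array; the question's `results` is far larger.
results = np.array([[0.9, 0.1],
                    [0.7, 0.5],
                    [0.2, 0.8],
                    [0.75, 0.3]])

# Loop version: one Python-level comparison per row.
count_loop = 0
for i in range(results.shape[0]):
    if results[i, 0] >= 0.7:
        count_loop += 1

# Vectorized version: compare the whole first column at once, then
# sum the resulting boolean mask (True counts as 1).
mask = results[:, 0] >= 0.7
count_vectorized = int(mask.sum())

print(count_loop, count_vectorized)
```

Both versions count the same rows; the difference is that the second does the comparison inside NumPy's compiled code rather than in the Python interpreter.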

I did a quick test on a 300,000 x 600 array of random values from 0 to 1 and found the following.
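The setup code for that test is not shown in the answer. A test array of this kind could be built roughly as follows; the helper name make_results, the seed, and the use of np.random.default_rng are assumptions made for illustration:

```python
import numpy as np

def make_results(rows, cols, seed=0):
    """Return a rows x cols array of uniform random floats in [0, 1)."""
    rng = np.random.default_rng(seed)  # seeded so runs are repeatable
    return rng.random((rows, cols))

# The answer used 300,000 x 600, which needs about 1.4 GB as float64;
# a smaller shape is used here so the sketch runs quickly.
results = make_results(30_000, 6)
print(results.shape, results.dtype)
```

Only the first column is examined by the snippets that follow, so the number of columns mainly affects memory use.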

Your code, non-vectorized with one for loop:
226 ms per run

%%timeit
count = 0
for i in range(np.size(results, 0)):
    if results[i][0] >= 0.7:
        count += 1

emilaz's Solution:
8.36 ms per run

%%timeit
first_col = results[:,0]
x = len(first_col[first_col>=.7])

Ethan's Solution:
7.84 ms per run

%%timeit
np.bincount(results[:,0]>=.7)[1]

Best I came up with:
6.92 ms per run

%%timeit
len(np.where(results[:,0] >= 0.7)[0])

All 4 methods yielded the same answer, which for my data was 90,134. Hope this helps!
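To check that agreement on your own data, a sketch like the one below runs each approach on the same column and compares the counts. The small random array is a stand-in for the real data, and every snippet uses >= to match the question's loop. np.count_nonzero is included as one more option; it was not timed in the answer above, so no speed claim is made for it:

```python
import numpy as np

rng = np.random.default_rng(0)
results = rng.random((10_000, 4))  # small stand-in for the real data
col = results[:, 0]

counts = {
    "loop": sum(1 for v in col if v >= 0.7),
    "mask_len": len(col[col >= 0.7]),
    "bincount": int(np.bincount(col >= 0.7)[1]),
    "where": len(np.where(col >= 0.7)[0]),
    "count_nonzero": int(np.count_nonzero(col >= 0.7)),
}
print(counts)
```

Note that the bincount variant indexes element [1], so it raises an IndexError when no value meets the threshold; the other variants return 0 in that case.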
