Counting how many times a row occurs in a matrix (numpy)

Question

Is there a better way to count how many times a given row appears in a 2D numpy array than this?

import numpy as np

def get_count(array_2d, row):
    count = 0
    # iterate over the rows and compare each one with the target row
    for r in array_2d:
        if np.equal(r, row).all():
            count += 1
    return count

# let's make sure it works

array_2d = np.array([[1, 2], [3, 4]])
row = np.array([1, 2])

count = get_count(array_2d, row)
assert count == 1

Recommended answer

One simple way is to use broadcasting:

(array_2d == row).all(-1).sum()
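To make the intermediate steps visible, here is a minimal sketch on a small made-up array. The comparison broadcasts the row against every row of the array, `.all(-1)` reduces each row to one boolean, and `.sum()` counts the True values. The helper name `count_row_broadcast` and the sample data are assumptions for illustration, not part of the original answer.

```python
import numpy as np

def count_row_broadcast(array_2d, row):
    # Hypothetical helper name; it only wraps the one-liner from the answer.
    matches = (array_2d == row)      # shape (n_rows, n_cols): elementwise comparison
    row_hits = matches.all(axis=-1)  # shape (n_rows,): True where every column matches
    return int(row_hits.sum())       # number of fully matching rows

# Made-up sample data: the row [1, 2] appears twice.
sample = np.array([[1, 2], [3, 4], [1, 2], [1, 5]])
target = np.array([1, 2])

broadcast_count = count_row_broadcast(sample, target)
print(broadcast_count)
```

Note that the intermediate boolean array has the same shape as the input, which is the memory cost the next approach tries to reduce.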


For better memory efficiency, here is an approach that treats each row of array_2d as an index tuple on an n-dimensional grid and collapses it to a single linear index, so that rows can be compared as scalars. It assumes the inputs contain only non-negative integers:

dims = np.maximum(array_2d.max(0),row) + 1
array_1d = np.ravel_multi_index(array_2d.T,dims)
row_scalar = np.ravel_multi_index(row,dims)
count = (array_1d==row_scalar).sum()
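A worked sketch of the same idea on made-up data may help. The function name `count_row_linear` and the sample values are assumptions for illustration. The sketch also assumes non-negative integer entries and that the product of `dims` fits in a platform integer, since every row is mapped to one linear index on that grid.

```python
import numpy as np

def count_row_linear(array_2d, row):
    # Hypothetical helper; assumes non-negative integer entries.
    array_2d = np.asarray(array_2d)
    row = np.asarray(row)
    # The grid extent per column must cover both the array and the query row.
    dims = tuple(int(d) for d in np.maximum(array_2d.max(axis=0), row) + 1)
    # Each row (taken column-wise from the transpose) becomes one linear index.
    array_1d = np.ravel_multi_index(tuple(array_2d.T), dims)
    row_scalar = np.ravel_multi_index(tuple(row), dims)
    return int((array_1d == row_scalar).sum())

sample = np.array([[1, 2], [3, 4], [1, 2], [1, 5]])
target = np.array([1, 2])

# Here dims is (4, 6), so the rows map to the linear indices 8, 22, 8 and 11.
linear_count = count_row_linear(sample, target)
print(linear_count)
```

The comparison now runs on a 1D array of length n_rows instead of a 2D boolean array, which is where the memory saving comes from.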

Here's a post discussing the various aspects related to it.

Note: np.count_nonzero can be much faster than .sum() for counting booleans, so consider using it in both of the approaches above.
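Applied to the two approaches above, the swap might look like the sketch below. The variable names `count_a` and `count_b` and the sample data are assumptions for illustration. The inputs are converted to tuples before calling `np.ravel_multi_index`, which is a cautious choice here rather than something the original answer does.

```python
import numpy as np

array_2d = np.array([[1, 2], [3, 4], [1, 2]])
row = np.array([1, 2])

# Broadcasting approach, counting with np.count_nonzero instead of .sum()
count_a = np.count_nonzero((array_2d == row).all(-1))

# Linear-index approach (non-negative integers assumed), same swap
dims = tuple(int(d) for d in np.maximum(array_2d.max(0), row) + 1)
array_1d = np.ravel_multi_index(tuple(array_2d.T), dims)
row_scalar = np.ravel_multi_index(tuple(row), dims)
count_b = np.count_nonzero(array_1d == row_scalar)

print(count_a, count_b)
```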

Here's a quick runtime test:

In [74]: arr = np.random.rand(10000)>0.5

In [75]: %timeit arr.sum()
10000 loops, best of 3: 29.6 µs per loop

In [76]: %timeit np.count_nonzero(arr)
1000000 loops, best of 3: 1.21 µs per loop
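The session above uses IPython's %timeit. Outside IPython, a comparable measurement could be sketched with the standard-library timeit module, as below. The seed, the repeat count and the variable names are assumptions, and the absolute timings will differ by machine and NumPy version.

```python
import timeit
import numpy as np

# Boolean test array of the same size as in the session above.
rng = np.random.default_rng(0)
arr = rng.random(10000) > 0.5

n_calls = 1000  # arbitrary repeat count chosen for this sketch
t_sum = timeit.timeit(lambda: arr.sum(), number=n_calls)
t_cnz = timeit.timeit(lambda: np.count_nonzero(arr), number=n_calls)

print(f"arr.sum():             {t_sum / n_calls * 1e6:.2f} µs per call")
print(f"np.count_nonzero(arr): {t_cnz / n_calls * 1e6:.2f} µs per call")
```

Both calls return the same count, so the comparison is only about speed.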
