Count occurrences of arrays in multidimensional arrays in Python
Question
I have arrays of the following type:
a = array([[1,1,1],
           [1,1,1],
           [1,1,1],
           [2,2,2],
           [2,2,2],
           [2,2,2],
           [3,3,0],
           [3,3,0],
           [3,3,0]])
I would like to count the number of occurrences of each distinct row, such as
[1,1,1]:3, [2,2,2]:3, and [3,3,0]: 3
How can I achieve this in Python? Is it possible without using a for loop and counting into a dictionary? It has to be fast and should take less than about 0.1 seconds. I looked into Counter, numpy bincount, etc., but those work on individual elements, not on whole arrays.
Thank you.
Recommended answer
You could convert those rows to a 1D array by treating the elements of each row as multi-dimensional indices with np.ravel_multi_index. Then use np.unique, which gives the position of the first occurrence of each unique row and, through its optional return_counts argument, the counts as well. Note that this assumes the array holds non-negative integers. The implementation would look something like this -
import numpy as np

def unique_rows_counts(a):
    # Calculate linear indices using rows from a
    lidx = np.ravel_multi_index(a.T, a.max(0) + 1)
    # Get the unique indices and their counts
    _, unq_idx, counts = np.unique(lidx, return_index=True, return_counts=True)
    # Return the unique groups from a and their respective counts
    return a[unq_idx], counts
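To see why this works, it can help to look at the intermediate linear indices for the array from the question. The following is only an illustrative sketch; the variable names `dims` and `lidx` are chosen here for the example, and it assumes non-negative integer entries, which np.ravel_multi_index requires.

```python
import numpy as np

# The 9x3 array from the question
a = np.array([[1, 1, 1]] * 3 + [[2, 2, 2]] * 3 + [[3, 3, 0]] * 3)

# Shape of the smallest grid that can hold every row as a coordinate
dims = a.max(0) + 1          # array([4, 4, 3])

# Each row (i, j, k) maps to the single integer i*(4*3) + j*3 + k
lidx = np.ravel_multi_index(a.T, dims)
print(lidx)                  # [16 16 16 32 32 32 45 45 45]
```

Identical rows collapse to identical integers, so counting rows reduces to counting values in a 1D array, which np.unique handles directly.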
Sample run -
In [64]: a
Out[64]:
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [2, 2, 2],
       [2, 2, 2],
       [2, 2, 2],
       [3, 3, 0],
       [3, 3, 0],
       [3, 3, 0]])
In [65]: unqrows, counts = unique_rows_counts(a)
In [66]: unqrows
Out[66]:
array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 0]])
In [67]: counts
Out[67]: array([3, 3, 3])
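As an alternative that is not part of the answer above: NumPy 1.13 and later accept an `axis` argument in np.unique, which finds unique rows directly and does not require non-negative values. A minimal sketch, using the hypothetical helper name `unique_rows_counts_axis`, that also converts the result into the dictionary form the question asks about:

```python
import numpy as np

def unique_rows_counts_axis(a):
    # axis=0 treats each row as one item; unique rows come back sorted
    rows, counts = np.unique(a, axis=0, return_counts=True)
    return rows, counts

a = np.array([[1, 1, 1]] * 3 + [[2, 2, 2]] * 3 + [[3, 3, 0]] * 3)
rows, counts = unique_rows_counts_axis(a)

# Build {row tuple: count}; tolist() yields plain Python ints
as_dict = {tuple(r): c for r, c in zip(rows.tolist(), counts.tolist())}
print(as_dict)   # {(1, 1, 1): 3, (2, 2, 2): 3, (3, 3, 0): 3}
```

Whether this is faster or slower than the linear-index approach depends on the data and the NumPy version, so it is worth timing both on the real input.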
Benchmarking
Assuming you are okay with either numpy arrays or collections as outputs, one can benchmark the solutions provided thus far, like so -
Function definitions:
import numpy as np
from collections import Counter

def unique_rows_counts(a):
    lidx = np.ravel_multi_index(a.T, a.max(0) + 1)
    _, unq_idx, counts = np.unique(lidx, return_index=True, return_counts=True)
    return a[unq_idx], counts

def map_Counter(a):
    return Counter(map(tuple, a))

def forloop_Counter(a):
    c = Counter()
    for x in a:
        c[tuple(x)] += 1
    return c
Timings:
In [53]: a = np.random.randint(0,4,(10000,5))
In [54]: %timeit map_Counter(a)
10 loops, best of 3: 31.7 ms per loop
In [55]: %timeit forloop_Counter(a)
10 loops, best of 3: 45.4 ms per loop
In [56]: %timeit unique_rows_counts(a)
1000 loops, best of 3: 1.72 ms per loop
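The %timeit lines above only work inside IPython. For a plain script, a rough equivalent can be sketched with the standard-library timeit module. The seed, repeat counts, and the `same` sanity check below are choices made for this sketch, not part of the original benchmark, and the numbers will differ by machine and NumPy version.

```python
import timeit
from collections import Counter

import numpy as np

def unique_rows_counts(a):
    lidx = np.ravel_multi_index(a.T, a.max(0) + 1)
    _, unq_idx, counts = np.unique(lidx, return_index=True, return_counts=True)
    return a[unq_idx], counts

def map_Counter(a):
    return Counter(map(tuple, a))

np.random.seed(0)                      # arbitrary seed, for repeatability only
a = np.random.randint(0, 4, (10000, 5))

# Sanity check: both approaches should report the same counts per row
rows, counts = unique_rows_counts(a)
from_numpy = {tuple(r): c for r, c in zip(rows.tolist(), counts.tolist())}
same = from_numpy == dict(map_Counter(a))
print("results agree:", same)

# Best-of-3 timing, 5 calls per measurement
for fn in (map_Counter, unique_rows_counts):
    best = min(timeit.repeat(lambda: fn(a), number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```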