Efficiently counting number of unique elements - NumPy / Python

Question

When running np.unique(), it first flattens the array, sorts the array, then finds the unique values. When I have arrays with shape (10, 3000, 3000), it takes about a second to find the uniques, but this quickly adds up as I need to call np.unique() multiple times. Since I only care about the total number of unique numbers in an array, sorting seems like a waste of time.

Is there a faster way to find the total number of unique values in a large array than np.unique()?

Recommended answer

Here's a method that works for an array with dtype np.uint8 that is faster than np.unique.

First, create an array to work with:

In [127]: import numpy as np

In [128]: a = np.random.randint(1, 128, size=(10, 3000, 3000)).astype(np.uint8)

For later comparison, find the unique values using np.unique:

In [129]: u = np.unique(a)

Here's the faster method; v will contain the result:

In [130]: q = np.zeros(256, dtype=int)

In [131]: q[a.ravel()] = 1

In [132]: v = np.nonzero(q)[0]
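
Since the question only asks for the number of unique values, the same lookup-table idea can be wrapped in a small function that returns a count. The following is a minimal sketch, not part of the original answer: the function name count_unique_uint8 is hypothetical, it assumes the input has dtype np.uint8 (so only 256 values are possible), and it uses a smaller array than the one above to keep the run short.

```python
import numpy as np

def count_unique_uint8(a):
    """Count distinct values in a uint8 array without sorting (hypothetical helper)."""
    if a.dtype != np.uint8:
        raise TypeError("this sketch only handles dtype uint8")
    seen = np.zeros(256, dtype=bool)  # one flag per possible uint8 value
    seen[a.ravel()] = True            # mark every value that occurs in a
    return int(np.count_nonzero(seen))

rng = np.random.default_rng(0)
# Smaller than the (10, 3000, 3000) array in the answer, to keep the run short.
a = rng.integers(1, 128, size=(10, 300, 300), dtype=np.uint8)
n_unique = count_unique_uint8(a)
print(n_unique)
```

The flag array plays the same role as q in the session above; the only difference is that the flags are counted instead of being converted back into the values themselves.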

Verify that we got the same result:

In [133]: np.array_equal(u, v)
Out[133]: True

Timing:

In [134]: %timeit u = np.unique(a)
2.86 s ± 9.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [135]: %timeit q = np.zeros(256, dtype=int); q[a.ravel()] = 1; v = np.nonzero(q)
300 ms ± 5.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So 2.86 seconds for np.unique(), and 0.3 seconds for the alternative method.
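
To repeat the comparison outside IPython, where %timeit is not available, something like the following sketch could be used. It relies on the standard-library timeit module; the function names with_unique and with_lookup are hypothetical, and the array is deliberately smaller than the one above, so the absolute numbers will differ from those reported here and will vary with the machine and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
# Smaller than the (10, 3000, 3000) array in the answer, to keep the run short.
a = rng.integers(1, 128, size=(10, 300, 300), dtype=np.uint8)

def with_unique(arr):
    # Baseline: sort-based np.unique.
    return np.unique(arr)

def with_lookup(arr):
    # Lookup-table method from the answer (uint8 input assumed).
    q = np.zeros(256, dtype=int)
    q[arr.ravel()] = 1
    return np.nonzero(q)[0]

# Take the best of 5 single runs for each method.
t_unique = min(timeit.repeat(lambda: with_unique(a), number=1, repeat=5))
t_lookup = min(timeit.repeat(lambda: with_lookup(a), number=1, repeat=5))
print(f"np.unique:    {t_unique:.4f} s")
print(f"lookup table: {t_lookup:.4f} s")
```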
