Efficiently determining if large sorted numpy array has only unique values

Problem description

I have a very large numpy array and I want to sort it and test if it is unique.

I'm aware of the function numpy.unique but it sorts the array another time to achieve it.

The reason I need the array sorted a priori is because the returned keys from the argsort function will be used to reorder another array.

I'm looking for a way to do both (argsort and unique test) without the need to sort the array again.

Example code:

import numpy as np

# generate random 1-D arrays with 2 ** 27 elements (they can grow even bigger!)
# np.random.random_integers is deprecated; randint with an exclusive upper bound covers the same range
slices = np.random.randint(1, 2 ** 32 + 1, size=2 ** 27, dtype=np.int64)
values = np.random.randint(1, 2 ** 32 + 1, size=2 ** 27, dtype=np.int64)

# get an array of keys to sort slices AND values
# this operation takes a long time
sorted_slices = slices.argsort()

# sort both arrays
# it would be nice to make this operation in place
slices = slices[sorted_slices]
values = values[sorted_slices]

# test 'uniqueness'
# here, the np.unique function sorts the array again
if slices.shape[0] == np.unique(slices).shape[0]:
    print('it is unique!')
else:
    print('not unique!')

Both the arrays slices and values have 1 row and the same (huge) number of columns.

Thanks in advance.

Recommended answer

Because the array is already sorted, any duplicate values sit next to each other, so you can detect them by checking whether the difference between any two adjacent elements is 0:

numpy.any(numpy.diff(slices) == 0)

Be aware, though, that numpy will create two intermediate arrays, each with one element fewer than the input: one with the difference values and one with the boolean values.
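
If the extra memory is a concern, one possible variation (not part of the original answer) is to compare the array with a shifted view of itself. Slicing a numpy array returns a view, so only the boolean result is allocated. The helper name `sorted_has_duplicates` and the small sample arrays below are made up for illustration; this is a minimal sketch that assumes a 1-D array that has already been sorted.

```python
import numpy as np

def sorted_has_duplicates(sorted_arr):
    """Return True if a 1-D, already sorted array contains a repeated value.

    Hypothetical helper for illustration; it assumes the input is sorted.
    """
    # sorted_arr[1:] and sorted_arr[:-1] are views, not copies, so the only
    # new allocation is the boolean array produced by the == comparison.
    return bool(np.any(sorted_arr[1:] == sorted_arr[:-1]))

# Small stand-ins for the large arrays in the question.
slices = np.array([5, 3, 9, 3, 7])
values = np.array([50, 30, 90, 31, 70])

# Sort once and reuse the keys to reorder both arrays.
order = slices.argsort()
slices = slices[order]
values = values[order]

if sorted_has_duplicates(slices):
    print('not unique!')
else:
    print('it is unique!')
```

The check is a single linear pass over the sorted data, so it avoids the second sort that `np.unique` would perform.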
