Generating numpy array of indices for a deduplicated set of points
Problem description
I have an array of at least tens of thousands of points (up to 3 billion), some of which are duplicated. I'd like to deduplicate the points and generate an index array that maps each original point, in its original order, to its position in the deduplicated array.
For example:
x = [(0, 0),  # (x1, y1)
     (1, 0),  # (x2, y2)
     (1, 1),  # (x3, y3)
     (0, 0)]  # (x4, y4)
Deduplicating x, we have y:
y = list(set(x)) = [(1, 0),  # (x2, y2)
                    (0, 0),  # (x1, y1) and (x4, y4)
                    (1, 1)]  # (x3, y3)
And then we would have a resulting index array, z:
z = [1,  # (x1, y1)
     0,  # (x2, y2)
     2,  # (x3, y3)
     1]  # (x4, y4)
Is there a numpy-like way of obtaining z? Here's a brute-force implementation:
z = []
for each_point in x:
    index = y.index(each_point)
    z.append(index)
Solution
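import numpy as np

x = np.array([(0, 0), (1, 0), (1, 1), (0, 0)])  # the example points, shape (n, 2)
# View each row as a single opaque np.void item (flattened to 1-D) so np.unique compares whole rows.
x2 = np.ascontiguousarray(x).view(np.dtype((np.void, x.dtype.itemsize * x.shape[1]))).ravel()
y_temp, z = np.unique(x2, return_inverse=True)
# Convert the unique void items back to rows of the original dtype.
y = y_temp.view(x.dtype).reshape(-1, x.shape[1])
print(y)
print(z)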
yields
[[0 0]
[1 0]
[1 1]]
and
[0 1 2 0]
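As a possible alternative, NumPy 1.13 and later accept an axis argument in np.unique, which may make the void-view step unnecessary. The following is only a minimal sketch on the example points from the question, not a tested recipe for billions of points: np.unique sorts its input and makes copies, so memory use at that scale is something to check separately. The reshape of z is a defensive assumption, added because the shape of the inverse array has varied between NumPy 2.0.x releases.

```python
import numpy as np

# The example points from the question, as an (n, 2) integer array.
x = np.array([(0, 0), (1, 0), (1, 1), (0, 0)])

# axis=0 makes np.unique treat each row as one item.
y, z = np.unique(x, axis=0, return_inverse=True)

# Force z to be 1-D regardless of the NumPy version in use.
z = np.asarray(z).reshape(-1)

print(y)
print(z)

# Indexing the unique rows with z rebuilds the original sequence of points.
assert np.array_equal(y[z], x)
```

Note that y comes back sorted row by row, so its order differs from list(set(x)); z is consistent with whichever order y has, which is what matters for looking points up again.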
Credit: Find unique rows in numpy.array