Removing duplicate columns and rows from a NumPy 2D array
Problem description
I'm using a 2D array to store pairs of longitudes and latitudes. At one point, I have to merge two of these 2D arrays and then remove any duplicated entries. I've been searching for a function similar to numpy.unique, but I've had no luck. Every implementation I've come up with looks very unoptimized. For example, I tried converting the array to a list of tuples, removing duplicates with set, and then converting back to an array:
coordskeys = np.array(list(set([tuple(x) for x in coordskeys])))
Are there any existing solutions, so I do not reinvent the wheel?
To make it clear, I'm looking for:
>>> a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
>>> unique_rows(a)
array([[1, 1], [2, 3], [5, 4]])
BTW, I wanted to use just a list of tuples for this, but the lists were so big that they consumed my 4 GB of RAM plus 4 GB of swap (NumPy arrays are more memory-efficient).
Recommended answer
Here's one idea; it'll take a little bit of work but could be quite fast. I'll give you the 1D case and let you figure out how to extend it to 2D. The following function finds the unique elements of a 1D array:
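import numpy as np

def unique(a):
    a = np.sort(a)      # equal values end up next to each other
    b = np.diff(a)      # zero wherever a value repeats its predecessor
    b = np.r_[1, b]     # prepend a nonzero so the first element is always kept
    return a[b != 0]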
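As a variation on the function above, here is a minimal sketch that builds a boolean mask by comparing neighbours directly instead of subtracting them. The name `unique_1d` and the sample data are made up for illustration; the idea (sort, then keep each element that differs from the one before it) is the same.

```python
import numpy as np

def unique_1d(a):
    """Sorted unique values of a 1D array (hypothetical helper name)."""
    a = np.sort(a)
    keep = np.ones(len(a), dtype=bool)  # the first element is always kept
    keep[1:] = a[1:] != a[:-1]          # keep an element only if it differs from its left neighbour
    return a[keep]

values = np.array([3, 1, 2, 3, 1, 5])
result = unique_1d(values)
print(result)  # the sorted distinct values: 1, 2, 3, 5
```

Comparing with `!=` rather than taking a difference should also cover arrays where subtraction is not defined, such as string arrays, and it handles an empty input without a special case.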
Now, to extend it to 2D you need to change two things. First, you will need to figure out how to do the sort yourself; the important thing about the sort is that two identical entries end up next to each other. Second, you'll need to do something like (b != 0).any(axis), because you want to compare the whole row/column: a row is new as soon as it differs from the previous one in any position. Let me know if that's enough to get you started.
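To see which reduction marks a row as different from the one before it, the following small check compares `any` and `all` on an array whose rows are already sorted. The sample array is invented for illustration.

```python
import numpy as np

# Rows already sorted so that identical rows sit next to each other.
rows = np.array([[1, 1],
                 [1, 1],
                 [1, 2],
                 [3, 2]])

# Element-wise test: does this entry differ from the entry in the row above?
changed = np.diff(rows, axis=0) != 0

differs_any = changed.any(axis=1)  # True where at least one column changed
differs_all = changed.all(axis=1)  # True only where every column changed

print(differs_any.tolist())  # [False, True, True]
print(differs_all.tolist())  # [False, False, False]
```

With `all`, the rows `[1, 2]` and `[3, 2]` would be treated as repeats of their predecessors because each shares one column with the row above, so `any` is the reduction that matches "the whole row is identical".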
Update: with some help from doug, I think this should work for the 2D case.
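import numpy as np

def unique(a):
    order = np.lexsort(a.T)           # sort rows so identical rows are adjacent
    a = a[order]
    diff = np.diff(a, axis=0)         # differences between consecutive rows
    ui = np.ones(len(a), dtype=bool)  # the first row is always kept
    ui[1:] = (diff != 0).any(axis=1)  # keep a row if it differs from the previous one
    return a[ui]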
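Here is a short usage sketch of the lexsort approach on the array from the question, with the function renamed `unique_rows` to match the question's wording. Duplicate columns can be handled the same way by transposing first. For comparison, NumPy 1.13 and later accept an `axis` argument in `np.unique`, which answers the "existing solution" part directly; the second array `b` is made up for illustration.

```python
import numpy as np

def unique_rows(a):
    """Lexsort-based row de-duplication, following the answer above."""
    order = np.lexsort(a.T)
    a = a[order]
    keep = np.ones(len(a), dtype=bool)
    keep[1:] = (np.diff(a, axis=0) != 0).any(axis=1)
    return a[keep]

a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
rows = unique_rows(a)
print(rows.tolist())  # [[1, 1], [2, 3], [5, 4]]

# Duplicate columns: transpose, de-duplicate the rows, transpose back.
b = np.array([[1, 2, 1],
              [3, 4, 3]])
cols = unique_rows(b.T).T
print(cols.tolist())  # [[1, 2], [3, 4]]

# Built-in alternative, assuming NumPy >= 1.13.
builtin = np.unique(a, axis=0)
```

Note that `np.lexsort` treats the last key as the primary one, so `unique_rows` orders its result by the last column first, whereas `np.unique(a, axis=0)` orders by the first column first. The two agree on this example but may return the same rows in a different order on other data.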