Removing duplicate columns and rows from a NumPy 2D array


Problem description

I'm using 2D arrays to store longitude/latitude pairs. At one point I have to merge two of these 2D arrays and then remove any duplicated entries. I've been searching for a function similar to numpy.unique, but I've had no luck. Every implementation I've come up with looks very "unoptimized". For example, I've tried converting the array to a list of tuples, removing duplicates with set, and then converting back to an array:

coordskeys = np.array(list(set([tuple(x) for x in coordskeys])))

Are there any existing solutions, so I do not reinvent the wheel?

To make it clear, I'm looking for:

>>> a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
>>> unique_rows(a)
array([[1, 1],
       [2, 3],
       [5, 4]])

BTW, I wanted to use just a list of tuples for this, but the lists were so big that they consumed my 4 GB of RAM plus 4 GB of swap (NumPy arrays are more memory efficient).
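
For reference, the merge-then-deduplicate step described above can be sketched roughly as follows. The arrays `coords_a` and `coords_b` are made-up sample data, and `merge_unique_via_set` is a hypothetical helper name. It only wraps the set-of-tuples approach from the question and is not meant as a recommended solution.

```python
import numpy as np

def merge_unique_via_set(first, second):
    """Stack two (n, 2) arrays and drop duplicate rows via a set of tuples."""
    merged = np.vstack([first, second])
    # Every row becomes a Python tuple, which is why memory use grows quickly.
    unique_tuples = set(map(tuple, merged))
    # Sets are unordered, so sort to get a deterministic result.
    return np.array(sorted(unique_tuples))

coords_a = np.array([[1, 1], [2, 3], [1, 1]])  # sample data (assumed)
coords_b = np.array([[5, 4], [2, 3]])          # sample data (assumed)
result = merge_unique_via_set(coords_a, coords_b)
print(result)
```

The intermediate set holds one tuple object per distinct row, which matches the memory problem mentioned above for very large inputs.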

Solution

Here's one idea. It'll take a little bit of work but could be quite fast. I'll give you the 1d case and let you figure out how to extend it to 2d. The following function finds the unique elements of a 1d array:

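import numpy as np

def unique(a):
    a = np.sort(a)        # sorting puts equal values next to each other
    b = np.diff(a)        # zero wherever a value repeats its predecessor
    b = np.r_[1, b]       # prepend a nonzero entry so the first element is always kept
    return a[b != 0]      # keep only elements that differ from the previous one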
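
To make the intermediate steps concrete, here is a small walk-through of the same idea on made-up input. The name `unique_1d` and the sample array `values` are only for illustration.

```python
import numpy as np

def unique_1d(a):
    """Same sort-and-diff idea as above, written compactly."""
    a = np.sort(a)
    keep = np.r_[1, np.diff(a)] != 0
    return a[keep]

values = np.array([3, 1, 2, 3, 1])   # sample input (assumed)
sorted_values = np.sort(values)       # [1 1 2 3 3]
steps = np.diff(sorted_values)        # [0 1 1 0]
mask = np.r_[1, steps] != 0           # [ True False  True  True False]
uniques = unique_1d(values)
print(uniques)                        # [1 2 3]
```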

Now to extend it to 2d you need to change two things. First, you will need to figure out how to do the sort yourself; the important thing about the sort is that identical entries end up next to each other. Second, you'll need to do something like (b != 0).any(axis), because you want to compare the whole row/column: a row differs from its neighbor if any of its elements differ. Let me know if that's enough to get you started.
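
To see how the row-wise reduction behaves, here is a small sketch on an array that is already sorted (sample data, assumed). Note that `.any(axis=1)` marks a row as new when at least one element changed, whereas `.all(axis=1)` requires every element to change and would therefore treat `[1, 2]` as a repeat of `[1, 1]`.

```python
import numpy as np

# Rows are already ordered so that identical rows are adjacent (sample data, assumed).
rows = np.array([[1, 1], [1, 1], [1, 2], [2, 3]])

changed = np.diff(rows, axis=0) != 0   # element-wise change between neighboring rows
differs_any = changed.any(axis=1)      # True where at least one element changed
differs_all = changed.all(axis=1)      # True only where every element changed

print(differs_any)   # [False  True  True]
print(differs_all)   # [False False  True]
```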

Update: With some help from doug, I think this should work for the 2d case.

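import numpy as np

def unique(a):
    order = np.lexsort(a.T)            # row order in which identical rows become adjacent
    a = a[order]
    diff = np.diff(a, axis=0)          # differences between consecutive rows
    ui = np.ones(len(a), 'bool')       # the first row is always kept
    ui[1:] = (diff != 0).any(axis=1)   # keep a row if it differs from the previous one
    return a[ui]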
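
As a quick check, the function can be run on the example from the question. The sketch below renames it to `unique_rows` to match the question (my naming, not the answer's) and also compares it against `np.unique(a, axis=0)`, which is available in NumPy 1.13 and later. In general the two may return rows in a different order, because `np.lexsort(a.T)` uses the last column as the primary sort key.

```python
import numpy as np

def unique_rows(a):
    """Drop duplicate rows using the lexsort-and-diff approach shown above."""
    order = np.lexsort(a.T)
    a = a[order]
    keep = np.ones(len(a), dtype=bool)
    keep[1:] = (np.diff(a, axis=0) != 0).any(axis=1)
    return a[keep]

a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
result = unique_rows(a)
print(result)

# Built-in alternative in NumPy 1.13+ (rows come back sorted lexicographically).
builtin = np.unique(a, axis=0)
print(builtin)
```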
