Removing duplicate columns and rows from a NumPy 2D array

Problem description

I'm using a 2D array to store longitude+latitude pairs. At one point, I have to merge two of these 2D arrays and then remove any duplicated entries. I've been searching for a function similar to numpy.unique, but I've had no luck. Every implementation I can think of looks very "unoptimized". For example, I'm trying to convert the array to a list of tuples, remove duplicates with set, and then convert back to an array:

coordskeys = np.array(list(set([tuple(x) for x in coordskeys])))

Are there any existing solutions, so I do not reinvent the wheel?

To make it clear, I'm looking for:

>>> a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
>>> unique_rows(a)
array([[1, 1], [2, 3], [5, 4]])

BTW, I wanted to use just a list of tuples for it, but the lists were so big that they consumed my 4 GB of RAM plus 4 GB of swap (NumPy arrays are more memory efficient).

Solution

Here's one idea. It will take a little bit of work, but it could be quite fast. I'll give you the 1D case and let you figure out how to extend it to 2D. The following function finds the unique elements of a 1D array:

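import numpy as np

def unique(a):
    a = np.sort(a)      # identical values become neighbours
    b = np.diff(a)      # zero wherever a value repeats its predecessor
    b = np.r_[1, b]     # prepend a nonzero so the first element is always kept
    return a[b != 0]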
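
As a minimal illustration of the sort-and-diff idea, the intermediate values can be traced on a small input. The array `a` and the names `s`, `d`, `keep`, and `result` are made up for this sketch and are not part of the original answer.

```python
import numpy as np

a = np.array([3, 1, 2, 3, 1])

s = np.sort(a)            # [1, 1, 2, 3, 3]
d = np.diff(s)            # [0, 1, 1, 0]
keep = np.r_[1, d] != 0   # [True, False, True, True, False]
result = s[keep]          # [1, 2, 3]
```

Each zero in `d` marks a value equal to the one before it, so masking those positions out leaves one copy of every value.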

Now, to extend it to 2D you need to change two things. First, you will need to figure out how to do the sort yourself; the important thing about the sort is that two identical entries end up next to each other. Second, you'll need to do something like (b != 0).any(axis), because you want to compare the whole row/column: a row is new if any of its entries differs from the previous row. Let me know if that's enough to get you started.
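
One possible way to get identical rows next to each other is np.lexsort, sketched below on the array from the question. The variable names (`pts`, `order`, `sorted_pts`) are only illustrative. Note that np.lexsort treats the last key as the primary one, so passing `pts.T` sorts rows by the last column first; that is not the usual left-to-right order, but it is enough to make duplicates adjacent.

```python
import numpy as np

pts = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])

# Each row of pts.T is one sort key; the LAST key (last column of pts)
# is the primary key, earlier columns break ties.
order = np.lexsort(pts.T)
sorted_pts = pts[order]

# After sorting, every duplicate row sits directly after its twin.
adjacent_equal = (sorted_pts[1:] == sorted_pts[:-1]).all(axis=1)
```

Here `adjacent_equal` is True exactly where a row repeats the row above it, which is the complement of the "any entry differs" test.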

Update: with some help from doug, I think this should work for the 2D case.

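import numpy as np

def unique(a):
    order = np.lexsort(a.T)            # sort rows so identical rows are adjacent
    a = a[order]
    diff = np.diff(a, axis=0)          # row-to-row differences
    ui = np.ones(len(a), dtype=bool)   # the first row is always kept
    ui[1:] = (diff != 0).any(axis=1)   # keep rows that differ from the previous one
    return a[ui]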
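
For comparison, here is a short usage sketch on the array from the question. It restates the function under the name `unique_rows` (the name used in the question) and checks it against `np.unique(a, axis=0)`. The `axis` argument is not mentioned in this thread; it was added to np.unique in NumPy 1.13, so this assumes a reasonably recent NumPy.

```python
import numpy as np

def unique_rows(a):
    """Sort-and-diff row de-duplication, following the approach above."""
    order = np.lexsort(a.T)
    a = a[order]
    ui = np.ones(len(a), dtype=bool)
    ui[1:] = (np.diff(a, axis=0) != 0).any(axis=1)
    return a[ui]

a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])

by_sort = unique_rows(a)
by_numpy = np.unique(a, axis=0)   # built-in alternative, NumPy >= 1.13
```

Both calls return the same set of rows. The row order can differ on other inputs, because the lexsort version orders by the last column first while np.unique orders by the first column first.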
