Efficiently delete arrays that are close to each other given a threshold in Python


Question

I am using Python for this job and, to be very objective here, I want to find a 'pythonic' way to remove from an array of arrays the "duplicates" that lie within a threshold distance of each other. For example, given this array:

[[ 5.024,  1.559,  0.281], [ 6.198,  4.827,  1.653], [ 6.199,  4.828,  1.653]]

Observe that [ 6.198, 4.827, 1.653] and [ 6.199, 4.828, 1.653] are really close to each other: their Euclidean distance is about 0.0014, so they are almost "duplicates". I want my final output to be just:

[[ 5.024,  1.559,  0.281], [ 6.198,  4.827,  1.653]]

The algorithm I have now is:

to_delete = []
for i in unique_cluster_centers:
    for ii in unique_cluster_centers:
        if i == ii:
            pass
        elif np.linalg.norm(np.array(i) - np.array(ii)) <= self.tolerance:
            to_delete.append(ii)
            break

for i in to_delete:
    try:
        uniques.remove(i)
    except:
        pass

But it's really slow. I would like to know a faster and more 'pythonic' way to solve this. My tolerance is 0.0001.

Recommended answer

A generic approach could be:

def filter_quadratic(data,condition):
    result = []
    for element in data:
        if all(condition(element,other) for other in result):
            result.append(element)
    return result

This is a generic higher-order filter that takes a condition. An element is added to the result only if the condition holds between it and every element already in the result list.

Now we still need to define the condition:

def the_condition(xs,ys):
    # working with squares, 2.5e-05 is 0.005*0.005 
    return sum((x-y)*(x-y) for x,y in zip(xs,ys)) > 2.5e-05

This gives:

>>> filter_quadratic([[ 5.024,  1.559,  0.281], [ 6.198,  4.827,  1.653], [ 6.199,  4.828,  1.653]],the_condition)
[[5.024, 1.559, 0.281], [6.198, 4.827, 1.653]]
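The condition above hard-codes a threshold of 0.005, while the question mentions a tolerance of 0.0001. As a minimal sketch, the threshold could be made a parameter with a small factory function. The name `make_condition` is hypothetical (it is not part of the original answer), and `math.dist` assumes Python 3.8 or later:

```python
from math import dist  # Euclidean distance, available since Python 3.8


def make_condition(threshold):
    """Return a predicate that is True when two points are farther apart than `threshold`."""
    def condition(xs, ys):
        return dist(xs, ys) > threshold
    return condition


def filter_quadratic(data, condition):
    # Same filter as in the answer, repeated so this sketch runs on its own.
    result = []
    for element in data:
        if all(condition(element, other) for other in result):
            result.append(element)
    return result


points = [[5.024, 1.559, 0.281], [6.198, 4.827, 1.653], [6.199, 4.828, 1.653]]
kept = filter_quadratic(points, make_condition(0.005))
```

Note that with the question's tolerance of 0.0001 the last two points, which are about 0.0014 apart, would both be kept.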

The algorithm runs in O(n²) time, where n is the number of elements you give to the function. You can, however, make it a bit more efficient with k-d trees, but this requires some more advanced data structures.
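The answer does not show a k-d tree version. As a rough illustration of how a spatial index avoids comparing every pair, here is a sketch that uses a uniform grid (spatial hashing) instead of a k-d tree. This is a swapped-in, simpler technique, and the name `filter_grid` is hypothetical. It assumes a positive threshold and points of equal, low dimension, since it inspects 3^d neighbouring cells per point:

```python
from itertools import product
from math import dist, floor  # math.dist requires Python 3.8+


def filter_grid(data, threshold):
    """Keep a point only if it is farther than `threshold` from every point kept so far."""
    grid = {}    # maps a cell index tuple to the kept points inside that cell
    result = []
    for point in data:
        cell = tuple(floor(c / threshold) for c in point)
        # Cells have side length `threshold`, so any kept point within
        # `threshold` of `point` lies in this cell or an adjacent one.
        neighbours = (
            other
            for offset in product((-1, 0, 1), repeat=len(cell))
            for other in grid.get(tuple(c + o for c, o in zip(cell, offset)), ())
        )
        if all(dist(point, other) > threshold for other in neighbours):
            result.append(point)
            grid.setdefault(cell, []).append(point)
    return result


points = [[5.024, 1.559, 0.281], [6.198, 4.827, 1.653], [6.199, 4.828, 1.653]]
kept = filter_grid(points, 0.005)
```

When the kept points are spread out, each new point is compared only against a handful of nearby points rather than the whole result list. The output should match what `filter_quadratic` produces with an equivalent distance condition.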
