Removing repetitions removes too many points
Problem description
I am trying to remove repetitive data from a set of key-value pairs. The repetitions either have exactly the same key, or their keys are very close to each other. In those cases I only want to keep the key-value pair with the largest value.
ind=-1;
while(~isempty(ind))
%find the non-max point
Max=([diff(vals) 0]<0 & [0 -diff(vals)]<0);
Nind=1:length(vals);
Nind(Max)=[];
%determine the range of points
Cind=[0 diff(keys)<0.5 & abs(diff(keys)>0.01)];
Cind(find(Cind)-1)=1;
vec=1:length(Cind);
Cind=Cind.*vec;
Cind(Cind == 0)=[];
%check through & back
ind=intersect(Cind,Nind);
keys(ind)=[];
vals(ind)=[];
end
This works for a given set of pairs:
keys = [1 2 3 3.1 3.15 4 5];
vals = [0.8 1 1.1 1.3 1.2 1 1.1];
So when the input looks like [plot omitted], the output looks like this [plot omitted], with the repetitions around 3 and 6 removed.
However, if I apply the same solution to the set
keys = [414 414 999 1011 1070 1280 1280 1635 1641 1793 1799 1870 1872 1886 2213 2214 2225 2572 3778 3790 4970];
vals = [1.100 1.100 0.316 0.198 0.224 0.555 0.555 0.443 0.374 0.387 0.510 0.446 0.456 0.347 0.224 0.229 0.171 0.175 0.202 0.183 0.147];
and change the threshold accordingly to
Cind=[0 diff(keys)<13 & abs(diff(keys)>0.01)];
then the input looks like [plot omitted] and the output looks like [plot omitted].
The problem in this case is that too many points are removed. For example, in the red circle the largest point in the group is removed, and of the three points in that region only one is kept, although the distance is well above the set threshold of 13. Also, the point at 1635 is removed, although all larger values are more than 13 away.
What am I misunderstanding here?
The desired output is that, among key-value pairs whose keys are very close to each other, only the one with the largest value is kept and the others are removed from both arrays. I marked the points that should be merged into the largest value in a plot [plot omitted].
Edit 2: The desired output arrays would therefore be:
keys = [414 999 1070 1280 1635 1799 1872 1886 2213 2225 2572 3778 4970];
vals = [1.100 0.316 0.224 0.555 0.443 0.510 0.456 0.347 0.224 0.171 0.175 0.202 0.147];
Recommended answer
Here is a straightforward, fairly simple strategy. It uses only a few if statements and deletes one point at a time, but it works.
However, the following code has O(N^2) complexity and is not vectorized, so it will be very time-consuming when the input becomes large.
%% Input
clc; clear;
keys = [414 414 999 1011 1070 1280 1280 1635 1641 1793 1799 1870 1872 1886 2213 2214 2225 2572 3778 3790 4970];
vals = [1.100 1.100 0.316 0.198 0.224 0.555 0.555 0.443 0.374 0.387 0.510 0.446 0.456 0.347 0.224 0.229 0.171 0.175 0.202 0.183 0.147];
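%% Dealing
% len:  largest key distance at which two neighbors count as repetitions
% flag: true while the previous scan still deleted a point
[len,flag]=deal(13,1);
while flag
    flag=0;
    for ii=2:length(keys)
        if ((keys(ii)-keys(ii-1) > len))
            % neighbors are far enough apart, keep both
            continue;
        else
            % neighbors are close: delete the one with the smaller value
            % (on a tie, the later point is deleted)
            if (vals(ii) > vals(ii-1))
                keys(ii-1)=[];
                vals(ii-1)=[];
            else
                keys(ii)=[];
                vals(ii)=[];
            end
            % the arrays changed length, so restart the scan
            flag=1;
            break;
        end
    end
end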
%% plot
figure(1)
plot(keys,vals)
hold on
plot(keys,vals,'ro')
for ii=1:length(vals)
text(keys(ii),vals(ii),num2str(ii))
end
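The code outputs a plot of the remaining points, each marked with a red circle and labeled with its index [plot omitted].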
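Since the answer notes that restarting the scan after every deletion costs O(N^2), here is a minimal single-pass sketch of the same pairwise rule. It is not part of the original answer. It assumes that `keys` is non-empty and sorted in ascending order, and the names `keep`, `last`, `newKeys` and `newVals` are introduced here purely for illustration. Under that assumption, a point that replaces the last kept one cannot come within `len` of any earlier kept point, so one left-to-right pass should give the same result as the loop above.

```matlab
% Single-pass sketch of the "keep the larger of two close neighbors" rule.
% Assumption: keys is non-empty and sorted in ascending order.
keys = [414 414 999 1011 1070 1280 1280 1635 1641 1793 1799 1870 1872 1886 2213 2214 2225 2572 3778 3790 4970];
vals = [1.100 1.100 0.316 0.198 0.224 0.555 0.555 0.443 0.374 0.387 0.510 0.446 0.456 0.347 0.224 0.229 0.171 0.175 0.202 0.183 0.147];
len = 13;                    % same threshold as in the loop above

keep = false(size(keys));    % marks the points that survive
keep(1) = true;
last = 1;                    % index of the most recently kept point
for ii = 2:numel(keys)
    if keys(ii) - keys(last) > len
        % far enough from the last kept point: keep it as well
        keep(ii) = true;
        last = ii;
    elseif vals(ii) > vals(last)
        % close and strictly larger: it replaces the last kept point
        keep(last) = false;
        keep(ii) = true;
        last = ii;
    end
    % otherwise the point is close and not larger, so it is dropped
end

newKeys = keys(keep);
newVals = vals(keep);
disp(newKeys)
```

For the data above this leaves 12 points, with keys 414, 999, 1070, 1280, 1635, 1799, 1872, 1886, 2214, 2572, 3778 and 4970. Each point is visited once and the arrays are indexed only at the end, so the cost grows linearly with the number of points.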