Remove duplicate key-value pairs with tolerance by keeping the ones with the largest value
I am trying to remove duplicates with tolerance from a set of keys and values using the following rule:
Assume the following set:
keys = [1 2 3 3.1 3.15 4 5];
vals = [0.8 1 1.1 1.3 1.2 1 1.1];
Plotted, this looks as follows:
[Figure: vals plotted against keys, with the closely spaced keys circled in red]
Now I would like to remove those pairs whose keys are very close together, as indicated by the red circle in the plot. The key-value pair I would like to keep is the one with the largest value (in the example, the middle one, [3.1; 1.3]), so the resulting set would be:
keys = [1 2 3.1 4 5];
vals = [0.8 1 1.3 1 1.1];
I tried to get this behavior with Matlab's diff function by doing
keys_new = keys(~(diff(keys) < 0.5));
vals_new = vals(~(diff(keys) < 0.5));
[M,I] = max(vals(diff(keys) < 0.5));
This gives keys_new and vals_new as a new set that keeps only the last of the duplicate pairs and also drops the very last pair:
keys_new = [1 2 3.15 4]
vals_new = [0.8 1 1.2 1]
The last line returns the index of the maximum value among the duplicate pairs, I=2. Unfortunately, the mask it uses does not include the last of the three duplicate pairs, [3.15; 1.2], so it is more of a coincidence that the result is correct here.
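The missing last pair comes from the length of the mask: diff(keys) has one element fewer than keys, so the logical index never addresses the final entry. A minimal sketch of this, where the names closeMask, paddedMask, keys_short, keys_padded and vals_padded are only illustrative and not part of the original attempt:

```matlab
keys = [1 2 3 3.1 3.15 4 5];
vals = [0.8 1 1.1 1.3 1.2 1 1.1];

% diff returns numel(keys)-1 differences: 6 mask entries for 7 keys.
closeMask = diff(keys) < 0.5;        % [0 0 1 1 0 0]

% Indexing with the shorter mask never selects the 7th element.
keys_short = keys(~closeMask);       % [1 2 3.15 4]

% Padding the mask with false brings the last pair back, but each close
% group is still represented by its last member, not by its largest value.
paddedMask  = [closeMask false];
keys_padded = keys(~paddedMask);     % [1 2 3.15 4 5]
vals_padded = vals(~paddedMask);     % [0.8 1 1.2 1 1.1]
```

So padding alone fixes the length problem but not the selection rule, which still needs a per-group maximum.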
I feel like there should be a much smarter way to do this, but can't really get my head around it.
Here is my solution:
Step 1. In the current keys and vals, find all non-max points (points that have a larger neighbor just before or just after them) and collect their indices in a set called Nind.
Step 2. Build another set called Cind that contains every point that has a close neighbor and therefore needs to be considered in the current keys and vals.
Step 3. Intersect Nind and Cind, and delete the points in the intersection from keys and vals.
Step 4. If the intersection of the two sets is empty, go to Step 5. Otherwise, go back to Step 1.
Step 5. Done.
Note that the while loop is there to handle awkward inputs that have multiple max points.
My code:
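%% Input
clc; clear;
keys = [1 2 3 3.1 3.15 4 5];
vals = [0.8 1 1.1 1.3 1.2 1 1.1];
%% Processing
ind=-1;                          % dummy non-empty value so the loop is entered
while(~isempty(ind))
    % Nind: indices of all points that are not a strict local maximum
    Max=([diff(vals) 0]<0 & [0 -diff(vals)]<0);
    Nind=1:length(vals);
    Nind(Max)=[];
    % Cind: indices of all points that have a neighbor closer than 0.5
    Cind=[0 diff(keys)<0.5];     % marks the right-hand member of each close pair
    Cind(find(Cind)-1)=1;        % also mark the left-hand member
    vec=1:length(Cind);
    Cind=Cind.*vec;              % turn the mask into index values
    Cind(Cind == 0)=[];
    % delete the points that are both close to a neighbor and not a maximum
    ind=intersect(Cind,Nind);
    keys(ind)=[];
    vals(ind)=[];
end
%% Output
[keys;vals]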
The output of the code is:
ans =
1.0000 2.0000 3.1000 4.0000 5.0000
0.8000 1.0000 1.3000 1.0000 1.1000
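For comparison, here is a minimal sketch of a different, non-iterative approach. It is not the method used above. It assumes the keys are sorted in ascending order and treats every run of keys whose consecutive gaps are all below the tolerance as one group, so a long chain of close keys collapses into a single pair. The names tol, groupId, keys_out and vals_out are only illustrative. As far as I can tell, it also covers the case where the largest value of a close group sits at the very first or last position, which the local-maximum test in the loop above never marks as a maximum.

```matlab
keys = [1 2 3 3.1 3.15 4 5];
vals = [0.8 1 1.1 1.3 1.2 1 1.1];
tol  = 0.5;                          % same tolerance as in the code above

% Assumption: keys are sorted in ascending order.
% A new group starts wherever the gap to the previous key is at least tol.
groupId = cumsum([1, diff(keys) >= tol]);    % [1 2 3 3 3 4 5]

nGroups  = groupId(end);
keys_out = zeros(1, nGroups);
vals_out = zeros(1, nGroups);
for g = 1:nGroups
    idx = find(groupId == g);                % members of group g
    [vals_out(g), k] = max(vals(idx));       % largest value; first wins on ties
    keys_out(g) = keys(idx(k));              % key that belongs to that value
end

disp([keys_out; vals_out])
```

For the example data this gives keys_out = [1 2 3.1 4 5] and vals_out = [0.8 1 1.3 1 1.1], the same result as the loop above. Whether chained keys should be merged into one group like this depends on what "close together" is supposed to mean for the data at hand.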