查找两个数组之间的(多集)差异 [英] Finding (multiset) difference between two arrays

查看:127
本文介绍了查找两个数组之间的(多集)差异的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给定数组(例如行向量)A和B,我如何找到数组C,使B和C合并将得到A?

Given arrays (say row vectors) A and B, how do I find an array C such that merging B and C will give A?

例如,给定

A = [2, 4, 6, 4, 3, 3, 1, 5, 5, 5];
B = [2, 3, 5, 5];

然后

C = multiset_diff(A, B) % Should be [4, 6, 4, 3, 1, 5]

(结果的顺序在这里无关紧要).

(the order of the result does not matter here).

对于相同的A,如果为B = [2, 4, 5],则结果应为[6, 4, 3, 3, 1, 5, 5].

For the same A, if B = [2, 4, 5], then the result should be [6, 4, 3, 3, 1, 5, 5].

(由于A中有两个4,而B中有一个4,结果C中应该有2-1 = 1 4.对于其他值类似.)

(Since there were two 4s in A and one 4 in B, the result C should have 2 - 1 = 1 4 in it. Similarly for the other values.)

PS:请注意,setdiff将删除2、3和5的所有实例,而在这里,无论它们在B中出现多少次,都需要将其删除.

PS: Note that setdiff would remove all instances of 2, 3, and 5, whereas here they need to be removed just however many times they appear in B.

性能:我在本地运行了一些快速n基准测试,以下是供以后参考的结果:

Performance: I ran some quick-n-dirty benchmarks locally, here are the results for future reference:

  • @heigele的嵌套循环方法对于小长度的A(例如,最多N = 50个元素)表现最佳.与下一个最佳方法相比,对于较小的(N = 20)A s,其性能要好3倍,对于中等尺寸(N = 50)A s,其性能要好1.5倍,即:

  • @heigele's nested loop method performs best for small lengths of A (say upto N = 50 or so elements). It does 3x better for small (N=20) As, and 1.5x better for medium-sized (N=50) As, compared to the next best method - which is:

@obchardon的基于histc的方法.当A的大小N开始大于100时,这是表现最好的一种.例如,当N = 200时,这比上述嵌套循环方法好3倍.

@obchardon's histc-based method. This is the one performs the best when A's size N starts to be 100 and above. For eg., this does 3x better than the above nested loop method when N = 200.

@matt的for + find方法确实与小N的histc方法相当,但是对于大N的性能却迅速下降(这是有道理的,因为整个C == B(x)比较都是在每次迭代时运行).

@matt's for+find method does comparably to the histc method for small N, but quickly degrades in performance for larger N (which makes sense since the entire C == B(x) comparison is run every iteration).

(其他方法在编写时可能会慢几倍或无效.)

(The other methods are either several times slower or invalid at the time of writing.)

推荐答案

还是使用histc函数的另一种方法:

Still another approach using the histc function:

A = [2, 4, 6, 4, 3, 3, 1, 5, 5, 5];
B = [2, 3, 5, 5];

uA  = unique(A);
hca = histc(A,uA); 
hcb = histc(B,uA);
res = repelem(uA,hca-hcb)

我们只需根据向量A的唯一值计算每个向量的重复元素数,然后使用repelem来创建结果.

We simply calculate the number of repeated elements for each vectors according to the unique value of vector A, then we use repelem to create the result.

此解决方案不保留初始顺序,但对您来说似乎不是问题.

This solution do not preserve the initial order but it don't seems to be a problem for you.

我使用histc获得八度兼容性,但是不建议使用此功能,因此您也可以使用histcounts

I use histc for Octave compatibility, but this function is deprecated so you can also use histcounts

这篇关于查找两个数组之间的(多集)差异的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆