使用“行"更快地替代INTERSECT-MAT​​LAB [英] Faster alternative to INTERSECT with 'rows' - MATLAB

查看:114
本文介绍了使用“行"更快地替代INTERSECT-MAT​​LAB的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个用Matlab编写的代码,该代码使用相交"来查找在两个大矩阵中相交的向量(及其索引).我发现相交"是我的代码中最慢的行(相差很大).不幸的是,到目前为止我找不到更快的选择.

作为示例,在我的电脑上运行下面的代码大约需要5秒钟:

profile on
for i = 1 : 500
    a = rand(10000,5);
    b = rand(10000,5);
    [intersectVectors, ind_a, ind_b] = intersect(a,b,'rows');
end
profile viewer

我想知道是否有更快的方法.请注意,矩阵(a)和(b)有5列.这两个矩阵的行数不必相同.

任何帮助都会很棒. 谢谢

解决方案

讨论和解决方案代码

您可以使用利用 fast matrix multiplication in MATLAB 将输入数组的这些5列转换为一列,方法是将每一列视为单个数字的有效数字".因此,最终将得到一个只有列的数组,然后可以使用intersectismember而不使用'rows',这必须大大加快代码的速度!

以下是作为易于使用的功能代码实现的承诺-

intersectrows_fast_v1.m:

function [intersectVectors, ind_a, ind_b] = intersectrows_fast_v1(a,b)

%// Calculate equivalent one-column versions of input arrays
mult = [10^ceil(log10( 1+max( [a(:);b(:)] ))).^(size(a,2)-1:-1:0)]'; %//'
acol1 = a*mult;
bcol1 = b*mult;

%// Use intersect without 'rows' option for a good speedup
[~, ind_a, ind_b] = intersect(acol1,bcol1);
intersectVectors = a(ind_a,:);

return;

intersectrows_fast_v2.m:

function [intersectVectors, ind_a, ind_b] = intersectrows_fast_v2(a,b)

%// Calculate equivalent one-column versions of input arrays
mult = [10^ceil(log10( 1+max( [a(:);b(:)] ))).^(size(a,2)-1:-1:0)]'; %//'
acol1 = a*mult;
bcol1 = b*mult;

%// Use ismember to get indices of the common elements
[match_a,idx_b] = ismember(acol1,bcol1);

%// Now, with ismember, duplicate items are not taken care of automatically as
%// are done with intersect. So, we need to find the duplicate items and
%// remove those from the outputs of ismember
[~,a_sorted_ind] = sort(acol1);
a_rm_ind =a_sorted_ind([false;diff(sort(acol1))==0]); %//indices to be removed
match_a(a_rm_ind)=0;

intersectVectors = a(match_a,:);
ind_a = find(match_a);
ind_b = idx_b(match_a);

return;

快速测试和结论

使用问题中列出的数据大小,运行时为-

-------------------------- With original approach
Elapsed time is 3.885792 seconds.
-------------------------- With Proposed approach - Version - I
Elapsed time is 0.581123 seconds.
-------------------------- With Proposed approach - Version - II
Elapsed time is 0.963409 seconds.

结果似乎表明,这两种提议的方法都具有很大的优势,与原始方法相比,其提速高达 6.7x

另外,请注意,如果您不需要使用基于行"方法的原始intersect的三个输出中的任何一个或两个,则可以进一步缩短这两个提议的方法,以获得更好的运行时性能! /p>

I have a code written in Matlab that uses 'intersect' to find the vectors (and their indices) that intersect in two large matrices. I found that 'intersect' is the slowest line (by a large difference) in my code. Unfortunately I couldn't find a faster alternative so far.

As an example running the code below takes approx 5 seconds on my pc:

profile on
for i = 1 : 500
    a = rand(10000,5);
    b = rand(10000,5);
    [intersectVectors, ind_a, ind_b] = intersect(a,b,'rows');
end
profile viewer

I was wondering if there is a faster way. Note that the matrices (a) and (b) have 5 columns. The number of rows don't necessary have to be the same for the two matrices.

Any help would be great. Thanks

解决方案

Discussion and solution codes

You can use an approach that leverages fast matrix multiplication in MATLAB to convert those 5 columns of input arrays into one column by considering each column as a significant "digit" of a single number. Thus, you would end up with an array with only column and then, you can use intersect or ismember without 'rows' and that must speedup the codes in a big way!

Here are the promised implementations as function codes for easy usage -

intersectrows_fast_v1.m:

function [intersectVectors, ind_a, ind_b] = intersectrows_fast_v1(a,b)

%// Calculate equivalent one-column versions of input arrays
mult = [10^ceil(log10( 1+max( [a(:);b(:)] ))).^(size(a,2)-1:-1:0)]'; %//'
acol1 = a*mult;
bcol1 = b*mult;

%// Use intersect without 'rows' option for a good speedup
[~, ind_a, ind_b] = intersect(acol1,bcol1);
intersectVectors = a(ind_a,:);

return;

intersectrows_fast_v2.m:

function [intersectVectors, ind_a, ind_b] = intersectrows_fast_v2(a,b)

%// Calculate equivalent one-column versions of input arrays
mult = [10^ceil(log10( 1+max( [a(:);b(:)] ))).^(size(a,2)-1:-1:0)]'; %//'
acol1 = a*mult;
bcol1 = b*mult;

%// Use ismember to get indices of the common elements
[match_a,idx_b] = ismember(acol1,bcol1);

%// Now, with ismember, duplicate items are not taken care of automatically as
%// are done with intersect. So, we need to find the duplicate items and
%// remove those from the outputs of ismember
[~,a_sorted_ind] = sort(acol1);
a_rm_ind =a_sorted_ind([false;diff(sort(acol1))==0]); %//indices to be removed
match_a(a_rm_ind)=0;

intersectVectors = a(match_a,:);
ind_a = find(match_a);
ind_b = idx_b(match_a);

return;

Quick tests and conclusions

With the datasizes listed in the question, the runtimes were -

-------------------------- With original approach
Elapsed time is 3.885792 seconds.
-------------------------- With Proposed approach - Version - I
Elapsed time is 0.581123 seconds.
-------------------------- With Proposed approach - Version - II
Elapsed time is 0.963409 seconds.

The results seem to suggest a big advantage in favour of the version - I of the two proposed approaches with a whooping speedup of around 6.7x over the original approach!!

Also, please note that if you don't need any one or two of the three outputs from the original intersect with 'rows' based approach, then both the proposed approaches could be further shortened for better runtime performances!

这篇关于使用“行"更快地替代INTERSECT-MAT​​LAB的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆