使用“行"更快地替代INTERSECT-MATLAB [英] Faster alternative to INTERSECT with 'rows' - MATLAB
问题描述
我有一个用Matlab编写的代码,该代码使用相交"来查找在两个大矩阵中相交的向量(及其索引).我发现相交"是我的代码中最慢的行(相差很大).不幸的是,到目前为止我找不到更快的选择.
作为示例,在我的电脑上运行下面的代码大约需要5秒钟:
profile on
for i = 1 : 500
a = rand(10000,5);
b = rand(10000,5);
[intersectVectors, ind_a, ind_b] = intersect(a,b,'rows');
end
profile viewer
我想知道是否有更快的方法.请注意,矩阵(a)和(b)有5列.这两个矩阵的行数不必相同.
任何帮助都会很棒. 谢谢
讨论和解决方案代码
您可以使用利用 fast matrix multiplication in MATLAB
将输入数组的这些5
列转换为一列,方法是将每一列视为单个数字的有效数字".因此,最终将得到一个只有列的数组,然后可以使用intersect
或ismember
而不使用'rows'
,这必须大大加快代码的速度!
以下是作为易于使用的功能代码实现的承诺-
intersectrows_fast_v1.m:
function [intersectVectors, ind_a, ind_b] = intersectrows_fast_v1(a,b)
%// Calculate equivalent one-column versions of input arrays
mult = [10^ceil(log10( 1+max( [a(:);b(:)] ))).^(size(a,2)-1:-1:0)]'; %//'
acol1 = a*mult;
bcol1 = b*mult;
%// Use intersect without 'rows' option for a good speedup
[~, ind_a, ind_b] = intersect(acol1,bcol1);
intersectVectors = a(ind_a,:);
return;
intersectrows_fast_v2.m:
function [intersectVectors, ind_a, ind_b] = intersectrows_fast_v2(a,b)
%// Calculate equivalent one-column versions of input arrays
mult = [10^ceil(log10( 1+max( [a(:);b(:)] ))).^(size(a,2)-1:-1:0)]'; %//'
acol1 = a*mult;
bcol1 = b*mult;
%// Use ismember to get indices of the common elements
[match_a,idx_b] = ismember(acol1,bcol1);
%// Now, with ismember, duplicate items are not taken care of automatically as
%// are done with intersect. So, we need to find the duplicate items and
%// remove those from the outputs of ismember
[~,a_sorted_ind] = sort(acol1);
a_rm_ind =a_sorted_ind([false;diff(sort(acol1))==0]); %//indices to be removed
match_a(a_rm_ind)=0;
intersectVectors = a(match_a,:);
ind_a = find(match_a);
ind_b = idx_b(match_a);
return;
快速测试和结论
使用问题中列出的数据大小,运行时为-
-------------------------- With original approach
Elapsed time is 3.885792 seconds.
-------------------------- With Proposed approach - Version - I
Elapsed time is 0.581123 seconds.
-------------------------- With Proposed approach - Version - II
Elapsed time is 0.963409 seconds.
结果似乎表明,这两种提议的方法都具有很大的优势,与原始方法相比,其提速高达 6.7x
!
另外,请注意,如果您不需要使用基于行"方法的原始intersect
的三个输出中的任何一个或两个,则可以进一步缩短这两个提议的方法,以获得更好的运行时性能! /p>
I have a code written in Matlab that uses 'intersect' to find the vectors (and their indices) that intersect in two large matrices. I found that 'intersect' is the slowest line (by a large difference) in my code. Unfortunately I couldn't find a faster alternative so far.
As an example running the code below takes approx 5 seconds on my pc:
profile on
for i = 1 : 500
a = rand(10000,5);
b = rand(10000,5);
[intersectVectors, ind_a, ind_b] = intersect(a,b,'rows');
end
profile viewer
I was wondering if there is a faster way. Note that the matrices (a) and (b) have 5 columns. The number of rows don't necessary have to be the same for the two matrices.
Any help would be great. Thanks
Discussion and solution codes
You can use an approach that leverages fast matrix multiplication in MATLAB
to convert those 5
columns of input arrays into one column by considering each column as a significant "digit" of a single number. Thus, you would end up with an array with only column and then, you can use intersect
or ismember
without 'rows'
and that must speedup the codes in a big way!
Here are the promised implementations as function codes for easy usage -
intersectrows_fast_v1.m:
function [intersectVectors, ind_a, ind_b] = intersectrows_fast_v1(a,b)
%// Calculate equivalent one-column versions of input arrays
mult = [10^ceil(log10( 1+max( [a(:);b(:)] ))).^(size(a,2)-1:-1:0)]'; %//'
acol1 = a*mult;
bcol1 = b*mult;
%// Use intersect without 'rows' option for a good speedup
[~, ind_a, ind_b] = intersect(acol1,bcol1);
intersectVectors = a(ind_a,:);
return;
intersectrows_fast_v2.m:
function [intersectVectors, ind_a, ind_b] = intersectrows_fast_v2(a,b)
%// Calculate equivalent one-column versions of input arrays
mult = [10^ceil(log10( 1+max( [a(:);b(:)] ))).^(size(a,2)-1:-1:0)]'; %//'
acol1 = a*mult;
bcol1 = b*mult;
%// Use ismember to get indices of the common elements
[match_a,idx_b] = ismember(acol1,bcol1);
%// Now, with ismember, duplicate items are not taken care of automatically as
%// are done with intersect. So, we need to find the duplicate items and
%// remove those from the outputs of ismember
[~,a_sorted_ind] = sort(acol1);
a_rm_ind =a_sorted_ind([false;diff(sort(acol1))==0]); %//indices to be removed
match_a(a_rm_ind)=0;
intersectVectors = a(match_a,:);
ind_a = find(match_a);
ind_b = idx_b(match_a);
return;
Quick tests and conclusions
With the datasizes listed in the question, the runtimes were -
-------------------------- With original approach
Elapsed time is 3.885792 seconds.
-------------------------- With Proposed approach - Version - I
Elapsed time is 0.581123 seconds.
-------------------------- With Proposed approach - Version - II
Elapsed time is 0.963409 seconds.
The results seem to suggest a big advantage in favour of the version - I
of the two proposed approaches with a whooping speedup of around 6.7x
over the original approach!!
Also, please note that if you don't need any one or two of the three outputs from the original intersect
with 'rows' based approach, then both the proposed approaches could be further shortened for better runtime performances!
这篇关于使用“行"更快地替代INTERSECT-MATLAB的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!