Indices of occurrence of each row in MATLAB

Question

I have two matrices, A and B. (B is a column of consecutive integers, like 1:n.)

I need to find all the occurrences of each individual row of B in A, and store those row indices accordingly in cell array C. See below for an example.

A = [3,4,5;1,3,5;1,4,3;4,2,1]
B = [1;2;3;4;5]

Therefore:

C = {[2,3,4];[4];[1,2,3];[1,3,4];[1,2]}

Note C does not need to be in a cell array for my application. I only suggest it because the row vectors of C are of unequal length. If you can suggest a work-around, this is fine too.

I've tried using a loop running ismember for each row of B, but this is too slow when the matrices A and B are huge, with around a million entries. Vectorized code is appreciated.
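
The loop itself is not shown in the question. A minimal sketch of what such a per-value loop might look like on the example data (the exact form and the variable names are assumptions) is:

```matlab
% Example data from the question
A = [3,4,5;1,3,5;1,4,3;4,2,1];
B = [1;2;3;4;5];

% Baseline: one pass over A for every entry of B (slow for large inputs)
C = cell(numel(B),1);
for k = 1:numel(B)
    hitMask = ismember(A, B(k));   % logical matrix, same size as A
    C{k} = find(any(hitMask, 2))'; % rows of A containing B(k), as a row vector
end
```

Each iteration scans all of A, so the cost grows with numel(A) times numel(B), which is consistent with the slowness described above.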

(To give you context, the purpose of this is to identify, in a mesh, the faces that are attached to a single vertex. Note that I cannot use the function edgeAttachments, because my data are not stored in the triangulation representation "TR" that it requires. All I have is a list of faces and a list of vertices.)

Recommended answer

Well, the best answer for this would require knowing how A is filled. If A has only a few columns and B is quite large, so that each value of B appears in only a small fraction of the rows, then I think the best way to save memory may be to use a sparse matrix instead of a cell array.

% No fancy stuff, just fast and furious 
bMax = numel(B);
nRows = size(A,1);

cLogical = sparse(nRows,bMax);

for curRow = 1:nRows
  curIdx = A(curRow,:);
  cLogical(curRow,curIdx) = 1;
end

Result:

cLogical =

   (2,1)        1
   (3,1)        1
   (4,1)        1
   (4,2)        1
   (1,3)        1
   (2,3)        1
   (3,3)        1
   (1,4)        1
   (3,4)        1
   (4,4)        1
   (1,5)        1
   (2,5)        1

How to read the result: each column corresponds to one value of B, and the nonzero rows in that column are the rows of A in which that value appears. That is, 1 appears in rows [2 3 4], 2 in row [4], 3 in rows [1 2 3], 4 in rows [1 3 4], and 5 in rows [1 2].

You can then use cLogical, instead of a cell array, as an indexing matrix for your needs.
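
As a hedged sketch, the same matrix can probably be built without the loop by passing all (row, value) pairs to sparse in one call, and a single column can be read back with find. This assumes every entry of A is an integer in 1:numel(B); rowIdx and rowsWith3 are names introduced here for illustration.

```matlab
% Example data from the question
A = [3,4,5;1,3,5;1,4,3;4,2,1];
B = [1;2;3;4;5];

[nRows, nColumns] = size(A);

% Row number of every entry of A, laid out like A itself
rowIdx = repmat((1:nRows)', 1, nColumns);

% One nonzero per (row of A, value in that row); "> 0" turns the result into
% a sparse logical, so a value repeated within a row still counts once
cLogical = sparse(rowIdx(:), A(:), 1, nRows, numel(B)) > 0;

% Rows of A that contain the value 3
rowsWith3 = find(cLogical(:,3))';   % → [1 2 3]
```

Building the matrix in one call avoids repeated indexed assignment into a sparse matrix, which tends to be expensive because the storage may be rearranged on each insertion.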

Another way would be to pre-allocate each cell of C with the number of times an index is expected to appear in A.

% Fancier solution using some assumed knowledge of A
bMax = numel(B);
nRows = size(A,1);
nColumns = size(A,2);

% Pre-allocating with the expected value, an attempt to reduce re-allocations.
% tic; for rep=1:10000; C = mat2cell(zeros(bMax,nColumns),ones(1,bMax),nColumns); end; toc 
% Elapsed time is 1.364558 seconds.
% tic; for rep=1:10000; C = repmat({zeros(1,nColumns)},bMax,1); end; toc
% Elapsed time is 0.606266 seconds.
% So we keep the not fancy repmat solution
C = repmat({zeros(1,nColumns)},bMax,1);
for curRow = 1:nRows
  curIdxMsk = A(curRow,:);
  for curCol = 1:nColumns
    curIdx = curIdxMsk(curCol);
    fillIdx = ~C{curIdx};
    if any(fillIdx) 
      fillIdx = find(fillIdx,1);
    else
      fillIdx = numel(fillIdx)+1;
    end
    C{curIdx}(fillIdx) = curRow;
  end
end

% Squeeze empty indexes:
for curRow = 1:bMax
  C{curRow}(~C{curRow}) = [];
end

Result:

>> C{:}

ans =

     2     3     4


ans =

     4


ans =

     1     2     3


ans =

     1     3     4


ans =

     1     2
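
A possible variant of the pre-allocation idea, offered only as a sketch: count the occurrences of each value first, so that every cell gets its exact final size and no squeezing step is needed. It assumes the entries of A are integers in 1:numel(B); counts and nextSlot are names introduced here for illustration.

```matlab
% Example data from the question
A = [3,4,5;1,3,5;1,4,3;4,2,1];
B = [1;2;3;4;5];

bMax = numel(B);
[nRows, nColumns] = size(A);

% Number of times each value 1..bMax occurs in A
counts = accumarray(A(:), 1, [bMax 1]);

% Pre-allocate every cell with its exact final length
C = arrayfun(@(n) zeros(1,n), counts, 'UniformOutput', false);

% Next free position inside each cell
nextSlot = ones(bMax,1);
for curRow = 1:nRows
  for curCol = 1:nColumns
    curIdx = A(curRow,curCol);
    C{curIdx}(nextSlot(curIdx)) = curRow;
    nextSlot(curIdx) = nextSlot(curIdx) + 1;
  end
end
```

Because the outer loop walks the rows of A in order, the indices in each cell come out already sorted.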

Which solution will perform best? You should run a performance test on your own data, because it depends on the size of A, on bMax, on the memory of your computer and so on. Still, I'm curious about the solutions other people can come up with for this x). I liked chappjc's solution, although it has the drawbacks that he pointed out.

For the given example (10,000 repetitions):

Solution 1: Elapsed time is 0.516647 seconds.
Solution 2: Elapsed time is 4.201409 seconds (solution 2 seems to be a bad idea here, hahaha, but since it was designed for the specific case of A having many rows, it has to be tested under those conditions).
chappjc's solution: Elapsed time is 2.405341 seconds.
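
chappjc's solution is not reproduced on this page. As a hedged sketch of one fully vectorized alternative (not necessarily what chappjc posted, and not timed here), accumarray can group the row indices of A by the value found at each position. It assumes the entries of A are integers in 1:numel(B); rowIdx is a helper name introduced for illustration.

```matlab
% Example data from the question
A = [3,4,5;1,3,5;1,4,3;4,2,1];
B = [1;2;3;4;5];

[nRows, nColumns] = size(A);

% Row number of every entry of A, laid out like A itself
rowIdx = repmat((1:nRows)', 1, nColumns);

% Group row numbers by the value stored in A; accumarray does not guarantee
% the order inside each group, hence the explicit sort
C = accumarray(A(:), rowIdx(:), [numel(B) 1], @(r) {sort(r)'});
```

On the example data this gives the cell array from the question, {[2,3,4];[4];[1,2,3];[1,3,4];[1,2]}.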
