每一行的occurence的指数 [英] indices of occurence of each row

查看:144
本文介绍了每一行的occurence的指数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个矩阵,A和B(B是连续的像1:N)

I have two matrices, A and B. (B is continuous like 1:n.)

我需要找到在A和B的每个行的所有事件,并在单元阵列存储相应的行索引C.请参见下面的例子。

I need to find all the occurrences of each individual row of B in A, and store those row indices accordingly in cell array C. See below for an example.

A = [3,4,5;1,3,5;1,4,3;4,2,1]
B = [1;2;3;4;5]

因此​​,

C = {[2,3,4];[4];[1,2,3];[1,3,4];[1,2]}

注C不需要在一个单元阵列用于我的申请。我只建议,因为C的行向量长度不相等。如果你能提出一个解决方法,这也没关系。

Note C does not need to be in a cell array for my application. I only suggest it because the row vectors of C are of unequal length. If you can suggest a work-around, this is fine too.

我一直在使用运行的B每行ismember循环试过,但这个速度太慢时,矩阵A和B是巨大的,大约一万个条目。矢量化code是AP preciated。

I've tried using a loop running ismember for each row of B, but this is too slow when the matrices A and B are huge, with around a million entries. Vectorized code is appreciated.

(给你的背景下,这样做的目的是确定在一个网格,附加到一个顶点那些面孔。注意,因为我的数据的形式为TR不,我不能使用的功能edgeattachments三角重新presentation。我只有面和顶点的名单列表。)

(To give you context, the purpose of this is to identify, in a mesh, those faces that are attached to a single vertex. Note I cannot use the function edgeattachments because my data are not of the form "TR" in triangulation representation. All I have is a list of faces and list of vertices.)

推荐答案

那么,对于这一点的最好答案,需要的是如何填补了知识。如果A是稀疏的,也就是说,如果它有一些列的值,B是相当大的,那么我想为节省内存的最佳方式,可以使用一个稀疏矩阵,而不是细胞。

Well, the best answer for this would require knowledge of how A is filled. If A is sparse, that is, if it has few columns values and B is quite large, then I think the best way for memory saving may be using a sparse matrix instead of a cell.

% No fancy stuff, just fast and furious 
bMax = numel(B);
nRows = size(A,1);

cLogical = sparse(nRows,bMax);

for curRow = 1:nRows
  curIdx = A(curRow,:);
  cLogical(curRow,curIdx) = 1;
end

答:

cLogical =

   (2,1)        1
   (3,1)        1
   (4,1)        1
   (4,2)        1
   (1,3)        1
   (2,3)        1
   (3,3)        1
   (1,4)        1
   (3,4)        1
   (4,4)        1
   (1,5)        1
   (2,5)        1

如何阅读答案。对于每一列的行表明列索引出现在A.即 1 出现在排索引 [2 3 4] 2 出现在一行 [4] 3 [1 2 3] 4 [1 3 4] 5 [1〜2]

How to read the answer. For each column the rows show the indexes that the column index appears in A. That is 1 appears in rows [2 3 4], 2 appear in row [4], 3 rows [1 2 3], 4 row [1 3 4], 5 in row [1 2].

然后你可以使用 cLogical ,而不是一个单元格作为未来的一个索引矩阵满足您的需求。

Then you can use cLogical instead of a cell as an indexing matrix in the future for your needs.

另一种方法是用一个指数应多少次出现在C预期值分配℃。

Another way would be to allocate C with the expected value for how many times an index should appear in C.

% Fancier solution using some assumed knowledge of A
bMax = numel(B);
nRows = size(A,1);
nColumns = size(A,2);

% Pre-allocating with the expected value, an attempt to reduce re-allocations.
% tic; for rep=1:10000; C = mat2cell(zeros(bMax,nColumns),ones(1,bMax),nColumns); end; toc 
% Elapsed time is 1.364558 seconds.
% tic; for rep=1:10000; C = repmat({zeros(1,nColumns)},bMax,1); end; toc
% Elapsed time is 0.606266 seconds.
% So we keep the not fancy repmat solution
C = repmat({zeros(1,nColumns)},bMax,1);
for curRow = 1:nRows
  curIdxMsk = A(curRow,:);
  for curCol = 1:nColumns
    curIdx = curIdxMsk(curCol);
    fillIdx = ~C{curIdx};
    if any(fillIdx) 
      fillIdx = find(fillIdx,1);
    else
      fillIdx = numel(fillIdx)+1;
    end
    C{curIdx}(fillIdx) = curRow;
  end
end

% Squeeze empty indexes:
for curRow = 1:bMax
  C{curRow}(~C{curRow}) = [];
end

答:

>> C{:}

ans =

     2     3     4


ans =

     4


ans =

     1     2     3


ans =

     1     3     4


ans =

     1     2

哪个解决方案将执行最好?您在code做了性能测试,因为它取决于有多大A,BMAX,您的计算机的内存大小等等。然而,我还是好奇与解决方案,其他人可以为这张X一样)。我喜欢chappjc的解决方案,虽然它有,他指出了缺点。

Which solution will performs best? You do a performance test in your code because it depends on how big is A, bMax, the memory size of your computer and so on. Yet, I'm still curious with solutions other people can do for this x). I liked chappjc's solution although it has the cons that he has pointed out.

对于给定的例子(10K次):

For the given example (10k times):

Solution 1: Elapsed time is 0.516647 seconds. 
Solution 2: Elapsed time is 4.201409 seconds (seems that solution 2 is a bad idea hahaha, but since it was created to the specific issue of A having many rows it has to be tested in those conditions).
chappjc' solution: Elapsed time is 2.405341 seconds.

这篇关于每一行的occurence的指数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆