在 Mathematica 中快速实现 Position2D [英] A fast implementation in Mathematica for Position2D

查看:21
本文介绍了在 Mathematica 中快速实现 Position2D的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找以下内容的快速实现,由于缺乏更好的术语,我将其称为 Position2D:

I'm looking for a fast implementation for the following, I'll call it Position2D for lack of a better term:

Position2D[ matrix, sub_matrix ]

它在 matrix 中找到 sub_matrix 的位置,并返回匹配的左上和右下行/列.

which finds the locations of sub_matrix inside matrix and returns the upper left and lower right row/column of a match.

例如:

Position2D[{
 {0, 1, 2, 3},
 {1, 2, 3, 4},
 {2, 3, 4, 5},
 {3, 4, 5, 6}
}, {
 {2, 3},
 {3, 4}
}]

应该返回这个:

{
 {{1, 3}, {2, 4}},
 {{2, 2}, {3, 3}},
 {{3, 1}, {4, 2}} 
}

它应该足够快,可以在带有 100x100 子矩阵的 3000x2000 矩阵上快速工作.为简单起见,只考虑整数矩阵就足够了.

It should be fast enough to work quickly on 3000x2000 matrices with 100x100 sub-matrices. For simplicity, it is enough to only consider integer matrices.

推荐答案

算法

以下代码基于高效的自定义位置函数,用于在大型整数列表中查找(可能重叠)整数序列的位置.主要思想是我们可以先尝试有效地找到子矩阵第一行在Flatten-ed大矩阵中的位置,然后过滤那些,提取完整的子矩阵和与感兴趣的子矩阵进行比较.除了非常病态的情况外,这对于大多数情况都是有效的(对于这些情况,此过程会产生大量潜在的候选位置,而子矩阵的真实条目数会少得多.但这种情况似乎不太可能一般情况下,可以对这个简单的方案进行进一步的改进).

Algorithm

The following code is based on an efficient custom position function to find positions of (possibly overlapping) integer sequences in a large integer list. The main idea is that we can first try to eficiently find the positions where the first row of the sub-matrix is in the Flatten-ed large matrix, and then filter those, extracting full sub-matrices and comparing to the sub-matrix of interest. This will be efficient for most cases except very pathological ones (those, for which this procedure would generate a huge number of potential position candidates, while the true number of entries of the sub-matrix would be much smaller. But such cases seem rather unlikely generally, and then, further improvements to this simple scheme can be made).

对于大型矩阵,当使用序列位置函数的编译版本时,所提出的解决方案将比@Szabolcs 的解决方案快约 15-25 倍,对于序列位置的顶层实现,速度将快 3-5 倍- 查找功能.实际加速取决于矩阵大小,对于更大的矩阵,它更多.代码和基准如下.

For large matrices, the proposed solution will be about 15-25 times faster than the solution of @Szabolcs when a compiled version of sequence positions function is used, and 3-5 times faster for the top-level implementation of sequence positions - finding function. The actual speedup depends on matrix sizes, it is more for larger matrices. The code and benchmarks are below.

这些辅助函数归功于 Norbert Pozar 并取自 this 数学组线程.它们用于在更大的列表中有效地找到整数序列的起始位置(有关详细信息,请参阅上述帖子).

These helper functions are due to Norbert Pozar and taken from this Mathgroup thread. They are used to efficiently find starting positions of an integer sequence in a larger list (see the mentioned post for details).

Clear[seqPos];
fdz[v_] := Rest@DeleteDuplicates@Prepend[v, 0];
seqPos[list_List, seq_List] :=
  Fold[
     fdz[#1 (1 - Unitize[list[[#1]] - #2])] + 1 &, 
     fdz[Range[Length[list] - Length[seq] + 1] *
       (1 - Unitize[list[[;; -Length[seq]]] - seq[[1]]])] + 1,
     Rest@seq
  ] - Length[seq];

使用示例:

In[71]:= seqPos[{1,2,3,2,3,2,3,4},{2,3,2}]
Out[71]= {2,4}

一个更快的整数定位函数

无论 seqPos 有多快,它仍然是我解决方案的主要瓶颈.这是一个编译为 C 的版本,它为我的代码提供了另外 5 倍的性能提升:

A faster position-finding function for integers

However fast seqPos might be, it is still the major bottleneck in my solution. Here is a compiled-to-C version of this, which gives another 5x performance boost to my code:

seqposC = 
  Compile[{{list, _Integer, 1}, {seq, _Integer, 1}},
    Module[{i = 1, j = 1, res = Table[0, {Length[list]}], ctr = 0},
       For[i = 1, i <= Length[list], i++,
          If[list[[i]] == seq[[1]],
             While[j < Length[seq] && i + j <= Length[list] && 
                     list[[i + j]] == seq[[j + 1]], 
                j++
             ];
             If[j == Length[seq], res[[++ctr]] = i];
             j = 1;
          ]
       ];
       Take[res, ctr]
    ], CompilationTarget -> "C", RuntimeOptions -> "Speed"]

使用示例:

In[72]:= seqposC[{1, 2, 3, 2, 3, 2, 3, 4}, {2, 3, 2}]
Out[72]= {2, 4}

下面的基准测试已经用这个函数重做(主函数的代码也略有修改)

The benchmarks below have been redone with this function (also the code for main function is slightly modified )

这是主要功能.它找到矩阵中第一行的位置,然后过滤它们,提取这些位置的子矩阵并针对感兴趣的完整子矩阵进行测试:

This is the main function. It finds positions of the first row in a matrix, and then filters them, extracting the sub-matrices at these positions and testing against the full sub-matrix of interest:

Clear[Position2D];
Position2D[m_, what_,seqposF_:Automatic] :=
  Module[{posFlat, pos2D,sp = If[seqposF === Automatic,seqposC,seqposF]},
     With[{dm = Dimensions[m], dwr = Reverse@Dimensions[what]},
         posFlat = sp[Flatten@m, First@what];
         pos2D = 
            Pick[Transpose[#], Total[Clip[Reverse@dm - # - dwr + 2, {0, 1}]],2] &@
                {Mod[posFlat, #, 1], IntegerPart[posFlat/#] + 1} &@Last[dm];
         Transpose[{#, Transpose[Transpose[#] + dwr - 1]}] &@
            Select[pos2D,
                m[[Last@# ;; Last@# + Last@dwr - 1, 
                   First@# ;; First@# + First@dwr - 1]] == what &
            ]
     ]
  ];

对于整数列表,可以使用更快编译的子序列定位函数seqposC(这是默认设置).对于通用列表,可以提供例如seqPos,作为第三个参数.

For integer lists, the faster compiled subsequence position-finding function seqposC can be used (this is a default). For generic lists, one can supply e.g. seqPos, as a third argument.

我们将使用一个简单的示例来剖析代码并解释其内部工作原理.这定义了我们的测试矩阵和子矩阵:

We will use a simple example to dissect the code and explain its inner workings. This defines our test matrix and sub-matrix:

m  = {{0, 1, 2, 3}, {1, 2, 3, 4}, {2, 3, 4, 5}};
what = {{2, 3}, {3, 4}}; 

这将计算上述的维度(对子矩阵使用反向维度更方便):

This computes the dimensions of the above (it is more convenient to work with reversed dimensions for a sub-matrix):

In[78]:= 
dm=Dimensions[m]
dwr=Reverse@Dimensions[what]

Out[78]= {3,4}
Out[79]= {2,2}

这将在 Flatten ed 主矩阵中查找第一行(此处为 {2,3})的起始位置列表.这些位置同时是子矩阵左上角的平坦"候选位置:

This finds a list of starting positions of the first row ({2,3} here) in the Flattened main matrix. These positions are at the same time "flat" candidate positions of the top left corner of the sub-matrix:

In[77]:= posFlat = seqPos[Flatten@m, First@what]
Out[77]= {3, 6, 9}

这将使用主矩阵的维度重建完整矩阵中子矩阵左上角的二维候选"位置:

This will reconstruct the 2D "candidate" positions of the top left corner of a sub-matrix in a full matrix, using the dimensions of the main matrix:

In[83]:= posInterm = Transpose@{Mod[posFlat,#,1],IntegerPart[posFlat/#]+1}&@Last[dm]
Out[83]= {{3,1},{2,2},{1,3}}

然后我们可以尝试使用 Select 来过滤掉它们,提取完整的子矩阵并与 what 进行比较,但我们会在这里遇到一个问题:

We can then try using Select to filter them out, extracting the full sub-matrix and comparing to what, but we'll run into a problem here:

In[84]:= 
Select[posInterm,
   m[[Last@#;;Last@#+Last@dwr-1,First@#;;First@#+First@dwr-1]]==what&]

During evaluation of In[84]:= Part::take: Cannot take positions 3 through 4 
in {{0,1,2,3},{1,2,3,4},{2,3,4,5}}. >>

Out[84]= {{3,1},{2,2}}

除了错误信息,结果是正确的.错误消息本身是由于对于列表中的最后一个位置 ({1,3}),子矩阵的右下角将在主矩阵之外.我们当然可以使用 Quiet 来简单地忽略错误消息,但这是一种糟糕的风格.所以,我们将首先过滤掉这些情况,这就是 Pick[Transpose[#], Total[Clip[Reverse@dm - # - dwr + 2, {0, 1}]], 2]&@ 是为了.具体来说,考虑

Apart from the error message, the result is correct. The error message itself is due to the fact that for the last position ({1,3}) in the list, the bottom right corner of the sub-matrix will be outside the main matrix. We could of course use Quiet to simply ignore the error messages, but that's a bad style. So, we will first filter those cases out, and this is what the line Pick[Transpose[#], Total[Clip[Reverse@dm - # - dwr + 2, {0, 1}]], 2] &@ is for. Specifically, consider

In[90]:= 
Reverse@dm - # - dwr + 2 &@{Mod[posFlat, #, 1],IntegerPart[posFlat/#] + 1} &@Last[dm]
Out[90]= {{1,2,3},{2,1,0}}

左上角的坐标应保持在矩阵和子矩阵的维度差异内.上面的子列表由左上角的 xy 坐标组成.我添加了 2 以使所有有效结果严格为正.我们只需要在 Transpose@{Mod[posFlat, #, 1], IntegerPart[posFlat/#] + 1} &@Last[dm](即 posInterm),其中上面的两个子列表都具有严格的正数.我使用 Total[Clip[...,{0,1}]] 将其重铸为仅在第二个列表具有 2 (2>Clip 将所有正整数转换为 1,并且 Total 将 2 个子列表中的数字相加.得到 2 的唯一方法是两个子列表中的数字都是正数).

The coordinates of the top left corners should stay within a difference of dimensions of matrix and a sub-matrix. The above sub-lists were made of x and y coordiantes of top - left corners. I added 2 to make all valid results strictly positive. We have to pick only coordiantes at those positions in Transpose@{Mod[posFlat, #, 1], IntegerPart[posFlat/#] + 1} &@Last[dm] ( which is posInterm), at which both sub-lists above have strictly positive numbers. I used Total[Clip[...,{0,1}]] to recast it into picking only at those positions at which this second list has 2 (Clip converts all positive integers to 1, and Total sums numbers in 2 sublists. The only way to get 2 is when numbers in both sublists are positive).

所以,我们有:

In[92]:= 
pos2D=Pick[Transpose[#],Total[Clip[Reverse@dm-#-dwr+2,{0,1}]],2]&@
           {Mod[posFlat,#,1],IntegerPart[posFlat/#]+1}&@Last[dm]
Out[92]= {{3,1},{2,2}} 

过滤二维位置列表后,不存在结构上无效的位置,我们可以使用 Select 提取完整的子矩阵并针对感兴趣的子矩阵进行测试:

After the list of 2D positions has been filtered, so that no structurally invalid positions are present, we can use Select to extract the full sub-matrices and test against the sub-matrix of interest:

In[93]:= 
finalPos = 
  Select[pos2D,m[[Last@#;;Last@#+Last@dwr-1,First@#;;First@#+First@dwr-1]]==what&]
Out[93]= {{3,1},{2,2}}

在这种情况下,两个职位都是真实的.最后要做的是重建子矩阵右下角的位置并将它们添加到左上角位置.这是由这一行完成的:

In this case, both positions are genuine. The final thing to do is to reconstruct the positions of the bottom - right corners of the submatrix and add them to the top-left corner positions. This is done by this line:

In[94]:= Transpose[{#,Transpose[Transpose[#]+dwr-1]}]&@finalPos 
Out[94]= {{{3,1},{4,2}},{{2,2},{3,3}}}

我可以使用 Map,但是对于大量的位置列表,上面的代码会更有效.

I could have used Map, but for a large list of positions, the above code would be more efficient.

In[216]:= Position2D[{{0,1,2,3},{1,2,3,4},{2,3,4,5},{3,4,5,6}},{{2,3},{3,4}}]
Out[216]= {{{3,1},{4,2}},{{2,2},{3,3}},{{1,3},{2,4}}}

请注意,我的索引约定与 w.r.t. 相反.@Szabolcs 的解决方案.

Note that my index conventions are reversed w.r.t. @Szabolcs' solution.

这是一个功率测试:

nmat = 1000;
(* generate a large random matrix and a sub-matrix *)
largeTestMat = RandomInteger[100, {2000, 3000}];
what = RandomInteger[10, {100, 100}];
(* generate upper left random positions where to insert the submatrix *)    
rposx = RandomInteger[{1,Last@Dimensions[largeTestMat] - Last@Dimensions[what] + 1}, nmat];
rposy = RandomInteger[{1,First@Dimensions[largeTestMat] - First@Dimensions[what] + 1},nmat];
(* insert the submatrix nmat times *)
With[{dwr = Reverse@Dimensions[what]},
    Do[largeTestMat[[Last@p ;; Last@p + Last@dwr - 1, 
                     First@p ;; First@p + First@dwr - 1]] = what, 
       {p,Transpose[{rposx, rposy}]}]]

现在,我们测试:

In[358]:= (ps1 = position2D[largeTestMat,what])//Short//Timing
Out[358]= {1.39,{{{1,2461},{100,2560}},<<151>>,{{1900,42},{1999,141}}}}

In[359]:= (ps2 = Position2D[largeTestMat,what])//Short//Timing
Out[359]= {0.062,{{{2461,1},{2560,100}},<<151>>,{{42,1900},{141,1999}}}}

(子矩阵的实际数量小于我们尝试生成的数量,因为它们中的许多重叠并破坏"了先前插入的子矩阵 - 这是因为子矩阵大小是我们基准中的矩阵大小).

(the actual number of sub-matrices is smaller than the number we try to generate, since many of them overlap and "destroy" the previously inserted ones - this is so because the sub-matrix size is a sizable fraction of the matrix size in our benchmark).

为了进行比较,我们应该反转其中一个解决方案(级别 3)中的 x-y 索引,并对两个列表进行排序,因为位置可能以不同的顺序获得:

To compare, we should reverse the x-y indices in one of the solutions (level 3), and sort both lists, since positions may have been obtained in different order:

In[360]:= Sort@ps1===Sort[Reverse[ps2,{3}]]
Out[360]= True

我不排除进一步优化的可能性.

I do not exclude a possibility that further optimizations are possible.

这篇关于在 Mathematica 中快速实现 Position2D的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆