找到一个二维矩阵是否为另一二维矩阵的子集 [英] Find whether a 2d matrix is subset of another 2d matrix

查看:258
本文介绍了找到一个二维矩阵是否为另一二维矩阵的子集的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

最近我正在一个黑客马拉松的一部分,我才知道哪些试图找到在2D matrix.A图案的格子状的图案可能是U,H和T和将重新$ P $问题由3×3矩阵psented
假设如果我想present H和U

Recently i was taking part in one Hackathon and i came to know about a problem which tries to find a pattern of a grid form in a 2d matrix.A pattern could be U,H and T and will be represented by 3*3 matrix suppose if i want to present H and U

+--+--+--+            +--+--+--+
|1 |0 |1 |            |1 |0 |1 |
+--+--+--+            +--+--+--+
|1 |1 |1 |  --> H     |1 |0 |1 |    -> U
+--+--+--+            +--+--+--+
|1 |0 |1 |            |1 |1 |1 |
+--+--+--+            +--+--+--+

现在我需要搜索到这个 10 * 10包含0和1的矩阵 .Closest和唯一的解决办法我能得到它的O蛮力算法(N ^ 4)像。在MATLAB和R语言有非常微妙的方式来做到这一点,但不是在C,C ++。我尝试了很多搜索在谷歌这个解决方案和SO.But最接近我可以是这样的<一个href=\"http://stackoverflow.com/questions/1975386/fast-counting-of-2d-sub-matrices-withing-a-large-dense-2d-matrix\">SO帖子其中约实施拉宾,卡普字符串搜索算法讨论。但没有伪code或任何文章,解释this.Could谁能帮助或提供任何链接,PDF或一些逻辑来简化这个?

Now i need to search this into 10*10 matrix containing 0s and 1s.Closest and only solution i can get it brute force algorithm of O(n^4).In languages like MATLAB and R there are very subtle ways to do this but not in C,C++. I tried a lot to search this solution on Google and on SO.But closest i can get is this SO POST which discuss about implementing Rabin-Karp string-search algorithm .But there is no pseudocode or any post explaining this.Could anyone help or provide any link,pdf or some logic to simplify this?

修改

尤金嘘。评论说,如果N大矩阵(N×N个)和K的大小 - 小一(KXK),该算法buteforce应该采取O((NK)^ 2)。由于k是固定的,它减少O(N ^ 2)。是的完全正确。
但是,有什么通用的方法,如果N和K是大?

as Eugene Sh. commented that If N is the size of the large matrix(NxN) and k - the small one (kxk), the buteforce algorithm should take O((Nk)^2). Since k is fixed, it is reducing to O(N^2).Yes absolutely right. But is there is any generalised way if N and K is big?

推荐答案

好吧,这里是那么2D 拉宾,卡普的方法。

Alright, here is then the 2D Rabin-Karp approach.

对于下面的讨论,假设我们希望找到一个( M M )内( N ,<我子矩阵> N )矩阵。 (这个概念适用于长方形矩阵一样好,但我跑了指数的。)

For the following discussion, assume we want to find a (m, m) sub-matrix inside a (n, n) matrix. (The concept works for rectangular matrices just as well but I ran out of indices.)

我们的想法是,对于每个可能的子矩阵,我们计算哈希值。只有当散列我们要找出矩阵的哈希值匹配,我们将比较元素明智的。

The idea is that for each possible sub-matrix, we compute a hash. Only if that hash matches the hash of the matrix we want to find, we will compare element-wise.

为了使这个效率,我们必须避免重复计算的子矩阵的每次整个哈希值。因为今晚我睡了一点,为此我可以弄清楚如何轻松地做到这一点的唯一散列函数是1S的在各自的子矩阵的总和。我把它作为一个练习的人比我聪明想出一个更好的滚动哈希函数。

To make this efficient, we must avoid re-computing the entire hash of the sub-matrix each time. Because I got little sleep tonight, the only hash function for which I could figure out how to do this easily is the sum of 1s in the respective sub-matrix. I leave it as an exercise to someone smarter than me to figure out a better rolling hash function.

现在,如果我们刚检查了子矩阵从(Ĵ)至( + M - 1,Ĵ + M - 1),并知道它有 X 1秒内,我们可以计算在1s的数量子矩阵之一的权利 - 也就是说,由( I 的,的Ĵ的+ 1)到( I 的+的 M - 1,Ĵ + M ) - 减去从子向量中1的个数(Ĵ )至( + M - 1,Ĵ),并增加从子向量(中1的个数< I>我,Ĵ + M )至( + M - 1,<我>Ĵ + M )。

Now, if we have just checked the sub-matrix from (i, j) to (i + m – 1, j + m – 1) and know it has x 1s inside, we can compute the number of 1s in the sub-matrix one to the right – that is, from (i, j + 1) to (i + m – 1, j + m) – by subtracting the number of 1s in the sub-vector from (i, j) to (i + m – 1, j) and adding the number of 1s in the sub-vector from (i, j + m) to (i + m – 1, j + m).

如果我们打的大型矩阵的右边缘,我们由一个移窗下来,然后回到左边缘,然后再下降一个,然后再向右等等。

If we hit the right margin of the large matrix, we shift the window down by one and then back to the left margin and then again down by one and then again to the right and so forth.

请注意,这需要O( M )操作,而不是O( M 2 )每个候选。如果我们这样做每对指数的,我们得到O( M 名词 2 )工作。因此,通过经由大矩阵巧妙移位潜在的子矩阵的大小的一个窗口,我们可以通过 M 的一个因子减少的工作量。也就是说,如果我们没有得到太多的散列冲突。

Note that this requires O(m) operations, not O(m2) for each candidate. If we do this for every pair of indices, we get O(mn2) work. Thus, by cleverly shifting a window of the size of the potential sub-matrix through the large matrix, we can reduce the amount of work by a factor of m. That is, if we don't get too many hash collisions.

下面是图片:

由于我们的当前窗口中的一个向右移,我们减去1的个数在左侧的红色列矢量,并添加1的个数在绿色列向量的右侧,以获得1秒的数量在新的窗口。

As we shift the current window one to the right, we subtract the number of 1s in the red column vector on the left side and add the number of 1s in the green column vector on the right side to obtain the number of 1s in the new window.

我已经实现用伟大的 C ++模板库这个想法的快速演示。这个例子还使用了一些东西,从升压但仅用于参数解析和输出格式,因此您可以轻松地摆脱它,如果你不具备升压,但想尝试一下code。该指数伪造账目有些混乱,但我会离开它没有进一步的解释在这里。以上散文应该充分地覆盖它。

I have implemented a quick demo of this idea using the great Eigen C++ template library. The example also uses some stuff from Boost but only for argument parsing and output formatting so you can easily get rid of it if you don't have Boost but want to try out the code. The index fiddling is a bit messy but I'll leave it without further explanation here. The above prose should cover it sufficiently.

#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <random>
#include <type_traits>
#include <utility>

#include <boost/format.hpp>
#include <boost/lexical_cast.hpp>

#include <Eigen/Dense>

#define PROGRAM "submatrix"
#define SEED_CSTDLIB_RAND 1

using BitMatrix = Eigen::Matrix<bool, Eigen::Dynamic, Eigen::Dynamic>;
using Index1D = BitMatrix::Index;
using Index2D = std::pair<Index1D, Index1D>;

std::ostream&
operator<<(std::ostream& out, const Index2D& idx)
{
  out << "(" << idx.first << ", " << idx.second << ")";
  return out;
}

BitMatrix
get_random_bit_matrix(const Index1D rows, const Index1D cols)
{
  auto matrix = BitMatrix {rows, cols};
  matrix.setRandom();
  return matrix;
}

Index2D
findSubMatrix(const BitMatrix& haystack,
              const BitMatrix& needle,
              Index1D *const collisions_ptr = nullptr) noexcept
{
  static_assert(std::is_signed<Index1D>::value, "unsigned index type");
  const auto end = Index2D {haystack.rows(), haystack.cols()};
  const auto hr = haystack.rows();
  const auto hc = haystack.cols();
  const auto nr = needle.rows();
  const auto nc = needle.cols();
  if (nr > hr || nr > hc)
    return end;
  const auto target = needle.count();
  auto current = haystack.block(0, 0, nr - 1, nc).count();
  auto j = Index1D {0};
  for (auto i = Index1D {0}; i <= hr - nr; ++i)
    {
      if (j == 0)  // at left margin
        current += haystack.block(i + nr - 1, 0, 1, nc).count();
      else if (j == hc - nc)  // at right margin
        current += haystack.block(i + nr - 1, hc - nc, 1, nc).count();
      else
        assert(!"this should never happen");
      while (true)
        {
          if (i % 2 == 0)  // moving right
            {
              if (j > 0)
                current += haystack.block(i, j + nc - 1, nr, 1).count();
            }
          else  // moving left
            {
              if (j < hc - nc)
                current += haystack.block(i, j, nr, 1).count();
            }
          assert(haystack.block(i, j, nr, nc).count() == current);
          if (current == target)
            {
              // TODO: There must be a better way than using cwiseEqual().
              if (haystack.block(i, j, nr, nc).cwiseEqual(needle).all())
                return Index2D {i, j};
              else if (collisions_ptr)
                *collisions_ptr += 1;
            }
          if (i % 2 == 0)  // moving right
            {
              if (j < hc - nc)
                {
                  current -= haystack.block(i, j, nr, 1).count();
                  ++j;
                }
              else break;
            }
          else  // moving left
            {
              if (j > 0)
                {
                  current -= haystack.block(i, j + nc - 1, nr, 1).count();
                  --j;
                }
              else break;
            }
        }
      if (i % 2 == 0)  // at right margin
        current -= haystack.block(i, hc - nc, 1, nc).count();
      else  // at left margin
        current -= haystack.block(i, 0, 1, nc).count();
    }
  return end;
}

int
main(int argc, char * * argv)
{
  if (SEED_CSTDLIB_RAND)
    {
      std::random_device rnddev {};
      srand(rnddev());
    }
  if (argc != 5)
    {
      std::cerr << "usage: " << PROGRAM
                << " ROWS_HAYSTACK COLUMNS_HAYSTACK"
                << " ROWS_NEEDLE COLUMNS_NEEDLE"
                << std::endl;
      return EXIT_FAILURE;
    }
  auto hr = boost::lexical_cast<Index1D>(argv[1]);
  auto hc = boost::lexical_cast<Index1D>(argv[2]);
  auto nr = boost::lexical_cast<Index1D>(argv[3]);
  auto nc = boost::lexical_cast<Index1D>(argv[4]);
  const auto haystack = get_random_bit_matrix(hr, hc);
  const auto needle = get_random_bit_matrix(nr, nc);
  auto collisions = Index1D {};
  const auto idx = findSubMatrix(haystack, needle, &collisions);
  const auto end = Index2D {haystack.rows(), haystack.cols()};
  std::cout << "This is the haystack:\n\n" << haystack << "\n\n";
  std::cout << "This is the needle:\n\n" << needle << "\n\n";
  if (idx != end)
    std::cout << "Found as sub-matrix at " << idx << ".\n";
  else
    std::cout << "Not found as sub-matrix.\n";
  std::cout << boost::format("There were %d (%.2f %%) hash collisions.\n")
    % collisions
    % (100.0 * collisions / ((hr - nr) * (hc - nc)));
  return (idx != end) ? EXIT_SUCCESS : EXIT_FAILURE;
}

虽然编译和运行,请考虑以上为伪code。我在优化它让几乎没有尝试。这只是一个验证的概念,为我自己。

While it compiles and runs, please consider the above as pseudo-code. I have made almost no attempt at optimizing it. It was just a proof-of concept for myself.

这篇关于找到一个二维矩阵是否为另一二维矩阵的子集的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆