在矩阵内查找匹配的子矩阵 [英] Finding matching submatrices inside a matrix

查看:215
本文介绍了在矩阵内查找匹配的子矩阵的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个100x200的2D数组,表示为由黑色(0)和白色(255)单元组成的numpy数组.这是一个位图文件.然后,我得到了2D形状(最容易将它们视为字母),它们也是2D黑白单元格.

我知道我可以天真地遍历矩阵,但这将是代码的热门"部分,因此速度是一个问题.有没有一种快速的方法来以numpy/scipy的方式执行此操作?

我简短地看了看Scipy的相关函数.我对模糊匹配"不感兴趣,仅对完全匹配感兴趣.我还看了一些学术论文,但它们超出了我的头脑.

解决方案

可以使用关联.您需要将黑色值设置为-1,将白色值设置为1(反之亦然),以便知道相关峰的值,并且仅在正确的字母下出现.

以下代码可以满足我的需求.

import numpy
from scipy import signal

# Set up the inputs
a = numpy.random.randn(100, 200)
a[a<0] = 0
a[a>0] = 255

b = numpy.random.randn(20, 20)
b[b<0] = 0
b[b>0] = 255

# put b somewhere in a
a[37:37+b.shape[0], 84:84+b.shape[1]] = b

# Now the actual solution...

# Set the black values to -1
a[a==0] = -1
b[b==0] = -1

# and the white values to 1
a[a==255] = 1
b[b==255] = 1

max_peak = numpy.prod(b.shape)

# c will contain max_peak where the overlap is perfect
c = signal.correlate(a, b, 'valid')

overlaps = numpy.where(c == max_peak)

print overlaps

这将输出(array([37]), array([84])),即代码中设置的偏移量的位置.

您可能会发现,如果您的字母大小乘以您的大数组大小大于大约Nlog(N),其中N是您要搜索的大数组的对应大小(每个维度),则您可能会通过使用基于fft的算法(例如scipy.signal.fftconvolve)来提高速度(请注意,如果您使用卷积而不是相关性,则需要翻转其中一个数据集的每个轴-flipudfliplr).唯一的修改是分配c:

c = signal.fftconvolve(a, numpy.fliplr(numpy.flipud(b)), 'valid')

比较以上尺寸的计时:

In [5]: timeit c = signal.fftconvolve(a, numpy.fliplr(numpy.flipud(b)), 'valid')
100 loops, best of 3: 6.78 ms per loop

In [6]: timeit c = signal.correlate(a, b, 'valid')
10 loops, best of 3: 151 ms per loop

I have a 100x200 2D array expressed as a numpy array consisting of black (0) and white (255) cells. It is a bitmap file. I then have 2D shapes (it's easiest to think of them as letters) that are also 2D black and white cells.

I know I can naively iterate through the matrix but this is going to be a 'hot' portion of my code so speed is an concern. Is there a fast way to perform this in numpy/scipy?

I looked briefly at Scipy's correlate function. I am not interested in 'fuzzy matches', only exact matches. I also looked at some academic papers but they are above my head.

解决方案

You can use correlate. You'll need to set your black values to -1 and your white values to 1 (or vice-versa) so that you know the value of the peak of the correlation, and that it only occurs with the correct letter.

The following code does what I think you want.

import numpy
from scipy import signal

# Set up the inputs
a = numpy.random.randn(100, 200)
a[a<0] = 0
a[a>0] = 255

b = numpy.random.randn(20, 20)
b[b<0] = 0
b[b>0] = 255

# put b somewhere in a
a[37:37+b.shape[0], 84:84+b.shape[1]] = b

# Now the actual solution...

# Set the black values to -1
a[a==0] = -1
b[b==0] = -1

# and the white values to 1
a[a==255] = 1
b[b==255] = 1

max_peak = numpy.prod(b.shape)

# c will contain max_peak where the overlap is perfect
c = signal.correlate(a, b, 'valid')

overlaps = numpy.where(c == max_peak)

print overlaps

This outputs (array([37]), array([84])), the locations of the offsets set in the code.

You will likely find that if your letter size multiplied by your big array size is bigger than roughly Nlog(N), where N is corresponding size of the big array in which you're searching (for each dimension), then you will probably get a speed up by using an fft based algorithm like scipy.signal.fftconvolve (bearing in mind that you'll need to flip each axis of one of the datasets if you're using a convolution rather than a correlation - flipud and fliplr). The only modification would be to assigning c:

c = signal.fftconvolve(a, numpy.fliplr(numpy.flipud(b)), 'valid')

Comparing the timings on the sizes above:

In [5]: timeit c = signal.fftconvolve(a, numpy.fliplr(numpy.flipud(b)), 'valid')
100 loops, best of 3: 6.78 ms per loop

In [6]: timeit c = signal.correlate(a, b, 'valid')
10 loops, best of 3: 151 ms per loop

这篇关于在矩阵内查找匹配的子矩阵的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆