Fast way to turn a labeled image into a dictionary of { label : [coordinates] }
Question
Say I've labeled an image with scipy.ndimage.measurements.label like so:
[[0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 3, 0],
[2, 2, 0, 0, 0, 0],
[2, 2, 0, 0, 0, 0]]
What's a fast way to collect the coordinates belonging to each label? I.e. something like:
{ 1: [[0, 1], [1, 1], [2, 1]],
2: [[4, 0], [4, 1], [5, 0], [5, 1]],
3: [[3, 4]] }
I'm working with images that are ~15,000 x 5000 pixels in size, and roughly half of each image's pixels are labeled (i.e. non-zero).
Rather than iterating through the entire image with nditer, would it be faster to do something like np.where(img == label) for each label?
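For concreteness, the two approaches being compared might look roughly like the sketch below. The function names coords_by_nditer and coords_by_where are made up for illustration, and this is not the code that produced the timings further down; it only shows one pass over all pixels versus one full-image scan per label.

```python
import numpy as np
from collections import defaultdict

# Example labeled image from the question.
img = np.array([[0, 1, 0, 0, 0, 0],
                [0, 1, 0, 0, 0, 0],
                [0, 1, 0, 0, 0, 0],
                [0, 0, 0, 0, 3, 0],
                [2, 2, 0, 0, 0, 0],
                [2, 2, 0, 0, 0, 0]])


def coords_by_nditer(img):
    """Visit every pixel once and append its index to its label's list."""
    result = defaultdict(list)
    it = np.nditer(img, flags=["multi_index"])
    for value in it:
        label = int(value)
        if label != 0:  # 0 is background
            result[label].append(list(it.multi_index))
    return dict(result)


def coords_by_where(img, num_labels):
    """Scan the whole image once per label using a boolean mask."""
    return {label: np.column_stack(np.where(img == label))
            for label in range(1, num_labels + 1)}


by_nditer = coords_by_nditer(img)
by_where = coords_by_where(img, num_labels=3)
```

The first function does work proportional to the number of pixels, while the second does a full-image comparison for every label, so its cost grows with the label count.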
EDIT:
Which algorithm is fastest depends on how large the labeled image is relative to how many labels it contains. Warren Weckesser's and Salvador Dali / BHAT IRSHAD's methods (which are based on np.nonzero and np.where) all seem to scale linearly with the number of labels, whereas iterating through each image element with nditer obviously scales linearly with the size of the labeled image.
Results of a small test:
size: 1000 x 1000, num_labels: 10
weckesser ... 0.214357852936s
dali ... 0.650229930878s
nditer ... 6.53645992279s
size: 1000 x 1000, num_labels: 100
weckesser ... 0.936990022659s
dali ... 1.33582305908s
nditer ... 6.81486487389s
size: 1000 x 1000, num_labels: 1000
weckesser ... 8.43906402588s
dali ... 9.81333303452s
nditer ... 7.47897100449s
size: 1000 x 1000, num_labels: 10000
weckesser ... 100.405524015s
dali ... 118.17239809s
nditer ... 9.14583897591s
So the question becomes more specific:
For labeled images in which the number of labels is on the order of sqrt(size(image)), is there an algorithm to gather label coordinates that is faster than iterating through every image element (i.e. with nditer)?
Recommended answer
Here's one possibility:
import numpy as np
a = np.array([[0, 1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 3, 0],
              [2, 2, 0, 0, 0, 0],
              [2, 2, 0, 0, 0, 0]])
# If the array was computed using scipy.ndimage.measurements.label, you
# already know how many labels there are.
num_labels = 3
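# Row and column indices of every labeled (non-zero) pixel.
nz = np.nonzero(a)
# Stack them into an (N, 2) array of [row, col] pairs.
coords = np.column_stack(nz)
# Label value at each of those pixels.
nzvals = a[nz[0], nz[1]]
# One boolean mask per label selects that label's coordinates.
res = {k: coords[nzvals == k] for k in range(1, num_labels + 1)}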
I called this script get_label_indices.py. Here's a sample run:
In [97]: import pprint
In [98]: run get_label_indices.py
In [99]: pprint.pprint(res)
{1: array([[0, 1],
       [1, 1],
       [2, 1]]),
 2: array([[4, 0],
       [4, 1],
       [5, 0],
       [5, 1]]),
 3: array([[3, 4]])}
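The dictionary comprehension above builds one boolean mask per label, which is where the linear scaling with label count comes from. A possible variation for the many-label case, which is not part of the original answer and has not been benchmarked here, is to sort the non-zero pixels by label once and then split the sorted coordinates at the label boundaries. The function name coords_by_sort is hypothetical.

```python
import numpy as np


def coords_by_sort(img):
    """Group [row, col] coordinates by label using one sort and one split."""
    rows, cols = np.nonzero(img)
    labels = img[rows, cols]
    # A stable sort keeps pixels of the same label in row-major order.
    order = np.argsort(labels, kind="stable")
    sorted_labels = labels[order]
    coords = np.column_stack((rows[order], cols[order]))
    # In a sorted array, the first occurrence of each label marks the
    # start of its group.
    unique_labels, starts = np.unique(sorted_labels, return_index=True)
    groups = np.split(coords, starts[1:])
    return dict(zip(unique_labels.tolist(), groups))


a = np.array([[0, 1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 3, 0],
              [2, 2, 0, 0, 0, 0],
              [2, 2, 0, 0, 0, 0]])
res_sorted = coords_by_sort(a)
```

This does a single sort over the labeled pixels instead of one mask per label. Whether it actually beats nditer at a given image size and label count would need to be measured.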