在Python中创建快速RGB查找表 [英] Creating fast RGB look up tables in Python

查看：113 发布时间：2020/5/18 23:12:57 python performance numpy opencv lookup-tables

本文介绍了在Python中创建快速RGB查找表的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个函数，我将其称为"rgb2something"，该函数将RGB数据[1x1x3]转换为单个值(概率)，循环遍历输入RGB数据中的每个像素非常慢.

I have a function I will call 'rgb2something' that transforms RGB data [1x1x3] into a single value (a probability), looping over each pixel in the input RGB data turns out to be rather slow.

我尝试了以下方法来加快转换速度.生成LUT(查找表):

I have tried the following approach to speed the conversion up. To generate the LUT (Look up table):

import numpy as np

levels = 256
levels2 = levels**2
lut = [0] * (levels ** 3)

levels_range = range(0, levels)

for r in levels_range:
    for g in levels_range:
        for b in levels_range:
            lut[r + (g * levels) + (b * levels2)] = rgb2something(r, g, b)

并将RGB转换为变换后的概率图像:

And to convert RGB to the transformed probability image:

result = np.take(lut, r_channel + (g_channel * 256) + (b_channel * 65536))

但是，生成LUT和计算结果的速度仍然很慢.在2维中它相当快，但是在3维(r，g和b)中它很慢.我该如何提高其性能?

However both generating the LUT and calculating the result is still slow. In 2 dimensions it's rather fast, however in 3 dimensions (r, g and b) it's slow. How can I increase the performance of this?

rgb2something(r, g, b)看起来像这样:

def rgb2something(r, g, b):
    y = np.array([[r, g, b]])
    y_mean = np.mean(y, axis=0)
    y_centered = y - y_mean
    y_cov = y_centered.T.dot(y_centered) / len(y_centered)
    m = len(Consts.x)
    n = len(y)
    q = m + n
    pool_cov = (m / q * x_cov) + (n / q * y_cov)
    inv_pool_cov = np.linalg.inv(pool_cov)
    g = Consts.x_mean - y_mean
    mah = g.T.dot(inv_pool_cov).dot(g) ** 0.5
    return mah

我正在尝试实现的完整工作代码示例，我正在使用OpenCV，因此任何OpenCV方法(例如

EDIT 2:

Full working code sample of what I'm trying to achieve, I am using OpenCV so any OpenCV approaches such as Apply LUT are welcome, as are C/C++ approaches:

import matplotlib.pyplot as plt
import numpy as np 
import cv2

class Model:
    x = np.array([
        [6, 5, 2],
        [2, 5, 7],
        [6, 3, 1]
    ])
    x_mean = np.mean(x, axis=0)
    x_centered = x - x_mean
    x_covariance = x_centered.T.dot(x_centered) / len(x_centered)
    m = len(x)
    n = 1  # Only ever comparing to a single pixel
    q = m + n
    pooled_covariance = (m / q * x_covariance)  # + (n / q * y_cov) -< Always 0 for a single point
    inverse_pooled_covariance = np.linalg.inv(pooled_covariance)

def rgb2something(r, g, b):
    #Calculates Mahalanobis Distance between pixel and model X
    y = np.array([[r, g, b]])
    y_mean = np.mean(y, axis=0)
    g = Model.x_mean - y_mean
    mah = g.T.dot(Model.inverse_pooled_covariance).dot(g) ** 0.5
    return mah

def generate_lut():
    levels = 256
    levels2 = levels**2
    lut = [0] * (levels ** 3)

    levels_range = range(0, levels)

    for r in levels_range:
        for g in levels_range:
            for b in levels_range:
                lut[r + (g * levels) + (b * levels2)] = rgb2something(r, g, b)

    return lut

def calculate_distance(lut, input_image):
    return np.take(lut, input_image[:, :, 0] + (input_image[:, :, 1] * 256) + (input_image[:, :, 2] * 65536))

lut = generate_lut()
rgb = np.random.randint(255, size=(1080, 1920, 3), dtype=np.uint8)
result = calculate_distance(lut, rgb)

cv2.imshow("Example", rgb)
cv2.imshow("Result", result)
cv2.waitKey(0)

更新:添加了blas优化

有几种简单有效的优化方法:

Update: added blas optimization

There are several straightforward and very effective optimizations:

(1)向量化，向量化！对这段代码中的所有内容进行矢量化并不是很困难.见下文.

(1) vectorize, vectorize! It is not so difficult to vectorize essentially everything in this code. See below.

(2)使用适当的查找，即花式索引，而不是np.take

(2) use proper lookup, i.e fancy indexing, not np.take

(3)使用Cholesky反压缩.使用blas dtrmm，我们可以利用其三角形结构

(3) use Cholesky decomp. With blas dtrmm we can exploit its triangular structure

这是代码.只需将其添加到OP的代码末尾(在EDIT 2下).除非您非常有耐心，否则您可能还想注释掉lut = generate_lut()和result = calculate_distance(lut, rgb)行以及对cv2的所有引用.我还在x中添加了一个随机行，以使其协方差矩阵变为非奇数.

And here is the code. Just add it to the end of OP's code (under EDIT 2). Unless you are very patient you probably also want to comment out the lut = generate_lut() and result = calculate_distance(lut, rgb) lines and all references to cv2. I've also added a random row to x to make its covariance matrix non singular.

class Full_Model(Model):
    ch = np.linalg.cholesky(Model.inverse_pooled_covariance)
    chx = Model.x_mean@ch

def rgb2something_vectorized(rgb):
    return np.sqrt(np.sum(((rgb - Full_Model.x_mean)@Full_Model.ch)**2,  axis=-1))

from scipy.linalg import blas

def rgb2something_blas(rgb):
    *shp, nchan = rgb.shape
    return np.sqrt(np.einsum('...i,...i', *2*(blas.dtrmm(1, Full_Model.ch.T, rgb.reshape(-1, nchan).T, 0, 0, 0, 0, 0).T - Full_Model.chx,))).reshape(shp)

def generate_lut_vectorized():
    return rgb2something_vectorized(np.transpose(np.indices((256, 256, 256))))

def generate_lut_blas():
    rng = np.arange(256)
    arr = np.empty((256, 256, 256, 3))
    arr[0, ..., 0]  = rng
    arr[0, ..., 1]  = rng[:, None]
    arr[1:, ...] = arr[0]
    arr[..., 2] = rng[:, None, None]
    return rgb2something_blas(arr)

def calculate_distance_vectorized(lut, input_image):
    return lut[input_image[..., 2], input_image[..., 1], input_image[..., 0]]

# test code

def random_check_lut(lut):
    """Because the original lut generator is excruciatingly slow,
    we only compare a random sample, using the original code
    """
    levels = 256
    levels2 = levels**2
    lut = lut.ravel()

    levels_range = range(0, levels)

    for r, g, b in np.random.randint(0, 256, (1000, 3)):
        assert np.isclose(lut[r + (g * levels) + (b * levels2)], rgb2something(r, g, b))

import time
td = []
td.append((time.time(), 'create lut vectorized'))
lutv = generate_lut_vectorized()
td.append((time.time(), 'create lut using blas'))
lutb = generate_lut_blas()
td.append((time.time(), 'lookup using np.take'))
res = calculate_distance(lutv, rgb)
td.append((time.time(), 'process on the fly (no lookup)'))
resotf = rgb2something_vectorized(rgb)
td.append((time.time(), 'process on the fly (blas)'))
resbla = rgb2something_blas(rgb)
td.append((time.time(), 'lookup using fancy indexing'))
resv = calculate_distance_vectorized(lutv, rgb)
td.append((time.time(), None))

print("sanity checks ... ", end='')
assert np.allclose(res, resotf) and np.allclose(res, resv) \
    and np.allclose(res, resbla) and np.allclose(lutv, lutb)
random_check_lut(lutv)
print('all ok\n')

t, d = zip(*td)
for ti, di in zip(np.diff(t), d):
    print(f'{di:32s} {ti:10.3f} seconds')

样品运行:

sanity checks ... all ok

create lut vectorized                 1.116 seconds
create lut using blas                 0.917 seconds
lookup using np.take                  0.398 seconds
process on the fly (no lookup)        0.127 seconds
process on the fly (blas)             0.069 seconds
lookup using fancy indexing           0.064 seconds

我们可以看到，最好的查找要比晶须上的实时计算要好.那就是说该示例可能高估了查找成本，因为随机像素大概比自然图像对缓存的友好度低.

We can see that the best lookup beats the best on-the-fly computation by a whisker. That said the example may overestimate lookup cost, because random pixels are presumably less cache friendly than natural images.

如果无法对rgb2something进行矢量化处理，并且希望处理一张典型图像，则可以使用np.unique获得不错的加速效果.

If rgb2something can't be vectorized, and you want to process one typical image, then you can get a decent speedup using np.unique.

如果rgb2something昂贵且必须处理多个图像，则可以将unique与缓存结合使用，这可以方便地使用functools.lru_cache －(仅)(较小)绊脚石完成:参数必须是可哈希的.事实证明，代码中的修改使这种强制(将rgb数组投射到3字节的字符串中)恰好可以提高性能.

If rgb2something is expensive and multiple images have to be processed, then unique can be combined with caching, which is conveniently done using functools.lru_cache---only (minor) stumbling block: arguments must be hashable. As it turns out the modification in code that this forces (casting rgb-arrays to 3-byte strings) happens to benefit performance.

仅当您拥有覆盖大多数色相的大量像素时，才使用完整的查找表是值得的.在这种情况下，最快的方法是使用numpy花式索引进行实际查找.

Using a full look up table is only worth it if you have a huge number of pixels covering most hues. In that case the fastest way is using numpy fancy indexing to do the actual lookup.

import numpy as np
import time
import functools

def rgb2something(rgb):
    # waste some time:
    np.exp(0.1*rgb)
    return rgb.mean()

@functools.lru_cache(None)
def rgb2something_lru(rgb):
    rgb = np.frombuffer(rgb, np.uint8)
    # waste some time:
    np.exp(0.1*rgb)
    return rgb.mean()

def apply_to_img(img):
    shp = img.shape
    return np.reshape([rgb2something(x) for x in img.reshape(-1, shp[-1])], shp[:2])

def apply_to_img_lru(img):
    shp = img.shape
    return np.reshape([rgb2something_lru(x) for x in img.ravel().view('S3')], shp[:2])

def apply_to_img_smart(img, print_stats=True):
    shp = img.shape
    unq, bck = np.unique(img.reshape(-1, shp[-1]), return_inverse=True, axis=0)
    if print_stats:
        print('total no pixels', shp[0]*shp[1], '\nno unique pixels', len(unq))
    return np.array([rgb2something(x) for x in unq])[bck].reshape(shp[:2])

def apply_to_img_smarter(img, print_stats=True):
    shp = img.shape
    unq, bck = np.unique(img.ravel().view('S3'), return_inverse=True)
    if print_stats:
        print('total no pixels', shp[0]*shp[1], '\nno unique pixels', len(unq))
    return np.array([rgb2something_lru(x) for x in unq])[bck].reshape(shp[:2])

def make_full_lut():
    x = np.empty((3,), np.uint8)
    return np.reshape([rgb2something(x) for x[0] in range(256)
                       for x[1] in range(256) for x[2] in range(256)],
                      (256, 256, 256))

def make_full_lut_cheat(): # for quicker testing lookup
    i, j, k = np.ogrid[:256, :256, :256]
    return (i + j + k) / 3

def apply_to_img_full_lut(img, lut):
    return lut[(*np.moveaxis(img, 2, 0),)]

from scipy.misc import face

t0 = time.perf_counter()
bw = apply_to_img(face())
t1 = time.perf_counter()
print('naive                 ', t1-t0, 'seconds')

t0 = time.perf_counter()
bw = apply_to_img_lru(face())
t1 = time.perf_counter()
print('lru first time        ', t1-t0, 'seconds')

t0 = time.perf_counter()
bw = apply_to_img_lru(face())
t1 = time.perf_counter()
print('lru second time       ', t1-t0, 'seconds')

t0 = time.perf_counter()
bw = apply_to_img_smart(face(), False)
t1 = time.perf_counter()
print('using unique:         ', t1-t0, 'seconds')

rgb2something_lru.cache_clear()

t0 = time.perf_counter()
bw = apply_to_img_smarter(face(), False)
t1 = time.perf_counter()
print('unique and lru first: ', t1-t0, 'seconds')

t0 = time.perf_counter()
bw = apply_to_img_smarter(face(), False)
t1 = time.perf_counter()
print('unique and lru second:', t1-t0, 'seconds')

t0 = time.perf_counter()
lut = make_full_lut_cheat()
t1 = time.perf_counter()
print('creating full lut:    ', t1-t0, 'seconds')

t0 = time.perf_counter()
bw = apply_to_img_full_lut(face(), lut)
t1 = time.perf_counter()
print('using full lut:       ', t1-t0, 'seconds')

print()
apply_to_img_smart(face())

import Image
Image.fromarray(bw.astype(np.uint8)).save('bw.png')

样品运行:

naive                  6.8886632949870545 seconds
lru first time         1.7458112589956727 seconds
lru second time        0.4085628940083552 seconds
using unique:          2.0951434450107627 seconds
unique and lru first:  2.0168916099937633 seconds
unique and lru second: 0.3118703299842309 seconds
creating full lut:     151.17599205300212 seconds
using full lut:        0.12164952099556103 seconds

total no pixels 786432 
no unique pixels 134105

这篇关于在Python中创建快速RGB查找表的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

在Python中创建快速RGB查找表 [英] Creating fast RGB look up tables in Python

问题描述

EDIT 2:

推荐答案

更新:添加了blas优化

Update: added blas optimization

相关文章

Python最新文章

热门教程

热门工具

登录关闭

在Python中创建快速RGB查找表 [英] Creating fast RGB look up tables in Python

问题描述

EDIT 2:

推荐答案

更新:添加了blas优化

Update: added blas optimization

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭