Fast color quantization in OpenCV


Problem description

How can I reduce the number of distinct colors in an image using OpenCV (+ C++) as fast as possible? I don't want the complete code; I'm already doing it using kmeans, but it's not very fast. This is the part of my code that is slow:

kmeans(samples, clusterCount, labels,
    TermCriteria(TermCriteria::EPS + TermCriteria::COUNT, 10, 10.0),
    1, KMEANS_RANDOM_CENTERS, centers);

This code takes a few seconds to process, which is much too slow for me. I was using MATLAB for this (rgb2ind), which was fast: almost 0.01 seconds.

I want to use my code for production where the users expect the program to be fast.

Is there any alternative to kmeans for color quantization? Is there any way to run kmeans faster? (I don't think so, because I've tried many different parameters.)

Edit:
It turned out that color quantization is a very complex topic, and writing a well-optimized implementation takes time, so I've decided to use Magick++ (the ImageMagick API) for this.
Because of that I haven't tried Cris Luengo's new (edited) answer. But I'm marking it as the answer (check out the comments too) so that other people don't think this question is unanswered.

Solution

There are many ways to quantize colors. Here I describe four.

Uniform quantization

Here we are using a color map with uniformly distributed colors, whether they exist in the image or not. In MATLAB-speak you would write

qimg = round(img*(N/255))*(255/N);

to quantize each channel into N levels (assuming the input is in the range [0,255]). You can also use floor, which is more suitable in some cases. This leads to N^3 different colors. For example, with N=8 you get 512 unique RGB colors.
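
For concreteness, here is a minimal OpenCV sketch of that formula (the function name is illustrative, and an 8-bit, 3-channel input is assumed):

    #include <opencv2/opencv.hpp>
    #include <cmath>

    // Quantize each channel of an 8-bit, 3-channel image into N levels,
    // mirroring qimg = round(img*(N/255))*(255/N). Name is illustrative.
    cv::Mat uniformQuantize(const cv::Mat& img, int N)
    {
        CV_Assert(img.type() == CV_8UC3);
        const double step = 255.0 / N;        // width of one quantization bin
        cv::Mat out = img.clone();
        out.forEach<cv::Vec3b>([step](cv::Vec3b& p, const int*) {
            for (int c = 0; c < 3; ++c)       // snap each channel to its level
                p[c] = cv::saturate_cast<uchar>(std::round(p[c] / step) * step);
        });
        return out;
    }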

K-means clustering

This is the "classical" method to generate an adaptive palette. Obviously it is going to be the most expensive. The OP is applying k-means on the collection of all pixels. Instead, k-means can be applied to the color histogram. The process is identical, but instead of 10 million data points (a typical image nowadays), you have only maybe 32^3 = 33 thousand. The quantization caused by the reduced number of histogram bins has little effect when dealing with natural photographs. If you are quantizing a graph, which has a limited set of colors, you don't need to do k-means clustering at all.

You do a single pass through all pixels to create the histogram. Next, you run the regular k-means clustering, but using the histogram bins as data points. Each data point now also has a weight (the number of pixels within that bin) that you need to take into account. The step in the algorithm that determines the cluster centers is affected: you need to compute the weighted mean of the data points, instead of the regular mean.
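
As an illustration, here is a sketch of this histogram-based variant (all names are illustrative and the initialization is deliberately naive; the only change from plain k-means is the weighted mean in the update step):

    #include <opencv2/opencv.hpp>
    #include <cstdint>
    #include <vector>

    // Sketch: k-means on a 32^3 color histogram instead of on every pixel.
    // Each occupied bin is one weighted data point (bin center, pixel count).
    // Channel order follows the image (BGR for OpenCV images).
    std::vector<cv::Vec3f> histogramKMeans(const cv::Mat& img, int k, int iters = 10)
    {
        CV_Assert(img.type() == CV_8UC3);

        // One pass over all pixels: 32 bins per channel (drop the 3 low bits).
        std::vector<int64_t> hist(32 * 32 * 32, 0);
        for (int i = 0; i < img.rows; ++i) {
            const cv::Vec3b* row = img.ptr<cv::Vec3b>(i);
            for (int j = 0; j < img.cols; ++j)
                ++hist[(row[j][0] >> 3) * 1024 + (row[j][1] >> 3) * 32 + (row[j][2] >> 3)];
        }

        // Collect occupied bins as weighted points (bin centers in [0,255]).
        std::vector<cv::Vec3f> pts;
        std::vector<double> w;
        for (int b = 0; b < 32 * 32 * 32; ++b) {
            if (hist[b] == 0) continue;
            pts.emplace_back(float((b / 1024) * 8 + 4), float((b / 32) % 32 * 8 + 4),
                             float((b % 32) * 8 + 4));
            w.push_back(double(hist[b]));
        }

        // Naive initialization: spread the centers over the occupied bins.
        std::vector<cv::Vec3f> centers(k);
        for (int c = 0; c < k; ++c)
            centers[c] = pts[c * pts.size() / k];

        std::vector<int> label(pts.size(), 0);
        for (int it = 0; it < iters; ++it) {
            // Assignment step: nearest center, exactly as in plain k-means.
            for (size_t p = 0; p < pts.size(); ++p) {
                double best = 1e30;
                for (int c = 0; c < k; ++c) {
                    cv::Vec3f d = pts[p] - centers[c];
                    if (d.dot(d) < best) { best = d.dot(d); label[p] = c; }
                }
            }
            // Update step: *weighted* mean of the bins assigned to each center.
            std::vector<cv::Vec3d> sum(k, cv::Vec3d(0, 0, 0));
            std::vector<double> wsum(k, 0.0);
            for (size_t p = 0; p < pts.size(); ++p) {
                sum[label[p]] += w[p] * cv::Vec3d(pts[p]);
                wsum[label[p]] += w[p];
            }
            for (int c = 0; c < k; ++c)
                if (wsum[c] > 0) centers[c] = cv::Vec3f(sum[c] * (1.0 / wsum[c]));
        }
        return centers;
    }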

The result is affected by the initialization.

Octree quantization

An octree is a data structure for spatial indexing, where the volume is recursively divided into 8 sub-volumes by cutting each axis in half. The tree thus is formed of nodes with 8 children each. For color quantization, the RGB cube is represented by an octree, and the number of pixels per node is counted (this is equivalent to building a color histogram, and constructing an octree on top of that). Next, leaf nodes are removed until the desired number of them is left. Removing leaf nodes happens 8 at a time, such that a node one level up becomes a leaf. There are different strategies to pick which nodes to prune, but they typically revolve around pruning nodes with low pixel counts.
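
A minimal sketch of such an octree (illustrative; a real quantizer would also accumulate per-node color sums to produce the palette entries). At depth d, one bit from each channel selects which of the 8 children a color falls into:

    #include <cstdint>
    #include <memory>

    // Sketch of an octree over the RGB cube, counting pixels per node.
    struct OctreeNode {
        uint64_t pixelCount = 0;                 // histogram count for this cell
        std::unique_ptr<OctreeNode> child[8];    // null until a color reaches it
    };

    void insertColor(OctreeNode& root, uint8_t r, uint8_t g, uint8_t b, int maxDepth)
    {
        OctreeNode* node = &root;
        for (int d = 0; d < maxDepth; ++d) {
            ++node->pixelCount;
            int shift = 7 - d;                   // most significant bits first
            int idx = (((r >> shift) & 1) << 2)
                    | (((g >> shift) & 1) << 1)
                    |  ((b >> shift) & 1);
            if (!node->child[idx])
                node->child[idx] = std::make_unique<OctreeNode>();
            node = node->child[idx].get();
        }
        ++node->pixelCount;                      // leaf count
    }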

This is the method that Gimp uses.

Because the octree always splits nodes down the middle, it is not as flexible as k-means clustering or the next method.

Minimum variance quantization

MATLAB's rgb2ind, which the OP mentions, does uniform quantization and something they call "minimum variance quantization":

Minimum variance quantization cuts the RGB color cube into smaller boxes (not necessarily cubes) of different sizes, depending on how the colors are distributed in the image.

I'm not sure what this means. This page doesn't give away anything more, but it has a figure that looks like a k-d tree partitioning of the RGB cube. K-d trees are spatial indexing structures that divide spatial data in half recursively. At each level, you pick the dimension where there is most separation, and split along that dimension, leading to one additional leaf node. In contrast to octrees, the splitting can happen at an optimal location, it is not down the middle of the node.

The advantage of using a spatial indexing structure (either k-d trees or octrees) is that the color lookup is really fast. You start at the root, and make a binary decision based on either R, G or B value, until you reach a leaf node. There is no need to compute distances to each prototype cluster, as is the case of k-means.
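
To illustrate, a lookup is just a chain of scalar comparisons; the node layout below is hypothetical:

    // Sketch of a color lookup in a k-d tree over the RGB cube.
    // Inner nodes store a split axis and threshold; leaves store the
    // index of their palette color.
    struct KdNode {
        int axis;          // 0=R, 1=G, 2=B; -1 marks a leaf
        int threshold;     // split position along 'axis'
        int paletteIndex;  // valid only for leaves
        KdNode* lo;        // colors with value <  threshold
        KdNode* hi;        // colors with value >= threshold
    };

    int lookup(const KdNode* node, const unsigned char rgb[3])
    {
        while (node->axis >= 0)    // one binary decision per level
            node = (rgb[node->axis] < node->threshold) ? node->lo : node->hi;
        return node->paletteIndex; // no distances to prototypes computed
    }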

[Edit two weeks later] I have been thinking about a possible implementation, and came up with one. This is the algorithm:

  • The full color histogram is considered a partition. This will be the root of a k-d tree, which for now is also its only leaf node, because there are no other nodes yet.
  • A priority queue is created. It contains all the leaf nodes of the k-d tree. The priority of a node is given by the variance of the partition along one axis, minus the combined variance of the two halves if we were to split the partition along that axis. The split location is picked such that the variances of the two halves are minimal (using Otsu's algorithm). That is, the larger the priority, the more total variance we reduce by making the split. For each leaf node, we compute this value for each axis, and use the largest result.
  • We process partitions on the queue until we have the desired number of partitions:
    • We split the partition with highest priority along the axis and at the location computed when determining the priority.
    • We compute the priority for each of the two halves, and put them on the queue.

Described this way it is a relatively simple algorithm; the code is somewhat more complex, because I tried to make it efficient yet generic. A simplified sketch follows.
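
Here is a compact, simplified sketch of the idea (not the optimized implementation; the names are illustrative, it reuses the weighted histogram points from the k-means section, and it wastefully re-sorts on every scoring pass):

    #include <opencv2/core.hpp>
    #include <algorithm>
    #include <queue>
    #include <vector>

    struct Partition {
        std::vector<int> ids;   // indices into pts/w
        double priority = -1;   // variance reduction of the best split
        int axis = 0;           // best axis to split on
        float threshold = 0;    // best split position along that axis
    };

    // Find the split (axis + location) that removes the most variance.
    static void scorePartition(const std::vector<cv::Vec3f>& pts,
                               const std::vector<double>& w, Partition& part)
    {
        part.priority = -1;
        int n = (int)part.ids.size();
        for (int a = 0; a < 3; ++a) {
            std::sort(part.ids.begin(), part.ids.end(),
                      [&](int i, int j) { return pts[i][a] < pts[j][a]; });
            // Prefix sums of weight, weighted coordinate, weighted square.
            std::vector<double> W(n + 1, 0), X(n + 1, 0), X2(n + 1, 0);
            for (int i = 0; i < n; ++i) {
                double x = pts[part.ids[i]][a], wi = w[part.ids[i]];
                W[i + 1] = W[i] + wi;
                X[i + 1] = X[i] + wi * x;
                X2[i + 1] = X2[i] + wi * x * x;
            }
            // Weighted sum of squared deviations of ids[lo..hi).
            auto sse = [&](int lo, int hi) {
                double ww = W[hi] - W[lo];
                if (ww <= 0) return 0.0;
                double xx = X[hi] - X[lo];
                return (X2[hi] - X2[lo]) - xx * xx / ww;
            };
            double total = sse(0, n);
            for (int s = 1; s < n; ++s) {   // split between ids[s-1] and ids[s]
                float x1 = pts[part.ids[s - 1]][a], x2 = pts[part.ids[s]][a];
                if (x1 == x2) continue;     // cannot separate equal values
                double gain = total - sse(0, s) - sse(s, n);
                if (gain > part.priority) {
                    part.priority = gain;
                    part.axis = a;
                    part.threshold = 0.5f * (x1 + x2);
                }
            }
        }
    }

    std::vector<Partition> minVarQuantize(const std::vector<cv::Vec3f>& pts,
                                          const std::vector<double>& w, int k)
    {
        auto cmp = [](const Partition& a, const Partition& b) {
            return a.priority < b.priority;    // largest reduction first
        };
        std::priority_queue<Partition, std::vector<Partition>, decltype(cmp)> q(cmp);

        Partition root;                        // the full histogram
        for (int i = 0; i < (int)pts.size(); ++i) root.ids.push_back(i);
        scorePartition(pts, w, root);
        q.push(root);

        std::vector<Partition> done;           // partitions we won't split
        while ((int)(q.size() + done.size()) < k && !q.empty()) {
            Partition p = q.top(); q.pop();
            if (p.priority <= 0) { done.push_back(p); continue; }
            Partition lo, hi;
            for (int id : p.ids)               // split at the stored location
                (pts[id][p.axis] < p.threshold ? lo : hi).ids.push_back(id);
            scorePartition(pts, w, lo);
            scorePartition(pts, w, hi);
            q.push(lo); q.push(hi);
        }
        while (!q.empty()) { done.push_back(q.top()); q.pop(); }
        return done;   // each partition's weighted mean is one palette color
    }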

Comparison

On a 256x256x256 RGB histogram I got these timings comparing k-means clustering and this new algorithm:

# clusters    kmeans (s)    minvar (s)
     5          3.98         0.34
    20         17.9          0.48
    50        220.8          0.59

Note that k-means needs more iterations as the number of clusters increases, hence the steep, super-linear increase in time. Normally one would not use such a big histogram; I wanted to have large data to make the timings more robust.

Here is an example of these three methods applied to a test image:

Input: [image]

Uniform with N=4, leading to up to 64 different colors [with N=2, which gives 8 different colors and is comparable to the other methods, the result is very ugly]: [image]

K-means with 8 colors: [image]

New "minimum variance" with 8 colors: [image]

I like this last result better than the K-means result, though they are fairly similar.
