快速阈值和位压缩算法(可能的改进?) [英] fast threshold and bit packing algorithm ( possible improvements ? )

查看:230
本文介绍了快速阈值和位压缩算法(可能的改进?)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在执行一个8位灰度级图像的全局阈值到一个1比特的算法(位填充,使得1个字节包含8个像素)的单色图像。灰度图像中的每个像素可具有0的亮度值 - 255

I am working on an algorithm that performs a global thresholding of an 8-bit grayscale image into a 1-bit ( bit packed, such that 1 byte contains 8 pixels ) monochrome image. Each pixel in the Grayscale image can have a luminance value of 0 - 255.

我的环境是Win32的在Microsoft Visual Studio C ++。

My environment is Win32 in Microsoft Visual Studio C++.

我感兴趣的算法尽可能出于好奇优化为多,1位图像会变成一个TIFF。目前,我设置FillOrder是MSB2LSB(最高有效位到最低有效位),只是因为TIFF规格表明,这(它不一定需要是MSB2LSB)

I am interested in optimizing the algorithm as much as possible out curiosity, The 1-bit image will be turned into a TIFF. Currently I am setting the FillOrder to be MSB2LSB (Most Significant Bit to Least Significant Bit ) just because the TIFF spec suggests this ( it doesn't necessarily need to be MSB2LSB )

只是一些背景对于那些不知道是谁:

Just some background for those who don't know:

MSB2LSB订单像素从左到右在就像像素的图像中的方位的X坐标增加一个字节。如果您遍历灰度图像由左到右X轴这显然需要你觉得落后因为你是在当前字节装箱位。随着中说,让我告诉你,我现在有(这是C,我没有尝试ASM或编译器内在函数还只是因为我有这经验非常少,但是这将是一个可能性)。

MSB2LSB orders pixels from left to right in a byte just as pixels are oriented in an image as the X coordinate increases. If you are traversing the Grayscale image from left to right on the X axis this obviously requires you to think "backward" as you are packing the bits in your current byte. With that said, let me show you what I currently have (This is in C, I have not attempted ASM or Compiler Intrinsics yet just because I have very little experience with it, but that would be a possibility ).

由于单色图像将具有每字节8个像素,单色图像的宽度将是

Because the Monochrome image will have 8 pixels per byte, the Width of the monochrome image will be

(grayscaleWidth+7)/8;

仅供参考,我认为我最大的图像是6000像素宽:

FYI, I assume my largest image to be 6000 pixels wide:

第一件事,我做的(处理任何图像前)

First thing I do (before any image is processed) is

1)计算量的查找表,我需要转移到给定的一个X特定字节从我的灰度图像的坐标:

1) calculate a look up table of amounts I need to shift into a specific byte given an X coordinate from my grayscale image:

int _shift_lut[6000];

for( int x = 0 ; x < 6000; x++)
{ 
    _shift_lut[x] = 7-(x%8);
}

有了这个查找表,我可以包一个单色位值到我的东西,如工作的当前字节:

With this lookup table I can pack a monochrome bit value into the current byte I am working on with something like:

monochrome_pixel |= 1 << _shift_lut[ grayX ];

这结束是在做一个巨大的速度提升

which ends up being a huge speed increase over doing

monochrome_pixel |= 1 << _shift_lut[ 7-(x%8)];

第二个查找表我估计是一个查找表,告诉我在X索引给出的灰度像素的X像素的单色我像素。这个非常简单的LUT的计算像这样的:

The second lookup table I calculate is a Lookup Table that tells me the X index into my monochrome pixel given an X pixel on the Grayscale pixel. This very simple LUT is calculated like such:

int xOffsetLut[6000];
int element_size=8; //8 bits
for( int x = 0; x < 6000; x++)
{
    xOffsetLut[x]=x/element_size;
}

这LUT可以让我做的事情一样。

This LUT allows me to do things like

monochrome_image[ xOffsetLut[ GrayX ] ] = packed_byte; //packed byte contains 8 pixels

我的灰度图像是一个简单的无符号字符*,所以是我的黑白图像;

My Grayscale image is a simple unsigned char* and so is my monochrome image;

这是我如何初始化单色图像:

This is how I initialize the monochrome image:

int bitPackedScanlineStride = (grayscaleWidth+7)/8;
int bitpackedLength=bitPackedScanlineStride * grayscaleHeight;
unsigned char * bitpack_image = new unsigned char[bitpackedLength];
memset(bitpack_image,0,bitpackedLength);

然后我打电话给我的二值化功能,例如像:

Then I call my binarize function like such:

binarize(
    gray_image.DataPtr(),
    bitpack_image,
    globalFormThreshold,
    grayscaleWidth,
    grayscaleHeight,
    bitPackedScanlineStride,
    bitpackedLength,
    _shift_lut,  
    xOffsetLut);

这是我的二值化功能(因为你可以看到我做了一些循环展开,这可能会或可能不会帮助)。

And here is my Binarize function ( as you can see I did some loop unrolling, which may or may not help ).

void binarize( unsigned char grayImage[], unsigned char bitPackImage[], int threshold, int grayscaleWidth, int grayscaleHeight, int  bitPackedScanlineStride, int bitpackedLength,  int shiftLUT[], int xOffsetLUT[] )
{
    int yoff;
    int byoff;
    unsigned char bitpackPel=0;
    unsigned char pel1=0;
    unsigned char  pel2=0;
    unsigned char  pel3=0;
    unsigned char  pel4=0;
    unsigned char  pel5=0;
    unsigned char  pel6=0;
    unsigned char  pel7=0;
    unsigned char  pel8=0;
    int checkX=grayscaleWidth;
    int checkY=grayscaleHeight;

    for ( int by = 0 ; by < checkY; by++)
    {
    yoff=by*grayscaleWidth;
    byoff=by*bitPackedScanlineStride;

    for( int bx = 0; bx < checkX; bx+=32)
    {
        bitpackPel = 0;

        //pixel 1 in bitpack image
        pel1=grayImage[yoff+bx];
        pel2=grayImage[yoff+bx+1];
        pel3=grayImage[yoff+bx+2];
        pel4=grayImage[yoff+bx+3];
        pel5=grayImage[yoff+bx+4];
        pel6=grayImage[yoff+bx+5];
        pel7=grayImage[yoff+bx+6];
        pel8=grayImage[yoff+bx+7];

        bitpackPel |= ( (pel1<=threshold) << shiftLUT[bx]);
        bitpackPel |= ( (pel2<=threshold) << shiftLUT[bx+1] );
        bitpackPel |= ( (pel3<=threshold) << shiftLUT[bx+2] );
        bitpackPel |= ( (pel4<=threshold) << shiftLUT[bx+3] );
        bitpackPel |= ( (pel5<=threshold) << shiftLUT[bx+4] );
        bitpackPel |= ( (pel6<=threshold) << shiftLUT[bx+5] );
        bitpackPel |= ( (pel7<=threshold) << shiftLUT[bx+6] );
        bitpackPel |= ( (pel8<=threshold) << shiftLUT[bx+7] );

        bitPackImage[byoff+(xOffsetLUT[bx])] = bitpackPel;

        //pixel 2 in bitpack image
        pel1=grayImage[yoff+bx+8];
        pel2=grayImage[yoff+bx+9];
        pel3=grayImage[yoff+bx+10];
        pel4=grayImage[yoff+bx+11];
        pel5=grayImage[yoff+bx+12];
        pel6=grayImage[yoff+bx+13];
        pel7=grayImage[yoff+bx+14];
        pel8=grayImage[yoff+bx+15];

        bitpackPel = 0;

        bitpackPel |= ( (pel1<=threshold) << shiftLUT[bx+8]  );
        bitpackPel |= ( (pel2<=threshold) << shiftLUT[bx+9]  );
        bitpackPel |= ( (pel3<=threshold) << shiftLUT[bx+10] );
        bitpackPel |= ( (pel4<=threshold) << shiftLUT[bx+11] );
        bitpackPel |= ( (pel5<=threshold) << shiftLUT[bx+12] );
        bitpackPel |= ( (pel6<=threshold) << shiftLUT[bx+13] );
        bitpackPel |= ( (pel7<=threshold) << shiftLUT[bx+14] );
        bitpackPel |= ( (pel8<=threshold) << shiftLUT[bx+15] );

        bitPackImage[byoff+(xOffsetLUT[bx+8])] = bitpackPel;

        //pixel 3 in bitpack image
        pel1=grayImage[yoff+bx+16];
        pel2=grayImage[yoff+bx+17];
        pel3=grayImage[yoff+bx+18];
        pel4=grayImage[yoff+bx+19];
        pel5=grayImage[yoff+bx+20];
        pel6=grayImage[yoff+bx+21];
        pel7=grayImage[yoff+bx+22];
        pel8=grayImage[yoff+bx+23];

        bitpackPel = 0;

        bitpackPel |= ( (pel1<=threshold) << shiftLUT[bx+16]  );
        bitpackPel |= ( (pel2<=threshold) << shiftLUT[bx+17]  );
        bitpackPel |= ( (pel3<=threshold) << shiftLUT[bx+18] );
        bitpackPel |= ( (pel4<=threshold) << shiftLUT[bx+19] );
        bitpackPel |= ( (pel5<=threshold) << shiftLUT[bx+20] );
        bitpackPel |= ( (pel6<=threshold) << shiftLUT[bx+21] );
        bitpackPel |= ( (pel7<=threshold) << shiftLUT[bx+22] );
        bitpackPel |= ( (pel8<=threshold) << shiftLUT[bx+23] );

        bitPackImage[byoff+(xOffsetLUT[bx+16])] = bitpackPel;

        //pixel 4 in bitpack image
        pel1=grayImage[yoff+bx+24];
        pel2=grayImage[yoff+bx+25];
        pel3=grayImage[yoff+bx+26];
        pel4=grayImage[yoff+bx+27];
        pel5=grayImage[yoff+bx+28];
        pel6=grayImage[yoff+bx+29];
        pel7=grayImage[yoff+bx+30];
        pel8=grayImage[yoff+bx+31];

        bitpackPel = 0;

        bitpackPel |= ( (pel1<=threshold) << shiftLUT[bx+24]  );
        bitpackPel |= ( (pel2<=threshold) << shiftLUT[bx+25]  );
        bitpackPel |= ( (pel3<=threshold) << shiftLUT[bx+26] );
        bitpackPel |= ( (pel4<=threshold) << shiftLUT[bx+27] );
        bitpackPel |= ( (pel5<=threshold) << shiftLUT[bx+28] );
        bitpackPel |= ( (pel6<=threshold) << shiftLUT[bx+29] );
        bitpackPel |= ( (pel7<=threshold) << shiftLUT[bx+30] );
        bitpackPel |= ( (pel8<=threshold) << shiftLUT[bx+31] );

        bitPackImage[byoff+(xOffsetLUT[bx+24])] = bitpackPel;
    }
}
}

我知道这个算法将有可能错过每一行中一些尾随像素,但不用担心这一点。

I know this algorithm will potentially miss some trailing pixels in each row, but don't worry about that.

正如你可以看到每一个单色字节,我处理8灰阶像素。

As you can see for every monochrome byte, I process 8 grayscale pixels.

如果您看到
    pel8&LT; =门槛
是解析为0或1,比如果{}其他{​​}

Where you see pel8<=threshold is a neat little trick that resolves to 0 or 1 and is much faster than if{} else{}

有关X的所有增量我包了一下进入一个更高的序位比previous X

For every increment of X I pack a bit into a higher order bit than the previous X

因此​​对于灰度图像中的第一组的8个像素

so for the first set of 8 pixels in the grayscale image

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8

这是在字节看起来像位(显然每个编号位仅仅是已经处理相应编号的像素的门槛的结果,但你的想法)是什么

This is what the bits in the byte look like (obviously each numbered bit is just the threshold result of having processed the corresponding numbered pixel but you get the idea)

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8

这应该是它。欢迎有一些有趣的一些漂亮的位操作技巧,会挤出更多汁出了这个算法。

PHEW that should be it. Feel free to have some fun with some nifty bit twiddling tricks that would squeeze more juice out of this algorithm.

使用编译器优化开启此功能平均需要16毫秒的大约5000倍2200像素的图像上的Core 2 Duo处理器的机器上。

With compiler optimizations turned on this function takes on average 16 milliseconds on a roughly 5000 x 2200 pixel image on a core 2 duo machine.

编辑:

研究..的建议是,除去移位LUT,只是使用常量,这实际上是完全合乎逻辑的......我已经修改每个像素的或运算是这样:

R..'s suggestion was to remove the shift LUT and just use constants which is actually perfectly logical...I have modified the OR'ing of each pixel to be as such:

void binarize( unsigned char grayImage[], unsigned char bitPackImage[], int threshold, int grayscaleWidth, int grayscaleHeight, int  bitPackedScanlineStride, int bitpackedLength,  int shiftLUT[], int xOffsetLUT[] )
{
int yoff;
int byoff;
unsigned char bitpackPel=0;
unsigned char pel1=0;
unsigned char  pel2=0;
unsigned char  pel3=0;
unsigned char  pel4=0;
unsigned char  pel5=0;
unsigned char  pel6=0;
unsigned char  pel7=0;
unsigned char  pel8=0;
int checkX=grayscaleWidth-32;
int checkY=grayscaleHeight;

for ( int by = 0 ; by < checkY; by++)
{
    yoff=by*grayscaleWidth;
    byoff=by*bitPackedScanlineStride;

    for( int bx = 0; bx < checkX; bx+=32)
    {
        bitpackPel = 0;

        //pixel 1 in bitpack image
        pel1=grayImage[yoff+bx];
        pel2=grayImage[yoff+bx+1];
        pel3=grayImage[yoff+bx+2];
        pel4=grayImage[yoff+bx+3];
        pel5=grayImage[yoff+bx+4];
        pel6=grayImage[yoff+bx+5];
        pel7=grayImage[yoff+bx+6];
        pel8=grayImage[yoff+bx+7];

        /*bitpackPel |= ( (pel1<=threshold) << shiftLUT[bx]);
        bitpackPel |= ( (pel2<=threshold) << shiftLUT[bx+1] );
        bitpackPel |= ( (pel3<=threshold) << shiftLUT[bx+2] );
        bitpackPel |= ( (pel4<=threshold) << shiftLUT[bx+3] );
        bitpackPel |= ( (pel5<=threshold) << shiftLUT[bx+4] );
        bitpackPel |= ( (pel6<=threshold) << shiftLUT[bx+5] );
        bitpackPel |= ( (pel7<=threshold) << shiftLUT[bx+6] );
        bitpackPel |= ( (pel8<=threshold) << shiftLUT[bx+7] );*/
        bitpackPel |= ( (pel1<=threshold) << 7);
        bitpackPel |= ( (pel2<=threshold) << 6 );
        bitpackPel |= ( (pel3<=threshold) << 5 );
        bitpackPel |= ( (pel4<=threshold) << 4 );
        bitpackPel |= ( (pel5<=threshold) << 3 );
        bitpackPel |= ( (pel6<=threshold) << 2 );
        bitpackPel |= ( (pel7<=threshold) << 1 );
        bitpackPel |= ( (pel8<=threshold)  );

        bitPackImage[byoff+(xOffsetLUT[bx])] = bitpackPel;

        //pixel 2 in bitpack image
        pel1=grayImage[yoff+bx+8];
        pel2=grayImage[yoff+bx+9];
        pel3=grayImage[yoff+bx+10];
        pel4=grayImage[yoff+bx+11];
        pel5=grayImage[yoff+bx+12];
        pel6=grayImage[yoff+bx+13];
        pel7=grayImage[yoff+bx+14];
        pel8=grayImage[yoff+bx+15];

        bitpackPel = 0;

        /*bitpackPel |= ( (pel1<=threshold) << shiftLUT[bx+8]  );
        bitpackPel |= ( (pel2<=threshold) << shiftLUT[bx+9]  );
        bitpackPel |= ( (pel3<=threshold) << shiftLUT[bx+10] );
        bitpackPel |= ( (pel4<=threshold) << shiftLUT[bx+11] );
        bitpackPel |= ( (pel5<=threshold) << shiftLUT[bx+12] );
        bitpackPel |= ( (pel6<=threshold) << shiftLUT[bx+13] );
        bitpackPel |= ( (pel7<=threshold) << shiftLUT[bx+14] );
        bitpackPel |= ( (pel8<=threshold) << shiftLUT[bx+15] );*/
         bitpackPel |= ( (pel1<=threshold) << 7);
        bitpackPel |= ( (pel2<=threshold) << 6 );
        bitpackPel |= ( (pel3<=threshold) << 5 );
        bitpackPel |= ( (pel4<=threshold) << 4 );
        bitpackPel |= ( (pel5<=threshold) << 3 );
        bitpackPel |= ( (pel6<=threshold) << 2 );
        bitpackPel |= ( (pel7<=threshold) << 1 );
        bitpackPel |= ( (pel8<=threshold)  );


        bitPackImage[byoff+(xOffsetLUT[bx+8])] = bitpackPel;

        //pixel 3 in bitpack image
        pel1=grayImage[yoff+bx+16];
        pel2=grayImage[yoff+bx+17];
        pel3=grayImage[yoff+bx+18];
        pel4=grayImage[yoff+bx+19];
        pel5=grayImage[yoff+bx+20];
        pel6=grayImage[yoff+bx+21];
        pel7=grayImage[yoff+bx+22];
        pel8=grayImage[yoff+bx+23];

        bitpackPel = 0;

        /*bitpackPel |= ( (pel1<=threshold) << shiftLUT[bx+16]  );
        bitpackPel |= ( (pel2<=threshold) << shiftLUT[bx+17]  );
        bitpackPel |= ( (pel3<=threshold) << shiftLUT[bx+18] );
        bitpackPel |= ( (pel4<=threshold) << shiftLUT[bx+19] );
        bitpackPel |= ( (pel5<=threshold) << shiftLUT[bx+20] );
        bitpackPel |= ( (pel6<=threshold) << shiftLUT[bx+21] );
        bitpackPel |= ( (pel7<=threshold) << shiftLUT[bx+22] );
        bitpackPel |= ( (pel8<=threshold) << shiftLUT[bx+23] );*/
          bitpackPel |= ( (pel1<=threshold) << 7);
        bitpackPel |= ( (pel2<=threshold) << 6 );
        bitpackPel |= ( (pel3<=threshold) << 5 );
        bitpackPel |= ( (pel4<=threshold) << 4 );
        bitpackPel |= ( (pel5<=threshold) << 3 );
        bitpackPel |= ( (pel6<=threshold) << 2 );
        bitpackPel |= ( (pel7<=threshold) << 1 );
        bitpackPel |= ( (pel8<=threshold)  );


        bitPackImage[byoff+(xOffsetLUT[bx+16])] = bitpackPel;

        //pixel 4 in bitpack image
        pel1=grayImage[yoff+bx+24];
        pel2=grayImage[yoff+bx+25];
        pel3=grayImage[yoff+bx+26];
        pel4=grayImage[yoff+bx+27];
        pel5=grayImage[yoff+bx+28];
        pel6=grayImage[yoff+bx+29];
        pel7=grayImage[yoff+bx+30];
        pel8=grayImage[yoff+bx+31];

        bitpackPel = 0;

        /*bitpackPel |= ( (pel1<=threshold) << shiftLUT[bx+24]  );
        bitpackPel |= ( (pel2<=threshold) << shiftLUT[bx+25]  );
        bitpackPel |= ( (pel3<=threshold) << shiftLUT[bx+26] );
        bitpackPel |= ( (pel4<=threshold) << shiftLUT[bx+27] );
        bitpackPel |= ( (pel5<=threshold) << shiftLUT[bx+28] );
        bitpackPel |= ( (pel6<=threshold) << shiftLUT[bx+29] );
        bitpackPel |= ( (pel7<=threshold) << shiftLUT[bx+30] );
        bitpackPel |= ( (pel8<=threshold) << shiftLUT[bx+31] );*/
         bitpackPel |= ( (pel1<=threshold) << 7);
        bitpackPel |= ( (pel2<=threshold) << 6 );
        bitpackPel |= ( (pel3<=threshold) << 5 );
        bitpackPel |= ( (pel4<=threshold) << 4 );
        bitpackPel |= ( (pel5<=threshold) << 3 );
        bitpackPel |= ( (pel6<=threshold) << 2 );
        bitpackPel |= ( (pel7<=threshold) << 1 );
        bitpackPel |= ( (pel8<=threshold)  );


        bitPackImage[byoff+(xOffsetLUT[bx+24])] = bitpackPel;
    }
}
}

我现在测试上的英特尔至强5670,采用(GCC)4.1.2。根据这些规范硬codeD bitshift比使用我原来的LUT算法慢4毫秒。在Xeon和海湾合作委员会的LUT算法平均需要8.61毫秒,硬codeD bitshift平均需要12.285毫秒。

I am now testing on an Intel Xeon 5670, using (GCC) 4.1.2. Under these specs the hardcoded bitshift are 4 ms slower than using my original LUT algorithm. In the Xeon and GCC, the LUT algorithm takes on average 8.61 ms and the hard coded bitshift takes on average 12.285 ms.

推荐答案

尝试是这样的:

unsigned i, w8=w>>3, x;
for (i=0; i<w8; i++) {
    x = thres-src[0]>>1&0x80;
    x |= thres-src[1]>>2&0x40;
    x |= thres-src[2]>>3&0x20;
    x |= thres-src[3]>>4&0x10;
    x |= thres-src[4]>>5&0x08;
    x |= thres-src[5]>>6&0x04;
    x |= thres-src[6]>>7&0x02;
    x |= thres-src[7]>>8&0x01;
    out[i] = x;
    src += 8;
}

您可以计算出额外的code为在宽度行的末尾其余不是8的倍数,或者你可以只垫/对齐源,以确保它是8的倍数。

You can figure out the extra code for the remainder at the end of the line of the width is not a multiple of 8, or you could just pad/align the source to ensure it's a multiple of 8.

这篇关于快速阈值和位压缩算法(可能的改进?)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆