Explaining ARM Neon Image Sampling


Problem description

I'm trying to write a better version of OpenCV's cv::resize(), and I came across some code here: https://github.com/rmaz/NEON-Image-Downscaling/blob/master/ImageResize/BDPViewController.m The code downsamples an image by 2, but I can't follow the algorithm. I would like to first convert the algorithm to C, then try to modify it for learning purposes. Is it also easy to convert it to downsample by any size?

The function is:

static void inline resizeRow(uint32_t *dst, uint32_t *src, uint32_t pixelsPerRow)
{
    const uint32_t * rowB = src + pixelsPerRow;

    // force the number of pixels per row to a multiple of 8
    pixelsPerRow = 8 * (pixelsPerRow / 8);

    __asm__ volatile("Lresizeloop: \n" // start loop
                     "vld1.32 {d0-d3}, [%1]! \n" // load 8 pixels from the top row
                     "vld1.32 {d4-d7}, [%2]! \n" // load 8 pixels from the bottom row
                     "vhadd.u8 q0, q0, q2 \n" // average the pixels vertically
                     "vhadd.u8 q1, q1, q3 \n"
                     "vtrn.32 q0, q2 \n" // transpose to put the horizontally adjacent pixels in different registers
                     "vtrn.32 q1, q3 \n"
                     "vhadd.u8 q0, q0, q2 \n" // average the pixels horizontally
                     "vhadd.u8 q1, q1, q3 \n"
                     "vtrn.32 d0, d1 \n" // fill the registers with pixels
                     "vtrn.32 d2, d3 \n"
                     "vswp d1, d2 \n"
                     "vst1.64 {d0-d1}, [%0]! \n" // store the result
                     "subs %3, %3, #8 \n" // subtract 8 from the pixel count
                     "bne Lresizeloop \n" // repeat until the row is complete
: "=r"(dst), "=r"(src), "=r"(rowB), "=r"(pixelsPerRow)
: "0"(dst), "1"(src), "2"(rowB), "3"(pixelsPerRow)
: "q0", "q1", "q2", "q3", "cc"
);
}

To call it:

    // downscale the image in place
    for (size_t rowIndex = 0; rowIndex < height; rowIndex+=2)
    {
        void *sourceRow = (uint8_t *)buffer + rowIndex * bytesPerRow;
        void *destRow = (uint8_t *)buffer + (rowIndex / 2) * bytesPerRow;
        resizeRow(destRow, sourceRow, width);
    }

Solution

The algorithm is pretty straightforward. Each loop iteration reads 8 pixels from the current line and 8 from the line below. It then uses the vhadd (halving add) instruction to average those pixels vertically, channel by channel. Next it transposes the pixels so that horizontally adjacent pixel pairs end up in separate registers (arranged vertically), and does another set of halving adds to average those together. The results are then shuffled again to pack the 4 output pixels in their original order, and written to the destination. The algorithm could be rewritten to handle other integer scale factors, but as written it can only do a 2x2-to-1 reduction with averaging. Here's the C code equivalent:

#include <stdint.h>

static inline void resizeRow(uint32_t *dst, uint32_t *src, uint32_t pixelsPerRow)
{
    uint8_t *pSrc8 = (uint8_t *)src;
    uint8_t *pDest8 = (uint8_t *)dst;
    // byte offset to the row below (assumes no padding between rows)
    uint32_t stride = pixelsPerRow * sizeof(uint32_t);
    uint32_t x;
    int r, g, b, a;

    // each iteration consumes 2 source pixels and produces 1 destination pixel
    for (x = 0; x < pixelsPerRow / 2; x++)
    {
       r = pSrc8[0] + pSrc8[4] + pSrc8[stride+0] + pSrc8[stride+4];
       g = pSrc8[1] + pSrc8[5] + pSrc8[stride+1] + pSrc8[stride+5];
       b = pSrc8[2] + pSrc8[6] + pSrc8[stride+2] + pSrc8[stride+6];
       a = pSrc8[3] + pSrc8[7] + pSrc8[stride+3] + pSrc8[stride+7];
       pDest8[0] = (uint8_t)((r + 2)/4); // average with rounding
       pDest8[1] = (uint8_t)((g + 2)/4);
       pDest8[2] = (uint8_t)((b + 2)/4);
       pDest8[3] = (uint8_t)((a + 2)/4);
       pSrc8 += 8; // skip forward 2 source pixels
       pDest8 += 4; // skip forward 1 destination pixel
    }
}
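
As for other integer scale factors, the same box-averaging idea can be sketched in plain C along these lines. This is only an illustration, not code from the linked repository: the function name downscale_box, its parameters, and the use of separate source and destination buffers are assumptions made here. It also assumes 4 bytes per pixel and a small factor n, so the 32-bit sums cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original code): box-filter downscale of a
   4-bytes-per-pixel image by an integer factor n. Strides are in bytes.
   Rows and columns that do not fill a complete n x n block are dropped. */
static void downscale_box(uint8_t *dst, size_t dstStride,
                          const uint8_t *src, size_t srcStride,
                          size_t width, size_t height, unsigned n)
{
    if (n == 0)
        return;

    const size_t outW = width / n;
    const size_t outH = height / n;
    const uint32_t area = (uint32_t)n * n;

    for (size_t oy = 0; oy < outH; oy++) {
        for (size_t ox = 0; ox < outW; ox++) {
            uint32_t sum[4] = {0, 0, 0, 0};

            /* accumulate every channel over the n x n source block */
            for (unsigned dy = 0; dy < n; dy++) {
                const uint8_t *p = src + (oy * n + dy) * srcStride + ox * n * 4;
                for (unsigned dx = 0; dx < n; dx++, p += 4)
                    for (int c = 0; c < 4; c++)
                        sum[c] += p[c];
            }

            /* write the rounded average of each channel */
            uint8_t *q = dst + oy * dstStride + ox * 4;
            for (int c = 0; c < 4; c++)
                q[c] = (uint8_t)((sum[c] + area / 2) / area);
        }
    }
}
```

With n = 2 this computes the same rounded 2x2 average as the scalar resizeRow, one block at a time. A NEON version for other factors would need a different load and shuffle pattern, since the transpose trick in the assembly is specific to pairing adjacent pixels.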

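One detail worth keeping in mind when comparing outputs: vhadd is a truncating halving add (the rounding variant is vrhadd), and the assembly applies it twice in a row, while the scalar C version sums all four samples and rounds once. The two can therefore disagree by a small amount for some inputs. The helper names below are made up for illustration; they model a single channel of a single output pixel.

```c
#include <stdint.h>

/* Per-lane behaviour of VHADD.U8: (a + b) >> 1, with no rounding. */
static inline uint8_t hadd_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b) >> 1);
}

/* One channel of one output pixel, computed in the order the NEON loop uses:
   average each column vertically, then average the two column results.
   tl/tr = top-left/top-right sample, bl/br = bottom-left/bottom-right. */
static inline uint8_t neon_style_avg(uint8_t tl, uint8_t tr, uint8_t bl, uint8_t br)
{
    uint8_t left  = hadd_u8(tl, bl);
    uint8_t right = hadd_u8(tr, br);
    return hadd_u8(left, right);
}

/* The same channel using the single rounded four-sample average
   from the scalar C version. */
static inline uint8_t rounded_avg(uint8_t tl, uint8_t tr, uint8_t bl, uint8_t br)
{
    return (uint8_t)((tl + tr + bl + br + 2) / 4);
}
```

For the samples 255, 255, 255, 254, neon_style_avg gives 254 while rounded_avg gives 255, so a byte-for-byte comparison between the assembly and the C port should allow for this.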