Optimizing SIMD histogram calculation

Question

I wrote code that computes a histogram, given an OpenCV struct IplImage * and an unsigned int * buffer that receives the histogram. I'm still new to SIMD, so I might not be taking full advantage of what the instruction set provides.

histogramASM:
xor rdx, rdx
xor rax, rax
mov eax, dword [imgPtr + imgWidthOffset]
mov edx, dword [imgPtr + imgHeightOffset]
mul rdx                                     
mov rdx, rax                                ; rdx = Image Size
mov r10, qword [imgPtr + imgDataOffset]     ; r10 = ImgData

NextPacket: 
mov rax, rdx
movdqu  xmm0, [r10 + rax - 16]
mov rcx,16                               ; 16 pixels per packet

PacketLoop:
pextrb  rbx, xmm0, 0                ; saving the pixel value on rbx
shl rbx,2
inc dword [rbx + Hist]
psrldq  xmm0,1
loop    PacketLoop

sub rdx,16
cmp rdx,0
jnz NextPacket
ret

In C, I would run this piece of code to obtain the same result.

imgSize = (img->width)*(img->height);
pixelData = (unsigned char *) img->imageData;

for(i = 0; i < imgSize; i++)
{
    pixel = *pixelData; 
    hist[pixel]++;
    pixelData++;
}

But when I time both on my computer with rdtsc(), the SIMD assembly version is only about 1.5 times faster than the C version. Is there a way to optimize the code above and quickly fill the histogram vector with SIMD? Thanks in advance.
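For reference, the C fragment above can be wrapped into a self-contained function, which makes it easier to time against the assembly routine. This is only a sketch: the name histogram_scalar is hypothetical, and it assumes an 8-bit single-channel image whose rows are stored back to back (widthStep equal to width), which is the same assumption the width * height loop in the question makes.

```c
#include <stddef.h>

/* Hypothetical helper, not part of OpenCV.
   Counts how often each byte value occurs in pixels[0..count).
   Assumption: the caller has already zeroed hist. */
void histogram_scalar(const unsigned char *pixels, size_t count,
                      unsigned int hist[256])
{
    for (size_t i = 0; i < count; i++)
        hist[pixels[i]]++;   /* one data-dependent increment per pixel */
}
```

With the variables from the question, a call would look roughly like histogram_scalar((const unsigned char *) img->imageData, (size_t) img->width * img->height, hist), after clearing hist.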

Recommended answer

Like Jester, I'm surprised that your SIMD code showed any significant improvement. Did you compile the C code with optimization turned on?

The one additional suggestion I can make is to unroll your PacketLoop loop. This is a fairly simple optimization, and it reduces the number of instructions per "iteration" to just two:

pextrb  ebx, xmm0, 0                ; writing ebx zero-extends into rbx
inc dword [rbx * 4 + Hist]
pextrb  ebx, xmm0, 1
inc dword [rbx * 4 + Hist]
pextrb  ebx, xmm0, 2
inc dword [rbx * 4 + Hist]
...
pextrb  ebx, xmm0, 15
inc dword [rbx * 4 + Hist]

If you're using NASM, you can use the %rep directive to save some typing:

%assign pixel 0
%rep 16
    pextrb  rbx, xmm0, pixel
    inc dword [rbx * 4 + Hist]
%assign pixel pixel + 1
%endrep
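The same unrolling idea can be tried from C as well. The sketch below is a portable analogue rather than a translation of the assembly: it loads 8 bytes at a time into a uint64_t instead of 16 bytes into an XMM register, and it uses shifts and masks where the assembly uses pextrb. The function name histogram_packed is hypothetical, and whether it beats a plain loop depends on the compiler and CPU, so it should be measured rather than assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: counts byte values in pixels[0..n) into hist.
   Assumption: the caller has already zeroed hist. */
void histogram_packed(const unsigned char *pixels, size_t n,
                      unsigned int hist[256])
{
    size_t i = 0;

    /* Main loop: one 8-byte load per packet, followed by eight unrolled
       extract-and-increment steps. Byte order within the packet does not
       matter, because every byte is counted exactly once. */
    for (; i + 8 <= n; i += 8) {
        uint64_t packet;
        memcpy(&packet, pixels + i, sizeof packet);  /* safe unaligned load */

        hist[(packet >>  0) & 0xFF]++;
        hist[(packet >>  8) & 0xFF]++;
        hist[(packet >> 16) & 0xFF]++;
        hist[(packet >> 24) & 0xFF]++;
        hist[(packet >> 32) & 0xFF]++;
        hist[(packet >> 40) & 0xFF]++;
        hist[(packet >> 48) & 0xFF]++;
        hist[(packet >> 56) & 0xFF]++;
    }

    /* Tail: leftover pixels when n is not a multiple of 8. */
    for (; i < n; i++)
        hist[pixels[i]]++;
}
```

Unlike the assembly in the question, this version handles sizes that are not a multiple of the packet width, because the remaining pixels go through the scalar tail loop.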
