Optimizing SIMD histogram calculation
Question
I wrote code that implements a histogram calculation, given an OpenCV struct IplImage * and an unsigned int * buffer for the histogram. I'm still new to SIMD, so I might not be taking advantage of the full potential the instruction set provides.
histogramASM:
xor rdx, rdx
xor rax, rax
mov eax, dword [imgPtr + imgWidthOffset]
mov edx, dword [imgPtr + imgHeightOffset]
mul rdx
mov rdx, rax ; rdx = Image Size
mov r10, qword [imgPtr + imgDataOffset] ; r10 = ImgData
NextPacket:
mov rax, rdx
movdqu xmm0, [r10 + rax - 16]
mov rcx,16 ; 16 pixels/packet
PacketLoop:
pextrb rbx, xmm0, 0 ; save the pixel value in rbx
shl rbx,2
inc dword [rbx + Hist]
psrldq xmm0,1
loop PacketLoop
sub rdx,16
cmp rdx,0
jnz NextPacket
ret
In C, I would run this piece of code to obtain the same result:
imgSize = (img->width)*(img->height);
pixelData = (unsigned char *) img->imageData;
for(i = 0; i < imgSize; i++)
{
pixel = *pixelData;
hist[pixel]++;
pixelData++;
}
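A self-contained version of that loop, useful as a baseline when timing, might look like the sketch below. It assumes, as the loop above does, that the pixel data is one contiguous run of bytes (an actual IplImage can pad each row to widthStep bytes), and histogram_scalar is a hypothetical name, not an OpenCV function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the question's scalar loop as a function.
   Assumes the pixels are one contiguous run of size bytes (no row padding);
   hist must point to 256 counters that the caller has already zeroed. */
void histogram_scalar(const unsigned char *pixels, size_t size,
                      unsigned int hist[256])
{
    assert(hist != NULL);
    for (size_t i = 0; i < size; i++)
        hist[pixels[i]]++;   /* one load and one read-modify-write per pixel */
}
```

For an image it would be called as histogram_scalar((const unsigned char *) img->imageData, (size_t) img->width * img->height, hist).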
However, when I time both versions on my computer with rdtsc(), the SIMD assembly is only 1.5 times faster. Is there a way to optimize the code above and fill the histogram vector more quickly with SIMD? Thanks in advance.
Recommended answer
Like Jester, I'm surprised that your SIMD code gave any significant improvement. Did you compile the C code with optimization turned on?
The one additional suggestion I can make is to unroll your PacketLoop loop. This is a fairly simple optimization and reduces the number of instructions per "iteration" to just two:
pextrb ebx, xmm0, 0              ; byte 0, zero-extended into all of rbx
inc dword [rbx * 4 + Hist]       ; 64-bit address register in 64-bit code
pextrb ebx, xmm0, 1
inc dword [rbx * 4 + Hist]
pextrb ebx, xmm0, 2
inc dword [rbx * 4 + Hist]
...
pextrb ebx, xmm0, 15
inc dword [rbx * 4 + Hist]
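The same unrolling can be written in portable C. The sketch below is not the answer's code: histogram_unrolled16 is a hypothetical helper that, like the assembly, handles 16 pixels per iteration with straight-line increments, so the loop counter and branch are paid once per 16 pixels. Unlike the question's assembly, which assumes the image size is a nonzero multiple of 16, it also processes any leftover tail. Whether it beats the plain loop depends on the compiler, which may already unroll the simple loop on its own at higher optimization levels.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: portable C analogue of the unrolled PacketLoop.
   Each 16-byte packet gets 16 straight-line increments, so loop-control
   overhead is paid once per 16 pixels. hist must hold 256 zeroed counters. */
void histogram_unrolled16(const unsigned char *pixels, size_t size,
                          unsigned int hist[256])
{
    assert(hist != NULL);
    size_t i = 0;

    for (; i + 16 <= size; i += 16) {
        const unsigned char *p = pixels + i;
        hist[p[0]]++;  hist[p[1]]++;  hist[p[2]]++;  hist[p[3]]++;
        hist[p[4]]++;  hist[p[5]]++;  hist[p[6]]++;  hist[p[7]]++;
        hist[p[8]]++;  hist[p[9]]++;  hist[p[10]]++; hist[p[11]]++;
        hist[p[12]]++; hist[p[13]]++; hist[p[14]]++; hist[p[15]]++;
    }

    /* Tail: pixels left over when size is not a multiple of 16. */
    for (; i < size; i++)
        hist[pixels[i]]++;
}
```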
If you're using NASM, you can use the %rep directive to save some typing:
%assign pixel 0
%rep 16
pextrb rbx, xmm0, pixel
inc dword [rbx * 4 + Hist]
%assign pixel pixel + 1
%endrep
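In C, the closest counterpart to the pextrb sequence is the SSE4.1 intrinsic _mm_extract_epi8, and a preprocessor macro can stand in for %rep because the lane index must be a compile-time constant. This is a sketch under stated assumptions: it targets GCC or Clang on x86 (it relies on the target attribute and __builtin_cpu_supports to pick the SSE4.1 path at run time) and falls back to the scalar loop elsewhere. histogram_pextrb, histogram_dispatch, LANE and HAVE_SSE41_PATH are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the SSE4.1 path is only built for GCC/Clang on x86. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_SSE41_PATH 1
#endif

#ifdef HAVE_SSE41_PATH
/* One "%rep iteration": pextrb via _mm_extract_epi8 (byte zero-extended),
   then one counter increment. The lane must be a constant, hence a macro. */
#define LANE(k) hist[(unsigned int) _mm_extract_epi8(v, k)]++;

__attribute__((target("sse4.1")))
static void histogram_pextrb(const unsigned char *pixels, size_t size,
                             unsigned int hist[256])
{
    size_t i = 0;
    for (; i + 16 <= size; i += 16) {
        /* Unaligned 16-byte load, like movdqu. */
        __m128i v = _mm_loadu_si128((const __m128i *) (pixels + i));
        LANE(0)  LANE(1)  LANE(2)  LANE(3)  LANE(4)  LANE(5)  LANE(6)  LANE(7)
        LANE(8)  LANE(9)  LANE(10) LANE(11) LANE(12) LANE(13) LANE(14) LANE(15)
    }
    for (; i < size; i++)   /* leftover pixels */
        hist[pixels[i]]++;
}
#undef LANE
#endif

/* Hypothetical entry point: SSE4.1 path when the CPU supports it,
   plain scalar loop otherwise. hist must hold 256 zeroed counters. */
void histogram_dispatch(const unsigned char *pixels, size_t size,
                        unsigned int hist[256])
{
    assert(hist != NULL);
#ifdef HAVE_SSE41_PATH
    if (__builtin_cpu_supports("sse4.1")) {
        histogram_pextrb(pixels, size, hist);
        return;
    }
#endif
    for (size_t i = 0; i < size; i++)
        hist[pixels[i]]++;
}
```

Every lane still ends in a separate memory increment, so this does the same work as the unrolled assembly; the intrinsics only remove the hand-written loop control, which fits the answer's point that a large gain over an optimized scalar build is unlikely.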