How to load a pixel struct into an SSE register?

Problem description

I have a struct of 8-bit pixel data:

struct __attribute__((aligned(4))) pixels {
    char r;
    char g;
    char b;
    char a;
};

I want to use SSE instructions to calculate certain things on these pixels (namely, a Paeth transformation). How can I load these pixels into an SSE register as 32-bit unsigned integers?

Recommended answer

OK, using the SSE2 integer intrinsics from <emmintrin.h>, first load the pixel into the lower 32 bits of the register:

__m128i xmm0 = _mm_cvtsi32_si128(*(const int*)&pixel);

Then unpack those 8-bit values into 16-bit values in the lower 64 bits of the register, interleaving them with 0s:

xmm0 = _mm_unpacklo_epi8(xmm0, _mm_setzero_si128());

Then unpack those 16-bit values again, this time into 32-bit values:

xmm0 = _mm_unpacklo_epi16(xmm0, _mm_setzero_si128());

You should now have each channel of the pixel as a 32-bit integer in the respective 4 components of the SSE register.
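Putting the three steps together, a minimal sketch might look like the following. It assumes the channels are declared unsigned char; the names pixel_u8 and load_pixel_u32 are illustrative and not from the question. It also copies the pixel with memcpy rather than the pointer cast used above, which sidesteps strict-aliasing concerns.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Assumed variant of the question's struct with unsigned channels. */
struct pixel_u8 { unsigned char r, g, b, a; };

/* Zero-extend the four 8-bit channels into four 32-bit lanes (r ends up in lane 0). */
static __m128i load_pixel_u32(const struct pixel_u8 *p)
{
    int32_t bits;
    memcpy(&bits, p, sizeof bits);             /* read the 4 bytes as one 32-bit value */
    __m128i v = _mm_cvtsi32_si128(bits);       /* pixel in the low 32 bits, rest zero */
    const __m128i zero = _mm_setzero_si128();
    v = _mm_unpacklo_epi8(v, zero);            /* 8-bit  -> 16-bit, interleaved with 0s */
    v = _mm_unpacklo_epi16(v, zero);           /* 16-bit -> 32-bit, interleaved with 0s */
    return v;
}
```

The lanes can be inspected by storing the register to an int32_t[4] with _mm_storeu_si128.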

EDIT: I just read that you want to get those values as 32-bit signed integers, though I wonder what sense a signed pixel in [-128,127] makes. But if your pixel values can indeed be negative, interleaving with zeros won't work, since it turns a negative 8-bit number into a positive 16-bit number (and thus interprets your numbers as unsigned pixel values). A negative number has to be extended with 1s instead of 0s, but unfortunately that would have to be decided dynamically on a component-by-component basis, which SSE is not that good at.

What you could do is compare the values against zero to test for negativity and use the resulting mask (which fortunately uses 1...1 for true and 0...0 for false) as the value to interleave with, instead of the zero register:

xmm0 = _mm_unpacklo_epi8(xmm0, _mm_cmplt_epi8(xmm0, _mm_setzero_si128()));
xmm0 = _mm_unpacklo_epi16(xmm0, _mm_cmplt_epi16(xmm0, _mm_setzero_si128()));

This will properly extend negative numbers with 1s and positive numbers with 0s. Of course, this additional overhead (probably 2-4 additional SSE instructions) is only necessary if your initial 8-bit pixel values can ever be negative, which I still doubt. If that really is the case, you should prefer signed char over char, as the latter has implementation-defined signedness (in the same way, you should use unsigned char if these are the common unsigned [0,255] pixel values).
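As a hedged sketch of the comparison-based sign extension, assuming the channels are declared signed char (pixel_s8 and load_pixel_s32 are illustrative names, not from the question):

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Assumed signed variant of the question's struct. */
struct pixel_s8 { signed char r, g, b, a; };

/* Sign-extend the four 8-bit channels into four 32-bit lanes. */
static __m128i load_pixel_s32(const struct pixel_s8 *p)
{
    int32_t bits;
    memcpy(&bits, p, sizeof bits);
    __m128i v = _mm_cvtsi32_si128(bits);
    const __m128i zero = _mm_setzero_si128();
    /* The compare yields all-ones for negative elements and all-zeros otherwise,
       which is exactly the upper half each widened element needs. */
    v = _mm_unpacklo_epi8(v, _mm_cmplt_epi8(v, zero));    /* 8-bit  -> 16-bit */
    v = _mm_unpacklo_epi16(v, _mm_cmplt_epi16(v, zero));  /* 16-bit -> 32-bit */
    return v;
}
```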

EDIT: As for the follow-up question of reversing this transformation, first we pack the signed 32-bit integers into signed 16-bit integers with saturation:

xmm0 = _mm_packs_epi32(xmm0, xmm0);

Then we pack those 16-bit values into unsigned 8-bit values, again with saturation:

xmm0 = _mm_packus_epi16(xmm0, xmm0);

Finally, we can take our pixel from the lower 32 bits of the register:

*(int*)&pixel = _mm_cvtsi128_si32(xmm0);

Due to the saturation, this whole process will automatically map any negative value to 0 and any value greater than 255 to 255, which is usually what is intended when working with color pixels.
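The reverse direction can be sketched as one small function. The struct pixel_u8 and the name store_pixel_saturated are assumptions made for illustration, and memcpy stands in for the pointer cast shown above.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Assumed variant of the question's struct with unsigned channels. */
struct pixel_u8 { unsigned char r, g, b, a; };

/* Narrow four signed 32-bit lanes to four bytes, clamping each to [0,255]. */
static void store_pixel_saturated(__m128i v, struct pixel_u8 *p)
{
    v = _mm_packs_epi32(v, v);     /* 32-bit -> signed 16-bit, saturating */
    v = _mm_packus_epi16(v, v);    /* 16-bit -> unsigned 8-bit, saturating */
    int32_t bits = _mm_cvtsi128_si32(v);   /* the pixel sits in the low 32 bits */
    memcpy(p, &bits, sizeof bits);
}
```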

EDIT: If you actually need truncation instead of saturation when packing the 32-bit values back into unsigned chars, you will need to do this yourself, since SSE only provides saturating pack instructions. This can be achieved with a simple:

xmm0 = _mm_and_si128(xmm0, _mm_set1_epi32(0xFF));

right before the packing procedure above. This should amount to just 2 additional SSE instructions, or only 1 additional instruction when amortized over many pixels (the 0xFF mask then only has to be set up once).
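A sketch of the truncating variant, under the same assumptions as before (pixel_u8 and store_pixel_truncated are illustrative names): masking each lane to its low byte first means the saturating packs never have anything to clamp.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Assumed variant of the question's struct with unsigned channels. */
struct pixel_u8 { unsigned char r, g, b, a; };

/* Narrow four 32-bit lanes to four bytes, keeping only the low 8 bits of each. */
static void store_pixel_truncated(__m128i v, struct pixel_u8 *p)
{
    v = _mm_and_si128(v, _mm_set1_epi32(0xFF));  /* every lane is now in [0,255] */
    v = _mm_packs_epi32(v, v);                   /* saturation is a no-op here */
    v = _mm_packus_epi16(v, v);
    int32_t bits = _mm_cvtsi128_si32(v);
    memcpy(p, &bits, sizeof bits);
}
```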

EDIT: Thanks to harold's comment, there is an even better option for the first 8-to-32 conversion. If you have SSE4 support (SSE4.1 to be precise), there are instructions that do the complete conversion from 4 packed 8-bit values in the lower 32 bits of the register into 4 32-bit values across the whole register, for both signed and unsigned 8-bit values:

xmm0 = _mm_cvtepu8_epi32(xmm0);   //or _mm_cvtepi8_epi32 for signed 8-bit values
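A minimal sketch of the SSE4.1 route follows. The intrinsics are declared in <smmintrin.h>. The target attribute is a GCC/Clang feature used here as an assumption so that the function compiles without a global -msse4.1 flag; the struct and function names are illustrative.

```c
#include <smmintrin.h>  /* SSE4.1 intrinsics (pulls in the SSE2 ones as well) */
#include <stdint.h>
#include <string.h>

/* Assumed variant of the question's struct with unsigned channels. */
struct pixel_u8 { unsigned char r, g, b, a; };

/* Zero-extend four 8-bit channels to four 32-bit lanes in one instruction.
   Assumes GCC or Clang and a CPU that supports SSE4.1. */
__attribute__((target("sse4.1")))
static __m128i load_pixel_u32_sse41(const struct pixel_u8 *p)
{
    int32_t bits;
    memcpy(&bits, p, sizeof bits);
    return _mm_cvtepu8_epi32(_mm_cvtsi32_si128(bits));
}
```

For signed channels, _mm_cvtepi8_epi32 would take the place of _mm_cvtepu8_epi32.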


EDIT: Although you don't need the signed 8-bit to 32-bit conversion, for the sake of completeness: harold had another very good idea for the SSE2-based sign extension, as an alternative to the comparison-based version above. We first unpack the 8-bit values into the upper byte of the 32-bit values instead of the lower byte. Since we don't care about the lower bytes, we simply interleave the register with itself, which frees us from needing an extra zero register and an additional move:

xmm0 = _mm_unpacklo_epi8(xmm0, xmm0);
xmm0 = _mm_unpacklo_epi16(xmm0, xmm0);

Now we just need to perform an arithmetic right shift of the upper byte into the lower byte, which does the proper sign extension for negative values:

xmm0 = _mm_srai_epi32(xmm0, 24);

This should need fewer instructions and fewer registers than my comparison-based SSE2 version above.

And since it should match the zero extension above in instruction count for a single pixel (though it costs 1 more instruction when amortized over many pixels) while using fewer registers (no extra zero register), it might even be worth using for unsigned pixel values if registers are scarce, but then with a logical shift (_mm_srli_epi32) instead of an arithmetic shift.
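Both shift-based variants can be sketched side by side. The function names are illustrative, and the raw 32-bit pixel is passed in as an int32_t (an assumption made to keep the example short):

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Replicate each byte of the pixel into all four bytes of its 32-bit lane. */
static __m128i spread_bytes(int32_t pixel_bits)
{
    __m128i v = _mm_cvtsi32_si128(pixel_bits);
    v = _mm_unpacklo_epi8(v, v);     /* each byte doubled into a 16-bit element */
    v = _mm_unpacklo_epi16(v, v);    /* each 16-bit element doubled into a 32-bit lane */
    return v;
}

/* Sign extension: the arithmetic shift copies the sign bit of the top byte. */
static __m128i widen_s8_shift(int32_t pixel_bits)
{
    return _mm_srai_epi32(spread_bytes(pixel_bits), 24);
}

/* Zero extension: the logical shift fills the upper 24 bits with zeros. */
static __m128i widen_u8_shift(int32_t pixel_bits)
{
    return _mm_srli_epi32(spread_bytes(pixel_bits), 24);
}
```

Only the final shift differs between the two, so the choice between signed and unsigned widening comes down to a single instruction.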
