AVX 4-bit integers


Question

I need to perform the following operation:

 w[i] = scale * v[i] + point

scale and point are fixed, whereas v[] is a vector of 4-bit integers.

I need to compute w[] for an arbitrary input vector v[], and I want to speed up the process using AVX intrinsics. However, v[] is a vector of 4-bit integers.

The question is how to perform operations on 4-bit integers using intrinsics. I could use 8-bit integers and perform the operations that way, but is there a way to do the following:

[a,b] + [c,d] = [a+c,b+d]

[a,b] * [c,d] = [a*c,b*d]

(ignoring overflow)

using AVX intrinsics, where [...,...] is an 8-bit integer and a, b, c, d are 4-bit integers?

If yes, would it be possible to give a short example of how this could work?

Recommended answer

Just a partial answer (addition only) in pseudocode (it should be easy to extend to AVX2 intrinsics):

uint8_t a, b;          // input containing two nibbles each

uint8_t c = a + b;     // add with (unwanted) carry between nibbles
uint8_t x = a ^ b ^ c; // bits which are result of a carry
x &= 0x10;             // only bit 4 is of interest
c -= x;                // undo carry of lower to upper nibble

If either a or b is known to have bit 4 unset (i.e. the lowest bit of the upper nibble), that operand can be left out of the computation of x.
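The same carry-undo trick maps directly onto 8-bit AVX2 lanes. The following is a minimal sketch, not a tuned implementation: the function names `add_nibbles_u8` and `add_nibbles_avx2` are made up for illustration, and the `target("avx2")` attribute assumes GCC or Clang (with other compilers, drop the attribute and enable AVX2 through compiler flags). It also assumes the CPU supports AVX2 at run time.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: nibble-wise add of one byte (two packed nibbles). */
static uint8_t add_nibbles_u8(uint8_t a, uint8_t b)
{
    uint8_t c = (uint8_t)(a + b);              /* add with unwanted carry   */
    uint8_t x = (uint8_t)((a ^ b ^ c) & 0x10); /* carry that entered bit 4  */
    return (uint8_t)(c - x);                   /* undo that carry           */
}

/* dst[i] = nibble-wise a[i] + b[i]; 32 bytes (64 nibbles) per iteration. */
__attribute__((target("avx2")))
static void add_nibbles_avx2(uint8_t *dst, const uint8_t *a,
                             const uint8_t *b, size_t n)
{
    const __m256i carry_bit = _mm256_set1_epi8(0x10);
    size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        __m256i vc = _mm256_add_epi8(va, vb);
        /* a ^ b ^ c has a 1 wherever a carry entered that bit position. */
        __m256i vx = _mm256_xor_si256(_mm256_xor_si256(va, vb), vc);
        vx = _mm256_and_si256(vx, carry_bit);
        vc = _mm256_sub_epi8(vc, vx);
        _mm256_storeu_si256((__m256i *)(dst + i), vc);
    }
    for (; i < n; ++i)                          /* scalar tail */
        dst[i] = add_nibbles_u8(a[i], b[i]);
}
```

For example, `add_nibbles_u8(0x1F, 0x11)` yields `0x20`: the low nibbles F + 1 wrap to 0 and the high nibbles 1 + 1 give 2, with no carry leaking between them.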

As for multiplication: if scale is the same for all products, you can likely get away with some shifting and adding/subtracting (masking out overflow bits where necessary). Otherwise, I'm afraid you need to mask out 4 bits of each 16-bit word, do the operation, and fiddle the results together at the end. Pseudocode (there is no AVX 8-bit multiplication, so we need to operate on 16-bit words):

uint16_t m0=0xf, m1=0xf0, m2=0xf00, m3=0xf000; // masks for each nibble

uint16_t a, b; // input containing 4 nibbles each.

uint16_t p0 = (a*b) & m0; // lowest nibble, does not require masking a,b
uint16_t p1 = ((a>>4) * (b&m1)) & m1;
uint16_t p2 = ((a>>8) * (b&m2)) & m2;
uint16_t p3 = ((a>>12)* (b&m3)) & m3;

uint16_t result = p0 | p1 | p2 | p3;  // join results together 
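The pseudocode above can be carried over to AVX2 using `_mm256_mullo_epi16`, which keeps the low 16 bits of each 16-bit product. What follows is a sketch under assumptions: the names `mul_nibbles_u16`, `mul_nibbles_epi16` and `mul_nibbles_avx2` are hypothetical, the `target("avx2")` attribute assumes GCC or Clang, and the CPU is assumed to support AVX2. The scalar reference widens to `uint32_t` first, because in C a plain `uint16_t * uint16_t` is promoted to `int` and can overflow.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: nibble-wise multiply of four packed nibbles. */
static uint16_t mul_nibbles_u16(uint16_t a, uint16_t b)
{
    uint32_t ua = a, ub = b;  /* widen to avoid signed int overflow */
    uint32_t p0 = (ua * ub) & 0x000f;
    uint32_t p1 = ((ua >> 4)  * (ub & 0x00f0)) & 0x00f0;
    uint32_t p2 = ((ua >> 8)  * (ub & 0x0f00)) & 0x0f00;
    uint32_t p3 = ((ua >> 12) * (ub & 0xf000)) & 0xf000;
    return (uint16_t)(p0 | p1 | p2 | p3);
}

/* Same computation on sixteen 16-bit lanes (64 nibbles) at once. */
__attribute__((target("avx2")))
static __m256i mul_nibbles_epi16(__m256i a, __m256i b)
{
    const __m256i m0 = _mm256_set1_epi16(0x000f);
    const __m256i m1 = _mm256_set1_epi16(0x00f0);
    const __m256i m2 = _mm256_set1_epi16(0x0f00);
    const __m256i m3 = _mm256_set1_epi16((short)0xf000); /* bit pattern 0xf000 */

    __m256i p0 = _mm256_and_si256(_mm256_mullo_epi16(a, b), m0);
    __m256i p1 = _mm256_and_si256(
        _mm256_mullo_epi16(_mm256_srli_epi16(a, 4), _mm256_and_si256(b, m1)), m1);
    __m256i p2 = _mm256_and_si256(
        _mm256_mullo_epi16(_mm256_srli_epi16(a, 8), _mm256_and_si256(b, m2)), m2);
    __m256i p3 = _mm256_and_si256(
        _mm256_mullo_epi16(_mm256_srli_epi16(a, 12), _mm256_and_si256(b, m3)), m3);

    return _mm256_or_si256(_mm256_or_si256(p0, p1), _mm256_or_si256(p2, p3));
}

/* dst[i] = nibble-wise a[i] * b[i] for n 16-bit words. */
__attribute__((target("avx2")))
static void mul_nibbles_avx2(uint16_t *dst, const uint16_t *a,
                             const uint16_t *b, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(dst + i), mul_nibbles_epi16(va, vb));
    }
    for (; i < n; ++i)                          /* scalar tail */
        dst[i] = mul_nibbles_u16(a[i], b[i]);
}
```

For example, `mul_nibbles_u16(0x1234, 0x2345)` yields `0x26C4`: the nibble products 1*2, 2*3, 3*4 and 4*5 are 2, 6, 12 and 20, each reduced modulo 16.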
