Loading data for GCC's vector extensions
Problem description
GCC's vector extensions offer a nice, reasonably portable way of accessing some SIMD instructions on different hardware architectures without resorting to hardware-specific intrinsics (or auto-vectorization).
A real use case is calculating a simple additive checksum. The one thing that isn't clear is how to safely load data into a vector.
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef char v16qi __attribute__ ((vector_size(16)));

static uint8_t checksum(uint8_t *buf, size_t size)
{
    assert(size % 16 == 0);
    uint8_t sum = 0;
    v16qi vec = {0};
    for (size_t i = 0; i < size / 16; i++)
    {
        // XXX: Yuck! Is there a better way?
        // Cast first, then index, so each step advances by one 16-byte vector.
        vec += ((v16qi *) buf)[i];
    }
    // Sum up the vector
    sum = vec[0] + vec[1] + vec[2] + vec[3] + vec[4] + vec[5] + vec[6] + vec[7] + vec[8] + vec[9] + vec[10] + vec[11] + vec[12] + vec[13] + vec[14] + vec[15];
    return sum;
}
Casting a pointer to the vector type appears to work, but I'm worried this might explode in a horrible fashion if the SIMD hardware expects vector types to be correctly aligned.
The only other option I've thought of is to use a temporary vector and explicitly load the values (via either a memcpy or element-wise assignment), but in testing this counteracts most of the speedup gained from using SIMD instructions. Ideally I'd imagine this would be something like a generic __builtin_load() function, but none seems to exist.
What's a safer way of loading data into a vector without risking alignment issues?
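For reference, the memcpy option mentioned above might look roughly like the sketch below. The names v16u8, load_v16u8 and checksum_memcpy are made up for illustration and are not part of GCC. The vector uses uint8_t elements (an assumption, chosen so the lane arithmetic is unsigned). Whether the compiler folds the fixed-size memcpy into a single unaligned load depends on the GCC version, target and optimization flags, so the generated code is worth checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical vector type: 16 unsigned byte lanes. */
typedef uint8_t v16u8 __attribute__ ((vector_size(16)));

/* Hypothetical helper: copy 16 bytes from any address into a vector.
   memcpy places no alignment requirement on p. */
static inline v16u8 load_v16u8(const uint8_t *p)
{
    v16u8 v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Additive checksum modulo 256; size must be a multiple of 16. */
static uint8_t checksum_memcpy(const uint8_t *buf, size_t size)
{
    assert(size % 16 == 0);

    v16u8 acc = {0};
    for (size_t i = 0; i < size; i += 16)
        acc += load_v16u8(buf + i);      /* lane-wise add, wraps mod 256 */

    /* Horizontal sum of the 16 lanes. */
    uint8_t sum = 0;
    for (int j = 0; j < 16; j++)
        sum += acc[j];
    return sum;
}
```

Because the load goes through memcpy, buf may start at any address, for example checksum_memcpy(data + 1, 32).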
You could use an initializer to load the values, i.e. do
const v16qi e = { buf[0], buf[1], ... , buf[15] }
and hope that GCC turns this into an SSE load instruction. I'd verify that with a disassembler, though ;-). Also, for better performance, you could try to make buf 16-byte aligned and inform the compiler via an aligned attribute. If you can't guarantee that the input buffer will be aligned, process it bytewise until you've reached a 16-byte boundary.
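The "process it bytewise until you've reached a 16-byte boundary" idea could be sketched as follows. This is only an illustration under some assumptions: the names v16u8_a and checksum_peeled are hypothetical, the element type is uint8_t rather than char, and the may_alias attribute is added so that reading the byte buffer through the vector type does not run into strict-aliasing rules. As far as I know, a 16-byte GCC vector type is 16-byte aligned by default, so the cast in the main loop already tells the compiler the address is aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vector type: 16 unsigned byte lanes, allowed to alias
   other types (assumption: may_alias is wanted for the pointer cast). */
typedef uint8_t v16u8_a __attribute__ ((vector_size(16), may_alias));

/* Additive checksum modulo 256 for any start address and any size. */
static uint8_t checksum_peeled(const uint8_t *buf, size_t size)
{
    uint8_t sum = 0;
    size_t i = 0;

    /* Head: bytewise until buf + i sits on a 16-byte boundary. */
    while (i < size && ((uintptr_t)(buf + i) % 16) != 0)
        sum += buf[i++];

    /* Body: aligned 16-byte loads, lane-wise add wrapping mod 256. */
    v16u8_a acc = {0};
    for (; i + 16 <= size; i += 16)
        acc += *(const v16u8_a *)(buf + i);

    /* Tail: fewer than 16 bytes left over. */
    for (; i < size; i++)
        sum += buf[i];

    /* Horizontal sum of the 16 lanes. */
    for (int j = 0; j < 16; j++)
        sum += acc[j];
    return sum;
}
```

This version also drops the size%16 restriction, since the same bytewise loop handles the leftover tail. As the answer says, a disassembler is the way to confirm which load instruction GCC actually emits.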