Loading data for GCC's vector extensions


Question

GCC's vector extensions offer a nice, reasonably portable way of accessing some SIMD instructions on different hardware architectures without resorting to hardware-specific intrinsics (or auto-vectorization).

A real use case is calculating a simple additive checksum. The one thing that isn't clear is how to safely load data into a vector.

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef char v16qi __attribute__ ((vector_size(16)));

static uint8_t checksum(const uint8_t *buf, size_t size)
{
    assert(size % 16 == 0);
    uint8_t sum = 0;

    v16qi vec = {0};
    for (size_t i = 0; i < size / 16; i++)
    {
        // XXX: Yuck! Is there a better way?
        // Cast first, then index, so each step advances by one 16-byte vector.
        vec += ((const v16qi *) buf)[i];
    }

    // Sum up the vector
    sum = vec[0] + vec[1] + vec[2] + vec[3] + vec[4] + vec[5] + vec[6] + vec[7] + vec[8] + vec[9] + vec[10] + vec[11] + vec[12] + vec[13] + vec[14] + vec[15];

    return sum;
}

Casting a pointer to the vector type appears to work, but I'm worried this might explode in a horrible fashion if the SIMD hardware expects vector types to be correctly aligned.

The only other option I've thought of is to use a temporary vector and explicitly load the values (via either memcpy or element-wise assignment), but in testing this counteracts most of the speedup gained from using SIMD instructions. Ideally I'd imagine this would be something like a generic __builtin_load() function, but none seems to exist.

What's a safer way of loading data into a vector without risking alignment issues?
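For reference, the memcpy variant described above could be written roughly as follows. This is only a sketch: the names v16u8, load_v16u8 and checksum_memcpy are made up for illustration, and the element type is switched to uint8_t so the lane arithmetic wraps as unsigned. Whether GCC turns the 16-byte memcpy into a single unaligned load depends on the compiler version, target and optimization flags, so the generated code is worth inspecting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative vector type: 16 unsigned 8-bit lanes. */
typedef uint8_t v16u8 __attribute__ ((vector_size(16)));

/* Hypothetical helper: copy 16 bytes from any address into a vector.
 * memcpy places no alignment requirement on the source pointer. */
static inline v16u8 load_v16u8(const uint8_t *p)
{
    v16u8 v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Additive checksum modulo 256 over a buffer of any size and alignment. */
static uint8_t checksum_memcpy(const uint8_t *buf, size_t size)
{
    assert(buf != NULL || size == 0);

    v16u8 acc = {0};
    size_t i = 0;

    /* Full 16-byte blocks, added lane by lane. */
    for (; i + 16 <= size; i += 16)
        acc += load_v16u8(buf + i);

    /* Fold the 16 lanes into one byte. */
    uint8_t sum = 0;
    for (int k = 0; k < 16; k++)
        sum += acc[k];

    /* Leftover bytes that did not fill a whole vector. */
    for (; i < size; i++)
        sum += buf[i];

    return sum;
}
```

Calling checksum_memcpy(buf + 1, n) is as valid as calling it on an aligned buffer, since no pointer to the vector type is ever formed from buf.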

Answer

You could use an initializer to load the values, i.e. do

const v16qi e = { buf[0], buf[1], ... , buf[15] }

and hope that GCC turns this into an SSE load instruction. I'd verify that with a disassembler, though ;-). Also, for better performance, you could try to make buf 16-byte aligned and inform the compiler via an aligned attribute. If you cannot guarantee that the input buffer will be aligned, process it bytewise until you've reached a 16-byte boundary.
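The "process it bytewise until a 16-byte boundary" idea could be sketched as below. The names v16u8 and checksum_aligned are hypothetical, and two things are assumptions that go beyond the answer: the element type is uint8_t so that lane sums wrap as unsigned, and the typedef carries GCC's may_alias attribute so that reading the byte buffer through a vector pointer does not fall foul of strict aliasing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative vector type: 16 unsigned 8-bit lanes. may_alias is an
 * assumption added here so the pointer cast below is aliasing-safe. */
typedef uint8_t v16u8 __attribute__ ((vector_size(16), may_alias));

/* Additive checksum modulo 256: scalar head, aligned vector body, scalar tail. */
static uint8_t checksum_aligned(const uint8_t *buf, size_t size)
{
    assert(buf != NULL || size == 0);

    uint8_t sum = 0;
    size_t i = 0;

    /* Head: add single bytes until buf + i sits on a 16-byte boundary. */
    while (i < size && ((uintptr_t)(buf + i) % 16) != 0)
        sum += buf[i++];

    /* Body: buf + i is now 16-byte aligned, so vector loads are safe. */
    const v16u8 *vp = (const v16u8 *)(buf + i);
    size_t nvec = (size - i) / 16;
    v16u8 acc = {0};
    for (size_t k = 0; k < nvec; k++)
        acc += vp[k];
    i += nvec * 16;

    /* Fold the 16 lanes into the running byte sum. */
    for (int k = 0; k < 16; k++)
        sum += acc[k];

    /* Tail: bytes left over after the last full vector. */
    for (; i < size; i++)
        sum += buf[i];

    return sum;
}
```

Because addition modulo 256 is commutative, splitting the buffer into head, body and tail gives the same checksum as a plain byte loop. If the caller controls the allocation, declaring the buffer with __attribute__ ((aligned(16))) makes the head loop exit immediately.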
