Loading data for GCC's vector extensions

Posted: 2018/4/20 16:40:50. Tags: gcc, checksum, vectorization, simd


Question


GCC's vector extensions offer a nice, reasonably portable way of accessing some SIMD instructions on different hardware architectures without resorting to hardware-specific intrinsics (or auto-vectorization).

A real use case is calculating a simple additive checksum. The one thing that isn't clear is how to safely load data into a vector.

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef char v16qi __attribute__ ((vector_size(16)));

static uint8_t checksum(uint8_t *buf, size_t size)
{
    assert(size%16 == 0);
    uint8_t sum = 0;

    v16qi vec = {0};
    for (size_t i=0; i<(size/16); i++)
    {
        // XXX: Yuck! Is there a better way?
        // Reinterprets the i-th 16-byte block of buf as a vector.
        vec += *((v16qi*) (buf+i*16));
    }

    // Sum up the 16 lanes; the result wraps modulo 256
    sum = vec[0] + vec[1] + vec[2] + vec[3] + vec[4] + vec[5] + vec[6] + vec[7] + vec[8] + vec[9] + vec[10] + vec[11] + vec[12] + vec[13] + vec[14] + vec[15];

    return sum;
}

Casting a pointer to the vector type appears to work, but I'm worried this might explode in a horrible fashion if SIMD hardware expects the vector types to be correctly aligned.

The only other option I've thought of is to use a temporary vector and explicitly load the values (via either a memcpy or element-wise assignment), but in testing this counteracts most of the speedup gained from using SIMD instructions. Ideally I'd imagine this would be something like a generic __builtin_load() function, but none seems to exist.
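For reference, the memcpy variant described above might look roughly like the sketch below. The names `v16u8`, `load16_unaligned` and `checksum_memcpy` are hypothetical, and the vector uses `uint8_t` lanes instead of `char` so that lane wrap-around is well defined. Optimizing compilers commonly lower a fixed 16-byte memcpy into a single unaligned load, but that is an assumption worth checking in the disassembly for your target and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name: 16 unsigned byte lanes. */
typedef uint8_t v16u8 __attribute__((vector_size(16)));

/* Hypothetical helper: copies 16 bytes into a vector.
   memcpy places no alignment requirement on p. */
static inline v16u8 load16_unaligned(const uint8_t *p)
{
    v16u8 v;
    memcpy(&v, p, sizeof v);
    return v;
}

static uint8_t checksum_memcpy(const uint8_t *buf, size_t size)
{
    assert(size % 16 == 0);

    v16u8 acc = {0};
    for (size_t i = 0; i < size / 16; i++)
        acc += load16_unaligned(buf + i * 16);  /* lane-wise add, wraps mod 256 */

    /* Horizontal sum of the 16 lanes, truncated to one byte. */
    unsigned sum = 0;
    for (int k = 0; k < 16; k++)
        sum += acc[k];
    return (uint8_t)sum;
}
```

Calling `checksum_memcpy(buf + 1, 64)` on a deliberately misaligned pointer should give the same result as a plain byte-by-byte sum.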

What's a safer way of loading data into a vector without risking alignment issues?

Solution

You could use an initializer to load the values, i.e. do

const v16qi e = { buf[0], buf[1], ... , buf[15] }

and hope that GCC turns this into an SSE load instruction. I'd verify that with a disassembler, though ;-). Also, for better performance, try to make buf 16-byte aligned and inform the compiler via an aligned attribute. If you can't guarantee that the input buffer will be aligned, process it bytewise until you've reached a 16-byte boundary.
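The bytewise-until-aligned idea can be sketched as follows. This is one possible reading of the advice, not the answerer's code: `checksum_peeled` and `v16u8a` are hypothetical names, `__builtin_assume_aligned` is used here in place of an aligned attribute to tell the compiler about the alignment, and `may_alias` is added on the assumption that the vector pointer is reading memory declared as a byte array. A scalar epilogue is also included so the size need not be a multiple of 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: 16 unsigned byte lanes, allowed to alias other types. */
typedef uint8_t v16u8a __attribute__((vector_size(16), may_alias));

static uint8_t checksum_peeled(const uint8_t *buf, size_t size)
{
    assert(buf != NULL || size == 0);
    unsigned sum = 0;

    /* Scalar prologue: consume bytes until buf sits on a 16-byte boundary. */
    while (size > 0 && ((uintptr_t)buf % 16) != 0) {
        sum += *buf++;
        size--;
    }

    /* Vector body: buf is aligned now, so dereferencing the cast pointer
       is an aligned 16-byte load. */
    const v16u8a *vp = (const v16u8a *)__builtin_assume_aligned(buf, 16);
    v16u8a acc = {0};
    size_t blocks = size / 16;
    for (size_t i = 0; i < blocks; i++)
        acc += vp[i];                      /* lane-wise add, wraps mod 256 */
    for (int k = 0; k < 16; k++)
        sum += acc[k];

    /* Scalar epilogue: the fewer-than-16 bytes left after the last block. */
    for (size_t i = blocks * 16; i < size; i++)
        sum += buf[i];

    return (uint8_t)sum;
}
```

The trade-off compared with an unaligned load is a little extra scalar work at both ends of the buffer in exchange for a main loop that never touches a misaligned address.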
