Using a union (encapsulated in a struct) to bypass conversions for Neon data types

Question

I made my first approach with vectorization intrinsics with SSE, where there is basically only one data type __m128i. Switching to Neon I found the data types and function prototypes to be much more specific, e.g. uint8x16_t (a vector of 16 unsigned char), uint8x8x2_t (2 vectors with 8 unsigned char each), uint32x4_t (a vector with 4 uint32_t) etc.

At first I was enthusiastic (it is much easier to find the exact function operating on the desired data type), then I saw what a mess it becomes when the same data has to be treated in different ways. Using the specific casting operators would take me forever. The problem is also addressed here. I then came up with the idea of a union encapsulated in a struct, plus some casting and assignment operators.

struct uint_128bit_t {
    union {
        uint8x16_t uint8x16;
        uint16x8_t uint16x8;
        uint32x4_t uint32x4;
        uint8x8x2_t uint8x8x2;
        uint8_t uint8_array[16] __attribute__ ((aligned (16) ));
        uint16_t uint16_array[8] __attribute__ ((aligned (16) ));
        uint32_t uint32_array[4] __attribute__ ((aligned (16) ));
    };

    operator uint8x16_t& () {return uint8x16;}
    operator uint16x8_t& () {return uint16x8;}
    operator uint32x4_t& () {return uint32x4;}
    operator uint8x8x2_t& () {return uint8x8x2;}
    uint8x16_t& operator =(const uint8x16_t& in) {uint8x16 = in; return uint8x16;}
    uint8x8x2_t& operator =(const uint8x8x2_t& in) {uint8x8x2 = in; return uint8x8x2;}

};

This approach works for me: I can use a variable of type uint_128bit_t as an argument and output with different Neon intrinsics, e.g. vshlq_n_u32, vuzp_u8, vget_low_u8 (in this case just as input). And I can extend it with more data types if I need. Note: The arrays are to easily print the content of a variable.

Is this a correct way of proceeding?
Is there any hidden flaw?
Have I reinvented the wheel?
(Is the aligned attribute necessary?)

Recommended answer

Since the initially proposed method has undefined behaviour in C++ (it reads a union member other than the one most recently written), I have implemented something like this:

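#include <arm_neon.h>               // Neon vector types (uint8x16_t, ...)
#include <boost/static_assert.hpp>  // BOOST_STATIC_ASSERT_MSG
#include <cstring>                  // std::memcpy

// Wraps one Neon vector and converts to/from any type of the same size.
template <typename T>
struct NeonVectorType {

    private:
    T data;

    public:
    // Convert to U by copying the bytes, which avoids aliasing violations.
    template <typename U>
    operator U () const {
        BOOST_STATIC_ASSERT_MSG(sizeof(U) == sizeof(T),"Trying to convert to data type of different size");
        U u;
        std::memcpy( &u, &data, sizeof u );
        return u;
    }

    // Assign from any same-sized type, again by copying the bytes.
    template <typename U>
    NeonVectorType<T>& operator =(const U& in) {
        BOOST_STATIC_ASSERT_MSG(sizeof(U) == sizeof(T),"Trying to copy from data type of different size");
        std::memcpy( &data, &in, sizeof data );
        return *this;
    }

};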

Then:

typedef NeonVectorType<uint8x16_t> uint_128bit_t; //suitable for uint8x16_t, uint8x8x2_t, uint32x4_t, etc.
typedef NeonVectorType<uint8x8_t> uint_64bit_t; //suitable for uint8x8_t, uint32x2_t, etc.

The use of memcpy is discussed here (and here), and avoids breaking the strict aliasing rule. Note that in general it gets optimized away.
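To make the memcpy idea concrete outside the wrapper class, here is a minimal sketch of a free-standing helper. The name reinterpret_bits and the stand-in structs Bytes16 and Words4 are hypothetical and are not part of the answer or of arm_neon.h; they only let the sketch build on a non-ARM host. On an ARM target one would presumably pass uint8x16_t and uint32x4_t instead.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-ins for two 128-bit Neon types, so the sketch
// compiles on any host. They are NOT the real arm_neon.h types.
struct Bytes16 { std::uint8_t  lane[16]; };  // plays the role of uint8x16_t
struct Words4  { std::uint32_t lane[4];  };  // plays the role of uint32x4_t

// Copy the object representation of `from` into a fresh To.
// Going through memcpy never reads an object through a pointer of an
// unrelated type, so the strict aliasing rule is not violated.
template <typename To, typename From>
To reinterpret_bits(const From& from) {
    static_assert(sizeof(To) == sizeof(From),
                  "source and destination must have the same size");
    static_assert(std::is_trivially_copyable<To>::value &&
                  std::is_trivially_copyable<From>::value,
                  "memcpy is only well defined for trivially copyable types");
    To to;
    std::memcpy(&to, &from, sizeof to);
    return to;
}
```

A call such as reinterpret_bits<Bytes16>(w) for a Words4 value w yields the same 16 bytes viewed as byte lanes; converting back returns the original value.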

If you look at the edit history, I had implemented a custom version with combine operators for vectors of vectors (e.g. uint8x8x2_t). The problem was mentioned here. However, since those data types are declared as arrays (see guide, section 12.2.2) and therefore located in consecutive memory locations, the compiler is bound to treat the memcpy correctly.
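The layout argument can be illustrated with a small sketch. The structs below are hypothetical stand-ins that only mirror the shape of the Neon types (uint8x8x2_t exposes its two vectors through a val[2] array member); the function name split_halves is likewise made up for illustration. The sketch assumes nothing beyond that array layout.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins mirroring the shape of the Neon types.
struct Half8   { std::uint8_t lane[8];  };  // plays the role of uint8x8_t
struct Pair8x2 { Half8 val[2];          };  // mirrors uint8x8x2_t's val[2]
struct Full16  { std::uint8_t lane[16]; };  // plays the role of uint8x16_t

// The two halves sit back to back, so the pair is exactly 16 bytes.
static_assert(sizeof(Pair8x2) == sizeof(Full16),
              "array-of-two layout must match the 128-bit vector");

// View a 128-bit value as two consecutive 64-bit halves by copying bytes.
inline Pair8x2 split_halves(const Full16& v) {
    Pair8x2 p;
    std::memcpy(&p, &v, sizeof p);
    return p;
}
```

With lanes numbered 0 to 15 in the input, val[0] receives lanes 0 to 7 and val[1] receives lanes 8 to 15, which is why no separate combine operator is needed.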

Finally, to print the content of the variable one could use a function like this.
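The function the answer links to is not reproduced on this page. As a rough substitute, the following sketch (the name lanes_to_hex is hypothetical, not the linked function) dumps any trivially copyable value byte by byte, using the same memcpy technique so that no aliasing rule is broken.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iomanip>
#include <sstream>
#include <string>

// Format the bytes of `v` as two-digit hex values separated by spaces.
// Works for any trivially copyable type; on ARM this would presumably
// include Neon vectors such as uint8x16_t.
template <typename V>
std::string lanes_to_hex(const V& v) {
    unsigned char bytes[sizeof(V)];
    std::memcpy(bytes, &v, sizeof v);       // copy out, no pointer casts

    std::ostringstream out;
    out << std::hex << std::setfill('0');   // both settings persist
    for (std::size_t i = 0; i < sizeof(V); ++i) {
        if (i != 0) out << ' ';
        out << std::setw(2) << static_cast<unsigned>(bytes[i]);  // setw resets each time
    }
    return out.str();
}
```

Bytes are printed in memory order, so on a little-endian target a 32-bit lane appears with its least significant byte first.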
