int64_t pointer cast to AVX2 intrinsic __m256i

Question

Hello, I have a strange problem with AVX2 intrinsics. I create a pointer to a __m256i vector with an int64_t* cast, then assign a value by dereferencing the pointer. The strange thing is that the value is not observed in the vector variable unless I run a few cout statements after the assignment. The pointer and the vector have the same memory address, and dereferencing the pointer produces the correct value, but the vector does not. What am I missing?

// Vector Variable 
__m256i R_A0to3 = _mm256_set1_epi32(0xFFFFFFFF);

int64_t *ptr = NULL;
for(int m=0; m<4; m++){
    // Cast pointer to vector type
    ptr = (int64_t*)&R_A0to3;

    cout<<"ptr_ADDRESS:      "<<ptr<<endl;
    cout<<"&R_A0to3_ADDRESS: "<<&R_A0to3<<endl;

    // access
    ptr[m] = (int64_t) m_array[m];

    // generic function that prints out register
    print_mm256_reg<int64_t>(R_A0to3, "R_A0to3");
    cout<<"m_array: "<< m_array[m]<<std::ends;

    // Additional print statements
    cout<<"ptr[m]: "<< ptr[m]<<std::endl;
    cout<<"ptr[0]: "<< ptr[0]<<std::endl;
    cout<<"ptr[1]: "<< ptr[1]<<std::endl;
    cout<<"ptr[2]: "<< ptr[2]<<std::endl;
    cout<<"ptr[3]: "<< ptr[3]<<std::endl;
    print_mm256_reg<int64_t>(R_A0to3, "R_A0to3");
}

Output:
 ptr_ADDRESS      0x7ffd9313e880
 &R_A0to3_ADDRESS 0x7ffd9313e880
 m_array: 8
 printing reg -    R_C0to3    -1|  -1|  -1|  -1|
 printing reg -    R_D0to3    -1|  -1|  -1|  -1|

Output with Additional print statements:
ptr_ADDRESS      0x7ffd36359e20
&R_A0to3_ADDRESS 0x7ffd36359e20
printing reg -    R_A0to3     -1|  -1|  -1|  -1|
m_array: 8

ptr[0]: 8
ptr[1]: -1
ptr[2]: -1
ptr[3]: -1
printing reg -    R_A0to3      8|  -1|  -1|  -1|


Recommended answer

I suggest using the _mm256_extract_epi64 and _mm256_insert_epi64 intrinsics when you need occasional access to individual elements. If you need to access all elements of the vector, consider using _mm256_store_si256 and _mm256_lddqu_si256 to store it to an array and load it back. These intrinsics are less likely to rely on undefined behavior than writing through a cast int64_t pointer, which most likely violates the strict-aliasing rules and so allows the optimizer to assume the vector was never modified. They are also transparent as to the machine instructions being generated, and thus as to performance.
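As a rough illustration of both suggestions, here is a minimal sketch. It assumes GCC or Clang on x86-64 and a CPU with AVX2. The names AVX2_FN, Lanes, write_lane0 and write_lanes are hypothetical helpers invented for this example and are not part of the original question. Note that _mm256_insert_epi64 and _mm256_extract_epi64 need a compile-time constant lane index, and _mm256_store_si256 needs a 32-byte-aligned destination.

```cpp
#include <immintrin.h>  // AVX/AVX2 intrinsics
#include <array>
#include <cassert>
#include <cstdint>

// Assumption: GCC or Clang on x86-64. The attribute enables AVX2 code
// generation for the marked functions even without -mavx2 on the command line.
#define AVX2_FN __attribute__((target("avx2")))

// Hypothetical alias for the four 64-bit lanes of a __m256i.
using Lanes = std::array<std::int64_t, 4>;

// Occasional access: replace lane 0 with insert, read lanes back with extract.
// The lane index passed to both intrinsics must be a compile-time constant.
AVX2_FN Lanes write_lane0(std::int64_t value) {
    __m256i v = _mm256_set1_epi32(-1);  // all bits set, as in the question
    v = _mm256_insert_epi64(v, value, 0);

    Lanes out;
    out[0] = _mm256_extract_epi64(v, 0);
    out[1] = _mm256_extract_epi64(v, 1);
    out[2] = _mm256_extract_epi64(v, 2);
    out[3] = _mm256_extract_epi64(v, 3);
    return out;
}

// Access to many lanes: store the vector to an aligned buffer, edit the buffer
// as ordinary integers (a runtime index is fine here), then load it back.
// Overwrites lanes [0, count) with the matching entries of m_array.
AVX2_FN Lanes write_lanes(const Lanes& m_array, int count) {
    assert(count >= 0 && count <= 4);

    __m256i v = _mm256_set1_epi32(-1);

    alignas(32) std::int64_t buf[4];  // _mm256_store_si256 needs 32-byte alignment
    _mm256_store_si256(reinterpret_cast<__m256i*>(buf), v);
    for (int m = 0; m < count; ++m) {
        buf[m] = m_array[m];
    }
    v = _mm256_lddqu_si256(reinterpret_cast<const __m256i*>(buf));

    // Copy the result out with an unaligned store, since Lanes is not
    // guaranteed to be 32-byte aligned.
    Lanes out;
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out.data()), v);
    return out;
}
```

With these helpers, write_lane0(8) should give the lanes 8, -1, -1, -1, which is what the question expected to see after the first loop iteration. The store/load round trip in write_lanes is the variant to reach for when the lane index is only known at run time, as with the loop variable m in the question.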
