int64_t pointer cast to AVX2 intrinsic type __m256i
Problem description
Hello, I have a strange problem with AVX2 intrinsics. I create a pointer to an __m256i vector with an int64_t* cast, then assign a value by dereferencing the pointer. The strange thing is that the value isn't observed in the vector variable unless I run a few cout statements after the assignment. The pointer and the vector have the same memory address, and dereferencing the pointer produces the correct value, but reading the vector does not. What am I missing?
// Vector Variable
__m256i R_A0to3 = _mm256_set1_epi32(0xFFFFFFFF);
int64_t *ptr = NULL;
for(int m=0; m<4; m++){
// Cast pointer to vector type
ptr = (int64_t*)&R_A0to3;
cout<<"ptr_ADDRESS: "<<ptr<<endl;
cout<<"&R_A0to3_ADDRESS: "<<&R_A0to3<<endl;
// access
ptr[m] = (int64_t) m_array[m];
// generic function that prints out register
print_mm256_reg<int64_t>(R_A0to3, "R_A0to3");
cout<<"m_array: "<< m_array[m]<<std::endl;
// Additional print statements
cout<<"ptr[m]: "<< ptr[m]<<std::endl;
cout<<"ptr[0]: "<< ptr[0]<<std::endl;
cout<<"ptr[1]: "<< ptr[1]<<std::endl;
cout<<"ptr[2]: "<< ptr[2]<<std::endl;
cout<<"ptr[3]: "<< ptr[3]<<std::endl;
print_mm256_reg<int64_t>(R_A0to3, "R_A0to3");
}
Output:
ptr_ADDRESS 0x7ffd9313e880
&R_A0to3_ADDRESS 0x7ffd9313e880
m_array: 8
printing reg - R_C0to3 -1| -1| -1| -1|
printing reg - R_D0to3 -1| -1| -1| -1|
Output with Additional print statements:
ptr_ADDRESS 0x7ffd36359e20
&R_A0to3_ADDRESS 0x7ffd36359e20
printing reg - R_A0to3 -1| -1| -1| -1|
m_array: 8
ptr[0]: 8
ptr[1]: -1
ptr[2]: -1
ptr[3]: -1
printing reg - R_A0to3 8| -1| -1| -1|
Recommended answer
I suggest using the _mm256_extract_epi64 and _mm256_insert_epi64 intrinsics when you need occasional access to individual elements. If you need to access all elements of the vector, consider using _mm256_store_si256 and _mm256_lddqu_si256 to store it to an array and load it back. These intrinsics are less likely to rely on undefined behavior, and they make it clear which machine instructions will be generated (and therefore what performance to expect).
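A minimal sketch of both suggestions, assuming GCC or Clang on x86-64 and a CPU with AVX2. The helpers `insert_extract_demo`, `set_lane` and `fill_lanes` and the `AVX2_FN` macro are hypothetical names introduced only for illustration. One detail matters for the question's loop: the lane index given to `_mm256_insert_epi64` and `_mm256_extract_epi64` must be a compile-time constant, so the loop variable `m` cannot be passed to them directly. For a runtime index, the sketch uses the store/modify/load route instead.

```cpp
#include <immintrin.h>
#include <cstdint>

// Assumption: GCC or Clang on x86-64. target("avx2") lets this file build
// without -mavx2, but the CPU that runs it must still support AVX2.
#define AVX2_FN __attribute__((target("avx2")))

// Occasional access with a constant lane index:
// _mm256_insert_epi64 / _mm256_extract_epi64 take the lane as a constant.
AVX2_FN void insert_extract_demo(int64_t value, int64_t out[4]) {
    __m256i v = _mm256_set1_epi32(-1);     // every bit set, like R_A0to3
    v = _mm256_insert_epi64(v, value, 0);  // replace lane 0, keep lanes 1..3
    out[0] = _mm256_extract_epi64(v, 0);
    out[1] = _mm256_extract_epi64(v, 1);
    out[2] = _mm256_extract_epi64(v, 2);
    out[3] = _mm256_extract_epi64(v, 3);
}

// Runtime lane index: copy the vector to an aligned array, modify the
// array, then load the array back into a vector.
AVX2_FN __m256i set_lane(__m256i v, int m, int64_t value) {
    alignas(32) int64_t buf[4];            // _mm256_store_si256 needs 32-byte alignment
    _mm256_store_si256(reinterpret_cast<__m256i*>(buf), v);
    buf[m] = value;                        // ordinary array write, no pointer into the vector variable
    return _mm256_lddqu_si256(reinterpret_cast<const __m256i*>(buf));
}

// Mirrors the question's loop: lane m receives m_array[m].
AVX2_FN void fill_lanes(const int64_t m_array[4], int64_t out[4]) {
    __m256i r = _mm256_set1_epi32(-1);
    for (int m = 0; m < 4; m++) {
        r = set_lane(r, m, m_array[m]);
    }
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out), r);  // out need not be aligned
}
```

With `m_array` holding 8, 1, 2, 3, `fill_lanes` leaves those values in lanes 0 to 3, and `insert_extract_demo(8, out)` gives 8, -1, -1, -1, the register contents the question expected to see. Each `set_lane` call costs a round trip through memory, so storing once, editing several elements, and loading once is the cheaper pattern when many lanes change.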