How to divide a __m256i vector by an integer variable?

Problem description

I want to divide an AVX2 vector of 16-bit integers by a constant integer. I have read a related question and many other pages. I saw something that might help, fixed-point arithmetic, but I didn't understand it. The problem is that this division is the bottleneck. I tried two ways:

First, casting to float and doing the operation with AVX instructions:

//outside the bottleneck:
__m256i veci16; // containing some integer numbers (16x16-bit numbers)
__m256 div_v = _mm256_set1_ps(div);
__m256 vecps;

//inside the bottleneck
//some calculations which make veci16
vecps = _mm256_castsi256_ps (veci16);   // reinterprets the bits only, no int -> float conversion
vecps = _mm256_div_ps (vecps, div_v);
veci16 = _mm256_castps_si256 (vecps);   // also a bit reinterpretation, not float -> int
_mm256_storeu_si256((__m256i *)&output[i][j], veci16);

With the first method, the problem is that the elapsed time is 5 ns without the division and about 60 ns with it.
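
A note on the first method: `_mm256_castsi256_ps` and `_mm256_castps_si256` only reinterpret the register bits, so `_mm256_div_ps` above divides bit patterns (pairs of 16-bit lanes read as floats, often denormal, which can be slow on some CPUs) rather than the numbers themselves. Below is a minimal sketch of a real conversion path, assuming signed 16-bit lanes, a nonzero divisor `b`, an element count that is a multiple of 16, and a GCC/Clang-style `target("avx2")` attribute; `div_epi16_via_ps` and `div_i16_array_ps` are hypothetical names. Each half is sign-extended to 32 bits, converted to float, divided, truncated, and packed back.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

// Divide 16 signed 16-bit lanes by the float divisor in div_v.
__attribute__((target("avx2")))
static inline __m256i div_epi16_via_ps(__m256i v, __m256 div_v)
{
    // Sign-extend lanes 0..7 and 8..15 to 32-bit integers.
    __m256i lo32 = _mm256_cvtepi16_epi32(_mm256_castsi256_si128(v));
    __m256i hi32 = _mm256_cvtepi16_epi32(_mm256_extracti128_si256(v, 1));

    // Real int -> float conversion (every 16-bit value is exact in a float).
    __m256 lo = _mm256_div_ps(_mm256_cvtepi32_ps(lo32), div_v);
    __m256 hi = _mm256_div_ps(_mm256_cvtepi32_ps(hi32), div_v);

    // Truncate toward zero, like C integer division.
    lo32 = _mm256_cvttps_epi32(lo);
    hi32 = _mm256_cvttps_epi32(hi);

    // Pack back to 16 bits with signed saturation. packs works per 128-bit
    // half, leaving 64-bit chunks in the order 0-3, 8-11, 4-7, 12-15;
    // permute4x64 restores the original lane order.
    __m256i packed = _mm256_packs_epi32(lo32, hi32);
    return _mm256_permute4x64_epi64(packed, _MM_SHUFFLE(3, 1, 2, 0));
}

// Divide n int16_t values (n a multiple of 16) by b and write them to out.
__attribute__((target("avx2")))
void div_i16_array_ps(const int16_t *in, int16_t *out, size_t n, int b)
{
    assert(b != 0 && n % 16 == 0);
    const __m256 div_v = _mm256_set1_ps((float)b);   // set up once, outside the loop
    for (size_t i = 0; i < n; i += 16) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(in + i));
        _mm256_storeu_si256((__m256i *)(out + i), div_epi16_via_ps(v, div_v));
    }
}
```

Because the operands are exact and the float quotient is correctly rounded, truncating it should give the same result as C integer division for 16-bit inputs, except that -32768 / -1 saturates to 32767. It still costs two `_mm256_div_ps` per 16 lanes.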

Second, I stored the vector to an array, divided each element with scalar code, and loaded it back, like this:

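#include <immintrin.h>
#include <stdalign.h>
#include <stdint.h>

// Spill the 16 lanes to memory, divide each one with scalar code, reload.
// The buffer must hold int16_t (with int t[16], each int would hold two lanes)
// and be 32-byte aligned, as _mm256_store_si256/_mm256_load_si256 require.
static inline __m256i _mm256_div_epi16 (__m256i a , int b){
    alignas(32) int16_t t[16];   // local buffer, so the function is also thread-safe
    _mm256_store_si256((__m256i *)&t[0] , a);
    for (int k = 0; k < 16; k++) t[k] /= b;
    return _mm256_load_si256((__m256i *)&t[0]);
}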

This was better, but the elapsed time is still 17 ns. The surrounding calculations are too long to show here.

The question is: is there a faster way to implement this inline function?

Recommended answer

You can use … Note that this assumes b > 1.
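
As one possible technique (not necessarily the one the answer refers to), division by a divisor that stays fixed across the hot loop can be replaced by fixed-point arithmetic: a multiply-high by a precomputed "magic" constant plus two shifts, following the unsigned round-up method in Granlund and Montgomery, "Division by Invariant Integers using Multiplication". This sketch assumes the lanes hold non-negative values (treated as `uint16_t`), a divisor in 1..65535, an element count that is a multiple of 16, and a GCC/Clang-style `target("avx2")` attribute; `u16_divisor`, `make_u16_divisor`, `div_u16_scalar` and `div_u16_array_avx2` are hypothetical names.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

// Precomputed constants for dividing unsigned 16-bit values by d.
typedef struct {
    uint16_t magic;  // floor(2^16 * (2^l - d) / d) + 1, where l = ceil(log2(d))
    int sh1;         // min(l, 1)
    int sh2;         // max(l - 1, 0)
} u16_divisor;

static u16_divisor make_u16_divisor(uint16_t d)
{
    assert(d != 0);
    int l = 0;
    while ((1u << l) < d) l++;                        // l = ceil(log2(d))
    u16_divisor r;
    r.magic = (uint16_t)((((uint32_t)1 << 16) * ((1u << l) - d)) / d + 1);
    r.sh1 = l < 1 ? l : 1;
    r.sh2 = l > 1 ? l - 1 : 0;
    return r;
}

// Scalar model of one lane: q = (t1 + ((x - t1) >> sh1)) >> sh2,
// with t1 = high 16 bits of x * magic.
static uint16_t div_u16_scalar(uint16_t x, u16_divisor dv)
{
    uint16_t t1 = (uint16_t)(((uint32_t)x * dv.magic) >> 16);
    return (uint16_t)((t1 + ((uint16_t)(x - t1) >> dv.sh1)) >> dv.sh2);
}

// Divide n uint16_t values (n a multiple of 16) by d, 16 lanes at a time.
__attribute__((target("avx2")))
void div_u16_array_avx2(const uint16_t *in, uint16_t *out, size_t n, uint16_t d)
{
    assert(n % 16 == 0);
    const u16_divisor dv = make_u16_divisor(d);       // computed once, outside the loop
    const __m256i m  = _mm256_set1_epi16((short)dv.magic);
    const __m128i s1 = _mm_cvtsi32_si128(dv.sh1);     // shift counts for _mm256_srl_epi16
    const __m128i s2 = _mm_cvtsi32_si128(dv.sh2);
    for (size_t i = 0; i < n; i += 16) {
        __m256i x  = _mm256_loadu_si256((const __m256i *)(in + i));
        __m256i t1 = _mm256_mulhi_epu16(x, m);        // high half of x * magic
        __m256i q  = _mm256_srl_epi16(_mm256_sub_epi16(x, t1), s1);
        q = _mm256_srl_epi16(_mm256_add_epi16(t1, q), s2);
        _mm256_storeu_si256((__m256i *)(out + i), q);
    }
}
```

Per group of 16 lanes this is one `vpmulhuw`, a subtract, an add and two shifts, with no division in the loop. The scalar function mirrors what each lane computes and is convenient for checking the constants. Signed lanes need a different magic-number scheme, which the same paper also covers.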