SSE integer division?

Question

There is _mm_div_ps for floating-point division and _mm_mullo_epi16 for integer multiplication. But is there something for integer division of 16-bit values? How can I perform such a division?

Recommended answer

Please see Agner Fog's vectorclass. He has implemented a fast algorithm to do integer division with SSE/AVX for 8-bit, 16-bit, and 32-bit words (but not 64-bit): http://www.agner.org/optimize/#vectorclass

Look in the file vectori128.h for the code and a description of the algorithm, as well as in his well-written manual VectorClass.pdf.

Here is a fragment describing the algorithm from his manual.

"Integer division
There are no instructions in the x86 instruction set and its extensions that are useful for integer vector division, and such instructions would be quite slow if they existed. Therefore, the vector class library is using an algorithm for fast integer division. The basic principle of this algorithm can be expressed in this formula:
a / b ≈ a * (2^n / b) >> n
This calculation goes through the following steps:
1. find a suitable value for n
2. calculate 2^n / b
3. calculate necessary corrections for rounding errors
4. do the multiplication and shift-right and apply corrections for rounding errors

This formula is advantageous if multiple numbers are divided by the same divisor b. Steps 1, 2 and 3 need only be done once while step 4 is repeated for each value of the dividend a. The mathematical details are described in the file vectori128.h. (See also T. Granlund and P. L. Montgomery: Division by Invariant Integers Using Multiplication, Proceedings of the SIGPLAN."...
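To make the four steps concrete, here is a minimal scalar sketch for unsigned 16-bit values. It is not the code from vectori128.h: it uses the simplest variant I know of, where the reciprocal 2^n / b is rounded up and n is chosen as 16 + ceil(log2(b)), so that no separate correction step is needed. The names MulShiftDivisor, make_divisor, mulshift_divide and count_mismatches are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: precomputed parameters for dividing
// unsigned 16-bit values by a fixed divisor b (steps 1 to 3).
struct MulShiftDivisor {
    uint64_t multiplier;  // 2^n / b, rounded up
    unsigned shift;       // n
};

// Steps 1 to 3: done once per divisor.
MulShiftDivisor make_divisor(uint16_t b) {
    assert(b != 0);                          // division by zero is not handled
    unsigned l = 0;                          // l = ceil(log2(b))
    while ((uint32_t(1) << l) < uint32_t(b)) ++l;
    unsigned n = 16 + l;                     // step 1: pick n
    uint64_t two_n = uint64_t(1) << n;
    uint64_t m = (two_n + b - 1) / b;        // steps 2 and 3: round 2^n / b up
    return {m, n};
}

// Step 4: one multiplication and one shift per dividend.
uint16_t mulshift_divide(uint16_t a, const MulShiftDivisor& d) {
    return static_cast<uint16_t>((uint64_t(a) * d.multiplier) >> d.shift);
}

// Exhaustive self-check: number of dividends a for which the
// multiply-shift result differs from the ordinary a / b.
unsigned count_mismatches(uint16_t b) {
    MulShiftDivisor d = make_divisor(b);
    unsigned bad = 0;
    for (uint32_t a = 0; a <= 0xFFFF; ++a) {
        if (uint32_t(mulshift_divide(uint16_t(a), d)) != a / uint32_t(b)) ++bad;
    }
    return bad;
}
```

Calling make_divisor once and then mulshift_divide for every dividend mirrors the point of the quoted passage: the expensive part is paid only once per divisor. The multiplier here can need 17 bits, which is why the sketch uses 64-bit arithmetic; a real 16-bit SIMD version has to work around that.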

A passage near the end of the file vectori128.h shows how to divide a vector of shorts by a scalar variable: "It takes more time to compute the parameters used for fast division than to do the division. Therefore, it is advantageous to use the same divisor object multiple times. For example, to divide 80 unsigned short integers by 10:"

short x = 10;
uint16_t dividends[80], quotients[80];         // numbers to work with
Divisor_us div10(x);                          // make divisor object for dividing by 10
Vec8us temp;                                   // temporary vector
for (int i = 0; i < 80; i += 8) {              // loop for 8 elements per iteration
    temp.load(dividends+i);                    // load 8 elements
    temp /= div10;                             // divide each element by 10
    temp.store(quotients+i);                   // store 8 elements
}
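If you would rather stay with raw intrinsics, the same kind of loop can be sketched with SSE2 only. The version below is my own illustration, not code copied from vectori128.h. It follows the unsigned scheme from the Granlund and Montgomery paper, where the multiplier is reduced to 16 bits and the result is fixed up with two shifts. U16Divisor, make_u16_divisor, div_u16, div_array_u16 and count_simd_mismatches are hypothetical names. It assumes a non-zero divisor and an element count that is a multiple of 8.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper type: parameters for dividing unsigned 16-bit
// lanes by one fixed divisor.
struct U16Divisor {
    __m128i mult;  // 16-bit multiplier broadcast to all 8 lanes
    __m128i sh1;   // first shift count (0 or 1)
    __m128i sh2;   // second shift count
};

// Computed once per divisor d (d must not be zero).
U16Divisor make_u16_divisor(uint16_t d) {
    assert(d != 0);
    uint32_t l = 0;                                  // l = ceil(log2(d))
    while ((uint32_t(1) << l) < uint32_t(d)) ++l;
    // m = floor(2^16 * (2^l - d) / d) + 1, which fits in 16 bits
    uint32_t m = (((uint32_t(1) << l) - d) << 16) / d + 1;
    uint32_t s1 = l < 1 ? l : 1;                     // min(l, 1)
    uint32_t s2 = l - s1;                            // max(l - 1, 0)
    return {_mm_set1_epi16(static_cast<short>(m)),
            _mm_cvtsi32_si128(static_cast<int>(s1)),
            _mm_cvtsi32_si128(static_cast<int>(s2))};
}

// Per-vector work: t = high 16 bits of a * m,
// q = (t + ((a - t) >> sh1)) >> sh2.
__m128i div_u16(__m128i a, const U16Divisor& d) {
    __m128i t = _mm_mulhi_epu16(a, d.mult);
    __m128i u = _mm_srl_epi16(_mm_sub_epi16(a, t), d.sh1);
    return _mm_srl_epi16(_mm_add_epi16(t, u), d.sh2);
}

// Divide count values by d; count is assumed to be a multiple of 8.
void div_array_u16(const uint16_t* in, uint16_t* out, int count, uint16_t d) {
    U16Divisor dv = make_u16_divisor(d);
    for (int i = 0; i < count; i += 8) {
        __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), div_u16(a, dv));
    }
}

// Exhaustive self-check over all 65536 dividends for one divisor.
unsigned count_simd_mismatches(uint16_t d) {
    static uint16_t in[65536], out[65536];
    for (uint32_t a = 0; a < 65536; ++a) in[a] = uint16_t(a);
    div_array_u16(in, out, 65536, d);
    unsigned bad = 0;
    for (uint32_t a = 0; a < 65536; ++a) {
        if (uint32_t(out[a]) != a / uint32_t(d)) ++bad;
    }
    return bad;
}
```

As with the Divisor_us object above, the setup in make_u16_divisor only pays off when the same divisor is reused for many vectors.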

Edit: integer division by a vector of shorts

#include <stdio.h>
#include "vectorclass.h"

int main() {
    short numa[] = {10, 20, 30, 40, 50, 60, 70, 80};
    short dena[] = {10, 20, 30, 40, 50, 60, 70, 80};

    // load 8 numerators and 8 denominators
    Vec8s num = Vec8s().load(numa);
    Vec8s den = Vec8s().load(dena);

    // widen each half to 32-bit integers and convert to float
    Vec4f num_low = to_float(extend_low(num));
    Vec4f num_high = to_float(extend_high(num));
    Vec4f den_low = to_float(extend_low(den));
    Vec4f den_high = to_float(extend_high(den));

    // divide in floating point, then truncate back to integers
    Vec4f qf_low = num_low/den_low;
    Vec4f qf_high = num_high/den_high;
    Vec4i q_low = truncate_to_int(qf_low);
    Vec4i q_high = truncate_to_int(qf_high);

    // pack the two 4 x 32-bit results into one 8 x 16-bit vector
    Vec8s q = compress(q_low, q_high);
    for(int i=0; i<8; i++) {
        printf("%d ", q[i]);
    }
    printf("\n");
}
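Since the question asked about intrinsics, here is roughly what the same float-based approach looks like with plain SSE2. This is a sketch rather than what the vector class library does internally. The function names div_epi16_via_float and count_float_div_mismatches are made up. It assumes that no denominator lane is zero and that no lane computes -32768 / -1, whose result does not fit in 16 bits. Single precision is enough here because 16-bit integers are represented exactly in float, and the rounding error of the quotient is too small to push a non-integer result across an integer boundary.

```cpp
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Divide 8 signed 16-bit lanes of num by the matching lanes of den,
// truncating toward zero. Assumes den has no zero lane.
__m128i div_epi16_via_float(__m128i num, __m128i den) {
    // Sign-extend to 32 bits: put each 16-bit value in both halves of a
    // 32-bit lane, then shift right arithmetically by 16.
    __m128i num_lo = _mm_srai_epi32(_mm_unpacklo_epi16(num, num), 16);
    __m128i num_hi = _mm_srai_epi32(_mm_unpackhi_epi16(num, num), 16);
    __m128i den_lo = _mm_srai_epi32(_mm_unpacklo_epi16(den, den), 16);
    __m128i den_hi = _mm_srai_epi32(_mm_unpackhi_epi16(den, den), 16);

    // Convert to float and divide.
    __m128 q_lo = _mm_div_ps(_mm_cvtepi32_ps(num_lo), _mm_cvtepi32_ps(den_lo));
    __m128 q_hi = _mm_div_ps(_mm_cvtepi32_ps(num_hi), _mm_cvtepi32_ps(den_hi));

    // Truncate to 32-bit integers and pack back to 16 bits
    // (the pack uses signed saturation).
    return _mm_packs_epi32(_mm_cvttps_epi32(q_lo), _mm_cvttps_epi32(q_hi));
}

// Self-check: divides every 16-bit numerator by b (b != 0) and counts
// lanes that differ from the ordinary C++ a / b.
unsigned count_float_div_mismatches(int16_t b) {
    unsigned bad = 0;
    __m128i den = _mm_set1_epi16(b);
    for (int base = -32768; base <= 32767; base += 8) {
        int16_t in[8], out[8];
        for (int i = 0; i < 8; ++i) in[i] = static_cast<int16_t>(base + i);
        __m128i num = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out),
                         div_epi16_via_float(num, den));
        for (int i = 0; i < 8; ++i) {
            int a = in[i];
            if (a == -32768 && b == -1) continue;  // quotient overflows 16 bits
            if (int(out[i]) != a / int(b)) ++bad;
        }
    }
    return bad;
}
```

Unlike the Divisor_us approach, this works when every lane has a different denominator, at the cost of two float divisions per vector of 8 shorts.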
