重复整数除以运行时常量值 [英] Repeated integer division by a runtime constant value

查看:75
本文介绍了重复整数除以运行时常量值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在程序的某个时刻,我计算了一个整数除数d.从那时起,d将保持不变.

At some point in my program I compute an integer divisor d. From that point onward d is going to be constant.

稍后在代码中,由于d的值不是编译时已知的常数,因此我将对d除以多次-执行整数除法.

Later in the code I will divide by that d several times - performing an integer division, since the value of d is not a compile-time known constant.

鉴于整数除法与其他类型的整数算术相比是一个相对较慢的过程,我想对其进行优化.有什么其他格式可以存储d,以便除法过程可以更快地执行?也许是某种形式的对等?

Given that integer division is a relatively slow process compared to other kind of integer arithmetic, I would like to optimize it. Is there some alternative format that I could store d in, so that the division process would perform faster? Maybe a reciprocal of some form?

我不需要d的任何其他值.

I do not need the value of d for anything else.

d的值可以是任何64位整数,但通常非常适合32位.

The value of d is any 64-bit integer, but usually fits in 32-bit quite well.

推荐答案

为此提供了一个库- libdivide :

There is a library for this—libdivide:

libdivide是一个用于优化整数除法的开源库

libdivide允许您将昂贵的整数除法替换为 比较便宜的乘法和移位.编译器通常会 这,但仅当在编译时知道除数时. libdivide 使您可以在运行时利用它.结果是 整数除法可以变得更快-快得多.此外, libdivide允许您将SSE2向量除以运行时常量, 这特别好,因为SSE2没有整数除法 说明!

libdivide allows you to replace expensive integer divides with comparatively cheap multiplication and bitshifts. Compilers usually do this, but only when the divisor is known at compile time. libdivide allows you to take advantage of it at runtime. The result is that integer division can become faster - a lot faster. Furthermore, libdivide allows you to divide an SSE2 vector by a runtime constant, which is especially nice because SSE2 has no integer division instructions!

libdivide是免费的开放源代码,并具有许可许可.名字 "libdivide"有点开玩笑,因为本身没有库: 代码完全打包为单个标头文件,同时包含C和 C ++ API.

libdivide is free and open source with a permissive license. The name "libdivide" is a bit of a joke, as there is no library per se: the code is packaged entirely as a single header file, with both a C and a C++ API.

您可以在以下博客中了解其背后的算法.例如,此条目.

You can read about the algorithm behind it at this blog; for example, this entry.

基本上,其背后的算法与编译器用于按常数优化除法的算法相同,只是它允许在运行时完成这些强度降低优化.

Basically, the algorithm behind it is the same one that compilers use to optimize division by a constant, except that it allows these strength-reduction optimizations to be done at run-time.

注意:您可以创建甚至更快的libdivide版本.这个想法是,对于每个除数,您始终可以创建一个三元组(mul/add/shift),因此此表达式给出结果:(num * mul + add)>> shift(此处的乘数是一个宽乘数).有趣的是,该方法可以击败编译器版本的多个微基准进行恒定除法!

Note: you can create an even faster version of libdivide. The idea is that for every divisor, you can always create a triplet (mul/add/shift), so this expression gives the result: (num*mul+add)>>shift (multiply is a wide multiply here). Interestingly, this method could beat the compiler version for constant division for several microbenchmarks!

这是我的实现(无法直接编译,但可以看到常规算法):

Here's my implementation (this is not compilable out of the box, but the general algorithm can be seen):

struct Divider_u32 {
    u32 mul;
    u32 add;
    s32 shift;

    void set(u32 divider);
};

void Divider_u32::set(u32 divider) {
    s32 l = indexOfMostSignificantBit(divider);
    if (divider&(divider-1)) {
        u64 m = static_cast<u64>(1)<<(l+32);
        mul = static_cast<u32>(m/divider);

        u32 rem = static_cast<u32>(m)-mul*divider;
        u32 e = divider-rem;

        if (e<static_cast<u32>(1)<<l) {
            mul++;
            add = 0;
        } else {
            add = mul;
        }
        shift = l;
    } else {
        if (divider==1) {
            mul = 0xffffffff;
            add = 0xffffffff;
            shift = 0;
        } else {
            mul = 0x80000000;
            add = 0;
            shift = l-1;
        }
    }
}

u32 operator/(u32 v, const Divider_u32 &div) {
    u32 t = static_cast<u32>((static_cast<u64>(v)*div.mul+div.add)>>32)>>div.shift;

    return t;
}

这篇关于重复整数除以运行时常量值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆