What is the fastest integer division supporting division by zero no matter what the result is?

Question

Summary:

I'm looking for the fastest way to calculate

(int) x / (int) y

without getting an exception for y==0. Instead I just want an arbitrary result.

Background:

When coding image processing algorithms I often need to divide by an (accumulated) alpha value. The simplest variant is plain C code with integer arithmetic. My problem is that I typically get a division by zero error for result pixels with alpha==0. However, these are exactly the pixels where the result doesn't matter at all: I don't care about the color values of pixels with alpha==0.

Details:

I'm looking for something like:

result = (y==0)? 0 : x/y;

or

result = x / MAX( y, 1 );

x and y are positive integers. The code is executed a huge number of times in a nested loop, so I'm looking for a way to get rid of the conditional branching.

When y does not exceed the byte range, I'm happy with the solution

unsigned char kill_zero_table[256] = { 1, 1, 2, 3, 4, 5, 6, 7, [...] 255 };
[...]
result = x / kill_zero_table[y];

But this obviously does not work well for bigger ranges.

I guess the final question is: What's the fastest bit twiddling hack that changes 0 to any other integer value, while leaving all other values unchanged?

Clarifications

I'm not 100% sure that branching is too expensive. However, different compilers are used, so I prefer benchmarking with little optimizations (which is indeed questionable).

For sure, compilers are great when it comes to bit twiddling, but I can't express the "don't care" result in C, so the compiler will never be able to use the full range of optimizations.

Code should be fully C compatible; the main platforms are 64-bit Linux with gcc and clang, and MacOS.

Recommended answer

Inspired by some of the comments, I got rid of the branch on my Pentium with the gcc compiler using

int f (int x, int y)
{
        y += y == 0;    /* (y == 0) is 1 when y is zero and 0 otherwise, so a zero divisor becomes 1 */
        return x/y;
}

The compiler basically recognizes that it can use a condition flag of the test in the addition.

As requested, the assembly:

.globl f
    .type   f, @function
f:
    pushl   %ebp
    xorl    %eax, %eax
    movl    %esp, %ebp
    movl    12(%ebp), %edx
    testl   %edx, %edx
    sete    %al
    addl    %edx, %eax
    movl    8(%ebp), %edx
    movl    %eax, %ecx
    popl    %ebp
    movl    %edx, %eax
    sarl    $31, %edx
    idivl   %ecx
    ret
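
The zero-to-one substitution used in `f` can be spelled in a few equivalent ways. The sketch below is illustrative only: the names `fix_add`, `fix_or` and `fix_bits` are hypothetical, it assumes non-negative 32-bit operands as in the question, and whether a given compiler emits branch-free code for each variant has to be checked in its own assembly output.

```c
#include <stdint.h>

/* Hypothetical helpers: each maps 0 to 1 and leaves every other value unchanged. */

/* Same idiom as f above: add the 0/1 result of the comparison. */
static inline uint32_t fix_add(uint32_t y)
{
    return y + (uint32_t)(y == 0);
}

/* OR variant: the comparison result only changes the value when y is 0. */
static inline uint32_t fix_or(uint32_t y)
{
    return y | (uint32_t)(y == 0);
}

/* Variant without any comparison operator.
 * (y | -y) has its top bit set for every non-zero y and is 0 for y == 0. */
static inline uint32_t fix_bits(uint32_t y)
{
    uint32_t nonzero = (y | (0u - y)) >> 31;   /* 1 if y != 0, else 0 */
    return y | (nonzero ^ 1u);                 /* sets bit 0 only when y == 0 */
}

/* Division that returns x for a zero divisor instead of trapping. */
static inline uint32_t div_no_trap(uint32_t x, uint32_t y)
{
    return x / fix_add(y);
}
```

All three helpers return the same value for every input, so the choice between them only matters for the code a particular compiler generates.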

As this turned out to be such a popular question and answer, I'll elaborate a bit more. The example above (the function f) is based on a programming idiom that the compiler recognizes. Here a boolean expression is used in integer arithmetic, and condition flags were invented in hardware for exactly this purpose. In general, condition flags are only accessible from C through idioms. That is why it is so hard to write a portable multiple-precision integer library in C without resorting to (inline) assembly. My guess is that most decent compilers will understand the above idiom.
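
To illustrate the remark about multiple-precision arithmetic: C has no direct access to the carry flag, so the carry out of an addition is usually recovered with a comparison idiom. The following is a minimal sketch; the type `u64_limbs` and the function `add_limbs` are hypothetical names, not part of any library. A compiler may map the comparison onto an add-with-carry instruction, but that is not guaranteed.

```c
#include <stdint.h>

/* Hypothetical two-limb number: value = hi * 2^32 + lo. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_limbs;

/* Adds two two-limb numbers; a carry out of the high limb is discarded. */
static u64_limbs add_limbs(u64_limbs a, u64_limbs b)
{
    u64_limbs r;
    r.lo = a.lo + b.lo;                 /* unsigned addition wraps modulo 2^32 */
    uint32_t carry = (r.lo < a.lo);     /* 1 exactly when the low limb wrapped */
    r.hi = a.hi + b.hi + carry;         /* propagate the carry into the high limb */
    return r;
}
```

As with `y += y == 0`, the result of a comparison is used directly as an integer, which is the only portable way to express the flag in C.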

Another way of avoiding branches, as also remarked in some of the comments above, is predicated execution. I therefore took Philipp's first code and my code and ran them through the compiler from ARM and the GCC compiler for the ARM architecture, which features predicated execution. Both compilers avoid the branch in both samples of code:

Philipp's version with the ARM compiler:

f PROC
        CMP      r1,#0
        BNE      __aeabi_idivmod
        MOVEQ    r0,#0
        BX       lr

Philipp's version with GCC:

f:
        subs    r3, r1, #0
        str     lr, [sp, #-4]!
        moveq   r0, r3
        ldreq   pc, [sp], #4
        bl      __divsi3
        ldr     pc, [sp], #4

My code with the ARM compiler:

f PROC
        RSBS     r2,r1,#1
        MOVCC    r2,#0
        ADD      r1,r1,r2
        B        __aeabi_idivmod

My code with GCC:

f:
        str     lr, [sp, #-4]!
        cmp     r1, #0
        addeq   r1, r1, #1
        bl      __divsi3
        ldr     pc, [sp], #4

All versions still need a branch to the division routine, because this version of the ARM has no hardware division, but the test for y == 0 is fully implemented through predicated execution.
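
For the image-processing case from the question, the idiom could be applied per pixel roughly as follows. This is only a sketch under assumed conditions: the function name `divide_by_alpha`, the buffer layout (one accumulated color sum and one accumulated alpha per pixel) and the unsigned 32-bit types are hypothetical choices, not taken from the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: divides each accumulated color sum by its accumulated alpha.
 * Pixels with alpha == 0 get an arbitrary result (here: the undivided sum). */
static void divide_by_alpha(const uint32_t *color_sum,
                            const uint32_t *alpha_sum,
                            uint32_t *out,
                            size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        uint32_t a = alpha_sum[i];
        a += (a == 0);              /* a zero divisor becomes 1, others are unchanged */
        out[i] = color_sum[i] / a;
    }
}
```

Whether the loop body ends up branch-free still depends on the compiler and target, so it is worth inspecting the generated assembly as done above.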
