为什么使用A/< constant-int> A是无符号还是有符号,更快吗? [英] Why A / <constant-int> is faster when A is unsigned vs signed?

查看:115
本文介绍了为什么使用A/< constant-int> A是无符号还是有符号,更快吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在阅读优化C ++ Wikibook .在更快的操作章节中,建议之一如下:

I have been reading through the Optimizing C++ wikibook. In the faster operations chapter one of the advice is as follows:

整数除以常数

将整数(已知为正数或零)除以a时 常量,将整数转换为无符号.

When you divide an integer (that is known to be positive or zero) by a constant, convert the integer to unsigned.

如果s是一个有符号整数,则u是一个无符号整数,而C是一个 常数整数表达式(正或负),运算s/ C比u/C慢,而s%C比u%C慢.这是最 当C为2的幂时有效,但是在所有情况下,符号必须 在划分过程中要予以考虑.

If s is a signed integer, u is an unsigned integer, and C is a constant integer expression (positive or negative), the operation s / C is slower than u / C, and s % C is slower than u % C. This is most significant when C is a power of two, but in all cases, the sign must be taken into account during division.

但是,从有符号到无符号的转换是免费的,因为 它只是对相同位的重新解释.因此,如果s是一个 您知道是正数还是零的有符号整数,可以加快速度 使用以下(等效)表达式对其进行除法:(无符号)s /C和(无符号)s%C.

The conversion from signed to unsigned, however, is free of charge, as it is only a reinterpretation of the same bits. Therefore, if s is a signed integer that you know to be positive or zero, you can speed up its division using the following (equivalent) expressions: (unsigned)s / C and (unsigned)s % C.

我使用gcc测试了此语句,并且u / C表达式的性能似乎始终优于s / c

I tested this statement with gcc and the u / C expression seems to perform consistently better than the s / c

下面也提供了以下示例:

The following example is also provided below:

#include <iostream>
#include <chrono>
#include <cstdlib>
#include <vector>
#include <numeric>

using namespace std;

int main(int argc, char *argv[])
{

    constexpr int vsize = 1e6;
    std::vector<int> x(vsize);
    std::iota(std::begin(x), std::end(x), 0); //0 is the starting number

    constexpr int a = 5;

  auto start_signed = std::chrono::system_clock::now();
  int sum_signed = 0;
    for ([[gnu::unused]] auto  i : x)
    {
        // signed is by default
        int v = rand() % 30 + 1985;   // v in the range 1985-2014

        sum_signed += v / a;
    }
  auto end_signed = std::chrono::system_clock::now();

  auto start_unsigned = std::chrono::system_clock::now();
  int sum_unsigned = 0;
    for ([[gnu::unused]] auto  i : x)
    {
        int v = rand() % 30 + 1985;   // v in the range 1985-2014
        sum_unsigned += static_cast<unsigned int>(v) / a;
    }
  auto end_unsigned = std::chrono::system_clock::now();

  // signed
  std::chrono::duration<double> diff_signed = end_signed - start_signed;
  std::cout << "sum_signed: " << sum_signed << std::endl;
  std::cout << "Time it took SIGNED: " << diff_signed.count() * 1000 << "ms" << std::endl;

  // unsigned
  std::chrono::duration<double> diff_unsigned = end_unsigned - start_unsigned;
  std::cout << "sum_unsigned: " << sum_unsigned << std::endl;
  std::cout << "Time it took UNSIGNED: " << diff_unsigned.count() * 1000 << "ms" << std::endl;

  return 0;
}

您可以在此处编译和运行示例: http://cpp.sh/8kie3

You can compile and run the example here: http://cpp.sh/8kie3

为什么会这样?

推荐答案

经过一番玩弄之后,我相信我已经找到了问题的根源,是根据标准来保证负整数除以四舍五入为零,因为C ++ 11.对于最简单的除以2的情况,请检查以下代码和相应的程序集(

After some toying around, I believe I've tracked down the source of the problem to be the guarantee by the standard that negative integer divisions are rounded towards zero since C++11. For the simplest case, which is division by two, check out the following code and the corresponding assembly (godbolt link).

constexpr int c = 2;

int signed_div(int in){
    return in/c;
}

int unsigned_div(unsigned in){
    return in/c;
}

组装:

signed_div(int):
  mov eax, edi
  shr eax, 31
  add eax, edi
  sar eax
  ret

unsigned_div(unsigned int):
  mov eax, edi
  shr eax
  ret

这些额外的指令有什么作用? shr eax, 31(右移31)仅隔离符号位,这意味着如果输入为非负,则eax == 0,否则为eax == 1.然后将输入添加到eax.换句话说,这两个指令转换为如果输入为负,则在其上添加1.加法的含义如下(仅对于负输入).

What do these extra instructions accomplish? shr eax, 31 (right shift by 31) just isolates the sign bit, meaning that if input is non-negative, eax == 0, otherwise eax == 1. Then the input is added to eax. In other words, these two instructions translate to "if input is negative, add 1 to it. The implications of the addition are the following (only for negative input).

  • 如果输入为偶数,则其最低有效位设置为1,但移位将其丢弃.输出不受此操作的影响.

  • If input is even, its least significant bit is set to 1, but the shift discards it. The output is not affected by this operation.

如果输入为奇数,则其最低有效位已经为1,因此加法导致余数传播到其余数字.当发生右移时,最低有效位将被丢弃,并且输出要比我们没有在输入中添加符号位的输出大一.因为默认情况下,将二进制补码向右无穷右移,所以现在的输出是相同除法的结果,但四舍五入为零.

If input is odd, its least significant bit was already 1 so the addition causes a remainder to propagate to the rest of the digits. When the right shift occurs, the least significant bit is discarded and the output is greater by one than the output we'd have if we hadn't added the sign bit to the input. Because by default right-shift in two's complement rounds towards negative infinity, the output now is the result of the same division but rounded towards zero.

简而言之,即使负数也不会受到影响,奇数现在会朝着零取整,而不是朝着负无穷大.

In short, even negative numbers aren't affected, and odd numbers are now rounded towards zero instead of towards negative infinity.

对于非2的幂的常数,它变得更加复杂.并非所有常量都提供相同的输出,但是对于许多常量而言,它看起来类似于以下内容(

For non-power-of-2 constants it gets a bit more complicated. Not all constants give the same output, but for a lot of them it looks similar to the following (godbolt link).

constexpr int c = 3;

int signed_div(int in){
    return in/c;
}

int unsigned_div(unsigned in){
    return in/c;
}

组装:

signed_div(int):
  mov eax, edi
  mov edx, 1431655766
  sar edi, 31
  imul edx
  mov eax, edx
  sub eax, edi
  ret
unsigned_div(unsigned int):
  mov eax, edi
  mov edx, -1431655765
  mul edx
  mov eax, edx
  shr eax
  ret

我们不在乎程序集输出中常量的变化,因为它不会影响执行时间.假设mulimul花费相同的时间(我不确定,但是希望比我更精通的人可以在上面找到源文件),带签名的版本会再次花费更长的时间,因为它具有额外的功能.用于处理负操作数的符号位的指令.

We don't care about the change of the constant in the assembly output, because it does not affect execution time. Assuming that mul and imul take the same amount of time (which I don't know for sure but hopefully someone more knowledgeable than me can find a source on it), the signed version once again takes longer because it has extra instructions to handle the sign bit for negative operands.

  • 使用带有-O2标志的x86-64 GCC 7.3在Godbot上进行了编译.

  • Compilation was done on godbot using x86-64 GCC 7.3 with the -O2 flag.

自C ++ 11起,向零行为的舍入是标准规定的.根据 cpp引用页进行定义.

Rounds towards zero behavior is standard-mandated since C++11. Before it was implementation defined, according to this cppreference page.

这篇关于为什么使用A/&lt; constant-int&gt; A是无符号还是有符号,更快吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆