C ++有符号和无符号int与长而长的速度 [英] C++ signed and unsigned int vs long long speed

查看:65
本文介绍了C ++有符号和无符号int与长而长的速度的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

今天,我注意到 int unsigned long long unsigned long long .

Today, I noticed that the speed of several simple bitwise and arithmetic operations differs significantly between int, unsigned, long long and unsigned long long on my 64-bit pc.

特别是,以下循环对于 unsigned 而言,大约是 long long 的两倍,这是我所没想到的.

In particular, the following loop is about twice as fast for unsigned as for long long, which I didn't expect.

int k = 15;
int N = 30;
int mask = (1 << k) - 1;
while (!(mask & 1 << N)) {
    int lo = mask & ~(mask - 1);
    int lz = (mask + lo) & ~mask;
    mask |= lz;
    mask &= ~(lz - 1);
    mask |= (lz / lo / 2) - 1;
}

(完整代码此处)

以下是计时(以秒为单位)(对于 g ++ -O -O2 -O3 ):

Here are the timings (in seconds) (for g++ -O, -O2 and -O3):

1.834207723 (int)
3.054731598 (long long)
1.584846237 (unsigned)
2.201142018 (unsigned long long)

这些时间非常一致(即保证金为1%).没有 -O 标志,每一个都慢大约一秒钟,但是相对速度是相同的.

These times are very consistent (i.e. a 1% margin). Without -O flag, each one is about one second slower, but the relative speeds are the same.

是否有明确的原因?向量化可能是针对32位类型的,但是我看不到 long long unsigned long long 之间的区别来自.有些操作在某些类型上比在其他类型上要慢很多吗,还是64位类型的速度变慢(甚至在64位体系结构上)只是一般的事情?

Is there a clear reason for this? Vectorization might be it for the 32-bit types, but I can't see where the huge difference between long long and unsigned long long comes from. Are some operations just much slower on some types than on others, or is it just a general thing that 64-bit types are slower (even on 64-bit architectures)?

对于那些感兴趣的人,此循环循环遍历 {1,2,...,30} 的所有子集,其中恰好包含15个元素.这是通过(按顺序)循环所有小于 1<<<<<<< 30>>设置15位的整数来完成的.对于当前情况,这是 155117520 迭代.我不再知道此代码段的来源,但帖子介绍了更多内容.

For those interested, this loop loops over all subsets of {1,2,...,30} with exactly 15 elements. This is done by looping (in order) over all integers less than 1<<30 with exactly 15 bits set. For the current case, that's 155117520 iterations. I don't know the source of this snippet anymore, but this post explains some more.

从汇编代码看来,当类型为无符号时,可以使划分更快.我认为这是有道理的,因为我们不必考虑符号位.

It seems from the assembly code that the division can be made faster when the type is unsigned. I think that makes sense, because we don't have to account for the sign bit.

此外,32位操作使用 movl 和其他 xxxl 指令,而64位操作使用 movq xxxq .

Furthermore, the 32-bit operations use movl and other xxxl instructions, while the 64-bit operations use movq and xxxq.

在阅读我链接的帖子后,我决定使用那里给出的公式:

After reading the post I linked, I decided to use the formula given over there:

T k = 15;
T N = 30;
T mask = (1 << k) - 1;
while (!(mask & 1 << N)) {
    T t = mask | (mask - 1);
    mask = (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(mask) + 1));
}

这大约是上面发布的代码的三分之一,并且对所有四种类型都使用相同的时间.

This runs in about a third of the time of the code posted above, and uses the same time for all four types.

推荐答案

代码中最慢的操作是

mask |= (lz / lo / 2) - 1

32位除法明显快于64位除法.例如,在Ivy Bridge上,32位IDIV占用19-26个时钟,而64位IDIV占用28-103个时钟的延迟.

32-bit division is significantly faster than 64-bit division. For example, on Ivy Bridge, 32 bit IDIV takes 19-26 clocks while 64 bit IDIV takes 28-103 clocks latency.

无符号版本也比有符号版本快,因为在无符号情况下,除以2就是简单的移位,并且没有大小扩展调用(CDQ,CQO).

The unsigned version is also faster than signed because the division by 2 is simple bit shift in the unsigned case and there is no size expansion calls (CDQ, CQO).

在无符号的情况下是带符号的简单移位

in the unsigned case is simple bit shift while in signed

[1] http://www.agner.org/optimize/instruction_tables.pdf

这篇关于C ++有符号和无符号int与长而长的速度的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆