AND faster than integer modulo operation?


Question

It is possible to re-express:


  • i % m

as:


  • i & (m - 1)

where,


  • i is an unsigned integer

  • m is a power of 2
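
The identity can be checked with a minimal C sketch. The helper names `is_pow2` and `mod_pow2` are made up for illustration and do not come from any library.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true when m has exactly one bit set. */
static bool is_pow2(unsigned m) {
    return m != 0 && (m & (m - 1)) == 0;
}

/* Hypothetical helper: i % m for a power-of-two m, using a mask.
   m - 1 has every bit below m's single set bit turned on, so the AND
   keeps exactly the low bits that make up the remainder. */
static unsigned mod_pow2(unsigned i, unsigned m) {
    assert(is_pow2(m)); /* the identity only holds for powers of two */
    return i & (m - 1);
}
```

For example, `mod_pow2(13, 8)` keeps the low three bits of 13 and yields 5, the same as `13 % 8`.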

My question is: is the AND operation any faster? Don't modern CPUs support integer modulo in hardware in a single instruction? I'm interested in ARM, but I don't see a modulo operation in its instruction set.

Recommended answer

It's more complicated than "single instruction" these days. Modern CPUs are complex beasts, and the cost of an instruction has to be broken down into issue, execute, and latency. It also usually depends on the width of the divide/modulo, that is, how many bits are involved.

In any case, I'm not aware of a 32-bit divide having single-cycle latency on any core, ARM or not. "Modern" ARM does have integer divide instructions, but only on some implementations, and most notably not on the most common ones: Cortex-A8 and A9.
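
Whether a given ARM build target has the divide instructions can be checked at compile time. The sketch below assumes a compiler that implements the ARM C Language Extensions (ACLE), such as GCC or Clang, which define `__ARM_FEATURE_IDIV` when hardware integer divide is available. The function name `arm_has_hw_idiv` is hypothetical.

```c
/* Hypothetical helper: 1 when the compiler reports hardware integer
   divide for the ARM target being built, 0 otherwise. On a non-ARM
   target the macro is not defined, so this also returns 0 there. */
static int arm_has_hw_idiv(void) {
#if defined(__ARM_FEATURE_IDIV)
    return 1;
#else
    return 0;
#endif
}
```

The result reflects the target the compiler was told to build for (for example via `-mcpu`), not the core the binary eventually runs on.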

In some cases, the compiler can save you the trouble of converting a divide/modulo into bit shift/mask operations. However, this is only possible if the value is known at compile time. In your case, if the compiler can see for sure that 'm' is always a power of two, then it'll optimize it to bit ops, but if it's a variable passed into a function (or otherwise computed), then it can't, and will resort to a full divide/modulo. This kind of code construction often works (but not always, it depends on how smart your optimizer is):

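/* log2 of the page size: 12 bits gives 4096-byte pages */
unsigned page_size_bits = 12;

/* Returns the number of the page that contains `address`. */
unsigned foo(unsigned address) {
  /* Built by shifting 1, so page_size is a power of two by construction. */
  unsigned page_size = 1U << page_size_bits;
  return address / page_size;
}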

The trick is to let the compiler know that "page_size" is a power of two. I know that gcc and its variants will special-case this, but I'm not sure about other compilers.
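
To make the distinction concrete, here is a sketch of three ways the modulo from the question might be written. The function names are hypothetical, and whether the optimizer actually turns the last two into an AND depends on the compiler and its flags. Inspecting the generated assembly (for example with `gcc -O2 -S`) is the way to confirm.

```c
/* m is only known at run time, so the compiler has to emit a real
   divide, or a call to a library divide routine on cores without one. */
unsigned mod_any(unsigned i, unsigned m) {
    return i % m;
}

/* The divisor is a compile-time constant power of two, so an optimizing
   compiler can typically replace the % with a single AND. */
unsigned mod_4096(unsigned i) {
    return i % 4096u;
}

/* The divisor is built as 1 << bits, which tells the compiler it is a
   power of two even though bits is a run-time value.
   Assumes bits is smaller than the width of unsigned. */
unsigned mod_shifted(unsigned i, unsigned bits) {
    unsigned m = 1u << bits;
    return i % m;
}
```

All three return the same value for a power-of-two divisor. Only the amount of information available to the optimizer differs.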

As a rule of thumb for any core, ARM or not (even x86), prefer bit shift/mask to divide/modulo. Even if your core has a hardware divide, doing the shift/mask by hand will be faster.
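
Doing it by hand could look like the sketch below, which splits an address into a page number and an offset. It assumes the same 12-bit (4096-byte) page size as the example above. The names `EXAMPLE_PAGE_BITS`, `page_number`, `page_offset` and `check_split` are made up for illustration.

```c
#include <assert.h>

/* Assumed for illustration: 4096-byte pages. */
enum { EXAMPLE_PAGE_BITS = 12 };
#define EXAMPLE_PAGE_SIZE (1u << EXAMPLE_PAGE_BITS)

/* address / page size, written as a right shift */
static unsigned page_number(unsigned address) {
    return address >> EXAMPLE_PAGE_BITS;
}

/* address % page size, written as a mask of the low bits */
static unsigned page_offset(unsigned address) {
    return address & (EXAMPLE_PAGE_SIZE - 1u);
}

/* Sanity check: the shift/mask forms agree with / and %. */
static void check_split(unsigned address) {
    assert(page_number(address) == address / EXAMPLE_PAGE_SIZE);
    assert(page_offset(address) == address % EXAMPLE_PAGE_SIZE);
}
```

With these definitions, the address `0x12345` splits into page `0x12` and offset `0x345`.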
