Fastest way to get a positive modulo in C/C++
Question
Often in my inner loops I need to index an array in a "wrap-around" way, so that if the array size is 100 and my code asks for element -2, it should be given element 98. In many high-level languages such as Python, one can do this simply with my_array[index % array_size], but for some reason C's integer arithmetic (usually) rounds toward zero instead of consistently rounding down, and consequently its modulo operator returns a negative result when given a negative first argument.
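To make the behaviour concrete, here is a minimal sketch. The helper name c_remainder is hypothetical and only wraps the built-in operator. Since C99 (and C++11) integer division is defined to truncate toward zero, so the result of % is either zero or has the sign of the first operand.

```c
#include <assert.h>

/* Hypothetical helper (not from the question): returns C's built-in
   remainder unchanged. Since C99 and C++11, integer division truncates
   toward zero, so the result is zero or has the same sign as i. */
static int c_remainder(int i, int n) {
    assert(n != 0);   /* i % 0 is undefined behaviour */
    return i % n;
}
```

With this helper, c_remainder(-2, 100) evaluates to -2, whereas Python's -2 % 100 evaluates to 98.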
Often I know that index will not be less than -array_size, and in these cases I just do my_array[(index + array_size) % array_size]. However, sometimes this can't be guaranteed, and for those cases I would like to know the fastest way to implement an always-positive modulo function. There are several "clever" ways to do it without branching, such as
inline int positive_modulo(int i, int n) {
    return (n + (i % n)) % n;
}
or
inline int positive_modulo(int i, int n) {
    return (i % n) + (n * (i < 0));
}
Of course I can profile these to find out which is the fastest on my system, but I can't help worrying that I might have missed a better one, or that what's fast on my machine might be slow on a different one.
So is there a standard way to do this, or some clever trick that I've missed that's likely to be the fastest possible way?
Also, I know it's probably wishful thinking, but if there's a way of doing this that can be auto-vectorised, that would be amazing.
Recommended answer
The standard way I learned is
inline int positive_modulo(int i, int n) {
return (i % n + n) % n;
}
This function is essentially your first variant without the abs (which, in fact, makes it return the wrong result). I wouldn't be surprised if an optimizing compiler could recognize this pattern and compile it to machine code that computes an "unsigned modulo".
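As a usage sketch, the function can back the wrap-around indexing described in the question. The accessor name wrap_get and the n > 0 precondition are assumptions added for illustration. Note also that i % n + n can overflow when n is larger than INT_MAX / 2, so this form suits ordinary array sizes rather than arbitrary values of n.

```c
#include <assert.h>

/* Double-modulo form from the answer. Assumes n > 0; the
   intermediate i % n + n may overflow if n > INT_MAX / 2. */
static inline int positive_modulo(int i, int n) {
    assert(n > 0);
    return (i % n + n) % n;
}

/* Hypothetical accessor (not from the original post): reads data
   with wrap-around indexing, so index -2 maps to element size - 2. */
static int wrap_get(const int *data, int size, int index) {
    return data[positive_modulo(index, size)];
}
```

For an array of 100 elements, wrap_get(my_array, 100, -2) reads my_array[98].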
Edit:
Moving on to your second variant: first of all, it contains a bug too: the n < 0 should be i < 0.
This variant may not look as if it branches, but on a lot of architectures, the i < 0 will compile into a conditional jump. In any case, it will be at least as fast to replace (n * (i < 0)) with i < 0 ? n : 0, which avoids the multiplication; in addition, it's "cleaner" because it avoids reinterpreting the bool as an int.
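A minimal sketch of the conditional form follows, with one deliberate change that is my own rather than part of the suggestion above: the test is applied to the remainder instead of to i. Testing i < 0 gives n rather than 0 when i is a negative multiple of n (for example i = -100, n = 100, where i % n is 0). The name positive_modulo_cond and the n > 0 precondition are assumptions.

```c
#include <assert.h>

/* Conditional form, testing the remainder rather than i so that
   negative multiples of n map to 0. Assumes n > 0. */
static inline int positive_modulo_cond(int i, int n) {
    assert(n > 0);
    int r = i % n;              /* r lies strictly between -n and n */
    /* Compilers may emit a conditional move instead of a jump here,
       but that depends on the compiler and target. */
    return r < 0 ? r + n : r;
}
```

This version needs only one division and cannot overflow, since r + n is computed only when r is negative.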
As to which of these two variants is faster, that probably depends on the compiler and processor architecture -- time the two variants and see. I don't think there's a faster way than either of these two variants, though.
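One rough way to time the candidates is sketched below. All names here (mod_double, mod_cond, time_variant) are hypothetical, and the harness is only a starting point: calling through a function pointer usually prevents inlining, so the numbers will not match what an inlined call in a real inner loop would cost. Treat the results as a coarse comparison on one compiler and one machine.

```c
#include <assert.h>
#include <time.h>

/* Two candidate implementations; both assume n > 0. */
static int mod_double(int i, int n) { return (i % n + n) % n; }
static int mod_cond(int i, int n)   { int r = i % n; return r < 0 ? r + n : r; }

/* Hypothetical harness: applies f to iters consecutive values centred
   on zero and returns the elapsed CPU time in seconds. The checksum is
   stored in *sink so the loop cannot be optimised away entirely. */
static double time_variant(int (*f)(int, int), int n, int iters,
                           long long *sink) {
    assert(n > 0 && iters >= 0);
    long long sum = 0;
    clock_t start = clock();
    for (int k = 0; k < iters; ++k)
        sum += f(k - iters / 2, n);   /* mix of negative and positive inputs */
    clock_t end = clock();
    *sink = sum;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Comparing the two checksums after a run also serves as a quick check that the variants agree on every input tested.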