Fast modulo 10 in C
Question
I am looking for a fast modulo 10 algorithm because I need to speed up a program that performs many modulo operations inside loops.
I have checked out this page, which compares some alternatives. As far as I understand, T3 was the fastest of them all. My question is: what would x % y look like using the T3 technique?
For simplicity, I have copied the T3 technique here in case the link goes down.
    for (int x = 0; x < max; x++)
    {
        if (y > (threshold - 1))
        {
            y = 0; //reset
            total += x;
        }
        y += 1;
    }
Regarding the comments: if this is not really faster than the regular mod, I am looking for a modulo that is at least 2 times faster than using %. I have seen many examples that rely on the divisor being a power of two, but since 10 is not one, how can I get it to work?
For my program, let's say I have 2 for loops where n = 1 000 000 and m = 1000.
It looks like this:
    for (i = 1; i <= n; i++) {
        D[(i%10)*m] = i;
        for (j = 1; j <= m; j++) {
            ...
        }
    }
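One way to read the T3 idea for this particular loop is to carry the remainder along in a counter that wraps at 10, rather than computing i % 10 on every iteration. The following is only a sketch under assumptions: the function names fill_with_counter and fill_with_mod are hypothetical, D is assumed to be an int array with at least 9*m + 1 elements, and the inner loop over j is left out. It only works because i advances by exactly 1 each time; it is not a general replacement for x % y, and whether it is actually faster has to be measured.

```c
/* Hypothetical helper: fills D the way the outer loop in the question does,
   but tracks i % 10 in a wrap-around counter instead of using %. */
void fill_with_counter(int *D, int n, int m)
{
    int r = 1;                  /* invariant: r == i % 10 (i starts at 1) */
    for (int i = 1; i <= n; i++) {
        D[r * m] = i;
        /* inner loop over j omitted, as in the question */
        if (++r == 10)          /* wrap the counter instead of dividing */
            r = 0;
    }
}

/* Reference version using %, for comparison. */
void fill_with_mod(int *D, int n, int m)
{
    for (int i = 1; i <= n; i++)
        D[(i % 10) * m] = i;
}
```

Both functions should leave D in the same state, so the counter version can be checked against the plain % version before timing either of them.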
Recommended answer
Here's the fastest modulo-10 function you can write:
    unsigned mod10(unsigned x)
    {
        return x % 10;
    }
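Here is what it looks like once compiled (x86-64 with optimizations enabled; note that this particular listing corresponds to a signed int version of the function, as the sign-extending movsxd and the arithmetic shift sar show):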
movsxd rax, edi
imul rcx, rax, 1717986919
mov rdx, rcx
shr rdx, 63
sar rcx, 34
add ecx, edx
add ecx, ecx
lea ecx, [rcx + 4*rcx]
sub eax, ecx
ret
Note the lack of division/modulus instructions, the mysterious constant, the use of an instruction (lea) that was originally intended for complex array indexing, and so on. Needless to say, the compiler knows a lot of tricks to make your program as fast as possible. You'll rarely beat it on tasks like this.
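To make the "mysterious constant" less mysterious, here is a hand-written sketch of the same multiply-and-shift idea for 32-bit unsigned values. The function names div10_mulshift and mod10_mulshift are made up for illustration. The constant 0xCCCCCCCD is ceil(2^35 / 10); the listing above uses 1717986919, which is ceil(2^34 / 10), together with a shift by 34 and a sign correction because it handles signed values. This is only meant to show what the compiler already does for x % 10, not something likely to outperform it.

```c
#include <stdint.h>

/* x / 10 without a divide instruction: multiply by ceil(2^35 / 10)
   in 64-bit arithmetic, then shift right by 35. This is exact for
   every 32-bit unsigned x. */
static uint32_t div10_mulshift(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* x % 10 computed as x - 10 * (x / 10). */
uint32_t mod10_mulshift(uint32_t x)
{
    uint32_t q = div10_mulshift(x);
    return x - q * 10u;
}
```

Writing this by hand buys nothing over x % 10 with optimizations turned on, since the compiler applies the same transformation whenever the divisor is a compile-time constant.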