以小素数为模优化乘法 [英] Optimising multiplication modulo a small prime

查看:126
本文介绍了以小素数为模优化乘法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要多次 次执行以下操作:

I need to do the following operation many times:

  1. 取两个整数a, b
  2. 计算a * b mod p,其中p = 1000000007a, b的数量级与p
  1. Take two integers a, b
  2. Compute a * b mod p, where p = 1000000007 and a, b are of the same order of magnitude as p

我的直觉是天真

result = a * b
result %= p

效率低下.我能像乘幂模ppow(a, b, p)优化一样优化乘模p吗?

is inefficient. Can I optimise multiplication modulo p much like exponentiation modulo p is optimised with pow(a, b, p)?

推荐答案

要在汇编中执行此计算,但可以从Python调用它,我会 尝试从内联汇编 用C编写的Python模块. GCC MSVC 编译器具有内联汇编,仅具有不同的语法.

To do this calculation in assembly, but have it callable from Python, I'd try inline assembly from a Python module written in C. Both GCC and MSVC compilers feature inline assembly, only with differing syntax.

请注意,我们的模量p = 1000000007恰好适合30位.结果 在某些弱点的情况下,可以在Intel 80x86寄存器中计算所需的(a*b)%p a,b的限制不会比p大.

Note that our modulus p = 1000000007 just fits into 30-bits. The result desired (a*b)%p can be computed in Intel 80x86 registers given some weak restrictions on a,b not being much bigger than p.

a,b

Restrictions on size of a,b

(1)a,b是32位无符号整数

(1) a,b are 32-bit unsigned integers

(2)a*b小于p << 32,即p乘以2 ^ 32

(2) a*b is less than p << 32, i.e. p times 2^32

尤其是如果a,b均小于2*p,将避免溢出. 在给定(1)的情况下,只要其中一个小于p,就足够了.

In particular if a,b are each less than 2*p, overflow will be avoided. Given (1), it also suffices that either one of them is less than p.

Intel 80x86指令MUL可以将两个32位无符号整数相乘 并将64位结果存储在累加器寄存器对EDX:EAX中.一些 有关MUL的详细信息和怪癖,请参见本帮助的第10.2.1节 摘要.

The Intel 80x86 instruction MUL can multiply two 32-bit unsigned integers and store the 64-bit result in accumulator register pair EDX:EAX. Some details and quirks of MUL are discussed in Section 10.2.1 of this helpful summary.

然后,指令DIV可以将此64位结果除以32位常数 (模数p),将商存储在EAX中,将余数存储在EDX中. 请参阅最后一个链接的第10.2.2节.我们想要的结果就是余数.

The instruction DIV can then divide this 64-bit result by a 32-bit constant (the modulus p), storing the quotient in EAX and the remainder in EDX. See Section 10.2.2 of the last link. The result we want is that remainder.

该除法指令DIV应该有溢出的危险,应该 分子EDX:EAX中的64位乘积的商大于32位 不能满足上面的(2).

It is this division instruction DIV that entails a risk of overflow, should the 64-bit product in numerator EDX:EAX give a quotient larger than 32-bits by failing to satisfy (2) above.

我正在使用C/内联汇编中的代码段进行概念验证". 但是,最大的速度收益将取决于批量阵列的速度. 数据a,b进行处理,分摊函数调用的开销等 Python(如果是目标平台).

I'm working on a code snippet in C/inline assembly for "proof of concept". However the maximum benefit in speed will depend on batching up arrays of data a,b to process, amortizing the overhead of function calls, etc. in Python (if that is the target platform).

这篇关于以小素数为模优化乘法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆