从复数FFT转换为有限场FFT [英] Translation from Complex-FFT to Finite-Field-FFT

查看:133
本文介绍了从复数FFT转换为有限场FFT的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

下午好!

我正在尝试基于已有的朴素递归FFT实现开发NTT算法.

I am trying to develop an NTT algorithm based on the naive recursive FFT implementation I already have.

考虑以下代码(coefficients'的长度,设为m,是2的精确幂):

Consider the following code (coefficients' length, let it be m, is an exact power of two):

/// <summary>
/// Calculates the result of the recursive Number Theoretic Transform.
/// </summary>
/// <param name="coefficients"></param>
/// <returns></returns>
private static BigInteger[] Recursive_NTT_Skeleton(
    IList<BigInteger> coefficients, 
    IList<BigInteger> rootsOfUnity, 
    int step, 
    int offset)
{
    // Calculate the length of vectors at the current step of recursion.
    // -
    int n = coefficients.Count / step - offset / step;

    if (n == 1)
    {
        return new BigInteger[] { coefficients[offset] };
    }

    BigInteger[] results = new BigInteger[n];

    IList<BigInteger> resultEvens = 
        Recursive_NTT_Skeleton(coefficients, rootsOfUnity, step * 2, offset);

    IList<BigInteger> resultOdds = 
        Recursive_NTT_Skeleton(coefficients, rootsOfUnity, step * 2, offset + step);

    for (int k = 0; k < n / 2; k++)
    {
        BigInteger bfly = (rootsOfUnity[k * step] * resultOdds[k]) % NTT_MODULUS;

        results[k]          = (resultEvens[k] + bfly) % NTT_MODULUS;
        results[k + n / 2]  = (resultEvens[k] - bfly) % NTT_MODULUS;
    }

    return results;
}

它适用于复杂的FFT(用复杂的数字类型(我有我自己的)替换BigInteger).即使我适当地更改了找到统一原始根源的过程,它也无法在这里工作.

It worked for complex FFT (replace BigInteger with a complex numeric type (I had my own)). It doesn't work here even though I changed the procedure of finding the primitive roots of unity appropriately.

据说,问题是这样的:最初传递的rootsOfUnity参数按顺序仅包含m个单位的复数根的前半部分:

Supposedly, the problem is this: rootsOfUnity parameter passed originally contained only the first half of m-th complex roots of unity in this order:

omega^0 = 1, omega^1, omega^2, ..., omega^(n/2)

就足够了,因为在以下三行代码中:

It was enough, because on these three lines of code:

BigInteger bfly = (rootsOfUnity[k * step] * resultOdds[k]) % NTT_MODULUS;        

results[k]          = (resultEvens[k] + bfly) % NTT_MODULUS;
results[k + n / 2]  = (resultEvens[k] - bfly) % NTT_MODULUS;

我最初利用的事实是,在任何递归级别(对于任何ni),都是统一的-omega^(i) = omega^(i + n/2)根.

I originally made use of the fact, that at any level of recursion (for any n and i), the complex root of unity -omega^(i) = omega^(i + n/2).

但是,该属性显然在有限字段中不成立.但是有没有类似的方法可以让我仍然只计算根的前半部分?

However, that property obviously doesn't hold in finite fields. But is there any analogue of it which would allow me to still compute only the first half of the roots?

还是我应该将循环从n/2扩展到n并预先计算所有m单位的根?

Or should I extend the cycle from n/2 to n and pre-compute all the m-th roots of unity?

此代码可能还有其他问题吗?..

Maybe there are other problems with this code?..

非常感谢您!

推荐答案

我最近也想实现 NTT 来实现快速乘法,而不是 DFFT .阅读了很多令人困惑的内容,到处都是不同的字母,没有简单的解决方案,而且我的有限领域知识也很生疏,但今天我终于弄对了(经过2天的尝试,并使用 DFT 模拟)系数),这是我对 NTT 的见解:

I recently wanted to implement NTT for fast multiplication instead of DFFT too. Read a lot of confusing things, different letters everywhere and no simple solution, and also my finite fields knowledge is rusty , but today i finally got it right (after 2 days of trying and analog-ing with DFT coefficients) so here are my insights for NTT:

  1. 计算

X(i) = sum(j=0..n-1) of ( Wn^(i*j)*x(i) );

其中X[] NTT 转换后的大小nx[],其中Wn NTT 的基础.所有计算都是基于整数模运算mod p到处都没有复数.

where X[] is NTT transformed x[] of size n where Wn is the NTT basis. All computations are on integer modulo arithmetics mod p no complex numbers anywhere.

重要值


Wn = r ^ L mod p NTT 的基础
Wn = r ^ (p-1-L) mod p INTT 的基础
Rn = n ^ (p-2) mod p正在缩放 INTT ~(1/n)的乘法常数.
pp mod n == 1p>max'的质数
max对于 NTT x [i] 的最大值,对于 INTT X [i] 的最大值>
r = <1,p)
L = <1,p)并除以p-1
r,L必须组合在一起,因此r^(L*i) mod p == 1如果i=0i=n
r,L必须组合,因此r^(L*i) mod p != 1如果0 < i < n
max'是子结果最大值,并且取决于n和计算类型.对于单个(I)NTT ,它是max' = n*max,但是对于两个n大小的向量的卷积,它是max' = n*max*max等.请参见


Wn = r ^ L mod p is basis for NTT
Wn = r ^ (p-1-L) mod p is basis for INTT
Rn = n ^ (p-2) mod p is scaling multiplicative constant for INTT ~(1/n)
p is prime that p mod n == 1 and p>max'
max is max value of x[i] for NTT or X[i] for INTT
r = <1,p)
L = <1,p) and also divides p-1
r,L must be combined so r^(L*i) mod p == 1 if i=0 or i=n
r,L must be combined so r^(L*i) mod p != 1 if 0 < i < n
max' is the sub-result max value and depends on n and type of computation. For single (I)NTT it is max' = n*max but for convolution of two n sized vectors it is max' = n*max*max etc. See Implementing FFT over finite fields for more info about it.

r,L,p的功能组合对于不同的n

functional combination of r,L,p is different for different n

这很重要,您必须在每个NTT层之前重新计算或从表中选择参数(n始终是上一个递归的一半).

this is important, you have to recompute or select parameters from table before each NTT layer (n is always half of the previous recursion).

这是我的 C ++ 代码,可找到r,L,p参数(需要不包含的模块化算术,您可以将其替换为(a + b)%c,(ab)%c, (a * b)%c,...,但在这种情况下,请注意尤其是modpowmodmul的溢出)代码没有经过优化,但是有很多方法可以大大提高它的速度.质数表也相当有限,因此可以使用

Here is my C++ code that finds the r,L,p parameters (needs modular arithmetics which is not included, you can replace it with (a+b)%c,(a-b)%c,(a*b)%c,... but in that case beware of overflows especial for modpow and modmul) The code is not optimized yet there are ways to speed it up considerably. Also prime table is fairly limited so either use SoE or any other algo to obtain primes up to max' in order to work safely.

DWORD _arithmetics_primes[]=
    {
    2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,
    179,181,191,193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,277,281,283,293,307,311,313,317,331,337,347,349,353,359,367,373,379,383,389,397,401,409,
    419,421,431,433,439,443,449,457,461,463,467,479,487,491,499,503,509,521,523,541,547,557,563,569,571,577,587,593,599,601,607,613,617,619,631,641,643,647,653,659,
    661,673,677,683,691,701,709,719,727,733,739,743,751,757,761,769,773,787,797,809,811,821,823,827,829,839,853,857,859,863,877,881,883,887,907,911,919,929,937,941,
    947,953,967,971,977,983,991,997,1009,1013,1019,1021,1031,1033,1039,1049,1051,1061,1063,1069,1087,1091,1093,1097,1103,1109,1117,1123,1129,1151,
    0}; // end of table is 0, the more primes are there the bigger numbers and n can be used
// compute NTT consts W=r^L%p for n
int i,j,k,n=16;
long w,W,iW,p,r,L,l,e;
long max=81*n;  // edit1: max num for NTT for my multiplication purposses
for (e=1,j=0;e;j++)             // find prime p that p%n=1 AND p>max ... 9*9=81
    {
    p=_arithmetics_primes[j];
    if (!p) break;
    if ((p>max)&&(p%n==1))
     for (r=2;r<p;r++)  // check all r
        {
        for (l=1;l<p;l++)// all l that divide p-1
            {
            L=(p-1);
            if (L%l!=0) continue;
            L/=l;
            W=modpow(r,L,p);
            e=0;
            for (w=1,i=0;i<=n;i++,w=modmul(w,W,p))
                {
                if ((i==0)      &&(w!=1)) { e=1; break; }
                if ((i==n)      &&(w!=1)) { e=1; break; }
                if ((i>0)&&(i<n)&&(w==1)) { e=1; break; }
                }
            if (!e) break;
            }
        if (!e) break;
        }
    }
if (e) { error; }           // error no combination r,l,p for n found
 W=modpow(r,    L,p);   // Wn for NTT
iW=modpow(r,p-1-L,p);   // Wn for INTT

这是我的慢速NTT和INTT实现方式(我还没有快速实现NTT,INTT),它们均已通过Schönhage–Strassen乘法成功进行了测试.

and here is my slow NTT and INTT implementations (i havent got to fast NTT,INTT yet) they are both tested with Schönhage–Strassen multiplication successfully.

//---------------------------------------------------------------------------
void NTT(long *dst,long *src,long n,long m,long w)
    {
    long i,j,wj,wi,a,n2=n>>1;
    for (wj=1,j=0;j<n;j++)
        {
        a=0;
        for (wi=1,i=0;i<n;i++)
            {
            a=modadd(a,modmul(wi,src[i],m),m);
            wi=modmul(wi,wj,m);
            }
        dst[j]=a;
        wj=modmul(wj,w,m);
        }
    }
//---------------------------------------------------------------------------
void INTT(long *dst,long *src,long n,long m,long w)
    {
    long i,j,wi=1,wj=1,rN,a,n2=n>>1;
    rN=modpow(n,m-2,m);
    for (wj=1,j=0;j<n;j++)
        {
        a=0;
        for (wi=1,i=0;i<n;i++)
            {
            a=modadd(a,modmul(wi,src[i],m),m);
            wi=modmul(wi,wj,m);
            }
        dst[j]=modmul(a,rN,m);
        wj=modmul(wj,w,m);
        }
    }
//---------------------------------------------------------------------------


dst是目标数组
src是源数组
n是数组大小
m是模量(p)
w是基础(Wn)


dst is destination array
src is source array
n is array size
m is modulus (p)
w is basis (Wn)

希望这对某人有帮助.如果我忘记了一些东西,请写...

hope this helps to someone. If i forgot something please write ...

[edit1:快速NTT/INTT]

最后,我设法快速获得 NTT/INTT .比普通的 FFT 更加棘手:

Finally I manage to get fast NTT/INTT to work. Was little bit more tricky than normal FFT:

//---------------------------------------------------------------------------
void _NFTT(long *dst,long *src,long n,long m,long w)
    {
    if (n<=1) { if (n==1) dst[0]=src[0]; return; }
    long i,j,a0,a1,n2=n>>1,w2=modmul(w,w,m);
    // reorder even,odd
    for (i=0,j=0;i<n2;i++,j+=2) dst[i]=src[j];
    for (    j=1;i<n ;i++,j+=2) dst[i]=src[j];
    // recursion
    _NFTT(src   ,dst   ,n2,m,w2);   // even
    _NFTT(src+n2,dst+n2,n2,m,w2);   // odd
    // restore results
    for (w2=1,i=0,j=n2;i<n2;i++,j++,w2=modmul(w2,w,m))
        {
        a0=src[i];
        a1=modmul(src[j],w2,m);
        dst[i]=modadd(a0,a1,m);
        dst[j]=modsub(a0,a1,m);
        }
    }
//---------------------------------------------------------------------------
void _INFTT(long *dst,long *src,long n,long m,long w)
    {
    long i,rN;
    rN=modpow(n,m-2,m);
    _NFTT(dst,src,n,m,w);
    for (i=0;i<n;i++) dst[i]=modmul(dst[i],rN,m);
    }
//---------------------------------------------------------------------------

[edit3]

我已经优化了我的代码(比上面的代码快3倍),但是我仍然对它不满意,所以我开始对其提出新的问题.在那里,我进一步优化了代码(比上述代码快40倍),因此在相同位长的浮点上,它的速度几乎与 FFT 相同.链接到这里:

I have optimized my code (3x times faster than code above),but still i am not satisfied with it so i started new question with it. There I have optimized my code even further (about 40x times faster than code above) so its almost the same speed as FFT on floating point of the same bit size. Link to it is here:

这篇关于从复数FFT转换为有限场FFT的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆