Fast Fibonacci computation


Question


I saw a comment on Google+ a few weeks ago in which someone demonstrated a straightforward computation of Fibonacci numbers that was not based on recursion and didn't use memoization. He effectively just remembered the last 2 numbers and kept adding them. This is an O(N) algorithm, but he implemented it very cleanly. So I quickly pointed out that a quicker way is to take advantage of the fact that they can be computed as powers of the matrix [[0,1],[1,1]], which requires only O(log(N)) computation.

The problem, of course, is that this is far from optimal past a certain point. It is efficient as long as the numbers are not too large, but they grow in length at the rate of N*log(phi)/log(10), where N is the index of the Nth Fibonacci number and phi is the golden ratio ((1+sqrt(5))/2 ~ 1.6). As it turns out, log(phi)/log(10) is very close to 1/5, so the Nth Fibonacci number can be expected to have roughly N/5 digits.
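The 1/5 approximation can be sanity-checked in a couple of lines (a quick sketch, not part of the original post):

```python
import math

# F(N) ~ phi**N / sqrt(5), so F(N) has roughly N * log10(phi) decimal digits.
phi = (1 + math.sqrt(5)) / 2
print(math.log10(phi))      # ~0.20899, indeed very close to 1/5

# Sanity check against an actual Fibonacci number:
a, b = 0, 1
for _ in range(100):
    a, b = b, a + b         # a ends up as F(100)
print(len(str(a)))          # 21 digits; the estimate gives 100 * 0.20899 ~ 20.9
```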

Matrix multiplication, heck even number multiplication, gets very slow when the numbers start to have millions or billions of digits. F(100,000) took about .03 seconds to compute (in Python), while F(1,000,000) took roughly 5 seconds. This is hardly O(log(N)) growth. My estimate was that this method, without improvements, only optimizes the computation to be O((log N)^2.5) or so.

Computing a billionth Fibonacci number, at this rate, would be prohibitively slow (even though it would only have ~1,000,000,000 / 5 digits, so it easily fits within a 32-bit address space).

Does anyone know of an implementation or algorithm which would allow a faster computation? Perhaps something which would allow calculation of a trillionth Fibonacci number.

And just to be clear, I am not looking for an approximation. I am looking for the exact computation (to the last digit).

Edit 1: I am adding the Python code to show what I believe is an O((log N)^2.5) algorithm.

from operator import mul as mul
from time import clock

class TwoByTwoMatrix:
    __slots__ = "rows"

    def __init__(self, m):
        self.rows = m

    def __imul__(self, other):
        self.rows = [[sum(map(mul, my_row, oth_col)) for oth_col in zip(*other.rows)] for my_row in self.rows]
        return self

    def intpow(self, i):
        i = int(i)
        result = TwoByTwoMatrix([[long(1),long(0)],[long(0),long(1)]])
        if i <= 0:
            return result
        k = 0
        while i % 2 == 0:
            k +=1
            i >>= 1
        multiplier = TwoByTwoMatrix(self.rows)
        while i > 0:
            if i & 1:
                result *= multiplier
            multiplier *= multiplier # square it
            i >>= 1
        for j in xrange(k):
            result *= result
        return result


m = TwoByTwoMatrix([[0,1],[1,1]])

t1 = clock()
print len(str(m.intpow(100000).rows[1][1]))
t2 = clock()
print t2 - t1

t1 = clock()
print len(str(m.intpow(1000000).rows[1][1]))
t2 = clock()
print t2 - t1

Edit 2: It looks like I didn't account for the fact that len(str(...)) would make a significant contribution to the overall runtime of the test. Changing tests to

from math import log as log

t1 = clock()
print log(m.intpow(100000).rows[1][1])/log(10)
t2 = clock()
print t2 - t1

t1 = clock()
print log(m.intpow(1000000).rows[1][1])/log(10)
t2 = clock()
print t2 - t1

shortened the runtimes to .008 seconds and .31 seconds (from .03 seconds and 5 seconds when len(str(...)) was used).
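An even cheaper way to get the digit count, without converting the whole number to decimal, is `int.bit_length()` (a sketch, not from the original post; `bit_length` exists in Python 2.7+ and 3):

```python
import math

def approx_digits(x):
    # bit_length() is O(1); decimal digits ~ bits * log10(2),
    # and this estimate is accurate to within one digit.
    return int(x.bit_length() * math.log10(2)) + 1

a, b = 0, 1
for _ in range(100000):
    a, b = b, a + b          # a ends up as F(100000)
print(approx_digits(a))      # ~20899, matching len(str(a)) without the conversion cost
```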

Because M=[[0,1],[1,1]] raised to power N is [[F(N-2), F(N-1)], [F(N-1), F(N)]], the other obvious source of inefficiency was calculating (0,1) and (1,0) elements of the matrix as if they were distinct. This (and I switched to Python3, but Python2.7 times are similar):

class SymTwoByTwoMatrix():
    # elements (0,0), (0,1), (1,1) of a symmetric 2x2 matrix are a, b, c.
    # b is also the (1,0) element because the matrix is symmetric

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

    def __imul__(self, other):
        # this multiplication does work correctly because we 
        # are multiplying powers of the same symmetric matrix
        self.a, self.b, self.c = \
            self.a * other.a + self.b * other.b, \
            self.a * other.b + self.b * other.c, \
            self.b * other.b + self.c * other.c
        return self

    def intpow(self, i):
        i = int(i)
        result = SymTwoByTwoMatrix(1, 0, 1)
        if i <= 0:
            return result
        k = 0
        while i % 2 == 0:
            k +=1
            i >>= 1
        multiplier = SymTwoByTwoMatrix(self.a, self.b, self.c)
        while i > 0:
            if i & 1:
                result *= multiplier
            multiplier *= multiplier # square it
            i >>= 1
        for j in range(k):
            result *= result
        return result

calculated F(100,000) in .006 seconds, F(1,000,000) in .235 seconds, and F(10,000,000) in 9.51 seconds.

This is to be expected. It produces results 45% faster for the fastest test, and the gain should asymptotically approach phi/(1+2*phi+phi*phi) ~ 23.6%.

The (0,0) element of M^N is actually the (N-2)nd Fibonacci number:

m = SymTwoByTwoMatrix(0, 1, 1)  # the symmetric counterpart of [[0,1],[1,1]]
for i in range(15):
    x = m.intpow(i)
    print([x.a, x.b, x.c])

gives

[1, 0, 1]
[0, 1, 1]
[1, 1, 2]
[1, 2, 3]
[2, 3, 5]
[3, 5, 8]
[5, 8, 13]
[8, 13, 21]
[13, 21, 34]
[21, 34, 55]
[34, 55, 89]
[55, 89, 144]
[89, 144, 233]
[144, 233, 377]
[233, 377, 610]

I would expect that not having to calculate element (0,0) would produce an additional speed-up of 1/(1+phi+phi*phi) ~ 19%. But the lru_cache-of-F(2N)-and-F(2N-1) solution given by Eli Korvigo below actually gives a speed-up of 4 times (i.e., 75%). So, while I have not worked out a formal explanation, I am tempted to think that it caches the spans of 1's within the binary expansion of N and does the minimum number of multiplications necessary, which obviates the need to find those ranges, precompute them, and then multiply them at the right point in the expansion of N. lru_cache allows for a top-to-bottom computation of what would have been a more complicated bottom-to-top computation.
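The two percentage figures above are easy to verify numerically (just a sketch of the arithmetic):

```python
# Numeric check of the two gain estimates quoted above.
phi = (1 + 5 ** 0.5) / 2
print(phi / (1 + 2 * phi + phi * phi))   # ~0.236: asymptotic gain of the symmetric matrix
print(1 / (1 + phi + phi * phi))         # ~0.191: expected gain from skipping element (0,0)
```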

Both SymTwoByTwoMatrix and the lru_cache-of-F(2N)-and-F(2N-1) solutions take roughly 40 times longer to compute each time N grows by a factor of 10. I think that is likely due to Python's implementation of long-integer multiplication. The multiplication and addition of large numbers should be parallelizable, so a multi-threaded sub-O(N) solution should be possible, even though (as Daniel Fisher states in the comments) computing F(N) is Theta(N).

Solution

Since the Fibonacci sequence is a linear recurrence, its members can be evaluated in closed form. This involves computing a power, which can be done in O(log n) similarly to the matrix-multiplication solution, but the constant overhead should be lower. That's the fastest algorithm I know.
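One way to keep the closed form exact (an illustrative sketch, not code from the answer) is to exponentiate in the ring Z[phi], where phi**2 = phi + 1, representing elements as integer pairs a + b*phi. Since phi**n = F(n-1) + F(n)*phi, binary exponentiation yields F(n) exactly in O(log n) big-integer multiplications:

```python
def fib_ring(n):
    """Exact F(n) via binary exponentiation of phi in Z[phi], where phi**2 = phi + 1.
    Elements are pairs (a, b) meaning a + b*phi; phi**n = (F(n-1), F(n))."""
    def mul(x, y):
        a, b = x
        c, d = y
        # (a + b*phi)(c + d*phi) = ac + (ad + bc)*phi + bd*(phi + 1)
        return (a * c + b * d, a * d + b * c + b * d)
    result = (1, 0)          # phi**0 = 1
    base = (0, 1)            # phi**1 = phi
    while n > 0:
        if n & 1:
            result = mul(result, base)
        base = mul(base, base)
        n >>= 1
    return result[1]         # the coefficient of phi is F(n)

print([fib_ring(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```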

EDIT

Sorry, I missed the "exact" part. Another exact O(log(n)) alternative to matrix multiplication can be calculated as follows:

from functools import lru_cache

@lru_cache(None)
def fib(n):
    if n in (0, 1):
        return 1  # note: with this indexing, fib(n) == F(n+1) in the usual numbering
    if n & 1:  # if n is odd, it's faster than checking with modulo
        return fib((n+1)//2 - 1) * (2*fib((n+1)//2) - fib((n+1)//2 - 1))
    a, b = fib(n//2 - 1), fib(n//2)
    return a**2 + b**2

This is based on the derivation from a note by Prof. Edsger Dijkstra. The solution exploits the fact that, to calculate both F(2N) and F(2N-1), you only need to know F(N) and F(N-1). Nevertheless, you are still dealing with long-number arithmetic, though the overhead should be smaller than that of the matrix-based solution. In Python it is better to rewrite this in an imperative style due to the slow memoization and recursion, though I wrote it this way for clarity of the functional formulation.
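For reference, an imperative fast-doubling rewrite might look like this (a sketch using the standard indexing F(0)=0, F(1)=1, i.e. shifted by one relative to the answer's fib):

```python
def fib_fast_doubling(n):
    """Iterative fast doubling: maintains the pair (F(k), F(k+1)) while
    consuming the bits of n from most significant to least significant."""
    a, b = 0, 1                       # (F(0), F(1))
    for bit in bin(n)[2:]:            # bits of n, MSB first
        # doubling step: k -> 2k
        c = a * (2 * b - a)           # F(2k)   = F(k) * (2*F(k+1) - F(k))
        d = a * a + b * b             # F(2k+1) = F(k)**2 + F(k+1)**2
        if bit == '1':
            a, b = d, c + d           # advance 2k -> 2k+1
        else:
            a, b = c, d
    return a

print(fib_fast_doubling(10))  # 55
```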
