生成二项式系数的最快方法 [英] Fastest way to generate binomial coefficients

查看:16
本文介绍了生成二项式系数的最快方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要计算一个数字的组合.

I need to calculate combinations for a number.

计算 nCp 其中 n>>p 的最快方法是什么?

What is the fastest way to calculate nCp where n>>p?

我需要一种快速的方法来为多项式方程生成二项式系数,我需要获取所有项的系数并将其存储在数组中.

I need a fast way to generate binomial coefficients for an polynomial equation and I need to get the coefficient of all the terms and store it in an array.

(a+b)^n = a^n + nC1 a^(n-1) * b + nC2 a^(n-2) * ........+nC(n-1) a * b^(n-1) + b^n

(a+b)^n = a^n + nC1 a^(n-1) * b + nC2 a^(n-2) * ............ +nC(n-1) a * b^(n-1) + b^n

计算 nCp 的最有效方法是什么??

What is the most efficient way to calculate nCp ??

推荐答案

如果您想对较大的 n 值进行完全扩展,FFT 卷积可能是最快的方法.在具有相等系数(例如,一系列公平的抛硬币)和偶数阶(例如,抛硬币的次数)的二项式展开的情况下,您可以这样利用对称性:

If you want complete expansions for large values of n, FFT convolution might be the fastest way. In the case of a binomial expansion with equal coefficients (e.g. a series of fair coin tosses) and an even order (e.g. number of tosses) you can exploit symmetries thus:

理论

用表达式 A + A*cos(Pi*n/N) 表示两次抛硬币的结果(例如正面和反面总数之差的一半).N 是缓冲区中的样本数 - 偶数阶 O 的二项式展开将具有 O+1 系数并且需要 N >= O/2 + 1 个样本的缓冲区 - n 是正在生成的样本数,而 A 是比例因子通常为 2(用于生成二项式系数)或 0.5(用于生成二项式概率分布).

Represent the results of two coin tosses (e.g. half the difference between the total number of heads and tails) with the expression A + A*cos(Pi*n/N). N is the number of samples in your buffer - a binomial expansion of even order O will have O+1 coefficients and require a buffer of N >= O/2 + 1 samples - n is the sample number being generated, and A is a scale factor that will usually be either 2 (for generating binomial coefficients) or 0.5 (for generating a binomial probability distribution).

请注意,在频率上,此表达式类似于这两次抛硬币的二项式分布 - 在对应于数字 (正面-反面)/2 的位置有三个对称尖峰.由于对独立事件的整体概率分布建模需要对它们的分布进行卷积,因此我们希望在频域中对表达式进行卷积,这相当于时域中的乘法.

Notice that, in frequency, this expression resembles the binomial distribution of those two coin tosses - there are three symmetrical spikes at positions corresponding to the number (heads-tails)/2. Since modelling the overall probability distribution of independent events requires convolving their distributions, we want to convolve our expression in the frequency domain, which is equivalent to multiplication in the time domain.

换句话说,通过将两次抛掷结果的余弦表达式提高到一个幂(例如,模拟 500 次抛掷,将其提高到 250 次幂,因为它已经代表了一对),我们可以安排二项式分布大量出现在频域中.既然这都是真实的,我们可以用DCT-I代替DFT来提高效率.

In other words, by raising our cosine expression for the result of two tosses to a power (e.g. to simulate 500 tosses, raise it to the power of 250 since it already represents a pair), we can arrange for the binomial distribution for a large number to appear in the frequency domain. Since this is all real and even, we can substitute the DCT-I for the DFT to improve efficiency.

算法

  1. 确定缓冲区大小 N,即至少为 O/2 + 1 并且可以方便地进行 DCT 处理
  2. 用表达式 pow(A + A*cos(Pi*n/N),O/2) 初始化它
  3. 应用正向 DCT-I
  4. 从缓冲区中读出系数 - 第一个数字是中心峰,其中正面=尾部,随后的条目对应于离中心更远的对称对

准确性

在累积的浮点舍入误差夺走了系数的准确整数值之前,O 的高度是有限制的,但我猜这个数字相当高.双精度浮点数可以完全准确地表示 53 位整数,我将忽略使用 pow() 涉及的舍入损失,因为生成表达式将发生在 FP 寄存器中,给我们额外的 11尾数位以吸收 Intel 平台上的舍入误差.因此,假设我们使用通过 FFT 实现的 1024 点 DCT-I,这意味着在转换过程中会因舍入误差而损失 10 位的精度,而没有其他太多,只剩下约 43 位的干净表示.我不知道什么顺序的二项式展开会产生这种大小的系数,但我敢说它足以满足您的需要.

There's a limit to how high O can be before accumulated floating-point rounding errors rob you of accurate integer values for the coefficients, but I'd guess the number is pretty high. Double-precision floating-point can represent 53-bit integers with complete accuracy, and I'm going to ignore the rounding loss involved in the use of pow() because the generating expression will take place in FP registers, giving us an extra 11 bits of mantissa to absorb the rounding error on Intel platforms. So assuming we use a 1024-point DCT-I implemented via the FFT, that means losing 10 bits' accuracy to rounding error during the transform and not much else, leaving us with ~43 bits of clean representation. I don't know what order of binomial expansion generates coefficients of that size, but I dare say it's big enough for your needs.

非对称扩展

如果您想要 a 和 b 的不等系数的不对称展开,您需要使用双边(复数)DFT 和复数 pow() 函数.生成表达式 A*A*e^(-Pi*i*n/N) + A*B + B*B*e^(+Pi*i*n/N) [使用复杂的 pow() 函数提出它的扩展顺序的一半]和DFT它.同样,缓冲区中的中心点(如果 A 和 B 非常不同,则不是最大值)在偏移量为零处,然后是分布的上半部分.缓冲区的上半部分将包含分布的下半部分,对应于为负的正面负尾值.

If you want the asymmetrical expansions for unequal coefficients of a and b, you'll need to use a two-sided (complex) DFT and a complex pow() function. Generate the expression A*A*e^(-Pi*i*n/N) + A*B + B*B*e^(+Pi*i*n/N) [using the complex pow() function to raise it to the power of half the expansion order] and DFT it. What you have in the buffer is, again, the central point (but not the maximum if A and B are very different) at offset zero, and it is followed by the upper half of the distribution. The upper half of the buffer will contain the lower half of the distribution, corresponding to heads-minus-tails values that are negative.

请注意,源数据是 Hermitian 对称的(输入缓冲区的后半部分是第一个的复共轭),因此该算法不是最佳的,可以使用所需的一半的复数到复数 FFT 来执行尺寸以获得最佳效率.

Notice that the source data is Hermitian symmetrical (the second half of the input buffer is the complex conjugate of the first), so this algorithm is not optimal and can be performed using a complex-to-complex FFT of half the required size for optimum efficiency.

不用说,与上述对称分布的纯真实算法相比,所有复杂的求幂都会消耗更多的 CPU 时间并损害准确性.

Needless to say, all the complex exponentiation will chew more CPU time and hurt accuracy compared to the purely real algorithm for symmetrical distributions above.

这篇关于生成二项式系数的最快方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆