How can I efficiently calculate the binomial cumulative distribution function?

Problem description

Let's say that I know the probability of a "success" is P. I run the test N times, and I see S successes. The test is akin to tossing an unevenly weighted coin (perhaps heads is a success, tails is a failure).

I want to know the approximate probability of seeing either S successes, or a number of successes less likely than S successes.

So for example, if P is 0.3, N is 100, and I get 20 successes, I'm looking for the probability of getting 20 or fewer successes.

If, on the other hand, P is 0.3, N is 100, and I get 40 successes, I'm looking for the probability of getting 40 or more successes.
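Written out, with X the number of successes in N independent trials that each succeed with probability P, the two examples ask for the lower and upper tails of the binomial distribution:

```latex
\Pr(X \le S) = \sum_{k=0}^{S} \binom{N}{k} P^{k} (1-P)^{N-k},
\qquad
\Pr(X \ge S) = \sum_{k=S}^{N} \binom{N}{k} P^{k} (1-P)^{N-k} = 1 - \Pr(X \le S-1).
```

For P = 0.3 and N = 100 the mean is NP = 30, so S = 20 falls in the lower tail and S = 40 in the upper tail.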

I'm aware that this problem relates to finding the area under a binomial curve, however:

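  1. My math-fu is not up to the task of translating this knowledge into efficient code.
  2. While I understand that a binomial curve would give an exact result, I get the impression that it would be inherently inefficient. A fast method to calculate an approximate result would suffice.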
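An exact result is not necessarily slow. A minimal sketch, assuming 0 < p < 1 and 0 <= s <= n: consecutive binomial terms satisfy pmf(k+1) = pmf(k) * (n - k)/(k + 1) * p/(1 - p), so the sum up to s costs O(s) multiplications and needs no factorials. The name `binom_cdf_recurrence` is hypothetical. For very large n the first term (1 - p)**n underflows to 0.0 in 64-bit floats, which would make this version return a wrong answer; a log-space formulation avoids that.

```python
def binom_cdf_recurrence(s, p, n):
    """P(X <= s) for X ~ Binomial(n, p), building each term from the previous one.

    Assumes 0 < p < 1 and 0 <= s <= n. The starting term (1 - p)**n may
    underflow to 0.0 when n is very large.
    """
    term = (1.0 - p) ** n          # pmf(0)
    total = term
    ratio = p / (1.0 - p)
    for k in range(s):             # produces pmf(1) .. pmf(s)
        term *= (n - k) / (k + 1) * ratio
        total += term
    return min(total, 1.0)         # guard against rounding slightly above 1
```

Calling `binom_cdf_recurrence(20, 0.3, 100)` should closely match the exact value quoted in the answer below.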

I should stress that this computation has to be fast, and should ideally be determinable with standard 64- or 128-bit floating-point computation.
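With plain 64-bit floats, individual binomial terms can underflow for large N: (1 - P)**N with N = 100000 and P = 0.3 is far below the smallest positive double. A sketch under that assumption: compute the logarithm of each term with `math.lgamma` and combine the terms with the log-sum-exp trick. The function names are hypothetical, and for very large N the subtraction of large `lgamma` values costs a few digits of precision, which is usually acceptable for an approximate answer.

```python
import math

def binom_logpmf(k, p, n):
    """log P(X = k) for X ~ Binomial(n, p); assumes 0 < p < 1 and 0 <= k <= n."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log1p(-p)

def binom_cdf_log(s, p, n):
    """P(X <= s), summing the terms in log space so none underflow prematurely."""
    logs = [binom_logpmf(k, p, n) for k in range(s + 1)]
    m = max(logs)
    # log-sum-exp: factor out the largest term before exponentiating the rest
    return min(1.0, math.exp(m) * math.fsum(math.exp(x - m) for x in logs))
```

Because only the largest term is exponentiated at full scale, `binom_cdf_log(30000, 0.3, 100000)` returns a value near 0.5 instead of collapsing to zero.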

I'm looking for a function that takes P, S, and N and returns a probability. As I'm more familiar with code than mathematical notation, I'd prefer that any answers employ pseudo-code or code.
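A minimal sketch of a function with that signature, assuming "less likely than S" means the tail on S's side of the mean N*P (which matches both examples above). The name `binomial_tail` is hypothetical. It sums exact terms with `math.comb` (Python 3.8+), so it suits moderate N only: converting `comb(n, k)` to a float overflows once n is roughly above 1000.

```python
from math import comb

def binomial_tail(p, s, n):
    """Probability of s successes or a result further from the mean on s's side.

    Returns P(X <= s) when s is at or below the mean n*p, else P(X >= s),
    for X ~ Binomial(n, p). Exact summation; intended for moderate n.
    """
    if s <= n * p:
        ks = range(0, s + 1)       # lower tail
    else:
        ks = range(s, n + 1)       # upper tail
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in ks)
```

With the question's numbers, `binomial_tail(0.3, 20, 100)` gives the lower tail and `binomial_tail(0.3, 40, 100)` the upper tail.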

Recommended answer

Exact binomial distribution

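from functools import reduce

def factorial(n):
    if n < 2: return 1
    return reduce(lambda x, y: x*y, range(2, int(n)+1))

def prob(s, p, n):
    # Returns P(X <= s) for X ~ Binomial(n, p) by summing exact terms.
    x = 1.0 - p    # probability of a failure

    a = n - s      # at most s successes means at least n - s failures
    b = s + 1

    c = a + b - 1  # equals n, the number of trials

    total = 0.0

    # Sum over j failures, j = n-s .. n (i.e. 0 .. s successes).
    for j in range(a, c + 1):
        # Integer division keeps the binomial coefficient exact.
        total += factorial(c) // (factorial(j)*factorial(c-j)) \
                 * x**j * (1 - x)**(c-j)

    return total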

>>> prob(20, 0.3, 100)
0.016462853241869437

>>> 1-prob(40-1, 0.3, 100)
0.020988576003924564
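The call `1-prob(40-1, 0.3, 100)` uses P(X >= S) = 1 - P(X <= S - 1). When the upper tail is very small, that subtraction of two nearly equal numbers loses most of its relative precision. A sketch, assuming 0 < p < 1 and 0 <= s <= n, that sums the upper tail directly: the first term pmf(s) comes from `math.lgamma`, and the rest follow from the same term-to-term ratio used for the lower tail. The name `binom_sf` is hypothetical.

```python
import math

def binom_sf(s, p, n):
    """P(X >= s) for X ~ Binomial(n, p), summed directly over k = s .. n.

    Avoids computing 1 - P(X <= s - 1), which loses relative precision
    when the upper tail is tiny. Assumes 0 < p < 1 and 0 <= s <= n.
    """
    # Starting term pmf(s), computed in log space.
    log_term = (math.lgamma(n + 1) - math.lgamma(s + 1) - math.lgamma(n - s + 1)
                + s * math.log(p) + (n - s) * math.log1p(-p))
    term = math.exp(log_term)
    total = term
    ratio = p / (1.0 - p)
    for k in range(s, n):          # produces pmf(s+1) .. pmf(n)
        term *= (n - k) / (k + 1) * ratio
        total += term
    return total
```

For example, `binom_sf(90, 0.3, 100)` is a tiny but nonzero probability, whereas 1 minus a CDF that rounds to 1.0 would return exactly 0.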

Normal estimate, good for large n

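import math

def erf(z):
    # Rational approximation to erfc(|z|) from Numerical Recipes ("erfcc"),
    # fractional error below 1.2e-7; erf(|z|) = 1 - erfc(|z|).
    # Python's built-in math.erf (2.7+/3.2+) is an alternative.
    t = 1.0 / (1.0 + 0.5 * abs(z))
    # use Horner's method
    ans = 1 - t * math.exp(-z*z - 1.26551223 +
                           t * ( 1.00002368 +
                           t * ( 0.37409196 +
                           t * ( 0.09678418 +
                           t * (-0.18628806 +
                           t * ( 0.27886807 +
                           t * (-1.13520398 +
                           t * ( 1.48851587 +
                           t * (-0.82215223 +
                           t * ( 0.17087277))))))))))
    # erf is odd: erf(-z) = -erf(z)
    if z >= 0.0:
        return ans
    else:
        return -ans

def normal_estimate(s, p, n):
    # Normal approximation to P(X <= s) with mean n*p and
    # standard deviation sqrt(n*p*(1-p)); no continuity correction.
    u = n * p
    o = (u * (1-p)) ** 0.5

    return 0.5 * (1 + erf((s-u)/(o*2**0.5)))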

>>> normal_estimate(20, 0.3, 100)
0.014548164531920815

>>> 1-normal_estimate(40-1, 0.3, 100)
0.024767304545069813
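Python's standard library already provides `math.erf` and `math.erfc`, so the hand-written polynomial is optional. A sketch of the same normal estimate using them; the function names are hypothetical. The upper-tail version evaluates at s - 1, like `1-normal_estimate(40-1, 0.3, 100)` above, and uses erfc directly since 1 - 0.5 * (1 + erf(z)) equals 0.5 * erfc(z), avoiding cancellation far out in the tail.

```python
import math

def normal_cdf_estimate(s, p, n):
    """Normal approximation to P(X <= s), X ~ Binomial(n, p), via math.erf."""
    mu = n * p
    sigma = math.sqrt(mu * (1 - p))
    return 0.5 * (1 + math.erf((s - mu) / (sigma * math.sqrt(2))))

def normal_sf_estimate(s, p, n):
    """Normal approximation to P(X >= s), evaluated at s - 1.

    Uses 0.5 * erfc(z) instead of 1 - cdf to avoid cancellation for far tails.
    """
    mu = n * p
    sigma = math.sqrt(mu * (1 - p))
    return 0.5 * math.erfc((s - 1 - mu) / (sigma * math.sqrt(2)))
```

Because the built-in functions are more accurate than the 1.2e-7 polynomial, results can differ from the answer's outputs in the later digits.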

Poisson estimate: good for large n and small p

import math

def poisson(s, p, n):
    # Poisson approximation to P(X <= s) with rate L = n*p.
    L = n*p

    total = 0
    for i in range(0, s+1):
        total += L**i / math.factorial(i)

    return total * math.e**(-L)

>>> poisson(20, 0.3, 100)
0.013411150012837811
>>> 1-poisson(40-1, 0.3, 100)
0.046253037645840323
