How to generate random numbers with predefined probability distribution?


Question

I would like to implement a function in Python (using NumPy) that takes a mathematical function (for example p(x) = e^(-x), as below) as input and generates random numbers distributed according to that function's probability distribution. I also need to plot them, so we can see the distribution.

What I actually need is a random number generator for exactly the following two mathematical functions as input, but if it could take other functions as well, why not:

1) p(x) = e^(-x)
2) g(x) = (1/sqrt(2*pi)) * e^(-(x^2)/2)

Does anyone have an idea how this can be done in Python?

Answer

For simple distributions like the ones you need, or if you have a CDF that is easy to invert in closed form, you can find plenty of samplers in NumPy, as correctly pointed out in Olivier's answer.
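As a concrete sketch of that route for the two densities in the question (assuming matplotlib for the plot): NumPy already ships samplers for both, and the exponential case also illustrates the inverse-CDF (inverse transform) idea.

import numpy as np
import matplotlib.pyplot as plt

n = 100000

# 1) p(x) = e^(-x): NumPy's built-in sampler, or the inverse-transform trick
#    CDF F(x) = 1 - e^(-x)  =>  F^-1(u) = -ln(1 - u) with u uniform on (0, 1)
exp_builtin = np.random.exponential(scale=1.0, size=n)
exp_inverse = -np.log(1.0 - np.random.uniform(size=n))

# 2) g(x) = (1/sqrt(2*pi)) * e^(-(x^2)/2): the standard normal
gauss = np.random.normal(loc=0.0, scale=1.0, size=n)

# quick visual check of the samples against the analytic density
x = np.linspace(0, 8, 200)
plt.hist(exp_inverse, bins=100, density=True, alpha=0.5, label="samples")
plt.plot(x, np.exp(-x), label="p(x) = e^(-x)")
plt.legend()
plt.show()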

For arbitrary distributions, you can use Markov chain Monte Carlo (MCMC) sampling methods.

The simplest, and maybe the easiest to understand, variant of these algorithms is Metropolis sampling.

The basic idea goes as follows:

  • start from a random point x and take a random step xnew = x + delta
  • evaluate the desired probability distribution at the starting point, p(x), and at the new one, p(xnew)
  • if the new point is more probable, i.e. p(xnew)/p(x) >= 1, accept the move
  • if the new point is less probable, randomly decide whether to accept or reject it, depending on how probable[1] the new point is
  • take a new step from this point and repeat the cycle

It can be shown, see e.g. Sokal[2], that points sampled with this method follow the acceptance probability distribution.

An extensive implementation of Monte Carlo methods in Python can be found in the PyMC3 package.
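For orientation only, a minimal sketch of what the first density, p(x) = e^(-x), might look like as a PyMC3 model; the variable names are illustrative, and exact call behavior can differ between PyMC3 releases.

import pymc3 as pm

# a hypothetical minimal model: p(x) = e^(-x) is Exponential(lam=1)
with pm.Model():
    x = pm.Exponential("x", lam=1.0)
    trace = pm.sample(2000)  # runs an MCMC sampler (NUTS by default)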

Here is a toy example, just to show the basic idea; it is not meant in any way as a reference implementation. Please refer to mature packages for any serious work.

import numpy as np

def uniform_proposal(x, delta=2.0):
    # propose a new point uniformly within +/- delta of the current one
    return np.random.uniform(x - delta, x + delta)

def metropolis_sampler(p, nsamples, proposal=uniform_proposal):
    x = 1  # start somewhere

    for i in range(nsamples):
        trial = proposal(x)  # random neighbour from the proposal distribution
        acceptance = p(trial)/p(x)

        # accept the move conditionally: always if p(trial) >= p(x),
        # otherwise with probability p(trial)/p(x)
        if np.random.uniform() < acceptance:
            x = trial

        yield x

Let's see if it works with some simple distributions.

def gaussian(x, mu, sigma):
    return 1./sigma/np.sqrt(2*np.pi)*np.exp(-((x-mu)**2)/2./sigma/sigma)

# a trimodal mixture of Gaussians
p = lambda x: gaussian(x, 1, 0.3) + gaussian(x, -1, 0.1) + gaussian(x, 3, 0.2)
samples = list(metropolis_sampler(p, 100000))

def cauchy(x, mu, gamma):
    return 1./(np.pi*gamma*(1.+((x-mu)/gamma)**2))

# a Cauchy (Lorentzian) distribution
p = lambda x: cauchy(x, -2, 0.5)
samples = list(metropolis_sampler(p, 100000))
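The question also asks for a plot; a minimal sketch, assuming matplotlib, that compares a histogram of the samples with the last (normalised) target density:

import matplotlib.pyplot as plt

samples = np.array(list(metropolis_sampler(p, 100000)))

# discard some initial samples as burn-in, then compare histogram and target
burnin = 1000
xs = np.linspace(-5, 1, 400)
plt.hist(samples[burnin:], bins=100, density=True, label="MCMC samples")
plt.plot(xs, cauchy(xs, -2, 0.5), label="target p(x)")
plt.legend()
plt.show()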

You don't really have to sample from proper probability distributions. You might just have to enforce a limited domain in which to sample your random steps[3]:

p = lambda x: np.sqrt(x)
samples = list(metropolis_sampler(p, 100000, domain=(0, 10)))

p = lambda x: (np.sin(x)/x)**2
samples = list(metropolis_sampler(p, 100000, domain=(-4*np.pi, 4*np.pi)))

There is still way too much to say about proposal distributions, convergence, correlation, efficiency, applications, the Bayesian formalism, other MCMC samplers, etc. I don't think this is the proper place, and there is plenty of material available online that is much better than what I could write here.

  1. The idea here is to favor exploration where the probability is higher, but still look at low-probability regions, as they might lead to other peaks. Fundamental is the choice of the proposal distribution, i.e. how you pick new points to explore: steps that are too small might confine you to a limited area of your distribution, while steps that are too big can lead to very inefficient exploration.
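For example, with the toy sampler above, the step size can be tuned through the proposal argument (a usage sketch, reusing the names defined earlier):

# a narrower proposal: smaller random steps, more local (and possibly slower) exploration
narrow = lambda x: uniform_proposal(x, delta=0.5)
samples = list(metropolis_sampler(p, 100000, proposal=narrow))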

  2. Physics oriented. The Bayesian formalism (Metropolis-Hastings) is preferred these days, but IMHO it is a little harder to grasp for beginners. There are plenty of tutorials available online, see e.g. this one from Duke University.

  3. The implementation is not shown, so as not to add too much confusion, but it is straightforward: you just have to wrap the trial steps at the domain edges, or make the desired function go to zero outside the domain.
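As an illustration only, since the answer deliberately leaves this out, one possible way to support the domain keyword used above is to treat p as zero outside the domain, i.e. reject any trial step that leaves it. This is a sketch under that assumption, not the author's omitted implementation.

def metropolis_sampler_bounded(p, nsamples, proposal=uniform_proposal,
                               domain=(-np.inf, np.inf)):
    # treat p(x) as zero outside [lo, hi]: any trial step that leaves
    # the domain has acceptance 0 and is rejected outright
    lo, hi = domain
    x = min(max(1.0, lo), hi)  # start somewhere inside the domain

    for i in range(nsamples):
        trial = proposal(x)
        if lo <= trial <= hi:
            acceptance = p(trial)/p(x)
            if np.random.uniform() < acceptance:
                x = trial
        yield x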
