Creating a mixture of probability distributions for sampling


Question

Is there a general way to join SciPy (or NumPy) probability distributions to create a mixture probability distribution which can then be sampled from?

I have such a distribution for display using something like:

mixture_gaussian = (norm.pdf(x_axis, -3, 1) + norm.pdf(x_axis, 3, 1)) / 2

When plotted, this gives a curve with two equal peaks, at -3 and 3.

However, I can't sample from this generated model, because it is just an array of PDF values evaluated on x_axis that plots as a curve, not a distribution object.

Note that this specific distribution is just a simple example. I'd like to be able to generate several kinds of distributions (including "sub"-distributions that are not just normal distributions). Ideally, there would be some way for the function to be normalized automatically (i.e. without having to do the / 2 explicitly as in the code above).

Does SciPy/NumPy provide some way of easily accomplishing this?

This answer shows one way to sample from multiple distributions, but it requires some handcrafting for each mixture, especially when the "sub"-distributions should be weighted differently. It is usable, but I would hope for a method that is cleaner and more straightforward, if possible. Thanks!

Solution

Sampling from a mixture of distributions (where the PDFs are added with coefficients c_1, c_2, ..., c_n that sum to 1) is equivalent to sampling from each distribution independently and then, for each index, picking the value from the k-th sample with probability c_k.
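As a small illustration of the weighted sum of PDFs described above, here is a minimal sketch that builds the mixture density from SciPy frozen distributions and rescales the weights itself, so no explicit / 2 is needed. The helper name mixture_pdf and its arguments are hypothetical, not part of SciPy or NumPy.

```python
import numpy as np
from scipy import stats


def mixture_pdf(x, components, weights):
    """Weighted sum of component PDFs (hypothetical helper, not a SciPy API).

    The weights are rescaled to sum to 1, so the result is a valid density.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    x = np.asarray(x, dtype=float)
    return sum(w_k * comp.pdf(x) for w_k, comp in zip(w, components))


# The two-Gaussian example from the question, with unnormalized weights.
components = [stats.norm(-3, 1), stats.norm(3, 1)]
x_axis = np.linspace(-10, 10, 2001)
density = mixture_pdf(x_axis, components, weights=[1, 1])

# Riemann-sum check that the density integrates to roughly 1.
area = np.sum(density) * (x_axis[1] - x_axis[0])
```

Any object with a pdf method, such as other scipy.stats frozen distributions, should work as a component here. Note that this only evaluates the density; it does not draw samples.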

The mixing step can be done efficiently with numpy.random.choice. Here is an example where three distributions are mixed. The distributions are listed in distributions, and their coefficients in coefficients. There is a wide normal distribution, a uniform distribution, and a narrow normal distribution, with coefficients 0.5, 0.2, and 0.3 respectively. The mixing happens at data[np.arange(sample_size), random_idx], after random_idx has been generated according to the given coefficients.

import numpy as np
import matplotlib.pyplot as plt

# Each entry pairs a NumPy sampling function with its keyword arguments.
distributions = [
    {"type": np.random.normal, "kwargs": {"loc": -3, "scale": 2}},
    {"type": np.random.uniform, "kwargs": {"low": 4, "high": 6}},
    {"type": np.random.normal, "kwargs": {"loc": 2, "scale": 1}},
]
coefficients = np.array([0.5, 0.2, 0.3])
coefficients /= coefficients.sum()      # in case these did not add up to 1
sample_size = 100000

# Draw a full sample from every distribution, one column per distribution.
num_distr = len(distributions)
data = np.zeros((sample_size, num_distr))
for idx, distr in enumerate(distributions):
    data[:, idx] = distr["type"](size=(sample_size,), **distr["kwargs"])

# For each row, choose which column to keep, with probabilities given by coefficients.
random_idx = np.random.choice(np.arange(num_distr), size=(sample_size,), p=coefficients)
sample = data[np.arange(sample_size), random_idx]

# Histogram of the mixed sample, normalized to a density.
plt.hist(sample, bins=100, density=True)
plt.show()
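The same idea can be wrapped in a reusable function. The sketch below is one possible variant, not the method from the answer itself: it assumes the components are SciPy frozen distributions, picks a component label for every draw first, and then samples only as many values from each component as were selected. The name sample_mixture and its parameters are hypothetical.

```python
import numpy as np
from scipy import stats


def sample_mixture(components, weights, size, rng=None):
    """Draw `size` values from a mixture (hypothetical helper, not a SciPy API)."""
    rng = np.random.default_rng(rng)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize, in case the weights do not add up to 1

    # Decide which component produces each draw.
    labels = rng.choice(len(components), size=size, p=w)

    sample = np.empty(size)
    for k, comp in enumerate(components):
        mask = labels == k
        # Sample only as many values as this component was selected for.
        sample[mask] = comp.rvs(size=int(mask.sum()), random_state=rng)
    return sample


# Same three components as above; stats.uniform(loc=4, scale=2) covers [4, 6].
components = [
    stats.norm(loc=-3, scale=2),
    stats.uniform(loc=4, scale=2),
    stats.norm(loc=2, scale=1),
]
sample = sample_mixture(components, [0.5, 0.2, 0.3], size=100_000, rng=0)
```

Compared with drawing a full column per distribution, this avoids generating values that are later discarded, at the cost of a boolean mask per component. The result can be plotted with plt.hist(sample, bins=100, density=True) as before.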
