How to sample from a custom distribution when parameters are known?


Problem description



The goal is to get samples from a distribution whose parameters are known.

For example, the self-defined distribution is p(X|theta), where theta is a K-dimensional parameter vector and X is an N-dimensional random vector.

Now we know that (1) theta is known; (2) p(X|theta) is NOT known, but I know p(X|theta) ∝ f(X, theta), where f is a known function.

Can pymc3 do such sampling from p(X|theta), and how?

The purpose is not to sample from the posterior distribution of the parameters, but to sample from a self-defined distribution.

Starting from a simple example of sampling from a Bernoulli distribution, I did the following:

import pymc3 as pm
import numpy as np
import scipy.stats as stats
import pandas as pd
import theano.tensor as tt

with pm.Model() as model1:
    p=0.3
    density = pm.DensityDist('density',
                             lambda x1: tt.switch( x1, tt.log(p), tt.log(1 - p) ),
                             ) #tt.switch( x1, tt.log(p), tt.log(1 - p) ) is the log likelihood from pymc3 source code

with model1:
    step = pm.Metropolis()
    samples = pm.sample(1000, step=step)

I expected the result to be 1000 binary digits, with the proportion of 1s around 0.3. However, I got strange results, with very large numbers occurring in the output.

I know something is wrong. Please help with how to correctly write PyMC3 code for such non-posterior MCMC sampling problems.

Solution

Prior predictive sampling (for which you should be using pm.sample_prior_predictive()) involves only using the RNGs provided by the RandomVariable objects in your compute graph. By default, DensityDist does not implement an RNG, but it does provide the random parameter for this purpose, so you'll need to use that. The log-likelihood is only evaluated with respect to observables, so it plays no role here.

A simple way to generate a valid RNG for an arbitrary distribution is to use inverse transform sampling. In this case, one samples a uniform distribution on the unit interval and then transforms it through the inverse CDF of the desired distribution. For the Bernoulli case, the inverse CDF partitions the unit line based on the probability of success, assigning 0 to one part and 1 to the other.
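The idea can be illustrated in plain NumPy, independent of PyMC3 (a minimal sketch; the key point is that the Bernoulli inverse CDF reduces to a threshold test on a uniform draw):

```python
import numpy as np

# Inverse transform sampling for Bernoulli(p):
# draw q ~ Uniform(0, 1), then map it through the inverse CDF,
# which for a Bernoulli variable is simply the indicator q < p.
rng = np.random.default_rng(2019)
p = 0.3
q = rng.uniform(size=100_000)
samples = np.uint8(q < p)  # 1 with probability p, 0 otherwise
print(samples.mean())      # close to 0.3
```

With enough draws, the sample mean converges to p, which is exactly the check used on the PyMC3 output below.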

Here is a factory-like implementation that creates a Bernoulli RNG compatible with pm.DensityDist's random parameter (i.e., accepts point and size kwargs).

def get_bernoulli_rng(p=0.5):

    def _rng(point=None, size=1):
        # Bernoulli inverse CDF, given p (prob of success)
        _icdf = lambda q: np.uint8(q < p)

        return _icdf(pm.Uniform.dist().random(point=point, size=size))

    return _rng

So, to fill out the example, it would go something like

with pm.Model() as m:
    p = 0.3
    y = pm.DensityDist('y', lambda x: tt.switch(x, tt.log(p), tt.log(1-p)),
                       random=get_bernoulli_rng(p))
    prior = pm.sample_prior_predictive(random_seed=2019)

prior['y'].mean() # 0.306

Obviously, this could equally be done with random=pm.Bernoulli.dist(p).random, but the above illustrates generically how one could do this with arbitrary distributions, given their inverse CDF, i.e., you only need to modify _icdf and the parameters.
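For instance, swapping in the inverse CDF of an exponential distribution gives a continuous RNG with the same factory structure (a sketch in plain NumPy to keep it self-contained; the rate parameter `lam` and the function name are illustrative, not part of the original answer):

```python
import numpy as np

def get_exponential_rng(lam=1.0):
    """Factory returning an inverse-transform RNG for Exponential(lam)."""
    def _rng(point=None, size=1):
        # Exponential inverse CDF: F^{-1}(q) = -log(1 - q) / lam
        _icdf = lambda q: -np.log1p(-q) / lam
        return _icdf(np.random.uniform(size=size))
    return _rng

draw = get_exponential_rng(lam=2.0)
x = draw(size=100_000)
print(x.mean())  # should be near 1 / lam = 0.5
```

Only `_icdf` and the parameters changed relative to the Bernoulli factory; the `point`/`size` signature stays compatible with DensityDist's random argument.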
