How to write a custom Deterministic or Stochastic in pymc3 with theano.op?

Question

I'm doing some pymc3 and I would like to create custom Stochastics; however, there doesn't seem to be a lot of documentation about how it's done. I know how to use the as_op way, but apparently that makes it impossible to use the NUTS sampler, in which case I don't see the advantage of pymc3 over pymc.
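
(For reference, the as_op route I am referring to looks roughly like this; the wrapped function below is just a placeholder, not my actual forward model.)

import numpy as np
import theano.tensor as tt
from theano.compile.ops import as_op

# A deterministic transformation wrapped as a Theano Op via the as_op
# decorator. Since no gradient is defined for it, gradient-based samplers
# such as NUTS cannot be used on anything that depends on it.
@as_op(itypes=[tt.dvector], otypes=[tt.dvector])
def forward_model(x):
    # placeholder computation standing in for the real forward model
    return np.asarray(2.0 * x, dtype='float64')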

The tutorial mentions that it can be done by inheriting from theano.Op. But can anyone show me how that would work (I'm still getting started with theano)? I have two Stochastics that I want to define.

The first one should be easier: it's an N-dimensional vector F that has only constant parent variables:

with myModel:
    F = DensityDist('F', lambda value: pymc.skew_normal_like(value, F_mu_array, F_std_array, F_a_array), shape = N)

I want a skew normal distribution, which doesn't seem to be implemented in pymc3 yet, so I just imported the pymc2 version. Unfortunately, F_mu_array, F_std_array, F_a_array and F are all N-dimensional vectors, and the lambda approach doesn't seem to work with an N-dimensional value.

Firstly, is there a way to make the lambda take an N-dimensional array as input? If not, I guess I would need to define the Stochastic F directly, and this is where I presume I need theano.Op to make it work.

The second example is a more complicated function of other Stochastics. Here is how I want to define it (incorrectly, at the moment):

with myModel:
    ln2_var = Uniform('ln2_var', lower=-10, upper=4)
    sigma = Deterministic('sigma', exp(0.5*ln2_var))        
    A = Uniform('A', lower=-10, upper=10, shape=5)
    C = Uniform('C', lower=0.0, upper=2.0, shape=5)
    sw = Normal('sw', mu=5.5, sd=0.5, shape=5)

    # F from before
    F = DensityDist('F', lambda value: skew_normal_like(value, F_mu_array, F_std_array, F_a_array), shape = N)
    M = Normal('M', mu=M_obs_array, sd=M_stdev, shape=N)

    #   Radius forward-model (THIS IS THE STOCHASTIC IN QUESTION)
    R = Normal('R', mu = R_forward(F, M, A, C, sw, N), sd=sigma, shape=N)

Where the function R_forward(F,M,A,C,sw,N) is naively defined as:

from theano.tensor import lt, le, eq, gt, ge, log10

def R_forward(Flux, Mass, A, C, sw, num):
    for i in range(num):
        if lt(Mass[i], 0.2):
            if lt(Flux[i], sw[0]):
                muR = C[0]
            else:
                muR = A[0]*log10(Flux[i]) + C[0] - A[0]*log10(sw[0])
        elif (le(0.2, Mass[i]) or le(Mass[i], 0.5)):
            if lt(Flux[i], sw[1]):
                muR = C[1]
            else:
                muR = A[1]*log10(Flux[i]) + C[1] - A[1]*log10(sw[1])
        elif (le(0.5, Mass[i]) or le(Mass[i], 1.5)):
            if lt(Flux[i], sw[2]):
                muR = C[2]
            else:
                muR = A[2]*log10(Flux[i]) + C[2] - A[2]*log10(sw[2])
        elif (le(1.5, Mass[i]) or le(Mass[i], 3.5)):
            if lt(Flux[i], sw[3]):
                muR = C[3]
            else:
                muR = A[3]*log10(Flux[i]) + C[3] - A[3]*log10(sw[3])
        else:
            if lt(Flux[i], sw[4]):
                muR = C[4]
            else:
                muR = A[4]*log10(Flux[i]) + C[4] - A[4]*log10(sw[4])
    return muR

This presumably won't work, of course. I can see how I would use as_op, but I want to preserve the NUTS sampling.

Answer

I realize this is a bit late now, but I thought I'd answer the question (rather vaguely) anyway.

If you want to define a stochastic function (e.g. a probability distribution), then you need to do a couple of things:

First, define a subclass of either Discrete (pymc3.distributions.Discrete) or Continuous, which has at least the method logp, returning the log-likelihood of your stochastic. If you define this as a simple symbolic expression (x+1), I believe you do not need to take care of any gradients (but don't quote me on this; see the documentation about it). I'll get to more complicated cases below. In the unfortunate case that you need to do anything more complex, as in your second example (pymc3 now has a skew normal distribution implemented, by the way), you need to define the operations required for it (used in the logp method) as a Theano Op. If you need no derivatives, then as_op does the job, but as you said, gradients are kind of the whole idea of pymc3.
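
As a minimal sketch of that first step (the distribution and its log-density below are invented for illustration; the SkewNormal line just shows the built-in distribution mentioned above, with parameter names as in the pymc3 docs):

import theano.tensor as tt
import pymc3 as pm

# A Continuous subclass only needs a logp that returns a symbolic
# log-likelihood of `value`; here a toy exponential with rate lam.
class MyExponential(pm.Continuous):
    def __init__(self, lam, *args, **kwargs):
        super(MyExponential, self).__init__(*args, **kwargs)
        self.lam = lam

    def logp(self, value):
        lam = self.lam
        # log p(value) = log(lam) - lam*value for value >= 0 (bound check omitted)
        return tt.log(lam) - lam * value

with pm.Model():
    x = MyExponential('x', lam=1.0, testval=1.0)
    # The built-in skew normal can replace the pymc2 import from the question:
    F = pm.SkewNormal('F', mu=0.0, sd=1.0, alpha=2.0, shape=3)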

This is where it gets complicated. If you want to use NUTS (or need gradients for whatever reason), then you need to implement the operation used in logp as a subclass of theano.gof.Op. Your new Op class (let's just call it Op from now on) will need at least two or three methods. The first defines the inputs and outputs of the Op (check the Op documentation). The perform() method (or a variant you might choose) is the one that does the operation you want (your R_forward function, for example). This can be done in pure Python, if you so wish. The third method, grad(), is where you define the gradient of perform()'s output with respect to the inputs. The actual output of grad() is a bit different, but not a big deal.

And it is in grad() that using Theano pays off. If you define your entire perform() in Theano, then you may well be able to use automatic differentiation (theano.tensor.grad or theano.tensor.jacobian) to do the work for you (see the example below). However, this is not necessarily going to be easy.
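
To see what that automatic differentiation looks like in isolation (a toy example, unrelated to the Op further down):

import theano
import theano.tensor as tt

# Build the computation symbolically and ask Theano for the gradient.
x = tt.dscalar('x')
y = x ** 2 + tt.exp(x)
dy_dx = tt.grad(y, x)               # symbolically: 2*x + exp(x)
f = theano.function([x], [y, dy_dx])
print(f(1.0))                       # approximately [3.718, 4.718]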

In your second example, it would mean implementing your R_forward function in Theano, which could be complicated.
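
One possible way to do that, sketched here purely for illustration (this is not the question's code; it uses theano.tensor.switch in place of Python if/else so the result stays symbolic and differentiable):

import theano.tensor as tt

def R_forward_theano(Flux, Mass, A, C, sw):
    # Sketch of a fully symbolic R_forward. The mass bins mirror the
    # question's code, but this is not a drop-in replacement.
    def mu_for_bin(k):
        # C[k] below the break point sw[k], a log-linear relation above it
        above = A[k] * tt.log10(Flux) + C[k] - A[k] * tt.log10(sw[k])
        return tt.switch(tt.lt(Flux, sw[k]), C[k], above)

    muR = mu_for_bin(4)                                   # default: Mass >= 3.5
    muR = tt.switch(tt.lt(Mass, 3.5), mu_for_bin(3), muR)
    muR = tt.switch(tt.lt(Mass, 1.5), mu_for_bin(2), muR)
    muR = tt.switch(tt.lt(Mass, 0.5), mu_for_bin(1), muR)
    muR = tt.switch(tt.lt(Mass, 0.2), mu_for_bin(0), muR)
    return muR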

Here I include a somewhat minimal example of an Op that I created while learning to do these things.

import numpy as np
import theano as th


def my_th_fun():
    """ Some needed auxiliary functions.
    """
    X = th.tensor.vector('X')
    SCALE = th.tensor.scalar('SCALE')

    X.tag.test_value = np.array([1,2,3,4])
    SCALE.tag.test_value = 5.

    Scale, upd_sm_X = th.scan(lambda x, scale: scale*(scale+ x),
                               sequences=[X],
                               outputs_info=[SCALE])
    fun_Scale = th.function(inputs=[X, SCALE], outputs=Scale)
    D_out_d_scale = th.tensor.grad(Scale[-1], SCALE)
    fun_d_out_d_scale = th.function([X, SCALE], D_out_d_scale)
    return Scale, fun_Scale, D_out_d_scale, fun_d_out_d_scale

class myOp(th.gof.Op):
    """ Op subclass with a somewhat silly computation. It uses
    th.scan and th.tensor.grad is used to calculate the gradient
    automagically in the grad() method.
    """
    __props__ = ()
    itypes = [th.tensor.dscalar]
    otypes = [th.tensor.dvector]
    def __init__(self, *args, **kwargs):
        super(myOp, self).__init__(*args, **kwargs)
        self.base_dist = np.arange(1,5)
        (self.UPD_scale, self.fun_scale, 
         self.D_out_d_scale, self.fun_d_out_d_scale)= my_th_fun()

    def perform(self, node, inputs, outputs):
        scale = inputs[0]
        updated_scale = self.fun_scale(self.base_dist, scale)
        out1 = self.base_dist[0:2].sum()
        out2 = self.base_dist[2:4].sum()
        maxout = np.max([out1, out2])
        exp_out1 = np.exp(updated_scale[-1]*(out1-maxout))
        exp_out2 = np.exp(updated_scale[-1]*(out2-maxout))
        norm_const = exp_out1 + exp_out2
        outputs[0][0] = np.array([exp_out1/norm_const, exp_out2/norm_const])

    def grad(self, inputs, output_gradients): #working!
        """ Calculates the gradient of the output of the Op wrt
        to the input. As a simple example, the input is scalar.

        Notice how the output is actually the gradient multiplied
        by the output_gradients, which is an input provided by
        theano when calculating gradients.
        """
        scale = inputs[0]
        X = th.tensor.as_tensor(self.base_dist)
        # Do I need to recalculate all this or can I assume that perform() has
        # always been called before grad() and thus can take it from there?
        # In any case, this is a small enough example to recalculate quickly:
        all_scale, _ = th.scan(lambda x, scale_1: scale_1*(scale_1+ x),
                               sequences=[X],
                               outputs_info=[scale])
        updated_scale = all_scale[-1]

        out1 = self.base_dist[0:2].sum()   # same slices as in perform()
        out2 = self.base_dist[2:4].sum()
        maxout = np.max([out1, out2])

        exp_out1 = th.tensor.exp(updated_scale*(out1 - maxout))
        exp_out2 = th.tensor.exp(updated_scale*(out2 - maxout))
        norm_const = exp_out1 + exp_out2

        d_S_d_scale = th.tensor.grad(all_scale[-1], scale)
        Jac1 = (-(out1-out2)*d_S_d_scale*
                th.tensor.exp(updated_scale*(out1+out2 - 2*maxout))/(norm_const**2))
        Jac2 = -Jac1
        return Jac1*output_gradients[0][0]+ Jac2*output_gradients[0][1],

This Op can then be used inside the logp() method of a stochastic in pymc3:

import pymc3 as pm

class myDist(pm.distributions.Discrete):
    def __init__(self, invT, *args, **kwargs):
        super(myDist, self).__init__(*args, **kwargs)
        self.invT = invT
        self.myOp = myOp()
    def logp(self, value):
        return self.myOp(self.invT)[value]
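
For example, it could be dropped into a model roughly like this (the prior on invT and the testval are placeholders, not part of the example above):

# Hypothetical usage sketch of myDist inside a pymc3 model
with pm.Model() as model:
    invT = pm.Gamma('invT', alpha=1.0, beta=1.0)
    state = myDist('state', invT=invT, testval=0)
    trace = pm.sample(1000)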

I hope it helps any (hopeless) pymc3/theano newbie out there.
