PyMC回归有很多回归? [英] PyMC regression of many regressions?

查看:91
本文介绍了PyMC回归有很多回归?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经很长时间没有使用PyMC了,但是我对能够以多快的速度实现线性回归感到高兴(该代码应该在IPython中无需修改即可运行):

I haven't been using PyMC for long, but I was pleased at how quickly I was able to get a linear regression off the ground (this code should run without modification in IPython):

import pandas as pd
from numpy import *
import pymc

data=pd.DataFrame(rand(40))
predictors=pd.DataFrame(rand(40,5))
sigma = pymc.Uniform('sigma', 0.0, 200.0, value=20)
params= array([pymc.Normal('%s_coef' % (c), mu=0, tau=1e-3,value=0) for c in predictors.columns])

@pymc.deterministic(plot=False)
def linear_regression_model(x=predictors,beta=params):
    return dot(x,beta)

ynode = pymc.Normal('ynode', mu=linear_regression_model, tau=1.0/sigma**2,value=data,observed=True)


m = pymc.Model(concatenate((params,array([sigma,ynode]))))

%time pymc.MCMC(m).sample(10000, 5000, 5, progress_bar=True)

在此模型中,有40个受试者(观察值),每个受试者有5个协变量.该模型不会因为随机数据而收敛,但是它采样时没有错误(我的真实数据确实收敛到了准确的结果).

In this model there are 40 subjects (observations) and 5 covariates for each subject. The model won't converge because of the random data, but it samples with no error (and my real data does converge to an accurate result).

我遇到问题的模型是对此的扩展.每个主题实际上都有3个(或N个)观察值,因此我需要为这些观察值拟合一条线,然后将线的截距用作数据"或最终回归节点的输入.我认为这是一个经典的分层模型,但是如果我以错误的方式考虑它,请纠正我.

The model I am having a problem with is an extension of this. Each subject actually has 3 (or N) observations and so I need to fit a line to these observations and then use the intercept of the line as the "data" or input for the final regression node. I think this a classical hierarchical model, but correct me if I'm thinking about it in the wrong way.

我的解决方案是建立一系列40个线性回归(每个受试者一个),然后使用截距参数向量作为最终回归的数据.

My solution was to set up a series of 40 linear regressions (one for each subject) and then use the vector of intercept parameters as the data for the final regression.

#nodes for fitting 3 values for each of 40 subjects with a line
#40 subjects, 3 data points each
data=pd.DataFrame(rand(40,3))
datax=arange(3)

"""
to fit a line to each subject's data we need:
    (1) a slope and offset parameter
    (2) a stochastic node for the data
    (3) a sigma parameter for the stochastic node

Initialize them all as object arrays
"""
sigmaArr=empty((len(data.index)),dtype=object)
ynodeArr=empty((len(data.index)),dtype=object)
slopeArr=empty((len(data.index)),dtype=object)
offsetArr=empty((len(data.index)),dtype=object)

#Fill in the empty arrays
for i,ID in enumerate(data.index):
    sigmaArr[i]=pymc.Uniform('sigma_%s' % (ID) , 0.0, 200.0, value=20)
    slopeArr[i]=pymc.Normal('%s_slope' % (ID), mu=0, tau=1e-3,value=0)
    offsetArr[i]=pymc.Normal('%s_avg' % (ID), mu=0, tau=1e-3,value=data.ix[ID].mean())

    #each model fits a line to the three data points
    @pymc.deterministic(name='time_model_%s' % ID,plot=False)
    def line_model(xx=datax,slope=slopeArr[i],avg=offsetArr[i]):
        return slope*xx + avg

    ynodeArr[i]=pymc.Normal('ynode_%s' % (ID), mu = line_model, tau = 1/sigmaArr[i]**2,value=data.ix[ID],observed=True)


#nodes for final regression (there are 5 covariates in this regression)
predictors=pd.DataFrame(rand(40,5))
sigma = pymc.Uniform('sigma', 0.0, 200.0, value=20)
params= array([pymc.Normal('%s_coef' % (c), mu=0, tau=1e-3,value=0) for c in predictors.columns])


@pymc.deterministic(plot=False)
def linear_regression_model(x=predictors,beta=params):
    return dot(x,beta)

ynode = pymc.Normal('ynode', mu=linear_regression_model, tau=1.0/sigma**2,value=offsetArr)

nodes=concatenate((sigmaArr,ynodeArr,slopeArr,offsetArr,params,array([sigma, ynode])))

m = pymc.Model(nodes)

%time pymc.MCMC(m).sample(10000, 5000, 5, progress_bar=True)

此模型在采样步骤失败.错误似乎是在尝试将offsetArr强制转换为dtype = float64时,而是将其dtype = object转换为dtype = float64.正确的方法是什么?我是否需要另一个确定性节点来帮助将我的offsetArr强制转换为float64?我需要一种特殊的pymc.Container吗?感谢您的帮助!

This model fails at the sampling step. The error appears to be in trying to cast offsetArr as dtype=float64 when instead its dtype=object. What is the correct way to do this? Do I need another deterministic node to help cast my offsetArr to float64? Do I need a special kind of pymc.Container? Thanks for your help!

推荐答案

您是否尝试过使用简单列表而不是numpy数组来存储PyMC对象?

Have you tried using a simple list instead of numpy arrays to store the PyMC objects?

这篇关于PyMC回归有很多回归?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆