Fit a Bayesian linear regression and predict unobservable values


Problem description


I'd like to use JAGS plus R to fit a linear model with observable quantities and make inference about unobservable ones. I found lots of examples on the internet about how to fit the model, but nothing on how to extrapolate its coefficients after having fitted the model in the JAGS environment. So, I'd appreciate any help on this.

My data is as follows:

ngroups <- 2
group <- 1:ngroups
nobs <- 100
dta <- data.frame(group=rep(group,each=nobs),y=rnorm(nobs*ngroups),x=runif(nobs*ngroups))
head(dta)

Recommended answer


JAGS has powerful ways to make inference about missing data, and once you get the hang of it, it's easy! I strongly recommend that you check out Marc Kéry's excellent book which provides a wonderful introduction to BUGS language programming (JAGS is close enough to BUGS that almost everything transfers).


The easiest way to do this involves, as you say, adjusting the model. Below I provide a complete worked example of how this works. But you seem to be asking for a way to get the prediction interval without re-running the model (is your model very large and computationally expensive?). This can also be done.
How to predict -- the hard way (without re-running the model)

For each iteration of the MCMC, simulate the response for the desired x-value based on that iteration's posterior draws for the parameter values. So imagine you want to predict a value for X=10. Then if iteration 1 (post burn-in) has slope=2, intercept=1, and standard deviation=0.5, draw a Y-value from

Y <- rnorm(1, 1 + 2*10, 0.5)  # mean = intercept + slope*X, sd = sigma


And repeat for iteration 2, 3, 4, 5... These will be your posterior draws for the response at X=10. Note: if you did not monitor the standard deviation in your JAGS model, you are out of luck and need to fit the model again.
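Looping that draw over every saved iteration gives the full posterior predictive sample. A minimal sketch of the idea, using a hypothetical data frame `post` of posterior draws standing in for what `as.matrix(my.model_samples)` would give you after a real run:

```r
# Hypothetical posterior draws (one row per MCMC iteration);
# in practice build this from your fitted model's samples.
set.seed(1)
post <- data.frame(int   = rnorm(1000, 1, 0.1),
                   slope = rnorm(1000, 2, 0.1),
                   sigma = runif(1000, 0.4, 0.6))

x.new <- 10
# One simulated response per iteration: together these are the
# posterior predictive distribution of Y at X = 10
y.new <- rnorm(nrow(post), post$int + post$slope * x.new, post$sigma)

quantile(y.new, c(0.025, 0.975))  # 95% prediction interval
```

Because `rnorm` is vectorized over its mean and sd arguments, no explicit loop is needed.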


How to predict -- the easy way -- with a worked example

The basic idea is to insert (into your data) the x-values whose responses you want to predict, with the associated y-values set to NA. For example, if you want a prediction interval for X=10, you just have to include the point (10, NA) in your data and set a trace monitor for the y-value.


I use JAGS from R with the rjags package. Below is a complete worked example that begins by simulating the data, then adds some extra x-values to the data, specifies and runs the linear model in JAGS via rjags, and summarizes the results. Y[101:105] contains draws from the posterior predictive distributions for X[101:105]. Notice that Y[1:100] just contains the y-values for X[1:100]. These are the observed data that we fed to the model, and they never change as the model updates.

library(rjags)
# Simulate data (100 observations)
my.data <- as.data.frame(matrix(data=NA, nrow=100, ncol=2))
names(my.data) <- c("X", "Y")
# the linear model will predict Y based on the covariate X

my.data$X <- runif(100) # values for the covariate
int <- 2     # specify the true intercept
slope <- 1   # specify the true slope
sigma <- .5   # specify the true residual standard deviation
my.data$Y <- rnorm(100, slope*my.data$X+int, sigma)  # Simulate the data

#### Extra data for prediction of unknown Y-values from known X-values
y.predict <- as.data.frame(matrix(data=NA, nrow=5, ncol=2))
names(y.predict) <- c("X", "Y")
y.predict$X <- c(-1, 0, 1.3, 2, 7)

mydata <- rbind(my.data, y.predict)


set.seed(333)
setwd("INSERT YOUR WORKING DIRECTORY HERE")  # replace with a writable path
sink("mymodel.txt")
cat("model{

    # Priors

    int ~ dnorm(0, .001)
    slope ~ dnorm(0, .001)
    tau <- 1/(sigma * sigma)
    sigma ~ dunif(0,10) 

    # Model structure

    for(i in 1:R){
    Y[i] ~ dnorm(m[i],tau)
    m[i] <- int + slope * X[i]
    }
    }", fill=TRUE)
sink()
jags.data <- list(R=dim(mydata)[1], X=mydata$X, Y=mydata$Y)

inits <- function(){list(int=rnorm(1, 0, 5), slope=rnorm(1,0,5),
                         sigma=runif(1,0,10))}

params <- c("Y", "int", "slope", "sigma")

nc <- 3
n.adapt <-1000
n.burn <- 1000
n.iter <- 10000
thin <- 10
my.model <- jags.model('mymodel.txt', data = jags.data, inits=inits, n.chains=nc, n.adapt=n.adapt)
update(my.model, n.burn)
my.model_samples <- coda.samples(my.model,params,n.iter=n.iter, thin=thin)
summary(my.model_samples)
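`summary(my.model_samples)` prints every monitored node. To pull out just the 95% prediction intervals for the five unobserved responses, one sketch (reusing `my.model_samples` from the run above, so it assumes JAGS has been run) is:

```r
# Combine all chains into one matrix of posterior draws
post <- as.matrix(my.model_samples)

# Columns "Y[101]" ... "Y[105]" hold the posterior predictive draws
# for the five x-values that were added with NA responses
pred.cols <- paste0("Y[", 101:105, "]")
round(apply(post[, pred.cols], 2,
            quantile, probs = c(0.025, 0.5, 0.975)), 2)
```

Each column of the result gives the lower bound, median, and upper bound of the prediction interval for one of the new x-values.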

