Generating Random Variables with given correlations between pairs of them

Problem description

I want to generate 2 continuous random variables Q1, Q2 (quantitative traits, each normally distributed) and 2 binary random variables Z1, Z2 (binary traits) with given pairwise correlations between all possible pairs of them. For example:

(Q1,Q2):0.23 
(Q1,Z1):0.55 
(Q1,Z2):0.45 
(Q2,Z1):0.4 
(Q2,Z2):0.5 
(Z1,Z2):0.47 

Please help me generate such data in R.

Solution

This is crude but might get you started in the right direction.

library(copula)

options(digits=3)
probs <- c(0.5,0.5)
corrs <- c(0.23,0.55,0.45,0.4,0.5,0.47)  ## lower triangle

Simulate correlated values (first two quantitative, last two transformed to binary)

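sim <- function(n,probs,corrs) {
    ## Gaussian copula with an unstructured ("un") correlation matrix
    tmp <- normalCopula(corrs, dim=4, dispstr="un")
    getSigma(tmp) ## test (the result is discarded)
    x <- rCopula(n, tmp)  ## n draws (was hard-coded to 1000); columns are uniform on (0,1)
    x2 <- x
    ## turn columns 3:4 into 0/1 values via the Bernoulli quantile function
    x2[,3:4] <- qbinom(x[,3:4],size=1,prob=rep(probs,each=nrow(x)))
    x2
}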
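The copula draws are uniform on (0,1), and each column only takes on its final distribution once it is pushed through a quantile function. sim() above does this for columns 3:4 only, so columns 1:2 stay uniform. The sketch below shows the marginal step in isolation, using base R only. Applying qnorm() to columns 1:2 is an assumption based on the question asking for normal traits; it is not part of the original answer, and the matrix u is just an independent stand-in for rCopula() output.

```r
set.seed(1)
probs <- c(0.5, 0.5)

# Stand-in for rCopula() output: uniforms on (0,1), independent here
u <- matrix(runif(20), ncol = 4)

x <- u
# Assumption: give the quantitative columns standard normal margins
x[, 1:2] <- qnorm(u[, 1:2])
# Bernoulli quantile function: returns 1 exactly when u > 1 - prob
x[, 3:4] <- qbinom(u[, 3:4], size = 1, prob = rep(probs, each = nrow(u)))

# The threshold behaviour of qbinom() for prob = 0.5
qbinom(c(0.2, 0.5, 0.7), size = 1, prob = 0.5)   # 0 0 1
```

Because the marginal transforms are nonlinear, the Pearson correlations of the transformed columns generally differ from the copula parameters that were passed in.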

Test SSQ distance between observed and target correlations:

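objfun <- function(corrs,targetcorrs,probs,n=1000) {
    ## simulate and compute the observed correlation matrix;
    ## sim() can fail, e.g. if corrs is not a valid correlation matrix
    cc <- try(cor(sim(n,probs,corrs)),silent=TRUE)
    if (inherits(cc,"try-error")) return(NA)
    ## sum of squared differences, lower triangle in column-major order
    sum((cc[lower.tri(cc)]-targetcorrs)^2)
}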
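The objective only makes sense if cc[lower.tri(cc)] returns the correlations in the same order as corrs. A quick base-R check of that ordering (the variable labels are taken from the question; the helper names nm, idx and pair_labels are illustrative only):

```r
nm <- c("Q1", "Q2", "Z1", "Z2")
m <- matrix(0, 4, 4)

# Row/column positions of the lower triangle, in the order lower.tri() extracts them
idx <- which(lower.tri(m), arr.ind = TRUE)

# Label each entry as "column variable,row variable"
pair_labels <- paste(nm[idx[, "col"]], nm[idx[, "row"]], sep = ",")
pair_labels
# "Q1,Q2" "Q1,Z1" "Q1,Z2" "Q2,Z1" "Q2,Z2" "Z1,Z2"
```

This is the same order in which the six target correlations are listed in the question, so corrs and the extracted vector line up element by element.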

See how bad things are when input corrs=target:

cc0 <- cor(sim(1000,probs=probs,corrs=corrs))
cc0[lower.tri(cc0)]
corrs
objfun(corrs,corrs,probs=probs) ## 0.112

Now try to optimize.

opt1 <- optim(fn=objfun,
              par=corrs,
              targetcorrs=corrs,probs=c(0.5,0.5))
opt1$value     ## 0.0208

This stops after 501 iterations with "max iterations exceeded". It will never work really well, because we're trying to use a deterministic hill-climbing algorithm (Nelder-Mead, the optim default) on a stochastic objective function ...

cc1 <- cor(sim(1000,probs=c(0.5,0.5),corrs=opt1$par))
cc1[lower.tri(cc1)]
corrs
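One common workaround for a stochastic objective is to reuse the same random numbers on every evaluation (fix the seed inside the objective), which makes the function deterministic for the optimizer. The toy example below illustrates only that idea; it is not the correlation problem itself, and noisy_obj, its target argument and the seed value are all made up for illustration.

```r
# Toy objective: squared distance between simulated column means and a target.
# With seed = NULL every call uses fresh noise; with a fixed seed the noise is
# identical across calls, so the surface the optimizer sees is smooth.
noisy_obj <- function(par, target, n = 200, seed = NULL) {
    if (!is.null(seed)) set.seed(seed)
    x <- matrix(rnorm(2 * n), n, 2)
    x <- sweep(x, 2, par, "+")          # shift each column by its parameter
    sum((colMeans(x) - target)^2)
}

# Same parameters, fresh noise: two evaluations disagree
noisy_obj(c(0, 0), target = c(1, -2)) == noisy_obj(c(0, 0), target = c(1, -2))

# Same parameters, fixed seed: evaluations agree, and Nelder-Mead can converge
fit <- optim(par = c(0, 0), fn = noisy_obj, target = c(1, -2), seed = 101)
fit$convergence   # 0 means optim reports convergence
fit$par
```

The price is that the optimizer fits one particular noise realisation, so the result is still off by the sampling error of that draw; increasing n shrinks that error.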

Maybe try simulated annealing?

opt2 <- optim(fn=objfun,
              par=corrs,
              targetcorrs=corrs,probs=c(0.5,0.5),
              method="SANN")

It doesn't seem to do much better than the previous value. Two possible problems (left as an exercise for the reader) are: (1) we have specified a set of correlations that is not feasible with the marginal distributions we have chosen, or (2) the error in the objective function surface is getting in the way. To do better we would have to average over more replicates (i.e. increase n).
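Possible problem (1) can be checked without any optimization under a different setup from the copula code above: Q1, Q2 standard normal, and Z1, Z2 obtained by thresholding latent standard normals. This is a swapped-in technique, not the answer's method. Under those assumptions a latent correlation rho gives cor(Q, Z) = rho * dnorm(qnorm(p)) / sqrt(p * (1 - p)), and for p = 0.5 it gives cor(Z1, Z2) = (2 / pi) * asin(rho). The binary-binary formula holds only at p = 0.5. A base-R sketch that inverts these relations, checks feasibility, and simulates:

```r
set.seed(101)
target <- c(0.23, 0.55, 0.45, 0.40, 0.50, 0.47)  # same order as corrs
p <- 0.5                                          # P(Z = 1), as in probs

# Attenuation factor for a normal vs. thresholded-normal pair
k <- dnorm(qnorm(p)) / sqrt(p * (1 - p))          # about 0.798 when p = 0.5

latent <- c(target[1],                # Q1,Q2: both normal, no adjustment
            target[2:5] / k,          # the four Q,Z pairs
            sin(pi * target[6] / 2))  # Z1,Z2: valid for p = 0.5 only

# Build the symmetric latent correlation matrix from its lower triangle
sigma <- diag(4)
sigma[lower.tri(sigma)] <- latent
sigma[upper.tri(sigma)] <- t(sigma)[upper.tri(sigma)]

# Feasible if every latent value is a correlation and sigma is positive definite
feasible <- all(abs(latent) <= 1) &&
    all(eigen(sigma, symmetric = TRUE)$values > 0)
stopifnot(feasible)

# Simulate latent normals via the Cholesky factor, then threshold columns 3:4
n <- 1e5
y <- matrix(rnorm(4 * n), n, 4) %*% chol(sigma)
x <- y
x[, 3:4] <- (y[, 3:4] > qnorm(1 - p)) * 1

obs <- cor(x)[lower.tri(sigma)]
round(cbind(target, obs), 3)
```

Under these assumptions the target set is feasible, and the observed correlations should match the targets up to sampling error. For other values of p the Z1,Z2 relation has no such simple form and would need a bivariate normal probability.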
