Generate two categorical variables with a chosen degree of association in R

Question

I'd like to use R to generate two categorical variables (eye color and hair color, for instance) where I can specify the degree to which the two variables are associated. It doesn't really matter to me which levels of eye color are associated with which levels of hair color; I just need to be able to specify an overall association, for example by specifying the odds ratio. I know there are ways to do this for two normally distributed continuous variables using, for example, the mvtnorm package, so I could take that route and then choose cut points to make the variables categorical after the fact, but I'd rather not do it that way if I can avoid it. Any help would be greatly appreciated!

Edit: Apologies for not being clearer from the start, but what I'm really asking, I suppose, is whether anybody knows of a function in some R package that will do this in one or two lines.

Recommended answer

If you can specify the odds ratio (you also need to specify the baseline odds), you just convert the odds to probabilities and use runif().
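As a rough illustration of that conversion, here is a minimal sketch in base R. The baseline odds, the odds ratio, and the 50/50 split for x are arbitrary values picked for the example, not numbers from the answer, and the odds here refer to y == 1.

```r
# Illustrative values only; none of these numbers come from the original answer.
baseline.odds <- 1.5   # odds that y == 1 when x == 0
odds.ratio    <- 2     # odds of y == 1 when x == 1, relative to the baseline

# odds -> probability: p = odds / (1 + odds)
p.y1.x0 <- baseline.odds / (1 + baseline.odds)                              # 0.6
p.y1.x1 <- (baseline.odds * odds.ratio) / (1 + baseline.odds * odds.ratio)  # 0.75

set.seed(1)
N <- 1e5
x <- as.integer(runif(N) < 0.5)   # P(x == 1) = 0.5 is an assumption
# One uniform draw per row, compared with the probability that matches x.
y <- as.integer(runif(N) < ifelse(x == 1, p.y1.x1, p.y1.x0))

# Sample odds ratio from the 2 x 2 table; it should land near odds.ratio.
tab    <- table(x, y)
or.hat <- (tab["1", "1"] / tab["1", "0"]) / (tab["0", "1"] / tab["0", "0"])
```

With a large N, or.hat should sit close to the chosen odds ratio; with small samples it can drift quite far from it.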

Edit (I misunderstood the question): take a look at the bindata package.

If you like, here is a function I wrote that you can use to generate such data without the package. It is rather clunky; it's intended to be self-explanatory rather than elegant or fast.

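# Convert odds to probabilities: p = odds / (odds + 1).
odds.to.probs <- function(odds){
  odds / (odds + 1)
}

# Generate N pairs of binary variables (x, y) associated through an odds ratio.
#   odds.x.eq.0         odds that x == 0
#   odds.y.eq.0.x.eq.0  odds that y == 0 given x == 0 (the baseline odds)
#   odds.ratio          odds of y == 0 given x == 1, relative to the baseline
get.correlated.binary.data <- function(N, odds.x.eq.0, odds.y.eq.0.x.eq.0,
                                       odds.ratio){
  # Odds, and therefore the odds ratio, must be positive.
  stopifnot(odds.x.eq.0 > 0, odds.y.eq.0.x.eq.0 > 0, odds.ratio > 0)

  odds.y.eq.0.x.eq.1 <- odds.y.eq.0.x.eq.0 * odds.ratio
  prob.x.eq.0        <- odds.to.probs(odds.x.eq.0)
  prob.y.eq.0.x.eq.0 <- odds.to.probs(odds.y.eq.0.x.eq.0)
  prob.y.eq.0.x.eq.1 <- odds.to.probs(odds.y.eq.0.x.eq.1)

  x <- ifelse(runif(N) <= prob.x.eq.0, 0, 1)
  # One uniform draw per observation, compared with the probability that
  # applies to that observation's value of x.
  prob.y.eq.0 <- ifelse(x == 0, prob.y.eq.0.x.eq.0, prob.y.eq.0.x.eq.1)
  y <- ifelse(runif(N) <= prob.y.eq.0, 0, 1)

  data.frame(x = x, y = y)
}

> set.seed(9)
> dat <- get.correlated.binary.data(30, 3, 1.5, .03)
> table(dat)   # cross-tabulate x and y; most rows with x == 1 should have y == 1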