Summarising multiple non-exclusive dummy variables in R into one variable

Problem description

I was sent a dataset with multiple dummy variables as well as other variables. Basically, what I'd like to do is create a summary table with summary.formula from rms. However, I do not know how to create a single variable from the multiple dummy variables, given that they are not mutually exclusive. Is this at all possible? Of course I could do it by creating a table etc., but then I cannot use summary.formula, and I'd like the summary.formula output to include just the individual levels of the dummy variables.

Edit, to clarify: a and b need to be summarized, but they are not mutually exclusive. Since age is recorded for every row, I need to combine a and b into one variable so that it can be used in summary.formula. I've edited the code below so that 0 and 1 are now coded as NA and A (or B), respectively.

I'd like the summary.formula output to be something like this:

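# Example data: a and b are non-exclusive dummies, coded "A"/"B" when present and NA otherwise
h <- data.frame(a   = sample(c("A", NA), 100, replace = TRUE),
                b   = sample(c("B", NA), 100, replace = TRUE),
                age = rnorm(100, 50, 25),
                epo = sample(c("Y", "N"), 100, replace = TRUE))

library(rms)  # also attaches Hmisc, where summary.formula is defined

# Wanted: epo ~ age + <summary variable of a & b>; only age is included so far
summary.formula(epo ~ age, method = "reverse", data = h)
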
#-----------------
 Descriptive Statistics by epo

+---------+--------------------------+--------------------------+
|         |N                         |Y                         |
|         |(N=56)                    |(N=44)                    |
+---------+--------------------------+--------------------------+
|age      |31.53434/48.90788/67.69096|28.63689/43.93502/57.81834|
+---------+--------------------------+--------------------------+
|sab : A  |         25% (14)         |         16% ( 7)         |
+---------+--------------------------+--------------------------+
|   B     |         27% (15)         |         32% (14)         |
+---------+--------------------------+--------------------------+
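To pin down what the table above asks for: because a and b are not mutually exclusive, each level gets its own percentage within each epo group, so the A and B rows need not sum to 100%. A minimal base R sketch of those numbers, computed outside summary.formula and assuming the NA coding from the question; the toy data and the helper `pct_present` are hypothetical:

```r
# Hypothetical toy data using the question's coding: "A"/"B" when present, NA otherwise
h <- data.frame(a   = c("A", NA, "A", NA, "A", NA),
                b   = c("B", "B", NA, NA, "B", NA),
                epo = c("Y", "Y", "N", "N", "N", "Y"))

# Hypothetical helper: percentage of rows in each epo group where a dummy is present.
# Each dummy is summarised on its own, so the percentages need not add up to 100.
pct_present <- function(x, group) {
  100 * sapply(split(!is.na(x), group), mean)
}

# One row per dummy level, one column per epo group
tab <- rbind(A = pct_present(h$a, h$epo),
             B = pct_present(h$b, h$epo))
print(round(tab, 1))
```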

Recommended answer

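Using paste() seems to work acceptably. The output below was produced with the original 0/1 coding of a and b; with the NA coding from the edited question, paste() turns NA into the string "NA", so the levels would read A_B, A_NA, NA_B and NA_NA instead.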

h$sab <- paste(h$a, h$b, sep="_")
summary.formula(epo~age+sab,method="reverse",data=h)
#-----------------
 Descriptive Statistics by epo

+---------+--------------------------+--------------------------+
|         |N                         |Y                         |
|         |(N=56)                    |(N=44)                    |
+---------+--------------------------+--------------------------+
|age      |31.53434/48.90788/67.69096|28.63689/43.93502/57.81834|
+---------+--------------------------+--------------------------+
|sab : 0_0|         25% (14)         |         16% ( 7)         |
+---------+--------------------------+--------------------------+
|    0_1  |         27% (15)         |         32% (14)         |
+---------+--------------------------+--------------------------+
|    1_0  |         25% (14)         |         34% (15)         |
+---------+--------------------------+--------------------------+
|    1_1  |         23% (13)         |         18% ( 8)         |
+---------+--------------------------+--------------------------+
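The labels above are mutually exclusive combinations rather than separate A and B rows. If that is acceptable, more readable labels can be built explicitly. A minimal sketch, assuming the NA coding from the edited question; the toy data, the label texts and the column names `sab_raw` and `sab` are illustrative choices, not part of the original answer:

```r
# Hypothetical toy data in the question's NA coding
h <- data.frame(a = c("A", NA, "A", NA),
                b = c("B", "B", NA, NA))

# paste() converts NA to the string "NA", so every row gets one of four labels
h$sab_raw <- paste(h$a, h$b, sep = "_")

# Readable labels with a fixed level order; still one category per row
has_a <- !is.na(h$a)
has_b <- !is.na(h$b)
h$sab <- factor(ifelse(has_a & has_b, "A and B",
                ifelse(has_a, "A only",
                ifelse(has_b, "B only", "neither"))),
                levels = c("neither", "A only", "B only", "A and B"))
print(h)
```

A factor built this way should be usable in the formula in place of the pasted column, with the level order controlling the row order of the summary.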

Another option might be interaction():

summary.formula(epo~age+interaction(a,b),method="reverse",data=h)
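With the NA coding from the edited question, interaction() propagates missing values: any row where a or b is NA gets an NA combination and would drop out of the summary. A minimal base R sketch on hypothetical toy vectors, using addNA() to keep those rows as explicit levels:

```r
# Hypothetical toy vectors in the question's NA coding
a <- factor(c("A", NA, "A", NA))
b <- factor(c("B", "B", NA, NA))

# Without extra handling, only rows where both a and b are present get a level
ab_na <- interaction(a, b)

# addNA() makes NA an explicit level, so every row gets a combination such as "NA.B"
ab <- interaction(addNA(a), addNA(b))
print(table(ab))
```

As with paste(), the result is one mutually exclusive combination per row.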

If you instead want a logical OR applied to the combination of variables, use:

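# a | b only works on numeric/logical dummies (the original 0/1 coding);
# with the NA coding, test for non-missing values instead
h$a_or_b <- with(h, !is.na(a) | !is.na(b))
summary.formula(epo ~ age + a_or_b, method = "reverse", data = h)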
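With the original 0/1 coding, `|` can be applied to the dummies directly, because R coerces 0 to FALSE and any other number to TRUE. A minimal sketch on hypothetical 0/1 toy data (the names `h01` and `a_and_b` are illustrative), also showing the AND counterpart:

```r
# Hypothetical toy data with the original 0/1 coding of the dummies
h01 <- data.frame(a = c(1, 0, 1, 0),
                  b = c(1, 1, 0, 0))

# Numeric values are coerced to logical before the comparison
h01$a_or_b  <- with(h01, a | b)  # TRUE when at least one dummy is set
h01$a_and_b <- with(h01, a & b)  # TRUE only when both dummies are set
print(h01)
```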
