Creating categorical variables from mutually exclusive dummy variables

Question

My question builds on a previously answered question about combining multiple dummy variables into a single categorical variable.

In that earlier question, the categorical variable was created from dummy variables that were NOT mutually exclusive. In my case, the dummy variables are mutually exclusive because they represent crossed experimental conditions in a 2X2 between-subjects factorial design (which also has a within-subjects component that I'm not addressing here), so I don't think interaction() does what I need.

For example, my data might look like this:

id   conditionA    conditionB    conditionC     conditionD
1    NA            1             NA             NA
2    1             NA            NA             NA
3    NA            NA            1              NA
4    NA            NA            NA             1
5    NA            2             NA             NA
6    2             NA            NA             NA
7    NA            NA            2              NA
8    NA            NA            NA             2

I'd now like to make categorical variables that combine ACROSS different types of conditions. For example, people who had values for conditions A and B might be coded with one categorical variable, and people who had values for conditions C and D with another:

id   conditionA    conditionB    conditionC     conditionD  factor1    factor2
1    NA            1             NA             NA          1          NA
2    1             NA            NA             NA          1          NA
3    NA            NA            1              NA          NA         1
4    NA            NA            NA             1           NA         1
5    NA            2             NA             NA          2          NA
6    2             NA            NA             NA          2          NA
7    NA            NA            2              NA          NA         2
8    NA            NA            NA             2           NA         2

Right now, I'm doing this with ifelse() statements, which is quite simply a hot mess (and doesn't always work). Please help! There's probably some super-obvious "easier way."

Edit:

The kind of ifelse commands I'm using look like this:

attach(df)
df$factor<-ifelse(conditionA==1 | conditionB==1, 1, NA)
df$factor<-ifelse(conditionA==2 | conditionB==2, 2, df$factor)

In reality, I'm combining across 6-8 columns each time, so a more elegant solution would help a lot.
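
A likely reason the ifelse() chain "doesn't always work" is NA propagation: NA == 2 is NA, and NA | FALSE is also NA, so the second ifelse() returns NA for rows the first call had already coded as 1. The sketch below reproduces this on a four-row toy version of the data above (the toy data frame and the names broken and fixed are illustrative assumptions), using with() in place of attach(), and shows one possible fix: %in% returns FALSE rather than NA for missing values.

```r
# Toy data mirroring rows 1, 2, 5 and 6 of the example (NA = not in that condition)
df <- data.frame(conditionA = c(NA, 1, NA, 2),
                 conditionB = c(1, NA, 2, NA))

# The two ifelse() steps from the question, with with() instead of attach()
broken <- with(df, ifelse(conditionA == 1 | conditionB == 1, 1, NA))
broken <- with(df, ifelse(conditionA == 2 | conditionB == 2, 2, broken))
print(broken)  # NA NA 2 2: the rows coded 1 in step one were overwritten with NA

# Same logic, but %in% yields FALSE (never NA) when a value is missing
fixed <- with(df, ifelse(conditionA %in% 1 | conditionB %in% 1, 1, NA))
fixed <- with(df, ifelse(conditionA %in% 2 | conditionB %in% 2, 2, fixed))
print(fixed)   # 1 1 2 2
```

This still needs one ifelse() per value and one comparison per column, which is why it scales poorly to 6-8 columns.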

Recommended answer


Update (2019): Please use dplyr::coalesce(), which works in much the same way.

My R package has a convenience function that picks, element by element, the first non-NA value across a list of vectors:
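
For readers without kimisc, here is a minimal base-R sketch of the same idea. The helper name coalesce_base is hypothetical (it is not part of kimisc or dplyr); it folds ifelse(is.na(x), y, x) over its arguments with Reduce(), so any number of columns can be combined in one call. Because ifelse() drops factor attributes, factor columns should be converted to numbers first.

```r
# Hypothetical helper: element-wise first non-NA value across equal-length vectors
coalesce_base <- function(...) {
  Reduce(function(x, y) ifelse(is.na(x), y, x), list(...))
}

# The example data from the question
df <- data.frame(
  id         = 1:8,
  conditionA = c(NA, 1, NA, NA, NA, 2, NA, NA),
  conditionB = c(1, NA, NA, NA, 2, NA, NA, NA),
  conditionC = c(NA, NA, 1, NA, NA, NA, 2, NA),
  conditionD = c(NA, NA, NA, 1, NA, NA, NA, 2)
)

df$factor1 <- with(df, coalesce_base(conditionA, conditionB))
# For 6-8 columns, select them by name and pass them all at once
df$factor2 <- do.call(coalesce_base, df[c("conditionC", "conditionD")])
print(df)
```

If dplyr is installed, dplyr::coalesce() can take the place of coalesce_base in the with() call above.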

# kimisc is installed from GitHub, e.g. with devtools:
#devtools::install_github('muelleki/kimisc')
library(kimisc)

df$factor1 <- with(df, coalesce.na(conditionA, conditionB))

(I'm not sure whether this works if conditionA and conditionB are factors. If necessary, convert them to numerics first using as.numeric(as.character(...)).)
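
The as.character() step matters because as.numeric() on a factor returns the internal level codes, not the labels. A minimal illustration with a made-up condition column in which only the value 2 occurs:

```r
# Hypothetical condition column stored as a factor; its only level is "2"
x <- factor(c(NA, 2, NA, 2))

as.numeric(x)                 # level codes: NA 1 NA 1
as.numeric(as.character(x))   # original values: NA 2 NA 2
```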

Otherwise, you could give interaction a try, combined with recoding of the levels of the resulting factor -- but to me it looks like you're more interested in the first solution:

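df$conditionAB <- with(df, interaction(coalesce.na(conditionA, 0), 
                                       coalesce.na(conditionB, 0)))
# interaction() creates one level per combination ("0.1" = A is 0, B is 1), so a
# two-element vector cannot be assigned to levels(); map combinations instead.
# Combinations not listed (e.g. "0.0", neither condition) become NA.
levels(df$conditionAB) <- list("1" = c("1.0", "0.1"), "2" = c("2.0", "0.2"))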
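
Most of the work in the interaction() route is the recoding, so it helps to look at the levels interaction() actually produces. The sketch below uses a four-row toy version of conditions A and B (the toy vectors are assumptions), with ifelse(is.na(x), 0, x) standing in for kimisc's coalesce.na(x, 0). Level names combine the two values, e.g. "0.1" means A = 0 and B = 1, and assigning a named list to levels() merges several old levels into one new level.

```r
a <- c(NA, 1, NA, 2)   # toy conditionA
b <- c(1, NA, 2, NA)   # toy conditionB

# Replace NA with 0 and cross the two columns; drop = TRUE keeps only observed combinations
ab <- interaction(ifelse(is.na(a), 0, a), ifelse(is.na(b), 0, b), drop = TRUE)
print(levels(ab))  # "1.0" "2.0" "0.1" "0.2"

# Collapse the combinations into the two values of the new categorical variable
levels(ab) <- list("1" = c("1.0", "0.1"), "2" = c("2.0", "0.2"))
print(ab)
```

Unlike the coalesce approach, every combination has to be listed by hand, and the number of combinations grows quickly as columns are added.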
