R: Expanding an R factor into dummy columns for every factor level


Question

I have a fairly large data frame in R with two columns. I am trying to turn the Code column (a factor with 858 levels) into dummy variables. The problem is that RStudio always crashes when I try to do that.

> str(d)
'data.frame':   649226 obs. of  2 variables:
 $ User: int  210 210 210 210 269 317 317 317 317 326 ...
 $ Code      : Factor w/ 858 levels "AA02","AA03",..: 164 494 538 626 464 496 435 464 475 163 ... 

The User column is not unique, meaning there can be several rows with the same User. It doesn't matter whether the number of rows stays the same in the end, or whether rows with the same User are merged into one row whose non-empty columns hold the count of each Code.

I found a couple of solutions that work for a smaller dataset, but not for mine.

  • Tried using model.matrix, but RStudio just crashes:

m <- model.matrix( ~ Code, data = d)

Found here: Automatically expanding an R factor into a collection of 1/0 indicator variables for every factor level

  • Tried a for loop with ifelse, but the code ran for 4 hours and then I noticed that RStudio had crashed:

for (t in unique(d$Code)) {
  d[paste("Code", t, sep = "")] <- ifelse(d$Code == t, 1, 0)
}

Found here: Create new dummy variable columns from categorical variable

It would be great if you could recommend a method that is fast and works for this kind of data.

Thanks!

Recommended answer

This worked perfectly for me:

library(reshape2)
m <- acast(data = d, User ~ Code)

The only thing was that it produced NAs instead of 0s, but this is easy to change:

m[is.na(m)] <- 0
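
If per-User counts of each Code are the goal, the aggregation can also be spelled out so that empty cells come out as 0 directly. The following is only a minimal sketch on made-up toy data, not the data from the question; it assumes reshape2 is installed, and the toy values, the explicit fun.aggregate = length, value.var = "Code" and fill = 0 arguments are illustrative choices rather than part of the original answer:

```r
library(reshape2)

# Toy stand-in for the data frame in the question (values are made up)
d <- data.frame(
  User = c(210L, 210L, 269L, 317L, 317L),
  Code = factor(c("AA02", "AA03", "AA02", "AA03", "AA03"))
)

# One row per User, one column per Code level.
# fun.aggregate = length counts the rows in each User/Code cell,
# and fill = 0 puts 0 (not NA) where a User never has a given Code.
m <- acast(d, User ~ Code,
           fun.aggregate = length,
           value.var = "Code",
           fill = 0)

print(m)
```

On this toy input, User 317 has two AA03 rows, so that cell holds 2, and User 269 has no AA03 rows, so that cell holds 0. Whether this form is fast enough on 649226 rows and 858 levels has not been checked here.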
