Creating dummy variables in R data.table

Question

I am working with an extremely large dataset in R. I had been using data frames and decided to switch to data.table to speed up operations. I am having trouble understanding the j operations. In particular, I am trying to generate dummy variables, but I can't figure out how to code conditional operations inside data.table's [ ].

MWE:

library(data.table)
test <- data.table(index = rep(letters[1:10], 100), var1 = rnorm(1000, 0, 1))

What I would like to do is add columns a through j as dummy variables, so that column a has the value 1 when index == "a" and 0 otherwise. With a data.frame it would look something like:

test$a <- 0

test$a[test$index=='a'] <- 1

Recommended answer

This seems to do what you're asking for:

inds <- unique(test$index)
test[, (inds) := lapply(inds, function(x) index == x)]

which gives

      index        var1     a     b     c     d     e     f     g     h     i     j
   1:     a  0.25331851  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
   2:     b -0.02854676 FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
   3:     c -0.04287046 FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
   4:     d  1.36860228 FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
   5:     e -0.22577099 FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
  ---                                                                              
 996:     f -1.02040059 FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
 997:     g -1.31345092 FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
 998:     h -0.49448088 FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
 999:     i  1.75175715 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
1000:     j  0.05576477 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
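Note that the columns above are logical (TRUE/FALSE), whereas the question asked for 1 and 0. A minimal sketch of one way to get integer dummies is to wrap the comparison in as.integer(). The small table dt below is hypothetical illustration data, not the 1000-row example from the question:

```r
library(data.table)

# Small illustrative table (hypothetical data)
dt <- data.table(index = c("a", "b", "c", "a"), var1 = c(0.1, 0.2, 0.3, 0.4))

inds <- unique(dt$index)
# Same pattern as the answer above; as.integer() turns TRUE/FALSE into 1/0
dt[, (inds) := lapply(inds, function(x) as.integer(index == x))]
print(dt)
```

The parentheses around inds on the left of := tell data.table to use the values stored in inds as column names, rather than creating a single column called "inds".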

Here is another way:

dcast(test, index + var1 ~ index, fun.aggregate = length)
# or, if you want to preserve row order
# (note: test[, r := .I] also adds column r to test itself, by reference)
dcast(test[, r := .I], r + index + var1 ~ index, fun.aggregate = length)[, r := NULL]

And one more:

rs = split(seq(nrow(test)), test$index)
test[, names(rs) := FALSE ]
for (n in names(rs)) set(test, i = rs[[n]], j = n, v = TRUE )
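For a single dummy, the data.frame idiom from the question maps fairly directly onto data.table's i and j arguments: the condition goes in i and the assignment in j. The following is only a sketch on a small hypothetical table dt, and the column name a2 is made up for the comparison:

```r
library(data.table)

# Small illustrative table (hypothetical data)
dt <- data.table(index = c("a", "b", "a"), var1 = c(0.1, 0.2, 0.3))

# Start with 0 everywhere, then overwrite only the rows where the condition in i holds
dt[, a := 0L]
dt[index == "a", a := 1L]

# Equivalent one-step version: convert the logical comparison to 0/1
dt[, a2 := as.integer(index == "a")]
print(dt)
```

Both forms modify dt by reference, so no reassignment with <- is needed.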
