Creating dummy variables in R data.table
Question
I am working with an extremely large dataset in R. I have been using data frames and have decided to switch to data.tables to help speed up operations. I am having trouble understanding the j operations; in particular, I'm trying to generate dummy variables, but I can't figure out how to code conditional operations within data.table[].
MWE:
library(data.table)
test <- data.table("index"=rep(letters[1:10],100),"var1"=rnorm(1000,0,1))
What I would like to do is add columns a through j as dummy variables, such that column a has the value 1 when index == "a" and 0 otherwise. In the data.frame environment it would look something like:
test$a <- 0
test$a[test$index=='a'] <- 1
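For comparison, here is a minimal sketch of the same two steps in data.table syntax, assuming a reasonably recent data.table. The condition goes in the i slot and the assignment in the j slot; set.seed() is added only so the random column is reproducible, and the a2 column is a hypothetical name used to show a one-line alternative.

```r
library(data.table)

set.seed(1)  # only so the random var1 column is reproducible
test <- data.table(index = rep(letters[1:10], 100), var1 = rnorm(1000, 0, 1))

# Step 1: add column `a` filled with 0, by reference (no copy of `test`)
test[, a := 0L]

# Step 2: the condition goes in i, the assignment in j,
# so only rows where index == "a" are set to 1
test[index == "a", a := 1L]

# Alternative in one step: evaluate the condition inside j for every row
test[, a2 := as.integer(index == "a")]
```

Both forms modify test in place, which is what avoids the copying that the data.frame version does.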
Answer
This seems to be what you're looking for:
inds <- unique(test$index)
test[, (inds) := lapply(inds, function(x) index == x)]
which gives
index var1 a b c d e f g h i j
1: a 0.25331851 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
2: b -0.02854676 FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
3: c -0.04287046 FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
4: d 1.36860228 FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
5: e -0.22577099 FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
---
996: f -1.02040059 FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
997: g -1.31345092 FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
998: h -0.49448088 FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
999: i 1.75175715 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
1000: j 0.05576477 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
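The lapply approach produces logical columns. If the 0/1 coding from the question is needed (for example, for a model that expects numeric dummies), wrapping the comparison in as.integer() should be enough. A sketch under that assumption:

```r
library(data.table)

set.seed(1)  # only for reproducibility
test <- data.table(index = rep(letters[1:10], 100), var1 = rnorm(1000, 0, 1))

# One new column per distinct value of `index`
inds <- unique(test$index)

# Same idea as above, but as.integer() turns TRUE/FALSE into 1/0
test[, (inds) := lapply(inds, function(x) as.integer(index == x))]
```

The parentheses around inds tell := to use the values stored in inds as column names rather than creating a column literally called "inds".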
Here's another way:
dcast(test, index + var1 ~ index, fun.aggregate = length)
# or, if you want to preserve row order
# (note: test[, r := .I] also adds column r to test itself, by reference)
dcast(test[, r := .I], r + index + var1 ~ index, fun.aggregate = length)[, r := NULL]
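To see what the dcast approach returns, here is a sketch on a tiny hypothetical three-row table (dt and its values are made up for illustration). value.var is given explicitly only so that dcast does not have to guess the value column. With fun.aggregate = length, each cell counts the matching input rows, and missing combinations are filled with length() of an empty vector, i.e. 0.

```r
library(data.table)

# Hypothetical toy data, small enough to check by eye
dt <- data.table(index = c("b", "a", "b"), var1 = c(1.5, 2.5, 3.5))

# Rows identified by index + var1, one output column per value of index
wide <- dcast(dt, index + var1 ~ index,
              fun.aggregate = length, value.var = "var1")
# wide is sorted by index, var1: rows (a, 2.5), (b, 1.5), (b, 3.5)

# Adding a row id first keeps the original row order
dt[, r := .I]
wide_ordered <- dcast(dt, r + index + var1 ~ index,
                      fun.aggregate = length, value.var = "var1")[, r := NULL]
```

Because the first call identifies rows by index + var1 only, two input rows with the same index and var1 would collapse into one output row with a count of 2; the row-id version keeps them apart.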
Another one:
rs = split(seq_len(nrow(test)), test$index)
test[, (names(rs)) := FALSE ]
for (n in names(rs)) set(test, i = rs[[n]], j = n, value = TRUE )
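The same loop can write 0/1 integers instead of TRUE/FALSE. A sketch, assuming integer dummies are wanted; set() is documented in data.table as a low-overhead, loopable version of :=, which is why it suits a per-level loop like this.

```r
library(data.table)

set.seed(1)  # only for reproducibility
test <- data.table(index = rep(letters[1:10], 100), var1 = rnorm(1000, 0, 1))

# Row numbers for each value of index, e.g. rs$a holds the rows where index == "a"
rs <- split(seq_len(nrow(test)), test$index)

# Initialise every dummy column to 0 in a single := call
test[, (names(rs)) := 0L]

# set() assigns directly into the column without going through [.data.table
for (n in names(rs)) set(test, i = rs[[n]], j = n, value = 1L)
```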