data.table: Proper way to create a conditional variable when column names are not known?
Problem description
My question relates to the creation of a variable which depends upon other columns within a data.table when none of the variable names are known in advance.
Below is a toy example with 5 rows, where the new variable should be 1 when the condition column equals "A" and 4 otherwise.
    library(data.table)

    DT <- data.table(Con = c("A","A","B","A","B"),
                     Eval_A = rep(1,5),
                     Eval_B = rep(4,5))

    Col1 <- "Con"
    Col2 <- "Eval_A"
    Col3 <- "Eval_B"
    Col4 <- "Ans"
The code below works but feels like I'm misusing the package!
    DT[, Col4 := ifelse(DT[[Col1]] == "A", DT[[Col2]], DT[[Col3]]), with = FALSE]
Update: Thanks, I did some quick timing of the answers below: once on a data.table with 5 million rows and only the relevant columns, and again after adding 10 non-relevant columns. The results are below:
+-------------------------+---------------------+------------------+
| Method                  | Only relevant cols. | With extra cols. |
+-------------------------+---------------------+------------------+
| List method             | 1.8                 | 1.91             |
| Grothendieck - get/if   | 26.79               | 30.04            |
| Grothendieck - get/join | 0.48                | 1.56             |
| Grothendieck - .SDcols  | 0.38                | 0.79             |
| agstudy - Substitute    | 2.03                | 1.9              |
+-------------------------+---------------------+------------------+
It looks like .SDcols is best for speed, and substitute is best for easy-to-read code.
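The substitute-based answer named in the table is not reproduced on this page. As a rough sketch only, and assuming that approach built the expression from the column-name strings, it might look something like the following. The name `expr` and the placeholder symbols `cond`, `yes` and `no` are introduced here purely for illustration.

```r
library(data.table)

DT <- data.table(Con = c("A", "A", "B", "A", "B"),
                 Eval_A = rep(1, 5),
                 Eval_B = rep(4, 5))
Col1 <- "Con"; Col2 <- "Eval_A"; Col3 <- "Eval_B"; Col4 <- "Ans"

# Build the unevaluated call ifelse(Con == "A", Eval_A, Eval_B)
# from the column names held in the character variables.
# (cond, yes, no are placeholder symbols, not real columns.)
expr <- substitute(
  ifelse(cond == "A", yes, no),
  list(cond = as.name(Col1), yes = as.name(Col2), no = as.name(Col3))
)

# Evaluate the call inside DT and assign the result by reference.
DT[, (Col4) := eval(expr)]
```

Printing `expr` shows the plain call with the real column names filled in, which may be why this style reads easily.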
Solution
1. get/if
Try using get:
    DT[, (Col4) := if (get(Col1) == "A") get(Col2) else get(Col3), by = 1:nrow(DT)]
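Grouping by 1:nrow(DT) evaluates the if once per row, which is consistent with get/if being the slowest entry in the timings above. A vectorised variant avoids the per-row grouping. This sketch is not part of the original answer, and it assumes a data.table release that provides fifelse(); base R's ifelse() could be used in its place.

```r
library(data.table)

DT <- data.table(Con = c("A", "A", "B", "A", "B"),
                 Eval_A = rep(1, 5),
                 Eval_B = rep(4, 5))
Col1 <- "Con"; Col2 <- "Eval_A"; Col3 <- "Eval_B"; Col4 <- "Ans"

# get() looks up each column by the name stored in the variable;
# fifelse() then picks row by row between the two value columns
# in a single vectorised call, with no grouping needed.
DT[, (Col4) := fifelse(get(Col1) == "A", get(Col2), get(Col3))]
```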
2. get/join
Or try this approach:
    setkeyv(DT, Col1)
    DT[, (Col4) := get(Col3)]["A", (Col4) := get(Col2)]
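Note that setkeyv() sorts DT by the key column, so the row order changes. If the original order matters, an ad hoc join through the on argument is one possible alternative. This is a sketch of that variation, not part of the original answer, and it assumes a data.table version that supports on.

```r
library(data.table)

DT <- data.table(Con = c("A", "A", "B", "A", "B"),
                 Eval_A = rep(1, 5),
                 Eval_B = rep(4, 5))
Col1 <- "Con"; Col2 <- "Eval_A"; Col3 <- "Eval_B"; Col4 <- "Ans"

# Step 1: fill the new column with the default (the "else" values).
DT[, (Col4) := get(Col3)]

# Step 2: overwrite only the rows whose Col1 value matches "A".
# on = Col1 joins on that column without setting a key,
# so the rows of DT are not reordered.
DT["A", (Col4) := get(Col2), on = Col1]
```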
3. .SDcols
Or this:
    setkeyv(DT, Col1)
    DT[, (Col4) := .SD, .SDcols = Col3]["A", (Col4) := .SD, .SDcols = Col2]
UPDATE: Added some additional approaches.