Select values from different columns based on a variable containing column names
I have a data.table like this:
col1 col2 col3 new
1 4 55 col1
2 3 44 col2
3 34 35 col2
4 44 87 col3
I want to populate another column, matched_value, that contains the value from the column named in the new column:
col1 col2 col3 new matched_value
1 4 55 col1 1
2 3 44 col2 3
3 34 35 col2 34
4 44 87 col3 87
E.g., in the first row, the value of new is "col1", so matched_value takes the value from col1, which is 1.
How can I do this efficiently in R on a very large data.table?
An excuse to use the obscure .BY:
DT[, newval := .SD[[.BY[[1]]]], by=new]
col1 col2 col3 new newval
1: 1 4 55 col1 1
2: 2 3 44 col2 3
3: 3 34 35 col2 34
4: 4 44 87 col3 87
How it works. This splits the data into groups based on the strings in new. Within each group, the group's string is available as .BY[[1]]; call it newname. We use this string to select the corresponding column of .SD via .SD[[newname]]. .SD stands for Subset of Data: it holds the current group's rows for every column except the grouping column.
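To make the grouping step easier to see, here is a minimal runnable sketch that rebuilds the example table from the question and inspects what .BY and .SD hold before doing the assignment. The intermediate table groups and its column names (newname, sd_cols, n_rows) are made up for illustration and are not part of the original answer. It also assumes col1 through col3 all share one type (integer here); with mixed types the grouped assignment may need an explicit conversion first.

```r
library(data.table)

# Rebuild the example table from the question.
DT <- data.table(
  col1 = c(1L, 2L, 3L, 4L),
  col2 = c(4L, 3L, 34L, 44L),
  col3 = c(55L, 44L, 35L, 87L),
  new  = c("col1", "col2", "col2", "col3")
)

# Peek at the groups: one row per distinct string in `new`.
# .BY[[1]] is that group's string; names(.SD) lists the columns .SD holds
# (the grouping column `new` itself is not part of .SD).
groups <- DT[, .(newname = .BY[[1]],
                 sd_cols = paste(names(.SD), collapse = ","),
                 n_rows  = .N),
             by = new]
print(groups)

# The actual assignment: within each group, take the column named by .BY[[1]].
DT[, newval := .SD[[.BY[[1]]]], by = new]
print(DT)
```

Because the work is done once per distinct value of new rather than once per row, the number of groups is bounded by the number of candidate columns, which is likely why this approach scales to large tables.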
Alternatives. get(.BY[[1]]) should work just as well in place of .SD[[.BY[[1]]]]. According to a benchmark run by @David, the two ways are equally fast.
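As a quick check of the alternative, the sketch below runs both forms on the example table and compares the results. The column names via_sd and via_get are hypothetical, chosen only for this comparison. It does not reproduce the benchmark mentioned above.

```r
library(data.table)

# Same example table as in the question.
DT <- data.table(
  col1 = c(1L, 2L, 3L, 4L),
  col2 = c(4L, 3L, 34L, 44L),
  col3 = c(55L, 44L, 35L, 87L),
  new  = c("col1", "col2", "col2", "col3")
)

# Form 1: index into .SD by the group's string.
DT[, via_sd := .SD[[.BY[[1]]]], by = new]

# Form 2: look the column up by name with get().
DT[, via_get := get(.BY[[1]]), by = new]

# Both columns should hold the same values.
print(identical(DT$via_sd, DT$via_get))
print(DT)
```

One possible difference to keep in mind: get() follows R's usual name lookup, so if a string in new does not match any column, it may pick up an unrelated object from the calling environment, whereas .SD[[...]] only looks inside the table's columns.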