dplyr + "meta"-columns: when a column contains names of other columns to use instead of the data
I wonder if the following question has an elegant solution in dplyr.
To provide a simple reproducible example, consider the following data.frame:
df <- data.frame( a=1:5, b=2:6, c=3:7,
ref=c("a","a","b","b","c"),
stringsAsFactors = FALSE )
Here a, b, and c are regular numeric variables, while ref is meant to reference which column holds the "main" value for that observation. For example:
a b c ref
1 1 2 3 a
2 2 3 4 a
3 3 4 5 b
4 4 5 6 b
5 5 6 7 c
For example, for observation 3, ref == "b", and thus column b contains the main value, while for observation 1, ref == "a", and thus column a contains the main value.
Given this data.frame, the question is how to use dplyr to create a new column, main, that holds the main value for each observation. The desired result is:
a b c ref main
1 1 2 3 a 1
2 2 3 4 a 2
3 3 4 5 b 4
4 4 5 6 b 5
5 5 6 7 c 7
I'll probably need to use dplyr for that, since this one operation is part of a longer dplyr %>% data transformation chain.
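Here's a simple, fast way that lets you stick with dplyr chaining. Grouping by ref makes ref a single string within each group, so get(ref) returns the column that the group points to:
library(dplyr)       # provides the %>% pipe
library(data.table)  # provides setDT and :=
df %>% setDT %>% .[, main := get(ref), by = ref]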
# a b c ref main
# 1: 1 2 3 a 1
# 2: 2 3 4 a 2
# 3: 3 4 5 b 4
# 4: 4 5 6 b 5
# 5: 5 6 7 c 7
Thanks to @akrun for the idea behind the fastest approach and for the benchmarks that demonstrate it (see his answer).
setDT modifies the class of df in place, so you won't have to convert to data.table again in future chains.
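To make the in-place conversion concrete, here is a minimal sketch that records the class of df before and after calling setDT directly (outside a pipe), then adds main with the same grouped assignment. The names class_before and class_after are illustrative only.

```r
library(data.table)

df <- data.frame(a = 1:5, b = 2:6, c = 3:7,
                 ref = c("a", "a", "b", "b", "c"),
                 stringsAsFactors = FALSE)

class_before <- class(df)   # plain data.frame at this point
setDT(df)                   # converts df by reference; no reassignment needed
class_after <- class(df)    # df now also carries the data.table class

# Within each group ref is a single string, so get(ref) fetches that column.
df[, main := get(ref), by = ref]
print(df)
```

Because setDT works by reference, df itself is changed even though the result is never assigned back.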
The conversion should work with any future code in the chain, but both dplyr and data.table are under active development, so to be on the safe side, one could instead use data.table(), which converts a copy and leaves df itself unchanged:
df %>% data.table %>% .[, main := get(ref), by = ref]
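For comparison, a dplyr-only variant is sketched below. It is not part of the answer above; it assumes the set of candidate columns (a, b, c) is small and known in advance, so each one can be spelled out in case_when. The name result is illustrative.

```r
library(dplyr)

df <- data.frame(a = 1:5, b = 2:6, c = 3:7,
                 ref = c("a", "a", "b", "b", "c"),
                 stringsAsFactors = FALSE)

# Assumption: ref only ever contains "a", "b", or "c".
# Each branch picks the value from the column that ref names.
result <- df %>%
  mutate(main = case_when(
    ref == "a" ~ a,
    ref == "b" ~ b,
    ref == "c" ~ c
  ))

print(result)
```

This stays inside the dplyr chain without changing the class of df, at the cost of listing every referenced column by hand; rows whose ref matches none of the branches would get NA.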