dplyr + "meta"-columns: when a column contains names of other columns to use instead of the data


Problem description



I wonder if the following question has an elegant solution in dplyr.

To provide a simple reproducible example, consider the following data.frame:

df <- data.frame( a=1:5, b=2:6, c=3:7,
                  ref=c("a","a","b","b","c"), 
                  stringsAsFactors = FALSE )

Here a,b,c are regular numeric variables while ref is meant to reference which column is the "main" value for that observation. For example:

  a b c ref
1 1 2 3   a
2 2 3 4   a
3 3 4 5   b
4 4 5 6   b
5 5 6 7   c

For example, for observation 3, ref==b and thus column b contains the main value. While for observation 1, ref==a and thus column a contains the main value.

Having this data.frame the question is to create the new column with main values for each observation using dplyr.

  a b c ref main
1 1 2 3   a    1
2 2 3 4   a    2
3 3 4 5   b    4
4 4 5 6   b    5
5 5 6 7   c    7

I'll probably need to use dplyr for that since this one operation is a part of a longer dplyr %>% data transformation chain.

Solution

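Here's a simple, fast way that lets you keep the dplyr-style %>% chain:

library(dplyr)       # provides %>%
library(data.table)
# Convert df to a data.table, then for each group of ref take the column that ref names
df %>% setDT %>% .[, main := get(ref), by = ref]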
#    a b c ref main
# 1: 1 2 3   a    1
# 2: 2 3 4   a    2
# 3: 3 4 5   b    4
# 4: 4 5 6   b    5
# 5: 5 6 7   c    7

Thanks to @akrun for suggesting this fastest approach and for the benchmarks that demonstrate it (see his answer).
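
If staying entirely within dplyr matters more than speed, one possible sketch (not part of the answer above) spells out the lookup with case_when. It assumes the candidate columns are known in advance to be a, b and c; the name result is only illustrative.

```r
library(dplyr)

df <- data.frame(a = 1:5, b = 2:6, c = 3:7,
                 ref = c("a", "a", "b", "b", "c"),
                 stringsAsFactors = FALSE)

# Assumption: ref only ever names one of a, b or c.
# Each branch picks the value from the column that ref points to.
result <- df %>%
  mutate(main = case_when(
    ref == "a" ~ a,
    ref == "b" ~ b,
    ref == "c" ~ c
  ))

result$main
```

This is vectorized, but it needs one branch per candidate column, so it suits a small, fixed set of columns rather than an open-ended one.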

setDT modifies the class of df so you won't have to convert to data.table again in future chains.


The conversion should work with any later code in the chain, but both dplyr and data.table are under active development, so to be on the safe side, one could instead use data.table(), which returns a new object rather than converting df in place:

df %>% data.table %>% .[, main := get(ref), by = ref]
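
To make the difference between the two conversions concrete, here is a minimal sketch outside of a pipe. The helper make_df and the names df_orig, dt_new and df_inplace are hypothetical and exist only for this illustration.

```r
library(data.table)

# Hypothetical helper that rebuilds the example data.frame
make_df <- function() {
  data.frame(a = 1:5, b = 2:6, c = 3:7,
             ref = c("a", "a", "b", "b", "c"),
             stringsAsFactors = FALSE)
}

# data.table() returns a new object; the original data.frame is untouched
df_orig <- make_df()
dt_new <- data.table(df_orig)
dt_new[, main := get(ref), by = ref]
is.data.table(df_orig)       # FALSE
"main" %in% names(df_orig)   # FALSE

# setDT() converts its argument to a data.table by reference
df_inplace <- make_df()
setDT(df_inplace)
df_inplace[, main := get(ref), by = ref]
is.data.table(df_inplace)    # TRUE
```

The copying variant costs extra memory on large data but leaves the input untouched, which may be preferable when other code still expects a plain data.frame.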
