Passing column name as parameter to a function using dplyr
Question
I have a data frame like the one below:
transid<-c(1,2,3,4,5,6,7,8)
accountid<-c(a,a,b,a,b,b,a,b)
month<-c(1,1,1,2,2,3,3,3)
amount<-c(10,20,30,40,50,60,70,80)
transactions<-data.frame(transid,accountid,month,amount)
I am trying to write a function that computes the total monthly amount for each accountid using dplyr verbs.
my_sum<-function(df,col1,col2,col3){
df %>% group_by_(col1,col2) %>%summarise_(total_sum = sum(col3))
}
my_sum(transactions, "accountid","month","amount")
to get a result like the following:
accountid month total_sum
a 1 30
a 2 40
a 3 70
b 1 30
b 2 50
b 3 140
I am getting the following error: Error in sum(col3) : invalid 'type' (character) of argument. How can I pass a column name as a parameter to the summarise function without quotes?
Recommended answer
I would suggest the following solution:
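library(dplyr)

my_sum <- function(df, col_to_sum, ...) {
  # Capture the column to sum as a quosure
  col_to_sum <- enquo(col_to_sum)
  # Capture the grouping columns passed through ...
  grouping_cols <- quos(...)
  df %>%
    group_by(!!!grouping_cols) %>%
    summarise(total_sum = sum(!!col_to_sum)) %>%
    ungroup()
}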
transactions %>% my_sum(amount, accountid, month)
Results
> transactions %>% my_sum(amount, accountid, month)
# A tibble: 6 x 3
accountid month total_sum
<fctr> <dbl> <dbl>
1 a 1 30
2 a 2 40
3 a 3 70
4 b 1 30
5 b 2 50
6 b 3 140
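The question originally called the function with quoted column names. A minimal sketch of that variant follows, assuming dplyr 1.0 or later (for across(), all_of(), the .data pronoun and the .groups argument). The name my_sum_chr and its argument layout are hypothetical and not part of the original answer.

```r
library(dplyr)

# Same data as in the question, with accountid given as quoted strings
transactions <- data.frame(
  transid   = c(1, 2, 3, 4, 5, 6, 7, 8),
  accountid = c("a", "a", "b", "a", "b", "b", "a", "b"),
  month     = c(1, 1, 1, 2, 2, 3, 3, 3),
  amount    = c(10, 20, 30, 40, 50, 60, 70, 80)
)

# Hypothetical string-based variant: column names arrive as character values
my_sum_chr <- function(df, col_to_sum, group_cols) {
  df %>%
    # all_of() selects the grouping columns named in the character vector
    group_by(across(all_of(group_cols))) %>%
    # .data[[...]] looks up the column to sum by its name;
    # .groups = "drop" returns an ungrouped result
    summarise(total_sum = sum(.data[[col_to_sum]]), .groups = "drop")
}

res_chr <- my_sum_chr(transactions, "amount", c("accountid", "month"))
print(res_chr)
```

This keeps the call close to my_sum(transactions, "accountid", "month", "amount") from the question, at the cost of losing the unquoted style used in the answer above.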
Data
In your original question you passed unquoted strings. I worked around that using the Hmisc::Cs function but, in principle, you should surround your strings with "" unless, of course, you are referring to objects named a, b and so forth. It wasn't clear from the original question.
Data used:
transid <- c(1, 2, 3, 4, 5, 6, 7, 8)
accountid <- Hmisc::Cs(a, a, b, a, b, b, a, b)
month <- c(1, 1, 1, 2, 2, 3, 3, 3)
amount <- c(10, 20, 30, 40, 50, 60, 70, 80)
transactions <- data.frame(transid, accountid, month, amount)
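For readers without Hmisc installed, the same data can be built with plain quoted strings, as the remark above recommends. This is only a sketch of that equivalent construction in base R.

```r
# Same columns as above; accountid is written with quoted strings
# instead of Hmisc::Cs()
transid   <- c(1, 2, 3, 4, 5, 6, 7, 8)
accountid <- c("a", "a", "b", "a", "b", "b", "a", "b")
month     <- c(1, 1, 1, 2, 2, 3, 3, 3)
amount    <- c(10, 20, 30, 40, 50, 60, 70, 80)
transactions <- data.frame(transid, accountid, month, amount)
```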
Notes
- If you look at the Capturing multiple variables section of the Programming with dplyr article, you will see that a very similar problem is solved using the quos() function. In effect, your task is a perfect example of how quos() should be used.
- The ellipsis ... should then come at the end, as the assumption is that the function will be used to group data by multiple columns. Naturally, if desired, you could pass the columns one by one and enquo() every single column, but using ... is more natural and consistent with the recommended solution discussed in the article mentioned above. Please note that this approach changes the order of arguments in your function call, as ... should come at the end.
- If you are using summarise(), you don't have to ungroup() your data as in my example when there is a single grouping variable, because summarise() drops the last grouping level. For instance, the code:
mtcars %>% group_by(am) %>% summarise(mean_disp = mean(disp)) %>% mutate(am = am + 1)
will work, whereas the code:
mtcars %>% group_by(am) %>% mutate(am = am + 1)
will return the expected error:
Error in mutate_impl(.data, dots) : Column `am` can't be modified because it's a grouping variable
- You should use ungroup() if you are going to mutate() your original data or do other operations that keep your grouping variable intact. Passing a grouped tibble along may later prove problematic; I would say it's mostly a matter of taste/order in your dplyr workflow. If you and other users of the function are going to remember that the tibble may be carrying a grouping variable, then there is no issue; personally, I tend to forget about that, so my preference is to ungroup() the data if I'm not interested in carrying the grouping variable.
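To make the grouping remarks above concrete, here is a small sketch that inspects which grouping variables survive summarise(). It uses the built-in mtcars data and dplyr's group_vars(); the object names one_group and two_groups are just illustrative.

```r
library(dplyr)

# One grouping variable: summarise() drops it, so nothing is left grouped
one_group <- mtcars %>%
  group_by(am) %>%
  summarise(mean_disp = mean(disp))
group_vars(one_group)            # character(0)

# Two grouping variables: only the last one (cyl) is dropped.
# suppressMessages() hides the informational note newer dplyr versions print.
two_groups <- suppressMessages(
  mtcars %>%
    group_by(am, cyl) %>%
    summarise(mean_disp = mean(disp))
)
group_vars(two_groups)           # "am"

# ungroup() removes whatever grouping remains
group_vars(ungroup(two_groups))  # character(0)
```

Since my_sum() in the answer groups by two columns, its result would still be grouped by accountid without the final ungroup() call.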