dplyr: standard evaluation for mutate with quoted variable names
How would I go about using mutate (my presumption is that I am looking for standard evaluation in my case, and hence mutate_, but I am not entirely confident on this point) when using a function that accepts a list of variable names, such as this:
createSum = function(data, variableNames) {
  data %>%
    mutate_(sumvar = interp(~ sum(var, na.rm = TRUE),
                            var = as.name(paste(as.character(variableNames), collapse = ","))))
}
Here is an MWE that strips the function to its core logic and demonstrates what I am trying to achieve:
library(dplyr)
library(lazyeval)

# function to make random table with given column names
makeTable = function(colNames, sampleSize) {
  liSample = lapply(colNames, function(week) {
    sample = rnorm(sampleSize)
  })
  names(liSample) = as.character(colNames)
  return(tbl_df(data.frame(liSample, check.names = FALSE)))
}

# create some sample data with the column name patterns required
weekDates = seq.Date(from = as.Date("2014-01-01"),
                     to = as.Date("2014-08-01"), by = "week")
dfTest = makeTable(weekDates, 10)

# test mutate on this table
dfTest %>%
  mutate_(sumvar = interp(~ sum(var, na.rm = TRUE),
                          var = as.name(paste(as.character(weekDates), collapse = ","))))
Expected output here is what would be returned by:
rowSums(dfTest[, as.character(weekDates)])
I think this is what you're after:
createSum = function(data, variableNames) {
  data %>%
    mutate_(sumvar = paste(as.character(variableNames), collapse = "+"))
}
createSum(dfTest, weekDates)
Here we just supply a character string rather than using interp, because you can't pass a list of names as a single parameter to a function. Also, sum() would collapse everything into a single value: operations inside mutate are not performed row-wise, since whole column vectors are passed in at once.
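To make the difference concrete, here is a minimal base R sketch. The data frame `df` and its columns `a` and `b` are made up for illustration and are not part of the question's data.

```r
# Hypothetical toy data: two numeric columns
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

# sum() receives both whole columns and collapses them into one number
total <- sum(df$a, df$b)

# `+` is vectorised, so it adds the columns element by element (row by row)
row_totals <- df$a + df$b

# rowSums() gives the same row-wise totals for any number of columns
row_totals_rs <- as.numeric(rowSums(df[c("a", "b")]))

print(total)       # a single value
print(row_totals)  # one value per row
```

This is why pasting the column names together with "+" produces a per-row sum, while wrapping them in sum() would repeat one grand total on every row.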
The other problem with this example is that you set check.names = FALSE in your data.frame, which means you've created column names that are not valid symbols. Without back-ticks, a name such as 2014-01-01 is parsed as the arithmetic expression 2014 - 1 - 1 rather than as a column name. You can explicitly wrap your variable names in back-ticks if you like:
createSum(dfTest, paste0("`", weekDates, "`"))
but in general it would be better not to use invalid names.
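For readers on a newer dplyr: mutate_() and the lazyeval approach have since been deprecated. The sketch below is one possible modern equivalent, not the method from the answer above. It assumes dplyr 1.0.0 or later (for across()) and selects columns by string with all_of(), so date-like names need no back-ticks. The small example data is made up for illustration.

```r
library(dplyr)

# Hypothetical rewrite of createSum(); assumes dplyr >= 1.0.0
createSum <- function(data, variableNames) {
  cols <- as.character(variableNames)
  data %>%
    # across() returns the selected columns as a data frame;
    # rowSums() then adds them up row by row
    mutate(sumvar = rowSums(across(all_of(cols)), na.rm = TRUE))
}

# Small made-up table with date-like, non-syntactic column names
weekDates <- seq.Date(from = as.Date("2014-01-01"), by = "week", length.out = 3)
cols <- replicate(length(weekDates), rnorm(5), simplify = FALSE)
names(cols) <- as.character(weekDates)
dfTest <- data.frame(cols, check.names = FALSE)

result <- createSum(dfTest, weekDates)
```

Because the columns are selected by name rather than pasted into an expression, this version behaves the same whether or not the names are valid R symbols.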