dplyr: standard evaluation for mutate with quoted variable names

Problem description

How would I use mutate (I presume I need standard evaluation here, and hence mutate_, but I am not entirely confident of that) inside a function that accepts a list of variable names, such as this one:

createSum = function(data, variableNames) {
  data %>% 
    mutate_(sumvar = interp(~ sum(var, na.rm = TRUE), 
                            var = as.name(paste(as.character(variableNames), collapse =","))))

}

Here is a minimal working example (MWE) that strips the function down to its core logic and demonstrates what I am trying to achieve:

library(dplyr)
library(lazyeval)

# function to make random table with given column names
makeTable = function(colNames, sampleSize) {
  liSample = lapply(colNames, function(week) {
    sample = rnorm(sampleSize)
  })
  names(liSample) = as.character(colNames)
  return(tbl_df(data.frame(liSample, check.names = FALSE)))
}

# create some sample data with the column name patterns required
weekDates = seq.Date(from = as.Date("2014-01-01"),
                     to = as.Date("2014-08-01"), by = "week")
dfTest = makeTable(weekDates, 10)

# test mutate on this table
dfTest %>% 
  mutate_(sumvar = interp(~ sum(var, na.rm = TRUE), 
                          var = as.name(paste(as.character(weekDates), collapse =","))))

Expected output here is what would be returned by:

rowSums(dfTest[, as.character(weekDates)])
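For comparison, the target computation can be wrapped in a plain base-R function that takes the column names as a character vector, with no dplyr or lazyeval involved. This is a minimal sketch; the function name createSumBase and the toy data frame are hypothetical and not part of the original post.

```r
# Hypothetical base-R helper (not from the original post): adds a column
# `sumvar` holding the row-wise sums of the named columns.
createSumBase <- function(data, variableNames) {
  # as.character() turns Date values into strings such as "2014-01-01",
  # which match column names created with check.names = FALSE
  cols <- as.character(variableNames)
  # drop = FALSE keeps a data frame even when only one column is selected
  data$sumvar <- rowSums(data[, cols, drop = FALSE])
  data
}

# Hypothetical toy data
toy <- data.frame(a = c(1, 2), b = c(10, 20), c = c(100, 200))
createSumBase(toy, c("a", "b"))$sumvar   # [1] 11 22
```

With the question's data, createSumBase(dfTest, weekDates)$sumvar should match the rowSums() call above, since both select the same columns by their character names.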

Solution

I think this is what you're after:

createSum = function(data, variableNames) {
    data %>% 
        mutate_(sumvar = paste(as.character(variableNames), collapse ="+"))
}
createSum(dfTest, weekDates)

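where we just supply a single character string (such as "a+b+c") rather than using interp(), because you can't pass a list of names as a single parameter to a function: as.name(paste(..., collapse = ",")) simply creates one symbol whose name happens to contain commas. Also, sum() would collapse all the values into a single number, because operations are not performed row-wise: each function is passed whole columns as vectors.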
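Both points can be checked in base R without dplyr. This sketch uses a hypothetical toy data frame with columns a and b; with() is used here only as a rough stand-in for mutate(), since both evaluate an expression with whole columns in scope.

```r
# Pasting names with "," and calling as.name() gives ONE symbol whose name
# contains commas, not a list of column names.
v <- c("a", "b", "c")
s <- as.name(paste(v, collapse = ","))
is.name(s)          # TRUE: a single symbol
as.character(s)     # "a,b,c"

# Hypothetical toy data
toy <- data.frame(a = c(1, 2), b = c(10, 20))

# Functions receive whole column vectors, so sum() collapses everything
# into one number (1 + 2 + 10 + 20) ...
with(toy, sum(a, b))   # 33
# ... whereas `+` is vectorised and works element by element, i.e. per row.
with(toy, a + b)       # 11 22

# The answer's idea in base R: build "a+b" as text, then evaluate it with
# the data frame's columns in scope.
expr_text <- paste(c("a", "b"), collapse = "+")   # "a+b"
eval(parse(text = expr_text), envir = toy)        # 11 22
```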

The other problem with this example is that you set check.names=FALSE in your data.frame() call, which means you've created column names (such as 2014-01-01) that are not valid R symbols. You can explicitly wrap your variable names in backticks if you like:

createSum(dfTest , paste0("`", weekDates,"`"))

but in general it would be better not to use invalid names.
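The name problem is easy to see by parsing a name as code: unquoted, a date-like name is arithmetic. Separately, the underscore verbs such as mutate_() were deprecated in dplyr 0.7.0, and lazyeval has since been superseded by rlang's tidy evaluation. In that style, the answer's idea of joining names with + can be expressed by building a call object from symbols instead of text. This is a minimal sketch under those assumptions; createSumTidy and the toy data are hypothetical, and it needs a dplyr version that supports `!!` (0.7.0 or later).

```r
library(dplyr)

# Hypothetical toy data with date-like column names, built like the
# question's table (check.names = FALSE keeps the names as written).
toy <- data.frame(`2014-01-01` = c(1, 2), `2014-01-08` = c(10, 20),
                  check.names = FALSE)

# Parsed as code, an unquoted date-like name is arithmetic: 2014 - 1 - 1 ...
eval(parse(text = "2014-01-01"))             # 2012
# ... while a backticked name is a single symbol, which is why the
# paste0("`", weekDates, "`") call above works.
is.name(parse(text = "`2014-01-01`")[[1]])   # TRUE

# Hypothetical tidy-evaluation version of createSum(): turn the strings
# into symbols, fold them into the call a + b + ..., then splice it with !!.
createSumTidy <- function(data, variableNames) {
  symbols <- lapply(as.character(variableNames), as.name)
  sumExpr <- Reduce(function(lhs, rhs) call("+", lhs, rhs), symbols)
  data %>% mutate(sumvar = !!sumExpr)
}

createSumTidy(toy, names(toy))$sumvar        # 11 22

# Alternatively, make the names syntactic up front:
make.names(names(toy))                       # "X2014.01.01" "X2014.01.08"
```

Because the call is built from symbols rather than parsed text, non-syntactic names need no backticks. Unlike the question's sum(..., na.rm = TRUE), a chain of + returns NA for any row containing a missing value.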
