Using dplyr within a function: grouping error with function arguments


Problem description

Below I have a working example of what I would like the function to do, followed by the script for the function, noting where the error occurs.

The error message is:

Error: index out of bounds

I know this usually means R can’t find the variable that’s being called.

Interestingly, in my function example below, if I group only by subgroup_name (which is passed to the function and becomes a column in the newly created data frame), the function successfully regroups by that variable. However, I also want to group by a newly created column (from the melt) called variable.

Similar code used to work for me using regroup(), but that has been deprecated. I am trying to use group_by_() but to no avail.

I have read many other posts and answers and experimented for several hours today, but still without success.

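# Load the packages the pipeline relies on
library(dplyr)     # select(), group_by(), summarise(), %>%
library(reshape2)  # melt()

# Initialize example dataset
database <- ggplot2::diamonds
database$diamond <- row.names(database) # needed for melting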

subgroup_name <- "cut" # can replace with  "color" or "clarity"
subgroup_column <- 2 # can replace with 3 for color, 4 for clarity

# This works, although it would be preferable not to need separate variables for subgroup_name and subgroup_column number

df <- database %>% 
  select(diamond, subgroup_column, x,y,z) %>% 
  melt(id.vars=c("diamond", subgroup_name)) %>% 
  group_by(cut, variable) %>% 
  summarise(value = round(mean(value, na.rm = TRUE),2))

# This does not work, I am expecting the same output as above

subgroup_analysis <- function(database,...){

  df <- database %>% 
    select(diamond, subgroup_column, x,y,z) %>% 
    melt(id.vars=c("diamond", subgroup_name)) %>% 
    group_by_(subgroup_name, variable) %>% # problem appears to be with finding "variable"
    summarise(value = round(mean(value, na.rm = TRUE),2))
  print(df)
}

subgroup_analysis(database, subgroup_column, subgroup_name)  

Solution

From the NSE vignette:

If you also want to output variables to vary, you need to pass a list of quoted objects to the .dots argument:
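As a rough illustration of what "quoted objects" means in that sentence, the base-R sketch below contrasts evaluating a name with merely capturing it. The data frame `toy` and the names `sym` and `dots` are made up for this example and do not appear in the original question.

```r
# Hypothetical toy data, used only for illustration
toy <- data.frame(variable = c("x", "y", "x"), value = c(1, 2, 3))

# quote() captures the name itself without evaluating it
sym <- quote(variable)
is.name(sym)  # sym is a symbol, not the column's contents

# The symbol is looked up only when it is evaluated inside the data
eval(sym, toy)

# A list of quoted objects, the kind of value the vignette sentence refers to
dots <- list(as.name("cut"), quote(variable))
```

Because the symbol is resolved later, against the data, it can refer to a column such as variable that does not exist as an object in the calling environment.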

Here, variable should be quoted:

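# Name the arguments so the function uses the values passed in, not the globals
subgroup_analysis <- function(database, subgroup_column, subgroup_name){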

  df <- database %>% 
    select(diamond, subgroup_column, x,y,z) %>% 
    melt(id.vars=c("diamond", subgroup_name)) %>% 
    # quote(variable) refers to the column created by melt()
    group_by_(subgroup_name, quote(variable)) %>% 
    summarise(value = round(mean(value, na.rm = TRUE),2))
  print(df)
}

subgroup_analysis(database, subgroup_column, subgroup_name) 
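For readers on a recent dplyr, the same idea can be sketched without the underscore verbs. This is not the approach from the answer above: it assumes dplyr 1.0.0 or later (for `across()` and `all_of()`) and tidyr 1.0.0 or later (for `pivot_longer()`, used here in place of `melt()`). The function name `subgroup_summary` and the data frame `toy` are hypothetical.

```r
library(dplyr)
library(tidyr)

# Sketch only: the grouping column is passed as a string, so no separate
# column number or row-id column is needed
subgroup_summary <- function(data, subgroup_name) {
  data %>%
    select(all_of(subgroup_name), x, y, z) %>%
    # Reshape x, y, z into long form: one "variable" / "value" pair per row
    pivot_longer(c(x, y, z), names_to = "variable", values_to = "value") %>%
    # Group by the column named in subgroup_name plus the new "variable" column
    group_by(across(all_of(c(subgroup_name, "variable")))) %>%
    summarise(value = round(mean(value, na.rm = TRUE), 2), .groups = "drop")
}

# Hypothetical toy data standing in for ggplot2::diamonds
toy <- data.frame(
  cut = c("Good", "Good", "Ideal"),
  x = c(1, 3, 5), y = c(2, 4, 6), z = c(3, 5, 7)
)
result <- subgroup_summary(toy, "cut")
```

Passing the column name as a string keeps the call site simple, e.g. `subgroup_summary(toy, "cut")`, and the function returns its result instead of printing it.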

As mentioned by @RichardScriven, if you plan to assign the result to a new variable, you may want to remove the print call at the end and just write df, or not even assign df at all in the function.

Otherwise, the result prints even when you do x <- subgroup_analysis(...).
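The return behaviour described above can be seen in a small base-R sketch. The helper `summarise_quietly` is hypothetical and is only there to show that a function whose last expression is its result prints nothing on assignment.

```r
# Hypothetical helper, used only to show the return behaviour
summarise_quietly <- function(values) {
  # The last expression is the return value; nothing is printed here
  round(mean(values, na.rm = TRUE), 2)
}

x <- summarise_quietly(c(1, 2, NA, 4))  # assignment prints nothing
x                                       # printed only when asked for
```

Calling the function at the top level without assigning it still auto-prints the value, so interactive use is unaffected.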
