Error when combining dplyr inside a function

Question

I'm trying to figure out what I'm doing wrong here. Using the following sample data, I compute some frequencies with dplyr:

library(dplyr)

group.count <- c(101, 99, 4)
data <- data.frame(
  by = rep(3:1, group.count),
  y = rep(letters[1:3], group.count))

data %>%
  group_by(by) %>%
  summarise(non.miss = sum(!is.na(y)))

This gives me the result I'm looking for. However, when I try to do the same thing in a function:

res0 <- function(x1, x2) {
  output = data %>%
    group_by(x2) %>%
    summarise(non.miss = sum(!is.na(x1)))
}

res0(y, by)

I get an error (index out of bounds). Can anybody tell me what I'm missing?
Thanks in advance.

Solution

I suggest renaming your data frame to df (for example, df <- data), since data is also the name of a function in R.

This is basically what you have done:

df %>%  
  group_by(by) %>%
  summarise(non.miss = sum(!is.na(y)))

which produces this:

#  by non.miss
#1  1        4
#2  2       99
#3  3      101

To count the number of observations per group, you could also use length, which gives the same answer here because y has no missing values:

df %>%  
  group_by(by) %>%
  summarise(non.miss = length(y))


#  by non.miss
#1  1        4
#2  2       99
#3  3      101
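
The two expressions are not interchangeable in general: sum(!is.na(y)) counts only non-missing values, while length(y) counts every row in the group. As a small sketch of where they differ (the data frame df_na below is made up for illustration and is not part of the question):

```r
library(dplyr)

# Hypothetical data, not from the question: group 1 contains one missing value.
df_na <- data.frame(
  by = c(1, 1, 1, 2, 2),
  y  = c("a", NA, "a", "b", "b")
)

counts <- df_na %>%
  group_by(by) %>%
  summarise(
    non.miss = sum(!is.na(y)),  # non-missing values only
    n.rows   = length(y)        # all rows, including the NA
  )

counts
```

With this input the two columns disagree for group 1, so which one to use depends on whether missing values should be counted.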

Or use tally(), which gives this:

df %>%
  group_by(by) %>%
  tally()

#  by   n
#1  1   4
#2  2  99
#3  3 101
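
As a shorter alternative, dplyr also provides count(), which groups and tallies in a single call. A minimal sketch, rebuilding the question's data under the name df so the block runs on its own:

```r
library(dplyr)

group.count <- c(101, 99, 4)
df <- data.frame(
  by = rep(3:1, group.count),
  y  = rep(letters[1:3], group.count)
)

# count(by) is shorthand for group_by(by) %>% tally()
res <- df %>% count(by)
res
```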

If you really want to, you can wrap this in a function that takes the data frame as its input:

res0 <- function(df) {
  df %>%
    group_by(by) %>%
    tally()
}

res0(df)

#       by   n
#1       1   4
#2       2  99
#3       3 101

This of course assumes that your data frame will always have a grouping column named 'by'. I realize that these data are just fictional, but it might be a good idea to avoid naming a column 'by', because by is itself a function in R and the code can become confusing to read.
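
If the column names do need to be arguments, as in the original res0(y, by) call, one option is dplyr's tidy-evaluation "embrace" operator {{ }}. In the original attempt, x1 and x2 are not replaced by the column names y and by inside group_by() and summarise(), which is why the call fails. The sketch below assumes a reasonably recent dplyr (1.0 or later is enough); the function name count_non_missing and its parameter names are illustrative and do not come from the original answer.

```r
library(dplyr)

group.count <- c(101, 99, 4)
df <- data.frame(
  by = rep(3:1, group.count),
  y  = rep(letters[1:3], group.count)
)

# Hypothetical helper: `value` and `group` are passed as unquoted column names.
# {{ }} forwards them to dplyr so they are looked up as columns of `data`.
count_non_missing <- function(data, value, group) {
  data %>%
    group_by({{ group }}) %>%
    summarise(non.miss = sum(!is.na({{ value }})))
}

res <- count_non_missing(df, y, by)
res
```

Passing the data frame as the first argument also keeps the function usable in a pipe, for example df %>% count_non_missing(y, by).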
