Error when combining dplyr inside a function
I'm trying to figure out what I'm doing wrong here. Using the following training data I compute some frequencies using dplyr:
library(dplyr)

group.count <- c(101, 99, 4)
data <- data.frame(
  by = rep(3:1, group.count),
  y  = rep(letters[1:3], group.count))
data %>%
  group_by(by) %>%
  summarise(non.miss = sum(!is.na(y)))
This gives me the outcome I'm looking for. However, when I try to wrap it in a function:
res0 <- function(x1, x2) {
  output = data %>%
    group_by(x2) %>%
    summarise(non.miss = sum(!is.na(x1)))
}
res0(y, by)
I get an error (index out of bounds).
Can anybody tell me what I'm missing?
Thanks in advance.
I suggest changing the name of your dataframe to df.
This is basically what you have done:
df %>%
  group_by(by) %>%
  summarise(non.miss = sum(!is.na(y)))
which produces this:
#   by non.miss
# 1  1        4
# 2  2       99
# 3  3      101
To count the number of observations per group, you could also use length, which gives the same answer here because y has no missing values:
df %>%
  group_by(by) %>%
  summarise(non.miss = length(y))
#   by non.miss
# 1  1        4
# 2  2       99
# 3  3      101
Or use tally, which gives this:
df %>%
  group_by(by) %>%
  tally()
#   by   n
# 1  1   4
# 2  2  99
# 3  3 101
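One caveat on the three versions above: they agree only because y contains no missing values. The following is a minimal sketch with a made-up data frame (toy is hypothetical, not part of the question) showing how the counts differ once an NA appears. It uses n(), dplyr's helper that returns the number of rows in the current group.

```r
library(dplyr)

# Hypothetical toy data: group 1 has one missing value in y
toy <- data.frame(
  by = c(1, 1, 2),
  y  = c("a", NA, "b")
)

counts <- toy %>%
  group_by(by) %>%
  summarise(
    non.miss = sum(!is.na(y)),  # counts non-missing values only
    len      = length(y),       # counts every row, NA included
    rows     = n()              # counts every row, like tally()
  )

print(counts)
```

So if the goal really is "non-missing values per group", sum(!is.na(y)) is the version to keep; length and tally count rows regardless of NA.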
Now, if you really wanted to, you could put the tally version into a function that takes the data frame as its input, like this:
res0 <- function(df) {
  df %>%
    group_by(by) %>%
    tally()
}
res0(df)
#   by   n
# 1  1   4
# 2  2  99
# 3  3 101
This of course assumes that your data frame will always have a grouping column named 'by'. I realize that these data are just fictional, but it might be a good idea to avoid naming a column 'by', because by is also a function in R, and code that uses it as a column name can be confusing to read.
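If you do want to pass the column names as arguments, as in the original res0(y, by) call, one possible approach is the embrace operator {{ }}. This is a sketch that assumes a reasonably recent dplyr (1.0.0 or later, with rlang 0.4.0 or later), which is newer than the version the question was written against. The function name count_non_missing and its argument names are made up for illustration.

```r
library(dplyr)

# Same fictional data as in the question, stored under the name df
group.count <- c(101, 99, 4)
df <- data.frame(
  by = rep(3:1, group.count),
  y  = rep(letters[1:3], group.count)
)

# Hypothetical helper: count non-missing values of value_col
# within each level of group_col. {{ }} forwards the unquoted
# column names into the dplyr verbs.
count_non_missing <- function(df, value_col, group_col) {
  df %>%
    group_by({{ group_col }}) %>%
    summarise(non.miss = sum(!is.na({{ value_col }})))
}

res <- count_non_missing(df, y, by)
print(res)
```

Taking the data frame as the first argument, rather than relying on a global object called data, also keeps the function usable inside a pipe.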