dplyr: Use a custom function in summarize() after group_by()
Question
How can we use a custom function after group_by()?
I checked similar posts (1, 2, and 3), but my current code returns the same values for all groups.
> data
  village     A     Z     Y
  <chr>   <int> <int> <dbl>
1 a           1     1   500
2 a           1     1   400
3 a           1     0   800
4 b           1     0   300
5 b           1     1   700
z <- 1
data %>%
  group_by(village) %>%
  summarize(Y_village = Y_hat_village(., z))
Y_hat_village <- function(data_village, z){
  # Calculate the mean for a specific z in a village
  data_z <- data_village %>% filter(Z==get("z"))
  return(mean(data_z$Y))
}
I want to have (500 + 400)/2 = 450 for village "a" and 700 for village "b".
Recommended answer
It's easier to understand if you start by writing it without an extra function. In that case it would be:
df %>%
  group_by(village) %>%
  summarize(Y_village = mean(Y[Z == z]))
## A tibble: 2 x 2
# village Y_village
# <fct> <dbl>
#1 a 450.
#2 b 700.
Hence, your function should be something like
Y_hat_village <- function(Ycol, Zcol, z){
  mean(Ycol[Zcol == z])
}
Then use it:
df %>%
  group_by(village) %>%
  summarize(Y_village = Y_hat_village(Y, Z, z))
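For anyone who wants to run this end to end, here is a minimal sketch. It assumes the data frame is called df (as in the answer) and rebuilds it by hand from the printout in the question; a plain data.frame is used for simplicity, so the column types are an approximation of the original tibble.

```r
library(dplyr)

# Example data rebuilt from the question's printout (assumed to match the original)
df <- data.frame(
  village = c("a", "a", "a", "b", "b"),
  A       = c(1L, 1L, 1L, 1L, 1L),
  Z       = c(1L, 1L, 0L, 0L, 1L),
  Y       = c(500, 400, 800, 300, 700)
)

z <- 1

# Mean of Y over the rows where Z equals z; works on plain vectors
Y_hat_village <- function(Ycol, Zcol, z) {
  mean(Ycol[Zcol == z])
}

# Inside summarize(), Y and Z refer to the current group's columns only
result <- df %>%
  group_by(village) %>%
  summarize(Y_village = Y_hat_village(Y, Z, z))

print(result)
```

Because summarize() evaluates the expression once per group, the function only ever sees the Y and Z values of one village at a time.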
Note that the function I wrote only deals with atomic vectors, which you can supply directly from within summarise. You don't need to supply the whole data.frame into it.
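This also suggests why the code in the question returned the same value for every village. As far as I understand the magrittr placeholder, `.` stands for the whole data piped into summarize(), not the current group's rows, so the function pools all villages each time. The base R sketch below illustrates the difference; the data frame is rebuilt by hand from the question's printout, and the names pooled and per_village are only illustrative.

```r
# Example data rebuilt from the question's printout (assumed to match the original)
data <- data.frame(
  village = c("a", "a", "a", "b", "b"),
  A       = c(1L, 1L, 1L, 1L, 1L),
  Z       = c(1L, 1L, 0L, 0L, 1L),
  Y       = c(500, 400, 800, 300, 700)
)

z <- 1

# Vector-based helper from the answer
Y_hat_village <- function(Ycol, Zcol, z) {
  mean(Ycol[Zcol == z])
}

# Feeding the full columns pools every village: (500 + 400 + 700) / 3
pooled <- Y_hat_village(data$Y, data$Z, z)

# Feeding one village's columns at a time gives the per-village means
per_village <- sapply(
  split(data, data$village),
  function(d) Y_hat_village(d$Y, d$Z, z)
)

print(pooled)
print(per_village)
```

The pooled call mimics what happens when the full data reaches the function, while the split() version mimics what summarize() does when it is handed the bare column names Y and Z.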