dplyr: Use a custom function in summarize() after group_by()

Question

How can we use a custom function after group_by()? I checked similar posts (1, 2, and 3), but my current code returns the same values for all groups.

> data
   village     A     Z      Y 
     <chr> <int> <int>   <dbl> 
 1       a     1     1   500     
 2       a     1     1   400     
 3       a     1     0   800  
 4       b     1     0   300  
 5       b     1     1   700  

library(dplyr)

Y_hat_village <- function(data_village, z){
    # Calculate the mean for a specific z in a village
    data_z <- data_village %>% filter(Z==get("z"))
    return(mean(data_z$Y))
}

z <- 1
data %>%
    group_by(village) %>%
    summarize(Y_village = Y_hat_village(., z))

I want to have (500 + 400)/2 = 450 for village "a" and 700 for village "b", i.e. the mean of Y over the rows with Z == 1 within each village.
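
To reproduce the problem, here is a minimal sketch that rebuilds the printed data (values and column types copied from the printout) and makes the group-versus-whole-data mix-up visible. The variable name wrong is hypothetical. Inside summarize(), the dot (.) is the entire data frame piped in, not the current group, so every village gets the mean of Y over all rows with Z == 1.

```r
library(dplyr)

# Rebuild the example data from the printout in the question
data <- tibble(
  village = c("a", "a", "a", "b", "b"),
  A = c(1L, 1L, 1L, 1L, 1L),
  Z = c(1L, 1L, 0L, 0L, 1L),
  Y = c(500, 400, 800, 300, 700)
)
z <- 1

# `.` is the whole piped (grouped) data frame, not the current group,
# so both villages get (500 + 400 + 700) / 3 instead of their own mean
wrong <- data %>%
  group_by(village) %>%
  summarize(Y_village = mean(.$Y[.$Z == z]))

wrong
```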

Recommended Answer

It's easier to understand if you start by writing it without an extra function. In that case it would be (here df is the data from the question, and z <- 1 as above):

df %>%
  group_by(village) %>%
  summarize(Y_village = mean(Y[Z == z]))

## A tibble: 2 x 2
#  village Y_village
#  <fct>       <dbl>
#1 a            450.
#2 b            700.
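
For comparison, a sketch of a swapped-in alternative (not part of the original answer): filter the rows with Z == z first, then take an ordinary grouped mean. The name filtered_first is hypothetical, and only the columns needed here are rebuilt.

```r
library(dplyr)

# Relevant columns of the question's data
data <- tibble(
  village = c("a", "a", "a", "b", "b"),
  Z = c(1L, 1L, 0L, 0L, 1L),
  Y = c(500, 400, 800, 300, 700)
)
z <- 1

# Keep only rows with Z == z, then average Y within each village
filtered_first <- data %>%
  filter(Z == z) %>%
  group_by(village) %>%
  summarize(Y_village = mean(Y))

filtered_first
```

One design difference: a village with no rows where Z == z disappears from this result entirely, whereas mean(Y[Z == z]) keeps that village and returns NaN for it.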

Hence, your function should be something like:

Y_hat_village <- function(Ycol, Zcol, z){
  # Mean of Ycol over the positions where Zcol equals z
  mean(Ycol[Zcol == z])
}
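
Because this function only sees plain vectors, it can be checked on its own, outside any dplyr pipeline. A minimal sketch using village "a"'s values from the question (the names Y_a and Z_a are hypothetical):

```r
# Same vector-based helper as defined in the answer
Y_hat_village <- function(Ycol, Zcol, z){
  mean(Ycol[Zcol == z])
}

# Village "a" from the question, passed as ordinary vectors
Y_a <- c(500, 400, 800)
Z_a <- c(1L, 1L, 0L)

Y_hat_village(Y_a, Z_a, 1)  # (500 + 400) / 2
```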

Then use it:

df %>%
  group_by(village) %>%
  summarize(Y_village = Y_hat_village(Y, Z, z))

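Note that the function I wrote only deals with atomic vectors, which you can supply directly from within summarise(): after group_by(), column names such as Y and Z inside summarise() refer to the current group's values only. You don't need to pass the whole data.frame into it. (In the question's code, by contrast, the dot (.) refers to the entire data frame piped into summarize(), not to the current group, which is why every village got the same value.)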
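
If you do want to keep a helper that takes a data frame, one option (a swapped-in technique, not from the original answer) is group_modify(), which calls a function once per group with that group's rows. This sketch assumes dplyr 0.8.1 or later, where group_modify() exists; Y_hat_village_df and result are hypothetical names.

```r
library(dplyr)

# Relevant columns of the question's data
data <- tibble(
  village = c("a", "a", "a", "b", "b"),
  Z = c(1L, 1L, 0L, 0L, 1L),
  Y = c(500, 400, 800, 300, 700)
)
z <- 1

# Hypothetical data-frame helper: receives ONE group's rows at a time
Y_hat_village_df <- function(data_village, z){
  mean(data_village$Y[data_village$Z == z])
}

result <- data %>%
  group_by(village) %>%
  # .x is the current group's rows (grouping column excluded);
  # the formula must return a data frame
  group_modify(~ tibble(Y_village = Y_hat_village_df(.x, z))) %>%
  ungroup()

result
```

For a one-number summary like this, the vector-based function in summarise() is simpler; group_modify() is more useful when the helper needs several columns at once or returns more than one row per group.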
