dplyr conditional summarise function


Question

I need a different summary function depending on a condition. For example, using iris, say that for some reason I want the sum of the petal width if the species is setosa, and the mean of the petal width otherwise.

Naively, I wrote this using case_when, which does not work:

library(dplyr)

iris <- tibble::as_tibble(iris)

iris %>% 
  group_by(Species) %>% 
  summarise(pwz = case_when(
    Species == "setosa" ~ sum(Petal.Width, na.rm = TRUE),
    TRUE                ~ mean(Petal.Width, na.rm = TRUE)))

Error in summarise_impl(.data, dots) : Column pwz must be length 1 (a summary value), not 50

I eventually arrived at something like this: summarise with each method, then use mutate to pick the one I actually want:

iris %>% 
  group_by(Species) %>% 
  summarise(pws = sum(Petal.Width, na.rm = TRUE),
            pwm = mean(Petal.Width, na.rm = TRUE)) %>% 
  mutate(pwz = case_when(
    Species == "setosa" ~ pws,
    TRUE                ~ pwm)) %>% 
  select(-pws, -pwm)

But creating all these summarised values and only picking one at the end seems more than a bit awkward, especially when my real case_when is a lot more complicated. Can I not use case_when inside summarise? Do I have my syntax wrong? Any help is appreciated!

Edit: I suppose I should have pointed out that I have multiple conditions/functions (just assume that, depending on the variable, some need mean, sum, max, min, or another summary).

Recommended answer

Using data.table:

library(data.table)
iris2 <- as.data.table(iris)

iris2[, if(Species == 'setosa') sum(Petal.Width) 
        else mean(Petal.Width)
      , by = Species]

More concisely, but maybe not as clear:

iris2[, ifelse(Species == 'setosa', sum, mean)(Petal.Width)
      , by = Species]
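
For the case mentioned in the question's edit, where more than two summary functions are involved, the same data.table idea can be sketched with switch. This is only an illustration: the choice of max for versicolor and the column name pwz are assumptions made for the example, not part of the original answer.

```r
library(data.table)

iris2 <- as.data.table(iris)

# Inside j, the grouping column Species has length 1 for each group,
# so it can be used directly as the selector for switch().
result_dt <- iris2[, .(pwz = switch(as.character(Species),
                                    setosa     = sum(Petal.Width),   # sum for setosa
                                    versicolor = max(Petal.Width),   # assumed rule, for illustration
                                    mean(Petal.Width))),             # default: mean
                   by = Species]

print(result_dt)
```

Every branch returns a double, which keeps the result column type consistent across groups.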

Using dplyr you can do:

iris %>% 
  group_by(Species) %>% 
  summarise(pwz = if_else(first(Species == "setosa")
                          , sum(Petal.Width)
                          , mean(Petal.Width)))
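
The original attempt failed because, within each group, Species is a vector with one element per row (50 here), so case_when returned 50 values instead of one. Reducing the condition to a single value with first() should let case_when be used inside summarise as well, which may be more convenient when there are several conditions. A minimal sketch, assuming a made-up rule of max for versicolor:

```r
library(dplyr)

result <- iris %>%
  group_by(Species) %>%
  summarise(pwz = case_when(
    # first() collapses the per-group Species vector to length 1,
    # so every condition and every result is a single value.
    first(Species) == "setosa"     ~ sum(Petal.Width, na.rm = TRUE),
    first(Species) == "versicolor" ~ max(Petal.Width, na.rm = TRUE),  # assumed rule, for illustration
    TRUE                           ~ mean(Petal.Width, na.rm = TRUE)
  ))

print(result)
```

Note that case_when evaluates every right-hand side for each group, so all of the summaries are still computed; only the selection step is tidier than the summarise-then-mutate workaround.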

Note:

I'm thinking it probably makes more sense to "spread" your data with tidyr::spread so that each day has a column for temperature, rainfall, etc. Then you can use summarise in the usual way.
