dplyr conditional summarise function
Question
I have a situation where I need a different summary function depending on a condition. For example, using iris, say that for some reason I want the sum of the petal width if the species is setosa, and the mean of the petal width otherwise.
Naively, I wrote this using case_when, which does not work:
library(dplyr)
iris <- tibble::as_tibble(iris)
iris %>%
  group_by(Species) %>%
  summarise(pwz = case_when(
    Species == "setosa" ~ sum(Petal.Width, na.rm = TRUE),
    TRUE ~ mean(Petal.Width, na.rm = TRUE)))
Error in summarise_impl(.data, dots) :
  Column `pwz` must be length 1 (a summary value), not 50
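The "not 50" in the message can be reproduced in isolation. The sketch below is only an illustration, run on the built-in iris data: inside each group, Species == "setosa" is evaluated row by row, and case_when recycles length-1 right-hand sides to the length of the condition, so the result has one element per row rather than one per group. The names cond_lengths and recycled are made up for this example.

```r
library(dplyr)

# Within a group the comparison yields one logical value per row,
# so its length equals the group size.
cond_lengths <- iris %>%
  group_by(Species) %>%
  summarise(n_cond = length(Species == "setosa"))

print(cond_lengths)

# case_when() recycles length-1 right-hand sides to the length of the
# condition, so a length-3 condition gives a length-3 result.
recycled <- case_when(c(TRUE, FALSE, TRUE) ~ 1, TRUE ~ 2)
print(length(recycled))
```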
I eventually arrived at something like this: summarise with each method, then use a mutate to pick the one I actually want:
iris %>%
  group_by(Species) %>%
  summarise(pws = sum(Petal.Width, na.rm = TRUE),
            pwm = mean(Petal.Width, na.rm = TRUE)) %>%
  mutate(pwz = case_when(
    Species == "setosa" ~ pws,
    TRUE ~ pwm)) %>%
  select(-pws, -pwm)
But creating all these summarised values and only picking one at the end seems more than a bit awkward, especially when my real case_when is a lot more complicated. Can I not use case_when inside summarise? Do I have my syntax wrong? Any help is appreciated!
Edit: I suppose I should have pointed out that I have multiple conditions/functions (just assume that, depending on the variable, some need mean, sum, max, min, or another summary).
Recommended answer
Using data.table
library(data.table)
iris2 <- as.data.table(iris)
iris2[, if (Species == 'setosa') sum(Petal.Width)
        else mean(Petal.Width),
      by = Species]
More concisely, but maybe not as clear:
iris2[, ifelse(Species == 'setosa', sum, mean)(Petal.Width),
      by = Species]
Using dplyr
You could do:
iris %>%
  group_by(Species) %>%
  summarise(pwz = if_else(first(Species == "setosa"),
                          sum(Petal.Width),
                          mean(Petal.Width)))
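For the case mentioned in the question's edit, where several groups each need a different summary, one possible extension of the same idea is to look the function up by group name. This is only a sketch: summary_funs and its species-to-function mapping are hypothetical, chosen for illustration, and it assumes every group has an entry in the list and that every listed function accepts na.rm.

```r
library(dplyr)

# Hypothetical mapping from group name to summary function.
summary_funs <- list(
  setosa     = sum,
  versicolor = mean,
  virginica  = max
)

result <- iris %>%
  group_by(Species) %>%
  # first(Species) collapses the group's label to a single value, which
  # selects the function to apply to that group's Petal.Width.
  summarise(pwz = summary_funs[[as.character(first(Species))]](
    Petal.Width, na.rm = TRUE))

print(result)
```

Keeping the mapping in a named list means a new condition is one more list entry rather than another branch in a nested if_else.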
Note:
I'm thinking it probably makes more sense to "spread" your data with tidyr::spread so that each day has a column for temperature, rainfall, etc. Then you can use summarise in the usual way.
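A minimal sketch of that idea follows. The real data is not shown in the thread, so weather_long, its column names (day, measure, value), and its numbers are all invented for illustration. Note that tidyr::spread still works but is superseded by tidyr::pivot_wider in current tidyr.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format data: one row per day and measurement.
weather_long <- tibble(
  day     = rep(1:3, each = 2),
  measure = rep(c("temperature", "rainfall"), times = 3),
  value   = c(21.5, 0.0, 19.0, 4.2, 23.1, 1.3)
)

# One column per measurement, one row per day.
weather_wide <- weather_long %>%
  spread(key = measure, value = value)

# Each column now gets its own summary function directly.
weather_summary <- weather_wide %>%
  summarise(mean_temperature = mean(temperature),
            total_rainfall   = sum(rainfall))

print(weather_summary)
```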