dplyr conditional summarise function

Views: 66 Published: 2020/10/26 4:12:17 Tags: r dplyr

Question

I have a situation where I need a different summary function depending on a condition. For example, using iris, say that for some reason I want the sum of the petal width if the species is setosa, and the mean of the petal width otherwise.

Naively, I wrote this using case_when, which does not work:

library(dplyr)

iris <- tibble::as_tibble(iris)

iris %>%
  group_by(Species) %>% 
  summarise(pwz = case_when(
    Species == "setosa" ~ sum(Petal.Width, na.rm = TRUE),
    TRUE                ~ mean(Petal.Width, na.rm = TRUE)))

Error in summarise_impl(.data, dots) : Column pwz must be length 1 (a summary value), not 50

I eventually arrived at something like this: summarise with each method, then use mutate to pick the one I actually want:

iris %>% 
  group_by(Species) %>% 
  summarise(pws = sum(Petal.Width, na.rm = TRUE),
            pwm = mean(Petal.Width, na.rm = TRUE)) %>% 
  mutate(pwz = case_when(
    Species == "setosa" ~ pws,
    TRUE                ~ pwm)) %>% 
  select(-pws, -pwm)

But creating all these summarised values and picking only one at the end seems more than a bit awkward, especially since my real case_when is a lot more complicated. Can I not use case_when inside summarise? Do I have my syntax wrong? Any help is appreciated!

Edit: I suppose I should have pointed out that I have multiple conditions/functions (just assume that, depending on the variable, some need mean, sum, max, min, or another summary).

Recommended answer

Using data.table:

library(data.table)
iris2 <- as.data.table(iris)

iris2[, if(Species == 'setosa') sum(Petal.Width) 
        else mean(Petal.Width)
      , by = Species]

More concisely, but maybe not as clear:

iris2[, ifelse(Species == 'setosa', sum, mean)(Petal.Width)
      , by = Species]
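
For the multiple-condition case mentioned in the question's edit, one possible extension of the data.table approach is switch, which picks a branch by name. This is only a sketch: the versicolor/max branch is invented for illustration, and it relies on Species being a single value inside each by group.

```r
library(data.table)

iris2 <- as.data.table(iris)

# Inside j, the grouping column Species holds one value per group,
# so switch() can choose a summary function by species name.
res_dt <- iris2[, .(pwz = switch(as.character(Species),
                                 setosa     = sum(Petal.Width),
                                 versicolor = max(Petal.Width),  # hypothetical extra case
                                 mean(Petal.Width))),            # default for other species
                by = Species]

res_dt
```

The unnamed last argument of switch acts as the default, playing the role of the TRUE branch in case_when.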

With dplyr you can do:

iris %>% 
  group_by(Species) %>% 
  summarise(pwz = if_else(first(Species == "setosa")
                          , sum(Petal.Width)
                          , mean(Petal.Width)))
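
With more than two cases, the same idea can probably be carried over to case_when. The original attempt failed because Species == "setosa" is evaluated for every row in the group, so case_when returned 50 values instead of one. Wrapping the column in first() reduces each condition to a single value. This is a minimal sketch, assuming the condition column is constant within each group (true here, since Species is the grouping variable). The versicolor/max branch is invented purely for illustration.

```r
library(dplyr)

res <- iris %>%
  group_by(Species) %>%
  summarise(pwz = case_when(
    # first() makes each condition length 1, so the result is length 1
    first(Species) == "setosa"     ~ sum(Petal.Width, na.rm = TRUE),
    first(Species) == "versicolor" ~ max(Petal.Width, na.rm = TRUE),  # hypothetical extra case
    TRUE                           ~ mean(Petal.Width, na.rm = TRUE)
  ))

res
```

Every branch is still computed for each group. case_when only selects which result to keep, so this avoids the intermediate columns but not the extra computation.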

Note:

I'm thinking it probably makes more sense to "spread" your data with tidyr::spread so that each day has a column for temperature, rainfall, etc. Then you can use summarise in the usual way.
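
The asker's real data is not shown, so the following is only a sketch of what that could look like. The weather table, its column names (month, day, variable, value) and its numbers are all made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format data: one row per day and measured variable.
weather <- tibble::tibble(
  month    = rep(c("Jan", "Feb"), each = 4),
  day      = rep(c(1, 1, 2, 2), times = 2),
  variable = rep(c("temperature", "rainfall"), times = 4),
  value    = c(2, 10, 4, 0, 6, 5, 8, 3)
)

# One column per variable, one row per month and day.
wide <- weather %>%
  spread(key = variable, value = value)

# Each column now gets its own summary function, with no conditional logic.
monthly <- wide %>%
  group_by(month) %>%
  summarise(temperature = mean(temperature),
            rainfall    = sum(rainfall))

monthly
```

In current tidyr, spread is superseded by pivot_wider, which could be used here in the same way via its names_from and values_from arguments.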
