Apply a summarise condition to a range of columns when using dplyr group_by?

Published: 2021/12/23 12:54:34. Tags: r, group-by, dplyr


Question


Suppose we want to group_by() and summarise a massive data.frame with very many columns, where some large groups of consecutive columns share the same summarise condition (e.g. max, mean, etc.).

Is there a way to avoid having to specify the summarise condition for each and every column, and instead do it for ranges of columns?

Example

Suppose we want to do this:

iris %>% 
  group_by(Species) %>% 
  summarise(max(Sepal.Length), mean(Sepal.Width), mean(Petal.Length), mean(Petal.Width))

But note that 3 consecutive columns have the same summarise condition: mean(Sepal.Width), mean(Petal.Length), mean(Petal.Width).

Is there a way to use something like mean(Sepal.Width:Petal.Width) to specify the condition for a range of columns, and hence avoid typing out the summarise condition separately for every column in between?

Note

The iris example above is small and manageable, with a range of only 3 consecutive columns, but the actual use case has hundreds.

Solution

The upcoming version 1.0.0 of dplyr will have an across() function that does what you are asking for.

Basic usage

across() has two primary arguments:

The first argument, .cols, selects the columns you want to operate on. It uses tidy selection (like select()) so you can pick variables by position, name, and type.

The second argument, .fns, is a function or list of functions to apply to each column. This can also be a purrr style formula (or list of formulas) like ~ .x / 2. (This argument is optional, and you can omit it if you just want to get the underlying data; you'll see that technique used in vignette("rowwise").)
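As a minimal sketch of the three selection styles mentioned above (by name range, by position, and by type), the following assumes dplyr 1.0.0 or later is installed. The result names by_name, by_position, and by_type are only illustrative.

```r
library(dplyr, warn.conflicts = FALSE)

# By name: every column from Sepal.Width through Petal.Width
by_name <- iris %>%
  group_by(Species) %>%
  summarise(across(Sepal.Width:Petal.Width, mean))

# By position: columns 2 to 4 of iris are the same three columns
by_position <- iris %>%
  group_by(Species) %>%
  summarise(across(2:4, mean))

# By type: every numeric column (the grouping column Species is not selected)
by_type <- iris %>%
  group_by(Species) %>%
  summarise(across(where(is.numeric), mean))

names(by_name)
names(by_type)
```

The name range is usually the safest choice for a wide data frame, since positions shift whenever columns are added or reordered.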

### Install the development version from GitHub first
# install.packages("devtools")
# devtools::install_github("tidyverse/dplyr")
library(dplyr, warn.conflicts = FALSE)

Control how the names are created with the .names argument, which takes a glue spec:

iris %>% 
  group_by(Species) %>% 
  summarise(
    across(c(Sepal.Width:Petal.Width), ~ mean(.x, na.rm = TRUE), .names = "mean_{col}"),
    across(c(Sepal.Length), ~ max(.x, na.rm = TRUE), .names = "max_{col}")
  )
#> # A tibble: 3 x 5
#>   Species    mean_Sepal.Width mean_Petal.Leng~ mean_Petal.Width max_Sepal.Length
#> * <fct>                 <dbl>            <dbl>            <dbl>            <dbl>
#> 1 setosa                 3.43             1.46            0.246              5.8
#> 2 versicolor             2.77             4.26            1.33               7  
#> 3 virginica              2.97             5.55            2.03               7.9
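To connect this back to the "hundreds of columns" case in the question, here is a hedged sketch on made-up data. The data frame wide and its columns grp, peak, and m1 to m200 are hypothetical and exist only for illustration. It uses the {.col} placeholder, which is the form documented in released dplyr versions; the {col} spelling in the answer comes from the development version it was written against.

```r
library(dplyr, warn.conflicts = FALSE)

set.seed(1)

# Hypothetical wide data: one grouping column, one column summarised with max,
# and 200 consecutive measurement columns m1..m200 summarised with mean
wide <- data.frame(
  grp  = rep(c("a", "b"), each = 5),
  peak = runif(10),
  matrix(rnorm(10 * 200), nrow = 10,
         dimnames = list(NULL, paste0("m", 1:200)))
)

res <- wide %>%
  group_by(grp) %>%
  summarise(
    across(peak, ~ max(.x, na.rm = TRUE), .names = "max_{.col}"),
    across(m1:m200, ~ mean(.x, na.rm = TRUE), .names = "mean_{.col}")
  )

dim(res)
```

The whole range m1:m200 is covered by a single across() call, so the summarise() call stays the same length no matter how many columns sit between the two endpoints.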

Using multiple functions

my_func <- list(
  mean = ~ mean(., na.rm = TRUE),
  max  = ~ max(., na.rm = TRUE)
)

iris %>%
  group_by(Species) %>%
  summarise(across(where(is.numeric), my_func, .names = "{fn}.{col}"))
#> # A tibble: 3 x 9
#>   Species    mean.Sepal.Length max.Sepal.Length mean.Sepal.Width max.Sepal.Width
#> * <fct>                  <dbl>            <dbl>            <dbl>           <dbl>
#> 1 setosa                  5.01              5.8             3.43             4.4
#> 2 versicolor              5.94              7               2.77             3.4
#> 3 virginica               6.59              7.9             2.97             3.8
#>   mean.Petal.Length max.Petal.Length mean.Petal.Width max.Petal.Width
#> *             <dbl>            <dbl>            <dbl>           <dbl>
#> 1              1.46              1.9            0.246             0.6
#> 2              4.26              5.1            1.33              1.8
#> 3              5.55              6.9            2.03              2.5

Created on 2020-03-06 by the reprex package (v0.3.0)
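For comparison, a small sketch of what happens when .names is left out while a named list of functions is supplied. This assumes dplyr 1.0.0 or later, where the default naming for a list of functions is the column name followed by the function name. The list name stats is just an illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# A named list of plain functions; the list names become part of the output names
stats <- list(mean = mean, max = max)

res <- iris %>%
  group_by(Species) %>%
  summarise(across(starts_with("Petal"), stats))

names(res)
```

With the default, the output columns come out as Petal.Length_mean, Petal.Length_max, and so on, so .names is only needed when a different pattern is wanted, such as the function name first.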
