R summarise_at dynamically by condition: mean for some columns, sum for others

Published: 2020/10/26 3:58:35. Tags: sql, r, sql-server, dplyr


Problem description

I added the word "dynamically" to the title because, although I use vars(c()) in summarise_at() to keep the examples short and clear, the real use case relies on contains(), starts_with() and matches(, perl = TRUE): I have 50 columns, many needing sum() and some needing mean().

The goal is to generate dynamic SQL with tbl() %>% group_by() %>% summarise_at() %>% collect().

I added the SQL that is generated to my second example.



library(tidyverse)
(mtcars 
  %>% group_by(carb)
  %>% summarise_at(vars(c("mpg","cyl","disp")), list (~mean(.),~sum(.)))
  # I don't want the line below; I would like a condition inside summarise_at(), because I have 50 columns in my real case
  %>% select(carb,cyl_mean,disp_mean,mpg_sum)
)
#> # A tibble: 6 x 4
#>    carb cyl_mean disp_mean mpg_sum
#>   <dbl>    <dbl>     <dbl>   <dbl>
#> 1     1     4.57      134.   177. 
#> 2     2     5.6       208.   224  
#> 3     3     8         276.    48.9
#> 4     4     7.2       309.   158. 
#> 5     6     6         145     19.7
#> 6     8     8         301     15

Created on 2020-02-19 by the reprex package (v0.3.0)

This works, but I want only the sum for mpg and only the mean for cyl and disp.

library(RSQLite)
library(dbplyr)
library(tidyverse)
library(DBI)

db <- dbConnect(SQLite(),":memory:")

dbCreateTable(db, "mtcars_table", mtcars)

(tbl( db, build_sql( con=db,"select * from mtcars_table" ))
  %>% group_by(carb)
  %>% summarise_at(vars(c("mpg","cyl","disp")), list (~mean(.),~sum(.)))
  %>% select(carb,cyl_mean,disp_mean,mpg_sum)                   
  %>% show_query()
)
#> <SQL>
#> Warning: Missing values are always removed in SQL.[...]  to silence this warning
#> SELECT `carb`, `cyl_mean`, `disp_mean`, `mpg_sum`
#> FROM (SELECT `carb`, AVG(`mpg`) AS `mpg_mean`, AVG(`cyl`) AS `cyl_mean`, AVG(`disp`) AS `disp_mean`, SUM(`mpg`) AS `mpg_sum`, SUM(`cyl`) AS `cyl_sum`, SUM(`disp`) AS `disp_sum`
#> FROM (select * from mtcars_table)
#> GROUP BY `carb`)
#> # Source:   lazy query [?? x 4]
#> # Database: sqlite 3.30.1 [:memory:]
#> # … with 4 variables: carb <dbl>, cyl_mean <lgl>, disp_mean <lgl>,
#> #   mpg_sum <lgl>

I have tried everything similar to the following, but it either does not work or produces an error.

Error

(mtcars %>% group_by(carb)%>% summarise_at(vars(c("mpg","cyl","disp")),ifelse(vars(contains(names(.),"mpg")),list(sum(.)),list(mean(.)))) )

Not good, too many columns

library(tidyverse)
(mtcars %>% group_by(carb)%>% summarise_at(vars(c("mpg","cyl","disp")),ifelse ((names(.)=="mpg"), list(~sum(.)) , list(~mean(.)))))
#> # A tibble: 6 x 34
#>    carb mpg_sum cyl_sum disp_sum mpg_mean..2 cyl_mean..2 disp_mean..2
#>   <dbl>   <dbl>   <dbl>    <dbl>       <dbl>       <dbl>        <dbl>
#> 1     1   177.       32     940.        25.3        4.57         134.
#> 2     2   224        56    2082.        22.4        5.6          208.
#> 3     3    48.9      24     827.        16.3        8            276.
#> 4     4   158.       72    3088.        15.8        7.2          309.
#> 5     6    19.7       6     145         19.7        6            145 
#> 6     8    15         8     301         15          8            301 
#> # … with 27 more variables: mpg_mean..3 <dbl>, cyl_mean..3 <dbl>,
#> #   disp_mean..3 <dbl>, mpg_mean..4 <dbl>, cyl_mean..4 <dbl>,
#> #   disp_mean..4 <dbl>, mpg_mean..5 <dbl>, cyl_mean..5 <dbl>,
#> #   disp_mean..5 <dbl>, mpg_mean..6 <dbl>, cyl_mean..6 <dbl>,
#> #   disp_mean..6 <dbl>, mpg_mean..7 <dbl>, cyl_mean..7 <dbl>,
#> #   disp_mean..7 <dbl>, mpg_mean..8 <dbl>, cyl_mean..8 <dbl>,
#> #   disp_mean..8 <dbl>, mpg_mean..9 <dbl>, cyl_mean..9 <dbl>,
#> #   disp_mean..9 <dbl>, mpg_mean..10 <dbl>, cyl_mean..10 <dbl>,
#> #   disp_mean..10 <dbl>, mpg_mean..11 <dbl>, cyl_mean..11 <dbl>,
#> #   disp_mean..11 <dbl>
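
The 34 columns have a mechanical explanation. Inside the pipe, names(.) resolves to the 11 column names of the piped-in mtcars, not to the three selected columns, so ifelse() returns a list of 11 functions: sum once (for mpg) and mean ten times. summarise_at() then applies every one of them to each selected column, giving 3 x 11 = 33 summaries plus carb. A minimal sketch that rebuilds the list outside the pipe (funs_list is a name made up for illustration):

```r
# names(mtcars) stands in for names(.) inside the pipe
funs_list <- ifelse(names(mtcars) == "mpg", list(~ sum(.)), list(~ mean(.)))

# One function per column of mtcars, not one per selected column
length(funs_list)

# Summaries for mpg, cyl and disp, plus the grouping column carb
3 * length(funs_list) + 1
```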

Some other attempts and remarks

I would like a conditional sum(.) or mean(.) depending on the name of the column in summarise().

It would be good if it accepted more than just primitive functions.

In the end, this is for tbl() %>% group_by() %>% summarise_at() %>% collect(), to generate conditional SQL with AVG() and SUM().

MSSQL functions such as ~(convert(varchar())) work in mutate_at(), and similarly ~AVG() works in summarise_at(), but I arrive at the same point: a summarise_at() that is conditional on the column names does not work.

:)

Recommended answer

One option is to group_by 'carb', then add the sum of 'mpg' as a second grouping variable, and then use summarise_at on the rest of the variables needed. Because sum(mpg) is constant within each 'carb' group, the extra grouping variable does not split the groups any further.

library(dplyr)
mtcars %>%
    group_by(carb) %>%
    group_by(mpg_sum = sum(mpg), .add = TRUE) %>%
    summarise_at(vars(cyl, disp), list(mean = mean))
# A tibble: 6 x 4
# Groups:   carb [6]
#   carb mpg_sum cyl_mean disp_mean
#  <dbl>   <dbl>    <dbl>     <dbl>
#1     1   177.      4.57      134.
#2     2   224       5.6       208.
#3     3    48.9     8         276.
#4     4   158.      7.2       309.
#5     6    19.7     6         145 
#6     8    15       8         301

Or, using the devel version of dplyr, this can be done in a single summarise() by wrapping the block of columns in across() and naming the single column on its own, applying a different function to each.

mtcars %>%
  group_by(carb) %>% 
  summarise(across(all_of(c("cyl", "disp")), list(mean = mean)), 
            mpg_sum = sum(mpg))
# A tibble: 6 x 4
#   carb cyl_mean disp_mean mpg_sum
#  <dbl>    <dbl>     <dbl>   <dbl>
#1     1     4.57      134.   177. 
#2     2     5.6       208.   224  
#3     3     8         276.    48.9
#4     4     7.2       309.   158. 
#5     6     6         145     19.7
#6     8     8         301     15
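
Since the real case has about 50 columns chosen with selection helpers, the same idea extends to one across() call per function. The following is a minimal sketch, assuming dplyr 1.0.0 or later; the matches() pattern and the starts_with() prefix are placeholders for the real selection rules, and result is a name chosen for illustration.

```r
library(dplyr)

result <- mtcars %>%
  group_by(carb) %>%
  summarise(
    # block of columns that need a mean; the named list gives cyl_mean, disp_mean
    across(matches("^(cyl|disp)$"), list(mean = mean)),
    # block of columns that need a sum; the named list gives mpg_sum
    across(starts_with("mpg"), list(sum = sum)),
    .groups = "drop"
  )

result
```

Each across() call carries its own selection, so no trailing select() is needed to discard unwanted combinations.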

NOTE: summarise_at/summarise_if/mutate_at/mutate_if/... etc. will be superseded by across() used inside the default verbs (summarise/mutate/filter/...) in upcoming releases.
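
For the SQL side of the question, the generated query can be inspected without a database. This is a sketch under assumptions: it relies on a dbplyr version that translates across() (to my knowledge 2.0.0 or later), and it uses tbl_lazy() with simulate_dbi() as a generic stand-in for the real connection, so the quoting and dialect may differ from SQL Server. lazy_mtcars, query and sql_text are illustrative names.

```r
library(dplyr)
library(dbplyr)

# Lazy table on a simulated connection: nothing is executed, only translated.
# In the real case this would be tbl(con, "mtcars_table") on the actual connection.
lazy_mtcars <- tbl_lazy(mtcars, con = simulate_dbi())

query <- lazy_mtcars %>%
  group_by(carb) %>%
  summarise(
    # na.rm = TRUE avoids the "missing values are always removed" warning
    across(c(cyl, disp), list(mean = ~ mean(.x, na.rm = TRUE))),
    across(starts_with("mpg"), list(sum = ~ sum(.x, na.rm = TRUE)))
  )

# Render the translated SQL as text and print it
sql_text <- as.character(sql_render(query))
cat(sql_text, "\n")
```

Only the requested aggregates should appear in the SELECT list, unlike the summarise_at() version, which computes all six and then discards three in an outer query.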
