summarise_at using different functions for different variables

Views: 27 | Published: 2021/12/23 12:28:53 | Tags: r dplyr tidyverse

Question

When I use group_by and summarise in dplyr, I can naturally apply a different summary function to each variable. For instance:

    library(tidyverse)

    df <- tribble(
      ~category,   ~x,  ~y,  ~z,
      #----------------------
          'a',      4,   6,   8,
          'a',      7,   3,   0,
          'a',      7,   9,   0,
          'b',      2,   8,   8,
          'b',      5,   1,   8,
          'b',      8,   0,   1,
          'c',      2,   1,   1,
          'c',      3,   8,   0,
          'c',      1,   9,   1
     )

    df %>% group_by(category) %>% summarize(
      x=mean(x),
      y=median(y),
      z=first(z)
    )

Output:

    # A tibble: 3 x 4
      category     x     y     z
         <chr> <dbl> <dbl> <dbl>
    1        a     6     6     8
    2        b     5     1     8
    3        c     2     8     1

My question is, how would I do this with summarise_at? Obviously it's unnecessary for this example, but assume I have lots of variables that I want the mean of, lots that I want the median of, and so on.

Do I lose this functionality once I move to summarise_at? Do I have to apply all functions to all groups of variables and then throw away the results I don't want?

Perhaps I'm just missing something, but I can't figure it out, and I don't see any examples of this in the documentation. Any help is appreciated.

Recommended answer

Here is one idea.

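    library(tidyverse)

    # Summarise each set of columns with its own function.
    # The function is passed directly, since funs() is deprecated in recent dplyr.
    df_mean <- df %>%
      group_by(category) %>%
      summarize_at(vars(x), mean)

    df_median <- df %>%
      group_by(category) %>%
      summarize_at(vars(y), median)

    df_first <- df %>%
      group_by(category) %>%
      summarize_at(vars(z), first)

    # Join the per-function summaries back together on the grouping column
    df_summary <- reduce(list(df_mean, df_median, df_first),
                         left_join, by = "category")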

Like you said, there is no need to use summarise_at for this example. However, if you have a lot of columns that need to be summarized by different functions, this strategy may work. You will need to specify the columns in vars(...) for each summarize_at call. The selection rules are the same as for dplyr::select.
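To make the point about vars(...) concrete, here is a minimal sketch, assuming the question's data, in which each summarize_at call covers several columns at once. The choice of x and y for the mean, and the starts_with("z") helper for the median, are illustrative assumptions rather than part of the original answer.

```r
library(dplyr)

# Same data as in the question, rebuilt so the example is self-contained
df <- tibble(
  category = rep(c("a", "b", "c"), each = 3),
  x = c(4, 7, 7, 2, 5, 8, 2, 3, 1),
  y = c(6, 3, 9, 8, 1, 0, 1, 8, 9),
  z = c(8, 0, 0, 8, 8, 1, 1, 0, 1)
)

grouped <- df %>% group_by(category)

# One call per function; vars() can list several columns ...
means <- grouped %>% summarise_at(vars(x, y), mean)

# ... or use a select helper (illustrative: every column starting with "z")
medians <- grouped %>% summarise_at(vars(starts_with("z")), median)

# Combine the per-function summaries on the grouping column
result <- left_join(means, medians, by = "category")
```

With many columns, the select helpers keep each call short, so the number of summarize_at calls grows with the number of functions rather than the number of columns.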

Here is another idea. Define a function that wraps summarise_at, and then use map2 to apply it over a look-up list that pairs each function name with the variables it should be applied to. In this example, I applied mean to the x and y columns and median to z.

    # Define a wrapper that applies the function named `func` to the columns in `variable`
    summarise_at_fun <- function(variable, func, data){
      # `variable` is a character vector of column names;
      # match.fun() looks up the function from its name
      data %>%
        summarise_at(variable, match.fun(func))
    }

    # Group the data
    df2 <- df %>% group_by(category)

    # Create a look-up list: names are function names, values are the columns to apply them to
    look_list <- list(mean = c("x", "y"),
                      median = "z")

    # Apply summarise_at_fun to each (columns, function name) pair, then join the results
    map2(look_list, names(look_list), summarise_at_fun, data = df2) %>%
      reduce(left_join, by = "category")

    # A tibble: 3 x 4
      category     x     y     z
         <chr> <dbl> <dbl> <dbl>
    1        a     6     6     0
    2        b     5     3     8
    3        c     2     6     1
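As an aside, dplyr 1.0.0 and later supersede summarise_at with across(), and several across() calls can sit inside one summarise(). The sketch below assumes that dplyr version and reuses the question's data. It is an alternative to the approaches above, not part of the original answer.

```r
library(dplyr)

# Same data as in the question, rebuilt so the example is self-contained
df <- tibble(
  category = rep(c("a", "b", "c"), each = 3),
  x = c(4, 7, 7, 2, 5, 8, 2, 3, 1),
  y = c(6, 3, 9, 8, 1, 0, 1, 8, 9),
  z = c(8, 0, 0, 8, 8, 1, 1, 0, 1)
)

# Each across() call pairs a set of columns with one function (needs dplyr >= 1.0.0)
result <- df %>%
  group_by(category) %>%
  summarise(
    across(c(x, y), mean),
    across(z, median)
  )
```

With this data, result should hold the same values as the map2 output above, without any joins.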
