R dplyr summarise multiple functions to selected variables


Problem description

I have a dataset that I want to summarise by mean, while also calculating the max of just one of the variables.

Let me start with an example of what I would like to achieve:

iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  summarise_at(vars(Sepal.Length:Petal.Width), funs(mean))

This gives me the following result:

# A tibble: 3 × 5
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
      <fctr>        <dbl>       <dbl>        <dbl>       <dbl>
1     setosa     5.313636    3.713636     1.509091   0.2772727
2 versicolor     5.997872    2.804255     4.317021   1.3468085
3  virginica     6.622449    2.983673     5.573469   2.0326531

Is there an easy way to add, for example, max(Petal.Width) to the summarise call?

So far I have tried the following:

iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  summarise_at(vars(Sepal.Length:Petal.Width), funs(mean)) %>%
  mutate(Max.Petal.Width = max(iris$Petal.Width))

But with this approach I lose both the group_by and the filter from the code above, so it gives the wrong results.

The only solution I have been able to come up with is the following:

iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  summarise_at(vars(Sepal.Length:Petal.Width), funs(mean, max)) %>%
  select(Species:Petal.Width_mean,Petal.Width_max) %>% 
  rename(Max.Petal.Width = Petal.Width_max) %>%
  rename_(.dots = setNames(names(.), gsub("_.*$","",names(.))))

This is a bit convoluted and involves a lot of typing just to add one column with a different summarisation.

Thanks

Recommended answer

If you are trying to do everything with dplyr (which might be easier to remember), you can use the across function, which is available from dplyr 1.0.0.

iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  summarize(across(Sepal.Length:Petal.Width, mean)) %>%
  cbind(iris %>%
          group_by(Species) %>%
          filter(Sepal.Length > 5) %>%  # apply the same filter before taking the max
          summarize(across(Petal.Width, max, .names = "Max.{.col}")) %>%
          select(-Species)
  )

The only difficulty is combining two calculations on the same column, Petal.Width, within a grouped summary: you have to repeat the grouping and the filter, but you can nest that second pipeline inside cbind. The .names argument gives the max column its own name so that the result does not contain two columns called Petal.Width. This returns the correct result:

     Species Sepal.Length Sepal.Width Petal.Length Petal.Width Max.Petal.Width
1     setosa     5.313636    3.713636     1.509091   0.2772727             0.5
2 versicolor     5.997872    2.804255     4.317021   1.3468085             1.8
3  virginica     6.622449    2.983673     5.573469   2.0326531             2.5

If the task called for only one calculation on the column Petal.Width rather than two, this could be written elegantly as:

iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>% 
  summarize(
    across(Sepal.Length:Petal.Length, mean),
    across(Petal.Width, max)
  )
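
If both the mean and the max of Petal.Width are needed, one possible sketch (not part of the original answer, and assuming dplyr 1.0.0 or later) is to pass a named list of functions to across for that one column. The variable name result and the output column names Petal.Width_mean and Petal.Width_max are illustrative; the latter follow from the default naming across applies when given a named list.

```r
library(dplyr)

# Sketch: mean of the first three measurements, plus both the mean and
# the max of Petal.Width, all inside one grouped and filtered summarise.
result <- iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  summarise(
    across(Sepal.Length:Petal.Length, mean),
    # A named list yields one output column per function,
    # named <column>_<function name> by default.
    across(Petal.Width, list(mean = mean, max = max))
  )

print(result)
```

Because everything happens in a single summarise call, the group_by and the filter apply to both the means and the max, and no second pipeline or cbind is needed.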
