R dplyr summarise multiple functions to selected variables
Question
I have a dataset that I want to summarise by mean, but I also want to calculate the max of just one of the variables.
Let me start with an example of what I would like to achieve:
iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  summarise_at(vars(Sepal.Length:Petal.Width), funs(mean))
This gives me the following result:
# A tibble: 3 × 5
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
      <fctr>        <dbl>       <dbl>        <dbl>       <dbl>
1     setosa          5.8         4.4          1.9         0.5
2 versicolor          7.0         3.4          5.1         1.8
3  virginica          7.9         3.8          6.9         2.5
Is there an easy way to add, for example, max(Petal.Width) to summarise?
So far I have tried the following:
iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  summarise_at(vars(Sepal.Length:Petal.Width), funs(mean)) %>%
  mutate(Max.Petal.Width = max(iris$Petal.Width))
But with this approach I lose both the group_by and the filter from the code above, and it gives the wrong results.
The only solution I have been able to achieve is the following:
iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  summarise_at(vars(Sepal.Length:Petal.Width), funs(mean, max)) %>%
  select(Species:Petal.Width_mean, Petal.Width_max) %>%
  rename(Max.Petal.Width = Petal.Width_max) %>%
  rename_(.dots = setNames(names(.), gsub("_.*$", "", names(.))))
Which is a bit convoluted and involves a lot of typing just to add a column with a different summarisation.
Thanks
Recommended answer
If you are trying to do everything with dplyr (which might be easier to remember), then you can use the across function, which is available from dplyr 1.0.0.
iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  summarize(across(Sepal.Length:Petal.Width, mean)) %>%
  cbind(iris %>%
    group_by(Species) %>%
    summarize(across(Petal.Width, max)) %>%
    select(-Species)
  )
The only difficulty is combining two calculations on the same column, Petal.Width, within the grouped data: you have to do the grouping again, but you can nest it inside the cbind call. Note that the nested pipeline as written does not repeat the filter, so the maximum is taken over all rows of each species.
This returns the following result:
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width Petal.Width
1     setosa     5.313636    3.713636     1.509091   0.2772727         0.6
2 versicolor     5.997872    2.804255     4.317021   1.3468085         1.8
3  virginica     6.622449    2.983673     5.573469   2.0326531         2.5
If the task called for only one calculation on the column Petal.Width rather than two, it could be written elegantly as:
iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  summarize(
    across(Sepal.Length:Petal.Length, mean),
    across(Petal.Width, max)
  )
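When both the mean and the max of Petal.Width are wanted under the same grouping and filter, one possible variation is to pass a named list of functions to across for that column. The following is a minimal sketch, assuming dplyr 1.0.0 or later; the output names Petal.Width and Max.Petal.Width are taken from the question, and the variable name result is only illustrative.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where across() is available

result <- iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  summarise(
    # mean of the first three measurement columns
    across(Sepal.Length:Petal.Length, mean),
    # mean and max of Petal.Width, both computed on the filtered rows;
    # .names yields Petal.Width_mean and Petal.Width_max
    across(Petal.Width, list(mean = mean, max = max), .names = "{.col}_{.fn}")
  ) %>%
  # restore the column names used in the question
  rename(Petal.Width = Petal.Width_mean, Max.Petal.Width = Petal.Width_max)

result
```

Because both statistics come from a single summarise call, the group_by and the filter apply to the max as well as to the means, and no second pipeline has to be bound on with cbind.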