Use of builtin function n with summarize_if
Question
I am attempting a basic dplyr::summarize_if on a df with the built-in n function:
###Seems like this should work
df %>% summarise_if(is.numeric, funs(n, mean, sd, min, max), na.rm = TRUE)
Error in summarise_impl(.data, dots) : `n()` does not take arguments
###Works fine without the n
df %>% summarise_if(is.numeric, funs(mean, sd, min, max), na.rm = TRUE)
A tibble: 1 x 104
I've tried n() and n(.) (which of course I wouldn't expect to work, and they don't).
What am I missing about using funs(n) within summarise_if?
Recommended answer
I don't think it's a single-pass operation to summarize in two different ways. You want to summarize (1) the number of rows (perhaps per group); and (2) specific functions for certain columns. The n() helper function expects to be used on the full data.frame (or the current group), whereas the functions identified within funs(...) are each passed one vector (column) at a time.
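To illustrate the vector-at-a-time point, here is a minimal sketch (not part of the original answer) that swaps n() for base R's length(), which does accept a single column. It assumes dplyr 0.8.0 or later, where a named list of functions can stand in for funs(). Note that length() counts NA values too and takes no na.rm argument, so it is not a drop-in replacement when na.rm = TRUE is passed along.

```r
library(dplyr)

# length() receives one column (vector) at a time,
# so it can serve as a per-column count where n() cannot
per_column <- select(mtcars, cyl, mpg, wt) %>%
  group_by(cyl) %>%
  summarise_if(is.numeric, list(n = length, mean = mean, sd = sd))

per_column
```

The count is repeated for every summarized column (mpg_n, wt_n), which is one reason a separate count may still be the cleaner choice.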
One method would be to merge/join in what you need. Since you didn't provide data, I'll use mtcars. Though you don't mention grouping, I'm guessing that there may be groups (though it doesn't complicate things), so I'll inject that, too:
library(dplyr)
counts <- select(mtcars, cyl, mpg, wt) %>%
  group_by(cyl) %>%
  count()
counts
# # A tibble: 3 × 2
#     cyl     n
#   <dbl> <int>
# 1     4    11
# 2     6     7
# 3     8    14
(count() is essentially a shortcut for summarize(n = n()). This could have been done with select(mtcars, cyl, mpg, wt) %>% count(cyl) just as easily, but I wanted the grouping to be explicit for this answer.)
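As a quick check of that equivalence, the following sketch compares the explicit group_by() plus summarize(n = n()) form with the count(cyl) shortcut on mtcars. The variable names explicit and shortcut are only illustrative.

```r
library(dplyr)

# Explicit form: group, then count rows per group with n()
explicit <- mtcars %>%
  group_by(cyl) %>%
  summarize(n = n())

# Shortcut form: count() does the grouping and counting in one step
shortcut <- mtcars %>%
  count(cyl)

# Both give the same groups and the same per-group counts
identical(explicit$cyl, shortcut$cyl)
identical(explicit$n, shortcut$n)
```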
others <- select(mtcars, cyl, mpg, wt) %>%
  group_by(cyl) %>%
  summarise_if(is.numeric, funs(mean, sd))
others
# # A tibble: 3 × 5
#     cyl mpg_mean  wt_mean   mpg_sd     wt_sd
#   <dbl>    <dbl>    <dbl>    <dbl>     <dbl>
# 1     4 26.66364 2.285727 4.509828 0.5695637
# 2     6 19.74286 3.117143 1.453567 0.3563455
# 3     8 15.10000 3.999214 2.560048 0.7594047
left_join(counts, others, by = "cyl")
# # A tibble: 3 × 6
#     cyl     n mpg_mean  wt_mean   mpg_sd     wt_sd
#   <dbl> <int>    <dbl>    <dbl>    <dbl>     <dbl>
# 1     4    11 26.66364 2.285727 4.509828 0.5695637
# 2     6     7 19.74286 3.117143 1.453567 0.3563455
# 3     8    14 15.10000 3.999214 2.560048 0.7594047
This could of course be done in one fell swoop instead of creating the intermediate variables counts and others, but (1) I thought it would be more demonstrative to break them out; and (2) sometimes clarity in code is preferred to compactness. One could add %>% left_join(counts, by = "cyl") to the end of the others pipeline, though, with no loss of clarity.
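With newer versions of dplyr (1.0.0 or later is assumed here; across() was not available when this answer was written), the count and the per-column summaries can be combined in a single summarise() call. This is a sketch of an alternative, not the method described above. The columns are named explicitly and n() is placed after across(), so the new n column is not itself picked up and summarized.

```r
library(dplyr)

# Assumes dplyr >= 1.0.0 for across() and the .groups argument
combined <- select(mtcars, cyl, mpg, wt) %>%
  group_by(cyl) %>%
  summarise(
    across(c(mpg, wt), list(mean = mean, sd = sd)),  # per-column summaries
    n = n(),                                         # per-group row count
    .groups = "drop"
  )

combined
```

The result holds the same values as the joined table, with columns named mpg_mean, mpg_sd, wt_mean, wt_sd, and n.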