Use of built-in function n with summarize_if


Question

I am attempting a basic dplyr::summarize_if on a df with the built-in n function:

###Seems like this should work
df %>% summarise_if(is.numeric, funs(n, mean, sd, min, max), na.rm = TRUE)  

Error in summarise_impl(.data, dots) : `n()` does not take arguments


###Works fine without the n

df %>% summarise_if(is.numeric, funs(mean, sd, min, max), na.rm = TRUE)  
A tibble: 1 x 104

I've tried n() and n(.) (which of course I wouldn't expect to work, and they don't).

Am I missing something about using funs(n) inside summarise_if?

Recommended answer

I don't think it's a single-pass operation to summarize in two different ways. You want to summarize (1) the number of rows (perhaps per group); and (2) specific functions for certain columns. The n() helper function expects to be used on a full data.frame and takes no arguments, whereas the functions identified within funs(...) will each be passed one column vector at a time.
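To illustrate the point that per-column functions receive a vector, here is a minimal sketch that replaces n with a vector-based count. It is not part of the original answer: it assumes dplyr 0.8.0 or later (where summarise_if accepts a named list of purrr-style lambdas), and the names per_column, n and mean in the list are my own choices. Note that it counts non-missing values per column, which is not quite the same thing as the row count that n() returns.

```r
library(dplyr)

# Assumption: dplyr >= 0.8.0, so a named list of lambdas is accepted.
# Each lambda receives one numeric column (a vector) as `.`.
per_column <- mtcars %>%
  select(cyl, mpg, wt) %>%
  group_by(cyl) %>%
  summarise_if(
    is.numeric,
    list(
      n    = ~ sum(!is.na(.)),        # non-missing values in this column
      mean = ~ mean(., na.rm = TRUE)  # na.rm is passed explicitly per function
    )
  )

per_column
```

Because na.rm is set inside each lambda, nothing is forwarded to a function that cannot accept it, which is what triggered the error in the question.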

One method would be to merge/join in what you need. Since you didn't provide data, I'll use mtcars. Though you don't mention grouping, I'm guessing that there may be groups (though it doesn't complicate things), so I'll inject that, too:

library(dplyr)
counts <- select(mtcars, cyl, mpg, wt) %>%
  group_by(cyl) %>%
  count()
counts
# # A tibble: 3 × 2
#     cyl     n
#   <dbl> <int>
# 1     4    11
# 2     6     7
# 3     8    14

(count() is essentially a shortcut for summarize(n = n()). This could have been done with select(mtcars, cyl, mpg, wt) %>% count(cyl) just as easily, but I wanted the grouping to be explicit for this answer.)
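The equivalence mentioned above can be checked with a short sketch. The variable names dat, via_count and via_summarise are illustrative only and do not appear in the original answer.

```r
library(dplyr)

dat <- select(mtcars, cyl, mpg, wt)

# Shortcut form: count() groups by cyl and tallies rows.
via_count <- count(dat, cyl)

# Explicit form: group, then summarise with n().
via_summarise <- dat %>%
  group_by(cyl) %>%
  summarise(n = n())

# Both give the same per-group row counts.
identical(via_count$n, via_summarise$n)
```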

others <- select(mtcars, cyl, mpg, wt) %>%
  group_by(cyl) %>%
  summarise_if(is.numeric, funs(mean, sd))
others
# # A tibble: 3 × 5
#     cyl mpg_mean  wt_mean   mpg_sd     wt_sd
#   <dbl>    <dbl>    <dbl>    <dbl>     <dbl>
# 1     4 26.66364 2.285727 4.509828 0.5695637
# 2     6 19.74286 3.117143 1.453567 0.3563455
# 3     8 15.10000 3.999214 2.560048 0.7594047

left_join(counts, others, by = "cyl")
# # A tibble: 3 × 6
#     cyl     n mpg_mean  wt_mean   mpg_sd     wt_sd
#   <dbl> <int>    <dbl>    <dbl>    <dbl>     <dbl>
# 1     4    11 26.66364 2.285727 4.509828 0.5695637
# 2     6     7 19.74286 3.117143 1.453567 0.3563455
# 3     8    14 15.10000 3.999214 2.560048 0.7594047

This could of course be done in one fell swoop instead of creating the intermediate variables counts and others, but (1) I thought it would be more demonstrative to break them out; and (2) sometimes clarity in code is preferred to compactness. One could add %>% left_join(counts, by = "cyl") to the end of the others pipeline, though, with no loss of clarity.
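As a rough sketch of that single-pipeline variant: the version below differs from the answer's code in two ways that are my own assumptions, not the author's. It uses a named list instead of funs(), since funs() has been deprecated since dplyr 0.8.0, and it computes the counts inline with count(). The names dat and combined are hypothetical.

```r
library(dplyr)

dat <- select(mtcars, cyl, mpg, wt)

# Summarise the numeric columns per group, then join the row counts on.
combined <- dat %>%
  group_by(cyl) %>%
  summarise_if(is.numeric, list(mean = mean, sd = sd)) %>%
  left_join(count(dat, cyl), by = "cyl")

combined
```

Here the n column ends up last rather than right after cyl, because the counts are on the right-hand side of the join; swap the two sides of the join if the column order matters.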
