Select top rows in R using add_tally and top_n functions
Problem description
I would like to select the top n rows in a data frame for which I calculated a column n that represents the sum of a variable. For example, using the mtcars data, I would like to filter to keep only the rows of the two cyl groups with the greatest sum of mpg. In the following example, I was expecting to select all rows where cyl == 4 and cyl == 8. It must be simple, but I cannot figure out my mistake.
library(tidyverse)
mtcars %>%
  group_by(cyl) %>%
  summarise(sum(mpg))
#> # A tibble: 3 x 2
#> cyl `sum(mpg)`
#> <dbl> <dbl>
#> 1 4 293.
#> 2 6 138.
#> 3 8 211.
mtcars %>%
  group_by(cyl) %>% # Calculate the sum of mpg for each cyl
  add_tally(mpg, sort = TRUE) %>%
  ungroup() %>%
  top_n(2, n)
#> # A tibble: 11 x 12
#> mpg cyl disp hp drat wt qsec vs am gear carb n
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 293.
#> 2 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 293.
#> 3 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 293.
#> 4 32.4 4 78.7 66 4.08 2.2 19.5 1 1 4 1 293.
#> 5 30.4 4 75.7 52 4.93 1.62 18.5 1 1 4 2 293.
#> 6 33.9 4 71.1 65 4.22 1.84 19.9 1 1 4 1 293.
#> 7 21.5 4 120. 97 3.7 2.46 20.0 1 0 3 1 293.
#> 8 27.3 4 79 66 4.08 1.94 18.9 1 1 4 1 293.
#> 9 26 4 120. 91 4.43 2.14 16.7 0 1 5 2 293.
#> 10 30.4 4 95.1 113 3.77 1.51 16.9 1 1 5 2 293.
#> 11 21.4 4 121 109 4.11 2.78 18.6 1 1 4 2 293.
Created on 2019-07-26 by the reprex package (v0.3.0)
Recommended answer
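It seems that top_n returns the top n rows after ordering the data frame, and returns more than n rows if there are ties. It does not return the rows holding the top n distinct values. In the question, all 11 rows with cyl == 4 share the same n (293.), so they tie for first place and already fill the top 2 positions; the cyl == 8 rows never make the cut.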
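To make the tie handling concrete, here is a minimal sketch on a toy vector. The values are the rounded group sums from the question, repeated a few times purely for illustration, and the variable names are made up. It assumes top_n ranks rows the way min_rank does, which is consistent with the output in the question.

```r
library(dplyr)

# Toy stand-in for the n column: three rows for one group, two for the next, one for the last
n <- c(293, 293, 293, 211, 211, 138)

# min_rank leaves gaps after ties, so the second-largest value gets rank 4, not 2
rank_with_gaps <- min_rank(desc(n))
rank_with_gaps
# 1 1 1 4 4 6

# dense_rank gives consecutive ranks to distinct values
rank_dense <- dense_rank(desc(n))
rank_dense
# 1 1 1 2 2 3

# Keeping ranks <= 2 therefore selects different rows
sum(rank_with_gaps <= 2)  # 3 rows: only the tied top value
sum(rank_dense <= 2)      # 5 rows: the two largest distinct values
```

With gap-leaving ranks, "top 2" is exhausted by the tied rows of the largest group, which is why only cyl == 4 comes back.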
From the documentation -
Usage
top_n(x, n, wt)
Arguments
x: a tbl() to filter
n: number of rows to return. If x is grouped, this is the number of rows per group. Will include more than n rows if there are ties. If n is positive, selects the top n rows. If negative, selects the bottom n rows.
You need, as suggested by @tmfmnk -
mtcars %>%
  group_by(cyl) %>%
  add_tally(mpg, sort = TRUE) %>%
  ungroup() %>%
  filter(dense_rank(desc(n)) < 3)
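As an alternative sketch (not part of the original answer), you could pick the top groups on the summarised table first, where each cyl appears only once and ties between rows of the same group cannot occur, and then keep the matching rows with a semi join. The names top_cyl, total_mpg and result are made up for this example.

```r
library(dplyr)

# One row per cyl, so top_n(2, ...) really means "the two best groups" here
top_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(total_mpg = sum(mpg)) %>%
  top_n(2, total_mpg)

# Keep only the rows of mtcars whose cyl is among the selected groups
result <- mtcars %>%
  semi_join(top_cyl, by = "cyl")

nrow(result)               # 25 (11 rows for cyl 4 plus 14 rows for cyl 8)
sort(unique(result$cyl))   # 4 8
```

This does not add the n column to the result; if you need it, the dense_rank approach above keeps it.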