dplyr: filter rows according to aggregated function result
Question
I have a table with three columns (amount, year and month), and I want to keep only the rows that belong to complete years. That is, I want to omit the last 4 rows of the sample data frame given below, which refer to 2015, and get the remaining 60. Is it possible to do that with a single dplyr command?
I tried:
df %>%
  group_by(year) %>%
  tally() %>%
  filter(n == 12) %>%
  ungroup()
but I guess ungroup() does something different from what I want. Is it possible to do this with a single dplyr command?
df <- structure(list(amount = c(16365, 31850, 32230, 34177.75, 27900,
29650, 28846, 27300, 37115.31, 34130.38, 39676.1, 47244.44, 3500,
25425.48, 22628.43, 30822.86, 30100, 41567.13, 25400, 23125,
40073.75, 16505.82, 17770, 38406.03, 1528.25, 23475.77, 29869.69,
17020, 19270, 13085.47, 10607.48, 7800, 15220, 15260, 17580,
25094.66, 3908.74, 8150, 25055.89, 19690.65, 12445.4, 10347.39,
7645.39, 49300, 8690, 13660, 16510, 34457.08, 522.68, 10202,
18900, 25027.1, 24956.42, 23259, 32743, 37226, 32697, 32258,
31336.67, 36135.81, 4389.26, 12450, 46220.43, 36770.7), year = c("2010",
"2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010",
"2010", "2010", "2010", "2011", "2011", "2011", "2011", "2011",
"2011", "2011", "2011", "2011", "2011", "2011", "2011", "2012",
"2012", "2012", "2012", "2012", "2012", "2012", "2012", "2012",
"2012", "2012", "2012", "2013", "2013", "2013", "2013", "2013",
"2013", "2013", "2013", "2013", "2013", "2013", "2013", "2014",
"2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014",
"2014", "2014", "2014", "2015", "2015", "2015", "2015"), month = c("01",
"02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12",
"01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11",
"12", "01", "02", "03", "04", "05", "06", "07", "08", "09", "10",
"11", "12", "01", "02", "03", "04", "05", "06", "07", "08", "09",
"10", "11", "12", "01", "02", "03", "04", "05", "06", "07", "08",
"09", "10", "11", "12", "01", "02", "03", "04")), .Names = c("amount",
"year", "month"), class = c("tbl_df", "data.frame"), row.names = c(NA,
-64L))
Recommended answer
tally() is the equivalent of summarise(n = n()), so it collapses each group to a single row holding the count. However, in this case you want to keep the original rows of the data frame, filtered so that rows belonging to incomplete years are removed. @AndresT's answer will work fine, but you can also do it more concisely, without the intermediate step of creating a column that counts the rows in each group:
df %>% group_by(year) %>% filter(n()==12)
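To make the difference between the two approaches concrete, here is a minimal sketch on a small made-up data frame. The name toy and its values are assumptions for illustration only, not the data from the question. It contrasts the summary that tally() returns with the row-preserving result of filter(n() == 12):

```r
library(dplyr)

# Hypothetical toy data: two complete years and one partial year (4 months).
toy <- data.frame(
  amount = seq_len(28),
  year   = c(rep("2013", 12), rep("2014", 12), rep("2015", 4)),
  month  = sprintf("%02d", c(1:12, 1:12, 1:4)),
  stringsAsFactors = FALSE
)

# tally() collapses each group to one summary row (year, n),
# so the original amount/month rows are no longer available.
counts <- toy %>%
  group_by(year) %>%
  tally()

# filter() on a grouped data frame evaluates n() once per group and
# keeps every original row of the groups that satisfy the condition.
complete <- toy %>%
  group_by(year) %>%
  filter(n() == 12) %>%
  ungroup()  # drop the grouping so later verbs operate on the whole table

nrow(counts)    # one row per year
nrow(complete)  # only the rows of the complete years
```

Note that n() == 12 only counts rows per year. If a year could contain duplicated months, a stricter condition such as n_distinct(month) == 12 might be more appropriate, depending on what "complete" should mean for the data.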