Creating an "Other" field
Question
Right now, I have the following data.frame, which was created by original.df %.% group_by(Category) %.% tally() %.% arrange(desc(n)).
DF <- structure(list(Category = c("E", "K", "M", "L", "I", "A",
"S", "G", "N", "Q"), n = c(163051, 127133, 106680, 64868, 49701,
47387, 47096, 45601, 40056, 36882)), .Names = c("Category",
"n"), row.names = c(NA, 10L), class = c("tbl_df", "tbl", "data.frame"
))
   Category      n
1         E 163051
2         K 127133
3         M 106680
4         L  64868
5         I  49701
6         A  47387
7         S  47096
8         G  45601
9         N  40056
10        Q  36882
I want to create an "Other" field from the bottom ranked Categories by n. i.e.
  Category      n
1        E 163051
2        K 127133
3        M 106680
4        L  64868
5        I  49701
6    Other 217022
Right now, I'm doing
rbind(filter(DF, rank(rev(n)) <= 5),
summarise(filter(DF, rank(rev(n)) > 5), Category = "Other", n = sum(n)))
which collapses all categories not in the top 5 into the Other category.
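One thing worth noting about the filter condition: rank(rev(n)) only lines up with the top rows because DF is already sorted by descending n. A minimal base R sketch of the difference, using a small made-up vector (the values are hypothetical, not from DF):

```r
# Hypothetical unsorted counts, for illustration only
n <- c(5, 100, 50)

# rank(rev(n)) ranks the reversed vector, so ranks no longer line up with rows
by_rev <- rank(rev(n)) <= 1
print(by_rev)  # FALSE FALSE TRUE: picks the row with n = 50

# rank(-n) ranks each row by descending n, regardless of row order
by_neg <- rank(-n) <= 1
print(by_neg)  # FALSE TRUE FALSE: picks the row with n = 100
```

On the sorted DF above both conditions select the same five rows, so the result shown is unaffected.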
But I'm curious whether there's a better way in dplyr or some other existing package. By "better" I mean more succinct/readable. I'm also interested in methods with cleverer or more flexible ways to choose Other.
Recommended answer
Different package/different syntax version:
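library(data.table)
dt = as.data.table(DF)
# group rows by whether they fall in the top 5 once ordered by n
dt[order(-n), # your data is already sorted, so this does nothing for it
   if (.BY[[1]]) .SD else list("Other", sum(n)), # keep top rows as-is, collapse the rest
   by = 1:nrow(dt) <= 5][, !"nrow", with = FALSE] # drop the grouping column, auto-named "nrow"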
#   Category      n
#1:        E 163051
#2:        K 127133
#3:        M 106680
#4:        L  64868
#5:        I  49701
#6:    Other 217022
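For comparison, here is a rough dplyr sketch of the same collapse that does not depend on the rows being pre-sorted. It is only one possible approach, not the answerer's method. collapse_other() and its arguments k and label are hypothetical names introduced for illustration; the block uses %>%, the pipe in current dplyr releases, rather than %.%, and it assumes Category is a character column (a factor would need as.character() first).

```r
library(dplyr)

# Same data as DF above, rebuilt as a plain data.frame so the block is self-contained
DF <- data.frame(
  Category = c("E", "K", "M", "L", "I", "A", "S", "G", "N", "Q"),
  n = c(163051, 127133, 106680, 64868, 49701, 47387, 47096, 45601, 40056, 36882),
  stringsAsFactors = FALSE
)

# collapse_other() is a hypothetical helper, not part of dplyr
collapse_other <- function(df, k = 5, label = "Other") {
  df %>%
    # relabel every category outside the top k (by n) with `label`
    mutate(Category = ifelse(min_rank(desc(n)) <= k, Category, label)) %>%
    group_by(Category) %>%
    summarise(n = sum(n)) %>%
    # kept categories first, by descending n; the collapsed row goes last
    arrange(Category == label, desc(n))
}

res <- collapse_other(DF, k = 5)
print(res)
```

Changing k moves the cutoff. Because min_rank() gives tied values the same rank, a tie at the boundary can keep more than k categories; swapping the condition inside ifelse() (for example, a threshold on n) is one way to choose Other differently.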