Creating an "Other" field
Question
Right now, I have the following data.frame, which was created by original.df %.% group_by(Category) %.% tally() %.% arrange(desc(n)):
DF <- structure(list(Category = c("E", "K", "M", "L", "I", "A",
"S", "G", "N", "Q"), n = c(163051, 127133, 106680, 64868, 49701,
47387, 47096, 45601, 40056, 36882)), .Names = c("Category",
"n"), row.names = c(NA, 10L), class = c("tbl_df", "tbl", "data.frame"
))
Category n
1 E 163051
2 K 127133
3 M 106680
4 L 64868
5 I 49701
6 A 47387
7 S 47096
8 G 45601
9 N 40056
10 Q 36882
I want to create an "Other" field from the bottom ranked Categories by n. i.e.
Category n
1 E 163051
2 K 127133
3 M 106680
4 L 64868
5 I 49701
6 Other 217022
Right now, I am doing
rbind(filter(DF, rank(rev(n)) <= 5),
summarise(filter(DF, rank(rev(n)) > 5), Category = "Other", n = sum(n)))
which collapses all categories not in the top 5 into the Other category.
But I'm curious whether there's a better way in dplyr or some other existing package. By "better" I mean more succinct/readable. I'm also interested in methods with cleverer or more flexible ways to choose Other.
Recommended answer
Different package/different syntax version:
library(data.table)
dt = as.data.table(DF)
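dt[order(-n), # your data is already sorted, so this does nothing for it
   if (.BY[[1]]) .SD else list("Other", sum(n)), # keep the top 5 rows as is; collapse the rest into one row
   by = 1:nrow(dt) <= 5][, !"nrow", with = FALSE] # then drop the helper grouping column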
# Category n
#1: E 163051
#2: K 127133
#3: M 106680
#4: L 64868
#5: I 49701
#6: Other 217022
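For comparison, the same top-5-plus-Other collapse can be sketched in base R without any extra package. This is only an illustrative sketch, not part of the answer above: the helper name collapse_other and its arguments keep and label are made up for this example, and DF is rebuilt here as a plain data.frame so the snippet runs on its own.

```r
# Same data as in the question, as a plain data.frame
DF <- data.frame(
  Category = c("E", "K", "M", "L", "I", "A", "S", "G", "N", "Q"),
  n = c(163051, 127133, 106680, 64868, 49701,
        47387, 47096, 45601, 40056, 36882),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the `keep` largest categories by n and
# sum everything else into a single row labelled `label`.
collapse_other <- function(df, keep = 5, label = "Other") {
  df <- df[order(-df$n), ]                 # sort by n, largest first
  is_top <- seq_len(nrow(df)) <= keep      # TRUE for the rows to keep as is
  top <- df[is_top, ]
  rest <- df[!is_top, ]
  if (nrow(rest) == 0) return(top)         # nothing left to collapse
  out <- rbind(
    top,
    data.frame(Category = label, n = sum(rest$n), stringsAsFactors = FALSE)
  )
  rownames(out) <- NULL                    # reset row names to 1..nrow(out)
  out
}

result <- collapse_other(DF, keep = 5)
print(result)
```

Changing keep moves the cutoff, and the is_top line is the only place that decides what goes into Other, so a different rule (for example a minimum count) could be swapped in there.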