Specific group rankings in R
Problem description
I have a data frame with the columns "Category", "ID" and "Score(t)", and I want to compute "Rank(t)":
Category  ID          Score.08.2007  Score.09.2007  Rank.08.2007  Rank.09.2007  ...
Orange    FSGBR070N3   0.16          ...            5             ...
Orange    FSGBR070N3   0.05          ...            7             ...
Orange    FSGBR070N3   0.11                         6
Orange    FS00008L4G   0.28                         1
Orange    FS00008VLD   0.27                         2
Orange    FS00008VLD   0.27                         2
Orange    FS00008VLD   0.27                         2
Orange    FS00009SQX  -2.03                         8
Orange    FS00009SQX   NA
Orange    FSUSA0A1KW   NA
Orange    FSUSA0A1KW   NA
Orange    FSUSA0A1KX   NA
Orange    FSUSA0A1KY   NA
Orange    FS0000B389   NA
Banana    FS000092GP  96.25                         1
Banana    FS000092GP  96.25                         1
Banana    FS000092GP  96.25                         1
Banana    FS000092GP  52.33                         4
Banana    FS0000ATLN  31.73                         5
Banana    FSUSA0AVMF   1.38                         7
Banana    FSGBR058O8   1.37                         8
Banana    FSGBR05845   2.24                         6
Ranks are assigned within each "Category" by sorting "Score" in descending order. The additional requirement, which I am struggling to capture, is this: when several rows share the same score AND the same ID, the next distinct score gets a rank equal to the rank of the tied rows plus the number of rows that shared that score (the rank output column in the example should make this clear).
NA values should not receive a rank:
na.last = NA
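The tie rule described above appears to match what base R calls "min" ranking: tied scores share the lowest rank, and the next distinct score skips ahead by the number of ties. Below is a minimal sketch on the Orange scores from the table (only two of the NA rows are included). It assumes `na.last = "keep"` is the behaviour wanted here, since it leaves NA in place, whereas `na.last = NA` removes those entries and shortens the result.

```r
# Orange scores for 08.2007 from the table, with two of the NA rows
score <- c(0.16, 0.05, 0.11, 0.28, 0.27, 0.27, 0.27, -2.03, NA, NA)

# Negate the scores so the highest score receives rank 1;
# ties.method = "min" gives tied values the lowest rank of the tie,
# na.last = "keep" returns NA for missing scores instead of ranking them
rnk <- rank(-score, ties.method = "min", na.last = "keep")
print(rnk)
```

The three rows with score 0.27 all receive rank 2, and the next score (0.16) receives rank 5, as in the expected output column.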
I started by creating a matrix to hold the ranks, and I would probably need sort() next, but I am struggling to make this work across the time series and with the additional requirement. I couldn't find an existing question this specific either. Any help is appreciated!
time_series <- c("08.2007","09.2007","10.2007",...)
abs_ranks_mat <- as.data.frame(mat.or.vec(nrow(ID),length(time_series)))
Recommended answer
Here is a solution using dplyr. df is the example data frame from @trosendal's example, and df3 is the final output.
The key is to use the min_rank function to create the ranks. mutate_at lets us specify which columns should or should not be ranked. After that, we rename the rank columns and join them back to the original data frame.
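library(dplyr)

# Add a row identifier so the ranks can be joined back to the original rows
df <- df %>% mutate(RowID = 1:n())

# Rank every score column in descending order within each Category
df2 <- df %>%
  group_by(Category) %>%
  mutate_at(vars(-ID, -RowID), ~ min_rank(desc(.))) %>%
  ungroup() %>%
  select(-Category, -ID) %>%
  setNames(., gsub("Score", "Rank", colnames(.)))

# Attach the rank columns to the original data and drop the helper column
df3 <- df %>%
  left_join(df2, by = "RowID") %>%
  select(-RowID)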
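Because the df from @trosendal's example is not reproduced here, the following is a self-contained sketch of the same idea on made-up data. The sample data frame is hypothetical: the Score.08.2007 values are a few rows from the question, and the Score.09.2007 values are invented for illustration. It also swaps mutate_at for across(), which assumes dplyr 1.0 or later, so the rank columns can be created next to the score columns without a join.

```r
library(dplyr)

# Hypothetical sample: some rows from the question's table plus an
# invented second score column to show several periods at once
df <- data.frame(
  Category = c("Orange", "Orange", "Orange", "Orange", "Orange",
               "Banana", "Banana", "Banana"),
  ID = c("FSGBR070N3", "FS00008L4G", "FS00008VLD", "FS00008VLD", "FS00009SQX",
         "FS000092GP", "FS000092GP", "FS0000ATLN"),
  Score.08.2007 = c(0.16, 0.28, 0.27, 0.27, NA, 96.25, 96.25, 31.73),
  Score.09.2007 = c(0.10, 0.30, 0.20, 0.20, 0.50, 90.00, NA, 95.00),
  stringsAsFactors = FALSE
)

# For every Score.* column, add a matching Rank.* column holding the
# descending min_rank within each Category; NA scores stay NA
ranked <- df %>%
  group_by(Category) %>%
  mutate(across(starts_with("Score"),
                ~ min_rank(desc(.x)),
                .names = "{sub('Score', 'Rank', .col)}")) %>%
  ungroup()

print(ranked)
```

In this sample, the Orange rows get Rank.08.2007 values of 4, 1, 2, 2 and NA: the two tied 0.27 scores share rank 2, and the missing score is left unranked.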