Specific group rankings in R


Question

I have a data frame with the columns "Category", "ID", and "Score(t)", and I want to compute "Rank(t)":

Category    ID          Score.08.2007   Score.09.2007    Rank.08.2007    Rank.09.2007   ...
Orange      FSGBR070N3  0.16            ...              5               ...
Orange      FSGBR070N3  0.05            ...              7               ...
Orange      FSGBR070N3  0.11                             6
Orange      FS00008L4G  0.28                             1
Orange      FS00008VLD  0.27                             2
Orange      FS00008VLD  0.27                             2
Orange      FS00008VLD  0.27                             2
Orange      FS00009SQX  -2.03                            8
Orange      FS00009SQX  NA                          
Orange      FSUSA0A1KW  NA          
Orange      FSUSA0A1KW  NA  
Orange      FSUSA0A1KX  NA  
Orange      FSUSA0A1KY  NA  
Orange      FS0000B389  NA  
Banana      FS000092GP  96.25                            1
Banana      FS000092GP  96.25                            1
Banana      FS000092GP  96.25                            1
Banana      FS000092GP  52.33                            4
Banana      FS0000ATLN  31.73                            5
Banana      FSUSA0AVMF  1.38                             7
Banana      FSGBR058O8  1.37                             8
Banana      FSGBR05845  2.24                             6

Ranks are assigned within each "Category" by sorting "Score" in descending order. The additional requirement, which I struggle to capture, concerns ties: rows with identical scores AND identical IDs share one rank, and the next distinct score gets that rank plus the number of rows that shared it (the rank output column in the example should make this clear).

NA values should not receive a rank:

na.last = NA

I started by creating a matrix for the ranks, and I would probably need sort() next, but I struggle to extend this to the whole time series and to capture the additional requirement. I couldn't find an existing question this specific either. Help appreciated!

time_series <- c("08.2007","09.2007","10.2007",...)
abs_ranks_mat <- as.data.frame(mat.or.vec(nrow(ID),length(time_series)))

Recommended answer

Here is a solution using dplyr. df is the data frame from @trosendal's example, and df3 is the final output.

The key is to use the min_rank function to create the ranks. mutate_at lets us specify which columns should or should not be ranked. After that, we rename the rank columns and join them back to the original data frame.
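
To make the tie handling concrete, here is a minimal base R sketch of the same ranking rule on a single vector. It assumes, per the dplyr documentation, that min_rank corresponds to rank() with ties.method = "min". The scores are taken from the Banana rows in the question, with one NA added for illustration.

```r
# Scores from the Banana rows of the question, plus one NA added for illustration
scores <- c(96.25, 96.25, 96.25, 52.33, 31.73, NA)

# Negating the scores ranks them in descending order.
# ties.method = "min" gives tied scores the lowest shared rank, so the
# next distinct score skips ahead by the number of tied rows.
# na.last = "keep" leaves NA unranked (it stays NA in the result).
ranks <- rank(-scores, ties.method = "min", na.last = "keep")
print(ranks)
```

Note that na.last = "keep" is used here rather than na.last = NA: the latter drops the NA entries and returns a shorter vector, which would no longer line up with the rows of the data frame.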

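library(dplyr)

# Add a row identifier so the ranks can be joined back to the original rows
df <- df %>% mutate(RowID = row_number())

# Within each Category, replace every score column by its descending rank;
# min_rank() gives tied scores the same rank and leaves NA unranked
df2 <- df %>%
  group_by(Category) %>%
  mutate_at(vars(-ID, -RowID), ~ min_rank(desc(.))) %>%
  ungroup() %>%
  select(-Category, -ID) %>%
  setNames(., gsub("Score", "Rank", colnames(.)))

# Join the rank columns to the original data and drop the helper column
df3 <- df %>%
  left_join(df2, by = "RowID") %>%
  select(-RowID)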
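
For a quick check, the following self-contained sketch applies the same idea to a small, hypothetical subset of the question's data with a single score column. It is not the answer's exact code: it assumes dplyr 1.0.0 or later and uses across() with a .names pattern to create the rank columns directly, which avoids the helper RowID column and the join.

```r
library(dplyr)

# Hypothetical subset of the question's data (one score column only)
df <- data.frame(
  Category = c("Orange", "Orange", "Orange", "Orange", "Orange",
               "Banana", "Banana", "Banana", "Banana"),
  ID = c("FS00008L4G", "FS00008VLD", "FS00008VLD", "FSGBR070N3", "FS00009SQX",
         "FS000092GP", "FS000092GP", "FS000092GP", "FS0000ATLN"),
  Score.08.2007 = c(0.28, 0.27, 0.27, 0.16, NA,
                    96.25, 96.25, 52.33, 31.73),
  stringsAsFactors = FALSE
)

# Rank every Score column within its Category and store the result in a
# matching Rank column, keeping the original scores untouched
df3 <- df %>%
  group_by(Category) %>%
  mutate(across(starts_with("Score"),
                ~ min_rank(desc(.x)),
                .names = "{sub('Score', 'Rank', .col)}")) %>%
  ungroup()

print(df3)
```

With more score columns (Score.09.2007 and so on), the same call would add one Rank column per Score column, since starts_with("Score") selects all of them.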
