Rank based on several variables

Question

This is a small example. In my larger dataset I have multiple years of data, and the number of observations per group (div) is not always equal.

Example data:

set.seed(1)
df<-data.frame(
  year = 2014,
  id = sample(LETTERS[1:26], 12),
  div = rep(c("1", "2a", "2b"), each=4),
  pts = c(9,7,9,3,7,5,3,7,2,7,7,1),
  x = c(10,12,11,7,7,5,4,12,4,6,7,2)
)

df

#   year id div pts  x
#1  2014  G   1   9 10
#2  2014  J   1   7 12
#3  2014  N   1   9 11
#4  2014  U   1   3  7
#5  2014  E  2a   7  7
#6  2014  S  2a   5  5
#7  2014  W  2a   3  4
#8  2014  M  2a   7 12
#9  2014  L  2b   2  4
#10 2014  B  2b   7  6
#11 2014  D  2b   7  7
#12 2014  C  2b   1  2

I want to rank these data so that individuals in div 1 rank above those in div 2a/2b. Within div 1, individuals are ranked 1, 2, 3, 4 by highest 'pts', with ties broken by highest 'x'.

Individuals in div 2a and div 2b should also be ranked separately within their own div, using the same criteria. That would look like this:

library(dplyr)

df %>%
  arrange(div, desc(pts), desc(x)) %>%  # sort by div, then pts and x, highest first
  group_by(div) %>%
  mutate(position = row_number())       # 1..n within each div


#   year id div pts  x position
#1  2014  N   1   9 11        1
#2  2014  G   1   9 10        2
#3  2014  J   1   7 12        3
#4  2014  U   1   3  7        4
#5  2014  M  2a   7 12        1
#6  2014  E  2a   7  7        2
#7  2014  S  2a   5  5        3
#8  2014  W  2a   3  4        4
#9  2014  D  2b   7  7        1
#10 2014  B  2b   7  6        2
#11 2014  L  2b   2  4        3
#12 2014  C  2b   1  2        4

However, I want to produce a final column that is another rank. It should rank everyone in div 1 above 2a/2b, while treating 2a and 2b as equal: individuals ranked 1 in 2a and 2b should both get 5.5, those ranked 2 should both get 7.5, and so on. There is always an equal number of individuals in div 2a and div 2b in every year.

It should look like this:

#   year id div pts  x position final
#1  2014  N   1   9 11        1   1.0  
#2  2014  G   1   9 10        2   2.0
#3  2014  J   1   7 12        3   3.0
#4  2014  U   1   3  7        4   4.0
#5  2014  M  2a   7 12        1   5.5
#6  2014  E  2a   7  7        2   7.5
#7  2014  S  2a   5  5        3   9.5
#8  2014  W  2a   3  4        4  11.5
#9  2014  D  2b   7  7        1   5.5
#10 2014  B  2b   7  6        2   7.5  
#11 2014  L  2b   2  4        3   9.5
#12 2014  C  2b   1  2        4  11.5
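
The values in the final column are average ranks: the two individuals tied for first place in 2a/2b occupy overall slots 5 and 6, so each gets (5 + 6) / 2 = 5.5, the next pair shares slots 7 and 8 and gets 7.5, and so on. A minimal sketch of that arithmetic in base R follows; the names key, n1, position and closed_form are made up for this illustration and are not part of the original post.

```r
# Overall order key: div 1 takes keys 1-4; 2a and 2b share keys 5-8.
key <- c(1, 2, 3, 4,  5, 6, 7, 8,  5, 6, 7, 8)

# rank() averages tied ranks by default (ties.method = "average").
final <- rank(key)
final

# Equivalent closed form for the 2a/2b rows, assuming exactly two
# tied individuals per position and n1 individuals in div 1.
n1 <- 4
position <- 1:4
closed_form <- n1 + 2 * position - 0.5
closed_form
```

The closed form only holds under the stated assumption that 2a and 2b always have the same size, which the question guarantees.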

Ideally I need a dplyr solution. It also has to generalize to years in which the number of individuals in div 1 varies and the number in div 2a/2b varies (although length(div2a) == length(div2b) always holds).

Recommended answer

This is how I'd do it:

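library(data.table)
dt = as.data.table(df)

# rank within each div: highest pts first, ties broken by highest x
dt[order(-pts, -x), rank.init := 1:.N, by = div]

# strip the letter suffix so that 2a and 2b share the same tier ("2")
dt[, div.clean := sub('(\\d+).*', '\\1', div)]
setorder(dt, div.clean, rank.init)

# rows tied on tier and initial rank get the mean of their row numbers
dt[, rank.final := mean(.I), by = .(div.clean, rank.init)]
setorder(dt, div, rank.final)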
#    year id div pts  x rank.init div.clean rank.final
# 1: 2014  N   1   9 11         1         1        1.0
# 2: 2014  G   1   9 10         2         1        2.0
# 3: 2014  J   1   7 12         3         1        3.0
# 4: 2014  U   1   3  7         4         1        4.0
# 5: 2014  M  2a   7 12         1         2        5.5
# 6: 2014  E  2a   7  7         2         2        7.5
# 7: 2014  S  2a   5  5         3         2        9.5
# 8: 2014  W  2a   3  4         4         2       11.5
# 9: 2014  D  2b   7  7         1         2        5.5
#10: 2014  B  2b   7  6         2         2        7.5
#11: 2014  L  2b   2  4         3         2        9.5
#12: 2014  C  2b   1  2         4         2       11.5
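
Since the question asked for dplyr and for something that works across several years, the same idea could be sketched as below. This is not the answerer's code, only a rough dplyr translation of it. It assumes that every div label starts with its tier number (as in "1", "2a", "2b"), and the column names tier and slot are invented for this example. The ids are typed in directly so the result does not depend on the random number generator.

```r
library(dplyr)

# Example data from the question, with the ids written out explicitly.
df <- data.frame(
  year = 2014,
  id   = c("G", "J", "N", "U", "E", "S", "W", "M", "L", "B", "D", "C"),
  div  = rep(c("1", "2a", "2b"), each = 4),
  pts  = c(9, 7, 9, 3, 7, 5, 3, 7, 2, 7, 7, 1),
  x    = c(10, 12, 11, 7, 7, 5, 4, 12, 4, 6, 7, 2)
)

res <- df %>%
  # Assumption: the leading digits of div give the tier, so 2a and 2b -> 2.
  mutate(tier = as.integer(sub("^(\\d+).*$", "\\1", div))) %>%
  # Rank within each year and div: most pts first, ties broken by x.
  arrange(year, div, desc(pts), desc(x)) %>%
  group_by(year, div) %>%
  mutate(position = row_number()) %>%
  # Put the rows in overall order within each year and number the slots.
  arrange(year, tier, position, div) %>%
  group_by(year) %>%
  mutate(slot = row_number()) %>%
  # Rows sharing a tier and position split the slots they occupy.
  group_by(year, tier, position) %>%
  mutate(final = mean(slot)) %>%
  ungroup() %>%
  select(-slot) %>%
  arrange(year, div, position)

res
```

Grouping by year before numbering the slots is what restarts the ranking for each year, whereas mean(.I) in the data.table version uses row numbers of the whole table and so would need a similar per-year step on multi-year data.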
