Calculate rank with ties based on more than one variable

Question

I'm trying to compute a medal table for a sports event.

My data looks like this:

test <- data.frame("ID" = c("1_1", "1_2", "1_3", "1_4","1_5","1_6"),
                   "gold"=c(10, 4, 1, 7, 7, 1),
                   "silver"=c(1, 3, 2, 19, 19, 2),
                   "bronze"=c(1, 8, 2, 0, 0, 2))

First, I want to order the data by the number of "gold", "silver", and "bronze" medals, like this:

(test_ordered <- with(test, test[order(-gold, -silver, -bronze), ]))

Then I want to compute the final medal rank. This is how the final rank column should look:

(test_ordered$rank<-c(1, 2, 2, 4, 5, 5))

 #    ID gold silver bronze rank
 # 1 1_1   10      1      1    1
 # 4 1_4    7     19      0    2
 # 5 1_5    7     19      0    2
 # 2 1_2    4      3      8    4
 # 3 1_3    1      2      2    5
 # 6 1_6    1      2      2    5

For example, IDs "1_4" and "1_5" have won the same combination of medals, so they share rank 2.

My attempts to rank on several criteria with rank (and also with dplyr::min_rank) failed:

with(test, rank(-gold, -silver, -bronze, ties.method = "min")) 
# (...) unused argument (-bronze)

An attempt with interaction was also unsuccessful:

as.numeric(interaction(gl(-test$gold), gl(-test$silver), gl(-test$bronze), lex.order = TRUE))

Any ideas on how to calculate a rank based on multiple variables?

Solved using Henrik's idea:

setDT(test)[ , rank := frank(test, -gold, -silver, -bronze, ties.method = "min")]
setorder(test, rank)
as.data.frame(test)

Recommended answer

You can use frank, the data.table equivalent of base::rank. A nice feature of frank is that it accepts not only vectors (as rank does) but also a data.frame or a data.table as input. For these types of objects, the rank may be based on several columns.

Using your original data.frame:

test$rank <- data.table::frank(test, -gold, -silver, -bronze, ties.method = "min")

Or, if you want to go all in with data.table functions:

setDT(test)[ , rank := frank(test, -gold, -silver, -bronze, ties.method = "min")]
setorder(test, rank)
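
If adding data.table as a dependency is not an option, the same "min" ranking can probably be reproduced in base R. The following is only a sketch, not part of the original answer: it assumes the medal counts are whole numbers (so that pasting them into a text key is unambiguous), and the helper names o and key are made up for illustration.

```r
# Sample data from the question
test <- data.frame(ID = c("1_1", "1_2", "1_3", "1_4", "1_5", "1_6"),
                   gold = c(10, 4, 1, 7, 7, 1),
                   silver = c(1, 3, 2, 19, 19, 2),
                   bronze = c(1, 8, 2, 0, 0, 2))

# Row order of the medal table: gold first, then silver, then bronze
o <- with(test, order(-gold, -silver, -bronze))

# One text key per medal combination; identical combinations give identical keys
key <- with(test, paste(gold, silver, bronze))

# match() returns the first position of each key among the sorted keys,
# which corresponds to the "min" rank for tied rows
test$rank <- match(key, key[o])

# Print the table in medal order
test[o, ]
```

Because match() always returns the first matching position, tied rows share the lowest rank of their group and the following rank is skipped, which is the behaviour of ties.method = "min".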
