How to rank a vector using a second vector as a tie breaker?
Problem description
I need to implement a ranking algorithm for numeric vectors. I don't know if it's possible to do it using functions like rank(), order() or sort() in R, or if I should hard-code it. Either way, I could not do it.
The algorithm works as follows:
Let x = (x_1,x_2...,x_n) and y = (y_1,y_2,...y_n) be two vectors. We need to build the vector z composed of the ranked elements of x this way:
1. If x_i < x_j, then z_i < z_j.
2. If x_i = x_j, then:
   - z_i < z_j if y_i < y_j
   - z_i > z_j if y_i > y_j
   - z_i = z_j if y_i = y_j
3. If x_i is NA (missing), then:
   - z_i > z_j if x_j is not NA
   - z_i = z_j if x_j is NA
For example, if x = (30,15,27,49,15) and y = (12,11,10,9,8) then z = (4,2,3,5,1)
I thought I could use order(order(x,y, na.last=T)), and in fact it works as long as the ties in x are not also tied in y. When they are, order() ranks them in order of appearance instead of leaving them tied.
For example, if x = (30,15,27,49,15) and y = (12,8,10,9,8), then order(order(x,y, na.last=T)) outputs z = (4,1,3,5,2) instead of z = (4,1,3,5,1) or another z, such as (3,1,2,4,1), that respects step 2.
I could not get around that. How can I proceed?
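The behaviour described in the question can be reproduced with a short base R snippet. This is only an illustrative sketch: the names z_order and tied are made up here and do not come from the original post.

```r
x <- c(30, 15, 27, 49, 15)
y <- c(12, 8, 10, 9, 8)

# order(x, y) sorts by x, then by y. Elements 2 and 5 are equal on both keys,
# so order() falls back to their position, and the double order() therefore
# never assigns them the same rank.
z_order <- order(order(x, y, na.last = TRUE))
z_order
# [1] 4 1 3 5 2

# Elements that are tied on both x and y, and so should share a rank
pairs <- data.frame(x, y)
tied <- duplicated(pairs) | duplicated(pairs, fromLast = TRUE)
which(tied)
# [1] 2 5
```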
tl;dr: I think version 1 is best. Versions 2 and 3 were early ideas that are not as good, but I leave them here in case they are useful to anyone.
Unfortunately, rank does not provide the ability to break ties using a second vector (a useful capability that order does provide).
Version 1
However, library(data.table) provides frank(), which does the job nicely.
library(data.table)

x = c(30,15,27,49,15)
y = c(12,11,10,9,8)
frank(list(x,y), ties.method = "min")
# [1] 4 2 3 5 1

x = c(30,15,27,49,15)
y = c(12,8,10,9,8)
frank(list(x,y), ties.method = "min")
# [1] 4 1 3 5 1
Note that frank also offers ties.method = "dense", which may be better for some uses because it does not skip ranks (i.e. when two values are given rank 1, the next largest gets rank 2 rather than 3), as in the example below:
frank(list(x,y), ties.method = "dense")
# [1] 3 1 2 4 1
Version 2
If you want to stick to base R, one simple workaround is to rank x * K + y, where K is any number large enough that adding the largest y to any x*K cannot change the order:
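ranky = function(x,y) {
  # Assumes y >= 0 and no NAs in x or y.
  # K stretches x so that the smallest gap between distinct x values exceeds max(y).
  K = 1 + max(y) / min(diff(sort(unique(x))))
  rank(x*K + y, ties.method = 'min')
}
ranky(c(30,15,27,49,15), c(12,11,10,9,8))
# [1] 4 2 3 5 1
ranky(c(30,15,27,49,15), c(12,8,10,9,8))
# [1] 4 1 3 5 1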
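One caveat with the scaling trick: with K based on max(y), the ordering can break when y contains negative values. The sketch below is a variant that is not part of the original answer. It assumes no NAs and uses the spread of y, diff(range(y)), in place of max(y). The name ranky_range is hypothetical.

```r
# Variant of the x*K + y idea using the spread of y (an assumption beyond the
# original answer). Assumes x and y are numeric with no NAs.
ranky_range <- function(x, y) {
  gaps <- diff(sort(unique(x)))
  # If every x is identical, only y matters
  if (length(gaps) == 0L) return(rank(y, ties.method = "min"))
  # Smallest gap in x, scaled by K, is min(gaps) + spread of y, so no
  # difference in y can outweigh a difference in x
  K <- 1 + diff(range(y)) / min(gaps)
  rank(x * K + y, ties.method = "min")
}

ranky_range(c(30, 15, 27, 49, 15), c(12, 8, 10, 9, 8))
# [1] 4 1 3 5 1

# A case with negative y, where x alone should decide the order
ranky_range(c(1, 2), c(0, -100))
# [1] 1 2
```

Like the original, this depends on floating-point arithmetic, so very large or very closely spaced values may still misbehave.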
Version 3
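Also in base R, you could paste together fixed-width string representations of each and then rank the combined character vector. As written, this relies on x and y being non-negative whole numbers small enough that formatC() does not switch to scientific notation.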
rank(paste(
formatC(x, width = 15, flag = "0"),
formatC(y, width = 15, flag = "0")),
ties.method = 'min')
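The rules from the question, including the one that puts NA last and ties all NA values of x together, can also be written out directly in base R by sorting with order(x, y) and then marking where a new (x, y) group starts. This is a minimal sketch and not part of the original answer. The function name rank_with_tiebreak and its ties argument are hypothetical.

```r
# Rank x, breaking ties with y; elements equal on both keys share a rank.
# Elements with NA in x go last and all tie with each other.
rank_with_tiebreak <- function(x, y, ties = c("min", "dense")) {
  ties <- match.arg(ties)
  n <- length(x)
  if (n == 0L) return(integer(0))

  o <- order(x, y, na.last = TRUE)   # sort by x, then y, NAs at the end
  xs <- x[o]
  ys <- y[o]

  # TRUE where a and b are equal, or both NA
  same <- function(a, b) {
    (is.na(a) & is.na(b)) | (!is.na(a) & !is.na(b) & a == b)
  }

  # Does each sorted element tie with the one before it?
  x_na <- is.na(xs)
  ties_prev <- c(FALSE,
                 (x_na[-1] & x_na[-n]) |
                   (same(xs[-1], xs[-n]) & same(ys[-1], ys[-n])))
  new_group <- !ties_prev
  group_id <- cumsum(new_group)

  sorted_rank <- if (ties == "dense") {
    group_id                      # consecutive ranks, none skipped
  } else {
    which(new_group)[group_id]    # position of the first member of each group
  }

  z <- integer(n)
  z[o] <- sorted_rank             # map back to the original positions
  z
}

rank_with_tiebreak(c(30, 15, 27, 49, 15), c(12, 8, 10, 9, 8))
# [1] 4 1 3 5 1
rank_with_tiebreak(c(30, 15, 27, 49, 15), c(12, 8, 10, 9, 8), ties = "dense")
# [1] 3 1 2 4 1
rank_with_tiebreak(c(30, NA, 15, NA), c(1, 2, 3, 4))
# [1] 2 3 1 3
```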