How to rank a vector using a second vector as a tie breaker?


Problem description

I need to implement a ranking algorithm for numeric vectors. I don't know if it's possible to do it using functions like rank(), order() or sort() in R, or if I should hard-code it. Either way, I could not do it.

The algorithm works as follows:

Let x = (x_1, x_2, ..., x_n) and y = (y_1, y_2, ..., y_n) be two vectors. We need to build the vector z, composed of the ranks of the elements of x, as follows:

  1. If x_i < x_j then z_i < z_j

  2. If x_i = x_j then

    • z_i < z_j if y_i < y_j
    • z_i > z_j if y_i > y_j
    • z_i = z_j if y_i = y_j
  3. If x_i is NA (missing) then

    • z_i > z_j if x_j is not NA
    • z_i = z_j if x_j is NA

For example, if x = (30,15,27,49,15) and y = (12,11,10,9,8) then z = (4,2,3,5,1)

I thought I could use order(order(x, y, na.last=TRUE)), and in fact it works as long as elements that tie in x do not also tie in y. When they do, order() ranks them in order of appearance instead of leaving them tied.

For example, if x = (30,15,27,49,15) and y = (12,8,10,9,8), then order(order(x, y, na.last=TRUE)) outputs z = (4,1,3,5,2) instead of z = (4,1,3,5,1), or another z such as (3,1,2,4,1), that respects rule 2.
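The behaviour described above can be reproduced with a short base R snippet. This is only a sketch of the problem case, using the vectors from the example; the variable name z_order is chosen here for illustration.

```r
# Vectors from the example: elements 2 and 5 tie in both x and y
x <- c(30, 15, 27, 49, 15)
y <- c(12, 8, 10, 9, 8)

# order() sorts by x, then by y; remaining ties keep their order of appearance,
# so applying order() twice yields distinct positions rather than tied ranks
z_order <- order(order(x, y, na.last = TRUE))
print(z_order)
# [1] 4 1 3 5 2
```

Elements 2 and 5 receive ranks 1 and 2 here, although rule 2 requires them to share a rank.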

I could not find a way around that. How can I proceed?

Solution

tl;dr: I think version 1 is best. Versions 2 and 3 were early ideas that are not as good, but I leave them here in case they are useful to anyone.


Unfortunately, rank does not provide the ability to break ties using a second vector (a useful capability that order does provide).

Version 1

However, the data.table package provides frank(), which does the job nicely.

library(data.table)  # provides frank()

x = c(30,15,27,49,15)
y = c(12,11,10,9,8)
frank(list(x,y), ties.method = "min")
# [1] 4 2 3 5 1

x = c(30,15,27,49,15) 
y = c(12,8,10,9,8)
frank(list(x,y), ties.method = "min")
# [1] 4 1 3 5 1

Note that frank also offers ties.method = "dense", which may be better for some uses because it does not skip ranks (i.e. when two values are given rank 1, the next largest gets rank 2 rather than 3). See below for an example:

frank(list(x,y), ties.method = "dense")
# [1] 3 1 2 4 1

Version 2

If you want to stick to base R, one simple workaround would be to rank x * K + y, where K is any number sufficiently large that adding the largest y to any x*K cannot change the order:

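ranky = function(x,y) {
  # K must exceed (range of y) / (smallest gap between distinct x values);
  # using the range of y rather than max(y) also covers negative y
  K = 1 + diff(range(y)) / min(diff(sort(unique(x))))
  rank(x*K + y, ties.method = 'min')
}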

ranky(c(30,15,27,49,15), c(12,11,10,9,8) )
# [1] 4 2 3 5 1    
ranky(c(30,15,27,49,15), c(12,8,10,9,8))
# [1] 4 1 3 5 1
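The requirement on K can be written out explicitly. As a sketch, write d_min for the smallest gap between distinct values of x (a symbol introduced here, not in the original answer), and assume exact arithmetic and no missing values:

```latex
\[
x_i < x_j \;\Longrightarrow\;
(K x_j + y_j) - (K x_i + y_i) = K (x_j - x_i) + (y_j - y_i)
\;\ge\; K\, d_{\min} - \bigl(\max_k y_k - \min_k y_k\bigr),
\]
\[
\text{which is positive whenever}\quad
K > \frac{\max_k y_k - \min_k y_k}{d_{\min}} .
\]
```

When x_i = x_j, the two keys differ by exactly y_j - y_i, so ties in x are broken by y, and pairs that are equal in both vectors stay tied.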

Version 3

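Also in base R, you could paste together fixed-width, zero-padded string representations of the elements of x and y and then rank the combined character vector. This relies on the padded strings sorting in the same order as the numbers, so it is suited to non-negative values.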

rank(paste(
      formatC(x, width = 15, flag = "0"), 
      formatC(y, width = 15, flag = "0")), 
     ties.method = 'min')
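None of the examples above exercises rule 3 of the question (missing values in x rank last and tie with each other). A minimal base R sketch that follows the three rules literally is to give each element a rank of one plus the number of elements that must sort strictly before it. The function name rank_pairs is hypothetical and not part of the original answer, and the sketch assumes y contains no missing values.

```r
# Hypothetical helper (not from the original answer): rank x, breaking ties
# with y, by counting how many elements sort strictly before each one.
# Assumes y has no NA values. Uses O(n^2) memory, so it suits short vectors.
rank_pairs <- function(x, y) {
  idx <- seq_along(x)
  x_na <- is.na(x)
  # before[i, j] is TRUE when element j must rank strictly below element i
  before <- outer(idx, idx, function(i, j) {
    ifelse(
      x_na[i],
      # rule 3: a missing x ranks above every non-missing x, ties with other NAs
      !x_na[j],
      # rules 1 and 2: smaller x first, then smaller y among equal x
      !x_na[j] & (x[j] < x[i] | (x[j] == x[i] & y[j] < y[i]))
    )
  })
  as.integer(rowSums(before) + 1)
}

rank_pairs(c(30, 15, 27, 49, 15), c(12, 8, 10, 9, 8))
# [1] 4 1 3 5 1
rank_pairs(c(30, NA, 15, NA), c(1, 2, 3, 4))
# [1] 2 3 1 3
```

This gives the same result as ties.method = "min" on the examples above, at the cost of comparing every pair of elements.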
