How to rank a vector using a second vector as a tie breaker?

Views: 59 | Published: 2021/7/2 20:16:34 | Tags: r, sorting, ranking


Question

I need to implement a ranking algorithm for numeric vectors. I don't know if it's possible to do it using functions like rank(), order() or sort() in R, or if I should hard-code it. Either way, I could not do it.

The algorithm works as follows:

Let x = (x_1, x_2, ..., x_n) and y = (y_1, y_2, ..., y_n) be two vectors.
We need to build the vector z, holding the rank of each element of x, this way:

1. If x_i < x_j then z_i < z_j
2. If x_i = x_j then
   - z_i < z_j if y_i < y_j
   - z_i > z_j if y_i > y_j
   - z_i = z_j if y_i = y_j
3. If x_i is NA (missing) then
   - z_i > z_j if x_j is not NA
   - z_i = z_j if x_j is NA

For example, if x = (30,15,27,49,15) and y = (12,11,10,9,8) then z = (4,2,3,5,1)

I thought I could use order(order(x,y, na.last=T)), and in fact it works as long as the ties in x are not also tied in y. When they are, order() ranks them in order of appearance instead of leaving them tied.

For example, if x = (30,15,27,49,15) and y = (12,8,10,9,8) then order(order(x,y, na.last=T)) will output z = (4,1,3,5,2) instead of z = (4,1,3,5,1) or another z (such as (3,1,2,4,1)) that respects step 2.
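The behaviour described above can be reproduced with a few lines of base R. This is only an illustrative sketch using the vectors from the example; the intermediate variable names o and z are chosen here for readability.

```r
x <- c(30, 15, 27, 49, 15)
y <- c(12, 8, 10, 9, 8)

# order(x, y) sorts by x and uses y to break ties; elements 2 and 5 are
# identical in both vectors, so they are kept in order of appearance
o <- order(x, y, na.last = TRUE)
o
# [1] 2 5 3 1 4

# applying order() again turns the permutation into ranks, which are
# necessarily all distinct, so the tie between elements 2 and 5 is lost
z <- order(o)
z
# [1] 4 1 3 5 2
```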

I could not get around that. How can I proceed?

Solution
tl;dr: I think version 1 is best.  Versions 2 and 3 were early ideas that are not as good, but I leave them here in case they are useful to anyone.



Unfortunately, rank() cannot break ties using a second vector (a useful capability that order() does offer).

Version 1

However, the data.table package provides frank(), which does the job nicely.
library(data.table)

x = c(30,15,27,49,15)
y = c(12,11,10,9,8)
frank(list(x,y), ties.method = "min")
# [1] 4 2 3 5 1

x = c(30,15,27,49,15) 
y = c(12,8,10,9,8)
frank(list(x,y), ties.method = "min")
# [1] 4 1 3 5 1
Note that frank() also accepts ties.method = "dense", which may be better for some uses because it does not skip ranks (i.e. when two values are given rank 1, the next largest gets rank 2 rather than 3):
frank(list(x,y), ties.method = "dense")
# [1] 3 1 2 4 1
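To see how the two tie methods relate, min ranks can be converted to dense ranks in base R by renumbering the distinct rank values consecutively. This is a small sketch of that relationship and is not part of data.table; the names r_min and r_dense are illustrative.

```r
# min ranks from the second example above
r_min <- c(4, 1, 3, 5, 1)

# dense ranks: position of each rank among the sorted distinct rank values
r_dense <- match(r_min, sort(unique(r_min)))
r_dense
# [1] 3 1 2 4 1
```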


Version 2

If you want to stick to base R, one simple workaround would be to rank x * K + y, where K is any number sufficiently large that adding the largest y to any x*K cannot change the order:
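ranky = function(x,y) {
  # K scales x so that the smallest gap between distinct x values exceeds max(y);
  # this relies on y being non-negative and on neither vector containing NA
  K = 1 +  max(y) / min(diff(sort(unique(x))))
  # min ranks of the combined score: equal (x, y) pairs share a rank
  rank(x*K + y, ties.method = 'min')
}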

ranky(c(30,15,27,49,15), c(12,11,10,9,8) )
# [1] 4 2 3 5 1    
ranky(c(30,15,27,49,15), c(12,8,10,9,8))
# [1] 4 1 3 5 1
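One caveat with this scaling trick is that K is derived from max(y), so a negative y can still flip the order of two different x values. The variant below is a hedged sketch, not the original answer's code: it scales by the full range of y instead. The name ranky_range is hypothetical, and the function still assumes that x has at least two distinct values, that there are no NAs, and that the magnitudes are small enough for floating-point arithmetic to keep distinct scores distinct.

```r
ranky_range <- function(x, y) {
  # smallest gap between two distinct x values
  gap <- min(diff(sort(unique(x))))
  # scale so that this gap is wider than the whole spread of y
  K <- 1 + diff(range(y)) / gap
  rank(x * K + y, ties.method = "min")
}

# same results as ranky() on the examples above
ranky_range(c(30, 15, 27, 49, 15), c(12, 8, 10, 9, 8))
# [1] 4 1 3 5 1

# a negative y no longer overrides the order of x
ranky_range(c(1, 2), c(0, -5))
# [1] 1 2
```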


Version 3

Also in base R, you could paste together fixed-width string representations of each and then rank the combined character vector.
rank(paste(
      formatC(x, width = 15, flag = "0"), 
      formatC(y, width = 15, flag = "0")), 
     ties.method = 'min')
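
None of the base R versions above addresses the NA rule from the question (missing x values rank last and tie with each other). The following is a minimal sketch of one way to cover all three rules with order() alone. The function names rank_with_tiebreak and same_as_prev are hypothetical helpers introduced for this illustration, and ties receive the minimum rank, as with ties.method = "min".

```r
# TRUE where an element equals its predecessor, treating two NAs as equal
same_as_prev <- function(v) {
  n <- length(v)
  if (n < 2) return(logical(n))
  prev <- v[-n]
  cur <- v[-1]
  c(FALSE, (is.na(prev) & is.na(cur)) |
           (!is.na(prev) & !is.na(cur) & prev == cur))
}

rank_with_tiebreak <- function(x, y) {
  stopifnot(length(x) == length(y))
  n <- length(x)
  if (n == 0) return(integer(0))
  # sort by x, then y, with missing values last
  o <- order(x, y, na.last = TRUE)
  xs <- x[o]
  ys <- y[o]
  # tied with the previous sorted element: same x and either the same y
  # or a missing x (all missing x values tie regardless of y)
  tied <- same_as_prev(xs) & (same_as_prev(ys) | is.na(xs))
  # each element takes the sorted position where its run of ties starts
  start <- cummax(ifelse(tied, 0L, seq_len(n)))
  z <- integer(n)
  z[o] <- start
  z
}

rank_with_tiebreak(c(30, 15, 27, 49, 15), c(12, 8, 10, 9, 8))
# [1] 4 1 3 5 1
rank_with_tiebreak(c(NA, 15, NA, 15), c(1, 2, 3, 2))
# [1] 3 1 3 1
```

Because the comparison is done on the sorted values themselves rather than on a combined score or string, this sketch avoids the scaling and formatting assumptions of versions 2 and 3.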


                        