Distances between rankings

Question

I have two methods that rank a list of strings differently, plus what we can consider to be the "right" ranking of the list (i.e. a gold standard).

In other words:

 ranked_list_of_strings_1 = method_1(list_of_strings)
 ranked_list_of_strings_2 = method_2(list_of_strings)    
 correctly_ranked_list_of_strings # Some permutation of list_of_strings

How can I determine which method is better considering that method_1 and method_2 are black boxes? Are there any methods to measure this available either in SciPy or scikit-learn or similar libraries?

In my specific case, I actually have a dataframe, and each method outputs a score. What matters is not the difference in score between the methods and the true scores, but that the methods get the ranking right (higher score means higher ranking for all columns).

      strings        scores_method_1   scores_method_2   true_scores
5714  aeSeOg                    0.54               0.1           0.8
5741  NQXACs                    0.15               0.3           0.4
5768  zsFZQi                    0.57               0.7           0.2

Recommended answer

You're looking for Normalized Discounted Cumulative Gain (NDCG). It's a metric commonly used in search engine rankings to test the quality of the result ranking.

The idea is that you test your ranking (in your case, the output of each of the two methods) against user feedback through clicks (in your case, the true ranking). NDCG will tell you the quality of your ranking relative to the truth.
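
To make the idea concrete, here is a minimal pure-Python sketch of NDCG applied to the three sample rows from the question. It is an illustration under stated assumptions, not the RankEval implementation: the helper names `dcg` and `ndcg` are hypothetical, the true scores are used directly as gains (linear gain), and the discount is the common `1 / log2(position + 1)` form.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: gain at 0-based position i is divided by log2(i + 2)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(true_scores, predicted_scores):
    """NDCG of the ranking induced by predicted_scores, judged against true_scores.

    Assumption: the true scores themselves serve as (linear) gains.
    """
    # Order the items from highest to lowest predicted score.
    order = sorted(range(len(true_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked_gains = [true_scores[i] for i in order]
    # The ideal ranking sorts the items by their true scores.
    ideal = dcg(sorted(true_scores, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0


# Sample rows from the question (aeSeOg, NQXACs, zsFZQi).
true_scores = [0.8, 0.4, 0.2]
scores_method_1 = [0.54, 0.15, 0.57]
scores_method_2 = [0.1, 0.3, 0.7]

ndcg_method_1 = ndcg(true_scores, scores_method_1)
ndcg_method_2 = ndcg(true_scores, scores_method_2)
print(f"method_1: {ndcg_method_1:.4f}")
print(f"method_2: {ndcg_method_2:.4f}")
```

The method with the higher NDCG reproduces the gold-standard ordering more closely, with mistakes near the top of the list penalized more than mistakes near the bottom. On a real dataframe you would pass the full score columns instead of three values. scikit-learn also ships `sklearn.metrics.ndcg_score`, which should give comparable numbers under the same gain and discount choices.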

Python has a RankEval-based module that implements this metric (and some others, if you want to try them). The repo is here, and there is a nice IPython notebook with examples.
