unrated versus negative-rated entities with Wilson score -- how to handle?


Problem Description


Having read How Not To Sort By Average Rating I thought I should give it a try.

CREATE FUNCTION `mydb`.`LowerBoundWilson95` (pos FLOAT, neg FLOAT)
RETURNS FLOAT DETERMINISTIC
RETURN
IF(
    pos + neg <= 0,
    0,
    (
        (pos + 1.9208) / (pos + neg)
        -
        1.96 * SQRT(
            (pos * neg) / (pos + neg) + 0.9604
        )
        / (pos + neg)
    )
    /
    (
        1 + 3.8416
        / (pos + neg)
    )
);

Running some tests, I discover that objects with pos=0 and neg>0 have very small, but non-negative scores, whereas an object with pos=neg=0 has a score of zero, ranking lower.
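For quick experimentation outside MySQL, here is a line-for-line Python port of the function above (my translation, not part of the original question; the constants are z²/2 = 1.9208, z²/4 = 0.9604, and z² = 3.8416 for z = 1.96):

```python
from math import sqrt

def lower_bound_wilson95(pos, neg):
    """Lower bound of the 95% Wilson score interval, as in the SQL above."""
    n = pos + neg
    if n <= 0:
        return 0.0  # no votes at all: score pinned to zero
    return (
        (pos + 1.9208) / n
        - 1.96 * sqrt(pos * neg / n + 0.9604) / n
    ) / (1 + 3.8416 / n)
```

Note that when pos = 0 the two terms of the numerator cancel algebraically (1.9208/n − 1.96·0.98/n = 0), so the bound is exactly zero for any number of downvotes; the "very small, but non-negative" scores are floating-point residue, and the tie with the pos = neg = 0 case is mathematically real.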

I am of the opinion that an unrated object should be listed above one which has no positive ratings but some negatives.

I reasoned that "the individual ratings are all really expressions of deviation from some baseline, so I'll move the baseline, I'll give every object a 'neutral' initial score," so I came up with this:

CREATE FUNCTION `mydb`.`AdjustedRating` (pos FLOAT, neg FLOAT)
RETURNS FLOAT DETERMINISTIC
RETURN
(
    SELECT `mydb`.`LowerBoundWilson95` (pos+4, neg+4)
);
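A self-contained Python sketch of the same padding trick (function names are mine), which confirms that a fully unrated object now outranks one with only downvotes:

```python
from math import sqrt

def lower_bound_wilson95(pos, neg):
    """Lower bound of the 95% Wilson score interval (z = 1.96)."""
    n = pos + neg
    if n <= 0:
        return 0.0
    return ((pos + 1.9208) / n
            - 1.96 * sqrt(pos * neg / n + 0.9604) / n) / (1 + 3.8416 / n)

def adjusted_rating(pos, neg):
    """Pad every object with 4 phantom upvotes and 4 phantom downvotes."""
    return lower_bound_wilson95(pos + 4, neg + 4)
```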

Here are some sample outputs for AdjustedRating:

  \  neg  0       1       2
pos
 0   | 0.215 | 0.188 | 0.168
 1   | 0.266 | 0.235 | 0.212
 2   | 0.312 | 0.280 | 0.254

This is closer to the sort of scores I want, and as a numerical hack I guess it's workable, but I can't justify it mathematically.

Is there a better way, a "right" way?

Solution

The problem arises because this approximation (lower confidence bound) is really meant for identifying the highest rated items of a list. If you were interested in the lowest ranked, you could take the upper confidence bound instead.
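The upper bound is the same expression with the sign of the square-root term flipped; a small sketch (naming is mine) returning both ends of the interval:

```python
from math import sqrt

def wilson_bounds(pos, neg, z=1.96):
    """Return (lower, upper) bounds of the Wilson score interval."""
    n = pos + neg
    if n <= 0:
        return 0.0, 1.0  # no data: the interval spans everything
    phat = pos / n
    centre = phat + z * z / (2 * n)
    spread = z * sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    denom = 1 + z * z / n
    return (centre - spread) / denom, (centre + spread) / denom
```

Ranking by the upper bound orders items by how good they could plausibly be given the data, which is the appropriate view when you care about not burying sparsely rated items at the bottom.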

Alternatively, we can use Bayesian statistics, which is the formalization of exactly the second method you describe. Evan Miller actually had a follow-up post to this in which he said:

The solution I proposed previously — using the lower bound of a confidence interval around the mean — is what computer programmers call a hack. It works not because it is a universally optimal solution, but because it roughly corresponds to our intuitive sense of what we'd like to see at the top of a best-rated list: items with the smallest probability of being bad, given the data.

Bayesian statistics lets us formalize this intuition...

Using the Bayesian ranking approach, any point that has zero data would fall back to the prior mean (what you refer to as the initial score) and then move away from it as it collects data. This is also the approach used at IMDB to compute their top Movies lists. https://math.stackexchange.com/questions/169032/understanding-the-imdb-weighted-rating-function-for-usage-on-my-own-website

The specific method you suggest of crediting each object 4 upvotes and 4 downvotes is equivalent to putting a mean of 0.5 with a weight of 8 votes. Given an absence of any other data, this is a reasonable start. Laplace famously argued in the sunrise problem that events should be credited with 1 success and 1 failure. In the item ranking problem, we have a lot more knowledge, so it makes sense to set the prior mean equal to the average ranking. The weight of this prior mean (or how fast you move off it as a function of data, also called the prior variance) can be challenging to set.

For IMDB's ranking of the Top 250 Movies, they use a mean movie ranking of 7.1 with a weight of 25000 votes, which is equivalent to treating all movies as if they started with 25000 "free" votes at a rating of 7.1.
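Both schemes are instances of the same shrinkage formula; a minimal sketch (names are mine, not IMDB's published code):

```python
def bayesian_average(item_mean, item_votes, prior_mean, prior_weight):
    """Blend an item's observed mean with a prior mean.

    With zero votes the result is the prior mean; as real votes
    accumulate, it moves toward the item's own observed mean."""
    return ((prior_weight * prior_mean + item_votes * item_mean)
            / (prior_weight + item_votes))
```

Padding with 4 upvotes and 4 downvotes is the special case bayesian_average(pos / (pos + neg), pos + neg, 0.5, 8) applied to the upvote fraction; IMDB's Top 250 corresponds to prior_mean = 7.1 and prior_weight = 25000.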
