How should I order these "helpful" scores?

Views: 223   Published: 2016/12/21 10:37:28   Tags: math, statistics, comments, voting, user-generated-content

Question

Under the user generated posts on my site, I have an Amazon-like rating system:

   Was this review helpful to you: Yes | No

If there are votes, I display the results above that line like so:

   5 of 8 people found this reply helpful.

I would like to sort the posts based upon these rankings. If you were ranking from most helpful to least helpful, how would you order the following posts?

   a) 1/1 = 100% helpful
   b) 2/2 = 100% helpful
   c) 999/1000 = 99.9% helpful
   d) 3/4 = 75% helpful
   e) 299/400 = 74.8% helpful

Clearly, it's not right to sort on the percent helpful alone; the total number of votes should somehow be factored in. Is there a standard way of doing this?

UPDATE:

Using Charles' formulas to calculate the Agresti-Coull lower bound and sorting on it, this is how the above examples would sort:

   1) 999/1000 (99.9%) = 95% likely to fall in 'helpfulness' range of 99.2% to 100%
   2) 299/400 (74.8%) = 95% likely to fall in 'helpfulness' range of 69.6% to 79.3%
   3) 3/4 (75%) = 95% likely to fall in 'helpfulness' range of 24.7% to 97.5%
   4) 2/2 (100%) = 95% likely to fall in 'helpfulness' range of 23.7% to 100%
   5) 1/1 (100%) = 95% likely to fall in 'helpfulness' range of 13.3% to 100%

Intuitively, this feels right.

From an application point of view, I don't want to be running these calculations every time I pull up a list of posts. I'm thinking I'll update and store the Agresti-Coull lower bound either on a regular, cron-driven schedule (updating only those posts that have received a vote since the last run) or whenever a new vote is received.

Answer

For each post, generate bounds on how helpful you expect it to be. I prefer to use the Agresti-Coull interval. In C:

#include <math.h>

// n = total votes, k = "helpful" votes
double AgrestiCoullLower(int n, int k) {
  // double conf = 0.05;  // 95% confidence interval
  double kappa = 2.24140273; // In general, kappa = ierfc(conf/2)*sqrt(2)
  double kest = k + kappa * kappa / 2;  // adjusted "helpful" count
  double nest = n + kappa * kappa;      // adjusted total votes
  double pest = kest / nest;            // adjusted helpful fraction
  double radius = kappa * sqrt(pest * (1 - pest) / nest);
  return fmax(0, pest - radius); // Lower bound
  // Upper bound is fmin(1, pest + radius)
}

Then take the lower end of the estimate and sort on this. So the 2/2 is (by Agresti-Coull) 95% likely to fall in the 'helpfulness' range 23.7% to 100%, so it sorts below the 999/1000 which has range 99.2% to 100% (since .237 < .992).

Edit: Since some people seem to have found this helpful (ha ha), let me note that the algorithm can be tweaked based on how confident/risk-averse you want to be. The less confidence you need, the more willing you will be to abandon the 'proven' (high-vote) reviews for the untested but high-scoring reviews. A 90% confidence interval gives kappa = 1.95996398, an 85% confidence interval gives 1.78046434, a 75% confidence interval gives 1.53412054, and the all-caution-to-the-wind 50% confidence interval gives 1.15034938. (These labels follow the kappa = ierfc(conf/2)*sqrt(2) convention in the code above; under the more common two-sided convention, kappa = ierfc(conf)*sqrt(2), 1.95996398 is the usual 95% value and 2.24140273 corresponds to 97.5%.)

With the 50% confidence interval, the ordering becomes:

1) 999/1000 (99.7%) = 50% likely to fall in 'helpfulness' range of 99.7% to 100%
2) 299/400 (72.2%) = 50% likely to fall in 'helpfulness' range of 72.2% to 77.2%
3) 2/2 (54.9%) = 50% likely to fall in 'helpfulness' range of 54.9% to 100%
4) 3/4 (45.7%) = 50% likely to fall in 'helpfulness' range of 45.7% to 91.9%
5) 1/1 (37.5%) = 50% likely to fall in 'helpfulness' range of 37.5% to 100%

which isn't that different overall, but it does prefer the 2/2 to the safety of the 3/4.
