How to rank a million images with a crowdsourced sort

Question

I'd like to rank a collection of landscape images by making a game in which site visitors rate them, in order to find out which images people find the most appealing.

What would be a good method of doing that?

  • Hot-or-Not style? I.e. show a single image and ask the user to rate it from 1-10. As I see it, this lets me average the scores, and I would just need to ensure that I get an even distribution of votes across all the images. Fairly simple to implement.
  • Pick A-or-B? I.e. show two images and ask the user to pick the better one. This is appealing because there is no numerical rating, just a comparison. But how would I implement it? My first thought was to run it as a quicksort, with the comparison operations provided by humans, and once it completes, simply repeat the sort ad infinitum.

How would you do it?

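If you need numbers, I'm talking about one million images on a site with 20,000 visits a day. I'd imagine only a small proportion of visitors would play the game, so for the sake of argument, let's say I can generate 2,000 human sort operations a day! It's a non-profit website, and the curious will find it through my profile :)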
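To put those numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes a comparison sort needs about n * log2(n) comparisons, which is only an approximation, and the variable names are made up for illustration.

```python
import math

N_IMAGES = 1_000_000     # images to rank (from the question)
VOTES_PER_DAY = 2_000    # human comparisons per day (from the question)

# Assumption: a comparison sort needs roughly n * log2(n) comparisons.
comparisons_for_sort = N_IMAGES * math.log2(N_IMAGES)
days_for_one_sort = comparisons_for_sort / VOTES_PER_DAY

# Each A-or-B vote shows two images, so this is the minimum time
# before every image could have been shown at least once.
days_to_show_every_image = N_IMAGES / (2 * VOTES_PER_DAY)

print(f"comparisons for one full sort: {comparisons_for_sort:,.0f}")
print(f"days for one full sort:        {days_for_one_sort:,.0f}")
print(f"days to show every image once: {days_to_show_every_image:,.0f}")
```

Under these assumptions a single full sort needs roughly 20 million comparisons, which is on the order of 10,000 days of voting, while showing every image just once takes 250 days.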

Recommended answer

As others have said, rating on a 1-10 scale does not work that well because different people use the scale differently.

The problem with the Pick A-or-B method is that the results are not guaranteed to be transitive (A can beat B, B can beat C, and C can still beat A). A nontransitive comparison operator breaks sorting algorithms. With quicksort, in this example, the letters not chosen as the pivot will be incorrectly ranked against each other.
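A small sketch of that failure, assuming a made-up cyclic set of votes and a simple quicksort that always takes the first item as the pivot (both are illustrative choices, not part of the original answer):

```python
# Hypothetical cyclic preferences: A beats B, B beats C, C beats A.
BEATS = {("A", "B"), ("B", "C"), ("C", "A")}


def better(x, y):
    """Return True if voters preferred x over y."""
    return (x, y) in BEATS


def quicksort(items):
    """Best-first quicksort that uses the first item as the pivot."""
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    above = [x for x in rest if better(x, pivot)]
    below = [x for x in rest if not better(x, pivot)]
    return quicksort(above) + [pivot] + quicksort(below)


# The same three images, presented in three different starting orders.
for start in (["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]):
    print(start, "->", quicksort(start))
```

Each starting order yields a different "ranking", and in every one of them the two non-pivot letters end up in the wrong order relative to each other. With pivot A, for instance, the result is C, A, B even though B beats C.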

At any given time, you want an absolute ranking of all the pictures (even if some or all of them are tied). You also want your ranking not to change unless someone votes.

I would use the Pick A-or-B (or tie) method, but determine the ranking with something similar to the Elo rating system, which is used to rank players in two-player games (originally chess):

The Elo player-rating system compares players' match records against their opponents' match records and determines the probability of the player winning the matchup. This probability factor determines how many points a player's rating goes up or down based on the results of each match. When a player defeats an opponent with a higher rating, the player's rating goes up more than if he or she defeated a player with a lower rating (since players should defeat opponents who have lower ratings).

The Elo system:

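  1. All new players start with a base rating of 1600.
  2. WinProbability = 1/(10^((Opponent's Current Rating - Player's Current Rating)/400) + 1)
  3. ScoringPt = 1 point if they win the match, 0 if they lose, and 0.5 for a draw.
  4. Player's New Rating = Player's Old Rating + (K-Value * (ScoringPt - Player's WinProbability))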
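The four steps above can be sketched in a few lines of Python. The function and constant names are illustrative, and the K-Value of 20 is an assumed constant rather than a prescribed one.

```python
BASE_RATING = 1600.0  # step 1: starting rating for every new player
K_VALUE = 20.0        # assumed constant K-Value


def win_probability(player_rating, opponent_rating):
    """Step 2: probability that the player wins the matchup."""
    return 1.0 / (10.0 ** ((opponent_rating - player_rating) / 400.0) + 1.0)


def new_rating(old_rating, opponent_rating, scoring_pt, k_value=K_VALUE):
    """Step 4: scoring_pt is 1.0 for a win, 0.0 for a loss, 0.5 for a draw (step 3)."""
    expected = win_probability(old_rating, opponent_rating)
    return old_rating + k_value * (scoring_pt - expected)


# Two new players meet and the first one wins.
winner_after = new_rating(BASE_RATING, BASE_RATING, 1.0)
loser_after = new_rating(BASE_RATING, BASE_RATING, 0.0)
print(winner_after, loser_after)  # 1610.0 1590.0

# An upset is worth more than an expected win.
upset_gain = new_rating(1400.0, 1800.0, 1.0) - 1400.0
expected_gain = new_rating(1800.0, 1400.0, 1.0) - 1800.0
print(upset_gain > expected_gain)  # True
```

Both new ratings are computed from the old ratings, so the order in which the two players are updated does not matter.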

Replace "players" with pictures and you have a simple way of adjusting both pictures' ratings based on a formula. You can then rank the pictures by those numeric scores. (The K-Value here is the "level" of the tournament: 8-16 for small local tournaments and 24-32 for larger invitationals/regionals. You can just use a constant such as 20.)
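As a rough sketch of how this could look for pictures, the snippet below keeps one rating per image, updates both images on each vote, and sorts by rating. The file names, the `record_vote` and `ranking` helpers, and the in-memory dictionary are all assumptions made for illustration; a real site would persist the ratings somewhere.

```python
from collections import defaultdict

BASE_RATING = 1600.0
K_VALUE = 20.0  # constant K-Value, as suggested above


def expected_score(rating, opponent_rating):
    """Elo win probability of `rating` against `opponent_rating`."""
    return 1.0 / (10.0 ** ((opponent_rating - rating) / 400.0) + 1.0)


def record_vote(ratings, image_a, image_b, score_a):
    """Apply one vote. score_a: 1.0 if A was picked, 0.0 if B was, 0.5 for a tie."""
    rating_a, rating_b = ratings[image_a], ratings[image_b]
    expected_a = expected_score(rating_a, rating_b)
    expected_b = expected_score(rating_b, rating_a)
    ratings[image_a] = rating_a + K_VALUE * (score_a - expected_a)
    ratings[image_b] = rating_b + K_VALUE * ((1.0 - score_a) - expected_b)


def ranking(ratings):
    """Image ids ordered best first; ties are broken by name for stability."""
    return sorted(ratings, key=lambda image: (-ratings[image], image))


# Unseen images start at the base rating.
ratings = defaultdict(lambda: BASE_RATING)

# Hypothetical votes: (image A, image B, score for A).
votes = [
    ("lake.jpg", "dunes.jpg", 1.0),   # lake picked over dunes
    ("dunes.jpg", "fjord.jpg", 0.5),  # tie
    ("fjord.jpg", "lake.jpg", 0.0),   # lake picked over fjord
]
for image_a, image_b, score_a in votes:
    record_vote(ratings, image_a, image_b, score_a)

for image in ranking(ratings):
    print(f"{image}: {ratings[image]:.1f}")
```

Ratings only move inside `record_vote`, so the ranking stays fixed between votes, and because the cyclic votes from the A-beats-B-beats-C case just nudge numbers up and down, a nontransitive set of results no longer breaks anything.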

With this method, you only need to keep one number for each picture, which is far less memory intensive than storing how each picture ranks against every other picture.
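A quick illustration of the difference at the scale from the question, assuming each rating is stored as an 8-byte float (the storage choice here is an assumption, not something the answer specifies):

```python
from array import array

N_IMAGES = 1_000_000  # number of images (from the question)

# One double-precision rating per image, all starting at the base rating.
ratings = array("d", [1600.0]) * N_IMAGES
ratings_bytes = ratings.itemsize * len(ratings)

# Number of distinct image pairs you would need an entry for if you
# stored head-to-head results for every pair instead.
pair_count = N_IMAGES * (N_IMAGES - 1) // 2

print(f"ratings array:  {ratings_bytes / 1e6:.0f} MB")
print(f"distinct pairs: {pair_count:,}")
```

A million ratings fit in about 8 MB, whereas a full pairwise table would need roughly half a trillion entries.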

Added a little more meat based on comments.
