Confusion about (Mean) Average Precision

Question

In an earlier question I asked for clarification about the precision-recall curve.

In particular, I asked whether we have to consider a fixed number of ranked results to draw the curve, or whether we can reasonably choose that number ourselves. According to the answer, the second option is correct.

However, I now have a big doubt about the Average Precision (AP) value. AP is used to estimate numerically how good our algorithm is for a given query, and Mean Average Precision (MAP) is the mean of the AP values over multiple queries.
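To make the two definitions concrete, here is a minimal C++ sketch (C++ is assumed only to match the loop fragment quoted in the answer). It uses the common non-interpolated, step-wise form of AP: at every rank k it adds precision@k times the recall gained at k. The function names and the `std::vector<bool>` relevance encoding (one entry per returned item, `true` for a ground-truth positive) are hypothetical choices for illustration, not code from either question.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Step-wise AP for one query. `relevant[k]` is true if the item at rank k+1
// is a ground-truth positive; `num_positives` is the ground-truth total.
double average_precision(const std::vector<bool>& relevant, std::size_t num_positives) {
    if (num_positives == 0) return 0.0;  // assumption: report 0 when AP is undefined
    double ap = 0.0;
    std::size_t hits = 0;
    for (std::size_t k = 0; k < relevant.size(); ++k) {
        if (!relevant[k]) continue;  // recall unchanged, so no area is added
        ++hits;
        double precision_at_k = static_cast<double>(hits) / (k + 1);
        ap += precision_at_k / num_positives;  // precision@k * (recall step of 1/N)
    }
    return ap;
}

// MAP: the arithmetic mean of the per-query AP values.
double mean_average_precision(const std::vector<double>& ap_per_query) {
    if (ap_per_query.empty()) return 0.0;
    double sum = std::accumulate(ap_per_query.begin(), ap_per_query.end(), 0.0);
    return sum / ap_per_query.size();
}
```

For example, a ranking of (positive, negative, positive) with 2 ground-truth positives gives (1/1 + 2/3) / 2 = 5/6, and two queries with AP 0.5 and 1.0 give a MAP of 0.75.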

My doubt is this: if AP changes according to how many objects we retrieve, then we can tune this parameter to our advantage and report the best possible AP value. For example, supposing that the p-r curve performs wonderfully up to 10 elements and horribly afterwards, we could "cheat" by computing the (M)AP value over only the first 10 elements.

I know this may sound confusing, but I haven't found anything about it anywhere.

Answer

AP is the area under the precision-recall curve, and the precision-recall curve is supposed to be computed over the entire returned ranked list.
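A sketch of what "computed over the entire returned ranked list" means, assuming a hypothetical encoding of one `bool` per returned item (`true` for a ground-truth positive): every rank k contributes one (recall, precision) point, and recall is divided by the ground-truth positive count. Because of that denominator, computing the curve on a cropped list yields exactly a prefix of the full curve.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns one (recall, precision) point per rank of the returned list.
// `num_positives` must be the number of positives in the ground truth,
// not the number that happen to appear in `relevant`.
std::vector<std::pair<double, double>> pr_curve(const std::vector<bool>& relevant,
                                                std::size_t num_positives) {
    std::vector<std::pair<double, double>> points;
    std::size_t hits = 0;
    for (std::size_t k = 0; k < relevant.size(); ++k) {
        if (relevant[k]) ++hits;
        double recall = num_positives ? static_cast<double>(hits) / num_positives : 0.0;
        double precision = static_cast<double>(hits) / (k + 1);
        points.emplace_back(recall, precision);
    }
    return points;
}
```

With the toy ranking (positive, negative, positive) and 2 ground-truth positives, the points are (0.5, 1), (0.5, 0.5) and (1, 2/3); passing only the first two items returns the first two of those points and nothing else.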

It is not possible to cheat the AP by tweaking the size of the returned ranked list. AP is the area under the precision-recall curve, which plots precision as a function of recall, where recall is the number of returned positives relative to the total number of positives that exist in the ground truth, not relative to the number of positives in the returned list. So if you crop the list, all you are doing is cropping the precision-recall curve and not plotting its tail. Since AP is the area under the curve, cropping the list can only reduce the AP (or leave it unchanged), so there is nothing to gain by tweaking the ranked list size: the maximal AP is achieved when you return the entire list. You can see this, for example, in the code you cited in your other question, where cropping the list simply corresponds to changing

for ( ; i<ranked_list.size(); ++i) {

to

for ( ; i<some_number; ++i) {

which results in fewer increments of ap (all increments are non-negative, since old_precision and precision are non-negative and recall is non-decreasing) and thus an AP value that can only be smaller than or equal to the full-list value.
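The argument can be checked with a small sketch modeled on the loop fragment above. Only the names `ap`, `old_precision`, `precision` and `recall` come from the answer; the trapezoidal update, the starting value `old_precision = 1.0`, the relevance encoding and the `cutoff` parameter (standing in for `some_number`) are assumptions, so this is not claimed to be the code from the other question.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Trapezoidal area under the PR curve, stopping after `cutoff` ranks.
// `relevant[i]` marks ground-truth positives; `num_positives` is the
// ground-truth total, so recall is NOT inflated by cropping.
double trapezoid_ap(const std::vector<bool>& relevant, std::size_t num_positives,
                    std::size_t cutoff) {
    if (num_positives == 0) return 0.0;  // assumption: report 0 when undefined
    double ap = 0.0;
    double old_recall = 0.0;
    double old_precision = 1.0;  // assumption: curve starts at (recall 0, precision 1)
    std::size_t hits = 0;
    std::size_t n = cutoff < relevant.size() ? cutoff : relevant.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (relevant[i]) ++hits;
        double recall = static_cast<double>(hits) / num_positives;
        double precision = static_cast<double>(hits) / (i + 1);
        // Each increment is >= 0: recall never decreases, precisions are >= 0.
        ap += (recall - old_recall) * (old_precision + precision) / 2.0;
        old_recall = recall;
        old_precision = precision;
    }
    return ap;
}
```

With the toy ranking (positive, negative, positive) and 2 ground-truth positives, cutoffs 1, 2 and 3 give 0.5, 0.5 and 19/24 (about 0.79): raising the cutoff never lowers the value, and any cutoff beyond the list length gives the full-list result.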

In practice, for purely computational reasons, you might want to crop the list at some reasonable size, e.g. 10k, since AP is unlikely to change much beyond that point: precision@large_number is likely to be close to 0 unless you have an unusually large number of positives.
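A rough bound suggests why cropping at a large rank costs little. It assumes the step-wise definition of AP rather than the trapezoidal one (the same reasoning applies to trapezoids with K in place of K+1). Let h(k) be the number of positives in the top k results, N_+ the number of positives in the ground truth, r(k) = h(k)/N_+ with r(0) = 0, and P(k) = h(k)/k; AP@K is the same sum truncated at rank K, still normalised by N_+. Since h(k) <= N_+, every k > K has P(k) <= N_+/(K+1), and the recall steps telescope:

```latex
\mathrm{AP} = \sum_{k=1}^{n} P(k)\,\bigl(r(k) - r(k-1)\bigr), \qquad
\mathrm{AP@}K = \sum_{k=1}^{K} P(k)\,\bigl(r(k) - r(k-1)\bigr)

0 \le \mathrm{AP} - \mathrm{AP@}K
  = \sum_{k=K+1}^{n} P(k)\,\bigl(r(k) - r(k-1)\bigr)
  \le \frac{N_+}{K+1}\,\bigl(r(n) - r(K)\bigr)
  \le \frac{N_+}{K+1}\,\bigl(1 - r(K)\bigr)
```

With hypothetical numbers N_+ = 50 and K = 10,000, the loss is at most 50/10,001, roughly 0.005; the bound only becomes loose when N_+ is comparable to K, i.e. when there is an unusually large number of positives.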

Your confusion might be related to the way some popular functions, such as VLFeat's vl_pr, compute precision-recall curves: they assume that you've provided the entire ranked list, and therefore compute the total number of positives in the ground truth by just looking at the ranked list instead of at the ground truth itself. So if you used vl_pr naively on cropped lists you could indeed cheat it, but that would be an invalid computation. I agree this is not 100% clear from the function's description, but if you examine the documentation in more detail you'll see that it mentions NUMNEGATIVES and NUMPOSITIVES: if you are giving an incomplete ranked list, you should set these two quantities so that the function knows how to compute the precision-recall curve / AP properly. Now, if you plot different crops of a ranked list using vl_pr, but with the same NUMNEGATIVES and NUMPOSITIVES for all function calls, you'll see that the precision-recall curves are just crops of each other, as I explained above (I haven't checked this yet as I don't have MATLAB here, but I'm certain it's the case, and if it's not we should file a bug).
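The pitfall can be reproduced without VLFeat. The sketch below does not use or imitate vl_pr's interface; it only mimics the assumption described above, taking the number of positives from the list it is given instead of from the ground truth. All names are hypothetical, and the ranking is a made-up toy example: three hits at the top, five misses, then the remaining two positives.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Step-wise AP: at each hit, add precision@k times the recall step
// 1/num_positives. Non-hits change neither recall nor AP.
double average_precision(const std::vector<bool>& relevant, std::size_t num_positives) {
    if (num_positives == 0) return 0.0;  // assumption: report 0 when undefined
    double ap = 0.0;
    std::size_t hits = 0;
    for (std::size_t k = 0; k < relevant.size(); ++k) {
        if (!relevant[k]) continue;
        ++hits;
        ap += (static_cast<double>(hits) / (k + 1)) / num_positives;
    }
    return ap;
}

// Keep only the top `k` ranked items (the "cropped" list).
std::vector<bool> crop(const std::vector<bool>& relevant, std::size_t k) {
    std::size_t n = std::min(k, relevant.size());
    return std::vector<bool>(relevant.begin(),
                             relevant.begin() + static_cast<std::ptrdiff_t>(n));
}

// Naive variant: counts positives in the list it is given, as if that list
// were complete. On a cropped list this inflates recall and therefore AP.
double naive_ap(const std::vector<bool>& ranked) {
    auto positives = std::count(ranked.begin(), ranked.end(), true);
    return average_precision(ranked, static_cast<std::size_t>(positives));
}
```

With 5 ground-truth positives, the full toy list scores 71/90 (about 0.79) and the honestly cropped top 3 scores 0.6, but the naive call on the same top 3 reports a perfect 1.0. That is exactly the "cheat" from the question, and the reason the true positive count (NUMPOSITIVES, in vl_pr's terms) has to be supplied for a cropped list.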
