Some ideas and direction of how to measure ranking, AP, MAP, recall for IR evaluation

Problem Description

I have a question about how to evaluate whether an information retrieval result is good or not, for example by calculating

the relevant document rank, recall, precision, AP, MAP, ...

Currently, the system is able to retrieve documents from the database once the user enters a query. The problem is that I do not know how to do the evaluation.

I got a public data set, the "Cranfield collection" (dataset link), which contains:

1. documents
2. queries
3. relevance assessments

             DOCS   QRYS   SIZE (MB)
Cranfield   1,400    225        1.6

May I know how to do the evaluation using the "Cranfield collection", i.e., how to calculate the relevant document rank, recall, precision, AP, MAP, ...?

I might need some ideas and direction; I am not asking how to code the program.

Answer

Document Ranking

Okapi BM25 (BM stands for "Best Matching") is a ranking function used by search engines to rank matching documents according to their relevance to a given search query. It is based on the probabilistic retrieval framework. BM25 is a bag-of-words retrieval function that ranks a set of documents based on the query terms appearing in each document, regardless of the inter-relationship between the query terms within a document (e.g., their relative proximity). See the Wikipedia page for more details.
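For illustration, here is a minimal sketch of the BM25 scoring formula in Python, following the standard form given on the Wikipedia page. The tiny corpus and the parameter defaults k1 = 1.5 and b = 0.75 are assumptions for the example, not part of the original answer:

    import math
    from collections import Counter

    def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
        """Score one document against a query with Okapi BM25."""
        N = len(corpus)                                # number of documents
        avgdl = sum(len(d) for d in corpus) / N        # average document length
        tf = Counter(doc_terms)                        # term frequencies in this document
        score = 0.0
        for term in query_terms:
            n_t = sum(1 for d in corpus if term in d)  # documents containing the term
            idf = math.log((N - n_t + 0.5) / (n_t + 0.5) + 1)  # smoothed IDF
            f = tf[term]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc_terms) / avgdl))
        return score

    # Hypothetical three-document corpus, ranked for the query "boundary layer".
    corpus = [["boundary", "layer", "flow"], ["heat", "transfer"], ["boundary", "conditions"]]
    query = ["boundary", "layer"]
    ranked = sorted(corpus, key=lambda d: bm25_score(query, d, corpus), reverse=True)
    print(ranked[0])  # ['boundary', 'layer', 'flow'] ranks first

A real system would precompute the document frequencies and lengths in an index rather than scanning the whole corpus for every query, but the scoring logic is the same.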

Precision and Recall

Precision measures: "Of all the documents we retrieved as relevant, how many are actually relevant?"

Precision = No. of relevant documents retrieved / No. of total documents retrieved

Recall measures: "Of all the actually relevant documents, how many did we retrieve as relevant?"

Recall = No. of relevant documents retrieved / No. of total relevant documents

Suppose a query "q" with 100 relevant documents is submitted to an information retrieval system (e.g., a search engine), and the system retrieves 68 documents out of a total collection of 600. Of the 68 retrieved documents, 40 are relevant. In this case:

Precision = 40 / 68 = 58.8%
Recall = 40 / 100 = 40%
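A minimal sketch of these two computations, assuming binary relevance judgments (as in the Cranfield qrels) and hypothetical integer document ids chosen so the counts match the example:

    def precision(retrieved, relevant):
        """Of all documents we retrieved, the fraction that are relevant."""
        return len(set(retrieved) & set(relevant)) / len(retrieved)

    def recall(retrieved, relevant):
        """Of all relevant documents, the fraction that we retrieved."""
        return len(set(retrieved) & set(relevant)) / len(relevant)

    # Hypothetical ids: 68 retrieved, 100 relevant, 40 in the intersection.
    retrieved = set(range(0, 68))
    relevant = set(range(28, 128))
    print(precision(retrieved, relevant))  # 40/68 ≈ 0.588
    print(recall(retrieved, relevant))     # 40/100 = 0.4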

F-Score / F-measure is the weighted harmonic mean of precision and recall. The traditional F-measure or balanced F-score is:

F-Score = 2 * (Precision * Recall) / (Precision + Recall)
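Continuing the sketch, the balanced F-score for the example above (the harmonic mean punishes a large gap between precision and recall):

    def f_score(p, r):
        """Balanced F-score: harmonic mean of precision and recall."""
        return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

    print(f_score(40 / 68, 40 / 100))  # ≈ 0.476 for the example above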

Average Precision (AP)

You can think of it this way: you type something into Google and it shows you 10 results. It's probably best if all of them are relevant. If only some are relevant, say five of them, then it's much better if the relevant ones are shown first. It would be bad if the first five were irrelevant and the good ones only started from the sixth, wouldn't it? The AP score reflects this.

Here is an example:

AvgPrec of the two rankings:

Ranking #1: (1.0 + 0.67 + 0.75 + 0.8 + 0.83 + 0.6) / 6 = 0.78

Ranking #2: (0.5 + 0.4 + 0.5 + 0.57 + 0.56 + 0.6) / 6 = 0.52
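A minimal sketch of the AP computation. The binary relevance vectors below (1 = relevant result at that rank, 0 = irrelevant) are reconstructed assumptions that reproduce the precision values above; matching the worked example, the function divides by the number of relevant results that appear in the list:

    def average_precision(relevance):
        """AP over one ranked list of binary relevance judgments."""
        hits, precs = 0, []
        for rank, rel in enumerate(relevance, start=1):
            if rel:
                hits += 1
                precs.append(hits / rank)  # precision at each relevant result
        return sum(precs) / len(precs) if precs else 0.0

    ranking1 = [1, 0, 1, 1, 1, 1, 0, 0, 0, 1]  # assumed pattern for ranking #1
    ranking2 = [0, 1, 0, 0, 1, 1, 1, 0, 1, 1]  # assumed pattern for ranking #2
    print(average_precision(ranking1))  # ≈ 0.775, the 0.78 above
    print(average_precision(ranking2))  # ≈ 0.521, the 0.52 above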

Mean Average Precision (MAP)

MAP is the mean of average precision across multiple queries/rankings. An example for illustration:

Mean average precision for the two queries:

AvgPrec for query 1: (1.0 + 0.67 + 0.5 + 0.44 + 0.5) / 5 = 0.62

AvgPrec for query 2: (0.5 + 0.4 + 0.43) / 3 = 0.44

So, MAP = (0.62 + 0.44) / 2 = 0.53
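Continuing the sketch, MAP is just the arithmetic mean of the per-query AP values. The relevance vectors are again reconstructed assumptions that match the numbers above (query 2 is assumed to have only three relevant results in its ranking):

    def average_precision(relevance):
        """AP over one ranked list of binary relevance judgments (as above)."""
        hits, precs = 0, []
        for rank, rel in enumerate(relevance, start=1):
            if rel:
                hits += 1
                precs.append(hits / rank)
        return sum(precs) / len(precs) if precs else 0.0

    def mean_average_precision(rankings):
        """MAP: arithmetic mean of the per-query AP values."""
        return sum(average_precision(r) for r in rankings) / len(rankings)

    query1 = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]  # assumed pattern: AP ≈ 0.62
    query2 = [0, 1, 0, 0, 1, 0, 1]           # assumed pattern: AP ≈ 0.44
    print(mean_average_precision([query1, query2]))  # ≈ 0.53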

Sometimes, people use precision@k and recall@k as performance measures of a retrieval system. You should build a retrieval system for such testing. If you want to write your program in Java, you should consider Apache Lucene to build your index.
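For completeness, a minimal sketch of these cutoff variants, which look only at the top k results of a ranking (same assumed binary-relevance representation as in the AP sketch):

    def precision_at_k(relevance, k):
        """Precision over the top-k results of a ranked list."""
        return sum(relevance[:k]) / k

    def recall_at_k(relevance, k, total_relevant):
        """Recall over the top-k results, given the query's total relevant count."""
        return sum(relevance[:k]) / total_relevant

    # Ranking #1 from the AP example, cut off at k = 5 (6 relevant in total).
    ranking1 = [1, 0, 1, 1, 1, 1, 0, 0, 0, 1]
    print(precision_at_k(ranking1, 5))  # 4/5 = 0.8
    print(recall_at_k(ranking1, 5, 6))  # 4/6 ≈ 0.67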
