Sorting by relevance with MongoDB

Question

I have a collection of documents in the following form:

{ _id: ObjectId(...)
, title: "foo"
, tags: ["bar", "baz", "qux"] 
}

The query should find all documents with any of these tags. I currently use this query:

{ "tags": { "$in": ["bar", "hello"] } }

And it works; all documents tagged "bar" or "hello" are returned.

However, I want to sort by relevance, i.e. the more matching tags the earlier the document should occur in the result. For example, a document tagged ["bar", "hello", "baz"] should be higher in the results than a document tagged ["bar", "baz", "boo"] for the query ["bar", "hello"]. How can I achieve this?

Recommended answer

Doing this with MapReduce or on the client side is going to be too slow; you should use the aggregation framework (new in MongoDB 2.2).

It might look something like this:

db.collection.aggregate([
   { $match : { "tags": { "$in": ["bar", "hello"] } } },
   { $unwind : "$tags" },
   { $match : { "tags": { "$in": ["bar", "hello"] } } },
   { $group : { _id: "$title", numRelTags: { $sum: 1 } } },
   { $sort : { numRelTags : -1 } },
   // optionally
   { $limit : 10 }
])

Note that the first and third pipeline members look identical; this is intentional and needed. Here is what the steps do:

  1. pass on only documents which have the tag "bar" or "hello" in them.
  2. unwind the tags array (meaning split each document into one document per tags element).
  3. pass on only tags that are exactly "bar" or "hello" (i.e. discard the rest of the tags).
  4. group by title (it could also be by "$_id" or any other combination of fields from the original document), adding up how many of the tags "bar" and "hello" each document had.
  5. sort in descending order by number of relevant tags.
  6. (optionally) limit the returned set to the top 10.
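
To make the steps above concrete, here is a rough plain-JavaScript sketch that mimics what the pipeline computes on an in-memory array. It is only an illustration and not how MongoDB executes the aggregation; the helper name rankByMatchingTags and the sample documents are made up for this example.

```javascript
// Plain-JavaScript emulation of the pipeline steps, for illustration only.
// rankByMatchingTags and the sample documents below are hypothetical.
function rankByMatchingTags(docs, queryTags, limit) {
  const wanted = new Set(queryTags);
  const counts = new Map();

  for (const doc of docs) {
    // Steps 1-3: look at each tag on its own and keep only the tags that
    // appear in the query; documents without any matching tag are skipped.
    const matching = doc.tags.filter((tag) => wanted.has(tag));
    if (matching.length === 0) continue;

    // Step 4: group by title and add up the number of matching tags.
    counts.set(doc.title, (counts.get(doc.title) || 0) + matching.length);
  }

  // Step 5: sort in descending order by number of relevant tags.
  const ranked = Array.from(counts, ([title, numRelTags]) => ({
    _id: title,
    numRelTags: numRelTags,
  })).sort((a, b) => b.numRelTags - a.numRelTags);

  // Step 6 (optional): keep only the top `limit` results.
  return limit === undefined ? ranked : ranked.slice(0, limit);
}

const docs = [
  { title: "first", tags: ["bar", "baz", "boo"] },
  { title: "second", tags: ["bar", "hello", "baz"] },
  { title: "third", tags: ["qux"] },
];

const result = rankByMatchingTags(docs, ["bar", "hello"], 10);
console.log(result);
```

With these sample documents, "second" matches both query tags and is ranked ahead of "first", which matches only "bar"; "third" matches nothing and is dropped. As with grouping by "$title" in the pipeline, documents sharing a title would be counted together here.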
