Sorting by relevance with MongoDB
Question
I have a collection of documents in the following form:
{ _id: ObjectId(...)
, title: "foo"
, tags: ["bar", "baz", "qux"]
}
The query should find all documents with any of these tags. I currently use this query:
{ "tags": { "$in": ["bar", "hello"] } }
And it works; all documents tagged "bar" or "hello" are returned.
However, I want to sort by relevance, i.e. the more matching tags a document has, the earlier it should appear in the results. For example, for the query ["bar", "hello"], a document tagged ["bar", "hello", "baz"] should rank higher than a document tagged ["bar", "baz", "boo"]. How can I achieve this?
Recommended answer
Both MapReduce and doing it client-side are going to be too slow; you should use the aggregation framework (new in MongoDB 2.2).
It might look something like this:
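db.collection.aggregate([
  { $match: { tags: { $in: ["bar", "hello"] } } },          // documents with at least one requested tag
  { $unwind: "$tags" },                                     // one document per tags element
  { $match: { tags: { $in: ["bar", "hello"] } } },          // keep only the requested tags
  { $group: { _id: "$title", numRelTags: { $sum: 1 } } },   // count matching tags per title
  { $sort: { numRelTags: -1 } },                            // most matching tags first
  // optionally
  { $limit: 10 }
])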
Note that the first and third pipeline stages look identical; this is intentional and necessary. Here is what the stages do:
- Pass on only documents that have the tag "bar" or "hello".
- Unwind the tags array (that is, split each document into one document per tags element).
- Pass on only the tags that are exactly "bar" or "hello" (i.e. discard the rest of the tags).
- Group by title (this could also be "$_id" or any other combination of fields from the original document), adding up how many of the tags "bar" and "hello" each document had.
- Sort in descending order by the number of relevant tags.
- (Optionally) limit the returned set to the top 10.
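To make the stages easier to follow, here is a rough plain-JavaScript imitation of the same steps over an in-memory array. It does not talk to MongoDB at all; the sample documents and the function name rankByMatchingTags are made up for illustration, and the real work should still be done by the aggregation pipeline on the server.

```javascript
// Hypothetical sample data in the shape described in the question.
const sampleDocs = [
  { _id: 1, title: "a", tags: ["bar", "baz", "boo"] },
  { _id: 2, title: "b", tags: ["bar", "hello", "baz"] },
  { _id: 3, title: "c", tags: ["qux"] },
];

// Illustrative helper (not a MongoDB API): mimics the pipeline stage by stage.
function rankByMatchingTags(docs, wanted, limit = 10) {
  const wantedSet = new Set(wanted);

  // Stage 1 ($match): keep only documents with at least one wanted tag.
  const matched = docs.filter(d => d.tags.some(t => wantedSet.has(t)));

  // Stage 2 ($unwind): one record per (document, tag) pair.
  const unwound = matched.flatMap(d =>
    d.tags.map(t => ({ title: d.title, tags: t }))
  );

  // Stage 3 ($match): discard the tags that were not asked for.
  const relevant = unwound.filter(r => wantedSet.has(r.tags));

  // Stage 4 ($group): count the remaining records per title.
  const counts = new Map();
  for (const r of relevant) {
    counts.set(r.title, (counts.get(r.title) || 0) + 1);
  }

  // Stage 5 ($sort, descending) and stage 6 ($limit).
  return [...counts.entries()]
    .map(([title, numRelTags]) => ({ _id: title, numRelTags }))
    .sort((x, y) => y.numRelTags - x.numRelTags)
    .slice(0, limit);
}

const ranked = rankByMatchingTags(sampleDocs, ["bar", "hello"]);
console.log(ranked);
```

With this sample data, the document titled "b" (two matching tags) comes before "a" (one matching tag), and "c" is dropped by the first stage. Without the second filter, every tag of a matched document would be counted, so the ranking would reflect the total number of tags rather than the number of matching ones.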