MongoDB distinct on an array field with a regex query?
Question
Basically, I'm trying to implement tag functionality on a model.
> db.event.distinct("tags")
[ "bar", "foo", "foobar" ]
A simple distinct query retrieves all distinct tags. However, how would I go about getting only the distinct tags that match a certain query? Say, for example, I wanted all tags matching foo and expected to get ["foo","foobar"] as a result?
The following queries are my failed attempts at achieving this:
> db.event.distinct("tags",/foo/)
[ "bar", "foo", "foobar" ]
> db.event.distinct("tags",{tags: {$regex: 'foo'}})
[ "bar", "foo", "foobar" ]
Recommended answer
Use the aggregation framework rather than the .distinct() command:
db.event.aggregate([
    // De-normalize the array content to separate documents
    { "$unwind": "$tags" },

    // Filter the de-normalized content to remove non-matches
    { "$match": { "tags": /foo/ } },

    // Group the "like" terms as the "key"
    { "$group": {
        "_id": "$tags"
    }}
])
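To see why the .distinct() attempts in the question return every tag while this pipeline does not, it can help to model the stages in plain JavaScript. The sketch below is not MongoDB code: the sample `events` array and the helper names `distinctTags`, `unwindTags`, and `distinctMatchingTags` are assumptions made up for illustration, and the results are sorted only to make them easy to compare.

```javascript
// Hypothetical sample documents standing in for the "event" collection.
const events = [
  { _id: 1, tags: ["foo", "bar"] },
  { _id: 2, tags: ["foobar", "bar"] },
  { _id: 3, tags: ["bar"] },
];

// Model of distinct("tags", query): the query selects whole documents,
// then every element of the selected "tags" arrays is collected.
function distinctTags(docs, regex) {
  const selected = docs.filter((doc) => doc.tags.some((tag) => regex.test(tag)));
  return [...new Set(selected.flatMap((doc) => doc.tags))].sort();
}

// Model of $unwind: one output document per array element.
function unwindTags(docs) {
  return docs.flatMap((doc) => doc.tags.map((tag) => ({ _id: doc._id, tags: tag })));
}

// Model of $unwind -> $match -> $group on "$tags".
function distinctMatchingTags(docs, regex) {
  const matched = unwindTags(docs).filter((doc) => regex.test(doc.tags));
  return [...new Set(matched.map((doc) => doc.tags))].sort();
}

console.log(distinctTags(events, /foo/));         // [ 'bar', 'foo', 'foobar' ]
console.log(distinctMatchingTags(events, /foo/)); // [ 'foo', 'foobar' ]
```

In this model the query passed to distinct only decides which documents take part, so "bar" comes along with any document that also holds a matching tag. Filtering after the unwind step works on individual elements instead.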
You are probably better off anchoring the regex to the beginning if you mean to match from the "start" of the string. It is also worth doing this $match before you process $unwind as well:
db.event.aggregate([
    // Match the possible documents. Always the best approach
    { "$match": { "tags": /^foo/ } },

    // De-normalize the array content to separate documents
    { "$unwind": "$tags" },

    // Now "filter" the content to actual matches
    { "$match": { "tags": /^foo/ } },

    // Group the "like" terms as the "key"
    { "$group": {
        "_id": "$tags"
    }}
])
That makes sure you are not processing $unwind on every document in the collection, but only on those that possibly contain your "matched tags" value, before you "filter" to make sure.
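The effect of matching before unwinding can be sketched in plain JavaScript. This is only a model of the stage semantics, not MongoDB code, and the sample `events` data and the `unwind` helper are assumptions made up for illustration.

```javascript
// Hypothetical sample documents; only two of them hold a tag starting with "foo".
const events = [
  { _id: 1, tags: ["foo", "bar"] },
  { _id: 2, tags: ["bar", "baz"] },
  { _id: 3, tags: ["foobar", "bar", "baz"] },
  { _id: 4, tags: ["baz"] },
];
const pattern = /^foo/;

// Model of $unwind: one output document per array element.
const unwind = (docs) =>
  docs.flatMap((doc) => doc.tags.map((tag) => ({ _id: doc._id, tags: tag })));

// Without the first $match, every document is unwound.
const unwoundAll = unwind(events);

// With the first $match, only documents holding at least one matching tag are unwound.
const candidates = events.filter((doc) => doc.tags.some((tag) => pattern.test(tag)));
const unwoundCandidates = unwind(candidates);

// The second $match is still needed, because "bar" and "baz" ride along
// with the candidate documents.
const distinct = [
  ...new Set(unwoundCandidates.filter((doc) => pattern.test(doc.tags)).map((doc) => doc.tags)),
].sort();

console.log(unwoundAll.length);        // 8
console.log(unwoundCandidates.length); // 5
console.log(distinct);                 // [ 'foo', 'foobar' ]
```

On this toy data the pre-filter cuts the unwound set from 8 documents to 5. The saving on a real collection depends on how many documents contain a matching tag at all.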
The really "complex" way to somewhat mitigate large arrays with possible matches takes a bit more work, and requires MongoDB 2.6 or greater:
db.event.aggregate([
    { "$match": { "tags": /^foo/ } },
    { "$project": {
        "tags": { "$setDifference": [
            { "$map": {
                "input": "$tags",
                "as": "el",
                "in": { "$cond": [
                    { "$eq": [
                        { "$substr": [ "$$el", 0, 3 ] },
                        "foo"
                    ]},
                    "$$el",
                    false
                ]}
            }},
            [false]
        ]}
    }},
    { "$unwind": "$tags" },
    { "$group": { "_id": "$tags" }}
])
So $map is a nice "in-line" processor of arrays, but it can only go so far. The $setDifference operator removes the false placeholders left by non-matching elements, but ultimately you still need to process $unwind and then the remaining $group stage to get distinct values overall.
The advantage here is that arrays are now "reduced" to only the "tags" elements that match. Just don't use this when you want a "count" of the occurrences when there are "multiple distinct" values in the same document. But again, there are other ways to handle that.
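One way to read the caveat about counts is that $setDifference treats its inputs as sets, so repeated entries within one document collapse into a single element. The plain JavaScript sketch below models the $map and $setDifference steps on a single document. It is not MongoDB code, and the sample `doc` is an assumption made up for illustration.

```javascript
// Hypothetical document in which the same matching tag appears twice.
const doc = { _id: 1, tags: ["foo", "bar", "foo", "foobar"] };

// Model of $map with $cond: keep the element if its first three characters
// equal "foo", otherwise emit false as a placeholder.
const mapped = doc.tags.map((el) => (el.substring(0, 3) === "foo" ? el : false));

// Model of $setDifference against [false]: set semantics drop the false
// placeholders and also collapse duplicate entries.
const reduced = [...new Set(mapped)].filter((el) => el !== false);

console.log(mapped);  // [ 'foo', false, 'foo', 'foobar' ]
console.log(reduced); // [ 'foo', 'foobar' ]
```

The second "foo" is gone after the set step, so counting occurrences after this projection would undercount within a document.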