Cosmos DB聚合管道不兼容(Mongo API)? [英] Cosmos DB Aggregation Pipeline incompatible (Mongo API)?
问题描述
也许有人对我们如何解决或解决在Azure Cosmos DB中针对MongoDB的聚合管道的当前实施中的错误(例如是错误)有所了解(所以是的:我们已在实例上启用此功能).
Perhaps somebody has a good idea how we can fix or workaround something which looks like a bug in the current implementation of Aggregation Pipelines for MongoDB in Azure Cosmos DB (so yes: We have switched on the feature on our instances).
简短版本是:在我们看来,$group
阶段之后 之后的$match
聚合阶段不起作用.它永远不会返回任何结果.
Short version is: It seems to us that a $match
aggregation stage after a $group
stage does not work. It never returns any results.
可以使用任何Mongo Console通过以下方式重现此内容,前提是您已经在有效的数据库中(使用use <some db>
):
This can be reproduced in the following way using any Mongo Console, assuming you are already in a valid DB (using use <some db>
):
粘贴以下JavaScript(如果您是通过Azure门户创建集合的,则此行为可选;假定您的集合名为bug
)
Paste the following JavaScript (this line is optional if you created the collection via the Azure Portal; it assumes your collection is called bug
)
db.createCollection("bug");
将一些文档添加到集合中:
The add some documents to the collection:
db.bug.insert({ _id: 1, owner: "a", _class: "History" });
db.bug.insert({ _id: 2, owner: "a", _class: "History" });
db.bug.insert({ _id: 3, owner: "a", _class: "DocumentBookmark" });
db.bug.insert({ _id: 4, owner: "a", _class: "Recyclebin" });
db.bug.insert({ _id: 5, owner: "b", _class: "History" });
如您所见,owner: "a"
具有重复的History
记录,我们要查询该记录.
As you can see, the owner: "a"
has a duplicate History
record, which we want to query for.
现在执行以下操作:
db.bug.aggregate([
{ $match: { _class: "History"} }
]);
这将呈现正确的结果:
globaldb:PRIMARY> db.bug.aggregate([
... { $match: { _class: "History"} }
... ]);
{
"_t" : "AggregationPipelineResponse",
"ok" : 1,
"waitedMS" : NumberLong(0),
"result" : [
{
"_id" : 1,
"owner" : "a",
"_class" : "History"
},
{
"_id" : 2,
"owner" : "a",
"_class" : "History"
},
{
"_id" : 5,
"owner" : "b",
"_class" : "History"
}
]
}
现在添加带有count
的$group
阶段以查找每个所有者的记录数:
Now add a $group
stage with a count
to find the number of records per owner:
db.bug.aggregate([
{ $match: { _class: "History"} },
{ $group: { _id: "$owner", count: { $sum: 1 }}}
]);
这还会返回正确的结果:
This also returns a correct result:
globaldb:PRIMARY> db.bug.aggregate([
... { $match: { _class: "History"} },
... { $group: { _id: "$owner", count: { $sum: 1 }}}
... ]);
{
"_t" : "AggregationPipelineResponse",
"ok" : 1,
"waitedMS" : NumberLong(0),
"result" : [
{
"_id" : "a",
"count" : NumberLong(2)
},
{
"_id" : "b",
"count" : NumberLong(1)
}
]
}
现在,我们要匹配count
大于1的记录:
Now we want to match for records with a count
greater than 1:
db.bug.aggregate([
{ $match: { _class: "History"} },
{ $group: { _id: "$owner", count: { $sum: 1 }}},
{ $match: { count: { $gt: 1 }}}
]);
这将返回一个空结果集:
globaldb:PRIMARY> db.bug.aggregate([
... { $match: { _class: "History"} },
... { $group: { _id: "$owner", count: { $sum: 1 }}},
... { $match: { count: { $gt: 1 }}}
... ]);
{
"_t" : "AggregationPipelineResponse",
"ok" : 1,
"waitedMS" : NumberLong(0),
"result" : [ ]
}
在Mongo DB上也是如此
现在,要验证这些查询是否正确,我使用mongo:3.4
泊坞窗映像进行了尝试.以下代码将在本地计算机上启动一个新的Mongo数据库实例,以便您可以自己尝试一下:
The same on Mongo DB
Now, to verify these queries are actually correct, I tried it using the mongo:3.4
docker image. The following code will start a new Mongo DB instance on your local machine, so that you can try it out yourself:
$ docker run --name mongobug -d mongo:3.4
ad3010da255b7c15a464fa21ff6519799a5c16cb8af62a0ea564a95780900491
$ docker exec -it mongobug mongo
MongoDB shell version v3.4.10
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.4.10
Welcome to the MongoDB shell.
>
然后,我们将与上面的Cosmos相同;在Mongo Shell中,运行以下命令:
Then we'll do the same as with Cosmos above; inside the Mongo Shell, run the following commands:
db.createCollection("bug")
然后插入测试数据:
db.bug.insert({ _id: 1, owner: "a", _class: "History" });
db.bug.insert({ _id: 2, owner: "a", _class: "History" });
db.bug.insert({ _id: 3, owner: "a", _class: "DocumentBookmark" });
db.bug.insert({ _id: 4, owner: "a", _class: "Recyclebin" });
db.bug.insert({ _id: 5, owner: "b", _class: "History" });
现在您可以看到,在运行以下聚合查询时,返回空集的查询实际上返回了非空的聚合结果:
And now you can see that the query which returned an empty set in fact returns a non-empty aggregate result when running the following aggregate query:
db.bug.aggregate([
{ $match: { _class: "History"} },
{ $group: { _id: "$owner", count: { $sum: 1 }}},
{ $match: { count: { $gt: 1 }}}
]);
结果是预期的:
> db.bug.aggregate([
... { $match: { _class: "History"} },
... { $group: { _id: "$owner", count: { $sum: 1 }}},
... { $match: { count: { $gt: 1 }}}
... ]);
{ "_id" : "a", "count" : 2 }
额外的奖励阅读
我还尝试先按owner
和_class
分组,然后再分组为$match
;显然,这是迄今为止要昂贵得多的操作,因为Mongo必须对整个集合进行分组,而不仅仅是对已经过滤的项目进行分组.
Extra bonus reading
I also tried to first group by owner
and _class
in a joint group and then $match
; this is obviously by far a much more expensive operation, as Mongo has to group over the entire collection, and not only over already filtered items.
但是,不幸的是,当它在本地Mongo docker映像上运行时,这也导致了一个空结果:
But, unfortunately, this also rendered an empty result, while it worked on the local Mongo docker image:
db.bug.aggregate([
{ $group: { _id: { owner: "$owner", _class: "$_class" }, count: { $sum: 1 } } },
{ $match: { "_id._class": "History", count: { $gt: 1 } } }
]);
关于宇宙的结果:
globaldb:PRIMARY> db.bug.aggregate([
... { $group: { _id: { owner: "$owner", _class: "$_class" }, count: { $sum: 1 } } },
... { $match: { "_id._class": "History", count: { $gt: 1 } } }
... ]);
{
"_t" : "AggregationPipelineResponse",
"ok" : 1,
"waitedMS" : NumberLong(0),
"result" : [ ]
}
Mongo数据库上的结果:
Result on Mongo DB:
> db.bug.aggregate([
... { $group: { _id: { owner: "$owner", _class: "$_class" }, count: { $sum: 1 } } },
... { $match: { "_id._class": "History", count: { $gt: 1 } } }
... ]);
{ "_id" : { "owner" : "a", "_class" : "History" }, "count" : 2 }
很奇怪.
Cosmos DB是否存在不允许在$group
阶段之后运行$match
聚合的错误?
Does Cosmos DB have a bug which does not allow for running $match
aggregates after $group
stages?
推荐答案
您的观察是正确的. Cosmos DB尚不支持多个$ match阶段. $ match必须是第一阶段.在实施支持之前的短期解决方法之一(除了显而易见的是-在客户端上处理其他过滤操作)是使用$ out阶段并利用一个临时集合,在该临时集合上,您可以运行另一个带有$的聚合管道命令比赛.
Your observations are correct. Multiple $match stages are not yet supported in Cosmos DB. $match has to be the first stage. One of the short-term workarounds until the support is implemented (besides an obvious - dealing with additional filtering on the client side) is to use $out stage and utilize a temporary collection, on which you can run another aggregation pipeline command with another $match.
这篇关于Cosmos DB聚合管道不兼容(Mongo API)?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!