Cosmos DB聚合管道不兼容(Mongo API)? [英] Cosmos DB Aggregation Pipeline incompatible (Mongo API)?

查看:72
本文介绍了Cosmos DB聚合管道不兼容(Mongo API)?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

也许有人对我们如何解决或解决在Azure Cosmos DB中针对MongoDB的聚合管道的当前实施中的错误(例如是错误)有所了解(所以是的:我们已在实例上启用此功能).

Perhaps somebody has a good idea how we can fix or workaround something which looks like a bug in the current implementation of Aggregation Pipelines for MongoDB in Azure Cosmos DB (so yes: We have switched on the feature on our instances).

简短版本是:在我们看来,$group阶段之后 之后的$match聚合阶段不起作用.它永远不会返回任何结果.

Short version is: It seems to us that a $match aggregation stage after a $group stage does not work. It never returns any results.

可以使用任何Mongo Console通过以下方式重现此内容,前提是您已经在有效的数据库中(使用use <some db>):

This can be reproduced in the following way using any Mongo Console, assuming you are already in a valid DB (using use <some db>):

粘贴以下JavaScript(如果您是通过Azure门户创建集合的,则此行为可选;假定您的集合名为bug)

Paste the following JavaScript (this line is optional if you created the collection via the Azure Portal; it assumes your collection is called bug)

db.createCollection("bug");

将一些文档添加到集合中:

The add some documents to the collection:

db.bug.insert({ _id: 1, owner: "a", _class: "History" });
db.bug.insert({ _id: 2, owner: "a", _class: "History" });
db.bug.insert({ _id: 3, owner: "a", _class: "DocumentBookmark" });
db.bug.insert({ _id: 4, owner: "a", _class: "Recyclebin" });
db.bug.insert({ _id: 5, owner: "b", _class: "History" });

如您所见,owner: "a"具有重复的History记录,我们要查询该记录.

As you can see, the owner: "a" has a duplicate History record, which we want to query for.

现在执行以下操作:

db.bug.aggregate([
    { $match: { _class: "History"} }
]);

这将呈现正确的结果:

globaldb:PRIMARY> db.bug.aggregate([
...     { $match: { _class: "History"} }
... ]);
{
    "_t" : "AggregationPipelineResponse",
    "ok" : 1,
    "waitedMS" : NumberLong(0),
    "result" : [
        {
            "_id" : 1,
            "owner" : "a",
            "_class" : "History"
        },
        {
            "_id" : 2,
            "owner" : "a",
            "_class" : "History"
        },
        {
            "_id" : 5,
            "owner" : "b",
            "_class" : "History"
        }
    ]
}

现在添加带有count$group阶段以查找每个所有者的记录数:

Now add a $group stage with a count to find the number of records per owner:

db.bug.aggregate([
    { $match: { _class: "History"} },
    { $group: { _id: "$owner", count: { $sum: 1 }}}
]);

这还会返回正确的结果:

This also returns a correct result:

globaldb:PRIMARY> db.bug.aggregate([
...     { $match: { _class: "History"} },
...     { $group: { _id: "$owner", count: { $sum: 1 }}}
... ]);
{
    "_t" : "AggregationPipelineResponse",
    "ok" : 1,
    "waitedMS" : NumberLong(0),
    "result" : [
        {
            "_id" : "a",
            "count" : NumberLong(2)
        },
        {
            "_id" : "b",
            "count" : NumberLong(1)
        }
    ]
}

现在,我们要匹配count大于1的记录:

Now we want to match for records with a count greater than 1:

db.bug.aggregate([
    { $match: { _class: "History"} },
    { $group: { _id: "$owner", count: { $sum: 1 }}},
    { $match: { count: { $gt: 1 }}}
]);

这将返回一个结果集:

globaldb:PRIMARY> db.bug.aggregate([
...     { $match: { _class: "History"} },
...     { $group: { _id: "$owner", count: { $sum: 1 }}},
...     { $match: { count: { $gt: 1 }}}
... ]);
{
    "_t" : "AggregationPipelineResponse",
    "ok" : 1,
    "waitedMS" : NumberLong(0),
    "result" : [ ]
}

在Mongo DB上也是如此

现在,要验证这些查询是否正确,我使用mongo:3.4泊坞窗映像进行了尝试.以下代码将在本地计算机上启动一个新的Mongo数据库实例,以便您可以自己尝试一下:

The same on Mongo DB

Now, to verify these queries are actually correct, I tried it using the mongo:3.4 docker image. The following code will start a new Mongo DB instance on your local machine, so that you can try it out yourself:

$ docker run --name mongobug -d mongo:3.4
ad3010da255b7c15a464fa21ff6519799a5c16cb8af62a0ea564a95780900491
$ docker exec -it mongobug mongo
MongoDB shell version v3.4.10
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.4.10
Welcome to the MongoDB shell.
>

然后,我们将与上面的Cosmos相同;在Mongo Shell中,运行以下命令:

Then we'll do the same as with Cosmos above; inside the Mongo Shell, run the following commands:

db.createCollection("bug")

然后插入测试数据:

db.bug.insert({ _id: 1, owner: "a", _class: "History" });
db.bug.insert({ _id: 2, owner: "a", _class: "History" });
db.bug.insert({ _id: 3, owner: "a", _class: "DocumentBookmark" });
db.bug.insert({ _id: 4, owner: "a", _class: "Recyclebin" });
db.bug.insert({ _id: 5, owner: "b", _class: "History" });

现在您可以看到,在运行以下聚合查询时,返回空集的查询实际上返回了非空的聚合结果:

And now you can see that the query which returned an empty set in fact returns a non-empty aggregate result when running the following aggregate query:

db.bug.aggregate([
    { $match: { _class: "History"} },
    { $group: { _id: "$owner", count: { $sum: 1 }}},
    { $match: { count: { $gt: 1 }}}
]);

结果是预期的:

> db.bug.aggregate([
...     { $match: { _class: "History"} },
...     { $group: { _id: "$owner", count: { $sum: 1 }}},
...     { $match: { count: { $gt: 1 }}}
... ]);
{ "_id" : "a", "count" : 2 }

额外的奖励阅读

我还尝试先按owner_class分组,然后再分组为$match;显然,这是迄今为止要昂贵得多的操作,因为Mongo必须对整个集合进行分组,而不仅仅是对已经过滤的项目进行分组.

Extra bonus reading

I also tried to first group by owner and _class in a joint group and then $match; this is obviously by far a much more expensive operation, as Mongo has to group over the entire collection, and not only over already filtered items.

但是,不幸的是,当它在本地Mongo docker映像上运行时,这也导致了一个空结果:

But, unfortunately, this also rendered an empty result, while it worked on the local Mongo docker image:

db.bug.aggregate([
    { $group: { _id: { owner: "$owner", _class: "$_class" }, count: { $sum: 1 } } },
    { $match: { "_id._class": "History", count: { $gt: 1 } } }
]);

关于宇宙的结果:

globaldb:PRIMARY> db.bug.aggregate([
...     { $group: { _id: { owner: "$owner", _class: "$_class" }, count: { $sum: 1 } } },
...     { $match: { "_id._class": "History", count: { $gt: 1 } } }
... ]);
{
    "_t" : "AggregationPipelineResponse",
    "ok" : 1,
    "waitedMS" : NumberLong(0),
    "result" : [ ]
}

Mongo数据库上的结果:

Result on Mongo DB:

> db.bug.aggregate([
...     { $group: { _id: { owner: "$owner", _class: "$_class" }, count: { $sum: 1 } } },
...     { $match: { "_id._class": "History", count: { $gt: 1 } } }
... ]);
{ "_id" : { "owner" : "a", "_class" : "History" }, "count" : 2 }

很奇怪.

Cosmos DB是否存在不允许在$group阶段之后运行$match聚合的错误?

Does Cosmos DB have a bug which does not allow for running $match aggregates after $group stages?

推荐答案

您的观察是正确的. Cosmos DB尚不支持多个$ match阶段. $ match必须是第一阶段.在实施支持之前的短期解决方法之一(除了显而易见的是-在客户端上处理其他过滤操作)是使用$ out阶段并利用一个临时集合,在该临时集合上,您可以运行另一个带有$的聚合管道命令比赛.

Your observations are correct. Multiple $match stages are not yet supported in Cosmos DB. $match has to be the first stage. One of the short-term workarounds until the support is implemented (besides an obvious - dealing with additional filtering on the client side) is to use $out stage and utilize a temporary collection, on which you can run another aggregation pipeline command with another $match.

这篇关于Cosmos DB聚合管道不兼容(Mongo API)?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆