Can't get allowDiskUse:True to work with pymongo

Question

I'm running into the "aggregation result exceeds maximum document size (16MB)" error when running a MongoDB aggregation with pymongo.

I was able to overcome it at first using the limit() option. However, at some point I got this error:

"Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in."

OK, I'll use the {'allowDiskUse': True} option. It works when I use it on the command line, but when I tried to use it in my Python code

result = work1.aggregate(pipe, 'allowDiskUse:true')

I get a "TypeError: aggregate() takes exactly 2 arguments (3 given)" error, even though the documentation at http://api.mongodb.org/python/current/api/pymongo/collection.html#pymongo.collection.Collection.aggregate gives the signature as aggregate(pipeline, **kwargs).

I tried to use runCommand, or rather its pymongo equivalent:

db.command('aggregate','work1',pipe, {'allowDiskUse':True})

but now I'm back to the "aggregation result exceeds maximum document size (16MB)" error.

In case you need to know, here is the pipeline:

pipe = [{'$project': {'_id': 0, 'summary.trigrams': 1}}, {'$unwind': '$summary'}, {'$unwind': '$summary.trigrams'}, {'$group': {'count': {'$sum': 1}, '_id': '$summary.trigrams'}}, {'$sort': {'count': -1}}, {'$limit': 10000}]

Thanks

Recommended answer

So, in order:

  • aggregate is a method. It takes 2 positional arguments (self, which is passed implicitly, and pipeline) and any number of keyword arguments, which must be passed as foo=bar; without the = sign, it is not a keyword argument. This means you need to call result = work1.aggregate(pipe, allowDiskUse=True).
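The difference between the two call styles can be seen without a database. The following is a minimal sketch: StandInCollection is a hypothetical stand-in that only copies the documented signature aggregate(pipeline, **kwargs); it is not pymongo and runs no aggregation.

```python
class StandInCollection:
    """Hypothetical stand-in; it only mimics aggregate(pipeline, **kwargs)."""

    def aggregate(self, pipeline, **kwargs):
        # Return what was received so the effect of each call style is visible.
        return {"pipeline": pipeline, "options": kwargs}


work1 = StandInCollection()
pipe = [{"$limit": 1}]

# A string passed positionally counts as a third positional argument
# (after self and pipeline), so Python rejects the call with a TypeError.
try:
    work1.aggregate(pipe, "allowDiskUse:true")
    positional_call_failed = False
except TypeError:
    positional_call_failed = True

# Passed as name=value, the option is collected into **kwargs.
result = work1.aggregate(pipe, allowDiskUse=True)

print(positional_call_failed)  # True
print(result["options"])       # {'allowDiskUse': True}
```

The exact wording of the TypeError message depends on the Python version, so the sketch checks only the exception type.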

Your error about maximum document size is inherent to Mongo. Mongo can never return a single document (including one that wraps an array of results) larger than 16 megabytes. I can't tell you why you are hitting it, because you have given us neither your data nor your code, but it probably means that the document you're building as an end result is too large. Try decreasing the $limit parameter, maybe? Start by setting it to 1, run a test, then increase it and look at how big the result gets as you do.
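The "start small and increase $limit" experiment could be scripted roughly as follows. This is a sketch under assumptions: probe_result_sizes is a hypothetical helper, collection is assumed to be any object whose aggregate(pipeline, **kwargs) returns an iterable of documents (as a pymongo 3.0 or later collection does), and the JSON length is only a rough proxy for the BSON size that MongoDB actually enforces.

```python
import json


def probe_result_sizes(collection, base_pipeline, limits):
    """Hypothetical helper: run base_pipeline with each $limit and report sizes.

    Returns a list of (limit, document_count, approx_bytes) tuples.
    """
    sizes = []
    for limit in limits:
        # Append a fresh $limit stage; base_pipeline itself is not modified.
        pipeline = list(base_pipeline) + [{"$limit": limit}]
        docs = list(collection.aggregate(pipeline, allowDiskUse=True))
        # Assumption: JSON length approximates the BSON size; they will differ.
        approx_bytes = len(json.dumps(docs, default=str).encode("utf-8"))
        sizes.append((limit, len(docs), approx_bytes))
    return sizes
```

With the pipeline from the question, a call might look like probe_result_sizes(work1, pipe[:-1], [1, 10, 100, 1000, 10000]), where pipe[:-1] drops the original $limit stage so that the helper can supply its own.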
