Save a subset of a MongoDB (3.0) collection to another collection in Python
Question
I found this answer - answer link
db.full_set.aggregate([ { $match: { date: "20120105" } }, { $out: "subset" } ]);
I want to do the same thing, but with the first 15000 documents in the collection. I couldn't find how to apply a limit to such a query (I tried using $limit: 15000, but it doesn't recognize $limit).
I also tried -
db.subset.insert(db.full_set.find({}).limit(15000).toArray())
but there is no toArray() function for the cursor output type.
Guide me on how I can accomplish this.
Answer
Well, in Python this is how things work: $limit needs to be wrapped in quotes ("$limit"), and you need to create a pipeline to execute it as a command.
In my code -
pipeline = [{ '$limit': 15000 },{'$out': "destination_collection"}]
db.command('aggregate', "source_collection", pipeline=pipeline)
You need to wrap everything in double quotes, including your source and destination collection names. And in db.command, db is your database object (i.e. dbclient.database_name).
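Putting the pieces together: the pipeline is just a list of plain Python dicts, which is why every stage name ($limit, $out) must be a quoted string. Here is a minimal sketch of the same approach (collection names are hypothetical, and the db.command call is shown commented out so the snippet stands alone without a server):

```python
# Build the aggregation pipeline as plain Python data structures.
# Stage names like "$limit" and "$out" are dictionary keys, so they
# must be quoted strings -- that is all the "wrap in double quotes"
# advice amounts to.
pipeline = [
    {"$limit": 15000},                   # keep only the first 15000 documents
    {"$out": "destination_collection"},  # write the result server-side
]

# With a pymongo Database object (e.g. db = MongoClient().database_name)
# the command would then be issued as:
#   db.command("aggregate", "source_collection", pipeline=pipeline)
```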
According to this answer -
It works about 100 times faster than forEach at least in my case. This is because the entire aggregation pipeline runs in the mongod process, whereas a solution based on find() and insert() has to send all of the documents from the server to the client and then back. This has a performance penalty, even if the server and client are on the same machine.
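For comparison, the find()-and-insert approach from the question can also be made to work in PyMongo: a cursor has no toArray(), but it is an ordinary Python iterable, so list() fills the same role. A sketch with a generator standing in for the cursor, so it runs without a server (names are hypothetical):

```python
def fake_cursor(n):
    # Stand-in for db.full_set.find().limit(n): yields documents one by one,
    # just as a pymongo Cursor does.
    for i in range(n):
        yield {"_id": i}

# list() materializes the cursor, playing the role of the shell's toArray().
docs = list(fake_cursor(3))

# Against a live server the equivalent client-side copy would be:
#   docs = list(db.full_set.find().limit(15000))
#   db.subset.insert_many(docs)
# but every document then travels server -> client -> server, which is
# exactly the round trip the $out pipeline avoids.
```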
The one that really helped me figure this answer out - Reference 1
And the official documentation.