PyMongo's bulk write operation features with generators


Question

I would like to use PyMongo's bulk write operation features, which execute write operations in batches in order to reduce the number of network round trips and increase write throughput.

I also found here that it is possible to use 5000 as a batch size.

However, I do not know what the best batch size is, and how can I combine PyMongo's bulk write operation features with generators in the following code?

from pymongo import MongoClient
from itertools import groupby
import csv


def iter_something(rows):
    key_names = ['type', 'name', 'sub_name', 'pos', 's_type', 'x_type']
    chr_key_names = ['letter', 'no']
    for keys, group in groupby(rows, lambda row: row[:6]):
        result = dict(zip(key_names, keys))
        result['chr'] = [dict(zip(chr_key_names, row[6:])) for row in group]
        yield result


def main():
    converters = [str, str, str, int, int, int, str, int]
    with open("/home/mic/tmp/test.txt") as c:
    reader = csv.reader(c, skipinitialspace=True)
    converted = ([conv(col) for conv, col in zip(converters, row)] for row in reader)
    for object_ in iter_something(converted):
        print(object_)


if __name__ == '__main__':
    db = MongoClient().test
    sDB = db.snps 
    main()

test.txt file:

  Test, A, B01, 828288,  1,    7, C, 5
  Test, A, B01, 828288,  1,    7, T, 6
  Test, A, B01, 171878,  3,    7, C, 5
  Test, A, B01, 171878,  3,    7, T, 6
  Test, A, B01, 871963,  3,    9, A, 5
  Test, A, B01, 871963,  3,    9, G, 6
  Test, A, B01, 1932523, 1,   10, T, 4
  Test, A, B01, 1932523, 1,   10, A, 5
  Test, A, B01, 1932523, 1,   10, X, 6
  Test, A, B01, 667214,  1,   14, T, 4
  Test, A, B01, 667214,  1,   14, G, 5
  Test, A, B01, 67214,   1,   14, G, 6      

Answer

You can simply do:

sDB.insert(iter_something(converted))

PyMongo will do the right thing: it iterates your generator until it has yielded 1000 documents or 16MB of data, then pauses the generator while it inserts the batch into MongoDB. Once the batch is inserted, PyMongo resumes the generator to create the next batch, and continues until all documents are inserted. insert() then returns a list of the inserted document ids.
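Note that Collection.insert() belongs to older PyMongo releases; in PyMongo 3.x and later it is deprecated in favour of insert_many(), which likewise accepts an arbitrary iterable of documents and batches them internally. A minimal sketch of the same pipeline with insert_many(), assuming the iter_something() generator and the file path from the question, might look like this:

import csv
from pymongo import MongoClient

# Sketch only: insert_many() accepts any iterable of documents and splits it
# into batches internally, so the generator can be passed straight in.
# iter_something() is the generator already defined in the question.

def main():
    converters = [str, str, str, int, int, int, str, int]
    with open("/home/mic/tmp/test.txt") as c:
        reader = csv.reader(c, skipinitialspace=True)
        converted = ([conv(col) for conv, col in zip(converters, row)]
                     for row in reader)
        result = sDB.insert_many(iter_something(converted))
        print(len(result.inserted_ids), "documents inserted")


if __name__ == '__main__':
    sDB = MongoClient().test.snps
    main()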

Initial support for generators was added to PyMongo in this commit, and we have maintained support for document generators ever since.
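If you still want explicit control over the chunk size (for example the 5000 mentioned in the question), one option is to slice the generator yourself and issue one insert per chunk. A rough sketch, where insert_in_chunks() and its batch_size default are illustrative rather than anything PyMongo provides:

from itertools import islice

# Rough sketch: feed the generator to MongoDB in fixed-size chunks so the
# batch size can be tuned by hand.  batch_size=5000 is just the figure from
# the question, not a recommendation; PyMongo still enforces the server's
# own batch and message size limits within each insert_many() call.
def insert_in_chunks(collection, documents, batch_size=5000):
    inserted_ids = []
    while True:
        chunk = list(islice(documents, batch_size))  # take up to batch_size docs
        if not chunk:
            break
        inserted_ids.extend(collection.insert_many(chunk).inserted_ids)
    return inserted_ids

This would be called as insert_in_chunks(sDB, iter_something(converted)); in practice it is worth benchmarking a few sizes against your own data rather than assuming 5000 is optimal.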

