pymongo.errors.BulkWriteError: batch op errors occurred (MongoDB 3.4.2, pymongo 3.4.0, python 2.7.13)


Problem description

I am migrating several hundred million tweets of the format {'id_str': , 'created_at': , 'text': } from text files into MongoDB using pymongo. A collection is created for each user to store his/her tweets. The insertion method I am using is insert_many(). It often runs into BulkWriteError.

Traceback (most recent call last):
  File "pipeline.py", line 105, in <module>
    timeline_db, meta_db, negative_db, log_col, dir_path)
  File "/media/haitao/Storage/twitter_pipeline/migrate_old.py", line 134, in migrate_dir
    timeline_db[user_id].insert_many(utility.temporal_sort(statuses))
  File "/home/haitao/anaconda3/envs/py27/lib/python2.7/site-packages/pymongo/collection.py", line 711, in insert_many
    blk.execute(self.write_concern.document)
  File "/home/haitao/anaconda3/envs/py27/lib/python2.7/site-packages/pymongo/bulk.py", line 493, in execute
    return self.execute_command(sock_info, generator, write_concern)
  File "/home/haitao/anaconda3/envs/py27/lib/python2.7/site-packages/pymongo/bulk.py", line 331, in execute_command
    raise BulkWriteError(full_result)
pymongo.errors.BulkWriteError: batch op errors occurred
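
For context, the migration loop described above boils down to something like this minimal sketch (the database name and tweet fields come from the question; `migrate_user` is a hypothetical helper, not the asker's actual code):

```python
from pymongo import MongoClient

# A single shared client, as the asker confirms below.
client = MongoClient('localhost', 27017)
timeline_db = client['timeline_db']

def migrate_user(user_id, statuses):
    """Insert one user's tweets into a collection named after the user.

    Each status is a dict of the form
    {'id_str': ..., 'created_at': ..., 'text': ...}.
    """
    timeline_db[user_id].insert_many(statuses)
```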

This error seems to occur when there are duplicate keys, which should not be the case here. Is there anything else I can check to solve this issue?
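
One quick check is to inspect the exception's details attribute, which pymongo fills with the server's per-document write errors; a duplicate key shows up as error code 11000. A minimal sketch:

```python
from pymongo.errors import BulkWriteError

try:
    timeline_db[user_id].insert_many(statuses)
except BulkWriteError as exc:
    # 'writeErrors' lists every document the server rejected;
    # code 11000 means duplicate key, other codes point elsewhere.
    for err in exc.details.get('writeErrors', []):
        print('%s: %s' % (err['code'], err['errmsg']))
```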

Thanks in advance!

Answer

Sorry.

1) I replicated the error. The following is from near the end of mongod.log.

```
I -        [ftdc] Assertion: 13538:couldn't open [/proc/5774/stat] errno:24 Too many open files
W FTDC     [ftdc] Uncaught exception in 'Location13538: couldn't open [/proc/5774/stat] errno:24 Too many open files' in full-time diagnostic data capture subsystem. Shutting down the full-time diagnostic data capture subsystem.
E STORAGE  [conn2] WiredTiger (24) [1491020356:127332][5774:0x7f6f30e9d700], WT_SESSION.create: /var/lib/mongodb/: directory-sync: open: Too many open files
I COMMAND  [conn2] command timeline_db.231731006 command: insert { insert: "231731006", ordered: true, documents: 1000 } ninserted:0 keyUpdates:0 writeConflicts:0 exception: 24: Too many open files code:8 numYields:0 reslen:123 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 1, W: 1 } }, Collection: { acquireCount: { w: 1, W: 1 } } } protocol:op_query 511ms
```

2) Yes, only one instance of MongoClient() is passed around.

3) No multiprocessing was used.

After I posted the initial question, I switched to insert_one(), which explicitly raised an open-file-limit error. I changed the design of the database (mainly, reduced the number of collections), and that resolved the open-file-limit issue. I am not sure, but the log seems to suggest that the actual cause of the BulkWriteError was also the open file limit.
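
That reading is consistent with how WiredTiger stores data: by default it keeps one file per collection and one per index, so a collection per user can exhaust the process's file descriptors. As a quick sanity check, the current limit can be read from Python's standard library (Unix only; raising it is done at the OS level, e.g. with ulimit -n):

```python
import resource

# Soft and hard caps on open file descriptors for this process;
# every WiredTiger collection/index file counts against the soft cap.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('open file limit: soft=%d, hard=%d' % (soft, hard))
```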
