MongoDB: mongoimport loses connection when importing big files
Question
I have some trouble importing a JSON file into a local MongoDB instance. The JSON was generated using mongoexport and looks like this. No arrays, no hardcore nesting:
{"_created":{"$date":"2015-10-20T12:46:25.000Z"},"_etag":"7fab35685eea8d8097656092961d3a9cfe46ffbc","_id":{"$oid":"562637a14e0c9836e0821a5e"},"_updated":{"$date":"2015-10-20T12:46:25.000Z"},"body":"base64 encoded string","sender":"mail@mail.com","type":"answer"}
{"_created":{"$date":"2015-10-20T12:46:25.000Z"},"_etag":"7fab35685eea8d8097656092961d3a9cfe46ffbc","_id":{"$oid":"562637a14e0c9836e0821a5e"},"_updated":{"$date":"2015-10-20T12:46:25.000Z"},"body":"base64 encoded string","sender":"mail@mail.com","type":"answer"}
If I import a 9MB file with ~300 rows, there is no problem:
[stekhn latest]$ mongoimport -d mietscraping -c mails mails-small.json
2015-11-02T10:03:11.353+0100 connected to: localhost
2015-11-02T10:03:11.372+0100 imported 240 documents
But if I try to import a 32MB file with ~1300 rows, the import fails:
[stekhn latest]$ mongoimport -d mietscraping -c mails mails.json
2015-11-02T10:05:25.228+0100 connected to: localhost
2015-11-02T10:05:25.735+0100 error inserting documents: lost connection to server
2015-11-02T10:05:25.735+0100 Failed: lost connection to server
2015-11-02T10:05:25.735+0100 imported 0 documents
Here is the log:
2015-11-02T11:53:04.146+0100 I NETWORK [initandlisten] connection accepted from 127.0.0.1:45237 #21 (6 connections now open)
2015-11-02T11:53:04.532+0100 I - [conn21] Assertion: 10334:BSONObj size: 23592351 (0x167FD9F) is invalid. Size must be between 0 and 16793600(16MB) First element: insert: "mails"
2015-11-02T11:53:04.536+0100 I NETWORK [conn21] AssertionException handling request, closing client connection: 10334 BSONObj size: 23592351 (0x167FD9F) is invalid. Size must be between 0 and 16793600(16MB) First element: insert: "mails"
I've heard about the 16MB limit for BSON documents before, but since no row in my JSON file is bigger than 16MB, this shouldn't be a problem, right? When I do the exact same (32MB) import on my local computer, everything works fine.
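A quick way to verify that claim: mongoexport writes one JSON document per line, so the longest line in the export is the largest single document. A minimal sketch (shown against a tiny stand-in instead of the real mails.json):

```shell
# Report the byte length of the longest line; every single document must stay
# under the 16793600-byte (16MB) BSON limit. To check the real export, point
# awk at the file instead of the printf stand-in:  awk '...' mails.json
printf '{"a":1}\n{"b":"xx"}\n' |
awk '{ if (length($0) > max) max = length($0) }
     END { printf "longest document: %d bytes\n", max }'
```

Note that this measures individual documents, not the whole file, which is exactly the distinction the 16MB limit cares about.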
Any ideas what could cause this weird behaviour?
Answer
I guess the problem is about performance. Either way, there are a couple of things you can try:
You can use the mongoimport option -j. If 4 doesn't work, try increasing it: 4, 8, 16, depending on the number of cores in your CPU.
mongoimport --help
-j, --numInsertionWorkers= number of insert operations to run concurrently (defaults to 1)
mongoimport -d mietscraping -c mails -j 4 < mails.json
Alternatively, you can split the file and import each part.
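A minimal sketch of that approach, reusing the database and collection names from the question. The demo file name is a stand-in for the real export, and the import loop is commented out because it needs a running mongod:

```shell
#!/bin/sh
set -e

# Stand-in for the real 1300-line export; use your actual mails.json instead.
seq 1 1300 | sed 's/.*/{"n": &}/' > mails-demo.json

# Split into 500-line chunks: mails-demo-part-aa, -ab, -ac
split -l 500 mails-demo.json mails-demo-part-

ls mails-demo-part-* | wc -l    # number of chunks created

# Import each chunk separately (uncomment with a running mongod):
# for part in mails-demo-part-*; do
#   mongoimport -d mietscraping -c mails "$part"
# done
```

Each mongoimport run then builds its insert batches from a much smaller file, which keeps any single batch comfortably under the 16MB message limit.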
I hope this helps.
Looking a little more, it is a bug in some versions: https://jira.mongodb.org/browse/TOOLS-939. Here is another solution: you can change the batchSize. The default is 10000; reduce the value and test:
mongoimport -d mietscraping -c mails < mails.json --batchSize 1