Is it possible to improve Mongoexport speed?


Problem description

I have a 130M rows MongoDB 3.6.2.0 collection. It has several simple fields and 2 fields with nested JSON documents. Data is stored in compressed format (zlib).

I need to export one of embedded fields into JSON format as soon as possible. However, mongoexport is taking forever. After 12 hours of running it has processed only 5.5% of data, which is too slow for me.

The CPU is not busy. Mongoexport seems to be single-threaded.

Export command I am using:

mongoexport -c places --fields API \
    --uri mongodb://user:pass@hostIP:hostPort/maps?authSource=admin \
    -o D:\APIRecords.json

Under the hood, it is actually the getMore command that is unreasonably slow:

2018-05-02T17:59:35.605-0700 I COMMAND  [conn289] command maps.places command: getMore { getMore: 14338659261, collection: "places", $db: "maps" } originatingCommand: { find: "places", filter: {}, sort: {}, projection: { _id: 1, API: 1 }, skip: 0, snapshot: true, $readPreference: { mode: "secondaryPreferred" }, $db: "maps" } planSummary: COLLSCAN cursorid:14338659261 keysExamined:0 docsExamined:5369 numYields:1337 nreturned:5369 reslen:16773797 locks:{ Global: { acquireCount: { r: 2676 } }, Database: { acquireCount: { r: 1338 } }, Collection: { acquireCount: { r: 1338 } } } protocol:op_query 22796ms

I have tried running multiple commands with --skip and --limit options in separate processes, like this:

mongoexport -c places --skip 10000000 --limit 10000000 --fields API \
    --uri mongodb://user:pass@hostIP:hostPort/maps?authSource=admin \
    -o D:\APIRecords1.json
mongoexport -c places --skip 20000000 --limit 10000000 --fields API \
    --uri mongodb://user:pass@hostIP:hostPort/maps?authSource=admin \
    -o D:\APIRecords2.json

and so on. But I gave up waiting before the command with the first non-zero --skip even started producing output!
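
This is expected to be slow, as far as I understand: explain() shows that a skipped, unindexed query still collection-scans and discards every preceding document before returning the first result (a sketch against the same collection and connection string as above):

    # Sketch: explain a skipped query. The COLLSCAN must examine all
    # 20,000,000 skipped documents before producing anything, which is why
    # the second process appears to never start.
    mongo "mongodb://user:pass@hostIP:hostPort/maps?authSource=admin" \
      --eval 'printjson(db.places.find({}, { API: 1 }).skip(20000000).limit(10).explain("executionStats").executionStats)'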

I have also tried the --forceTableScan option, which did not make any difference.

I have no indexes on the places collection.

My storage configuration:

journal.enabled: false
wiredTiger.collectionConfig.blockCompressor: zlib

Collection stats:

'ns': 'maps.places',
'size': 2360965435671,
'count': 130084054,
'avgObjSize': 18149,
'storageSize': 585095348224.0

My server specs:

Windows Server 2012 R2 x64
10 GB RAM, 4 TB HDD, 6-core Xeon 2.2 GHz

I have run a test, and with an SSD the read throughput is just as terrible as with the HDD.

My questions:

Why is reading so slow? Has anyone else experienced the same issue? Can you give me any hints on how to speed up data dumping?

I moved the DB to fast NVMe SSD drives, and I think I can now state my concerns about MongoDB read performance more clearly.

Why does this command, which looks for a chunk of documents lacking a specific field:

2018-05-05T07:20:46.215+0000 I COMMAND  [conn704] command maps.places command: find { find: "places", filter: { HTML: { $exists: false }, API.url: { $exists: true } }, skip: 9990, limit: 1600, lsid: { id: UUID("ddb8b02c-6481-45b9-9f84-cbafa586dbbf") }, $readPreference: { mode: "secondaryPreferred" }, $db: "maps" } planSummary: COLLSCAN cursorid:15881327065 keysExamined:0 docsExamined:482851 numYields:10857 nreturned:101 reslen:322532 locks:{ Global: { acquireCount: { r: 21716 } }, Database: { acquireCount: { r: 10858 } }, Collection: { acquireCount: { r: 10858 } } } protocol:op_query 177040ms

yield only 50 Mb/sec of read pressure on a fast flash drive? This is clearly the performance of single-threaded random (scattered) reads, whereas I have just proven that the drive easily sustains 1 Gb/sec of read/write throughput.

In terms of Mongo internals, would it not be wiser to read the BSON file sequentially and gain a 20x scanning speed improvement, instead of iterating over BSON documents one by one? (And, since my blocks are zlib-compressed and the server has 16 cores, wouldn't it be better to decode fetched chunks in one or several helper threads?)

I can also confirm that even when I specify no query filter and clearly want to iterate the ENTIRE collection, fast sequential reading of the BSON file is not happening.

Recommended answer

There are many factors limiting the export performance.

• The data size is relatively large compared to the available memory: ~2 TB vs. ~5 GB of WiredTiger cache (if set to default). That is:
  • The whole WiredTiger cache can only contain at best ~0.22% of the collection (see the arithmetic sketch after this list); in reality it is very likely much less than this, since the cache will also contain data from other collections and indexes.
  • This means that WiredTiger needs to fetch from disk very frequently, while evicting the current content of the cache. If the replica set is being actively used, this means evicting "dirty" data from the cache and persisting it to disk, which takes time.
  • Note that documents inside the WiredTiger cache are not compressed.
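
As a back-of-the-envelope check on the ~0.22% figure (my own arithmetic, assuming a default cache of roughly 5 GB against the collStats size quoted above):

    # Best-case fraction of the collection that fits in the WiredTiger cache:
    echo "scale=6; (5 * 1024^3) / 2360965435671 * 100" | bc
    # -> .227300, i.e. roughly 0.22% of the ~2.36 TB collection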

One possible improvement: if this is an operation that you do frequently, creating an index on the relevant fields and exporting using a covered query could improve performance, since the index would be smaller than the full documents.
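
A minimal sketch of that idea, assuming the API values are small enough to index (MongoDB 3.6 caps index keys at roughly 1 KB, so this may not be feasible for large embedded documents) and reusing the connection string from the question:

    # One-off: build an index on the exported field.
    mongo "mongodb://user:pass@hostIP:hostPort/maps?authSource=admin" \
      --eval 'db.places.createIndex({ API: 1 })'

    # Verify the read can be covered: the filter and projection touch only the
    # indexed field and _id is excluded, so expect IXSCAN with totalDocsExamined: 0.
    mongo "mongodb://user:pass@hostIP:hostPort/maps?authSource=admin" \
      --eval 'printjson(db.places.find({ API: { $gt: MinKey } }, { API: 1, _id: 0 }).explain("executionStats").executionStats)'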

Running mongoexport in parallel may be helpful in this case:

Following on from the additional information provided, I ran a test that seems to alleviate this issue somewhat.

It seems that running mongoexport in parallel, with each process handling a subset of the collection, can speed up the export.

To do this, divide the _id namespace according to the number of mongoexport processes you plan to run.

For example, with 200,000 documents ranging from _id:0 to _id:199,999 and 2 mongoexport processes:

      mongoexport -q '{"_id":{"$gte":0, "$lt":100000}}' -d test -c test > out1.json &
      mongoexport -q '{"_id":{"$gte":100000, "$lt":200000}}' -d test -c test > out2.json &
      

In the above example, the two mongoexport processes each handle one half of the collection.
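
A sketch that generalizes this to N workers, assuming numeric and roughly evenly distributed _id values as in the example above (with ObjectId _ids, the range boundaries would instead have to be sampled from the collection):

    # Split the _id range [0, TOTAL) into N slices and export them concurrently.
    TOTAL=200000
    N=4
    STEP=$((TOTAL / N))
    for i in $(seq 0 $((N - 1))); do
      LO=$((i * STEP))
      HI=$((LO + STEP))
      mongoexport -d test -c test \
        -q "{\"_id\":{\"\$gte\":$LO, \"\$lt\":$HI}}" \
        -o "out$i.json" &
    done
    wait   # block until every background export has finished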

Testing this workflow with 1, 2, 4, and 8 processes, I arrive at the following timings:

Using 1 process:

      real    0m32.720s
      user    0m33.900s
      sys 0m0.540s
      

2 processes:

      real    0m16.528s
      user    0m17.068s
      sys 0m0.300s
      

4 processes:

      real    0m8.441s
      user    0m8.644s
      sys 0m0.140s
      

8 processes:

      real    0m5.069s
      user    0m4.520s
      sys 0m0.364s
      

Depending on the available resources, running 8 mongoexport processes in parallel seems to speed up the export by a factor of ~6. This was tested on a machine with 8 cores.
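
If a single output file is needed afterwards, the per-process files can simply be concatenated, since mongoexport emits one JSON document per line by default:

    cat out*.json > APIRecords.json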

Note: halfer's answer is similar in idea, although this answer basically tries to see whether there is any benefit to calling mongoexport in parallel.
