MongoDB large index build very slow

Problem description

I have a collection with 400 million documents. Each has 6 DateTime, 1 Boolean, 8 Double, 9 Integer, and 6 String fields. I am trying to build the following index:

db.MyCollection.ensureIndex( 
    { "String1" : 1, "String2" : 1, "String3" : 1, "DateTime1" : 1, "Integer1" : 1, "DateTime2" : 1 }, 
    {background: true} 
);

After running for 5 days it is only half done.

The server is running Windows Server Enterprise and has 4TB disk space and 256GB RAM. Very few other processes are running against the database. No sharding or other special configuration.

Is there any way to speed this up? (Without dropping the background = true qualifier, because I don't want it to completely shut me out of the database, which it does in that case.)

Recommended answer

Misconceptions

Speed

Even leaving multikey indexes aside, here is what happens. A massive collection scan is going on: MongoDB iterates over the documents, looks up each field to be indexed, evaluates it (to null if it does not exist in the current document), and writes the result to the index; with six fields in the compound key, it does this six times per document. Doing the math: 200,000,000 / (86,400 * 5) tells us that MongoDB processes roughly 460 documents per second, or needs only about 2.2 milliseconds per document. I would not call that slow. It may take long, but it is not slow.
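The estimate above can be rechecked by dividing the documents processed so far by the number of seconds in five days. The sketch below uses only the figures from the question (roughly half of 400 million documents after 5 days). The projection of the remaining time assumes the build rate stays constant, which is an assumption made here and not something the question guarantees.

```javascript
// Figures from the question: about half of 400 million documents indexed in 5 days.
const totalDocuments = 400000000;
const documentsIndexed = 200000000;
const secondsElapsed = 5 * 86400; // 5 days, 86400 seconds per day

const docsPerSecond = documentsIndexed / secondsElapsed;
const msPerDocument = 1000 / docsPerSecond;

// Assumption: the build keeps running at the same average rate.
const daysRemaining =
  (totalDocuments - documentsIndexed) / docsPerSecond / 86400;

console.log("documents per second:", docsPerSecond.toFixed(0));
console.log("milliseconds per document:", msPerDocument.toFixed(2));
console.log("estimated days remaining:", daysRemaining.toFixed(1));
```

The parentheses matter: the document count has to be divided by the total number of seconds (86400 * 5), not divided by 86400 and then multiplied by 5.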

Using background: true does not lock you out of the database. Quite the contrary, as the docs clearly state, both in the Index Creation section and in the tutorial on building indexes in the background. However, there is a sentence which can easily be misinterpreted:

Also, no operation that requires a read or write lock on all databases (e.g. listDatabases) can occur during a foreground index build.

What that means is that, during a foreground index build, you cannot run operations which apply to all databases and require a read or write lock.
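Because the database stays available during a background build, its progress can be checked from the shell while it runs. The following is a minimal sketch, assuming that the document returned by db.currentOp() has an inprog array whose entries carry a msg string and a progress sub-document with done and total fields. The helper name indexBuildProgress, the namespace mydb.MyCollection, and the sample numbers are made up for illustration, and the exact wording of msg may differ between MongoDB versions.

```javascript
// Summarize in-progress index builds from a db.currentOp() result.
// Assumption: operations expose `msg` and `progress: { done, total }`.
function indexBuildProgress(currentOpResult) {
  return (currentOpResult.inprog || [])
    .filter((op) => typeof op.msg === "string" && /index build/i.test(op.msg))
    .filter((op) => op.progress && op.progress.total > 0)
    .map((op) => ({
      opid: op.opid,
      namespace: op.ns,
      percentDone: (100 * op.progress.done) / op.progress.total,
    }));
}

// Hypothetical sample shaped like db.currentOp() output; values are invented.
const sample = {
  inprog: [
    {
      opid: 1,
      ns: "mydb.MyCollection",
      msg: "Index Build (background)",
      progress: { done: 200000000, total: 400000000 },
    },
    { opid: 2, ns: "mydb.Other", op: "query" },
  ],
};

const builds = indexBuildProgress(sample);
console.log(builds);
```

In a shell session connected to the server, the same helper would be called as indexBuildProgress(db.currentOp()) in place of the invented sample.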

Use a sharded cluster with replica set shards. It is easy to set up and has multiple advantages besides improved performance. One of them is easy scalability: adding a shard (and thus adding space and computing power to the cluster) is very easy. Backups have less impact on the application. There is no single point of failure any more (when done right, this even applies to outages at the scale of a whole datacenter).

Sorry, running an application that depends on disk I/O performance on a Windows server does not make sense to me at all. ext4 or XFS are between 25% and 40% faster than NTFS or ReFS, depending on the optimization. This makes a real difference for applications that are as disk I/O dependent as your use case. We are talking about a matter of days (not even taking into account the more efficient memory mapping and the reduced memory consumption of the OS on Linux systems).

While building the index in the background does not really improve performance (building indexes in the background actually takes longer than in the foreground, for obvious reasons), your application stays available while the index is being built. So depending on your needs, this may be a viable option.

Side note: It is a Bad Idea™ to scale vertically when using MongoDB, since it was explicitly designed to be scaled horizontally. This especially applies to large collections like yours, as parallel processing would greatly improve the performance of your application.
