MongoDB: Billions of documents in a collection


Question

I need to load 6.6 billion bigrams into a collection, but I can't find any information on the best way to do this.

Loading that many documents onto a single primary key index would take forever, but as far as I'm aware Mongo doesn't support the equivalent of partitioning?

Would sharding help? Should I try to split the data set over many collections and build that logic into my application?

Recommended answer

It's hard to say what the optimal bulk insert size is. It partly depends on the size of the objects you're inserting and on other factors that are hard to measure. You could try a few batch sizes and see what gives you the best performance. As an alternative, some people like using mongoimport, which is pretty fast, but your import data needs to be JSON or CSV. There's also mongorestore, if the data is in BSON format.
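To make "try a few batch sizes" concrete, here is a minimal Python sketch of a batched loader. It assumes a PyMongo-style collection object exposing `insert_many(documents, ordered=...)`; the helper names `batched` and `bulk_load` and the default batch size of 1000 are illustrative choices, not recommendations from the answer.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def batched(docs: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield successive lists of at most `size` documents."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_load(collection: Any, docs: Iterable[Dict[str, Any]], batch_size: int = 1000) -> int:
    """Send `docs` to the collection in batches and return how many were sent.

    Assumption: `collection` behaves like a PyMongo Collection, i.e. it has
    insert_many(documents, ordered=...).
    """
    sent = 0
    for batch in batched(docs, batch_size):
        # With ordered=False the remaining documents in the batch are still
        # attempted if one of them fails (for example on a duplicate key).
        collection.insert_many(batch, ordered=False)
        sent += len(batch)
    return sent
```

Timing `bulk_load` over the same sample of documents with a few different `batch_size` values is one way to find a reasonable setting for your own document size and hardware.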

Mongo can easily handle billions of documents, and can have billions of documents in a single collection, but remember that the maximum document size is 16 MB. There are many folk with billions of documents in MongoDB, and there are lots of discussions about it on the MongoDB Google User Group. Here's a document on using a large number of collections that you may like to read if you change your mind and want to have multiple collections instead. The more collections you have, the more indexes you will have as well, which probably isn't what you want.

Here's a presentation from Craigslist on inserting billions of documents into MongoDB, and the presenter's blog post.

It does look like sharding would be a good solution for you, but typically sharding is used for scaling across multiple servers, and a lot of folk do it because they want to scale their writes or because they are unable to keep their working set (data and indexes) in RAM. It is perfectly reasonable to start off with a single server and then move to a sharded cluster or replica set as your data grows or as you need extra redundancy and resilience.
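Whether the working set fits in RAM can be roughed out before loading anything. The sketch below is back-of-the-envelope arithmetic only: the average document size and index entry size are made-up inputs that you would replace with measurements from a sample load, and real storage overhead (padding, compression, index structure) is ignored.

```python
GIB = 1024 ** 3


def estimate_gib(doc_count: int, avg_doc_bytes: float, avg_index_entry_bytes: float) -> dict:
    """Rough size of data plus one index, in GiB, ignoring storage overhead."""
    data = doc_count * avg_doc_bytes / GIB
    index = doc_count * avg_index_entry_bytes / GIB
    return {"data_gib": data, "index_gib": index, "total_gib": data + index}


def fits_in_ram(total_gib: float, ram_gib: float) -> bool:
    """True if the estimated working set is no larger than the available RAM."""
    return total_gib <= ram_gib


# Hypothetical figures: 6.6 billion bigrams, 60 bytes per document,
# 40 bytes per _id index entry. Measure your own data instead.
estimate = estimate_gib(6_600_000_000, 60, 40)
print(estimate, fits_in_ram(estimate["total_gib"], ram_gib=64))
```

With those assumed sizes the total comes to several hundred GiB, well beyond the RAM of a typical single server, which is the situation in which the answer suggests sharding starts to pay off.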

However, other users run multiple mongods to get around the locking limits of a single mongod under heavy writes. It's obvious but still worth saying: a multi-mongod setup is more complex to manage than a single server. If your IO or CPU isn't maxed out, your working set is smaller than RAM, and your data is easy to keep balanced (pretty randomly distributed), you should see an improvement from sharding on a single server. As an FYI, there is potential for memory and IO contention. With 2.2 having improved concurrency with db locking, I suspect that there will be much less of a reason for such a deployment.

You need to plan your move to sharding properly, i.e. think carefully about choosing your shard key. If you go this way, it's best to pre-split and turn off the balancer. Moving data around to keep things balanced during the load would be counter-productive, which means you will need to decide up front how to split it. Additionally, it is sometimes important to design your documents with the idea that some field will be useful for sharding on, or as a primary key.
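Deciding up front how to split the data can be sketched as follows: take a sorted sample of shard key values, pick evenly spaced split points, and build one `split` admin command per point. The namespace `corpus.bigrams`, the choice of `_id` as shard key, and the helper names are hypothetical. The sketch only builds the command documents; it assumes you would send each one to a mongos yourself (for example via `client.admin.command(cmd)` in PyMongo) after stopping the balancer with `sh.stopBalancer()` in the mongo shell.

```python
from typing import Any, Dict, List, Sequence


def split_points(sample_keys: Sequence[str], chunks: int) -> List[str]:
    """Pick chunks - 1 evenly spaced boundaries from a sample of key values."""
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    unique = sorted(set(sample_keys))
    if chunks == 1 or len(unique) < 2:
        return []
    # Never ask for more chunks than there are distinct sample values.
    chunks = min(chunks, len(unique))
    return [unique[(i * len(unique)) // chunks] for i in range(1, chunks)]


def split_commands(namespace: str, key_field: str, points: Sequence[str]) -> List[Dict[str, Any]]:
    """Build one `split` admin command document per split point."""
    return [{"split": namespace, "middle": {key_field: p}} for p in points]


# Hypothetical sample of bigram keys; in practice, sample the real input file.
sample = ["a b", "c d", "e f", "g h", "i j", "k l", "m n", "o p"]
for cmd in split_commands("corpus.bigrams", "_id", split_points(sample, 4)):
    print(cmd)
```

The split points are only as good as the sample: if the sample does not reflect how the real keys are distributed, the resulting chunks will be uneven and some shards will take more of the load than others.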

Here are some good links:

  • Choosing a Shard Key
  • Blog post on shard keys
  • Overview presentation on sharding
  • Presentation on Sharding Best Practices
