Can bulk operations in MongoDB 2.6+ be used as a buffer/queue?

Question

MongoDB introduced Bulk() in version 2.6. I checked the API, and it seems great to me.

Before this API, if I needed to do a bulk insert, I had to store the documents in a List and then use insert() to insert the whole List. In a multi-threaded environment, concurrency also had to be considered.

  1. Is there a queue/buffer implemented inside the bulk API? Each time I put something into the bulk before execute(), the data is stored in the queue/buffer, is that right?
  2. So I don't need to write my own queue/buffer and can just use Bulk.insert() or Bulk.find().update(), is that right?
  3. Could someone tell me more about the queue? Do I still need to worry about concurrency issues?
  4. Since a Bulk is created with a call like db.collection.initializeUnorderedBulkOp(), will a Bulk instance that is not released stay connected to the MongoDB server?

Recommended answer

On the basic question of "do you need to store your own list?", the answer is not really, though it all depends on what you are doing.

For a basic idea of what happens under the hood of the Bulk Operations API, the best place to look is at the individual command forms for each type of operation. The relevant manual section describes those command forms.

So you can think of the "Bulk" interface as a list or collection of all the operations you add to it. You can add pretty much as many as you wish (within certain memory and practical constraints), and the "drain" method for this "queue" is the .execute() method.

As noted in that documentation, regardless of how many operations you "queue", they are only actually sent to the server in groups of at most 1000 operations at a time. The other thing to keep in mind is that nothing guarantees these 1000 operations actually fit under the 16MB BSON limit. That is still a hard limit in MongoDB, so each "request" you send to the server must total less than that limit in size.
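To make the two constraints concrete, here is a minimal sketch that splits a list of queued operations so that each group respects both an operation count and a byte budget. It is a toy model, not what the driver or server does internally. The names splitIntoBatches and approxSize are hypothetical, and the length of the JSON text is only a rough stand-in for the real BSON size.

```javascript
// Toy model only: split queued operations into batches that respect
// both a maximum operation count and an approximate byte budget.
// Assumption: JSON text length is used as a rough proxy for BSON size.
const MAX_OPS = 1000;
const MAX_BYTES = 16 * 1024 * 1024;

function approxSize(op) {
  return JSON.stringify(op).length;
}

function splitIntoBatches(ops, maxOps = MAX_OPS, maxBytes = MAX_BYTES) {
  const batches = [];
  let current = [];
  let currentBytes = 0;
  for (const op of ops) {
    const size = approxSize(op);
    if (size > maxBytes) {
      throw new Error("single operation exceeds the byte budget");
    }
    // Start a new batch when adding this op would break either limit.
    if (current.length >= maxOps || currentBytes + size > maxBytes) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(op);
    currentBytes += size;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

With the default limits, a list of 2500 small operations comes back as three groups, and passing a smaller byte budget forces a split earlier than the count limit would.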

So generally speaking, it is often more practical to make your own "execute/drain" requests to the server once every 1000 entries or fewer. Mileage may vary, but there are some considerations here.
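The "drain every 1000 or so" advice can be sketched as below. This is written in mongo shell style, where bulk.execute() hands back its result directly; with the Node.js driver, execute() returns a promise and would need to be awaited. The function name insertInBatches and its batchSize parameter are assumptions for illustration, and collection is any object exposing initializeUnorderedBulkOp().

```javascript
// Sketch in mongo shell style: bulk.execute() is assumed to return its
// result synchronously. insertInBatches is a hypothetical helper name.
function insertInBatches(collection, docs, batchSize = 1000) {
  if (batchSize < 1) throw new RangeError("batchSize must be at least 1");
  const results = [];
  for (let start = 0; start < docs.length; start += batchSize) {
    // A fresh bulk object per batch: each one is executed exactly once.
    const bulk = collection.initializeUnorderedBulkOp();
    for (const doc of docs.slice(start, start + batchSize)) {
      bulk.insert(doc);
    }
    // One round trip to the server per batch.
    results.push(bulk.execute());
  }
  return results;
}
```

Keeping each batch small also keeps each result object small, which makes error handling per batch easier to reason about.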

Requests are either "Ordered" or "UnOrdered". In the former case, an error generated in the batch sent aborts all remaining queued operations, meaning of course all operations that come after the point where the error is encountered.

In the latter case, for "UnOrdered" operations, no fatal error is reported. Instead, the WriteResult that is returned contains a "list" of any errors encountered. "UnOrdered" also means that the operations are not necessarily "applied" in any particular order, so you cannot "queue" operations that rely on something else in the "queue" being processed first.
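The difference in error handling between the two modes can be illustrated with a small simulation. This is a toy model and not server code: applyBatch is a hypothetical function, each operation is modelled as a function that either succeeds or throws, and the unordered branch still walks the list in sequence, whereas a real server is free to apply unordered operations in any order.

```javascript
// Toy model of how one batch is applied; not server code.
// Each op is a function that either succeeds or throws.
function applyBatch(ops, ordered) {
  const applied = [];
  const errors = [];
  for (let index = 0; index < ops.length; index++) {
    try {
      ops[index]();
      applied.push(index);
    } catch (err) {
      errors.push({ index, message: err.message });
      // Ordered: everything after the first failure is skipped.
      // Unordered: keep going and collect every error.
      if (ordered) break;
    }
  }
  return { applied, errors };
}
```

For a batch of three operations where the second one fails, the ordered call applies only the first, while the unordered call applies the first and third and reports the single error in its list.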

So there is the question of how large a WriteResult you are going to get and how you handle that response in your application. As stated earlier, mileage may vary on how much it matters whether this is one very large response or a smaller, more manageable one.

As far as concurrency is concerned, there is really one thing to consider here. Even though you are sending many instructions to the server in a single call and not waiting for individual transfers and acknowledgements, the server still really processes one instruction at a time. The operations are either ordered, as implied by the initialize method, or "un-ordered" where that is chosen, in which case they can run in "parallel", as it were, on the server until the batch is drained.

But there is no "lock" held until the "batch" completes, so it is not a substitute for a "transaction"; don't make that mistake as a design point. The same MongoDB rules apply, and the benefit here is "one write to the server" and "one response back", rather than one for each operation.

Finally, as to whether the API holds some "server connection" here, the answer is no, it does not. As the initial point about looking at the command internals suggests, building this "queue" is purely "client side only". There is no communication with the server in any way until the .execute() method is called. This is "by design" and actually half the point, since we mainly don't want to send data to the server each time you add an operation. It is done all at once.

So "Bulk Operations" are a "client side queue". Everything is stored on the client side until .execute() "drains" the queue and sends the operations to the server all at once. The server then returns a response containing all of the results from the operations sent, which you can handle however you wish.

Also, once .execute() is called, no more operations can be "queued" to the bulk object, and .execute() cannot be called again either. Depending on the implementation, you may be able to examine the "Bulk" object and its results further. But in the general case, when you need to send more "bulk" operations, you re-initialize and start again, just as you would with most queue systems.
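The client-side, one-shot behaviour described above can be modelled in a few lines. This is a toy model and not the driver's implementation: OneShotQueue is a hypothetical class, and send is a hypothetical callback standing in for the single round trip to the server.

```javascript
// Toy model of a client-side, one-shot queue; not the driver's implementation.
// `send` is a hypothetical callback standing in for the server round trip.
class OneShotQueue {
  constructor(send) {
    this.send = send;
    this.ops = [];
    this.executed = false;
  }

  add(op) {
    if (this.executed) throw new Error("queue already executed");
    this.ops.push(op); // stored locally, nothing is sent yet
    return this;
  }

  execute() {
    if (this.executed) throw new Error("queue cannot be re-executed");
    this.executed = true;
    return this.send(this.ops); // the only point where `send` is called
  }
}
```

To queue more work after execute(), create a new OneShotQueue, which mirrors calling the initialize method again for a fresh bulk object.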

In summary:

  1. Yes. The object effectively "queues" operations.
  2. You don't need your own lists. The methods are "list builders" in themselves.
  3. Operations are either "Ordered" or "Un-Ordered" as far as sequence goes, but all operations are individually processed by the server as per normal MongoDB rules. No transactions.
  4. The "initialize" commands do not talk to the server directly and do not "hold connections" in themselves. The only method that actually "talks" to the server is .execute().

So it is a really good tool. You get much better write operations than you do from the legacy command implementations. But do not expect it to offer functionality beyond what MongoDB basically does.
