ActiveRecord find_each combined with limit and order


Question

I'm trying to run a query over about 50,000 records using ActiveRecord's find_each method, but it seems to ignore my other parameters. For example:

Thing.active.order("created_at DESC").limit(50000).find_each {|t| puts t.id }

Instead of stopping at the 50,000 records I asked for and sorting by created_at, this is the query that actually gets executed, over the entire dataset:

Thing Load (198.8ms)  SELECT "things".* FROM "things" WHERE "things"."active" = 't' AND ("things"."id" > 373343) ORDER BY "things"."id" ASC LIMIT 1000

Is there a way to get behavior similar to find_each, but with a total maximum limit and respecting my sort criteria?

Recommended answer

The documentation says that find_each and find_in_batches don't retain the sort order and limit because:

  • Sorting ASC on the primary key is used to make the batch ordering work.
  • Limit is used to control the batch sizes.
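
To make the two points above concrete, here is a rough plain-Ruby simulation of primary-key batching. It does not use Rails or a database: the array of hashes stands in for the things table, and each_batch_by_id is a hypothetical helper written for this illustration, not ActiveRecord's actual implementation. It mimics the shape of the logged query (id > last id, ORDER BY id ASC, LIMIT batch size), which shows why any other ordering or limit would conflict with the batching itself.

```ruby
# Hypothetical helper: yields batches the way the logged query does, i.e.
#   WHERE id > last_id ORDER BY id ASC LIMIT batch_size
# Assumes ids are positive integers, so 0 is a safe starting point.
def each_batch_by_id(rows, batch_size)
  last_id = 0
  loop do
    batch = rows.select { |r| r[:id] > last_id }
                .sort_by { |r| r[:id] }
                .first(batch_size)
    break if batch.empty?
    yield batch
    # The next batch starts right after the highest id seen so far.
    last_id = batch.last[:id]
  end
end

# Made-up rows standing in for the table, in arbitrary storage order.
rows = [5, 1, 9, 3, 7, 2, 8].map { |id| { id: id } }

each_batch_by_id(rows, 3) { |batch| p batch.map { |r| r[:id] } }
# prints [1, 2, 3], then [5, 7, 8], then [9]
```

The resume point (last_id) only makes sense if every batch is sorted by id, and the LIMIT slot is already taken by the batch size, so there is no room left for a caller-supplied order or limit in this scheme.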

You could write your own version of this function like @rorra did. But you can get into trouble when mutating the objects. For example, if you sort by created_at and save an object, it might come up again in one of the next batches. Similarly, you might skip objects because the order of the results has changed by the time the query for the next batch runs. Only use that solution with read-only objects.
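
The hazard can be reproduced without a database. The sketch below is a plain-Ruby simulation under stated assumptions: rows are hashes, stamp is a made-up sort column that changes whenever a row is processed (standing in for whatever column your save touches), and both helpers are hypothetical names invented for this example. The first pages with ORDER BY plus OFFSET and re-sorts on every page; the second freezes the id order up front, which is the idea behind the solution described next.

```ruby
# Hypothetical helper: pages with ORDER BY stamp ASC LIMIT/OFFSET and bumps
# each row's stamp while processing it, so the sort order shifts between pages.
def visit_with_offset_paging(rows, page_size)
  clock = rows.map { |r| r[:stamp] }.max
  seen = []
  offset = 0
  loop do
    page = rows.sort_by { |r| r[:stamp] }.drop(offset).first(page_size)
    break if page.empty?
    page.each do |row|
      seen << row[:id]
      clock += 1
      row[:stamp] = clock # the mutation moves the row to the end of the order
    end
    offset += page_size
  end
  seen
end

# Hypothetical helper: same mutation, but the id order is captured once.
def visit_with_cached_ids(rows, page_size)
  clock = rows.map { |r| r[:stamp] }.max
  ids = rows.sort_by { |r| r[:stamp] }.map { |r| r[:id] } # order frozen here
  by_id = rows.each_with_object({}) { |r, h| h[r[:id]] = r }
  seen = []
  ids.each_slice(page_size) do |chunk|
    chunk.each do |id|
      seen << id
      clock += 1
      by_id[id][:stamp] = clock
    end
  end
  seen
end

make_rows = -> { (1..6).map { |id| { id: id, stamp: id } } }
p visit_with_offset_paging(make_rows.call, 2) # => [1, 2, 5, 6, 5, 6]
p visit_with_cached_ids(make_rows.call, 2)    # => [1, 2, 3, 4, 5, 6]
```

In the first run, rows 3 and 4 are never visited and rows 5 and 6 are visited twice, which is exactly the skip-and-repeat behavior described above.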

My primary concern was that I didn't want to load 30,000+ objects into memory at once; the execution time of the query itself was not a concern. So I used a solution that executes the original query but only caches the IDs. It then splits the array of IDs into chunks and queries and instantiates the objects one chunk at a time. This way you can safely mutate the objects, because the sort order is kept in memory.

Here is a minimal example similar to what I did:

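batch_size = 512
# Replace Thing.order('created_at DESC') with your own scope (including any limit).
ids = Thing.order('created_at DESC').pluck(:id)
ids.each_slice(batch_size) do |chunk|
  # FIELD() is MySQL-specific; it returns the rows in the same order as chunk.
  # where + order is used because find no longer accepts an :order option in current Rails.
  Thing.where(id: chunk).order(Arel.sql("FIELD(id, #{chunk.join(',')})")).each do |thing|
    # Do things with thing
  end
end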

The trade-offs of this solution are:

  • The complete query is executed to get the IDs
  • An array of all the IDs is kept in memory
  • It uses the MySQL-specific FIELD() function
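
If the last trade-off is a problem, one possible workaround (not part of the original answer) is to fetch each chunk in whatever order the database returns it and restore the chunk's order in Ruby. The sketch below is plain Ruby with hashes standing in for records; in_id_order is a hypothetical helper name, and the fetched array simulates the result of a per-chunk query.

```ruby
# Hypothetical helper: puts rows back into the order given by ids, standing in
# for ORDER BY FIELD(id, ...). Ids without a matching row are dropped.
def in_id_order(ids, rows)
  by_id = rows.each_with_object({}) { |row, h| h[row[:id]] = row }
  ids.map { |id| by_id[id] }.compact
end

chunk = [42, 7, 19]                            # ids in the desired order
fetched = [{ id: 7 }, { id: 19 }, { id: 42 }]  # simulated rows, arbitrary order

p in_id_order(chunk, fetched).map { |r| r[:id] } # => [42, 7, 19]
```

With real records, the same idea could probably be written as Thing.where(id: chunk).index_by(&:id).values_at(*chunk).compact. The re-sort happens in memory, but only over one chunk at a time, so the memory concern from the answer still holds.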

Hope this helps!
