ActiveRecord find_each combined with limit and order

Question

I'm trying to run a query over about 50,000 records using ActiveRecord's find_each method, but it seems to ignore my other parameters:

Thing.active.order("created_at DESC").limit(50000).find_each {|t| puts t.id }

Instead of stopping at 50,000 records and sorting by created_at, the resulting query runs over the entire dataset:

Thing Load (198.8ms)  SELECT "things".* FROM "things" WHERE "things"."active" = 't' AND ("things"."id" > 373343) ORDER BY "things"."id" ASC LIMIT 1000

Is there a way to get behavior similar to find_each, but with a total maximum limit and respecting my sort criteria?

Recommended answer

The documentation says that find_each and find_in_batches don't retain the sort order and limit because:

  • Sorting ASC on the PK is used to make the batch ordering work.
  • Limit is used to control the batch sizes.

You could write your own version of this function, like @rorra did. But you can get into trouble when mutating the objects. For example, if you sort by created_at and save an object, it might come up again in one of the next batches. Similarly, you might skip objects because the order of the results has changed by the time the query for the next batch is executed. Only use that solution with read-only objects.
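As a rough illustration of such a hand-rolled version, here is a minimal sketch that assumes plain offset-based paging. It is not @rorra's code. The names `each_in_ordered_batches` and `fetch` are hypothetical; with ActiveRecord, `fetch` would typically wrap the scope's `offset` and `limit` calls. The sketch has exactly the weakness described above: if rows change position between two calls to `fetch`, records can be repeated or skipped.

```ruby
# Hypothetical helper, not part of ActiveRecord.
# fetch is any callable taking (offset, size) and returning an array of
# records in the desired order. With ActiveRecord it could be, for example:
#   ->(offset, size) { Thing.active.order("created_at DESC").offset(offset).limit(size).to_a }
def each_in_ordered_batches(fetch, total_limit:, batch_size: 1000)
  offset = 0
  while offset < total_limit
    # Never ask for more rows than are left under the overall limit.
    size = [batch_size, total_limit - offset].min
    batch = fetch.call(offset, size)
    break if batch.empty?

    batch.each { |record| yield record }

    # A short batch means the underlying result set is exhausted.
    break if batch.size < size
    offset += batch.size
  end
end

# Usage with an in-memory array standing in for an ordered query result.
rows = (1..10).to_a.reverse
fake_fetch = ->(offset, size) { rows[offset, size] || [] }
each_in_ordered_batches(fake_fetch, total_limit: 7, batch_size: 3) { |r| puts r }
```

Only `batch_size` records are loaded per call, and the loop stops at `total_limit` even when that is not a multiple of the batch size.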

My primary concern was that I didn't want to load 30000+ objects into memory at once. I was not concerned about the execution time of the query itself. So I used a solution that executes the original query but only caches the IDs. It then divides the array of IDs into chunks and queries/creates the objects per chunk. This way you can safely mutate the objects, because the sort order is kept in memory.

Here is a minimal example similar to what I did:

batch_size = 512
ids = Thing.order('created_at DESC').pluck(:id) # Replace .order('created_at DESC') with your own scope
ids.each_slice(batch_size) do |chunk|
  # FIELD() is MySQL-specific: it returns the rows in the same order as the IDs in chunk
  Thing.where(id: chunk).order(Arel.sql("FIELD(id, #{chunk.join(',')})")).each do |thing|
    # Do things with thing
  end
end

The trade-offs of this solution are:

  • The full query is executed to get the IDs
  • An array of all the IDs is kept in memory
  • It uses the MySQL-specific FIELD() function
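
If the dependency on FIELD() is a problem, one possible alternative is to fetch each chunk in whatever order the database returns it and restore the ID order in Ruby. The sketch below is an assumption, not part of the original answer; `in_id_order` is a hypothetical helper that works on any objects responding to `id`. With ActiveRecord it could be called as `in_id_order(Thing.where(id: chunk).to_a, chunk)`.

```ruby
# Hypothetical helper, not part of ActiveRecord.
# Returns the given records rearranged to follow the order of ids.
# IDs with no matching record (e.g. rows deleted in the meantime) are skipped.
def in_id_order(records, ids)
  by_id = records.each_with_object({}) { |record, index| index[record.id] = record }
  ids.map { |id| by_id[id] }.compact
end

# Usage with plain structs standing in for model instances.
Item = Struct.new(:id, :name)
unordered = [Item.new(1, "a"), Item.new(2, "b"), Item.new(3, "c")]
in_id_order(unordered, [3, 1, 2]).each { |item| puts item.name }
```

The reordering only touches one chunk at a time, so the memory cost stays bounded by the batch size.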

Hope this helps!
