@BatchSize: a smart or stupid use?

Problem description

First, I'll explain how I understand and use @BatchSize: it is meant to load the relations of objects in batches, so that fewer SQL requests are sent to the database. This is especially useful on LAZY @OneToMany relations.

However, it's also useful on LAZY @OneToOne and @ManyToOne relations: if you load a list of entities from the database and then access a lazy @*ToOne entity, the related entities are loaded in batches, even if I just run a test that accesses the relation of the first entity in the list.

Note, for anyone who wants to test this: the batching only shows up if the entities are not already loaded. For instance, if you have a list of users, each with a manager, and you list all users, then accessing a manager triggers no request when that manager is already loaded.
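To make the behaviour described above concrete, here is a minimal plain-Java simulation of batch fetching. It is not Hibernate code: `BatchLoaderSketch`, `get`, and `queryCount` are hypothetical names, and the sketch assumes the simplest possible strategy, where touching one unloaded reference loads it together with up to `batchSize - 1` other pending ids in a single simulated `IN (...)` query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a persistence context; NOT Hibernate's implementation.
final class BatchLoaderSketch {
    private final int batchSize;
    // Ids of lazy references that are known but not yet loaded.
    private final Set<Long> pending = new LinkedHashSet<>();
    // Ids that have already been loaded, mapped to a fake entity value.
    private final Map<Long, String> loaded = new HashMap<>();
    private int queryCount = 0;

    BatchLoaderSketch(int batchSize, List<Long> ids) {
        this.batchSize = batchSize;
        this.pending.addAll(ids);
    }

    // Accessing one reference loads it plus other pending ids, up to batchSize.
    String get(long id) {
        String value = loaded.get(id);
        if (value != null) {
            return value; // already loaded: no query is issued
        }
        List<Long> batch = new ArrayList<>();
        batch.add(id);
        for (Long other : pending) {
            if (batch.size() == batchSize) {
                break;
            }
            if (other != id) {
                batch.add(other);
            }
        }
        queryCount++; // simulates one "select ... where id in (?, ?, ...)"
        for (Long key : batch) {
            pending.remove(key);
            loaded.put(key, "entity#" + key);
        }
        return loaded.get(id);
    }

    int queryCount() {
        return queryCount;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 1; i <= 10; i++) {
            ids.add(i);
        }
        BatchLoaderSketch loader = new BatchLoaderSketch(4, ids);
        for (long id : ids) {
            loader.get(id);
        }
        System.out.println(loader.queryCount()); // prints 3
    }
}
```

In this simplified model, ten references with a batch size of 4 cost three queries instead of ten, and a reference that is already loaded costs none, which matches the note above about managers that are already in memory.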

The only drawback I see with this method is when you load a list of items from the database but only use part of it. This is a post-filtering operation.

So let's get to the main point.

Let's assume that I do everything needed to never perform post-filtering-like operations, even if that means writing native SQL queries, using DTO objects for multiselect criteria queries, and so on.

  1. Am I right to think that I can just put @BatchSize on every lazy relation, once I have carefully considered eager loading / join fetching and finally chosen a lazy relation?
  2. Is it worth searching for an adequate value for @BatchSize, or can I assume "the bigger the better"? In other words: is there a limit on the number of values in the SQL "IN" operator beyond which my request becomes slow enough that batching is no longer worth it? I use Postgres, but if you have answers for other DBMSs I'm interested too.
  3. Optional question: it seems that using @BatchSize on a class doesn't produce much of a result. I still have to annotate every lazy relationship. Did I miss something, or is it useless?

EDIT: The point of my question 3 is that I'm getting a different behaviour.

Let's say I'm loading a list of entities of class "A", which has a LAZY OneToMany relationship to B. Now I want to print the creationDate of every B, so I'm using two classic nested for loops.

I have now annotated B with BatchSize:

  • @OneToMany is not annotated with BatchSize: each set of B is loaded independently on each iteration, without batching. So my annotation on the B class seems to be totally ignored. Even if I set the value to two and I have 6 entries in one set, I get one query for that set.
  • @OneToMany is annotated: I get the specific queries for the batches that are loaded. If I set the batch size to two and I have a total of 10 B across all sets, I get just 5 requests, whatever the number of A I have. If I set it to 100, I get 1 query for the B objects.

PS: I'm not considering any query related to B that might fire to load B's own fields with fetch select/subselect.

EDIT 2: I just found this post: "Why would I not use @BatchSize on every lazy loaded relationship?". Although I googled and searched SO before posting my question, I guess I didn't use the right words...

However, I'm adding something different that might lead to a different answer: when I wonder about using BatchSize on every relation, it's after I have already chosen between eager loading (with join or select fetch) and lazy loading.

Solution

  1. Yes, @BatchSize is meant to be used with lazy associations.
  2. Hibernate will execute multiple statements in most situations anyway, even if the count of uninitialized proxies/collections is less than the specified batch size. See this answer for more details. Also, a larger number of lighter queries, compared to fewer but bigger ones, may contribute positively to the overall throughput of the system.
  3. @BatchSize at class level means that the batch size specified for the entity will be applied to all lazy @*ToOne associations that point to that entity. See the example with the Person entity in the documentation.

The linked question and its answers are more concerned with the need for optimization and lazy loading in general. They apply here as well, of course, but they are not specific to batch loading, which is just one of the possible approaches.

Another important point relates to eager loading, which is mentioned in the linked answers. They suggest that if a property is always used, you may get better performance by loading it eagerly. In general this is not true for collections, and in many situations it is not true for to-one associations either.

For example, suppose you have the following entity, for which bs and cs are always used whenever A is used.

public class A {
  @OneToMany
  private Collection<B> bs;

  @OneToMany
  private Collection<C> cs;
}

Eagerly loading bs and cs obviously suffers from the N+1 selects problem if you don't join them in a single query. But if you do join them in a single query, for example:

select a from A
  left join fetch a.bs
  left join fetch a.cs

then you create a full Cartesian product between bs and cs, returning count(a.bs) x count(a.cs) rows in the result set for each a. These rows are read one by one and assembled into the A entities and their collections of bs and cs.

Batch fetching would be a very good fit in this situation, because you would first read the As, then the bs, and then the cs. That results in more queries, but in a much smaller total amount of data transferred from the database. Also, the separate queries are much simpler than one big query with joins, and they are easier for the database to execute and optimize.
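The difference in transferred rows can be estimated with some simple arithmetic. The sketch below is plain Java with made-up numbers (100 As, each with 20 bs and 30 cs); `FetchRowCountSketch` and its methods are hypothetical names used only for illustration. It counts rows only and ignores the fact that each joined row is also wider than a row from a separate query.

```java
// Back-of-the-envelope row counts; the figures are illustrative assumptions.
final class FetchRowCountSketch {

    // Rows returned by one query that left-join-fetches both collections.
    static long joinFetchRows(int[] bsPerA, int[] csPerA) {
        long rows = 0;
        for (int i = 0; i < bsPerA.length; i++) {
            // With a left join, an empty collection still yields one null-padded row.
            rows += (long) Math.max(1, bsPerA[i]) * Math.max(1, csPerA[i]);
        }
        return rows;
    }

    // Rows returned in total when As, bs and cs are read by separate queries.
    static long separateQueryRows(int[] bsPerA, int[] csPerA) {
        long rows = bsPerA.length; // one row per A
        for (int n : bsPerA) {
            rows += n;
        }
        for (int n : csPerA) {
            rows += n;
        }
        return rows;
    }

    public static void main(String[] args) {
        int[] bs = new int[100];
        int[] cs = new int[100];
        java.util.Arrays.fill(bs, 20);
        java.util.Arrays.fill(cs, 30);
        System.out.println(joinFetchRows(bs, cs));     // prints 60000
        System.out.println(separateQueryRows(bs, cs)); // prints 5100
    }
}
```

Under these assumed sizes the joined query returns more than ten times as many rows, and the gap grows with the product of the two collection sizes.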
