FetchMode加入与SubSelect [英] FetchMode Join vs SubSelect

查看:183
本文介绍了FetchMode加入与SubSelect的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两张表Employee和Department下面是他们两个的实体类

  Department.java 
@Entity
@Table(name =DEPARTMENT)
public class Department {
@Id
@Column(name =DEPARTMENT_ID)
@GeneratedValue(strategy = GenerationType.AUTO)
private Integer departmentId;
@Column(name =DEPARTMENT_NAME)
private String departmentName;
@Column(name =LOCATION)
私人字符串位置;

@OneToMany(cascade = CascadeType.ALL,mappedBy =department,orphanRemoval = true)
@Fetch(FetchMode.SUBSELECT)
//@Fetch(FetchMode.JOIN )
私人列表<员工> employees = new ArrayList<>();


$ b $ Employee.java
@Entity
@Table(name =EMPLOYEE)
public class Employee {
@Id
@SequenceGenerator(name =emp_seq,sequenceName =seq_employee)
@GeneratedValue(generator =emp_seq)
@Column(name =EMPLOYEE_ID)
private Integer employeeId;
@Column(name =EMPLOYEE_NAME)
private String employeeName;

@ManyToOne
@JoinColumn(name =DEPARTMENT_ID)
私人部门部门;

以下是当我执行 em时触发的查询。 find(Department.class,1);



- fetch mode = fetchmode.join

  SELECT department0_.DEPARTMENT_ID AS DEPARTMENT_ID1_0_0_,
department0_.DEPARTMENT_NAME AS DEPARTMENT_NAME2_0_0_,
department0_.LOCATION AS LOCATION3_0_0_,
employees1_.DEPARTMENT_ID AS DEPARTMENT_ID3_1_1_,
employees1_.EMPLOYEE_ID AS EMPLOYEE_ID1_1_1_,
employees1_.EMPLOYEE_ID AS EMPLOYEE_ID1_1_2_,
employees1_.DEPARTMENT_ID AS DEPARTMENT_ID3_1_2_,
employees1_.EMPLOYEE_NAME AS EMPLOYEE_NAME2_1_2_
FROM DEPARTMENT department0_
LEFT OUTER JOIN EMPLOYEE employees1_
ON department0_.DEPARTMENT_ID = employees1_.DEPARTMENT_ID
WHERE department0_.DEPARTMENT_ID =?

- fetch mode = fetchmode.subselect

  SELECT department0_.DEPARTMENT_ID AS DEPARTMENT_ID1_0_0_,
department0_.DEPARTMENT_NAME AS DEPARTMENT_NAME2_0_0_,
department0_.LOCATION AS LOCATION3_0_0_
FROM DEPARTMENT department0_
WHERE department0_.DEPARTMENT_ID =?

SELECT employees0_.DEPARTMENT_ID AS DEPARTMENT_ID3_1_0_,
employees0_.EMPLOYEE_ID AS EMPLOYEE_ID1_1_0_,
employees0_.EMPLOYEE_ID AS EMPLOYEE_ID1_1_1_,
employees0_.DEPARTMENT_ID AS DEPARTMENT_ID3_1_1_,
employees0_ .EMPLOYEE_NAME AS EMPLOYEE_NAME2_1_1_
FROM EMPLOYEE employees0_
WHERE employees0_.DEPARTMENT_ID =?

我只想知道我们应该选择哪一个 FetchMode.JOIN code>或 FetchMode.SUBSELECT ?哪一个应该选择在哪种情况下?

解决方案

Marmite引用的SUBQUERY策略与FetchMode.SELECT相关,而不是SUBSELECT 。

您发布的有关 fetchmode.subselect 的控制台输出令人好奇,因为这不是应该起作用的方式。



FetchMode.SUBSELECT


使用子查询加载附加集合

Hibernate docs
$ b


如果一个懒惰集合或单值代理必须是取出后,Hibernate将加载所有这些数据,然后在子查询中重新运行原始查询。这种方式与批量提取相同,但没有零星的加载。

FetchMode.SUBSELECT应该看起来像这样:

p>

  SELECT< employees columns> 
FROM EMPLOYEE employees0_
WHERE employees0_.DEPARTMENT_ID IN
(SELECT department0_.DEPARTMENT_ID FROM DEPARTMENT department0_)

您可以看到第二个查询会将 all 所有属于某个凭据的员工(即employee.department_id不为null)记录到内存中,这并不重要它不是您在第一个查询中检索的部门。
因此,如果员工表很大,这可能是一个主要问题,因为它可能是意外地将整个数据库加载到内存中

然而,FetchMode.SUBSELECT显着减少了查询的数量,因为在comparisson中只有两个查询FecthMode.SELECT的N + 1个查询。



您可能认为FetchMode.JOIN的查询次数更少,只有1个,所以为什么要使用SUBSELECT?嗯,这是真的,但是以重复数据和更重的响应为代价。

如果必须使用JOIN提取单值代理,则查询可能会检索:

  + --------------- + --------- + ----------- + 
| DEPARTMENT_ID | BOSS_ID | BOSS_NAME |
+ --------------- + --------- + ----------- +
| 1 | 1 | GABRIEL |
| 2 | 1 | GABRIEL |
| 3 | 2 | ALEJANDRO |
+ --------------- + --------- + ----------- +

如果他指挥了多个部门并且带宽成本很高,那么老板的员工数据就会被复制。



如果必须使用JOIN提取惰性集合,则查询可能会检索:

  + --------------- + --------------- + ------------- + 
| DEPARTMENT_ID | DEPARTMENT_ID | EMPLOYEE_ID |
+ --------------- + --------------- + ------------- +
| 1 |销售| GABRIEL |
| 1 |销售| ALEJANDRO |
| 2 | RRHH | DANILO |
+ --------------- + --------------- + ------------- +

如果部门数据包含多个员工(自然情况),则重复部门数据。
我们不仅承受带宽成本,而且还得到重复的复制Department对象,我们必须使用SET或 DISTINCT_ROOT_ENTITY 去重复。



然而,低延迟位置的重复数据是一个很好的折衷在许多情况下,像Markus Winand
$ b


SQL嵌套仍然比嵌套选择方法更有效率 - 尽管它执行相同的索引查找,因为它避免了很多网络通信。如果由于每次销售的员工属性重复而导致传输的数据总量更大,则速度更快。这是因为性能的两个维度:响应时间和吞吐量;在计算机网络中,我们称之为延迟和带宽。带宽对响应时间仅有轻微影响,但延迟会产生巨大影响。这意味着数据库往返次数对于响应时间比传输数据量更重要。


所以,main有关使用SUBSELECT的问题是难以控制,并且可能会将整个实体图加载到内存中。
通过批量提取,您可以在与SUBSELECT不同的查询中获取关联的实体(因此您不会遇到重复的问题),逐渐且最重要的是,您只查询相关实体(因此您不会受到潜在加载巨大图形),因为IN子查询是通过outter查询检索到的ID进行过滤的。)

  Hibernate:
select .. 。
from mkyong.stock stock0_

Hibernate:
select ...
from mkyong.stock_daily_record stockdaily0_
where
stockdaily0_.STOCK_ID in (
?,?,?,?,?,?,?,?

(这可能是一个有趣的测试,如果批处理量很大的批处理会像SUBSELECT一样起作用,但不会加载整个表)




$ b

摘要:


  • JOIN:避免N + 1查询的主要问题,但它可能检索数据重复。

  • SUBSELECT:也避免N + 1,不会重复数据,但会将关联类型的所有实体加载到内存中。
  • ul>

    这些表格使用 ascii-tables


    I have two tables Employee and Department following are the entity classes for both of them

    Department.java
    @Entity
    @Table(name = "DEPARTMENT")
    public class Department {
        @Id
        @Column(name = "DEPARTMENT_ID")
        @GeneratedValue(strategy = GenerationType.AUTO)
        private Integer departmentId;
        @Column(name = "DEPARTMENT_NAME")
        private String departmentName;
        @Column(name = "LOCATION")
        private String location;
    
        @OneToMany(cascade = CascadeType.ALL, mappedBy = "department", orphanRemoval = true)
        @Fetch(FetchMode.SUBSELECT)
        //@Fetch(FetchMode.JOIN)
        private List<Employee> employees = new ArrayList<>();
    }
    
    
    Employee.java
    @Entity
    @Table(name = "EMPLOYEE")
    public class Employee {
        @Id
        @SequenceGenerator(name = "emp_seq", sequenceName = "seq_employee")
        @GeneratedValue(generator = "emp_seq")
        @Column(name = "EMPLOYEE_ID")
        private Integer employeeId;
        @Column(name = "EMPLOYEE_NAME")
        private String employeeName;
    
        @ManyToOne
        @JoinColumn(name = "DEPARTMENT_ID")
        private Department department;
    }
    

    Below are the queries fired when I did em.find(Department.class, 1);

    -- fetch mode = fetchmode.join

        SELECT department0_.DEPARTMENT_ID AS DEPARTMENT_ID1_0_0_,
          department0_.DEPARTMENT_NAME    AS DEPARTMENT_NAME2_0_0_,
          department0_.LOCATION           AS LOCATION3_0_0_,
          employees1_.DEPARTMENT_ID       AS DEPARTMENT_ID3_1_1_,
          employees1_.EMPLOYEE_ID         AS EMPLOYEE_ID1_1_1_,
          employees1_.EMPLOYEE_ID         AS EMPLOYEE_ID1_1_2_,
          employees1_.DEPARTMENT_ID       AS DEPARTMENT_ID3_1_2_,
          employees1_.EMPLOYEE_NAME       AS EMPLOYEE_NAME2_1_2_
        FROM DEPARTMENT department0_
        LEFT OUTER JOIN EMPLOYEE employees1_
        ON department0_.DEPARTMENT_ID   =employees1_.DEPARTMENT_ID
        WHERE department0_.DEPARTMENT_ID=?
    

    -- fetch mode = fetchmode.subselect

        SELECT department0_.DEPARTMENT_ID AS DEPARTMENT_ID1_0_0_,
          department0_.DEPARTMENT_NAME    AS DEPARTMENT_NAME2_0_0_,
          department0_.LOCATION           AS LOCATION3_0_0_
        FROM DEPARTMENT department0_
        WHERE department0_.DEPARTMENT_ID=?
    
        SELECT employees0_.DEPARTMENT_ID AS DEPARTMENT_ID3_1_0_,
          employees0_.EMPLOYEE_ID        AS EMPLOYEE_ID1_1_0_,
          employees0_.EMPLOYEE_ID        AS EMPLOYEE_ID1_1_1_,
          employees0_.DEPARTMENT_ID      AS DEPARTMENT_ID3_1_1_,
          employees0_.EMPLOYEE_NAME      AS EMPLOYEE_NAME2_1_1_
        FROM EMPLOYEE employees0_
        WHERE employees0_.DEPARTMENT_ID=?
    

    I just wanted to know which one should we prefer FetchMode.JOIN or FetchMode.SUBSELECT? which one should we opt in which scenario?

    解决方案

    The SUBQUERY strategy that Marmite refers to is related to FetchMode.SELECT, not SUBSELECT.

    The console output that you've posted about fetchmode.subselect is curious because this is not the way that is supposed to work.

    The FetchMode.SUBSELECT

    use a subselect query to load the additional collections

    Hibernate docs:

    If one lazy collection or single-valued proxy has to be fetched, Hibernate will load all of them, re-running the original query in a subselect. This works in the same way as batch-fetching but without the piecemeal loading.

    FetchMode.SUBSELECT should look something like this:

    SELECT <employees columns>
    FROM EMPLOYEE employees0_
    WHERE employees0_.DEPARTMENT_ID IN
    (SELECT department0_.DEPARTMENT_ID FROM DEPARTMENT department0_)
    

    You can see that this second query will bring to memory all the employees that belongs to some departament (i.e. employee.department_id is not null), it doesn't matter if it is not the department that you retrieve in your first query. So this is potentially a major issue if the table of employees is large because it may be accidentially loading a whole database into memory.

    However, FetchMode.SUBSELECT reduces significatly the number of queries because takes only two queries in comparisson to the N+1 queries of the FecthMode.SELECT.

    You may be thinking that FetchMode.JOIN makes even less queries, just 1, so why use SUBSELECT at all? Well, it's true but at the cost of duplicated data and a heavier response.

    If a single-valued proxy has to be fetched with JOIN, the query may retrieve:

    +---------------+---------+-----------+
    | DEPARTMENT_ID | BOSS_ID | BOSS_NAME |
    +---------------+---------+-----------+
    |             1 |       1 | GABRIEL   |
    |             2 |       1 | GABRIEL   |
    |             3 |       2 | ALEJANDRO |
    +---------------+---------+-----------+
    

    The employee data of the boss is duplicated if he directs more than one department and it has a cost in bandwith.

    If a lazy collection has to be fetched with JOIN, the query may retrieve:

    +---------------+---------------+-------------+
    | DEPARTMENT_ID | DEPARTMENT_ID | EMPLOYEE_ID |
    +---------------+---------------+-------------+
    |             1 | Sales         | GABRIEL     |
    |             1 | Sales         | ALEJANDRO   |
    |             2 | RRHH          | DANILO      |
    +---------------+---------------+-------------+
    

    The department data is duplicated if it contains more than one employee (the natural case). We don't only suffer a cost in bandwidth but also we get duplicate duplicated Department objects and we must use a SET or DISTINCT_ROOT_ENTITY to de-duplicate.

    However, duplicate data in pos of a lower latency is a good trade off in many cases, like Markus Winand says.

    An SQL join is still more efficient than the nested selects approach—even though it performs the same index lookups—because it avoids a lot of network communication. It is even faster if the total amount of transferred data is bigger because of the duplication of employee attributes for each sale. That is because of the two dimensions of performance: response time and throughput; in computer networks we call them latency and bandwidth. Bandwidth has only a minor impact on the response time but latencies have a huge impact. That means that the number of database round trips is more important for the response time than the amount of data transferred.

    So, the main issue about using SUBSELECT is that is hard to control and may be loading a whole graph of entities into memory. With Batch fetching you fetch the associated entity in a separate query as SUBSELECT (so you don't suffer duplicates), gradually and most important you query only related entities (so you don't suffer from potentially load a huge graph) because the IN subquery is filtered by the IDs retrieved by the outter query).

    Hibernate: 
        select ...
        from mkyong.stock stock0_
    
    Hibernate: 
        select ...
        from mkyong.stock_daily_record stockdaily0_ 
        where
            stockdaily0_.STOCK_ID in (
                ?, ?, ?, ?, ?, ?, ?, ?, ?, ?
            )
    

    (It may be interesting test if Batch fetching with a very high batch size would act like a SUBSELECT but without the issue of load the whole table)

    A couple of posts showing the different fetching strategies and the SQL logs (very important):

    Summary:

    • JOIN: avoids the major issue of N+1 queries but it may retrieve data duplicated.
    • SUBSELECT: avoids N+1 too and doesn't duplicate data but it loads all the entities of the associated type into memory.

    The tables were built using ascii-tables.

    这篇关于FetchMode加入与SubSelect的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆