Huge performance difference when using GROUP BY vs DISTINCT

Views: 45 | Published: 2021/12/8 11:58:26 | Tags: sql, performance, group-by, distinct, hsqldb

Problem description

I am performing some tests on an HSQLDB server with a table containing 500 000 entries. The table has no indices. There are 5 000 distinct business keys, and I need a list of them.

Naturally I started with a DISTINCT query:

SELECT DISTINCT business_key
FROM memory
WHERE concept <> 'case'   OR 
      attrib  <> 'status' OR 
      value   <> 'closed';

It takes around 90 seconds!!!

Then I tried GROUP BY:

SELECT business_key
FROM memory
WHERE concept <> 'case'   OR 
      attrib  <> 'status' OR
      value   <> 'closed'
GROUP BY business_key;

It takes 1 second!!!
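Both statements are logically equivalent, so they should return exactly the same set of keys; only the execution strategy can differ. The sketch below checks that on a handful of made-up rows. It assumes SQLite (through Python's built-in `sqlite3` module) as a stand-in for HSQLDB, so it demonstrates result equivalence only, not HSQLDB's timings. The `keys` helper and the sample rows are hypothetical.

```python
import sqlite3

# The two queries from the question (GROUP BY version with the
# statement terminator placed after the GROUP BY clause).
DISTINCT_SQL = """
SELECT DISTINCT business_key
FROM memory
WHERE concept <> 'case' OR attrib <> 'status' OR value <> 'closed'
"""

GROUP_BY_SQL = """
SELECT business_key
FROM memory
WHERE concept <> 'case' OR attrib <> 'status' OR value <> 'closed'
GROUP BY business_key
"""


def keys(conn, sql):
    """Hypothetical helper: run a query and return its keys sorted."""
    return sorted(row[0] for row in conn.execute(sql))


conn = sqlite3.connect(":memory:")
# No PRIMARY KEY or UNIQUE constraint, so SQLite creates no index.
conn.execute(
    "CREATE TABLE memory (business_key TEXT, concept TEXT, attrib TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO memory VALUES (?, ?, ?, ?)",
    [
        ("A", "case", "status", "closed"),  # filtered out: all three columns match
        ("A", "case", "status", "open"),    # kept: value differs
        ("B", "case", "status", "closed"),  # filtered out; B has no other rows
        ("C", "task", "owner", "closed"),   # kept: concept differs
        ("C", "task", "owner", "open"),     # kept; duplicate key C
    ],
)

distinct_keys = keys(conn, DISTINCT_SQL)
group_by_keys = keys(conn, GROUP_BY_SQL)
print(distinct_keys, group_by_keys)  # ['A', 'C'] ['A', 'C']
```

Because the results are identical, a large timing gap between the two forms points at how the engine evaluates them rather than at what they compute.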

Trying to figure out the difference, I ran EXPLAIN PLAN FOR, but it seems to give the same information for both queries.

EXPLAIN PLAN FOR SELECT DISTINCT ...

isAggregated=[false]
columns=[
  COLUMN: PUBLIC.MEMORY.BUSINESS_KEY
]
[range variable 1
  join type=INNER
  table=MEMORY
  alias=M
  access=FULL SCAN
  condition = [    index=SYS_IDX_SYS_PK_10057_10058
    other condition=[
    OR arg_left=[
     OR arg_left=[
      NOT_EQUAL arg_left=[
       COLUMN: PUBLIC.MEMORY.CONCEPT] arg_right=[
       VALUE = case, TYPE = CHARACTER]] arg_right=[
      NOT_EQUAL arg_left=[
       COLUMN: PUBLIC.MEMORY.ATTRIB] arg_right=[
       VALUE = status, TYPE = CHARACTER]]] arg_right=[
     NOT_EQUAL arg_left=[
      COLUMN: PUBLIC.MEMORY.VALUE] arg_right=[
      VALUE = closed, TYPE = CHARACTER]]]
  ]
]]
PARAMETERS=[]
SUBQUERIES[]
Object References
PUBLIC.MEMORY
PUBLIC.MEMORY.CONCEPT
PUBLIC.MEMORY.ATTRIB
PUBLIC.MEMORY.VALUE
PUBLIC.MEMORY.BUSINESS_KEY
Read Locks
PUBLIC.MEMORY
WriteLocks

EXPLAIN PLAN FOR SELECT ... GROUP BY ...

isDistinctSelect=[false]
isGrouped=[true]
isAggregated=[false]
columns=[
  COLUMN: PUBLIC.MEMORY.BUSINESS_KEY
]
[range variable 1
  join type=INNER
  table=MEMORY
  alias=M
  access=FULL SCAN
  condition = [    index=SYS_IDX_SYS_PK_10057_10058
    other condition=[
    OR arg_left=[
     OR arg_left=[
      NOT_EQUAL arg_left=[
       COLUMN: PUBLIC.MEMORY.CONCEPT] arg_right=[
       VALUE = case, TYPE = CHARACTER]] arg_right=[
      NOT_EQUAL arg_left=[
       COLUMN: PUBLIC.MEMORY.ATTRIB] arg_right=[
       VALUE = status, TYPE = CHARACTER]]] arg_right=[
     NOT_EQUAL arg_left=[
      COLUMN: PUBLIC.MEMORY.VALUE] arg_right=[
      VALUE = closed, TYPE = CHARACTER]]]
  ]
]]
groupColumns=[
COLUMN: PUBLIC.MEMORY.BUSINESS_KEY]
PARAMETERS=[]
SUBQUERIES[]
Object References
PUBLIC.MEMORY
PUBLIC.MEMORY.CONCEPT
PUBLIC.MEMORY.ATTRIB
PUBLIC.MEMORY.VALUE
PUBLIC.MEMORY.BUSINESS_KEY
Read Locks
PUBLIC.MEMORY
WriteLocks

EDIT

I did additional tests. With 500 000 records in HSQLDB, all with distinct business keys, DISTINCT now performs better: 3 seconds, versus around 9 seconds for GROUP BY.

In MySQL both queries perform the same:

MySQL, 500 000 rows, 5 000 distinct business keys: both queries take 0.5 seconds.

MySQL, 500 000 rows, all business keys distinct: SELECT DISTINCT ... takes 11 seconds; SELECT ... GROUP BY business_key takes 13 seconds.

So the problem is specific to HSQLDB.

I would be very grateful if someone could explain why there is such a drastic difference.
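To rerun the two data shapes from the EDIT (5 000 distinct keys vs all keys distinct) side by side, a small timing harness helps. This is a sketch under loud assumptions: it uses SQLite through Python's `sqlite3` module, so its numbers say nothing about HSQLDB or MySQL; the `time_queries` function, the constant row contents, and the single unwarmed run per query are all illustrative simplifications.

```python
import sqlite3
import time

# Same filter as in the question; every generated row passes it.
FILTER = "concept <> 'case' OR attrib <> 'status' OR value <> 'closed'"
QUERIES = {
    "DISTINCT": f"SELECT DISTINCT business_key FROM memory WHERE {FILTER}",
    "GROUP BY": f"SELECT business_key FROM memory WHERE {FILTER} GROUP BY business_key",
}


def time_queries(n_rows, n_keys):
    """Hypothetical harness: build an unindexed table, time each query once.

    Returns {query_name: (elapsed_seconds, number_of_keys_returned)}.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE memory (business_key TEXT, concept TEXT, attrib TEXT, value TEXT)"
    )
    # Keys cycle through n_keys distinct values across n_rows rows.
    conn.executemany(
        "INSERT INTO memory VALUES (?, ?, ?, ?)",
        ((f"key-{i % n_keys}", "task", "owner", "open") for i in range(n_rows)),
    )
    results = {}
    for name, sql in QUERIES.items():
        start = time.perf_counter()
        n = len(conn.execute(sql).fetchall())
        results[name] = (time.perf_counter() - start, n)
    conn.close()
    return results
```

Calling `time_queries(500_000, 5_000)` and `time_queries(500_000, 500_000)` mirrors the two experiments; porting the same loop to a JDBC client would be needed to measure HSQLDB itself.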
