每个连接的计数-优化 [英] a count for each join - optimisation

查看:94
本文介绍了每个连接的计数-优化的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

结果: 我使用了三种方法:

RESULTS : I've used three methods :

  1. 三个子查询,每个(我的)查询1个
  2. 三个子查询,不连接,使用where(SlimsGhost)过滤
  3. 三重联接(Solarflare)

我用解释"和分析"进行了一些统计,它们解释了每个查询必须完成的工作,以下结果不足为奇:

I've made some stats with "explain" and "profiling" which explains the work each query must do and the following results weren't surprising : stats

相对结果:

  1. 100%
  2. 79%
  3. 1715%

  1. 100%
  2. 79%
  3. 1715%

原始帖子

这个想法是联接4个表,每次使用相同的PK,然后计算每个联接分别给出多少行.

The idea is to join 4 tables, using the same PK each time and then count how many rows each join would separately gives.

显而易见的答案是将每个联接...与子查询分开进行.

The obvious answer is to do each join... separately with sub queries.

但是可以通过一个查询来做到这一点吗?会更有效吗?

But is it possible to do it with one query? Would it be more efficient?

select "LES CIGARES DU PHARAON" as "Titre",
          (select count( payalb.idPays)
          from album alb
                     left join pays_album payalb using ( idAlb )
          where alb.titreAlb = "LES CIGARES DU PHARAON") as "Pays",
          (select count( peralb.idPers)
          from album alb
                     left join pers_album peralb using ( idAlb )
          where alb.titreAlb = "LES CIGARES DU PHARAON") as "Personnages",
          (select count( juralb.idJur)
          from album alb
                     left join juron_album juralb using ( idAlb )
          where alb.titreAlb = "LES CIGARES DU PHARAON") as "Jurons"
; 
+------------------------+------+-------------+--------+
| Titre                  | Pays | Personnages | Jurons |
+------------------------+------+-------------+--------+
| LES CIGARES DU PHARAON |    3 |          13 |     50 |
+------------------------+------+-------------+--------+

表相册行:22

表pays_album行:45

table pays_album rows : 45

表personnage_album行:100

table personnage_album rows : 100

表juron_album行:1704

table juron_album rows : 1704

这是我尝试过的:

select alb.titreAlb as "Titre",
         sum(case when alb.idAlb=payalb.idAlb then 1 else 0 end) "Pays",
         sum(case when alb.idAlb=peralb.idAlb then 1 else 0 end) "Personnages",
         sum(case when alb.idAlb=juralb.idAlb then 1 else 0 end) "Jurons"
from album alb
          left join pays_album payalb using ( idAlb )
          left join pers_album peralb using ( idAlb )
          left join juron_album juralb using ( idAlb )
where alb.titreAlb = "LES CIGARES DU PHARAON"
group by alb.titreAlb
;
+------------------------+------+-------------+--------+
| Titre                  | Pays | Personnages | Jurons |
+------------------------+------+-------------+--------+
| LES CIGARES DU PHARAON | 1950 |        1950 |   1950 |
+------------------------+------+-------------+--------+

但它计算的是完整联接表的总行数,...(1950 = 3 * 13 * 50)

but it counts the total number of rows of the full joined table, ... (1950 = 3 * 13 * 50)

模式: https://github.com/LittleNooby/gbd2015 -2016/blob/master/tintin_schema.png

表内容: https://github.com/LittleNooby/gbd2015- 2016/blob/master/tintin_description

如果您想玩玩:

db_init: https://github.com/LittleNooby/gbd2015 -2016/blob/master/tintin_ok.mysql

推荐答案

出于优化目的,一个好的经验法则是少加入而不是多加入.实际上,您应该尝试加入尽可能少的行.使用任何其他联接,您将成倍增加成本,而不是增加成本.因为mysql基本上只会生成一个大的乘法矩阵.不过,其中很多已被索引和其他内容优化了.

For optimization purposes, a good rule of thumb is to join less, not more. In fact, you should try to join as few rows as you can with as few rows as you can. With any additional join, you will multiply cost instead of adding cost. Because mysql will basically just generate a big multiplied matrix. A lot of that gets optimized away by indexes and other stuff though.

但是要回答您的问题:假设表具有唯一键,而idalb是专辑的唯一键,那么实际上只需要一个大的连接就可以计数.然后,只有那时,您才能执行类似于代码的操作:

But to answer your question: it is actually possible to count with only one big join, assuming the tables have unique keys and idalb is a unique key for album. Then, and only then, you can do it similar to your code:

select alb.titreAlb as "Titre",
       count(distinct payalb.idAlb, payalb.PrimaryKeyFields) "Pays",
       count(distinct peralb.idAlb, peralb.PrimaryKeyFields) "Personnages",
       count(distinct juralb.idAlb, juralb.PrimaryKeyFields) "Jurons"
from album alb
left join pays_album payalb using ( idAlb )
left join pers_album peralb using ( idAlb )
left join juron_album juralb using ( idAlb )
where alb.titreAlb = "LES CIGARES DU PHARAON"
group by alb.titreAlb

其中PrimaryKeyFields代表联接表的主键字段(您必须查找它们).

where PrimaryKeyFields stands for the primary key fields of the joined tables (you have to look them up).

Distinct将消除其他联接对计数产生的影响.但不幸的是,通常distinct不会消除联接对成本的影响.

Distinct will remove the effect the other joins have on the count. But unfortunately, in general, distinct will not remove the effect the joins have on the cost.

尽管,如果您的索引覆盖了表的所有(idAlb + PrimaryKeyFields)字段,则其速度可能与原始解决方案一样快(因为它可以优化distinct而不进行排序),并且会接近您的想法(只需遍历每个表/索引一次).但是在正常或最坏的情况下,它的性能要比合理的解决方案(例如SlimGhost的解决方案)更差-因为怀疑是否会找到最佳策略.但是,尝试一下并检查其解释(并发布发现),也许mysql会做一些疯狂的事情.

Although, if you have indexes that cover all (idAlb + PrimaryKeyFields)-fields of your tables, that might be even as fast as the original solution (because it can optimize the distinct to not do a sorting) and will come close to what you were thinking of (just walking through every table/index once). But in a normal or worst case szenario, it should perform worse than a reasonable solution (like SlimGhost's one) - because it is doubtful it will find the optimal strategy. But play around with it and check the explains (and post the findings), maybe mysql will do something crazy.

这篇关于每个连接的计数-优化的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆