每个连接的计数-优化 [英] a count for each join - optimisation
问题描述
结果: 我使用了三种方法:
RESULTS : I've used three methods :
- 三个子查询,每个(我的)查询1个
- 三个子查询,不连接,使用where(SlimsGhost)过滤
- 三重联接(Solarflare)
我用解释"和分析"进行了一些统计,它们解释了每个查询必须完成的工作,以下结果不足为奇:
I've made some stats with "explain" and "profiling" which explains the work each query must do and the following results weren't surprising : stats
相对结果:
- 100%
- 79%
-
1715%
- 100%
- 79%
1715%
原始帖子
这个想法是联接4个表,每次使用相同的PK,然后计算每个联接分别给出多少行.
The idea is to join 4 tables, using the same PK each time and then count how many rows each join would separately gives.
显而易见的答案是将每个联接...与子查询分开进行.
The obvious answer is to do each join... separately with sub queries.
但是可以通过一个查询来做到这一点吗?会更有效吗?
But is it possible to do it with one query? Would it be more efficient?
select "LES CIGARES DU PHARAON" as "Titre",
(select count( payalb.idPays)
from album alb
left join pays_album payalb using ( idAlb )
where alb.titreAlb = "LES CIGARES DU PHARAON") as "Pays",
(select count( peralb.idPers)
from album alb
left join pers_album peralb using ( idAlb )
where alb.titreAlb = "LES CIGARES DU PHARAON") as "Personnages",
(select count( juralb.idJur)
from album alb
left join juron_album juralb using ( idAlb )
where alb.titreAlb = "LES CIGARES DU PHARAON") as "Jurons"
;
+------------------------+------+-------------+--------+
| Titre | Pays | Personnages | Jurons |
+------------------------+------+-------------+--------+
| LES CIGARES DU PHARAON | 3 | 13 | 50 |
+------------------------+------+-------------+--------+
表相册行:22
表pays_album行:45
table pays_album rows : 45
表personnage_album行:100
table personnage_album rows : 100
表juron_album行:1704
table juron_album rows : 1704
这是我尝试过的:
select alb.titreAlb as "Titre",
sum(case when alb.idAlb=payalb.idAlb then 1 else 0 end) "Pays",
sum(case when alb.idAlb=peralb.idAlb then 1 else 0 end) "Personnages",
sum(case when alb.idAlb=juralb.idAlb then 1 else 0 end) "Jurons"
from album alb
left join pays_album payalb using ( idAlb )
left join pers_album peralb using ( idAlb )
left join juron_album juralb using ( idAlb )
where alb.titreAlb = "LES CIGARES DU PHARAON"
group by alb.titreAlb
;
+------------------------+------+-------------+--------+
| Titre | Pays | Personnages | Jurons |
+------------------------+------+-------------+--------+
| LES CIGARES DU PHARAON | 1950 | 1950 | 1950 |
+------------------------+------+-------------+--------+
但它计算的是完整联接表的总行数,...(1950 = 3 * 13 * 50)
but it counts the total number of rows of the full joined table, ... (1950 = 3 * 13 * 50)
模式: https://github.com/LittleNooby/gbd2015 -2016/blob/master/tintin_schema.png
表内容: https://github.com/LittleNooby/gbd2015- 2016/blob/master/tintin_description
如果您想玩玩:
db_init: https://github.com/LittleNooby/gbd2015 -2016/blob/master/tintin_ok.mysql
推荐答案
出于优化目的,一个好的经验法则是少加入而不是多加入.实际上,您应该尝试加入尽可能少的行.使用任何其他联接,您将成倍增加成本,而不是增加成本.因为mysql基本上只会生成一个大的乘法矩阵.不过,其中很多已被索引和其他内容优化了.
For optimization purposes, a good rule of thumb is to join less, not more. In fact, you should try to join as few rows as you can with as few rows as you can. With any additional join, you will multiply cost instead of adding cost. Because mysql will basically just generate a big multiplied matrix. A lot of that gets optimized away by indexes and other stuff though.
但是要回答您的问题:假设表具有唯一键,而idalb是专辑的唯一键,那么实际上只需要一个大的连接就可以计数.然后,只有那时,您才能执行类似于代码的操作:
But to answer your question: it is actually possible to count with only one big join, assuming the tables have unique keys and idalb is a unique key for album. Then, and only then, you can do it similar to your code:
select alb.titreAlb as "Titre",
count(distinct payalb.idAlb, payalb.PrimaryKeyFields) "Pays",
count(distinct peralb.idAlb, peralb.PrimaryKeyFields) "Personnages",
count(distinct juralb.idAlb, juralb.PrimaryKeyFields) "Jurons"
from album alb
left join pays_album payalb using ( idAlb )
left join pers_album peralb using ( idAlb )
left join juron_album juralb using ( idAlb )
where alb.titreAlb = "LES CIGARES DU PHARAON"
group by alb.titreAlb
其中PrimaryKeyFields代表联接表的主键字段(您必须查找它们).
where PrimaryKeyFields stands for the primary key fields of the joined tables (you have to look them up).
Distinct
将消除其他联接对计数产生的影响.但不幸的是,通常distinct
不会消除联接对成本的影响.
Distinct
will remove the effect the other joins have on the count. But unfortunately, in general, distinct
will not remove the effect the joins have on the cost.
尽管,如果您的索引覆盖了表的所有(idAlb + PrimaryKeyFields)字段,则其速度可能与原始解决方案一样快(因为它可以优化distinct
而不进行排序),并且会接近您的想法(只需遍历每个表/索引一次).但是在正常或最坏的情况下,它的性能要比合理的解决方案(例如SlimGhost的解决方案)更差-因为怀疑是否会找到最佳策略.但是,尝试一下并检查其解释(并发布发现),也许mysql会做一些疯狂的事情.
Although, if you have indexes that cover all (idAlb + PrimaryKeyFields)-fields of your tables, that might be even as fast as the original solution (because it can optimize the distinct
to not do a sorting) and will come close to what you were thinking of (just walking through every table/index once). But in a normal or worst case szenario, it should perform worse than a reasonable solution (like SlimGhost's one) - because it is doubtful it will find the optimal strategy. But play around with it and check the explains (and post the findings), maybe mysql will do something crazy.
这篇关于每个连接的计数-优化的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!