mysql: choosing the most efficient query from the two
Both of these MySQL queries produce exactly the same result. Query A is a simple union with the postType condition embedded in each individual query, whereas query B applies the same condition in an outer select over a derived table, sigma, that is the union of the individual query results. I am concerned that sigma in query B might get too large for no good reason if there are a lot of rows. But then I am a bit confused about how ORDER BY works for query A: would it not also have to build a temporary table or something similar to sort the results? It may all depend on how ORDER BY works for a union. If ORDER BY on a union also creates a temporary table, would query A then be almost equivalent to query B in resource usage? (Query B would be much easier for us to implement in our system than query A.) Please guide or advise in any way possible, thanks.
Query A
SELECT `t1`.*, `t2`.*
FROM `t1`
INNER JOIN `t2`
  ON `t1`.websiteID = `t2`.ownerID
  AND `t1`.authorID = `t2`.authorID
  AND `t1`.authorID = 1559
  AND `t1`.postType = "simplePost"
UNION
SELECT `t1`.*
FROM `t1`
WHERE websiteID = 1559 AND postType = "simplePost"
ORDER BY postID
LIMIT 0, 50
Query B
SELECT *
FROM (
  SELECT `t1`.*, `t2`.*
  FROM `t1`
  INNER JOIN `t2`
    ON `t1`.websiteID = `t2`.ownerID
    AND `t1`.authorID = `t2`.authorID
    AND `t1`.authorID = 1559
  UNION
  SELECT `t1`.*
  FROM `t1`
  WHERE websiteID = 1559
) AS sigma
WHERE postType = "simplePost"
ORDER BY postID
LIMIT 0, 50
EXPLAIN FOR QUERY A
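| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|----|-------------|-------|------|---------------|-----|---------|-----|------|-------|
| 1 | PRIMARY | t2 | ref | userID | userID | 4 | const | 1 | |
| 1 | PRIMARY | t1 | ref | authorID | authorID | 4 | const | 2 | Using where |
| 2 | UNION | t1 | ref | websiteID | websiteID | 4 | const | 9 | Using where |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | Using filesort |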
EXPLAIN FOR QUERY B
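| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|----|-------------|-------|------|---------------|-----|---------|-----|------|-------|
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 10 | Using where; Using filesort |
| 2 | DERIVED | t2 | ref | userID | userID | 4 | | 1 | |
| 2 | DERIVED | t1 | ref | authorID | authorID | 4 | | 2 | Using where |
| 3 | UNION | t1 | ref | websiteID | websiteID | 4 | | 9 | |
| NULL | UNION RESULT | <union2,3> | ALL | NULL | NULL | NULL | NULL | NULL | |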
There is no doubt that version 1 (query A, with a separate where clause on each side of the union) will be faster. Let's look at why version 2 (query B, with the where clause over the union result) is worse:
- data volume: there are always going to be more rows in the union result, because there are fewer conditions on which rows are returned. This means more disk I/O (depending on indexes) and more temporary storage to hold the rowset, which means more processing time
- repeated scan: the entire result of the union must be scanned again to apply the condition, when it could have been handled during the initial scan. This means handling the rowset twice; although that probably happens in memory, it is still extra work
- indexes aren't used for where clauses on a union result. If you have an index over the foreign key fields and postType, it would not be used
If you want maximum performance, use UNION ALL, which passes the rows straight into the result without the de-duplication overhead, instead of UNION, which removes duplicates (usually by sorting), can be expensive, and is unnecessary based on your comments.
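A minimal sketch of version 1 with UNION ALL might look like the following. It assumes both branches can return the same explicit column list (the columns used here are only the ones named in the question; a UNION needs the same number of columns on each side) and that, as the comments suggest, duplicate rows across the two branches are not a concern.

```sql
-- Sketch only: the explicit column list is an assumption; adjust to the real schema.
(SELECT t1.postID, t1.authorID, t1.websiteID, t1.postType
 FROM t1
 INNER JOIN t2
   ON t1.websiteID = t2.ownerID
  AND t1.authorID = t2.authorID
 WHERE t1.authorID = 1559
   AND t1.postType = 'simplePost')
UNION ALL
(SELECT t1.postID, t1.authorID, t1.websiteID, t1.postType
 FROM t1
 WHERE t1.websiteID = 1559
   AND t1.postType = 'simplePost')
-- ORDER BY and LIMIT after the parenthesized branches apply to the combined result.
ORDER BY postID
LIMIT 0, 50;
```

The final ORDER BY still has to sort the combined rows, so a filesort on the union result (as in the EXPLAIN for query A) is to be expected either way; the gain comes from filtering early and skipping duplicate removal.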
Define these indexes and use version 1 for maximum performance:
create index t1_authorID_postType on t1(authorID, postType);
create index t1_websiteID_postType on t1(websiteID, postType);
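One way to check whether the new indexes are picked up is to run EXPLAIN on each branch separately. This is a sketch, not a guaranteed outcome: the optimizer may still prefer another index depending on table statistics.

```sql
-- Branch filtered by authorID: look at the "key" column of the EXPLAIN output
-- to see whether t1_authorID_postType is chosen.
EXPLAIN
SELECT t1.postID
FROM t1
WHERE t1.authorID = 1559
  AND t1.postType = 'simplePost';

-- Branch filtered by websiteID: likewise, check whether
-- t1_websiteID_postType appears in the "key" column.
EXPLAIN
SELECT t1.postID
FROM t1
WHERE t1.websiteID = 1559
  AND t1.postType = 'simplePost';
```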