简化SQL语句的一般规则 [英] General rules for simplifying SQL statements
问题描述
我正在寻找一些推理规则"(类似于设置操作规则或逻辑规则),可以用来减少SQL查询的复杂度或大小. 是否存在类似的东西?有论文,有工具吗?您自己发现的任何对等物吗?它在某种程度上类似于查询优化,但在性能方面却不一样.
I'm looking for some "inference rules" (similar to set operation rules or logic rules) which I can use to reduce a SQL query in complexity or size. Does there exist something like that? Any papers, any tools? Any equivalencies that you found on your own? It's somehow similar to query optimization, but not in terms of performance.
陈述不同:使用JOIN,SUBSELECT和UNION进行(复杂)查询是否有可能(或不)通过使用一些转换规则将其简化为更简单的等效SQL语句,该语句产生相同的结果. ?
To state it different: Having a (complex) query with JOINs, SUBSELECTs, UNIONs is it possible (or not) to reduce it to a simpler, equivalent SQL statement, which is producing the same result, by using some transformation rules?
因此,我正在寻找等效的SQL语句转换,例如大多数SUBSELECT都可以重写为JOIN的事实.
So, I'm looking for equivalent transformations of SQL statements like the fact that most SUBSELECTs can be rewritten as a JOIN.
推荐答案
陈述不同:使用JOIN,SUBSELECT和UNION进行(复杂)查询是否有可能(或不)通过使用一些转换规则将其简化为更简单的等效SQL语句,该语句产生相同的结果. ?
To state it different: Having a (complex) query with JOINs, SUBSELECTs, UNIONs is it possible (or not) to reduce it to a simpler, equivalent SQL statement, which is producing the same result, by using some transformation rules?
这正是优化程序谋生的方式(不是说我总是在做得很好).
That's exactly what optimizers do for a living (not that I'm saying they always do this well).
由于SQL
是一种基于集合的语言,所以通常有不止一种将一个查询转换为另一种查询的方法.
Since SQL
is a set based language, there are usually more than one way to transform one query to other.
喜欢此查询:
SELECT *
FROM mytable
WHERE col1 > @value1 OR col2 < @value2
可以转换成这样:
SELECT *
FROM mytable
WHERE col1 > @value1
UNION
SELECT *
FROM mytable
WHERE col2 < @value2
或者这个:
SELECT mo.*
FROM (
SELECT id
FROM mytable
WHERE col1 > @value1
UNION
SELECT id
FROM mytable
WHERE col2 < @value2
) mi
JOIN mytable mo
ON mo.id = mi.id
,看起来很丑,但可以产生更好的执行计划.
, which look uglier but can yield better execution plans.
最常见的事情之一是替换此查询:
One of the most common things to do is replacing this query:
SELECT *
FROM mytable
WHERE col IN
(
SELECT othercol
FROM othertable
)
与此:
SELECT *
FROM mytable mo
WHERE EXISTS
(
SELECT NULL
FROM othertable o
WHERE o.othercol = mo.col
)
在某些RDBMS
(例如PostgreSQL
)中,DISTINCT
和GROUP BY
使用不同的执行计划,因此有时最好将其中一个替换为另一个:
In some RDBMS
's (like PostgreSQL
), DISTINCT
and GROUP BY
use the different execution plans, so sometimes it's better to replace one with the other:
SELECT mo.grouper,
(
SELECT SUM(col)
FROM mytable mi
WHERE mi.grouper = mo.grouper
)
FROM (
SELECT DISTINCT grouper
FROM mytable
) mo
vs.
SELECT mo.grouper, SUM(col)
FROM mytable
GROUP BY
mo.grouper
在PostgreSQL
中,DISTINCT
进行排序和GROUP BY
哈希.
In PostgreSQL
, DISTINCT
sorts and GROUP BY
hashes.
MySQL
缺少FULL OUTER JOIN
,因此可以将其重写为以下内容:
MySQL
lacks FULL OUTER JOIN
, so it can be rewritten as folloing:
SELECT t1.col1, t2.col2
FROM table1 t1
LEFT OUTER JOIN
table2 t2
ON t1.id = t2.id
vs.
SELECT t1.col1, t2.col2
FROM table1 t1
LEFT JOIN
table2 t2
ON t1.id = t2.id
UNION ALL
SELECT NULL, t2.col2
FROM table1 t1
RIGHT JOIN
table2 t2
ON t1.id = t2.id
WHERE t1.id IS NULL
,但请参阅我的博客中有关如何在MySQL
中更有效地执行此操作的文章:
, but see this article in my blog on how to do this more efficiently in MySQL
:
Oracle
中的此分层查询:
SELECT DISTINCT(animal_id) AS animal_id
FROM animal
START WITH
animal_id = :id
CONNECT BY
PRIOR animal_id IN (father, mother)
ORDER BY
animal_id
可以转换为此:
SELECT DISTINCT(animal_id) AS animal_id
FROM (
SELECT 0 AS gender, animal_id, father AS parent
FROM animal
UNION ALL
SELECT 1, animal_id, mother
FROM animal
)
START WITH
animal_id = :id
CONNECT BY
parent = PRIOR animal_id
ORDER BY
animal_id
,后者表现更好.
有关执行计划的详细信息,请参阅我的博客中的这篇文章:
See this article in my blog for the execution plan details:
要查找与给定范围重叠的所有范围,可以使用以下查询:
To find all ranges that overlap the given range, you can use the following query:
SELECT *
FROM ranges
WHERE end_date >= @start
AND start_date <= @end
,但是在SQL Server
中,此更复杂的查询会更快地产生相同的结果:
, but in SQL Server
this more complex query yields same results faster:
SELECT *
FROM ranges
WHERE (start_date > @start AND start_date <= @end)
OR (@start BETWEEN start_date AND end_date)
,不管您相信与否,我的博客上也有一篇文章:
, and believe it or not, I have an article in my blog on this too:
SQL Server
也缺少一种有效的方法来进行累积聚合,因此此查询:
SQL Server
also lacks an efficient way to do cumulative aggregates, so this query:
SELECT mi.id, SUM(mo.value) AS running_sum
FROM mytable mi
JOIN mytable mo
ON mo.id <= mi.id
GROUP BY
mi.id
可以使用游标(主,请帮助我,您用一句话正确地听到了cursors
,more efficiently
和SQL Server
)更有效地重写.
can be more efficiently rewritten using, Lord help me, cursors (you heard me right: cursors
, more efficiently
and SQL Server
in one sentence).
在我的博客中查看有关如何执行此操作的文章:
See this article in my blog on how to do it:
在金融应用程序中通常会遇到一种查询,查询某种货币的有效汇率,如Oracle
中的这样:
There is a certain kind of query commonly met in financial applications that searches for the effective rate for a currency, like this one in Oracle
:
SELECT TO_CHAR(SUM(xac_amount * rte_rate), 'FM999G999G999G999G999G999D999999')
FROM t_transaction x
JOIN t_rate r
ON (rte_currency, rte_date) IN
(
SELECT xac_currency, MAX(rte_date)
FROM t_rate
WHERE rte_currency = xac_currency
AND rte_date <= xac_date
)
可以大量重写此查询以使用等于条件,该条件允许使用HASH JOIN
而不是NESTED LOOPS
:
This query can be heavily rewritten to use an equality condition which allows a HASH JOIN
instead of NESTED LOOPS
:
WITH v_rate AS
(
SELECT cur_id AS eff_currency, dte_date AS eff_date, rte_rate AS eff_rate
FROM (
SELECT cur_id, dte_date,
(
SELECT MAX(rte_date)
FROM t_rate ri
WHERE rte_currency = cur_id
AND rte_date <= dte_date
) AS rte_effdate
FROM (
SELECT (
SELECT MAX(rte_date)
FROM t_rate
) - level + 1 AS dte_date
FROM dual
CONNECT BY
level <=
(
SELECT MAX(rte_date) - MIN(rte_date)
FROM t_rate
)
) v_date,
(
SELECT 1 AS cur_id
FROM dual
UNION ALL
SELECT 2 AS cur_id
FROM dual
) v_currency
) v_eff
LEFT JOIN
t_rate
ON rte_currency = cur_id
AND rte_date = rte_effdate
)
SELECT TO_CHAR(SUM(xac_amount * eff_rate), 'FM999G999G999G999G999G999D999999')
FROM (
SELECT xac_currency, TRUNC(xac_date) AS xac_date, SUM(xac_amount) AS xac_amount, COUNT(*) AS cnt
FROM t_transaction x
GROUP BY
xac_currency, TRUNC(xac_date)
)
JOIN v_rate
ON eff_currency = xac_currency
AND eff_date = xac_date
尽管笨拙,但后者查询的速度要快6
倍.
Despite being bulky as a hell, the latter query is 6
times faster.
这里的主要思想是将<=
替换为=
,这需要构建内存中的日历表.到JOIN
.
The main idea here is replacing <=
with =
, which requires building an in-memory calendar table. to JOIN
with.
这篇关于简化SQL语句的一般规则的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!