是否有关于SQL查询复杂性和性能的一般规则? [英] Is there any general rule on SQL query complexity Vs performance?
问题描述
1)如果不使用索引,是否将SQL查询执行时间O(n)与连接数相比较?如果没有,我们可能期望什么样的关系?索引是否可以改善实际的big-O时间复杂性,还是仅将整个查询时间减少一定的常数?
1)Are SQL query execution times O(n) compared to the number of joins, if indexes are not used? If not, what kind of relationship are we likely to expect? And can indexing improve the actual big-O time-complexity, or does it only reduce the entire query time by some constant factor?
一个模糊的问题,我敢肯定它变化很大,但是我在这里是从一般意义上说的.
Slightly vague question, I'm sure it varies a lot but I'm talking in a general sense here.
2)如果您有类似这样的查询:
2) If you have a query like:
SELECT T1.name, T2.date
FROM T1, T2
WHERE T1.id=T2.id
AND T1.color='red'
AND T2.type='CAR'
我是对的,假设数据库在评估多表条件之前会先对T1.color和T2.type进行单表过滤吗?在这种情况下,使查询更复杂可能会使其更快,因为较少的行要接受连接级测试?
Am I right assuming the DB will do single table filtering first on T1.color and T2.type, before evaluating multi-table conditions? In such a case, making the query more complex could make it faster because less rows are subjected to the join-level tests?
推荐答案
这取决于所使用的查询计划.
This depends on the query plan used.
即使没有索引,现代服务器也可以使用比O(N * M)
Even without indexes, modern servers can use HASH JOIN
and MERGE JOIN
which are faster than O(N * M)
更具体地说,HASH JOIN
的复杂度是O(N + M)
,其中N
是哈希表,而M
是查找表.散列和散列查找具有恒定的复杂性.
More specifically, complexity of a HASH JOIN
is O(N + M)
, where N
is the hashed table and M
the is lookup table. Hashing and hash lookups have constant complexity.
MERGE JOIN
的复杂度为O(N*Log(N) + M*Log(M))
:这是对两个表进行排序的时间加上对其进行扫描的时间之和.
Complexity of a MERGE JOIN
is O(N*Log(N) + M*Log(M))
: it's the sum of times to sort both tables plus time to scan them.
SELECT T1.name, T2.date
FROM T1, T2
WHERE T1.id=T2.id
AND T1.color='red'
AND T2.type='CAR'
如果未定义索引,则引擎将选择HASH JOIN
或MERGE JOIN
.
If there are no indexes defined, the engine will select either a HASH JOIN
or a MERGE JOIN
.
HASH JOIN
的工作方式如下:
-
会选择哈希表(通常是记录较少的表).说是
t1
扫描来自t1
的所有记录.如果记录包含color='red'
,则该记录进入哈希表,并以id
作为键,将name
作为值.
All records from t1
are scanned. If the records holds color='red'
, this record goes into the hash table with id
as a key and name
as a value.
扫描来自t2
的所有记录.如果记录包含type='CAR'
,则在哈希表中搜索其id
,并返回所有哈希命中的name
值以及当前值data
.
All records from t2
are scanned. If the record holds type='CAR'
, its id
is searched in the hash table and the values of name
from all hash hits are returned along with the current value of data
.
MERGE JOIN
的工作方式如下:
-
已创建
t1 (id, name)
的副本,并按id
已创建t2 (id, data)
的副本,并按id
两个表中的指针都设置为最小值:
The pointers are set to the minimal values in both tables:
>1 2<
2 3
2 4
3 5
在循环中比较指针,如果它们匹配,则返回记录.如果它们不匹配,则使用最小值的指针前进:
The pointers are compared in a loop, and if they match, the records are returned. If they don't match, the pointer with the minimal value is advanced:
>1 2< - no match, left pointer is less. Advance left pointer
2 3
2 4
3 5
1 2< - match, return records and advance both pointers
>2 3
2 4
3 5
1 2 - match, return records and advance both pointers
2 3<
2 4
>3 5
1 2 - the left pointer is out of range, the query is over.
2 3
2 4<
3 5
>
在这种情况下,由于较少的行要进行连接级测试,因此使查询更复杂可以使其速度更快?
In such a case, making the query more complex could make it faster because less rows are subjected to the join-level tests?
好的.
您的查询中没有WHERE
子句:
SELECT T1.name, T2.date
FROM T1, T2
更简单,但返回更多结果,并且运行时间更长.
is more simple but returns more results and runs longer.
这篇关于是否有关于SQL查询复杂性和性能的一般规则?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!