是否有关于SQL查询复杂性和性能的一般规则? [英] Is there any general rule on SQL query complexity Vs performance?

查看:96
本文介绍了是否有关于SQL查询复杂性和性能的一般规则?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

1)如果不使用索引,是否将SQL查询执行时间O(n)与连接数相比较?如果没有,我们可能期望什么样的关系?索引是否可以改善实际的big-O时间复杂性,还是仅将整个查询时间减少一定的常数?

1)Are SQL query execution times O(n) compared to the number of joins, if indexes are not used? If not, what kind of relationship are we likely to expect? And can indexing improve the actual big-O time-complexity, or does it only reduce the entire query time by some constant factor?

一个模糊的问题,我敢肯定它变化很大,但是我在这里是从一般意义上说的.

Slightly vague question, I'm sure it varies a lot but I'm talking in a general sense here.

2)如果您有类似这样的查询:

2) If you have a query like:

SELECT  T1.name, T2.date
FROM    T1, T2
WHERE   T1.id=T2.id
        AND T1.color='red'
        AND T2.type='CAR'

我是对的,假设数据库在评估多表条件之前会先对T1.color和T2.type进行单表过滤吗?在这种情况下,使查询更复杂可能会使其更快,因为较少的行要接受连接级测试?

Am I right assuming the DB will do single table filtering first on T1.color and T2.type, before evaluating multi-table conditions? In such a case, making the query more complex could make it faster because less rows are subjected to the join-level tests?

推荐答案

这取决于所使用的查询计划.

This depends on the query plan used.

即使没有索引,现代服务器也可以使用比O(N * M)

Even without indexes, modern servers can use HASH JOIN and MERGE JOIN which are faster than O(N * M)

更具体地说,HASH JOIN的复杂度是O(N + M),其中N是哈希表,而M是查找表.散列和散列查找具有恒定的复杂性.

More specifically, complexity of a HASH JOIN is O(N + M), where N is the hashed table and M the is lookup table. Hashing and hash lookups have constant complexity.

MERGE JOIN的复杂度为O(N*Log(N) + M*Log(M)):这是对两个表进行排序的时间加上对其进行扫描的时间之和.

Complexity of a MERGE JOIN is O(N*Log(N) + M*Log(M)): it's the sum of times to sort both tables plus time to scan them.

SELECT  T1.name, T2.date
FROM    T1, T2
WHERE   T1.id=T2.id
        AND T1.color='red'
        AND T2.type='CAR'

如果未定义索引,则引擎将选择HASH JOINMERGE JOIN.

If there are no indexes defined, the engine will select either a HASH JOIN or a MERGE JOIN.

HASH JOIN的工作方式如下:

  1. 会选择哈希表(通常是记录较少的表).说是t1

扫描来自t1的所有记录.如果记录包含color='red',则该记录进入哈希表,并以id作为键,将name作为值.

All records from t1 are scanned. If the records holds color='red', this record goes into the hash table with id as a key and name as a value.

扫描来自t2的所有记录.如果记录包含type='CAR',则在哈希表中搜索其id,并返回所有哈希命中的name值以及当前值data.

All records from t2 are scanned. If the record holds type='CAR', its id is searched in the hash table and the values of name from all hash hits are returned along with the current value of data.

MERGE JOIN的工作方式如下:

  1. 已创建t1 (id, name)的副本,并按id

已创建t2 (id, data)的副本,并按id

两个表中的指针都设置为最小值:

The pointers are set to the minimal values in both tables:

>1  2<
 2  3
 2  4
 3  5

  • 在循环中比较指针,如果它们匹配,则返回记录.如果它们不匹配,则使用最小值的指针前进:

  • The pointers are compared in a loop, and if they match, the records are returned. If they don't match, the pointer with the minimal value is advanced:

    >1  2<  - no match, left pointer is less. Advance left pointer
     2  3
     2  4
     3  5
    
     1  2<  - match, return records and advance both pointers
    >2  3
     2  4
     3  5
    
     1  2  - match, return records and advance both pointers
     2  3< 
     2  4
    >3  5
    
     1  2 - the left pointer is out of range, the query is over.
     2  3
     2  4<
     3  5
    >
    

  • 在这种情况下,由于较少的行要进行连接级测试,因此使查询更复杂可以使其速度更快?

    In such a case, making the query more complex could make it faster because less rows are subjected to the join-level tests?

    好的.

    您的查询中没有WHERE子句:

    SELECT  T1.name, T2.date
    FROM    T1, T2
    

    更简单,但返回更多结果,并且运行时间更长.

    is more simple but returns more results and runs longer.

    这篇关于是否有关于SQL查询复杂性和性能的一般规则?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

    查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆