What does a SQL SELECT statement actually do during execution?


Question

Consider this SELECT statement:

SELECT name 
FROM users 
WHERE address IN (addr_a, addr_b, addr_c, ...);

We know that it selects the names of all people whose address is in (addr_a, addr_b, addr_c, ...). But I want to know what it actually does when executing this statement.

For example, does it search every row in the table to check whether its address is in (addr_a, ...)?

If addr_a, addr_b, and so on are very long strings, does that slow down the search?

Is there any recommended material on this topic?

Edit: I didn't specify an RDBMS because I would like to learn about as many SQL implementations as possible.

Edit again: I got answers about MySQL and SQL Server, and I accepted the SQL Server one because it is detailed. Answers about other RDBMSs are still welcome.

Answer

Since you haven't specified which RDBMS your question is about, I am going to describe how this works on SQL Server, simplifying a bit and avoiding most of the technicalities. It may be the same or very similar on other systems, but it may also be completely different.

What SQL Server is going to do with your query

`SELECT name FROM users WHERE address IN (addr_a, addr_b, addr_c, ...);`

depends almost entirely on what kind of indexes you have on the table. Here are 3 basic scenarios:

Scenario 1 (good index)

If you have what is called a covering index, which means either a PK or clustered index on column address, or a non-clustered index on address that includes name, SQL Server will do an Index Seek. It walks the index's tree structure and quickly pinpoints the exact rows you need (or finds that they don't exist). Since the name column is also stored in the index, it reads the value and returns it right from there.
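One rough way to watch a covering index being used is to ask an engine for its query plan. The sketch below runs on SQLite through Python's built-in `sqlite3` module, not on SQL Server, so treat it as an analogy only: the plan wording and the optimizer's choices differ between engines. The `id` column, the index name `idx_users_address_name`, and the helper `query_plan` are made up for illustration. SQLite has no `INCLUDE` clause, so the index simply lists both columns.

```python
import sqlite3

# In-memory database with a hypothetical users table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, address TEXT)")

# Index on address that also carries name, so the query never needs the table.
conn.execute("CREATE INDEX idx_users_address_name ON users (address, name)")


def query_plan(conn, sql):
    """Return SQLite's plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The fourth column of each plan row is the human-readable detail.
    return "; ".join(row[3] for row in rows)


covering_plan = query_plan(
    conn,
    "SELECT name FROM users WHERE address IN ('addr_a', 'addr_b', 'addr_c')",
)
print(covering_plan)
```

The printed plan should describe a search that uses a covering index, which is SQLite's counterpart to the seek described above.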

Scenario 2 (not-so-good index)

This is the case when you have an index on column address that does not include column name. You will find this kind of index, on only one column, very often, but as you'll soon find out, they are pretty useless most of the time. What you are hoping for here is that SQL Server goes through your index structure (seek) and quickly finds the rows with your addresses. However, since column name is not there, the index only yields the row ID (or PK) that says where the row actually is, so for each row returned SQL Server has to do an additional read of another index or the table to find the row and retrieve name. Since that takes roughly 3 times more reading than scenario 1, SQL Server will more often than not decide that it is cheaper to just go through all rows of the table rather than use your index. That case is explained in scenario 3.
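The extra lookup in this scenario can be glimpsed in a query plan as well. The sketch below again uses SQLite via Python's `sqlite3` module as a stand-in, with an assumed schema and a hypothetical index name `idx_users_address`. SQLite on an empty table will still pick the index, so this does not reproduce SQL Server's cost-based decision to fall back to a scan. It only shows that the index is no longer reported as covering, which implies a visit to the table for every match.

```python
import sqlite3

# In-memory database with a hypothetical users table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, address TEXT)")

# Single-column index: it can locate addresses but does not store name.
conn.execute("CREATE INDEX idx_users_address ON users (address)")


def query_plan(conn, sql):
    """Return SQLite's plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[3] for row in rows)


lookup_plan = query_plan(
    conn,
    "SELECT name FROM users WHERE address IN ('addr_a', 'addr_b', 'addr_c')",
)
print(lookup_plan)
```

Whether an engine prefers this index plus lookups or a full scan depends on its statistics and cost model, so the plan on a real, populated SQL Server table may differ.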

Scenario 3 (no usable index)

This will happen if you have no indexes at all, or no index on column address. Simply put, SQL Server goes through all the rows and checks every row against your condition. This is called an Index Scan (or a Table Scan if there are no indexes at all). It is usually the worst-case scenario and the slowest of all.
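For comparison, here is the same kind of sketch with no index on address at all. It uses SQLite via Python's `sqlite3` module rather than SQL Server, and the schema and the `query_plan` helper are assumptions for illustration. With nothing to seek on, the engine has to read every row and test the condition.

```python
import sqlite3

# In-memory database with a hypothetical users table and no index on address.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, address TEXT)")


def query_plan(conn, sql):
    """Return SQLite's plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[3] for row in rows)


scan_plan = query_plan(
    conn,
    "SELECT name FROM users WHERE address IN ('addr_a', 'addr_b', 'addr_c')",
)
print(scan_plan)
```

The plan should report a scan of `users`, which corresponds to the row-by-row check described above.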

Hope that helps to clarify things a bit.

As for the other sub-question about long strings slowing things down: the answer in this case is 'probably not much'. When SQL Server compares two strings, it goes character by character, so if the first characters of the two strings differ, it does not check any further. However, if you put a wildcard % at the beginning of your string, i.e. `WHERE address LIKE '%addr_a'`, SQL Server will have to check every character of every string in the column and will therefore work much slower.
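The effect of a leading wildcard can also be sketched with query plans. As before, this runs on SQLite via Python's `sqlite3` module, not SQL Server, and the schema, index name, and sample patterns are assumptions. One SQLite-specific detail is that its default `LIKE` is case-insensitive, so the column is declared `COLLATE NOCASE` to let the index serve prefix patterns. The patterns deliberately avoid `_`, which is itself a single-character wildcard in `LIKE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOCASE collation lets SQLite's case-insensitive LIKE use the index.
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, "
    "address TEXT COLLATE NOCASE)"
)
conn.execute("CREATE INDEX idx_users_address ON users (address)")


def query_plan(conn, sql):
    """Return SQLite's plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[3] for row in rows)


# Fixed prefix: the index can narrow the search to a range of values.
prefix_plan = query_plan(conn, "SELECT name FROM users WHERE address LIKE 'Main%'")

# Leading wildcard: no usable prefix, so every value has to be examined.
wildcard_plan = query_plan(conn, "SELECT name FROM users WHERE address LIKE '%Main'")

print(prefix_plan)
print(wildcard_plan)
```

The first plan should be an index search over a range, and the second a scan, which matches the point that a pattern starting with % forces the engine to look at every string.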
