Why is there a performance difference between a single-column select and a multi-column select?

Question

I have a table Report with 10+ columns, where every field except the auto-increment id is a varchar. The maximum size of any row in the table is ~80,000 bytes.

I observed a considerable performance difference between the two queries below. Both server and client are on the same machine, the table has ~1.5M rows, and there is an index on the column name.

select * from Report;

and

select name from Report;

I went through other Stack Overflow questions, where an answer suggested that:


选择特定列更好的一个原因是它
提高了SQL Server可以从
索引访问数据而不是查询表数据的概率。

"One reason that selecting specific columns is better is that it raises the probability that SQL Server can access the data from indexes rather than querying the table data."

My questions are:

1) Even when using an index, we need to read from the actual DB (i.e. disk). So isn't the same disk I/O performed whether we select a single column or multiple columns (say, by using *)?

2) Can someone explain, with a little about the internals, why there is a difference between a single-column select and a multi-column select? As I understand it, databases try to store the complete row in the same disk sector.

For example, consider these queries on a User table:

select first_name from user;
select first_name, age from user;

3) Does the DB perform the same disk I/O for the queries above, or can the disk I/O differ? And if the User table has more than 10 columns and we select 5 to 8 of them, does the disk I/O still stay the same?

I have already read in other answers that using select * is not advisable and can cause problems with views and maintenance (for example, when new columns are added to a table).

This may be a silly question, since I am still learning databases (I'm new to the subject). Please provide a link if this has already been answered somewhere.

Answer

Your assumptions are wrong. If the index contains all the selected columns, SQL Server will not hit the disk to get them; it will retrieve them from the index data. Indexes are much more likely to be kept in memory than data pages, so a query may not have to hit the disk at all.
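To make the covering-index effect concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for SQL Server here, which is an assumption: the two query planners differ. The schema is a hypothetical version of the Report table from the question. EXPLAIN QUERY PLAN reports whether SQLite can answer a query from the index alone.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text for sql, joined with '; '."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail string is the last column in every SQLite version.
    return "; ".join(str(row[-1]) for row in rows)


def build_report_table() -> sqlite3.Connection:
    """Create a hypothetical in-memory Report table: auto-increment id plus varchar columns."""
    conn = sqlite3.connect(":memory:")
    extra_cols = ", ".join(f"col{i} VARCHAR(8000)" for i in range(1, 11))
    conn.execute(
        "CREATE TABLE Report (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        f"name VARCHAR(255), {extra_cols})"
    )
    # The index from the question: it stores name plus the rowid (id), nothing else.
    conn.execute("CREATE INDEX idx_report_name ON Report(name)")
    return conn


conn = build_report_table()
for sql in (
    "SELECT name FROM Report",        # every selected column is in the index
    "SELECT * FROM Report",           # needs columns that live only in the table
    "SELECT name, col1 FROM Report",  # one extra non-indexed column is enough
):
    print(f"{sql:<30} -> {query_plan(conn, sql)}")
```

The first query should be reported as a scan using a covering index on idx_report_name. SELECT * and SELECT name, col1 both fall back to scanning the table, because col1 and the other columns exist only in the table rows. The exact wording of the plan text varies between SQLite versions.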

If your select includes columns that you don't need and that are not contained in any index, the server will have to read the data from disk for no reason.
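The following back-of-the-envelope sketch shows why reading only the index is cheaper, and why the index is more likely to stay cached. A full scan touches roughly rows / (rows per page) pages. The 8,060-byte in-row payload per 8 KB page follows SQL Server's page layout. The 2,000-byte average row and the 50-byte index entry are hypothetical figures chosen for illustration, not measurements from the question.

```python
import math

# Assumption: SQL Server-style 8 KB pages with at most 8,060 bytes of in-row data each.
PAGE_PAYLOAD_BYTES = 8060


def pages_for_scan(row_count: int, avg_row_bytes: int,
                   page_payload: int = PAGE_PAYLOAD_BYTES) -> int:
    """Rough count of leaf pages a full scan touches.

    Ignores fill factor, upper B-tree levels and row-overflow pages; a row wider
    than one page is simply counted as one row per page.
    """
    rows_per_page = max(1, page_payload // avg_row_bytes)
    return math.ceil(row_count / rows_per_page)


ROWS = 1_500_000          # row count from the question
TABLE_ROW_BYTES = 2_000   # hypothetical average width of a full Report row
INDEX_ENTRY_BYTES = 50    # hypothetical width of one entry in the index on name

table_pages = pages_for_scan(ROWS, TABLE_ROW_BYTES)
index_pages = pages_for_scan(ROWS, INDEX_ENTRY_BYTES)
print(table_pages, index_pages, round(table_pages / index_pages, 1))  # 375000 9317 40.2
```

With these assumed sizes, reading only the index touches about 40 times fewer pages than reading the full rows. The model is deliberately crude. It leaves out row-overflow storage, which matters for rows as wide as the question's ~80,000 bytes.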

Another thing that does change is network I/O. A production server probably has to serve several thousand queries per minute to hundreds of clients, and including useless columns in the results affects every one of those clients. Things are even worse for servers behind popular web sites, where there may be several thousand clients.
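The next sketch puts a rough number on the network cost. It sums the size of every value a query returns, again using sqlite3 as a stand-in. The table, its column widths and the 1,000 generated rows are hypothetical. Real servers add protocol framing (SQL Server uses TDS) and encode types differently, so treat the result only as a proxy for bytes on the wire.

```python
import sqlite3


def result_bytes(conn: sqlite3.Connection, sql: str) -> int:
    """Rough payload size of a result set: UTF-8 length of every non-NULL value, as text."""
    total = 0
    for row in conn.execute(sql):
        for value in row:
            if value is not None:
                total += len(str(value).encode("utf-8"))
    return total


# Hypothetical table: a short name column plus two wide text columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Report (id INTEGER PRIMARY KEY, name TEXT, body TEXT, notes TEXT)")
conn.executemany(
    "INSERT INTO Report (name, body, notes) VALUES (?, ?, ?)",
    [(f"report-{i}", "x" * 2000, "y" * 500) for i in range(1000)],
)

narrow = result_bytes(conn, "SELECT name FROM Report")
wide = result_bytes(conn, "SELECT * FROM Report")
print(narrow, wide)  # 9890 2512783
```

Here SELECT * returns roughly 250 times more data than SELECT name. That cost is paid again by every client that runs the query.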