Is using COUNT(*) or SELECT * a good idea?


Question

I've heard several times that you shouldn't use COUNT(*) or SELECT * for performance reasons, but I wasn't able to dig up any further information about it.

I can imagine that the database then uses all columns for the operation, which could be a significant performance loss, but I'm not sure about that. Does anybody have further information on the topic?

Recommended answer

1. On count(*) vs. count(something else)

SQL is declarative in that you specify what you want, which is different from specifying how to get it. That means the database engine is free to execute your query in whatever way it considers most efficient. Many database optimizers rewrite your query into a less costly alternative (if such a plan is available).

Given the following table:

table(
   pk       not null
  ,color    not null
  ,nullable null
  ,unique(pk)
  ,index(color)
);

...all of the following are functionally equivalent (due to the mechanics of count and nulls: count(x) only counts rows where x is not null, and both pk and color are declared not null):

1) select count(*) from table;
2) select count(1) from table;
3) select count(pk) from table;
4) select count(color) from table;
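The equivalence can be checked with a small script. This is only a sketch: it assumes SQLite through Python's sqlite3 module, and the table name t, the column types, and the sample rows are made up for illustration. It also counts the nullable column to show the one form that is not equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the answer's table; the name "t" and the column types are assumptions.
conn.execute("""
    create table t (
        pk       integer not null,
        color    text    not null,
        nullable text    null,
        unique (pk)
    )
""")
conn.execute("create index color_idx on t (color)")
conn.executemany(
    "insert into t (pk, color, nullable) values (?, ?, ?)",
    [(1, "red", "a"), (2, "green", None), (3, "red", None)],
)

def count(expr):
    # expr comes from a fixed list of trusted literals, so formatting it in is fine here.
    return conn.execute(f"select count({expr}) from t").fetchone()[0]

counts = {expr: count(expr) for expr in ("*", "1", "pk", "color", "nullable")}
print(counts)
```

The first four forms agree on the row count, while count(nullable) skips the rows where that column is null.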

Regardless of which form you use, the optimizer is free to rewrite the query to another form if that is more efficient. (Again, not all optimizers are sophisticated enough to do this.) The unique index on pk would typically be smaller (in bytes occupied) than the entire table, so it would be more efficient to count the index entries than to scan the whole table. In Oracle we have bitmap indexes, which also compress repeating strings. If we had used such an index on the color column, it would probably have been the smallest index to scan. Oracle also supports table compression, which in some cases makes the physical table smaller than a composite index.
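To see what a particular engine actually does with each form, ask it for its plan. The sketch below assumes SQLite (not Oracle) through Python's sqlite3 module, with a hypothetical table t. The plan text differs between engines and versions, so no expected output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the answer's example; types are assumptions.
conn.execute("create table t (pk integer not null, color text not null, nullable text, unique (pk))")
conn.execute("create index color_idx on t (color)")
conn.executemany("insert into t values (?, ?, ?)", [(i, "red", None) for i in range(100)])

def plan(sql):
    # The last field of each EXPLAIN QUERY PLAN row is a human-readable description.
    return [row[-1] for row in conn.execute("explain query plan " + sql)]

plans = {}
for expr in ("*", "1", "pk", "color"):
    sql = f"select count({expr}) from t"
    plans[expr] = plan(sql)
    print(sql, "->", plans[expr])
```

Whether the engine scans the table or one of the indexes for a given form is its own decision, which is the point of the paragraph above.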

1. TL;DR; Your specific dbms will have its own set of tools that enable different rewriting rules and, in turn, different execution plans. That renders the question somewhat moot (unless we talk about a specific release of a specific dbms). I recommend COUNT(*) in all cases because it requires the least cognitive effort to grasp.

2. On SELECT *

There are very few valid uses of SELECT * in code you write and put into production. Imagine a table that contains Blu-ray movies (yes, each movie is stored as a blob in this table). So you slapped together your awesomesauce abstraction layer and put SELECT * FROM movies where id = ? in the getMovies(movie_id) method. I will refrain from explaining why SELECT name FROM movies where id = ? will be transported across the network just a tad faster. Of course, in most realistic cases it won't have a noticeable impact.
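The difference in what gets shipped to the client can be made concrete. The sketch below assumes SQLite through Python's sqlite3 module. Only movies, id and name come from the text; the content column, the sample row and the payload_size helper are hypothetical, and a 5 MB run of zero bytes stands in for the film.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the "content" blob column is an assumption for illustration.
conn.execute("create table movies (id integer primary key, name text not null, content blob not null)")
conn.execute(
    "insert into movies (id, name, content) values (?, ?, ?)",
    (1, "Example Movie", bytes(5_000_000)),  # placeholder standing in for the movie data
)

def payload_size(row):
    # Rough estimate of what the client received: length of each text or blob value,
    # and a flat 8 for anything else (such as integers).
    return sum(len(v) if isinstance(v, (bytes, str)) else 8 for v in row)

star_row = conn.execute("select * from movies where id = ?", (1,)).fetchone()
name_row = conn.execute("select name from movies where id = ?", (1,)).fetchone()

print(payload_size(star_row), payload_size(name_row))
```

With an in-memory database nothing crosses a network, but the row sizes show how much more a client/server setup would have to transfer for the SELECT * version.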

One last point on performance: when all the columns referenced in your query (selected or filtered) exist in an index (called a covering index), the database need not touch the table at all. The query can be fully resolved by scanning the index only. By selecting all columns you remove this option from the optimizer.
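A sketch of the covering-index effect, assuming SQLite through Python's sqlite3 module and a hypothetical table t with an index on color. Other engines report their plans differently, so treat the plan text as specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and index; names and types are assumptions.
conn.execute("create table t (pk integer not null, color text not null, nullable text, unique (pk))")
conn.execute("create index color_idx on t (color)")

def plan(sql):
    # Join the human-readable description of every plan step into one string.
    return " ".join(row[-1] for row in conn.execute("explain query plan " + sql))

# Only the indexed column is referenced, so the index alone can answer the query.
narrow_plan = plan("select color from t where color = 'red'")
# All columns are requested, so the table itself must be visited as well.
star_plan = plan("select * from t where color = 'red'")

print(narrow_plan)
print(star_plan)
```

In SQLite the first plan typically mentions a covering index, while the second uses the index only to locate rows and then reads the table for the remaining columns.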

Another thing about SELECT *, which is far more serious than any of the above, is that it creates an implicit dependency on a specific physical layout of the table. Let me explain. Consider the following tables:

table T1(name, id)
table T2(name, id)

The following statement...

insert into t1 select * from t2;

...will break or produce a different result if any of the following happens:


  • Any of the tables has its columns rearranged, e.g. T1(id, name)

  • A nullable column is added

  • T2 gets another column
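Two of these failure modes can be reproduced in a few lines. This is a sketch assuming SQLite through Python's sqlite3 module. The column types, the sample row and the extra column are made up. SQLite is loosely typed, so the rearranged-column case silently stores swapped values, whereas a stricter engine might reject the statement with a type error instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t2 (name text, id integer)")
conn.execute("insert into t2 values ('alice', 1)")

# Case 1: t1's columns are declared in the other order, T1(id, name).
conn.execute("create table t1 (id integer, name text)")
conn.execute("insert into t1 select * from t2")  # matched by position: name -> id, id -> name
swapped = conn.execute("select id, name from t1").fetchone()
print(swapped)

# Case 2: t2 gets another column, so the column counts no longer match.
conn.execute("alter table t2 add column extra text")
try:
    conn.execute("insert into t1 select * from t2")
    error = None
except sqlite3.Error as exc:
    error = str(exc)
print(error)

# Naming the columns on both sides keeps the statement independent of the layout.
conn.execute("delete from t1")
conn.execute("insert into t1 (name, id) select name, id from t2")
fixed = conn.execute("select id, name from t1").fetchone()
print(fixed)
```

The version with explicit column lists keeps working after both schema changes, which is the argument for spelling the columns out.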

2. TL;DR; When possible, explicitly specify the columns you want (eventually, you'll have to do that anyway). Also, selecting fewer columns is faster than selecting more columns. A positive side effect of explicit selects is that they give greater freedom to the optimizer.
