SQL Server中的COUNT(*)是恒定时间操作吗?如果没有,为什么不呢? [英] Is COUNT(*) in SQL Server a constant time operation? If not, why not?

查看:82
本文介绍了SQL Server中的COUNT(*)是恒定时间操作吗?如果没有,为什么不呢?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在另一篇文章中阅读此讨论其他人提出的问题。在阅读讨论之前,我一直以为SQL Server(和其他DBMS)会在元数据中的某个位置保留每个表的全局行计数,但是讨论似乎并非如此。为什么? Count(*)(不进行任何过滤)是一种常见的操作,如果它是O(1),则将获得巨大的提升。即使不考虑 COUNT(*),表中的总行数也是基本信息。他们为什么不记下来呢?



此外,为什么我们需要加载整行(如我链接的文章所述)只是为了算他们吗?

解决方案

否, COUNT(* )不是固定时间的操作。 COUNT(*)必须返回符合当前扫描谓词的行数(即 WHERE 子句) ,因此仅此一项将使元数据属性的返回无效。但是,即使没有谓词,COUNT仍必须满足当前的事务隔离语义,即。返回 visible 行的计数(例如,已提交)。因此,在SQL Server中, COUNT 必须并且将实际扫描并计算行数。某些系统允许返回更快的估算计数。 / p>

此外,作为旁注,它依赖于行 .com / zh-CN / sql / relational-databases / system-catalog-views / sys-partitions-transact-sql rel = noreferrer> sys.partitions 不可靠。毕竟,如果可以保证此计数是准确的,那么我们就不需要 具有COUNT_ROWS 的DBCC UPDATEUSAGE(...)。历史上有几种情况会导致此计数器偏离实际情况(大多数情况下日志记录的插入回滚最少),我所知道的都是固定的,但仍然存在以下问题:1)具有错误的早期版本的升级表和2 )其他尚未发现的错误。


此外,为什么我们需要加载整行(如文章中所述)我链接)只是为了数他们?索引或PK等不足以计算它们吗?


这不是100%正确的。至少有两种情况不'加载整个行':




  • 狭窄的行存储索引仅加载索引行,这可能会小得多

  • 列存储数据仅加载相关的列段



上面说的不适用于Hekaton桌子。


I was reading this discussion in another post where this question was raised by someone else. Before reading the discussion, I always thought SQL Server (and other DBMS) keep a global count of rows for each table somewhere in the metadata, but the discussion seems to say it isn't so. Why? Count(*) (without any filtering) being such a common operation would get huge boost if it is O(1). Even not considering COUNT(*), the total number of rows in a table is such a fundamental piece of information. Why don't they keep a note of it?

In addition, why do we need to "load" entire rows (as indicated in the post I linked) just to count them? Shouldn't indexes or PKs etc. be sufficient to count them?

解决方案

No, COUNT(*) is not a constant time operation. A COUNT(*) must return a count of rows that conform to the current scan predicate (ie. WHERE clause), so that alone would make the return of a metadata property invalid. But even if you have no predicates, the COUNT still has to satisfy the current transaction isolation semantics, ie. return the count of rows visible (eg. committed). So COUNT must, and will, in SQL Server, actually scan and count the rows. Some systems allow return of faster 'estimate' counts.

Also, as a side comment, relying on rows in sys.partitions is unreliable. After all, if this count would be guaranteed accurate then we would not need DBCC UPDATEUSAGE(...) WITH COUNT_ROWS. There are several scenarios that historically would cause this counter to drift apart from reality (mostly minimally logged insert rollbacks), all I know of are fixed, but that still leaves the problems of 1) upgraded tables from earlier versions that had the bugs and 2) other, not yet discovered, bugs.

In addition, why do we need to "load" entire rows (as indicated in the post I linked) just to count them? Shouldn't indexes or PKs etc. be sufficient to count them?

This is not 100% true. There are at least 2 scenarios that do no 'load entire rows':

  • narrow rowstore indexes load just the 'index' row, which may be much smaller
  • columnstore data loads just the relevant column segments

And most of what I say above do not apply for Hekaton tables.

这篇关于SQL Server中的COUNT(*)是恒定时间操作吗?如果没有,为什么不呢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆