Fastest way to count the exact number of rows in a very large table?
Question
I have come across articles stating that SELECT COUNT(*) FROM TABLE_NAME will be slow when the table has lots of rows and lots of columns.
I have a table that might contain billions of rows (it has approximately 15 columns). Is there a better way to get the EXACT count of the number of rows in a table?
Please consider the following before answering:
- I am looking for a database vendor independent solution. It is OK if it covers MySQL, Oracle and MS SQL Server. If there really is no vendor independent solution, I will settle for different solutions for different database vendors.
- I cannot use any external tool to do this. I am mainly looking for a SQL based solution.
- I cannot normalize my database design any further. It is already in 3NF, and a lot of code has already been written around it.
Recommended answer
Simple answer:
- Database vendor independent solution = use the standard = COUNT(*)
- There are approximate SQL Server solutions that don't use COUNT(*), but those are out of scope because you need an exact count
Notes:
COUNT(1) = COUNT(*) = COUNT(PrimaryKey), just in case
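The equivalence in the note above can be checked on a small scale. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for MySQL, Oracle or SQL Server; the table demo_rows, its columns and the helper count_variants are hypothetical names used only for illustration. It also counts a nullable column to show why the equivalence relies on the counted column being NOT NULL, as a primary key is.

```python
import sqlite3


def count_variants(row_count):
    """Return (COUNT(*), COUNT(1), COUNT(id), COUNT(note)) for a demo table.

    Assumption: SQLite stands in for the engines discussed in the question;
    demo_rows and its columns are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE demo_rows (id INTEGER NOT NULL PRIMARY KEY, note TEXT)"
    )
    # Every even id gets a NULL note, so COUNT(note) skips those rows.
    conn.executemany(
        "INSERT INTO demo_rows (id, note) VALUES (?, ?)",
        [(i, None if i % 2 == 0 else "x") for i in range(1, row_count + 1)],
    )
    result = conn.execute(
        "SELECT COUNT(*), COUNT(1), COUNT(id), COUNT(note) FROM demo_rows"
    ).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    print(count_variants(5))
```

The first three values agree because neither the constant 1 nor the primary key can be NULL, while COUNT(note) only counts the rows where note is not NULL. This shows that the results match; it says nothing about how any particular engine executes each form.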
SQL Server example (1.4 billion rows, 12 columns)
SELECT COUNT(*) FROM MyBigtable WITH (NOLOCK)
-- NOLOCK here is for me only to let me test for this answer: no more, no less
1 run, 5:46 minutes, count = 1,401,659,700
--Note, sp_spaceused uses this DMV
SELECT
Total_Rows= SUM(st.row_count)
FROM
sys.dm_db_partition_stats st
WHERE
object_name(object_id) = 'MyBigtable' AND (index_id < 2)
2 runs, both under 1 second, count = 1,401,659,670
The second query returns fewer rows, so it is wrong. A correct count would be the same or higher depending on writes (deletes are done out of hours here).