查询所有表的不同计数时,Tempdb已满 [英] Tempdb Full When Querying Distinct Count Of All Tables

查看:135
本文介绍了查询所有表的不同计数时,Tempdb已满的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

ORIGINAL PROBLEM



我创建了一个自定义脚本,用于将数据从远程SQL服务器检索到我们办公室的本地副本。我有一些问题的脚本,其中选定的表有一些数据插入两次,从而创建重复。我知道对于所有数据库中的所有表,应该没有重复。



这个问题使我偏执,其他表可能有



p>我创建了一个SQL脚本,将所有列的count和distinct计数插入到服务器上所有数据库(不包括4个系统数据库)的表中:

  DECLARE @TableFullName AS NVARCHAR(MAX)
DECLARE @SQLQuery AS NVARCHAR(MAX)
DECLARE @TableHasDuplicates AS BIT
DECLARE @TempTableRowCount AS INT
DECLARE @ResultsTable TABLE([CompleteTableName] NVARCHAR(200),[CountAll] INT,[CountDistinct] INT)
DECLARE @CountAll INT
DECLARE @CountDistinct INT

SET NOCOUNT ON
DECLARE @AllTables TABLE([CompleteTableName] NVARCHAR(200))
INSERT INTO @AllTables([CompleteTableName])
EXEC sp_msforeachdb'SELECT''[''+ [TABLE_CATALOG] '']。['+ [TABLE_SCHEMA] +'']。[''+ [TABLE_NAME] +'']''FROM [?]。INFORMATION_SCHEMA.TABLES'
SET NOCOUNT OFF;

DECLARE [table_cursor] CURSOR FOR
(SELECT *
FROM @AllTables
WHERE [CompleteTableName] NOT LIKE'%master%'AND [CompleteTableName] NOT LIKE' %msdb%'AND [CompleteTableName] NOT LIKE'%tempdb%'AND [CompleteTableName] NOT LIKE'%model%');

OPEN [table_cursor]

PRINT N'There are'+ CAST(@CountAll AS NVARCHAR(10))+'具有潜在重复数据的表'

FETCH NEXT FROM [table_cursor]
INTO @TableFullName

WHILE @@ FETCH_STATUS = 0
BEGIN
SET @SQLQuery ='SELECT @CntAll = COUNT * FROM'+ @TableFullName +'SELECT @CntDistinct = COUNT(*)FROM(SELECT DISTINCT * FROM'+ @TableFullName +')AS [sq] IF @CntAll> @CntDistinct SELECT @ BitResult = 1 ELSE SELECT @ BitResult = 0';

EXEC sp_executesql @SQLQuery,N'@ BitResult BIT OUTPUT,@CntAll INT OUTPUT,@CntDistinct INT OUTPUT',@BitResult = @TableHasDuplicates OUTPUT,@CntAll = @CountAll OUTPUT,@CntDistinct = @CountDistinct输出;

IF @TableHasDuplicates = 1
BEGIN
INSERT INTO @ResultsTable([CompleteTableName],[CountAll],[CountDistinct])
SELECT @TableFullName,@CountAll, @CountDistinct
END;

FETCH NEXT FROM [table_cursor]
INTO @TableFullName
END
CLOSE [table_cursor];
DEALLOCATE [table_cursor];

SELECT *
FROM @ResultsTable

它的工作原理是表变量@AllTables使用带有INFORMATION_SCHEMA.TABLES的sp_msforeachdb列出所有数据库中的所有表(有16537个表)。表游标用于存储所有非系统条目,然后使用动态SQL执行存储在另一个表变量@ResultsTable中的计数和不同计数。



与此解决方案的问题



当我运行此查询时,它将运行大约3分钟,然后抛出一个错误,说tempdb PRIMARY文件组已满:



我是我自己的DBA,我使用;我建议你阅读,以了解可以影响TempDb的对象,以及如何解决它的常见问题。理想情况下,您的总TempDb大小应根据观察值计算,在您的情况下> 24 GB。



**编辑1 **



如果您不确定stats更新,然后使用下面的查询获取所有表的计数
注意:替换您不想要统计的数据库

  DECLARE @ServerStats TABLE(DatabaseName varchar(200),TableName varchar(200),RowsCount INT)
INSERT INTO @ServerStats
exec sp_msforeachdb @ command1 = '
use#;
if''#''NOT IN(''master'',''model'',''msdb'',''tempdb'',''ReportServer'')
begin
print''#''
exec sp_MSforeachtable @ command1 =''
SELECT'''''''''AS DATABASENAME,'''''''AS TABLENAME,COUNT *)FROM? ;
''
end
',@replacechar ='#'

SELECT * FROM @ServerStats

类似地,对于所有具有以下查询的数据库,您可以在所有表​​中使用不同

  DECLARE @ServerStatsDistinct TABLE(DatabaseName varchar(200),TableName varchar(200),RowsCount INT)
INSERT INTO @ServerStatsDistinct
exec sp_msforeachdb @ command1 ='
use#;
if''#''NOT IN(''master'',''model'',''msdb'',''tempdb'',''ReportServer'')
begin
print''#''
exec sp_MSforeachtable @ command1 =''
SELECT'''''''''AS DATABASENAME,'''''''AS TABLENAME,COUNT *)FROM(
SELECT DISTINCT *
FROM?
)a;
''
end
',@replacechar ='#'

SELECT * FROM @ServerStatsDistinct


ORIGINAL PROBLEM

I have created a custom script to retrieve data from a remote SQL server into our local copy in our office. I had some issues with the script where selected tables had some data inserted twice, thus creating duplicates. I know that for all the tables in all databases there should be no duplicates.

This issue has made me paranoid that other tables may have had this problem historically, and therefore I'd like to verify this.

SOLUTION

I have created a SQL script to insert the count and distinct count of all columns into a table for all the databases on our server (excluding the 4 system databases):

DECLARE     @TableFullName AS NVARCHAR(MAX)
DECLARE     @SQLQuery AS NVARCHAR(MAX)
DECLARE     @TableHasDuplicates AS BIT
DECLARE     @TempTableRowCount AS INT
DECLARE     @ResultsTable TABLE ([CompleteTableName] NVARCHAR(200), [CountAll] INT, [CountDistinct] INT)
DECLARE     @CountAll INT
DECLARE     @CountDistinct INT

SET NOCOUNT ON
DECLARE @AllTables TABLE ([CompleteTableName] NVARCHAR(200))
INSERT INTO @AllTables ([CompleteTableName])
EXEC sp_msforeachdb 'SELECT ''['' + [TABLE_CATALOG] + ''].['' + [TABLE_SCHEMA] + ''].['' + [TABLE_NAME] + '']'' FROM [?].INFORMATION_SCHEMA.TABLES'
SET NOCOUNT OFF;

DECLARE     [table_cursor] CURSOR FOR 
(SELECT     *
FROM        @AllTables
WHERE       [CompleteTableName] NOT LIKE '%master%' AND [CompleteTableName] NOT LIKE '%msdb%' AND [CompleteTableName] NOT LIKE '%tempdb%' AND [CompleteTableName] NOT LIKE '%model%');

OPEN [table_cursor]

PRINT N'There were ' + CAST(@CountAll AS NVARCHAR(10)) + ' tables with potential duplicate data'

FETCH NEXT FROM [table_cursor]
INTO @TableFullName

WHILE @@FETCH_STATUS = 0
    BEGIN
        SET         @SQLQuery = 'SELECT @CntAll = COUNT(*) FROM ' + @TableFullName + ' SELECT @CntDistinct = COUNT(*) FROM  (SELECT DISTINCT * FROM ' + @TableFullName + ') AS [sq] IF @CntAll > @CntDistinct SELECT @BitResult=1 ELSE SELECT @BitResult=0';

        EXEC        sp_executesql @SQLQuery, N'@BitResult BIT OUTPUT, @CntAll INT OUTPUT, @CntDistinct INT OUTPUT', @BitResult = @TableHasDuplicates OUTPUT, @CntAll = @CountAll OUTPUT, @CntDistinct = @CountDistinct OUTPUT;

        IF @TableHasDuplicates = 1
            BEGIN
                INSERT INTO @ResultsTable ([CompleteTableName], [CountAll], [CountDistinct])
                SELECT @TableFullName, @CountAll, @CountDistinct
            END;

        FETCH NEXT FROM [table_cursor]
        INTO @TableFullName
    END 
CLOSE [table_cursor];
DEALLOCATE [table_cursor];

SELECT *
FROM @ResultsTable

An overview of how it works is the table variable @AllTables uses sp_msforeachdb with INFORMATION_SCHEMA.TABLES to list all the tables in all databases (there are 16537 tables). A table cursor is used to store all non-system entries and then I use dynamic SQL to undertake a count and distinct count which is stored in another table variable @ResultsTable.

THE PROBLEM WITH THIS SOLUTION

When I run this query, it will run for circa 3 minutes then throw an error saying that the tempdb PRIMARY filegroup is full:

I am my own DBA, and I used Brent Ozar's guide to setting up my SQL server instance, and my tempdb is set up with 8 x 3GB mdf/ndf files (the server has 8 cores):

These files show as having 23997MB available under 'General' properties.

MY QUESTIONS

  1. If I have circa 24GB of tempdb free space, why is this relatively simple query running out of tempdb space?
  2. Is there a better/more efficiency way of getting a count and distinct count of all tables in all databases?

解决方案

You should always consider contention before adding TempDb file. Adding 7 additional TempDb file won't really help.

If I have circa 24GB of tempdb free space, why is this relatively simple query running out of tempdb space?

No, it should not. But are you sure that you aren't dealing with large amount of data or you don't have other process running on SQL? Cursors, Temp tables and even table variables use TempDb extensively. Check which object is consuming more TempDb space:

SELECT
    SUM (user_object_reserved_page_count)*8 as usr_obj_kb,
    SUM (internal_object_reserved_page_count)*8 as internal_obj_kb,
    SUM (version_store_reserved_page_count)*8  as version_store_kb,
    SUM (unallocated_extent_page_count)*8 as freespace_kb,
    SUM (mixed_extent_page_count)*8 as mixedextent_kb
FROM sys.dm_db_file_space_usage

So, if your user and internal objects are more then it clearly means that you have low TempDb space because of cursors and SQL Server internal usage (Ex: intermediate tables, Hash joins, Hash aggregation etc)

Is there a better/more efficiency way of getting a count and distinct count of all tables in all databases?

You can use below code to get the count of all tables in all databases

  DECLARE @Stats TABLE (DBNAME VARCHAR(40), NAME varchar(200), Rows INT)
INSERT INTO @Stats
EXECUTE sp_MSForEachDB 
        'USE ?; SELECT DB_NAME()AS DBName, 
        sysobjects.Name
    , sysindexes.Rows
FROM
    sysobjects
    INNER JOIN sysindexes
    ON sysobjects.id = sysindexes.id 
WHERE
    type = ''U''
    AND sysindexes.IndId < 2'

    SELECT * FROM @Stats

I have written an article on TempDb recommendation; I would suggest you to read that to understand objects which can affect TempDb and how to solve common problems of it. Ideally, your total TempDb size should be calculated based on observation which in your case > 24 GB.

** Edit 1**

If you are unsure about stats update then use below query to get count of all tables Note: Replace databases for which you don't want stats

    DECLARE @ServerStats TABLE (DatabaseName varchar(200), TableName varchar(200), RowsCount INT)
INSERT INTO @ServerStats
exec sp_msforeachdb @command1='
use #;
if ''#'' NOT IN (''master'', ''model'', ''msdb'', ''tempdb'',''ReportServer'')
begin
print ''#''
exec sp_MSforeachtable @command1=''
SELECT ''''#'''' AS DATABASENAME, ''''?'''' AS TABLENAME, COUNT(*)  FROM ? ;
''
end 
', @replacechar = '#'

SELECT * FROM @ServerStats

similarly you can take distinct in all tables for all databases with below query

    DECLARE @ServerStatsDistinct TABLE (DatabaseName varchar(200), TableName varchar(200), RowsCount INT)
INSERT INTO @ServerStatsDistinct
exec sp_msforeachdb @command1='
use #;
if ''#''  NOT IN (''master'', ''model'', ''msdb'', ''tempdb'',''ReportServer'')
begin
print ''#''
exec sp_MSforeachtable @command1=''
SELECT ''''#'''' AS DATABASENAME, ''''?'''' AS TABLENAME, COUNT(*)  FROM  (
    SELECT DISTINCT *
    FROM ?
) a  ;
''
end 
', @replacechar = '#'

SELECT * FROM @ServerStatsDistinct

这篇关于查询所有表的不同计数时,Tempdb已满的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆