SQL Server 查询性能 - 聚集索引查找 [英] SQL Server Query Performance - Clustered Index Seek

查看:63
本文介绍了SQL Server 查询性能 - 聚集索引查找的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

请原谅这篇长文章,但在下面我已经包含了一个完整的脚本来生成和填充我的测试工具.

Excuse the long post, but below I have included a full script to generate and populate my test harness.

我的测试工具有以下表格

My test harness has the following tables

|--------|  |-------------|  |-----|  |--------------|
|Column  |  |ColumnValue  |  |Row  |  |RowColumnValue|
|--------|  |-------------|  |-----|  |--------------|
|ColumnId|  |ColumnValueId|  |RowId|  |RowId         |
|Name    |  |ColumnId     |  |Name |  |ColumnValueId |
|--------|  |Value        |  |-----|  |--------------|
            |-------------|

它们代表表格中的行和列.列中单元格的可能值存储在 ColumnValue 中.行的选定值存储在 RowColumnValue 中.(我希望这很清楚)

They represent Rows and Columns in a table. The possible values of a cell in a column are stored in ColumnValue. The selected values for a Row are stored in RowColumnValue. (I hope that is clear)

我用 10 列、10,000 行、每列 50 个列值 (500) 和每行 25 个选定列值 (250,000) 填充数据.

I have populated the data with 10 Columns, 10,000 Rows, 50 Column Values per column (500) and 25 selected column values per Row (250,000).

我有一些动态 sql,它返回所有行,与列一起旋转,并包含每列所选列值的 XML 列表.

I have some dynamic sql which returns all the rows, pivotted with the columns and contains an XML list of the selected column values for each column.

注意:出于性能测试目的,我将查询包装在 SELECT COUNT(*) 中,这样查询就不会通过网络返回大量数据.

Note: For performance testing purposes I have wrapped the query in a SELECT COUNT(*) so that the query will not return a large amount of data over the network.

我的测试工具在大约 5-6 秒内运行此查询(带有计数).执行计划显示 92% 的查询花费在 [ColumnValue].[PK_ColumnValue] 上的聚集索引查找上.客户端统计数据显示客户端处理时间、总执行时间和服务器回复等待时间都为 0.

My test harness runs this query (with the count) in approx 5-6 seconds. The execution plan shows that 92% of the query is spent on a Clustered Index Seek on [ColumnValue].[PK_ColumnValue]. The Client Statistics show a Client processing time, Total execution time and Wait time on server replies all at 0.

我意识到 RowColumnValue 表中的 250k 行相当多,我可能对 SQL Server 的期望过高.但是,我的期望是查询应该能够比这快得多.或者至少执行计划应该呈现不同的瓶颈,而不是聚集索引搜索.

I realise that 250k rows in the RowColumnValue table is quite a lot and I might be expecting too much from SQL Server. However, my expectation is that query should be able to run a lot faster than this. Or at the very least the execution plan should be presenting a different bottle neck rather than the Clustered Index Seek.

任何人都可以解释这个问题或给我一些关于如何提高效率的建议吗?

Can anyone shed any light on the problem or give me some suggestions on how to make this more efficient?

运行数据透视表以显示表格的动态 SQL:

DECLARE @columnDataList NVARCHAR(MAX)
SELECT
    @columnDataList = 
    CAST
    (
        (
            SELECT
                ', CONVERT(xml, [PVT].[' + [Column].[Name] + ']) [Column.' + [Column].[Name] + ']'
            FROM
                [Column]
            ORDER BY
                [Column].[Name]
            FOR XML PATH('')
        ) AS XML
    ).value('.', 'NVARCHAR(MAX)')

DECLARE @columnPivotList NVARCHAR(MAX)
SELECT
    @columnPivotList = 
    CAST
    (
        (
            SELECT
                ', [' + [Column].[Name] + ']'
            FROM
                [Column]
            ORDER BY
                [Column].[Name]
            FOR XML PATH('')
        ) AS XML
    ).value('.', 'NVARCHAR(MAX)')

EXEC('
    SELECT
        COUNT(*)
    FROM
    (
        SELECT
            [PVT].[RowId]
            ' + @columnDataList + '
        FROM
        (
            SELECT
                [Row].[RowId],
                [Column].[Name] [ColumnName],
                [XmlRowColumnValues].[XmlRowColumnValues] [XmlRowColumnValues]
            FROM
                [Row]
            CROSS JOIN
                [Column]
            CROSS APPLY
            (
                SELECT
                    [ColumnValue].[Value] [Value]
                FROM
                    [RowColumnValue]
                INNER JOIN
                    [ColumnValue]
                ON
                    [ColumnValue].[ColumnValueId] = [RowColumnValue].[ColumnValueId]
                WHERE
                    [RowColumnValue].[RowId] = [Row].[RowId]
                AND
                    [ColumnValue].[ColumnId] = [Column].[ColumnId]
                FOR XML PATH (''''), ROOT(''Values'')
            ) [XmlRowColumnValues] ([XmlRowColumnValues])
        ) [PivotData]
        PIVOT
        (
            MAX([PivotData].[XmlRowColumnValues])
        FOR
            [ColumnName]
            IN
            ([0]' + @columnPivotList + ')
        ) PVT
    ) RowColumnData
')

生成和填充数据库的脚本:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Row](
    [RowId] [int] IDENTITY(1,1) NOT NULL,
    [Name] [nvarchar](50) NOT NULL,
 CONSTRAINT [PK_Row] PRIMARY KEY CLUSTERED 
(
    [RowId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Column](
    [ColumnId] [int] IDENTITY(1,1) NOT NULL,
    [Name] [nvarchar](50) NOT NULL,
 CONSTRAINT [PK_Column] PRIMARY KEY CLUSTERED 
(
    [ColumnId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[RowColumnValue](
    [RowId] [int] NOT NULL,
    [ColumnValueId] [int] NOT NULL,
 CONSTRAINT [PK_RowColumnValue] PRIMARY KEY CLUSTERED 
(
    [RowId] ASC,
    [ColumnValueId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[ColumnValue](
    [ColumnValueId] [int] IDENTITY(1,1) NOT NULL,
    [ColumnId] [int] NOT NULL,
    [Value] [nvarchar](50) NOT NULL,
 CONSTRAINT [PK_ColumnValue] PRIMARY KEY CLUSTERED 
(
    [ColumnValueId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [FKIX_ColumnValue_ColumnId] ON [dbo].[ColumnValue] 
(
    [ColumnId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO
ALTER TABLE [dbo].[ColumnValue]  WITH CHECK ADD  CONSTRAINT [FK_ColumnValue_Column] FOREIGN KEY([ColumnId])
REFERENCES [dbo].[Column] ([ColumnId])
GO
ALTER TABLE [dbo].[ColumnValue] CHECK CONSTRAINT [FK_ColumnValue_Column]
GO
ALTER TABLE [dbo].[RowColumnValue]  WITH CHECK ADD  CONSTRAINT [FK_RowColumnValue_ColumnValue] FOREIGN KEY([ColumnValueId])
REFERENCES [dbo].[ColumnValue] ([ColumnValueId])
GO
ALTER TABLE [dbo].[RowColumnValue] CHECK CONSTRAINT [FK_RowColumnValue_ColumnValue]
GO
ALTER TABLE [dbo].[RowColumnValue]  WITH CHECK ADD  CONSTRAINT [FK_RowColumnValue_Row] FOREIGN KEY([RowId])
REFERENCES [dbo].[Row] ([RowId])
GO
ALTER TABLE [dbo].[RowColumnValue] CHECK CONSTRAINT [FK_RowColumnValue_Row]
GO

DECLARE @columnLoop INT
DECLARE @columnValueLoop INT
DECLARE @rowLoop INT

DECLARE @columnId INT
DECLARE @columnValueId INT
DECLARE @rowId INT

SET @columnLoop = 0

WHILE @columnLoop < 10
BEGIN

    INSERT INTO [Column] ([Name]) VALUES(NEWID())

    SET @columnId = @@IDENTITY

    SET @columnValueLoop = 0

    WHILE @columnValueLoop < 50
    BEGIN

        INSERT INTO [ColumnValue] ([ColumnId], [Value]) VALUES(@columnId, NEWID())

        SET @columnValueLoop = @columnValueLoop + 1

    END

    SET @columnLoop = @columnLoop + 1

END

SET @rowLoop = 0

WHILE @rowLoop < 10000
BEGIN

    INSERT INTO [Row] ([Name]) VALUES(NEWID())

    SET @rowId = @@IDENTITY

    INSERT INTO [RowColumnValue] ([RowId], [ColumnValueId]) SELECT TOP 25 @rowId, [ColumnValueId] FROM [ColumnValue] ORDER BY NEWID()

    SET @rowLoop = @rowLoop + 1

END

推荐答案

我同意 @marc_s 和 @KM 的观点,这个宏伟的设计从一开始就注定要失败.

I agree with @marc_s and @KM that this grand design is doomed from the start.

Microsoft 的数百万开发人员时间已投入到构建和微调一个强大而强大的数据库引擎上,但您将通过将所有内容塞入少量通用表并重新实现 SQL 的所有内容来彻底改造它服务器已经为您设计.

Millions of Microsoft's developer-hours have gone into building and fine-tuning a robust and powerful database engine, but you're going to reinvent it all by cramming everything into a small number of generic tables and re-implementing everything that SQL Server is already engineered to do for you.

SQL Server 已经有包含实体名称、列名称等的表.您通常不直接与这些系统表交互的事实是一件好事:这称为抽象.并且您不太可能比 SQL Server 更好地实现该抽象.

SQL Server already has tables containing names of entities, names of columns, and so on. The fact that you don't normally interact directly with these system tables is a good thing: It's called abstraction. And it's unlikely that you're going to do a better job of implementing that abstraction than SQL Server does.

归根结底,根据您的方法 (a),即使是最简单的查询也会变得异常复杂;(b) 您永远不会接近最佳性能,因为您放弃了原本可以免费获得的所有查询优化.

At the end of the day, with your approach (a) even the simplest queries will be monstrous; and (b) you're never going to come close to optimal performance, because you're forgoing all of the query optimization that you'd otherwise get for free.

如果不了解更多关于您的应用程序或您的要求,就很难给出任何具体的建议.但我建议一些好的旧规范化会有很长的路要走.任何实现良好的非平凡数据库都有很多表;十张桌子加上十张 xtab 桌子应该不会把你吓跑.

Without knowing anything more about your application or your requirements, it's hard to give any sort of specific advice. But I'd suggest that some good old normalization would go a long way. Any well-implemented, non-trivial database has a lot of tables; ten tables plus ten xtab tables shouldn't scare you away.

并且不要害怕将 SQL 代码生成作为跨不同表实现通用接口的一种方式.一点点可以走很长的路.

And don't be afraid of SQL code generation as a way of implementing common interfaces across disparate tables. A little can go a long way.

这篇关于SQL Server 查询性能 - 聚集索引查找的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆