XML Server XML 性能优化 [英] XML Server XML performance optimization

查看:42
本文介绍了XML Server XML 性能优化的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在数据库中有 34 行,每一行都有一个包含 xml 的列 - xml 实际上位于 NVARCHAR(MAX) 列中,而不是 XML 列中.

I've got 34 rows in a database, each row has a column containing xml - the xml is actually in an NVARCHAR(MAX) column not an XML column.

对于每一行,我选择 xml 元素中的值作为单个结果集.性能相当差.我尝试了两种不同的查询.第一个执行大约需要 22 秒,第二个需要 7 秒.

For each row I am selecting values in the xml elements as a single resultset. The performance is pretty poor. I've tried two different queries. The first takes roughly 22 seconds to execute and the second takes 7.

即使是 7 秒,这也远低于最佳状态,我希望最多 1-2 秒.

Even at 7 seconds, this is far slower than optimal, I'm hoping for 1-2 seconds at most.

然后我在网上读到一个谣言,如果你使用临时表或表变量将 NVARCHAR 数据转换为 XML,你将获得性能提升,至少在我的情况下是这样的......它现在执行不到一秒钟.我现在正在寻找的是一种解释,可以告诉我为什么这两种方法实际上会影响性能.

So then I read a rumor online that if you convert the NVARCHAR data to a XML using a temp table or table variable, you will achieve a performance gain, which at least in my case was true... It now executes in under a second. What I'm looking for now is an explanation that can tell my why these 2 approaches actually affect performance.

22 秒:

SELECT
    c.ID,
    c.ChannelName,
    [Name] = d.c.value('name[1]','varchar(100)'),
    [Type] = d.c.value('transportName[1]','varchar(100)'),
    [Enabled] = d.c.value('enabled[1]','BIT'),
    [Queued] = d.c.value('properties[1]/destinationConnectorProperties[1]/queueEnabled[1]','varchar(100)'),
    [RetryInterval] = d.c.value('properties[1]/destinationConnectorProperties[1]/retryIntervalMillis[1]','INT'),
    [MaxRetries] = d.c.value('properties[1]/destinationConnectorProperties[1]/retryCount[1]','INT'),
    [RotateQueue] = d.c.value('properties[1]/destinationConnectorProperties[1]/rotate[1]','BIT'),
    [ThreadCount] = d.c.value('properties[1]/destinationConnectorProperties[1]/threadCount[1]','INT'),
    [WaitForPrevious] = d.c.value('waitForPrevious[1]','BIT'),
    [Destination] = COALESCE(
        d.c.value('properties[1]/channelId[1]','varchar(100)'),
        d.c.value('properties[1]/remoteAddress[1]','varchar(100)'),
        d.c.value('properties[1]/wsdlUrl[1]','varchar(1024)')),

    [DestinationPort] = COALESCE(
        d.c.value('properties[1]/remotePort[1]','varchar(100)'),
        d.c.value('properties[1]/port[1]','varchar(1024)')),
    [Service] = d.c.value('properties[1]/service[1]','varchar(1024)'),
    [Operation] = d.c.value('properties[1]/operation[1]','varchar(1024)')
FROM
(
    SELECT
            [ID],
            [ChannelName] = [Name],
            [CFG] = Convert(XML, Channel)
    FROM
            dbo.CHANNEL
) c
CROSS APPLY c.CFG.nodes('/channel/destinationConnectors/connector') d(c)

7 秒,由于使用了 text().我不知道为什么文本会加快速度.

7 seconds, due to use of text(). I have no idea why text speeds things up.

SELECT
    c.ID,
    c.ChannelName,
    [Name] = d.c.value('(name/text())[1]','varchar(100)'),
    [Type] = d.c.value('(transportName/text())[1]','varchar(100)'),
    [Enabled] = d.c.value('(enabled/text())[1]','BIT'),
    [Queued] = d.c.value('(properties/destinationConnectorProperties/queueEnabled/text())[1]','varchar(100)'),
    [RetryInterval] = d.c.value('(properties/destinationConnectorProperties/retryIntervalMillis/text())[1]','INT'),
    [MaxRetries] = d.c.value('(properties/destinationConnectorProperties/retryCount/text())[1]','INT'),
    [RotateQueue] = d.c.value('(properties/destinationConnectorProperties/rotate/text())[1]','BIT'),
    [ThreadCount] = d.c.value('(properties/destinationConnectorProperties/threadCount/text())[1]','INT'),
    [WaitForPrevious] = d.c.value('(waitForPrevious/text())[1]','BIT'),
    [Destination] = COALESCE(
        d.c.value('(properties/channelId/text())[1]','varchar(100)'),
        d.c.value('(properties/remoteAddress/text())[1]','varchar(100)'),
        d.c.value('(properties/wsdlUrl/text())[1]','varchar(1024)')),

    [DestinationPort] = COALESCE(
        d.c.value('(properties/remotePort/text())[1]','varchar(100)'),
        d.c.value('(properties/port/text())[1]','varchar(1024)')),
    [Service] = d.c.value('(properties/service/text())[1]','varchar(1024)'),
    [Operation] = d.c.value('(properties/operation/text())[1]','varchar(1024)')
FROM
(
    SELECT
            [ID],
            [ChannelName] = [Name],
            [CFG] = Convert(XML, Channel)
    FROM
            dbo.CHANNEL
) c
CROSS APPLY c.CFG.nodes('/channel/destinationConnectors/connector') d(c)

此查询使用 text() 方法,但首先将 NVARCHAR 列转换为表变量中的 xml 列.在不到一秒的时间内执行...

This query uses the text() approach but puts converts the NVARCHAR column to xml column in a table variable first. Executes in less than a second...

DECLARE @Xml AS TABLE (
    [ID] NVARCHAR(36) NOT NULL Primary Key,
    [Name] NVARCHAR(100) NOT NULL,
    [CFG] XML NOT NULL
);

INSERT INTO @Xml (ID, Name, CFG)
SELECT
    c.ID,
    c.Name,
    Convert(XML, c.Channel)
FROM
    [dbo].[CHANNEL] c;

SELECT
    c.ID,
    c.ChannelName,
    [Name] = d.c.value('(name/text())[1]','varchar(100)'),
    [Type] = d.c.value('(transportName/text())[1]','varchar(100)'),
    [Enabled] = d.c.value('(enabled/text())[1]','BIT'),
    [Queued] = d.c.value('(properties/destinationConnectorProperties/queueEnabled/text())[1]','varchar(100)'),
    [RetryInterval] = d.c.value('(properties/destinationConnectorProperties/retryIntervalMillis/text())[1]','INT'),
    [MaxRetries] = d.c.value('(properties/destinationConnectorProperties/retryCount/text())[1]','INT'),
    [RotateQueue] = d.c.value('(properties/destinationConnectorProperties/rotate/text())[1]','BIT'),
    [ThreadCount] = d.c.value('(properties/destinationConnectorProperties/threadCount/text())[1]','INT'),
    [WaitForPrevious] = d.c.value('(waitForPrevious/text())[1]','BIT'),
    [Destination] = COALESCE(
        d.c.value('(properties/channelId/text())[1]','varchar(100)'),
        d.c.value('(properties/remoteAddress/text())[1]','varchar(100)'),
        d.c.value('(properties/wsdlUrl/text())[1]','varchar(1024)')),

    [DestinationPort] = COALESCE(
        d.c.value('(properties/remotePort/text())[1]','varchar(100)'),
        d.c.value('(properties/port/text())[1]','varchar(1024)')),
    [Service] = d.c.value('(properties/service/text())[1]','varchar(1024)'),
    [Operation] = d.c.value('(properties/operation/text())[1]','varchar(1024)')
FROM
(
    SELECT
            [ID],
            [ChannelName] = [Name],
            [CFG]
    FROM
            @Xml
) c
CROSS APPLY c.CFG.nodes('/channel/destinationConnectors/connector') d(c)

推荐答案

我可以给你一个答案和一个猜测:

I can give you one answer and one guess:

首先我使用一个声明的表变量来模拟你的场景:

First I use a declared table variable to mock up your scenario:

DECLARE @tbl TABLE(s NVARCHAR(MAX));
INSERT INTO @tbl VALUES
(N'<root>
    <SomeElement>This is first text of element1
        <InnerElement>This is text of inner element1</InnerElement>
        This is second text of element1
    </SomeElement>
    <SomeElement>This is first text of element2
        <InnerElement>This is text of inner element2</InnerElement>
        This is second text of element2
    </SomeElement>
</root>')
,(N'<root>
    <SomeElement>This is first text of elementA
        <InnerElement>This is text of inner elementA</InnerElement>
        This is second text of elementA
    </SomeElement>
    <SomeElement>This is first text of elementB
        <InnerElement>This is text of inner elementB</InnerElement>
        This is second text of elementB
    </SomeElement>
</root>');

--此查询将读取带有 sub-select 强制转换的 XML.您可以使用 CTE 代替,但这应该只是语法糖...

--This query will read the XML with a cast out of a sub-select. You might use a CTE instead, but this should be syntactical sugar only...

SELECT se.value(N'(.)[1]','nvarchar(max)') SomeElementsContent
      ,se.value(N'(InnerElement)[1]','nvarchar(max)') InnerElementsContent
      ,se.value(N'(./text())[1]','nvarchar(max)') ElementsFirstText
      ,se.value(N'(./text())[2]','nvarchar(max)') ElementsSecondText
FROM (SELECT CAST(s AS XML) FROM @tbl) AS tbl(TheXml)
CROSS APPLY TheXml.nodes(N'/root/SomeElement') AS A(se);

--第二部分使用表格写入类型化的 XML 并从那里读取:

--The second part uses a table to write in the typed XML and read from there:

DECLARE @tbl2 TABLE(x XML)
INSERT INTO @tbl2
SELECT CAST(s AS XML) FROM @tbl;

SELECT se.value(N'(.)[1]','nvarchar(max)') SomeElementsContent
      ,se.value(N'(InnerElement)[1]','nvarchar(max)') InnerElementsContent
      ,se.value(N'(./text())[1]','nvarchar(max)') ElementsFirstText
      ,se.value(N'(./text())[2]','nvarchar(max)') ElementsSecondText
FROM @tbl2 t2
CROSS APPLY t2.x.nodes(N'/root/SomeElement') AS A(se);

为什么 /text() 比没有 /text() 更快?

如果你看看我的例子,元素的内容是一切,从开始标签到结束标签.元素的 text() 是这些标签之间的浮动文本.您可以在上面选择的结果中看到这一点.text() 实际上是一个单独存储在树结构中的部分(阅读下一节).获取它是一个一步操作.否则,必须分析一个复杂的结构,以找到开始标记与其相应的结束标记之间的所有内容——即使除了 text() 之外什么都没有.

Why is /text() faster than without /text()?

If you look at my example, the content of an element is everything from the opening tag down to the closing tag. The text() of an element is the floating text between these tags. You can see this in the results of the select above. The text() is one separately stored portion in a tree structure actually (read next section). To fetch it, is a one-step-action. Otherwise a complex structure has to be analysed to find everything between the opening tag and its corresponding closing tag - even if there is nothing else than the text().

XML 不仅仅是带有一些愚蠢的额外字符的文本!它是一个结构复杂的文档.XML 不会存储为您看到的文本.XML 存储在树结构中.每当您将表示 XML 的字符串转换为真正的 XML 时,必须完成这项非常昂贵的工作.当 XML 呈现给您(或任何其他输出)时,表示字符串是从头开始(重新)构建的.

XML is not just text with some silly extra characters! It is a document with a complex structure. The XML is not stored as the text you see. XML is stored in a tree structure. Whenever you cast a string, which represents an XML, into a real XML, this very expensive work must be done. When the XML is presented to you (or any other output) the representing string is (re)built from scratch.

这是猜测...
在我的示例中,这两种方法非常平等,并且导致(几乎)相同的执行计划.
SQL Server 不会像您期望的那样处理所有事情.这不是一个程序系统,你声明做这个,然后做这个,然后做这个!.你告诉引擎你想要什么,引擎决定如何做到最好.引擎非常好!
在执行开始之前,引擎会尝试估计方法的成本.CONVERT(或CAST)是一个相当便宜的操作.可能是,引擎决定处理您的调用列表并一遍又一遍地为每个需求进行转换,因为它认为这比创建派生表的成本更低......

This is guessing...
In my example both approaches are quite equal and lead to (almost) the same execution plan.
SQL Server will not work down everything the way you might expect this. This is not a procedural system where you state do this, than do this and after do this!. You tell the engine what you want, and the engine decides how to do this best. And the engine is pretty good with this!
Before execution starts, the engine tries to estimate the costs of approaches. CONVERT (or CAST) is a rather cheap operation. It could be, that the engine decides to work down the list of your calls and do the cast for each single need over and over, because it thinks, that this is cheaper than the expensive creation of a derived table...

这篇关于XML Server XML 性能优化的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆