交叉应用xml查询的性能随着xml文档的增长而呈指数级下降 [英] cross apply xml query performs exponentially worse as xml document grows

查看:74
本文介绍了交叉应用xml查询的性能随着xml文档的增长而呈指数级下降的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个大小可变的XML文档,需要在MSSQL 2008 R2上进行解析,如下所示:

I have a variable size XML document that needs to be parsed on MSSQL 2008 R2 that looks like this:

<data item_id_type="1" cfgid="{4F5BBD5E-72ED-4201-B741-F6C8CC89D8EB}" has_data_event="False">
  <item name="1">
    <field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.506543009706267</field>
    <field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.79500402346138</field>
    <field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.0152649050024924</field>
  </item>
  <item name="2">
    <field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.366096802804087</field>
    <field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.386642801354842</field>
    <field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.031671174184115</field>
  </item>
</data>

.

我需要将其转换为如下所示的常规表类型数据集:

I need to transform it into a regular table type dataset that looks like this:

item_name field_id                             field_type  field_value
--------- ------------------------------------ ----------- ---------------
1         EA032B25-19F1-4C1B-BDDE-3113542D13A5 2           0.5065430097062
1         71014ACB-571B-4C72-9C9B-05458B11335F 2           -0.795004023461
1         740C36E9-1988-413E-A1D5-B3E5B4405B45 2           0.0152649050024
2         EA032B25-19F1-4C1B-BDDE-3113542D13A5 2           0.3660968028040
2         71014ACB-571B-4C72-9C9B-05458B11335F 2           -0.386642801354
2         740C36E9-1988-413E-A1D5-B3E5B4405B45 2           0.0316711741841
3         EA032B25-19F1-4C1B-BDDE-3113542D13A5 2           0.8839620369590
3         71014ACB-571B-4C72-9C9B-05458B11335F 2           -0.781459993268
3         740C36E9-1988-413E-A1D5-B3E5B4405B45 2           0.2284423515729

.

cross apply查询创建所需的输出:

This cross apply query creates the desired output:

create table #temp (x xml)

insert into #temp (x)
values ('
<data item_id_type="1" cfgid="{4F5BBD5E-72ED-4201-B741-F6C8CC89D8EB}" has_data_event="False">
  <item name="1">
    <field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.506543009706267</field>
    <field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.79500402346138</field>
    <field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.0152649050024924</field>
  </item>
  <item name="2">
    <field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.366096802804087</field>
    <field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.386642801354842</field>
    <field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.031671174184115</field>
  </item>
  <item name="3">
    <field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.883962036959074</field>
    <field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.781459993268713</field>
    <field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.228442351572923</field>
  </item>
</data>
')

select c.value('(../@name)','varchar(5)') as item_name
      ,c.value('(@id)','uniqueidentifier') as field_id
      ,c.value('(@type)','int') as field_type
      ,c.value('(.)','nvarchar(15)') as field_value
from   #temp cross apply
       #temp.x.nodes('/data/item/field') as y(c)

drop table #temp

.

当XML中有几百个(或更少)<item>元素时,查询执行得很好.但是,当有1,000个<item>元素时,需要24秒才能完成SSMS中的行返回.当有6,500个<item>元素时,运行cross apply查询大约需要20分钟.我们可以有10-20,000个<item>元素.

When there are a few hundred (or fewer) <item> elements in the XML, the query performs just fine. However, when there are 1,000 <item> elements, it takes 24 seconds to finish returning the rows in SSMS. When there are 6,500 <item> elements, it takes about 20 minutes to run the cross apply query. We could have 10-20,000 <item> elements.

.

是什么原因使得cross apply查询在此简单XML文档中的执行效果如此差,并且随着数据集的增长呈指数级下降呢?

What makes the cross apply query perform so poorly on this simple XML document, and perform exponentially slower as the dataset grows?

是否有更有效的方式将XML文档转换为表格数据集(在SQL中)?

Is there a more efficient way to transform the XML document into the tabular dataset (in SQL)?

推荐答案

是什么使交叉应用查询在这种简单的XML上表现如此差 文档,并且随着数据集的增长而指数级地变慢?

What makes the cross apply query perform so poorly on this simple XML document, and perform exponentially slower as the dataset grows?

使用父轴从项目节点获取属性ID.

It Is the use of the parent axis to get the attribute ID from the item node.

问题在于查询计划的这一部分.

It is this part of the query plan that is problematic.

注意表值较低的函数中有423行.

Notice the 423 rows coming out of the lower Table-valued function.

仅添加一个具有三个字段节点的item节点就可以完成此操作.

Adding just one more item node with three field nodes gives you this.

返回732行.

如果将第一个查询的节点数加倍到总共6个项目节点,该怎么办?

What if we double the nodes from the first query to a total of 6 item nodes?

我们最多返回1602行.

We are up to a whopping 1602 row returned.

顶部函数中的图18是XML中的所有字段节点.我们这里有6个项目,每个项目都有3个字段.那18个节点在嵌套循环中与其他函数结合使用,因此18个执行返回1602行,表示每次迭代返回89行.恰好是整个XML中确切的节点数.好吧,实际上它比所有可见节点多一个.我不知道为什么您可以使用此查询来检查XML中的节点总数.

The figure 18 in the top function is all field nodes in your XML. We have here 6 items with three fields in each item. Those 18 nodes are used in a nested loops join against the other function so 18 executions returning 1602 rows gives that it is returning 89 rows per iteration. That just happens to be the exact number of nodes in the entire XML. Well it is actually one more than all the visible nodes. I don't know why. You can use this query to check the total number of nodes in your XML.

select count(*)
from @XML.nodes('//*, //@*, //*/text()') as T(X)  

因此,当您在values函数中使用父轴..时,SQL Server用于获取值的算法是,它首先找到要切碎的所有节点,在最后一种情况下为18.对于这些节点中的每个节点,它都会粉碎并返回整个XML文档,并在过滤器运算符中签入您实际想要的节点.在那里,您可以呈指数级增长. 而不是使用父轴,您应该使用一个额外的交叉应用.首先切碎物品,然后切碎.

So the algorithm used by SQL Server to get the value when you use the parent axis .. in a values function is that it first finds all the nodes you are shredding on, 18 in the last case. For each of those nodes it shreds and returns the entire XML document and checks in the filter operator for the node you actually want. There you have your exponential growth. Instead of using the parent axis you should use one extra cross apply. First shred on item and then on field.

select I.X.value('@name', 'varchar(5)') as item_name,
       F.X.value('@id', 'uniqueidentifier') as field_id,
       F.X.value('@type', 'int') as field_type,
       F.X.value('text()[1]', 'nvarchar(15)') as field_value
from #temp as T
  cross apply T.x.nodes('/data/item') as I(X)
  cross apply I.X.nodes('field') as F(X)

我还更改了您访问该字段的文本值的方式.使用.将使SQL Server可以查找要field的子节点,并将这些值连接到结果中.您没有子值,因此结果是相同的,但是最好避免在查询计划中包含该部分(UDX运算符).

I also changed how you access the text value of the field. Using . will make SQL Server go look for child nodes to field and concatenate those values in the result. You have no child values so the result is the same but it is a good thing to avoid to have that part in the query plan (the UDX operator).

如果您使用的是XML索引,则查询计划在父轴上没有问题,但是您仍将从更改获取字段值的方式中受益.

The query plan does not have the issue with the parent axis if you are using an XML index but you will still benefit from changing how you fetch the field value.

这篇关于交叉应用xml查询的性能随着xml文档的增长而呈指数级下降的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆