SQL Server:选择性XML索引未得到有效使用 [英] Sql Server: Selective XML Index not being efficiently used

查看:84
本文介绍了SQL Server:选择性XML索引未得到有效使用的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在探索提高应用程序性能的方法,这些方法只能在有限的程度上影响数据库级别. SQL Server版本是2012 SP2,有问题的表和视图结构是(我不能真正影响到此,请注意xml文档总共可能包含数百个元素):

I'm exploring ways of improving the performance of an application which I can only affect on the database level to a limited degree. The SQL Server version is 2012 SP2 and the table and view structure in question is (I cannot really affect this + note that the xml document may have several hundred elements in total):

CREATE TABLE Orders(
    id nvarchar(64) NOT NULL,
    xmldoc xml NULL,
    CONSTRAINT PK_Order_id PRIMARY KEY CLUSTERED (id)
);

CREATE VIEW V_Orders as
SELECT 
    a.id, a.xmldoc
    ,a.xmldoc.value('data(/row/c1)[1]', 'nvarchar(max)') "Stuff"
    ,a.xmldoc.value('data(/row/c2)[1]', 'nvarchar(max)') "OrderType"
etc..... many columns
from Orders a;

一个典型的查询(以及下面用于测试的查询):

A typical query (and the one being used for testing below):

SELECT id FROM V_Orders WHERE OrderType = '30791'

所有查询都是针对视图执行的,我既不会影响查询,也不会影响表/视图结构.

All the queries are performed against the view and I can affect neither the queries nor the table/view structure.

我认为向表中添加选择性XML索引将是我的救命之路:

I thought adding a selective XML index to the table would be my saviour:

CREATE SELECTIVE XML INDEX I_Orders_OrderType ON Orders(xmldoc)
FOR(
    pathOrderType = '/row/c2' as SQL [nvarchar](20)
)

但是,即使在更新统计信息之后,执行计划仍然看起来很奇怪.无法将图片发布为新帐户,而将相关详细信息发布为文本:

But even after updating the statistics the execution plan is looking weird. Couldn't post a pic as new account so the relevant details as text:

  • 从selectiveXml查找聚簇索引(成本:总计的2%).预期的行数为1,但预期的执行时间为1269(表中的行数)
  • ->排名前N位(费用:总数的95%)
  • ->计算标量(成本0)

  • Clustered index seek from selectiveXml (Cost: 2% of total). Expected number of rows 1 but expected number of execution times 1269 (number of rows in the table)
  • -> Top N sort (Cost: 95% of total)
  • -> Compute scalar (Cost 0)

单独的分支:聚集索引扫描PK_Order_id(成本:总计的3%).预期的行数1269

Separate branch: Clustered index scan PK_Order_id (Cost: 3% of total). Expected number of rows 1269

实际上,对于我的测试数据,该查询甚至不会返回任何结果,但是无论返回一个还是几个都没有任何区别.执行时间真正支持查询,只要从执行计划中推导出来,查询次数就可以达到数千次.

In actuality with my test data the query doesn't even return any results but whether it returns one or few doesn't make any difference. Execution times support the query really taking as long as could be deduced from the execution plan and have read counts in the thousands.

所以我的问题是为什么优化器没有正确使用选择性xml索引?还是我出了点问题?我将如何通过选择性xml索引(或者可能是持久列)来优化此特定查询的性能?

So my question is why is the selective xml index not being used properly by the optimizer? Or have I got something wrong? How would I optimize this specific query's performance with selective xml indexing (or perhaps persisted column)?

我对较大的样本数据(表中约27.4万行,XML文档接近平均生产规模)进行了额外的测试,并将选择性XML索引与升级后的列进行了比较.结果来自Profiler跟踪,重点是CPU使用率和读取计数.选择性xml索引的执行计划与上述基本相同.

I did additional testing with larger sample data (~274k rows in the table with XML documents close to average production sizes) and compared the selective XML index to a promoted column. The results are from Profiler trace, concentrating on CPU usage and read counts. The execution plan for selective xml indexing is basically identical to what is described above.

选择性XML索引和274k行(执行上面的查询): CPU:6454,读取:938521

Selective XML index and 274k rows (executing the query above): CPU: 6454, reads: 938521

将搜索字段中的值更新为唯一后(总记录仍为274k),我得到了以下结果:

After I updated the values in the searched field to be unique (total records still 274k) I got the following results:

选择性XML索引和274k行(执行上面的查询): CPU:10077,读取:1006466

Selective XML index and 274k rows (executing the query above): CPU: 10077, reads: 1006466

然后使用一个提升的(即持久化的)单独索引的列,并直接在视图中使用它: CPU:0,读取:23

Then using a promoted (i.e. persisted) separately indexed column and using it directly in the view: CPU: 0, reads: 23

选择性XML索引的性能似乎比适当的SQL索引列获取更接近全表扫描.我在某处读到,对于表使用模式可能有助于从执行计划中删除TOP N步骤(假设我们正在搜索一个非重复字段),但是我不确定在这种情况下这是否可行.

Selective XML index performance seems to be closer to full table scan than proper SQL indexed column fetch. I read somewhere that using schema for the table might help drop the TOP N step from execution plan (assuming we're searching for a non-repeating field) but I'm not sure whether that's a realistic possibility in this case.

推荐答案

您创建的选择性XML索引存储在内部表中,主键来自Orders,作为内部表的集群键的前导列,指定的路径存储为稀疏列.

The selective XML index you create is stored in an internal table with the primary key from Orders as the leading column for the clustered key for the internal table and the paths specified stored as sparse columns.

您获得的查询计划可能看起来像这样:

The query plan you get probably looks a something like this:

您对整个Orders表进行了扫描,并在内部表中对Orders中每一行的主键进行了查找.最终的Filter运算符负责检查OrderType的值,仅返回匹配的行.

You have a scan over the entire Orders table with a seek in the internal table on the primary key for each row in Orders. The final Filter operator is responsible for checking the value of OrderType returning only the matching rows.

不是真正从索引中得到的期望.

Not really what you would expect from something called an index.

第二个选择性XML索引为救援提供了方便.它们是为主要选择索引中指定的路径之一创建的,并将在路径表达式中提取的值上创建非聚集键.

To the rescue comes a secondary selective XML index. They are created for one of the paths specified in the primary selective index and will create a non-clustered key on the values extracted in the path expression.

但这并不是那么容易. SQL Server将不会对values()函数提取的值所使用的谓词使用二级索引.您必须改为使用exists().另外,exists()要求在value()使用SQL数据类型的路径表达式中使用XQUERY数据类型.

It is however not all that easy. SQL Server will not use the secondary index on predicates used on values extracted by the values() function. You have to use exists() instead. Also, exists() requires the use of XQUERY data types in the path expressions where value() uses SQL data types.

您的主要选择性XML索引可能如下所示:

Your primary selective XML index could look like this:

CREATE SELECTIVE XML INDEX I_Orders_OrderType ON Orders(xmldoc)
FOR 
(
  pathOrderType = '/row/c2' as sql nvarchar(20), 
  pathOrderTypeX = '/row/c2/text()' as xquery 'xs:string' maxlength (20)
)

pathOrderTypeX上具有辅助功能.

CREATE XML INDEX I_Orders_OrderType2 ON Orders(xmldoc)
  USING XML INDEX I_Orders_OrderType FOR (pathOrderTypeX) 

通过使用exist()的查询,您将获得此计划.

And with a query that uses exist() you will get this plan.

select id
from V_Orders
where xmldoc.exist('/row/c2/text()[. = "30791"]') = 1

第一个查找是在内部表的非聚集索引中查找您要查找的值.键查询是在内部表上的群集键上完成的(不知道为什么这样做是必要的).最后一次查找是在订单"表中的主键上,然后是一个过滤器,该过滤器检查xmldoc列中的空值.

The first seek is a seek for the value you are looking for in the non-clustered index of the internal table. The key lookup is done on the clustered key on the internal table (don't know why that is necessary). And the last seek is on primary key in the Orders table followed by a filter that checks for null values in the column xmldoc.

如果您可以使用,通过XML在Orders表中创建计算出的索引列,我想您仍然会比使用二级选择性XML索引获得更好的性能.

If you can get away with using property promotion, creating calculated indexed columns in the Orders table from the XML, I guess you would still get better performance than using secondary selective XML indexes.

这篇关于SQL Server:选择性XML索引未得到有效使用的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆