为什么在使用 XPath 查询时需要 CROSS APPLY? [英] Why is CROSS APPLY needed when using XPath queries?

查看:35
本文介绍了为什么在使用 XPath 查询时需要 CROSS APPLY?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

tl;博士

为什么不:

SELECTSomeXmlColumn.nodes('/people/person') AS foo(b)从我的表

工作?

之前的问题

我几乎见过(或得到)在 SQL Server 中使用 XPath 查询的答案要求您使用 CROSS APPLY 将 XML 文档连接回自身.

为什么?

.稍后我会回到这个问题;等这里凉快了,我就有更多的力气求救了.

第二风

根本的问题是,无论我做什么,我总是只返回一行.我想要返回三行(因为有三个人).SQL Server 确实有一个函数可以将 XML 行(称为节点)转换为 SQL Server 行(称为行).这是 .nodes 函数:

<块引用>

nodes() 方法在您想要将 xml 数据类型实例分解为关系数据时很有用.它允许您识别将映射到新行的节点.

这意味着您使用对 xml 数据类型的 XPath 查询调用".nodes 方法.过去在 SQL Server 中作为一行三个节点返回的内容,(正确地)作为三个节点返回:

.nodes('/people/person') AS MyDerivedTable(SomeOtherXmlColumn)

概念上返回:

SomeOtherXmlColumn-----------------------------------------------------------------<person><firstName>Jon</firstName><lastName>Johnson</lastName></person><person><firstName>Kathy</firstName><lastName>Carter</lastName></person><person><firstName>Bob</firstName><lastName>Burns</lastName></person>

但是如果您实际上尝试使用它,则它不起作用:

DECLARE @xml xml;设置@xml ='<人><person><firstName>Jon</firstName><lastName>Johnson</lastName></person><person><firstName>Kathy</firstName><lastName>Carter</lastName></person><person><firstName>Bob</firstName><lastName>Burns</lastName></person></人>';选择 *FROM @xml.nodes('/people/person') AS MyDervicedTable(SomeOtherXmlColumn)

给出错误:

<块引用>

消息 493,级别 16,状态 1,第 8 行
不能直接使用从 nodes() 方法返回的列SomeOtherXmlColumn".它只能与四种 XML 数据类型方法之一一起使用,exist()、nodes()、query() 和 value(),或者用于 IS NULL 和 IS NOT NULL 检查.

我认为这是因为我不允许查看结果集(即不允许使用 *).没问题.我将使用我最初使用的相同 .query:

SELECT SomeOtherXmlColumn.query('/') AS SomeOtherOtherXmlColumnFROM @xml.nodes('/people/person') AS MyDervicedTable(SomeOtherXmlColumn)

返回行.但它不是将节点列表拆分为行,而是复制整个 XML:

SomeOtherOtherXmlColumn---------------------------<people><person><firstName>Jon</firstName><lastName>Johnson</lastName></person><person><firstName>Kathy</firstName><lastName<Carter/lastName></person><person><firstName>Bob</firstName><lastName>Burns</lastName></person></people><people><person><firstName>Jon</firstName><lastName>Johnson</lastName></person><person><firstName>Kathy</firstName><lastName<Carter/lastName></person><person><firstName>Bob</firstName><lastName>Burns</lastName></person></people><people><person><firstName>Jon</firstName><lastName>Johnson</lastName></person><person><firstName>Kathy</firstName><lastName<Carter/lastName></person><person><firstName>Bob</firstName><lastName>Burns</lastName></person></people>

这是有道理的.我期望 SQL Server 中的 XPath 查询表现得像 XPath.但事后仔细阅读文档后会得出相反的结论:

<块引用>

nodes() 方法的结果是一个包含原始 XML 实例的逻辑副本的行集.在这些逻辑副本中,每个行实例的上下文节点都设置为用查询表达式标识的节点之一,以便后续查询可以相对于这些上下文节点进行导航.

现在使用 xml

前面的示例是针对 xml 类型的变量.现在我们必须改造 .nodes 函数以处理包含 xml 列的表:

SELECTSomeXmlColumn.nodes('/people/person')从我的表

不,那行不通:

<块引用>

消息 227,级别 15,状态 1,第 8 行
nodes"不是有效的函数、属性或字段.

尽管 .nodes xml 数据类型的有效方法,但当您尝试将其用于xml 数据类型.它也不适用于 xml 数据类型:

SELECT *FROM MyTable.SomeXmlColumn.nodes('/people/person')

<块引用>

消息 208,级别 16,状态 1,第 8 行
无效的对象名称MyTable.SomeXmlColumn.nodes".

我认为这就是为什么需要 CROSS APPLY 修饰符的原因.不是因为你加入了任何东西,而是因为 SQL Server 解析器将拒绝识别 .nodes 除非它前面有关键字 cross apply:

SELECT'测试' AS SomeTestColumn从 MyTable CROSS APPLY MyTable.SomeXmlColumn.nodes('/people/person') AS MyDerivedTable(SomeOtherXmlColumn)

然后我们开始到达某个地方:

SomeTestColumn--------------测试测试测试

如果我们想查看返回的 XML:

SELECTSomeOtherXmlColumn.query('/')FROM (MyTable CROSS APPLY MyTable.SomeXmlColumn.nodes('/people/person') AS MyDerivedTable(SomeOtherXmlColumn))

现在我们有三行.

似乎cross apply 并没有用于连接,而只是一个允许.nodes 工作的关键字

而且似乎 SQL Server 优化器只是拒绝接受任何使用

.nodes

并且您必须实际使用:

CROSS APPLY .nodes

事情就是这样.如果是这样的话 - 那很好.这就是规则.这导致了多年的混乱;以为我正在使用 cross apply 运算符将某些内容与其他内容结合起来.

除了我相信还有更多.不知何故,实际上必须发生cross apply.但我看不到在哪里 - 或为什么.

解决方案

查询:

SELECT x.i.value('(./text())[1]', 'VARCHAR(10)')FROM MyTable.SomeXmlColumn.nodes('./people/person/firstName') AS x(i);

不起作用,原因与此查询不起作用的原因相同:

SELECT *从人.人.名字;

但是这样做:

选择名字从人.人;

-

FROM 子句需要行集,所以这是有效的,因为 nodes() 返回行集:

DECLARE @xml AS XML ='<人><person><firstName>Jon</firstName><lastName>Johnson</lastName></person><person><firstName>Kathy</firstName><lastName>Carter</lastName></person><person><firstName>Bob</firstName><lastName>Burns</lastName></person></人>';SELECT x.i.value('(./text())[1]', 'VARCHAR(10)')FROM @xml.nodes('./people/person/firstName') AS x(i);

如果xml不是变量而是表中的值,我们首先需要从这个值中提取行,这就是CROSS APPLY派上用场的时候:

SELECT x.i.value('(./text())[1]', 'VARCHAR(10)')从 MyTable 作为 t交叉申请t.SomeXmlColumn.nodes('./people/person/firstName') AS x(i);

CROSS APPLY 运算符将右表达式应用于左表 (MyTable) 中的每条记录.

  • 在 MyTable 表中有一个包含 xml 的记录.
  • CROSS APPLY 获取此记录并将其公开给右侧的表达式.
  • 右表达式使用 nodes() 函数提取记录.
  • 结果有 1 x 3 = 3 条记录(xml 节点),然后由 SELECT 子句处理.

与普通"CROSS APPLY 查询相比:

SELECT c.CustomerID, soh.TotalDue, soh.OrderDateFROM Sales.Customer AS c交叉申请(SELECT TOP(2) TotalDue, OrderDateFROM Sales.SalesOrderHeaderWHERE CustomerID = c.CustomerIDORDER BY TotalDue DESC) AS soh;

c.CustomerID 是我们的 t.SomeXmlColumn

tl;dr

Why doesn't:

SELECT 
    SomeXmlColumn.nodes('/people/person') AS foo(b)
FROM MyTable

work?

The Before Question

Nearly ever answer I've seen (or gotten) for using XPath queries in SQL Server requires that you join the XML document back to itself using a CROSS APPLY.

Why?

For example:

SELECT 
   p.value('(./firstName)[1]', 'VARCHAR(8000)') AS firstName,
   p.value('(./lastName)[1]', 'VARCHAR(8000)') AS lastName
FROM table 
   CROSS APPLY field.nodes('/person') t(p)

For example:

SELECT a.BatchXml.value('(Name)[1]', 'varchar(50)') AS Name,
    a.BatchXml.value('(IDInfo/IDType)[1]', 'varchar(50)') AS IDType,
    a.BatchXml.value('(IDInfo/IDOtherDescription)[1]', 'varchar(50)') AS IDOtherDescription
FROM BatchReports b
CROSS APPLY b.BatchFileXml.nodes('Customer') A(BatchXml)
WHERE a.BatchXml.exist('IDInfo/IDType[text()=3]')=1

For example:

SELECT  b.BatchID,
        x.XmlCol.value('(ReportHeader/OrganizationReportReferenceIdentifier)[1]','VARCHAR(100)') AS OrganizationReportReferenceIdentifier,
        x.XmlCol.value('(ReportHeader/OrganizationNumber)[1]','VARCHAR(100)') AS OrganizationNumber
FROM    Batches b
CROSS APPLY b.RawXml.nodes('/CasinoDisbursementReportXmlFile/CasinoDisbursementReport') x(XmlCol);

And even from MSDN Books Online:

SELECT nref.value('first-name[1]', 'nvarchar(32)') FirstName,
       nref.value('last-name[1]', 'nvarchar(32)') LastName
FROM    [XmlFile] CROSS APPLY [Contents].nodes('//author') AS p(nref)

They all use it. But nobody (not even the SQL Server Books Online) explains why it's needed, what problem it solves, what it's doing, or how it works.

Even the simplest case needs them

Even the simplest example of taking the XML:

<people>
   <person><firstName>Jon</firstName><lastName>Johnson</lastName></person>
   <person><firstName>Kathy</firstName><lastName>Carter</lastName></person>
   <person><firstName>Bob</firstName><lastName>Burns</lastName></person>
</people>

and returning the values:

FirstName  LastName
=========  ========
Jon        Johnson
Kathy      Carter
Bob        Burns

needs a join:

SELECT 
   p.value('(./firstName)[1]', 'VARCHAR(8000)') AS firstName,
   p.value('(./lastName)[1]', 'VARCHAR(8000)') AS lastName
FROM table 
   CROSS APPLY field.nodes('/person') t(p)

What's confusing is that it doesn't even use the table it joins from, why does it need it?

Since querying for XML has never been documented or explained, hopefully we can solve that now.

What does it actually do?

So let's start with an actual example, since we want an actual answer, that gives an actual explanation:

DECLARE @xml xml;
SET @xml = 
'<people>
   <person><firstName>Jon</firstName><lastName>Johnson</lastName></person>
   <person><firstName>Kathy</firstName><lastName>Carter</lastName></person>
   <person><firstName>Bob</firstName><lastName>Burns</lastName></person>
</people>';
;WITH MyTable AS (
    SELECT @xml AS SomeXmlColumn
)

Now we have psuedo table we can query from:

Let's start with the obvious

First I need the people. In real XML, I can easily return the three rows:

/people/person

Which gives a NodeList containing three nodes:

<person><firstName>Jon</firstName><lastName>Johnson</lastName></person>
<person><firstName>Kathy</firstName><lastName>Carter</lastName></person>
<person><firstName>Bob</firstName><lastName>Burns</lastName></person>

In SQL Server, the same query:

SELECT 
   SomeXmlColumn.query('/people/person')
FROM MyTable

doesn't return three rows, but rather one row with the XML containing the three nodes:

<person>
  <firstName>Jon</firstName>
  <lastName>Johnson</lastName>
</person>
<person>
  <firstName>Kathy</firstName>
  <lastName>Carter</lastName>
</person>
<person>
  <firstName>Bob</firstName>
  <lastName>Burns</lastName>
</person>

Obviously this is unsuitable, when my end goal is to return 3 rows. I somehow have to break up the three rows into three rows.

Onto the names

My actual goal is to get the firstName and lastName. In XPath I could do something like:

/people/person/firstName|/people/person/lastName

which gets me the six nodes, although they are not adjoining

<firstName>Jon</firstName>
<lastName>Johnson</lastName>
<firstName>Kathy</firstName>
<lastName>Carter</lastName>
<firstName>Bob</firstName>
<lastName>Burns</lastName>

In SQL Server, we try something similar

SELECT 
    SomeXmlColumn.query('/people/person/firstName') AS FirstName,
    SomeXmlColumn.query('/people/person/lastName') AS LastName
FROM MyTable

which gets us one row, with each column containing an XML fragment:

FirstName                     LastName
============================  ============================
<firstName>Jon</firstName>    <lastName>Johnson</lastName>
<firstName>Kathy</firstName>  <lastName>Carter</lastName>
<firstName>Bob</firstName>    <lastName>Burns</lastName>

...and now I'm tired. I've spent three hours writing this question, on top of the four hours I spent asking yesterday's question. I'll come back to this question later; when it's cooler in here, and I have more energy to beg for help.

Second wind

The fundamental problem is that no matter what I do, I keep getting only one row returned. I want three rows returned (because there are three people). SQL Server does have a function that can convert XML rows (called nodes) into SQL Server rows (called rows). It's the .nodes function:

The nodes() method is useful when you want to shred an xml data type instance into relational data. It allows you to identify nodes that will be mapped into a new row.

This means that you "call" the .nodes method with an XPath query on an xml data type. And what used to come back in SQL Server as one row with three nodes, comes back (correctly) as three nodes:

.nodes('/people/person') AS MyDerivedTable(SomeOtherXmlColumn)

Conceptually this returns:

SomeOtherXmlColumn
------------------------------------------------------------------------
<person><firstName>Jon</firstName><lastName>Johnson</lastName></person>
<person><firstName>Kathy</firstName><lastName>Carter</lastName></person>
<person><firstName>Bob</firstName><lastName>Burns</lastName></person>

But if you actually try to use it, it doesn't work:

DECLARE @xml xml;
SET @xml = 
'<people>
   <person><firstName>Jon</firstName><lastName>Johnson</lastName></person>
   <person><firstName>Kathy</firstName><lastName>Carter</lastName></person>
   <person><firstName>Bob</firstName><lastName>Burns</lastName></person>
</people>';
SELECT *
FROM @xml.nodes('/people/person') AS MyDervicedTable(SomeOtherXmlColumn)

Gives the error:

Msg 493, Level 16, State 1, Line 8
The column 'SomeOtherXmlColumn' that was returned from the nodes() method cannot be used directly. It can only be used with one of the four XML data type methods, exist(), nodes(), query(), and value(), or in IS NULL and IS NOT NULL checks.

I presume this is because I'm not allowed to look at the results set (i.e. the * is not allowed). No problem. I'll use the same .query I used originally:

SELECT SomeOtherXmlColumn.query('/') AS SomeOtherOtherXmlColumn
FROM @xml.nodes('/people/person') AS MyDervicedTable(SomeOtherXmlColumn)

Which returns rows. But rather than splitting a list of nodes into rows, it just duplicates the entire XML:

SomeOtherOtherXmlColumn
----------------------------------------
<people><person><firstName>Jon</firstName><lastName>Johnson</lastName></person><person><firstName>Kathy</firstName><lastName>Carter</lastName></person><person><firstName>Bob</firstName><lastName>Burns</lastName></person></people>
<people><person><firstName>Jon</firstName><lastName>Johnson</lastName></person><person><firstName>Kathy</firstName><lastName>Carter</lastName></person><person><firstName>Bob</firstName><lastName>Burns</lastName></person></people>
<people><person><firstName>Jon</firstName><lastName>Johnson</lastName></person><person><firstName>Kathy</firstName><lastName>Carter</lastName></person><person><firstName>Bob</firstName><lastName>Burns</lastName></person></people>

Which makes sense. I was expecting an XPath query in SQL Server to behave like XPath. But a hindsight careful reading of the docs say otherwise:

The result of the nodes() method is a rowset that contains logical copies of the original XML instances. In these logical copies, the context node of every row instance is set to one of the nodes identified with the query expression, so that subsequent queries can navigate relative to these context nodes.

Now do it with an xml column

The preceding example was for a variable of type xml. Now we have to retrofit the .nodes function to work with a table containing an xml column:

SELECT 
   SomeXmlColumn.nodes('/people/person')
FROM MyTable

No, that doesn't work:

Msg 227, Level 15, State 1, Line 8
"nodes" is not a valid function, property, or field.

Although .nodes is a valid method of an xml data type, it simply doesn't work when you try to use it on an xml data type. Nor does it work on when used on an xml data type:

SELECT *
FROM MyTable.SomeXmlColumn.nodes('/people/person')

Msg 208, Level 16, State 1, Line 8
Invalid object name 'MyTable.SomeXmlColumn.nodes'.

Which I presume is why the CROSS APPLY modifier is needed. Not because you are joining anything, but because the SQL Server parser will refuse to recognize .nodes unless it's preceded with the keywords cross apply:

SELECT 
    'test' AS SomeTestColumn
FROM MyTable CROSS APPLY MyTable.SomeXmlColumn.nodes('/people/person') AS MyDerivedTable(SomeOtherXmlColumn)

And we start to get somewhere:

SomeTestColumn
--------------
test
test
test

And so if we then want to see the XML that comes back:

SELECT 
    SomeOtherXmlColumn.query('/')
FROM (MyTable CROSS APPLY MyTable.SomeXmlColumn.nodes('/people/person') AS MyDerivedTable(SomeOtherXmlColumn))

Now we have three rows.

It seems that cross apply isn't used to a join, but merely a keyword that allows .nodes to work

And it seems that the SQL Server optimizer just refuses to accept any use of

.nodes

and you must actually use:

CROSS APPLY .nodes

And that's just how it is. And if that's the case - that's fine. That's the rule. And that led to years of confusion; thinking I was joining something to something else with the cross apply operator.

Except I believe there is more to it than that. There must, somehow, actually be a cross apply happening. But I cannot see where - or why.

解决方案

Query:

SELECT x.i.value('(./text())[1]', 'VARCHAR(10)')
FROM MyTable.SomeXmlColumn.nodes('./people/person/firstName') AS x(i);

doesn't work, for the same reason why this query doesn't work:

SELECT *
FROM Person.Person.FirstName;

but this does:

SELECT FirstName
FROM Person.Person;

-

FROM clause expects rowset, so this is valid, since nodes() returns rowset:

DECLARE @xml AS XML = 
'<people>
   <person><firstName>Jon</firstName><lastName>Johnson</lastName></person>
   <person><firstName>Kathy</firstName><lastName>Carter</lastName></person>
   <person><firstName>Bob</firstName><lastName>Burns</lastName></person>
</people>';

SELECT x.i.value('(./text())[1]', 'VARCHAR(10)')
FROM @xml.nodes('./people/person/firstName') AS x(i);

If xml is not a variable but value in table, we first need to extract rows from this value, and this is when CROSS APPLY comes in handy:

SELECT x.i.value('(./text())[1]', 'VARCHAR(10)')
FROM MyTable as t
CROSS APPLY 
   t.SomeXmlColumn.nodes('./people/person/firstName') AS x(i);

CROSS APPLY operator applies the right expression to each record from the left table (MyTable).

  • In MyTable table there is one record containing xml.
  • CROSS APPLY fetches this record and exposes it to expression in the right.
  • Right expression extracts records using nodes() function.
  • As a result there are 1 x 3 = 3 records (xml nodes) which are then processed by SELECT clause.

Compare to 'normal' CROSS APPLY query:

SELECT c.CustomerID, soh.TotalDue, soh.OrderDate
FROM Sales.Customer AS c
CROSS APPLY
    (SELECT TOP(2) TotalDue, OrderDate
    FROM Sales.SalesOrderHeader
    WHERE CustomerID = c.CustomerID
ORDER BY TotalDue DESC) AS soh;

c.CustomerID is our t.SomeXmlColumn

这篇关于为什么在使用 XPath 查询时需要 CROSS APPLY?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆