将带有命名空间、属性等的未知 XML 读入完整的 EAV 列表 [英] Read an unknown XML with namespaces, attributes etc. into a full EAV list
问题描述
在回答有关如何读取未知 JSON 的问题后,我尝试为 XML 找到类似的内容(由 这个相关问题).
After answering a question about how to read an unknown JSON I tried to find something similar for XML (triggered by this related question).
问题是:如何从未知的 XML 中读取全部内容?
The question is: How can I read the whole and everything out of an unknown XML?
理想的输出是按预期排序顺序的 EAV 列表以及元素的完整 XPath:
The ideal output is an EAV-list in the expected sort order together with a full XPath for the element:
- 完整的 XPath(知道任何命名空间和元素位置)
- 元素或属性的名称(带命名空间)
- 内容(
text()
节点或属性的值) - 由于排序顺序是 XML 文档的固有部分,因此输出应按此排序.
一些研究将我带到了众所周知的函数tvf-xml-hier()
作者:约翰·卡佩莱蒂但这缺少命名空间支持并拒绝分解 multi-text 元素.
Some research brought me to the well known function tvf-xml-hier()
by John Cappelletti. But this is missing namespace-support and denies to shred multi-text elements.
所以我想出了一个你可以在下面找到的方法.
So I came up with an approach you can find below.
在我的自我回答中找到完整的 MCVE.
Find a full MCVE in my self-answer.
推荐答案
在下面我创建了一个 XML,其中包含多个命名空间、一个 多文本 元素和各种嵌套、重复、名称冲突和属性.这应该涵盖大多数现实世界场景.
In the following I create a XML with several namespaces, one multi-text element and various nestings, repetitions, name-clashes and attributes. This should cover most real-world scenarios.
提示:很容易将其包装为内联 TVF 并将其称为单行代码,将 XML 作为参数传递.
Hint: It's easy to wrap this as inline TVF and call it as a one liner, passing the XML as parameter.
DECLARE @xml XML=
N'<root xmlns="defaultNs" xmlns:ns1="dummy1" xmlns:other="SomeOther">
<!-- this element contains several attributes in various namespaces
Hint: An attribute without a prefix is assumed to live in the same namespace as its element -->
<ns1:level1 test1="test1" ns1:test2="test2" other:test3="test3">
<other:InnerElement>Some inner element</other:InnerElement>
<!-- this element contains several text nodes -->
<multiText>text1<someInner>blah</someInner>text2<someInner/>text3</multiText>
<!-- repeating elements some of them with attributes -->
<repeating>rep 1</repeating>
<repeating r2="r2">rep 2</repeating>
<!-- one with the same name, but living in another namespace -->
<other:repeating r4="r4">rep 4</other:repeating>
<!-- some deeper nesting -->
<level2>
<level3/>
<level3>
<content>Content in second level3 element</content>
</level3>
</level2>
<!-- and one more of the repeating, but listed in a lower position -->
<repeating oneMore="oneMore">one more</repeating>
</ns1:level1>
</root>';
--查询
WITH AllNamespaces As
(
SELECT CONCAT('ns',ROW_NUMBER() OVER(ORDER BY (B.namespaceUri))) Prefix
,B.namespaceUri
FROM @xml.nodes('//*') A(nd)
CROSS APPLY(VALUES(A.nd.value('namespace-uri(.)','nvarchar(max)')))B(namespaceUri)
WHERE LEN(B.namespaceUri)>0
GROUP BY B.namespaceUri
)
,recCte AS
(
SELECT 1 AS RecursionLevel
,1 AS NodeType
,ROW_NUMBER() OVER(ORDER BY A.nd) AS ElementPosition
,CAST(REPLACE(STR(ROW_NUMBER() OVER(ORDER BY A.nd),5),' ','0') AS VARCHAR(900)) COLLATE DATABASE_DEFAULT AS SortString
,ns.Prefix AS CurrentPrefix
,ns.namespaceUri AS CurrentUri
,CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)'),'[',ROW_NUMBER() OVER(PARTITION BY CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)')) ORDER BY A.nd),']') AS FullName
,CAST(CONCAT('/',ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)'),'[',ROW_NUMBER() OVER(PARTITION BY CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)')) ORDER BY A.nd),']') AS NVARCHAR(MAX)) COLLATE DATABASE_DEFAULT AS XPath
,A.nd.query('.') CurrentFragment
,A.nd.query('./*') NextFragment
FROM @xml.nodes('/*') A(nd)
LEFT JOIN AllNamespaces ns ON ns.namespaceUri=A.nd.value('namespace-uri(.)','nvarchar(max)')
UNION ALL
SELECT r.RecursionLevel+1
,1
,ROW_NUMBER() OVER(ORDER BY A.nd)
,CAST(CONCAT(r.SortString,REPLACE(STR(ROW_NUMBER() OVER(ORDER BY A.nd),5),' ','0')) AS VARCHAR(900)) COLLATE DATABASE_DEFAULT
,ns.Prefix
,ns.namespaceUri
,CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)'),'[',ROW_NUMBER() OVER(PARTITION BY CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)')) ORDER BY A.nd),']')
,CONCAT(r.XPath,'/',ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)'),'[',ROW_NUMBER() OVER(PARTITION BY CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)')) ORDER BY A.nd),']')
,A.nd.query('.') CurrentFragment
,A.nd.query('./*') NextFragment
FROM recCte r
CROSS APPLY NextFragment.nodes('*') A(nd)
OUTER APPLY(SELECT Prefix,namespaceUri FROM AllNamespaces ns WHERE ns.namespaceUri=A.nd.value('namespace-uri(.)','nvarchar(max)')) ns
)
,WithValues AS
(
SELECT r.RecursionLevel
,CASE WHEN LEN(B.NodeValue)>0 THEN 3 ELSE r.NodeType END AS NodeType
,r.ElementPosition
,CASE WHEN LEN(B.NodeValue)>0 THEN CONCAT(r.SortString,REPLACE(STR(ROW_NUMBER() OVER(PARTITION BY r.Xpath ORDER BY A.txt),5),' ','0')) ELSE r.SortString END AS SortString
,r.CurrentPrefix
,r.CurrentUri
,CASE WHEN LEN(B.NodeValue)>0 THEN 'text()' ELSE r.FullName END AS FullName
,r.XPath AS OrigXPath
,CASE WHEN LEN(B.NodeValue)>0 THEN CONCAT(r.XPath,'/text()[',ROW_NUMBER() OVER(PARTITION BY r.Xpath ORDER BY A.txt),']') ELSE r.XPath END AS XPath
,CASE WHEN LEN(B.NodeValue)>0 THEN B.NodeValue ELSE NULL END AS NodeValue
,r.CurrentFragment
,r.NextFragment
FROM recCte r
OUTER APPLY r.CurrentFragment.nodes('*/text()') A(txt)
OUTER APPLY (SELECT A.txt.value('.','nvarchar(max)')) B(NodeValue)
)
,WithAttributes AS
(
SELECT RecursionLevel
,NodeType
,ElementPosition
,SortString
,CurrentPrefix
,CurrentUri
,FullName
,XPath
,NodeValue
,CurrentFragment
,NextFragment
FROM WithValues
UNION ALL
SELECT wv.RecursionLevel
,2
,wv.ElementPosition
,wv.SortString
,CASE WHEN ns.Prefix IS NOT NULL THEN ns.Prefix ELSE wv.CurrentPrefix END AS CurrentPrefix
,CASE WHEN ns.namespaceUri IS NOT NULL THEN ns.namespaceUri ELSE wv.CurrentUri END AS CurrentUri
,CONCAT('@',ns.Prefix+':',B.AttrName) AS FullName
,CONCAT(wv.OrigXPath,'/@',ns.Prefix+':',B.AttrName) AS XPath
,A.attr.value('.','nvarchar(max)') AS NodeValue
,wv.CurrentFragment
,wv.NextFragment
FROM WithValues wv
CROSS APPLY wv.CurrentFragment.nodes('*/@*') A(attr)
CROSS APPLY (SELECT A.attr.value('local-name(.)','nvarchar(max)') AS AttrName
,A.attr.value('.','nvarchar(max)') AS AttrValue
,A.attr.value('namespace-uri(.)','nvarchar(max)') AS namespaceUri) B
OUTER APPLY(SELECT Prefix,namespaceUri FROM AllNamespaces ns WHERE ns.namespaceUri=B.namespaceUri) ns
)
SELECT NodeType
,CurrentPrefix
,CurrentUri
,FullName
,XPath
,NodeValue
FROM WithAttributes
WHERE NodeValue IS NOT NULL
ORDER BY SortString;
--结果
/*
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| NodeType | CurrentPrefix | CurrentUri | FullName | XPath | NodeValue |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 2 | ns2 | dummy1 | @test1 | /ns1:root[1]/ns2:level1[1]/@test1 | test1 |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 2 | ns2 | dummy1 | @ns2:test2 | /ns1:root[1]/ns2:level1[1]/@ns2:test2 | test2 |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 2 | ns3 | SomeOther | @ns3:test3 | /ns1:root[1]/ns2:level1[1]/@ns3:test3 | test3 |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3 | ns3 | SomeOther | text() | /ns1:root[1]/ns2:level1[1]/ns3:InnerElement[1]/text()[1] | Some inner element |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3 | ns1 | defaultNs | text() | /ns1:root[1]/ns2:level1[1]/ns1:multiText[1]/text()[1] | text1 |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3 | ns1 | defaultNs | text() | /ns1:root[1]/ns2:level1[1]/ns1:multiText[1]/ns1:someInner[1]/text()[1] | blah |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3 | ns1 | defaultNs | text() | /ns1:root[1]/ns2:level1[1]/ns1:multiText[1]/text()[2] | text2 |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3 | ns1 | defaultNs | text() | /ns1:root[1]/ns2:level1[1]/ns1:multiText[1]/text()[3] | text3 |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3 | ns1 | defaultNs | text() | /ns1:root[1]/ns2:level1[1]/ns1:repeating[1]/text()[1] | rep 1 |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3 | ns1 | defaultNs | text() | /ns1:root[1]/ns2:level1[1]/ns1:repeating[2]/text()[1] | rep 2 |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 2 | ns1 | defaultNs | @r2 | /ns1:root[1]/ns2:level1[1]/ns1:repeating[2]/@r2 | r2 |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 2 | ns3 | SomeOther | @r4 | /ns1:root[1]/ns2:level1[1]/ns3:repeating[1]/@r4 | r4 |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3 | ns3 | SomeOther | text() | /ns1:root[1]/ns2:level1[1]/ns3:repeating[1]/text()[1] | rep 4 |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3 | ns1 | defaultNs | text() | /ns1:root[1]/ns2:level1[1]/ns1:level2[1]/ns1:level3[2]/ns1:content[1]/text()[1] | Content in second level3 element |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 2 | ns1 | defaultNs | @oneMore | /ns1:root[1]/ns2:level1[1]/ns1:repeating[3]/@oneMore | oneMore |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3 | ns1 | defaultNs | text() | /ns1:root[1]/ns2:level1[1]/ns1:repeating[3]/text()[1] | one more |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
*/
--只是为了表明,创建的 XPath 返回预期的结果(注意:我们必须使用我们自己的前缀 - 即使对于默认命名空间):
--Just to show, that the created XPaths return the expected (attention: We must use our own prefixes - even for the default namespace):
WITH XMLNAMESPACES( 'defaultNs' AS ns1
,'dummy1' AS ns2
,'SomeOther' AS ns3)
SELECT @xml.value('/ns1:root[1]/ns2:level1[1]/ns1:multiText[1]/ns1:someInner[1]/text()[1]','nvarchar(max)') Is_blah
,@xml.value('/ns1:root[1]/ns2:level1[1]/ns1:level2[1]/ns1:level3[2]/ns1:content[1]/text()[1]','nvarchar(max)') Is_Content_in_second_level3_element
,@xml.value('/ns1:root[1]/ns2:level1[1]/ns1:repeating[3]/@oneMore','nvarchar(max)') Is_attribute_oneMore
,@xml.value('/ns1:root[1]/ns2:level1[1]/ns1:multiText[1]/text()[3]','nvarchar(max)') Is_3rd_text_in_multiText;
简单的想法:
- 命名空间前缀可以由您自己定义.T-SQL 中没有可用的 XQuery 函数来查找实际前缀,因此我们只使用我们自己的前缀.底层 URI 很重要.
- 第一个 cte 将创建一组所有出现的 URI,并将其与前缀一起返回.
- 递归 CTE 将越来越深入地遍历 XML.只要带有
.nodes()
的APPLY
可以返回嵌套节点,这就会继续. - 一个 CTE 添加
text()
节点 - 如果有的话. - 一个 CTE 添加属性 - 如果有的话.
- 全名以及完整的 XPath 都被串联起来.
NodeType
有助于区分元素 (=1)、属性 (=2) 和text()
(=3)- CAST 和 COLLATE 有助于避免数据类型不匹配(递归 CTE 对此非常挑剔).
- 需要串联的 SortString 以确保输出中的顺序相同.
- 您可以使用
SELECT * ...
查看所有返回的列... - 您可以在没有
WHERE NodeValue IS NOT NULL
的情况下查询此内容以查看更多空结构.
- The namespace prefixes can be defined by your own. There is no XQuery-function available in T-SQL to find the actual prefix, so we just use our own prefixes. The underlying URI is important.
- The first cte will create a set of all occuring URIs and return this together with a prefix.
- The recursive CTE will traverse deeper and deeper into the XML. This will continue as long as
APPLY
with.nodes()
can return nested nodes. - One CTE adds
text()
nodes - if there are any. - One CTE adds attributes - if there are any.
- The full name is concatenated as well as the full XPath.
- The
NodeType
helps to distinguish between elements (=1), attributes (=2) andtext()
(=3) - The CASTs and COLLATEs help to avoid data type mismatch (recursive CTEs are very picky with this).
- The concatenated SortString is needed to ensure the same order in your output.
- You might use
SELECT * ...
to see all returned colums... - You might query this without
WHERE NodeValue IS NOT NULL
to see more of the empty structure.
这篇关于将带有命名空间、属性等的未知 XML 读入完整的 EAV 列表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!