将带有命名空间、属性等的未知 XML 读入完整的 EAV 列表 [英] Read an unknown XML with namespaces, attributes etc. into a full EAV list

查看:25
本文介绍了将带有命名空间、属性等的未知 XML 读入完整的 EAV 列表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

回答有关如何读取未知 JSON 的问题后,我尝试为 XML 找到类似的内容(由 这个相关问题).

After answering a question about how to read an unknown JSON I tried to find something similar for XML (triggered by this related question).

问题是:如何从未知的 XML 中读取全部内容?

The question is: How can I read the whole and everything out of an unknown XML?

理想的输出是按预期排序顺序的 EAV 列表以及元素的完整 XPath:

The ideal output is an EAV-list in the expected sort order together with a full XPath for the element:

  • 完整的 XPath(知道任何命名空间和元素位置)
  • 元素或属性的名称(带命名空间)
  • 内容(text() 节点或属性的值)
  • 由于排序顺序是 XML 文档的固有部分,因此输出应按此排序.

一些研究将我带到了众所周知的函数tvf-xml-hier()作者:约翰·卡佩莱蒂但这缺少命名空间支持并拒绝分解 multi-text 元素.

Some research brought me to the well known function tvf-xml-hier() by John Cappelletti. But this is missing namespace-support and denies to shred multi-text elements.

所以我想出了一个你可以在下面找到的方法.

So I came up with an approach you can find below.

在我的自我回答中找到完整的 MCVE.

Find a full MCVE in my self-answer.

推荐答案

在下面我创建了一个 XML,其中包含多个命名空间、一个 多文本 元素和各种嵌套、重复、名称冲突和属性.这应该涵盖大多数现实世界场景.

In the following I create a XML with several namespaces, one multi-text element and various nestings, repetitions, name-clashes and attributes. This should cover most real-world scenarios.

提示:很容易将其包装为内联 TVF 并将其称为单行代码,将 XML 作为参数传递.

Hint: It's easy to wrap this as inline TVF and call it as a one liner, passing the XML as parameter.

DECLARE @xml XML=
N'<root xmlns="defaultNs" xmlns:ns1="dummy1" xmlns:other="SomeOther">

  <!-- this element contains several attributes in various namespaces
       Hint: An attribute without a prefix is assumed to live in the same namespace as its element -->
  <ns1:level1 test1="test1" ns1:test2="test2" other:test3="test3">

    <other:InnerElement>Some inner element</other:InnerElement>

    <!-- this element contains several text nodes -->
    <multiText>text1<someInner>blah</someInner>text2<someInner/>text3</multiText>

    <!-- repeating elements some of them with attributes -->
    <repeating>rep 1</repeating>
    <repeating r2="r2">rep 2</repeating>

    <!-- one with the same name, but living in another namespace -->
    <other:repeating r4="r4">rep 4</other:repeating>

    <!-- some deeper nesting -->
    <level2>
        <level3/>
        <level3>
            <content>Content in second level3 element</content>
        </level3>
    </level2>

    <!-- and one more of the repeating, but listed in a lower position -->
    <repeating oneMore="oneMore">one more</repeating>

  </ns1:level1>
</root>';

--查询

WITH AllNamespaces As
(
    SELECT  CONCAT('ns',ROW_NUMBER() OVER(ORDER BY (B.namespaceUri))) Prefix
           ,B.namespaceUri
    FROM @xml.nodes('//*') A(nd)
    CROSS APPLY(VALUES(A.nd.value('namespace-uri(.)','nvarchar(max)')))B(namespaceUri)
    WHERE LEN(B.namespaceUri)>0
    GROUP BY B.namespaceUri
)
,recCte AS
(
    SELECT 1 AS RecursionLevel
          ,1 AS NodeType
          ,ROW_NUMBER() OVER(ORDER BY A.nd) AS ElementPosition
          ,CAST(REPLACE(STR(ROW_NUMBER() OVER(ORDER BY A.nd),5),' ','0') AS VARCHAR(900)) COLLATE DATABASE_DEFAULT AS SortString
          ,ns.Prefix AS CurrentPrefix
          ,ns.namespaceUri AS CurrentUri
          ,CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)'),'[',ROW_NUMBER() OVER(PARTITION BY CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)')) ORDER BY A.nd),']') AS FullName
          ,CAST(CONCAT('/',ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)'),'[',ROW_NUMBER() OVER(PARTITION BY CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)')) ORDER BY A.nd),']') AS NVARCHAR(MAX)) COLLATE DATABASE_DEFAULT AS XPath
          ,A.nd.query('.') CurrentFragment
          ,A.nd.query('./*') NextFragment
    FROM @xml.nodes('/*') A(nd)
    LEFT JOIN AllNamespaces ns ON ns.namespaceUri=A.nd.value('namespace-uri(.)','nvarchar(max)') 

    UNION ALL

    SELECT r.RecursionLevel+1
          ,1 
          ,ROW_NUMBER() OVER(ORDER BY A.nd)  
          ,CAST(CONCAT(r.SortString,REPLACE(STR(ROW_NUMBER() OVER(ORDER BY A.nd),5),' ','0')) AS VARCHAR(900)) COLLATE DATABASE_DEFAULT
          ,ns.Prefix
          ,ns.namespaceUri
          ,CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)'),'[',ROW_NUMBER() OVER(PARTITION BY CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)')) ORDER BY A.nd),']') 
          ,CONCAT(r.XPath,'/',ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)'),'[',ROW_NUMBER() OVER(PARTITION BY CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)')) ORDER BY A.nd),']') 
          ,A.nd.query('.') CurrentFragment
          ,A.nd.query('./*') NextFragment
    FROM recCte r
    CROSS APPLY NextFragment.nodes('*') A(nd)
    OUTER APPLY(SELECT Prefix,namespaceUri FROM AllNamespaces ns WHERE ns.namespaceUri=A.nd.value('namespace-uri(.)','nvarchar(max)')) ns
)
,WithValues AS
(
    SELECT r.RecursionLevel
          ,CASE WHEN LEN(B.NodeValue)>0 THEN 3 ELSE r.NodeType END AS NodeType
          ,r.ElementPosition
          ,CASE WHEN LEN(B.NodeValue)>0 THEN CONCAT(r.SortString,REPLACE(STR(ROW_NUMBER() OVER(PARTITION BY r.Xpath ORDER BY A.txt),5),' ','0')) ELSE r.SortString END AS SortString 
          ,r.CurrentPrefix 
          ,r.CurrentUri
          ,CASE WHEN LEN(B.NodeValue)>0 THEN 'text()' ELSE r.FullName END AS FullName
          ,r.XPath AS OrigXPath
          ,CASE WHEN LEN(B.NodeValue)>0 THEN CONCAT(r.XPath,'/text()[',ROW_NUMBER() OVER(PARTITION BY r.Xpath ORDER BY A.txt),']') ELSE r.XPath END AS XPath
          ,CASE WHEN LEN(B.NodeValue)>0 THEN B.NodeValue ELSE NULL END AS NodeValue
          ,r.CurrentFragment
          ,r.NextFragment
    FROM recCte r
    OUTER APPLY r.CurrentFragment.nodes('*/text()') A(txt)
    OUTER APPLY (SELECT A.txt.value('.','nvarchar(max)')) B(NodeValue)
)
,WithAttributes AS
(
    SELECT RecursionLevel
          ,NodeType
          ,ElementPosition
          ,SortString
          ,CurrentPrefix
          ,CurrentUri
          ,FullName
          ,XPath
          ,NodeValue
          ,CurrentFragment
          ,NextFragment 
    FROM WithValues

    UNION ALL

    SELECT wv.RecursionLevel
          ,2
          ,wv.ElementPosition
          ,wv.SortString 
          ,CASE WHEN ns.Prefix IS NOT NULL THEN ns.Prefix ELSE wv.CurrentPrefix END AS CurrentPrefix
          ,CASE WHEN ns.namespaceUri IS NOT NULL THEN ns.namespaceUri ELSE wv.CurrentUri END AS CurrentUri
          ,CONCAT('@',ns.Prefix+':',B.AttrName) AS FullName
          ,CONCAT(wv.OrigXPath,'/@',ns.Prefix+':',B.AttrName) AS XPath
          ,A.attr.value('.','nvarchar(max)') AS NodeValue
          ,wv.CurrentFragment
          ,wv.NextFragment
    FROM WithValues wv
    CROSS APPLY wv.CurrentFragment.nodes('*/@*') A(attr)
    CROSS APPLY (SELECT A.attr.value('local-name(.)','nvarchar(max)') AS AttrName
                       ,A.attr.value('.','nvarchar(max)') AS AttrValue
                       ,A.attr.value('namespace-uri(.)','nvarchar(max)') AS namespaceUri) B
    OUTER APPLY(SELECT Prefix,namespaceUri FROM AllNamespaces ns WHERE ns.namespaceUri=B.namespaceUri) ns
)

SELECT NodeType
      ,CurrentPrefix
      ,CurrentUri
      ,FullName
      ,XPath
      ,NodeValue 
FROM WithAttributes
WHERE NodeValue IS NOT NULL
ORDER BY SortString;

--结果

/*
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| NodeType | CurrentPrefix | CurrentUri | FullName   | XPath                                                                           | NodeValue                        |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 2        | ns2           | dummy1     | @test1     | /ns1:root[1]/ns2:level1[1]/@test1                                               | test1                            |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 2        | ns2           | dummy1     | @ns2:test2 | /ns1:root[1]/ns2:level1[1]/@ns2:test2                                           | test2                            |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 2        | ns3           | SomeOther  | @ns3:test3 | /ns1:root[1]/ns2:level1[1]/@ns3:test3                                           | test3                            |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3        | ns3           | SomeOther  | text()     | /ns1:root[1]/ns2:level1[1]/ns3:InnerElement[1]/text()[1]                        | Some inner element               |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3        | ns1           | defaultNs  | text()     | /ns1:root[1]/ns2:level1[1]/ns1:multiText[1]/text()[1]                           | text1                            |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3        | ns1           | defaultNs  | text()     | /ns1:root[1]/ns2:level1[1]/ns1:multiText[1]/ns1:someInner[1]/text()[1]          | blah                             |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3        | ns1           | defaultNs  | text()     | /ns1:root[1]/ns2:level1[1]/ns1:multiText[1]/text()[2]                           | text2                            |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3        | ns1           | defaultNs  | text()     | /ns1:root[1]/ns2:level1[1]/ns1:multiText[1]/text()[3]                           | text3                            |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3        | ns1           | defaultNs  | text()     | /ns1:root[1]/ns2:level1[1]/ns1:repeating[1]/text()[1]                           | rep 1                            |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3        | ns1           | defaultNs  | text()     | /ns1:root[1]/ns2:level1[1]/ns1:repeating[2]/text()[1]                           | rep 2                            |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 2        | ns1           | defaultNs  | @r2        | /ns1:root[1]/ns2:level1[1]/ns1:repeating[2]/@r2                                 | r2                               |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 2        | ns3           | SomeOther  | @r4        | /ns1:root[1]/ns2:level1[1]/ns3:repeating[1]/@r4                                 | r4                               |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3        | ns3           | SomeOther  | text()     | /ns1:root[1]/ns2:level1[1]/ns3:repeating[1]/text()[1]                           | rep 4                            |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3        | ns1           | defaultNs  | text()     | /ns1:root[1]/ns2:level1[1]/ns1:level2[1]/ns1:level3[2]/ns1:content[1]/text()[1] | Content in second level3 element |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 2        | ns1           | defaultNs  | @oneMore   | /ns1:root[1]/ns2:level1[1]/ns1:repeating[3]/@oneMore                            | oneMore                          |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
| 3        | ns1           | defaultNs  | text()     | /ns1:root[1]/ns2:level1[1]/ns1:repeating[3]/text()[1]                           | one more                         |
+----------+---------------+------------+------------+---------------------------------------------------------------------------------+----------------------------------+
*/

--只是为了表明,创建的 XPath 返回预期的结果(注意:我们必须使用我们自己的前缀 - 即使对于默认命名空间):

--Just to show, that the created XPaths return the expected (attention: We must use our own prefixes - even for the default namespace):

WITH XMLNAMESPACES( 'defaultNs' AS ns1
                   ,'dummy1'    AS ns2
                   ,'SomeOther' AS ns3)
SELECT @xml.value('/ns1:root[1]/ns2:level1[1]/ns1:multiText[1]/ns1:someInner[1]/text()[1]','nvarchar(max)') Is_blah
      ,@xml.value('/ns1:root[1]/ns2:level1[1]/ns1:level2[1]/ns1:level3[2]/ns1:content[1]/text()[1]','nvarchar(max)') Is_Content_in_second_level3_element
      ,@xml.value('/ns1:root[1]/ns2:level1[1]/ns1:repeating[3]/@oneMore','nvarchar(max)') Is_attribute_oneMore
      ,@xml.value('/ns1:root[1]/ns2:level1[1]/ns1:multiText[1]/text()[3]','nvarchar(max)') Is_3rd_text_in_multiText;

简单的想法:

  • 命名空间前缀可以由您自己定义.T-SQL 中没有可用的 XQuery 函数来查找实际前缀,因此我们只使用我们自己的前缀.底层 URI 很重要.
  • 第一个 cte 将创建一组所有出现的 URI,并将其与前缀一起返回.
  • 递归 CTE 将越来越深入地遍历 XML.只要带有 .nodes()APPLY 可以返回嵌套节点,这就会继续.
  • 一个 CTE 添加 text() 节点 - 如果有的话.
  • 一个 CTE 添加属性 - 如果有的话.
  • 全名以及完整的 XPath 都被串联起来.
  • NodeType 有助于区分元素 (=1)、属性 (=2) 和 text() (=3)
  • CAST 和 COLLATE 有助于避免数据类型不匹配(递归 CTE 对此非常挑剔).
  • 需要串联的 SortString 以确保输出中的顺序相同.
  • 您可以使用 SELECT * ... 查看所有返回的列...
  • 您可以在没有 WHERE NodeValue IS NOT NULL 的情况下查询此内容以查看更多空结构.
  • The namespace prefixes can be defined by your own. There is no XQuery-function available in T-SQL to find the actual prefix, so we just use our own prefixes. The underlying URI is important.
  • The first cte will create a set of all occuring URIs and return this together with a prefix.
  • The recursive CTE will traverse deeper and deeper into the XML. This will continue as long as APPLY with .nodes() can return nested nodes.
  • One CTE adds text() nodes - if there are any.
  • One CTE adds attributes - if there are any.
  • The full name is concatenated as well as the full XPath.
  • The NodeType helps to distinguish between elements (=1), attributes (=2) and text() (=3)
  • The CASTs and COLLATEs help to avoid data type mismatch (recursive CTEs are very picky with this).
  • The concatenated SortString is needed to ensure the same order in your output.
  • You might use SELECT * ... to see all returned colums...
  • You might query this without WHERE NodeValue IS NOT NULL to see more of the empty structure.

这篇关于将带有命名空间、属性等的未知 XML 读入完整的 EAV 列表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆