Why does parsing an XML document using MSXML v3.0 work, but MSXML v6.0 doesn't?

Question

So, I am working on a project that scrapes and collects data from many different sources around the internet with many different methods depending on each source's characteristics.

The most recent addition is a web API call which returns the following XML as a response:

<?xml version="1.0"?>
<Publication_MarketDocument xmlns="urn:iec62325.351:tc57wg16:451-3:publicationdocument:7:0">
    <mRID>29b526a69b9445a7bb507ba446e3e8f9</mRID>
    <revisionNumber>1</revisionNumber>
    <type>A44</type>
    <sender_MarketParticipant.mRID codingScheme="A01">10X1001A1001A450</sender_MarketParticipant.mRID>
    <sender_MarketParticipant.marketRole.type>A32</sender_MarketParticipant.marketRole.type>
    <receiver_MarketParticipant.mRID codingScheme="A01">10X1001A1001A450</receiver_MarketParticipant.mRID>
    <receiver_MarketParticipant.marketRole.type>A33</receiver_MarketParticipant.marketRole.type>
    <createdDateTime>2019-09-19T11:28:51Z</createdDateTime>
    <period.timeInterval>
        <start>2019-09-18T22:00Z</start>
        <end>2019-09-19T22:00Z</end>
    </period.timeInterval>
    <TimeSeries>
        <mRID>1</mRID>
        <businessType>A62</businessType>
        <in_Domain.mRID codingScheme="A01">10YCS-SERBIATSOV</in_Domain.mRID>
        <out_Domain.mRID codingScheme="A01">10YCS-SERBIATSOV</out_Domain.mRID>
        <currency_Unit.name>EUR</currency_Unit.name>
        <price_Measure_Unit.name>MWH</price_Measure_Unit.name>
        <curveType>A01</curveType>
        <Period>
            <timeInterval>
                <start>2019-09-18T22:00Z</start>
                <end>2019-09-19T22:00Z</end>
            </timeInterval>
            <resolution>PT60M</resolution>
            <Point>
                <position>1</position>
                <price.amount>44.08</price.amount>
            </Point>
            <Point>
                <position>2</position>
                <price.amount>37.14</price.amount>
            </Point>
            <Point>
                <position>3</position>
                <price.amount>32.21</price.amount>
            </Point>
            <Point>
                <position>4</position>
                <price.amount>31.44</price.amount>
            </Point>
            <Point>
                <position>5</position>
                <price.amount>32.48</price.amount>
            </Point>
            <Point>
                <position>6</position>
                <price.amount>45.52</price.amount>
            </Point>
            <Point>
                <position>7</position>
                <price.amount>56.05</price.amount>
            </Point>
            <Point>
                <position>8</position>
                <price.amount>74.96</price.amount>
            </Point>
            <Point>
                <position>9</position>
                <price.amount>74.08</price.amount>
            </Point>
            <Point>
                <position>10</position>
                <price.amount>69.03</price.amount>
            </Point>
            <Point>
                <position>11</position>
                <price.amount>72.89</price.amount>
            </Point>
            <Point>
                <position>12</position>
                <price.amount>68.91</price.amount>
            </Point>
            <Point>
                <position>13</position>
                <price.amount>74.95</price.amount>
            </Point>
            <Point>
                <position>14</position>
                <price.amount>72.91</price.amount>
            </Point>
            <Point>
                <position>15</position>
                <price.amount>75.97</price.amount>
            </Point>
            <Point>
                <position>16</position>
                <price.amount>76.49</price.amount>
            </Point>
            <Point>
                <position>17</position>
                <price.amount>59.08</price.amount>
            </Point>
            <Point>
                <position>18</position>
                <price.amount>60.19</price.amount>
            </Point>
            <Point>
                <position>19</position>
                <price.amount>64.69</price.amount>
            </Point>
            <Point>
                <position>20</position>
                <price.amount>69.18</price.amount>
            </Point>
            <Point>
                <position>21</position>
                <price.amount>64.97</price.amount>
            </Point>
            <Point>
                <position>22</position>
                <price.amount>63.38</price.amount>
            </Point>
            <Point>
                <position>23</position>
                <price.amount>52.92</price.amount>
            </Point>
            <Point>
                <position>24</position>
                <price.amount>48.08</price.amount>
            </Point>
        </Period>
    </TimeSeries>
</Publication_MarketDocument> 

Having dealt successfully with situations like that using Microsoft XML, v6.0 I tried the following:

Dim respXML As New MSXML2.DOMDocument60
respXML.LoadXML (ThisWorkbook.Worksheets("Sheet2").Range("A1")) 'for the sake of the post's simplicity I'm loading the xml from excel
Debug.Print respXML.getElementsByTagName("price.amount").Length

This should be returning 24 but instead it returns 0. Indeed the following:

Debug.Print respXML.getElementsByTagName("price.amount")(1) Is Nothing

returns True, which means that the <price.amount></price.amount> elements are not being found. However, Debug.Print respXML.XML yields the expected results.

I read somewhere that early binding could be causing problems so I tried the following as well:

Dim respXML As Object
Set respXML = CreateObject("MSXML2.DOMDocument.6.0")
respXML.LoadXML (ThisWorkbook.Worksheets("Sheet2").Range("A1"))
Debug.Print respXML.getElementsByTagName("price.amount").Length
Debug.Print respXML.getElementsByTagName("price.amount")(1) Is Nothing

The results are still the same.

Switching to Microsoft XML, v3.0 resolves the issue completely.

However, I would prefer sticking to v6.0 since it's the one that is more actively being maintained and supported.

Why does this happen? Does it have to do with the XML itself? Does it have to do with my code? Am I missing something? Is there a way to make it work with Microsoft XML, v6.0?

Any input would be greatly appreciated.

Recommended answer

To extend @CindyMeister's answer, the issue does appear to be a difference in how the MSXML versions handle namespaces in getElementsByTagName(). Specifically, your XML declares a default namespace (an xmlns attribute with no colon-delimited prefix), so a namespace-aware query has to bind a prefix of its own to that URI:

<Publication_MarketDocument xmlns="urn:iec62325.351:tc57wg16:451-3:publicationdocument:7:0" ...

However, when SelectionNamespaces is used to bind a temporary alias, such as doc, to the default namespace URI and the query is run through SelectNodes, both libraries print the expected results. The MS docs even advise the latter method (emphasis added):

The getElementsByTagName method simulates the matching of the provided argument against the result of the tagName property of IXMLDOMElement. When executed, it does not recognize or support namespaces. Instead, you should use the selectNodes method, which is faster in some cases and can support more complex searches.
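
As an aside, selectNodes can also match the elements without registering any prefix, by comparing only the local part of the element name with the standard XPath function local-name(). This is a different technique from the SelectionNamespaces approach recommended in this answer, and the following is only a minimal sketch: the file path is a placeholder, the procedure name is made up for illustration, and late binding is assumed so that no project reference is needed.

```vba
Sub CountPricesByLocalName()
    ' Sketch only: placeholder path, hypothetical procedure name.
    Dim respXML As Object
    Set respXML = CreateObject("MSXML2.DOMDocument.6.0")

    respXML.async = False
    If Not respXML.Load("C:\Path\To\Input.xml") Then
        ' Load returns False when the file is missing or not well-formed
        Debug.Print "Parse error: " & respXML.parseError.reason
        Exit Sub
    End If

    ' local-name() compares the element name with its namespace ignored
    Debug.Print respXML.SelectNodes("//*[local-name()='price.amount']").Length

    Set respXML = Nothing
End Sub
```

This trades precision for convenience: it would also match a price.amount element from any other namespace, so binding a prefix is the safer choice when the namespace URI is known.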

MSXML v3.0 (prints unexpected getElementsByTagName result)

Sub ParseXMLv3()
    Dim respXML As New MSXML2.DOMDocument30

    respXML.Load "C:\Path\To\Input.xml"
    respXML.setProperty "SelectionLanguage", "XPath"
    respXML.setProperty "SelectionNamespaces", "xmlns:doc='urn:iec62325.351:tc57wg16:451-3:publicationdocument:7:0'"

    Debug.Print respXML.SelectNodes("//doc:price.amount").Length       ' PRINTS 24
    Debug.Print respXML.SelectNodes("//price.amount").Length           ' PRINTS 0
    Debug.Print respXML.getElementsByTagName("price.amount").Length    ' PRINTS 24

    Set respXML = Nothing
End Sub

MSXML v6.0

Sub ParseXMLv6()
    Dim respXML As New MSXML2.DOMDocument60

    respXML.Load "C:\Path\To\Input.xml"
    respXML.setProperty "SelectionLanguage", "XPath"
    respXML.setProperty "SelectionNamespaces", "xmlns:doc='urn:iec62325.351:tc57wg16:451-3:publicationdocument:7:0'"

    Debug.Print respXML.SelectNodes("//doc:price.amount").Length       ' PRINTS 24
    Debug.Print respXML.SelectNodes("//price.amount").Length           ' PRINTS 0
    Debug.Print respXML.getElementsByTagName("price.amount").Length    ' PRINTS 0

    Set respXML = Nothing
End Sub
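
Once the alias is bound, the same namespace-qualified XPath can be used to read the values rather than just count them. The following is a minimal sketch, not a definitive implementation: it assumes the sample XML is saved at the same placeholder path as above, that a reference to Microsoft XML, v6.0 is set, and the procedure name ListPrices is made up for illustration.

```vba
Sub ListPrices()
    ' Sketch only: placeholder path, hypothetical procedure name.
    Const NS As String = "urn:iec62325.351:tc57wg16:451-3:publicationdocument:7:0"

    Dim respXML As New MSXML2.DOMDocument60
    Dim pt As MSXML2.IXMLDOMNode

    respXML.async = False
    If Not respXML.Load("C:\Path\To\Input.xml") Then
        ' Load returns False when the file is missing or not well-formed
        Debug.Print "Parse error: " & respXML.parseError.reason
        Exit Sub
    End If

    ' "doc" is an arbitrary alias bound to the document's default namespace
    respXML.setProperty "SelectionNamespaces", "xmlns:doc='" & NS & "'"

    ' Child lookups are relative to each Point and need the same alias
    For Each pt In respXML.SelectNodes("//doc:Point")
        Debug.Print pt.SelectSingleNode("doc:position").Text, _
                    pt.SelectSingleNode("doc:price.amount").Text
    Next pt

    Set respXML = Nothing
End Sub
```

Every step of the path needs the alias, including the relative lookups inside the loop, because all elements in the document inherit the default namespace from the root.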
