Why does parsing an XML document using MSXML v3.0 work, but MSXML v6.0 doesn't?
Question
I am working on a project that scrapes and collects data from many different sources around the internet, using different methods depending on each source's characteristics.
The most recent addition is a web API call which returns the following XML as a response:
<?xml version="1.0"?>
<Publication_MarketDocument xmlns="urn:iec62325.351:tc57wg16:451-3:publicationdocument:7:0">
<mRID>29b526a69b9445a7bb507ba446e3e8f9</mRID>
<revisionNumber>1</revisionNumber>
<type>A44</type>
<sender_MarketParticipant.mRID codingScheme="A01">10X1001A1001A450</sender_MarketParticipant.mRID>
<sender_MarketParticipant.marketRole.type>A32</sender_MarketParticipant.marketRole.type>
<receiver_MarketParticipant.mRID codingScheme="A01">10X1001A1001A450</receiver_MarketParticipant.mRID>
<receiver_MarketParticipant.marketRole.type>A33</receiver_MarketParticipant.marketRole.type>
<createdDateTime>2019-09-19T11:28:51Z</createdDateTime>
<period.timeInterval>
<start>2019-09-18T22:00Z</start>
<end>2019-09-19T22:00Z</end>
</period.timeInterval>
<TimeSeries>
<mRID>1</mRID>
<businessType>A62</businessType>
<in_Domain.mRID codingScheme="A01">10YCS-SERBIATSOV</in_Domain.mRID>
<out_Domain.mRID codingScheme="A01">10YCS-SERBIATSOV</out_Domain.mRID>
<currency_Unit.name>EUR</currency_Unit.name>
<price_Measure_Unit.name>MWH</price_Measure_Unit.name>
<curveType>A01</curveType>
<Period>
<timeInterval>
<start>2019-09-18T22:00Z</start>
<end>2019-09-19T22:00Z</end>
</timeInterval>
<resolution>PT60M</resolution>
<Point>
<position>1</position>
<price.amount>44.08</price.amount>
</Point>
<Point>
<position>2</position>
<price.amount>37.14</price.amount>
</Point>
<Point>
<position>3</position>
<price.amount>32.21</price.amount>
</Point>
<Point>
<position>4</position>
<price.amount>31.44</price.amount>
</Point>
<Point>
<position>5</position>
<price.amount>32.48</price.amount>
</Point>
<Point>
<position>6</position>
<price.amount>45.52</price.amount>
</Point>
<Point>
<position>7</position>
<price.amount>56.05</price.amount>
</Point>
<Point>
<position>8</position>
<price.amount>74.96</price.amount>
</Point>
<Point>
<position>9</position>
<price.amount>74.08</price.amount>
</Point>
<Point>
<position>10</position>
<price.amount>69.03</price.amount>
</Point>
<Point>
<position>11</position>
<price.amount>72.89</price.amount>
</Point>
<Point>
<position>12</position>
<price.amount>68.91</price.amount>
</Point>
<Point>
<position>13</position>
<price.amount>74.95</price.amount>
</Point>
<Point>
<position>14</position>
<price.amount>72.91</price.amount>
</Point>
<Point>
<position>15</position>
<price.amount>75.97</price.amount>
</Point>
<Point>
<position>16</position>
<price.amount>76.49</price.amount>
</Point>
<Point>
<position>17</position>
<price.amount>59.08</price.amount>
</Point>
<Point>
<position>18</position>
<price.amount>60.19</price.amount>
</Point>
<Point>
<position>19</position>
<price.amount>64.69</price.amount>
</Point>
<Point>
<position>20</position>
<price.amount>69.18</price.amount>
</Point>
<Point>
<position>21</position>
<price.amount>64.97</price.amount>
</Point>
<Point>
<position>22</position>
<price.amount>63.38</price.amount>
</Point>
<Point>
<position>23</position>
<price.amount>52.92</price.amount>
</Point>
<Point>
<position>24</position>
<price.amount>48.08</price.amount>
</Point>
</Period>
</TimeSeries>
</Publication_MarketDocument>
Having dealt successfully with similar situations using Microsoft XML, v6.0, I tried the following:
Dim respXML As New MSXML2.DOMDocument60
respXML.LoadXML (ThisWorkbook.Worksheets("Sheet2").Range("A1")) 'for the sake of the post's simplicity I'm loading the xml from excel
Debug.Print respXML.getElementsByTagName("price.amount").Length
This should return 24, but instead it returns 0. Indeed, the following:
Debug.Print respXML.getElementsByTagName("price.amount")(1) Is Nothing
returns True, which means that the <price.amount></price.amount> elements are not being found. However, Debug.Print respXML.XML yields the expected results.
I read somewhere that early binding could be causing problems, so I tried the following as well:
Dim respXML As Object
Set respXML = CreateObject("MSXML2.DOMDocument.6.0")
respXML.LoadXML (ThisWorkbook.Worksheets("Sheet2").Range("A1"))
Debug.Print respXML.getElementsByTagName("price.amount").Length
Debug.Print respXML.getElementsByTagName("price.amount")(1) Is Nothing
The result is still the same.
Switching to Microsoft XML, v3.0 resolves the issue completely.
However, I would prefer to stick with v6.0, since it is the version that is more actively maintained and supported.
Why does this happen? Does it have to do with the XML itself? Does it have to do with my code? Am I missing something? Is there a way to make it work with Microsoft XML, v6.0?
Any input would be appreciated.
Recommended answer
To extend @CindyMeister's answer, the issue does appear to be a difference in how the MSXML versions handle namespaces in getElementsByTagName(). Specifically, your XML declares a default namespace (an xmlns attribute without a colon-separated prefix), so every element in the document belongs to that namespace, and an XPath query needs a prefix bound to it in order to address those elements:
<Publication_MarketDocument xmlns="urn:iec62325.351:tc57wg16:451-3:publicationdocument:7:0" ...
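A quick way to confirm that the elements really are namespace-qualified is to inspect the root node. The following is only a minimal sketch: the Sub name InspectNamespace is hypothetical, the file path is the same placeholder used elsewhere in this answer, and it assumes a reference to Microsoft XML, v6.0 is set.

```vba
Sub InspectNamespace()
    ' Hypothetical diagnostic helper, not part of the original answer.
    Dim respXML As New MSXML2.DOMDocument60

    respXML.async = False                       ' load synchronously
    If Not respXML.Load("C:\Path\To\Input.xml") Then
        Debug.Print respXML.parseError.reason   ' report why loading failed
        Exit Sub
    End If

    With respXML.DocumentElement
        Debug.Print .nodeName       ' name exactly as written in the markup
        Debug.Print .baseName       ' local name, without any prefix
        Debug.Print .namespaceURI   ' namespace the element belongs to
    End With
End Sub
```

If namespaceURI prints the urn:iec62325... value, the elements are in a namespace even though no prefix appears in the markup.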
However, using SelectionNamespaces + SelectNodes to define a temporary alias, such as doc, for the default namespace, both libraries print the expected results. The MS docs even advise the latter method:
The getElementsByTagName method simulates the matching of the provided argument against the result of the tagName property of IXMLDOMElement. When executed, it does not recognize or support namespaces. Instead, you should use the selectNodes method, which is faster in some cases and can support more complex searches.
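As an aside, selectNodes can also match elements without registering any prefix by testing the XPath local-name() function. This is a sketch of that alternative rather than the approach used in this answer; the Sub name CountByLocalName is hypothetical, and the file path is a placeholder.

```vba
Sub CountByLocalName()
    ' Hypothetical example: namespace-agnostic lookup via local-name().
    Dim respXML As New MSXML2.DOMDocument60
    Dim nodes As MSXML2.IXMLDOMNodeList

    respXML.async = False                       ' load synchronously
    If Not respXML.Load("C:\Path\To\Input.xml") Then
        Debug.Print respXML.parseError.reason   ' report why loading failed
        Exit Sub
    End If

    ' Match any element whose local name is price.amount,
    ' regardless of which namespace it belongs to.
    Set nodes = respXML.SelectNodes("//*[local-name()='price.amount']")
    Debug.Print nodes.Length
End Sub
```

The trade-off is that this ignores the namespace entirely, so it would also match a price.amount element from an unrelated namespace if one were present; binding a prefix with SelectionNamespaces is the more precise choice.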
MSXML v3.0 (prints unexpected getElementsByTagName result)
Sub ParseXMLv3()
Dim respXML As New MSXML2.DOMDocument30
respXML.Load "C:\Path\To\Input.xml"
respXML.setProperty "SelectionLanguage", "XPath"
respXML.setProperty "SelectionNamespaces", "xmlns:doc='urn:iec62325.351:tc57wg16:451-3:publicationdocument:7:0'"
Debug.Print respXML.SelectNodes("//doc:price.amount").Length ' PRINTS 24
Debug.Print respXML.SelectNodes("//price.amount").Length ' PRINTS 0
Debug.Print respXML.getElementsByTagName("price.amount").Length ' PRINTS 24
Set respXML = Nothing
End Sub
MSXML v6.0
Sub ParseXMLv6()
Dim respXML As New MSXML2.DOMDocument60
respXML.Load "C:\Path\To\Input.xml"
respXML.setProperty "SelectionLanguage", "XPath"
respXML.setProperty "SelectionNamespaces", "xmlns:doc='urn:iec62325.351:tc57wg16:451-3:publicationdocument:7:0'"
Debug.Print respXML.SelectNodes("//doc:price.amount").Length ' PRINTS 24
Debug.Print respXML.SelectNodes("//price.amount").Length ' PRINTS 0
Debug.Print respXML.getElementsByTagName("price.amount").Length ' PRINTS 0
Set respXML = Nothing
End Sub
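Once the doc alias is registered, the same prefix can be used to read the values rather than just count them. The sketch below assumes the document structure shown in the question; the Sub name ListPrices is hypothetical, and the file path is a placeholder.

```vba
Sub ListPrices()
    ' Hypothetical example: read each Point's position and price.
    Dim respXML As New MSXML2.DOMDocument60
    Dim pt As MSXML2.IXMLDOMNode

    respXML.async = False                       ' load synchronously
    If Not respXML.Load("C:\Path\To\Input.xml") Then
        Debug.Print respXML.parseError.reason   ' report why loading failed
        Exit Sub
    End If

    ' Bind the temporary alias doc to the document's default namespace.
    respXML.setProperty "SelectionNamespaces", _
        "xmlns:doc='urn:iec62325.351:tc57wg16:451-3:publicationdocument:7:0'"

    ' Child lookups are relative to each Point and reuse the same alias.
    For Each pt In respXML.SelectNodes("//doc:Point")
        Debug.Print pt.SelectSingleNode("doc:position").Text, _
                    pt.SelectSingleNode("doc:price.amount").Text
    Next pt
End Sub
```

Every step of a path needs the prefix, including the relative lookups inside the loop, because each element in the document inherits the default namespace.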