How can I improve the speed of XML parsing in VBA
Question
I have a large XML file that needs to be parsed in VBA (Excel 2003 and 2007). There could be upwards of 11,000 'rows' of data in the XML file, with each 'row' having between 10 and 20 'columns'. Parsing through it and grabbing the data ends up being a huge task (5 to 7 minutes). I tried reading the XML and placing each 'row' into a dictionary (key = row number, value = row attributes), but this takes just as long.
It is taking forever to traverse the DOM. Is there a more efficient way?
    Dim XMLDict

    Sub ParseXML(ByRef RootNode As IXMLDOMNode)
        Dim Counter As Long
        Dim RowList As IXMLDOMNodeList
        Dim ColumnList As IXMLDOMNodeList
        Dim RowNode As IXMLDOMNode
        Dim ColumnNode As IXMLDOMNode

        Counter = 1
        Set RowList = RootNode.SelectNodes("Row")

        For Each RowNode In RowList
            Set ColumnList = RowNode.SelectNodes("Col")
            Dim NodeValues As String

            For Each ColumnNode In ColumnList
                NodeValues = NodeValues & "|" & ColumnNode.Attributes.getNamedItem("id").Text & ":" & ColumnNode.Text
            Next ColumnNode

            XMLDict.Add Counter, NodeValues
            Counter = Counter + 1
        Next RowNode
    End Sub
Solution
You could try using SAX instead of DOM. SAX should be faster when all you are doing is parsing the document and the document is non-trivial in size. The reference for the SAX2 implementation in MSXML is here
I typically reach straight for the DOM for most XML parsing in Excel but SAX seems to have advantages in some situations. The short comparison here might help to explain the differences between them.
Here's a hacked-together example (partially based on this) just using Debug.Print for output:
Add a reference to "Microsoft XML, v6.0" via Tools > References
Add this code in a normal module
    Option Explicit

    Sub main()
        Dim saxReader As SAXXMLReader60
        Dim saxhandler As ContentHandlerImpl

        Set saxReader = New SAXXMLReader60
        Set saxhandler = New ContentHandlerImpl

        ' Route the reader's parse events to our handler class
        Set saxReader.contentHandler = saxhandler
        saxReader.parseURL "file://C:\Users\foo\Desktop\bar.xml"

        Set saxReader = Nothing
    End Sub
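Since the question is about speed, it can help to time the SAX parse and compare it with the DOM version. The following is a minimal sketch, assuming the same reference, file path and ContentHandlerImpl class as in this answer; the procedure name TimedMain is made up for illustration and is not part of the original answer.

```vba
Option Explicit

' Hypothetical variant of main() that reports the elapsed parse time.
Sub TimedMain()
    Dim saxReader As SAXXMLReader60
    Dim saxhandler As ContentHandlerImpl
    Dim sngStart As Single

    Set saxReader = New SAXXMLReader60
    Set saxhandler = New ContentHandlerImpl
    Set saxReader.contentHandler = saxhandler

    ' Timer returns the number of seconds elapsed since midnight
    sngStart = Timer
    saxReader.parseURL "file://C:\Users\foo\Desktop\bar.xml"
    Debug.Print "Parse took " & Format$(Timer - sngStart, "0.00") & " s"

    Set saxReader = Nothing
End Sub
```

Note that the measured time includes whatever the handler does for each row, so a Debug.Print per row counts towards the total.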
Add a class module, call it ContentHandlerImpl and add the following code

    Option Explicit

    Implements IVBSAXContentHandler

    Private lCounter As Long
    Private sNodeValues As String
    Private bGetChars As Boolean
Use the left-hand drop-down at the top of the module to choose "IVBSAXContentHandler" and then use the right-hand drop-down to add stubs for each event in turn (from characters to startPrefixMapping)
Add code to some of the stubs as follows
Explicitly set up the counter and the flag to show if we want to read text data at this time
    Private Sub IVBSAXContentHandler_startDocument()
        lCounter = 0
        bGetChars = False
    End Sub
Every time a new element starts, check the name of the element and take appropriate action
    Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)
        Select Case strLocalName
            Case "Row"
                ' New row: start with an empty string of values
                sNodeValues = ""
            Case "Col"
                ' New column: record its id and start collecting its text
                sNodeValues = sNodeValues & "|" & oAttributes.getValueFromName(strNamespaceURI, "id") & ":"
                bGetChars = True
            Case Else
                ' do nothing
        End Select
    End Sub
Check to see if we are interested in the text data and, if we are, chop off any extraneous white space and remove all line feeds (this may or may not be desirable depending on the document you are trying to parse)
    Private Sub IVBSAXContentHandler_characters(strChars As String)
        If (bGetChars) Then
            sNodeValues = sNodeValues & Replace(Trim$(strChars), vbLf, "")
        End If
    End Sub
If we have reached the end of a Col then stop reading the text values; if we have reached the end of a Row then print out the string of node values

    Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)
        Select Case strLocalName
            Case "Col"
                bGetChars = False
            Case "Row"
                lCounter = lCounter + 1
                Debug.Print lCounter & " " & sNodeValues
            Case Else
                ' do nothing
        End Select
    End Sub
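One thing to keep in mind with the characters event: SAX parsers are allowed to deliver the text of a single element in several chunks, so trimming each chunk separately could drop spaces that fall at a chunk boundary. A possible variation, sketched below, buffers the raw text and cleans it up once the Col ends. The field sColText is an assumption added for illustration and is not part of the original answer; these three handlers would replace the corresponding ones in ContentHandlerImpl, and the cleanup is applied to the whole value rather than per chunk.

```vba
' Fragment for the ContentHandlerImpl class module.
' Assumed extra field; it belongs with the other module-level declarations.
Private sColText As String

Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)
    Select Case strLocalName
        Case "Row"
            sNodeValues = ""
        Case "Col"
            sNodeValues = sNodeValues & "|" & oAttributes.getValueFromName(strNamespaceURI, "id") & ":"
            sColText = ""          ' start a fresh buffer for this column
            bGetChars = True
    End Select
End Sub

Private Sub IVBSAXContentHandler_characters(strChars As String)
    ' Only accumulate here; the text may arrive in more than one call
    If bGetChars Then sColText = sColText & strChars
End Sub

Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)
    Select Case strLocalName
        Case "Col"
            ' Strip line feeds and outer white space from the complete value
            sNodeValues = sNodeValues & Trim$(Replace(sColText, vbLf, ""))
            bGetChars = False
        Case "Row"
            lCounter = lCounter + 1
            Debug.Print lCounter & " " & sNodeValues
    End Select
End Sub
```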
To make things clearer, here is the full version of ContentHandlerImpl with all of the stub methods in place:

    Option Explicit

    Implements IVBSAXContentHandler

    Private lCounter As Long
    Private sNodeValues As String
    Private bGetChars As Boolean

    Private Sub IVBSAXContentHandler_characters(strChars As String)
        If (bGetChars) Then
            sNodeValues = sNodeValues & Replace(Trim$(strChars), vbLf, "")
        End If
    End Sub

    Private Property Set IVBSAXContentHandler_documentLocator(ByVal RHS As MSXML2.IVBSAXLocator)
    End Property

    Private Sub IVBSAXContentHandler_endDocument()
    End Sub

    Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)
        Select Case strLocalName
            Case "Col"
                bGetChars = False
            Case "Row"
                lCounter = lCounter + 1
                Debug.Print lCounter & " " & sNodeValues
            Case Else
                ' do nothing
        End Select
    End Sub

    Private Sub IVBSAXContentHandler_endPrefixMapping(strPrefix As String)
    End Sub

    Private Sub IVBSAXContentHandler_ignorableWhitespace(strChars As String)
    End Sub

    Private Sub IVBSAXContentHandler_processingInstruction(strTarget As String, strData As String)
    End Sub

    Private Sub IVBSAXContentHandler_skippedEntity(strName As String)
    End Sub

    Private Sub IVBSAXContentHandler_startDocument()
        lCounter = 0
        bGetChars = False
    End Sub

    Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)
        Select Case strLocalName
            Case "Row"
                sNodeValues = ""
            Case "Col"
                sNodeValues = sNodeValues & "|" & oAttributes.getValueFromName(strNamespaceURI, "id") & ":"
                bGetChars = True
            Case Else
                ' do nothing
        End Select
    End Sub

    Private Sub IVBSAXContentHandler_startPrefixMapping(strPrefix As String, strURI As String)
    End Sub
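The question wanted each row stored in a dictionary rather than printed. One way this might be done is sketched below, assuming a late-bound Scripting.Dictionary (Windows only) and two made-up names, dictRows and RowData, neither of which appears in the original answer. The two handlers would replace the matching ones in ContentHandlerImpl; all other stubs stay as listed above.

```vba
' Fragment for the ContentHandlerImpl class module.
' Assumed extra field; it belongs with the other module-level declarations.
Private dictRows As Object

' Hypothetical accessor so the calling module can read the parsed rows
Public Property Get RowData() As Object
    Set RowData = dictRows
End Property

Private Sub IVBSAXContentHandler_startDocument()
    lCounter = 0
    bGetChars = False
    ' Create a fresh dictionary for every document that is parsed
    Set dictRows = CreateObject("Scripting.Dictionary")
End Sub

Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)
    Select Case strLocalName
        Case "Col"
            bGetChars = False
        Case "Row"
            ' key = row number, value = "|id:text|id:text..." for that row
            lCounter = lCounter + 1
            dictRows.Add lCounter, sNodeValues
    End Select
End Sub
```

After parseURL returns in main, something like Debug.Print saxhandler.RowData.Count should then report how many rows were collected. Dropping the per-row Debug.Print also removes output work from the parse itself.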