How can I improve the speed of XML parsing in VBA


Problem description

I have a large XML file that needs to be parsed in VBA (Excel 2003 and 2007). There could be upwards of 11,000 'rows' of data in the XML file, with each 'row' having between 10 and 20 'columns'. Just parsing through and grabbing the data ends up being a huge task (5 to 7 minutes). I tried reading the XML and placing each 'row' into a dictionary (key = row number, value = row attributes), but this takes just as long.

It is taking forever to traverse the DOM. Is there a more efficient way?

Dim XMLDict ' assumed to be a Scripting.Dictionary created elsewhere

Sub ParseXML(ByRef RootNode As IXMLDOMNode)
    Dim Counter As Long
    Dim RowList As IXMLDOMNodeList
    Dim ColumnList As IXMLDOMNodeList
    Dim RowNode As IXMLDOMNode
    Dim ColumnNode As IXMLDOMNode
    Dim NodeValues As String

    Counter = 1
    Set RowList = RootNode.SelectNodes("Row")

    For Each RowNode In RowList
        Set ColumnList = RowNode.SelectNodes("Col")
        ' Reset per row; a Dim inside the loop does not clear the string,
        ' so without this each entry also carries every earlier row's values
        NodeValues = ""
        For Each ColumnNode In ColumnList
            NodeValues = NodeValues & "|" & ColumnNode.Attributes.getNamedItem("id").Text & ":" & ColumnNode.Text
        Next ColumnNode
        XMLDict.Add Counter, NodeValues
        Counter = Counter + 1
    Next RowNode
End Sub

Solution

You could try using SAX instead of DOM. SAX should be faster when all you are doing is reading through the document once and the document is of non-trivial size. The MSXML documentation includes a reference for its SAX2 implementation.

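I typically reach straight for the DOM for most XML parsing in Excel, but SAX seems to have advantages in some situations. In short, DOM builds the whole document tree in memory before you can query it, while SAX reads the document sequentially and raises an event for each element start, element end, and run of text.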

Here's a hacked-together example (partially based on an existing example) that just uses Debug.Print for output:

Add a reference to "Microsoft XML, v6.0" via Tools > References

Add this code in a normal module

Option Explicit

Sub main()

Dim saxReader As SAXXMLReader60
Dim saxhandler As ContentHandlerImpl

Set saxReader = New SAXXMLReader60
Set saxhandler = New ContentHandlerImpl

Set saxReader.contentHandler = saxhandler
saxReader.parseURL "file://C:\Users\foo\Desktop\bar.xml"

Set saxReader = Nothing

End Sub
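
To check whether SAX is actually faster on your own file, a timed variant of main can help. The following is only a sketch: the name TimedParse and the error label are made up for illustration, the file path is the same placeholder as above, and it assumes the ContentHandlerImpl class module described below already exists. It is meant to sit in the same standard module as main, so Option Explicit is not repeated.

```vba
' Hypothetical timed variant of main (not part of the original answer).
' Assumes a reference to "Microsoft XML, v6.0" and the ContentHandlerImpl
' class module described below.
Sub TimedParse()

    ' Placeholder path; point this at a real file
    Const XML_PATH As String = "file://C:\Users\foo\Desktop\bar.xml"

    Dim saxReader As SAXXMLReader60
    Dim saxHandler As ContentHandlerImpl
    Dim startTime As Single

    Set saxReader = New SAXXMLReader60
    Set saxHandler = New ContentHandlerImpl
    Set saxReader.contentHandler = saxHandler

    On Error GoTo ParseFailed

    startTime = Timer                  ' seconds elapsed since midnight
    saxReader.parseURL XML_PATH        ' raises a run-time error on malformed XML
    Debug.Print "Parsed in " & Format$(Timer - startTime, "0.00") & " s"

    Set saxReader = Nothing
    Exit Sub

ParseFailed:
    Debug.Print "Parse failed: " & Err.Description
    Set saxReader = Nothing

End Sub
```

Bear in mind that writing thousands of lines with Debug.Print is slow in itself, so the measured time includes output as well as parsing.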

Add a class module, call it ContentHandlerImpl and add the following code

Option Explicit

Implements IVBSAXContentHandler

Private lCounter As Long
Private sNodeValues As String
Private bGetChars As Boolean

Use the left-hand drop-down at the top of the module to choose "IVBSAXContentHandler" and then use the right-hand drop-down to add stubs for each event in turn (from characters to startPrefixMapping)

Add code to some of the stubs as follows

Explicitly set up the counter and the flag to show if we want to read text data at this time

Private Sub IVBSAXContentHandler_startDocument()

lCounter = 0
bGetChars = False

End Sub

Every time a new element starts, check the name of the element and take appropriate action

Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)

Select Case strLocalName
    Case "Row"
        sNodeValues = ""
    Case "Col"
        sNodeValues = sNodeValues & "|" & oAttributes.getValueFromName(strNamespaceURI, "id") & ":"
        bGetChars = True
    Case Else
        ' do nothing
End Select

End Sub

Check to see if we are interested in the text data and, if we are, chop off any extraneous white space and remove all line feeds (this may or may not be desirable depending on the document you are trying to parse)

Private Sub IVBSAXContentHandler_characters(strChars As String)

If (bGetChars) Then
    sNodeValues = sNodeValues & Replace(Trim$(strChars), vbLf, "")
End If

End Sub

If we have reached the end of a Col then stop reading the text values; if we have reached the end of a Row then print out the string of node values

Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)

Select Case strLocalName
    Case "Col"
        bGetChars = False
    Case "Row"
        lCounter = lCounter + 1
        Debug.Print lCounter & " " & sNodeValues
    Case Else
        ' do nothing
End Select

End Sub


To make things clearer, here is the full version of ContentHandlerImpl with all of the stub methods in place:

Option Explicit

Implements IVBSAXContentHandler

Private lCounter As Long
Private sNodeValues As String
Private bGetChars As Boolean

Private Sub IVBSAXContentHandler_characters(strChars As String)

If (bGetChars) Then
    sNodeValues = sNodeValues & Replace(Trim$(strChars), vbLf, "")
End If

End Sub

Private Property Set IVBSAXContentHandler_documentLocator(ByVal RHS As MSXML2.IVBSAXLocator)

End Property

Private Sub IVBSAXContentHandler_endDocument()

End Sub

Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)

Select Case strLocalName
    Case "Col"
        bGetChars = False
    Case "Row"
        lCounter = lCounter + 1
        Debug.Print lCounter & " " & sNodeValues
    Case Else
        ' do nothing
End Select

End Sub

Private Sub IVBSAXContentHandler_endPrefixMapping(strPrefix As String)

End Sub

Private Sub IVBSAXContentHandler_ignorableWhitespace(strChars As String)

End Sub

Private Sub IVBSAXContentHandler_processingInstruction(strTarget As String, strData As String)

End Sub

Private Sub IVBSAXContentHandler_skippedEntity(strName As String)

End Sub

Private Sub IVBSAXContentHandler_startDocument()

lCounter = 0
bGetChars = False

End Sub

Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)

Select Case strLocalName
    Case "Row"
        sNodeValues = ""
    Case "Col"
        sNodeValues = sNodeValues & "|" & oAttributes.getValueFromName(strNamespaceURI, "id") & ":"
        bGetChars = True
    Case Else
        ' do nothing
End Select

End Sub

Private Sub IVBSAXContentHandler_startPrefixMapping(strPrefix As String, strURI As String)

End Sub
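
The question asked for the rows to end up in a dictionary rather than in the Immediate window. One way to adapt the handler is sketched below, under some assumptions: the class name RowCollectingHandler, the Rows property and the sColText buffer are all invented for illustration, and the dictionary is a late-bound Scripting.Dictionary. It also buffers the text of each Col and trims it once at the end of the element, because a SAX parser may deliver the text of a single element in several characters calls.

```vba
' Class module: RowCollectingHandler (hypothetical name, not from the original answer)
Option Explicit

Implements IVBSAXContentHandler

Private dRows As Object            ' late-bound Scripting.Dictionary
Private lCounter As Long
Private sNodeValues As String
Private sColText As String         ' text collected for the current Col
Private bGetChars As Boolean

' Key = row number, value = "|id:text|id:text..." for that row
Public Property Get Rows() As Object
    Set Rows = dRows
End Property

Private Sub IVBSAXContentHandler_startDocument()
    Set dRows = CreateObject("Scripting.Dictionary")
    lCounter = 0
    bGetChars = False
End Sub

Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)
    Select Case strLocalName
        Case "Row"
            sNodeValues = ""
        Case "Col"
            sNodeValues = sNodeValues & "|" & oAttributes.getValueFromName(strNamespaceURI, "id") & ":"
            sColText = ""
            bGetChars = True
    End Select
End Sub

Private Sub IVBSAXContentHandler_characters(strChars As String)
    ' Only buffer here; the text may arrive in more than one chunk
    If bGetChars Then sColText = sColText & strChars
End Sub

Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)
    Select Case strLocalName
        Case "Col"
            ' Trim and strip line feeds once the whole text is known
            sNodeValues = sNodeValues & Replace(Trim$(sColText), vbLf, "")
            bGetChars = False
        Case "Row"
            lCounter = lCounter + 1
            dRows.Add lCounter, sNodeValues
    End Select
End Sub

' Remaining interface members are required but unused
Private Property Set IVBSAXContentHandler_documentLocator(ByVal RHS As MSXML2.IVBSAXLocator)
End Property

Private Sub IVBSAXContentHandler_endDocument()
End Sub

Private Sub IVBSAXContentHandler_endPrefixMapping(strPrefix As String)
End Sub

Private Sub IVBSAXContentHandler_ignorableWhitespace(strChars As String)
End Sub

Private Sub IVBSAXContentHandler_processingInstruction(strTarget As String, strData As String)
End Sub

Private Sub IVBSAXContentHandler_skippedEntity(strName As String)
End Sub

Private Sub IVBSAXContentHandler_startPrefixMapping(strPrefix As String, strURI As String)
End Sub
```

To use it, declare the handler in main as RowCollectingHandler instead of ContentHandlerImpl; once parseURL returns, the handler's Rows property holds the dictionary, so Rows.Count gives the number of rows read and Rows.Keys can be looped over to retrieve each row's string.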
