使用 VBscript 从节点和子节点的 XML 中提取数据 [英] using VBscript to extract data from XML from nodes and children

查看:30
本文介绍了使用 VBscript 从节点和子节点的 XML 中提取数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

第一次在这里发帖,对 VBscript 有点新手.我真的可以从你们那里得到一些帮助,他们知道这就像第二天性一样.我已尝试包含一些相关信息,但希望不会太多.

first time poster here and somewhat newbie to VBscript. I could really use some help from you guys who know this like second nature. I've tried to include some pertinent information and hopefully not too much.

我一直在努力让它发挥作用,经过几天的尝试和十几次代码迭代后,我终于联系上了.我还没有找到从 XML 文档中的多个级别(noes 和 chidlren)提取数据的示例.

I've been trying to get this to work and am finally reaching out after a few days time of attempts and a dozen iterations of code. I haven't found examples of extracting data from multiple levels (noes and chidlren) within the XML document.

我的任务是使用 VBScript 从 XML 文件中提取数据.具体项目有:年份、账号、当前到期金额、是否有拖欠?(真/假)和格式化权证编号.

I've been tasked with extracting data from an XML file using VBScript. the specific items are: Year, Account Number, Current Amount Due, Has Delinquent? (true/false) and Formatted Warrant Number.

XML 文件的格式如下,其中有 1,000 到 10,000 多个节点填充了此数据,其中还有大量杂项"节点.

The format of the XML file is as below, with anywhere from 1,000 to 10,000+ nodes filled with this data along with plenty of 'misc' nodes in there as well.

  <BillData>
    <BillHeader>
      <Year>2010</Year>
      <misc></misc>
      <misc2></misc2>
      <misc3></misc3>
      <AcctNumber>0002566129</AcctNumber>
      <misc4></misc4>
      <PayAmounts>
         <CurrentAmountDue>133.06</CurrentAmountDue>
         <misc5></misc5>
      </PayAmounts>
      <misc6></misc6>
      <HasDelinquents>true</HasDelinquents>
      <WarrantInfo>
         <FormattedWarrantNumber>201115447</FormattedWarrantNumber>
      </WarrantInfo>
     </BillHeader>
   </BillData>

CurrentAmountDue 和 FormattedWarrantNumber 可能并不总是存在.我并不是说它们是空白的,而是 CurrentAmountDue 的整个条目可能会丢失,如下所示.

CurrentAmountDue and FormattedWarrantNumber may not always be present. by this I dont mean they are blank, but the entire entry of CurrentAmountDue may be missing, as shown below.

<PayAmounts>
   <misc5></misc5>
</PayAmounts>

我需要将此数据提取到逗号分隔的文本文件中.如果数据不存在,那么我只需要插入命令,所以当输出最终导入到 Excel 时,可以注意到它是空白的.

I need the extract this data to a comma separated text file. If the data is not present then I just need to insert the comman, so when the output is eventually imported to Excel it can be noted to be blank.

我面临的挑战是进入不同的子节点并正确提取数据.我似乎无法正确选择不同的节点.

The challenge for me is to get into the different child nodes and extract the data correctly. I can't seem to select the different nodes correctly.

这些是我用作参考的一些链接,但似乎无法使其正常工作.

These are some links I've used as reference, but can't seem to get it working.

http://technet.microsoft.com/en-us/杂志/2007.02.heyscriptingguy.aspx这似乎是前进的方向,但我收到一个错误此处需要节点测试":

http://technet.microsoft.com/en-us/magazine/2007.02.heyscriptingguy.aspx this seemed to be the direction to go in, but I get an error "Node Test Expected Here":

  Set colNodes=xmlDoc.SelectNodes("/BillData/BillHeader/*" (Year | Account | CurrentAmountDue)")

我在 Stack 上找到了一篇文章,建议使用下面的这种技术,但是一旦我超过两个值,它就对我不起作用,而我有更多.我猜这是因为 CurrentAmountDue 和 FormattedWarrantNumber 可以说是 XML 的更深层次.

I found a post on Stack which suggested using this technique below, but it doesn't work for me once I get past two values, whereas I have more. I'm guessing this is due to the CurrentAmountDue and FormattedWarrantNumber are deeper levels into the XML so to speak.

  strQuery = "/BillData/BillHeader/ " & _
  "[name()='Year' or name()='AccountNumber' or name()='HasDelinquents' or name()='CurrentAmountDue' or name()='FormattedWarrantNumber']"

令我惊讶的是,我能够让它返回一些值,但不是所有值都在同一个循环中,所以我的输出关闭(第一行只显示年份,最后一行丢失)并且只是一个逗号.

To my surprise, I am able to get this to return some values but not all on the same loop so my output is off (first line will only display year, last line is missing) and is just a comma.

   strQuery = "/BillData/BillHeader/*"
   Set colNodes=xmlDoc.selectNodes(strQuery)
   For Each objNode in colNodes 

   ' some lame if then statements that get the values, but this can't be the correct approach!
   ' these three items (Year, Account and HasDelinquents are under each BillHeader as far as I can tell, but this doesn't seem to be the most effective method.
     if objNode.nodeName = "Year" then strYear = objNode.text  
     if objNode.nodeName = "Account" then strAccount = objNode.text 
     if objNode.nodeName = "HasDelinquents" then strHasDelq = objNode.text 

          for each CurrentAmt in objNode.SelectNodes("./CurrentAmountDue")
                strCurrAmt = CurrentAmt.text
                ' i finally got a value here when I use msgbox to view it.'
          next

          for each WarrantNum in objNode.SelectNodes("./FormattedWarrantNumber")
                strWarNum = WarrantNum.text   
                ' getting this value also when I use msgbox to view it.
          next
   next

所以你可以看到我的尝试是徒劳的.

So you can see my attempts are futile.

我也试过在下面插入这一行.我把它放在最后一个 NEXT 之前,但它没有按预期工作.我还尝试在写入文件之前插入一些 IF-Then 语句来检查 Year 和 Account 中的值,然后在写入文件后清除这些值.这几乎奏效了,但我的第一行和最后一行没有产生正确的数据.

I also tried insert this line below. I put it just before the last NEXT, but it didn't work as intended. I also attempted to insert some IF-Then statements to check for values in Year and Account before writing to the file and then clearing out the values after writing to the file. That almost worked, but my first line and last lines are not producing correct data.

     objFileToWrite.WriteLine(strYear & "," & strAccount & "," & strCurrAmt & "," & strHasDelq & "," & strWarNum)

好的,既然您已经对我在史前编写此代码的尝试产生了兴趣,您能帮我一把吗?:)如果需要其他任何东西,请告诉我.感谢您投入的任何时间.我知道你们中的一些人可能可以轻松解决这个问题.

ok now that you've had a giggle with my prehistoric attempt at coding this, can you lend me a hand? :) let me know if anything else is needed. thanks for any time invested. I know some of you can likely kick this out with ease.

推荐答案

针对前半部分问题的低技术设计模式" -创建和写入 .CSV/.TXT 文件 - 是:

The low-tech 'design pattern' for the first half of your problem - creating and writing to a .CSV/.TXT file - is:

Get an FSO
Open traget file for writing
WriteLine Header (optional)
Loop over your data to export
    Create empty Array (elements ~ columns)
    Fill elements (if possible)
    WriteLine Join(Array, Delimiter) to traget file
Close file

在代码中:

  Option Explicit
  Dim oFS     : Set oFS = CreateObject("Scripting.FileSystemObject")
  Dim sFSpec  : sFSpec  = "..\data\step00.csv"
  Dim sDelim  : sDelim  = ";"
  Dim aFields : aFields = Split("Yr ANum Amnt Delq FWNum")
  Dim oTS     : Set oTS = oFS.CreateTextFile(sFSpec)
  Dim nRecs   : nRecs   = 10
  Dim nRec
  oTS.WriteLine Join(aFields, sDelim)
  For nRec = 1 To nRecs
      ReDim aData(UBound(aFields))
      aData(0) = nRec
      If nRec Mod 2 Then aData(1) = "odd"

      oTS.WriteLine Join(aData, sDelim)
  Next
  oTS.Close

  WScript.Echo oFS.OpenTextFile(sFSpec).ReadAll()

输出:

Yr;ANum;Amnt;Delq;FWNum
1;odd;;;
2;;;;
3;odd;;;
4;;;;
5;odd;;;
6;;;;
7;odd;;;
8;;;;
9;odd;;;
10;;;;

请标出两者的区别

oTS.WriteLine Join(aData, sDelim)

objFileToWrite.WriteLine(strYear & "," & strAccount & "," & strCurrAmt & "," & strHasDelq & "," & strWarNum)
(spurious param list (), btw)

第二部分的骨架 - 循环结构化 XML - 应该看起来像这样

A skeleton for the second part - looping over structured XML - should look like this

Get an msxml2.domdocument
Configure
Load .XML file
If error
   deal with it
Else
   use top level XPath to get your top level nodelist
   Loop nodelist
      handle sub-parts
End If

在代码中:

  Option Explicit
  Dim oFS     : Set oFS = CreateObject("Scripting.FileSystemObject")
  Dim sFSpec  : sFSpec  = oFS.GetAbsolutePathName("..\data\step01.xml")
  WScript.Echo oFS.OpenTextFile(sFSpec).ReadAll()

  Dim oXD : Set oXD = CreateObject("msxml2.domdocument")
  oXD.setProperty "SelectionLanguage", "XPath"
  oXD.async = False
  oXD.load sFSpec
  If oXD.parseError.errorCode Then
     WScript.Echo "fail", sFSpec
     WScript.Echo oXD.parseError.reason
  Else
     WScript.Echo "ok", sFSpec
     Dim ndlBills : Set ndlBills = oXD.selectNodes("/Bills/BillData/BillHeader")
     If ndlBills.length Then
        WScript.Echo ndlBills.length, "bill nodes"
        Dim ndBill
        For Each ndBill In ndlBills
            Dim ndSub
            Set ndSub = ndBill.selectSingleNode("Year")
            If ndSub Is Nothing Then
               WScript.Echo "no Year"
            Else
               WScript.Echo "Year", ndSub.text
            End If
            Set ndSub = ndBill.selectSingleNode("PayAmounts/CurrentAmountDue")
            If ndSub Is Nothing Then
               WScript.Echo "no Amount"
            Else
               WScript.Echo "Amount", ndSub.text
            End If
        Next
     End If
  End If

输出:

<?xml version="1.0" encoding="utf-8" ?>
<Bills>
 <BillData>
  <BillHeader>
   <Year>2012</Year>
  </BillHeader>
 </BillData>
 <BillData>
  <BillHeader>
   <PayAmounts>
    <CurrentAmountDue>123.45</CurrentAmountDue>
   </PayAmounts>
  </BillHeader>
 </BillData>
</Bills>

ok E:\trials\SoTrials\answers\19571565\data\Step01.xml
2 bill nodes
Year 2012
no Amount
no Year
Amount 123.45

如果您想将每个 BillHeader 中的数据放入 .CSV 的一行中,并且元素丢失,不要冒险使用//或其他类型的错误映射松散的查询.只需获取所有/Bills/BillData/BillHeader"的列表并向下钻取即可.

As you want to put the data from each BillHeader into one line of the .CSV and elements are missing, don't risk wrong mappings by using // or other kinds of loose queries. Just get a list of all "/Bills/BillData/BillHeader" and drill down.

两个脚本的合并:

  Option Explicit
  Dim oFS     : Set oFS = CreateObject("Scripting.FileSystemObject")
  Dim sXFSpec : sXFSpec = oFS.GetAbsolutePathName("..\data\step02.xml")
  WScript.Echo oFS.OpenTextFile(sXFSpec).ReadAll()
  Dim sCFSpec : sCFSpec = "..\data\step02.csv"
  Dim sDelim  : sDelim  = ","
  Dim aFields : aFields = Split("Yr ANum Amnt Delq FWNum")
  Dim oTS     : Set oTS = oFS.CreateTextFile(sCFSpec)
  oTS.WriteLine Join(aFields, sDelim)

  Dim oXD : Set oXD = CreateObject("msxml2.domdocument")
  oXD.setProperty "SelectionLanguage", "XPath"
  oXD.async = False
  oXD.load sXFSpec
  If oXD.parseError.errorCode Then
     WScript.Echo "fail", sXFSpec
     WScript.Echo oXD.parseError.reason
  Else
     WScript.Echo "ok", sXFSpec
     Dim ndlBills : Set ndlBills = oXD.selectNodes("/Bills/BillData/BillHeader")
     If ndlBills.length Then
        WScript.Echo ndlBills.length, "bill nodes"
        Dim ndBill
        For Each ndBill In ndlBills
            ReDim aData(UBound(aFields))
            Dim ndSub
            Set ndSub = ndBill.selectSingleNode("Year")
            If Not ndSub Is Nothing Then
               aData(0) = ndSub.text
            End If
            Set ndSub = ndBill.selectSingleNode("PayAmounts/CurrentAmountDue")
            If Not ndSub Is Nothing Then
               aData(2) = ndSub.text
            End If
            oTS.WriteLine Join(aData, sDelim)
        Next
     End If
  End If
  oTS.Close

  WScript.Echo oFS.OpenTextFile(sCFSpec).ReadAll()

输出:

<?xml version="1.0" encoding="utf-8" ?>
<Bills>
 <BillData>
  <BillHeader>
   <Year>2012</Year>
  </BillHeader>
 </BillData>

  <BillHeader>
   <Year>0000</Year>
   <PayAmounts>
    <CurrentAmountDue>0.0</CurrentAmountDue>
   </PayAmounts>
   <junk/>
  </BillHeader>

 <BillData>
  <BillHeader>
   <PayAmounts>
    <CurrentAmountDue>123.45</CurrentAmountDue>
   </PayAmounts>
  </BillHeader>
 </BillData>

 <BillData>
  <BillHeader>
   <Year>2013</Year>
   <PayAmounts>
    <CurrentAmountDue>47.11</CurrentAmountDue>
   </PayAmounts>
  </BillHeader>
 </BillData>
</Bills>

ok E:\trials\SoTrials\answers\19571565\data\Step02.xml
3 bill nodes
Yr,ANum,Amnt,Delq,FWNum
2012,,,,
,,123.45,,
2013,,47.11,,

为了解决您的实际问题,您可以编入更多 IF 子句喜欢

To solve your real-world problem you can weave in more IF clauses like

Set ndSub = ndBill.selectSingleNode("XPath")
If Not ndSub Is Nothing Then
   aData(N) = ndSub.text
End If

或者 - 从长远来看可能更好

or - probably better in the long run

定义查询数组(按字段顺序)

Define an array of queries (in field order)

Dim aQueries : aQueries = Array( _年" _, "PayAmounts/CurrentAmountDue" _)

Dim aQueries : aQueries = Array( _ "Year" _ , "PayAmounts/CurrentAmountDue" _ )

将最里面的循环减少到

Dim ndBill
For Each ndBill In ndlBills
    oTS.WriteLine Join(getData(ndBill, aQueries), sDelim)
Next

定义 getData()

Define getData()

Function getData(ndBill, aQueries)
  Dim nUb : nUb = UBound(aQueries)
  ReDim aData(nUb)
  Dim q
  For q = 0 To nUb
      Dim ndSub
      Set ndSub = ndBill.selectSingleNode(aQueries(q))
      If Not ndSub Is Nothing Then
         aData(q) = ndSub.text
      End If
  Next
  getData = aData
End Function

这篇关于使用 VBscript 从节点和子节点的 XML 中提取数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆