Extracting XML attribute using VBA

Problem description

I'm not a developer and have very limited XML knowledge beyond what I've learned over the past 3-4 days of researching on the web, so apologies in advance for the basic level of this question. I'm trying to wrap up this one-time task.

I have some Excel VBA knowledge, and I'm currently trying to use VBA to extract the SIC code attribute from a given company's page on the SEC filing website. As an example, consider the page for Walmart.

In the blue bar at the top you can see 'SIC: 5331'. The 5331 is what I'm trying to return to a VBA variable so I can populate a spreadsheet. When I right-click in IE and click View Source, the relevant part of the page reads in XML as:

<div id="contentDiv">
  <!-- START FILER DIV -->
  <div style="margin: 15px 0 10px 0; padding: 3px; overflow: hidden; background-color: #BCD6F8;">
    <div class="mailer">Mailing Address
      <span class="mailerAddress">702 SOUTHWEST 8TH STREET</span>
      <span class="mailerAddress"> BENTONVILLE AR 72716         </span>
    </div>
    <div class="mailer">Business Address
      <span class="mailerAddress">702 SOUTHWEST 8TH ST</span>
      <span class="mailerAddress">BENTONVILLE AR 72716         </span>
      <span class="mailerAddress">5012734000</span>
    </div>
    <div class="companyInfo">
      <span class="companyName">WAL MART STORES INC <acronym title="Central Index Key">CIK</acronym>#: <a href="/cgi-bin/browse-edgar?action=getcompany&amp;CIK=0000104169&amp;owner=exclude&amp;count=40">0000104169 (see all company filings)</a></span>
      <p class="identInfo"><acronym title="Standard Industrial Code">SIC</acronym>: <a href="/cgi-bin/browse-edgar?action=getcompany&amp;SIC=5331&amp;owner=exclude&amp;count=40">5331</a> - RETAIL-VARIETY STORES<br />State location: <a href="/cgi-bin/browse-edgar?action=getcompany&amp;State=AR&amp;owner=exclude&amp;count=40">AR</a> | State of Inc.: <strong>DE</strong> | Fiscal Year End: 0131<br />(Assistant Director Office: 2)<br />Get <a href="/cgi-bin/own-disp?action=getissuer&amp;CIK=0000104169"><b>insider transactions</b></a> for this <b> issuer</b>.
        <br />Get <a href="/cgi-bin/own-disp?action=getowner&amp;CIK=0000104169"><b>insider transactions</b></a> for this <b>reporting owner</b>.
      </p>
    </div>
  </div>
</div>

In trying to understand how VBA might be used to extract the SIC, I found the following post on your site:

Query and parse xml attribute value into XLS using VBA

I tried to apply barrowc's answer by copying and pasting it into an Excel module and inserting the path to the Wal Mart filings. However, when I step through the code, I get the Debug.Print "*****" output but nothing for n.Text.

Sub test4()
    Dim d As MSXML2.DOMDocument60
    Dim i As IXMLDOMNodeList
    Dim n As IXMLDOMNode

    Set d = New MSXML2.DOMDocument60
    d.async = False
    d.Load ("http://www.sec.gov/cgi-bin/browse-edgar?company=&match=&CIK=886475&filenum=&State=&Country=&SIC=&owner=exclude&Find=Find+Companies&action=getcompany")

    Debug.Print "*****"
    Set i = d.SelectNodes("//div[@id='contentDiv']")
    For Each n In i
        Debug.Print n.Text
    Next n
    Debug.Print "*****"

    Set d = Nothing
End Sub

I've tried various strings in d.SelectNodes(), but I don't know enough about this topic to understand where I'm going wrong. So either a comment on my syntax or a pointer to a resource would be enormously helpful.
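
One way to narrow this down is to check whether the document loaded at all. The sketch below is a diagnostic and not part of the original post. It relies on two documented MSXML behaviours: Load returns False when the source cannot be parsed as XML, and parseError then describes the failure. The Sub name ReportLoadResult is hypothetical. The late-bound CreateObject call is a choice made here so that the sketch runs without setting a reference. That a failed load explains the empty output is an assumption: a failed load leaves the document empty, SelectNodes then returns no nodes, and the For Each body never runs, which would match seeing only the two "*****" lines.

```vba
' Diagnostic sketch: reports whether MSXML could parse the page as XML.
' ReportLoadResult is a hypothetical name, not part of the original code.
Sub ReportLoadResult(Url As String)
    Dim d As Object
    Set d = CreateObject("MSXML2.DOMDocument.6.0")   ' late-bound DOMDocument60
    d.async = False

    ' Load returns False if the content is not well-formed XML.
    If d.Load(Url) Then
        Debug.Print "Loaded. Matching nodes: " & _
            d.SelectNodes("//div[@id='contentDiv']").Length
    Else
        Debug.Print "Load failed: " & d.parseError.reason
        Debug.Print "Line " & d.parseError.Line & _
            ", position " & d.parseError.linepos
    End If
End Sub
```

Call it with the same URL string passed to d.Load in test4 and read the result in the Immediate window.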

Recommended answer

If you are interested in just the SIC, it is not worth your time to try to parse the entire DOM structure. Instead, identify a unique set of characters, search for it, and then extract the SIC from there.

The following function does just that. You just need to pass it the full HTML source of the page and it will return the SIC:

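Function ExtractSIC(SourceHtml As String) As String
    ' The SIC appears in a link as "...&amp;SIC=5331&amp;owner=...".
    Const PrefixChars As String = "&amp;SIC="
    Const SuffixChars As String = "&"
    Dim StartPos As Long, EndPos As Long

    ' Find the marker that precedes the SIC; return "" if it is absent.
    StartPos = InStr(SourceHtml, PrefixChars)
    If StartPos = 0 Then Exit Function

    ' Skip past the marker, then find the "&" that ends the value.
    StartPos = StartPos + Len(PrefixChars)
    EndPos = InStr(StartPos, SourceHtml, SuffixChars)
    If EndPos = 0 Then Exit Function   ' no terminator: avoid a negative length in Mid

    ExtractSIC = Mid(SourceHtml, StartPos, EndPos - StartPos)
End Function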
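
The answer leaves open how to obtain the page source. The following is a minimal sketch of one way to do it, assuming the page can be downloaded with a plain synchronous GET through MSXML's XMLHTTP object. FetchPageSource and PrintSIC are hypothetical names introduced here, the URL is the one from the question's code, and ExtractSIC is the function shown above. The server may reject or redirect such requests, so treat this as a starting point only.

```vba
' Hypothetical helper: downloads a page and returns its source text,
' or an empty string if the server does not answer with status 200.
Function FetchPageSource(Url As String) As String
    Dim http As Object
    Set http = CreateObject("MSXML2.XMLHTTP.6.0")
    http.Open "GET", Url, False      ' False = synchronous request
    http.send
    If http.Status = 200 Then FetchPageSource = http.responseText
End Function

' Hypothetical caller: fetches the page and prints the extracted SIC.
Sub PrintSIC()
    Dim html As String
    html = FetchPageSource("http://www.sec.gov/cgi-bin/browse-edgar?company=&match=&CIK=886475&filenum=&State=&Country=&SIC=&owner=exclude&Find=Find+Companies&action=getcompany")

    If Len(html) = 0 Then
        Debug.Print "Download failed or returned no content"
    Else
        ' ExtractSIC is the function from the answer above.
        Debug.Print "SIC: " & ExtractSIC(html)
    End If
End Sub
```

To fill a spreadsheet, assign the result of ExtractSIC to a cell instead of printing it.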
