VBA pull XML data from multiple Web locations

Question

In my previous question (everything needed is in this question; the link is here only for completeness) I asked for a way to pull XML data into Excel from a Web location. The code I received as an answer (courtesy of user2140261) is below:

Sub GetNode()
    ' Early binding: the MSXML2 types need a Microsoft XML library reference
    Dim strXMLSite As String
    Dim objXMLHTTP As MSXML2.XMLHTTP
    Dim objXMLDoc As MSXML2.DOMDocument
    Dim objXMLNodexbrl As MSXML2.IXMLDOMNode
    Dim objXMLNodeDIIRSP As MSXML2.IXMLDOMNode

    Set objXMLHTTP = New MSXML2.XMLHTTP
    Set objXMLDoc = New MSXML2.DOMDocument

    strXMLSite = "http://www.sec.gov/Archives/edgar/data/10795/000119312513456802/bdx-20130930.xml"

    ' Download the instance document synchronously and parse the response text
    objXMLHTTP.Open "POST", strXMLSite, False
    objXMLHTTP.send
    objXMLDoc.LoadXML (objXMLHTTP.responseText)

    ' Select the root element, then the element of interest beneath it
    Set objXMLNodexbrl = objXMLDoc.SelectSingleNode("xbrl")

    Set objXMLNodeDIIRSP = objXMLNodexbrl.SelectSingleNode("us-gaap:DebtInstrumentInterestRateStatedPercentage")

    ' Write the element's text to the worksheet
    Worksheets("Sheet1").Range("A1").Value = objXMLNodeDIIRSP.Text
End Sub

But every company has a different XML Instance Document, and each company publishes a new one every reporting period (e.g. quarterly, annually). So these documents are found at different web locations.

Now in the previous procedure we can see we only need to use the statement

strXMLSite = "http://www.sec.gov/Archives/edgar/data/10795/000119312513456802/bdx-20130930.xml"

...but this only works when we know beforehand that we want data from one specific location on the Web.

What if we want to pull some data from the 4 different locations marked with an asterisk (*) in the image below?

How could we enter our "coordinates" in Excel, for example in one of our userforms or cells, and then make VBA "navigate/crawl" to the right location using only those coordinates, just as we would navigate there with a browser?

The coordinates that we input can be:

  • A Stock Ticker (e.g. TSLA for Tesla Motors)
  • A filing type (e.g. 10-Q)

You can pick the type of files in these links for BDX and ANN respectively:

BDX LINK

ANN LINK

Below we have 2 web locations for the Instance Documents of BDX and 2 for ANN.

How could we pull an XML element that exists in all four instance documents, for example us-gaap:CommonStockValue, by simply giving VBA:

  1. Stock Ticker
  2. The document type (10-K, 10-Q)

Can it be done with Microsoft XML Core Services (MSXML), or do we need some other library too?

You can see how impractical it is to run this code thousands of times, each time copying the URL from the Web browser into strXMLSite as a String value.

Solution

[edit1]

In response to the comment:

The only thing that remains for us is to understand how URLs actually change, so that they can be predicted and manipulated by string concatenation. In what code language is the URL written?

The short answer is: open a browser, right-click on a blank spot in the webpage you're interested in, and select View Source from the popup menu.

To repeat the example provided in the other post, VBA href Crawl on Browser's Source Code, do this:

Open Edgar Online Company Search in a browser: https://www.sec.gov/edgar/searchedgar/companysearch.html

Use the Fast Search function to search for ticker CRR and it gives me this URL: https://www.sec.gov/cgi-bin/browse-edgar?CIK=CRR&Find=Search&owner=exclude&action=getcompany which contains the list of public filings for Carbo Ceramics, Inc.
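The URL above suggests that the filing list for any ticker can be requested by substituting the ticker into the CIK query parameter. A minimal sketch of building that URL by string concatenation follows. BuildFilingListUrl is a hypothetical helper name, and the optional "type" parameter for narrowing the list to a form type such as 10-K is an assumption that does not appear in the URL above, so check it in a browser before relying on it.

```vba
' Hypothetical helper: builds the EDGAR company-browse URL for a ticker.
Public Function BuildFilingListUrl(ByVal strTicker As String, _
                                   Optional ByVal strFormType As String = "") As String
    Dim strUrl As String

    ' Same query string as the URL shown above, with the ticker substituted in
    strUrl = "https://www.sec.gov/cgi-bin/browse-edgar?CIK=" & _
             UCase$(Trim$(strTicker)) & _
             "&Find=Search&owner=exclude&action=getcompany"

    ' Assumption: EDGAR accepts a "type" parameter to filter by form type
    If Len(strFormType) > 0 Then
        strUrl = strUrl & "&type=" & strFormType
    End If

    BuildFilingListUrl = strUrl
End Function
```

With this in place, the ticker and form type could come from two worksheet cells or userform fields, for example BuildFilingListUrl(Range("A1").Value, Range("B1").Value).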

Now, right click on the page to get the source and scroll down to line 91. You'll see this block of code:

      <table class="tableFile2" summary="Results">

That's the beginning of the results table that shows the list of public filings.

         <tr>
            <th width="7%" scope="col">Filings</th>
            <th width="10%" scope="col">Format</th>
            <th scope="col">Description</th>
            <th width="10%" scope="col">Filing Date</th>
            <th width="15%" scope="col">File/Film Number</th>
         </tr>

That's the header row of the table with column descriptions.

         <tr>
            <td nowrap="nowrap">SC 13G</td>
            <td nowrap="nowrap"><a href="/Archives/edgar/data/1009672/000108975514000003/0001089755-14-000003-index.htm" id="documentsbutton">&nbsp;Documents</a></td>
            <td class="small" >Statement of acquisition of beneficial ownership by individuals<br />Acc-no: 0001089755-14-000003&nbsp;(34 Act)&nbsp; Size: 8 KB            </td>
            <td>2014-02-14</td>
            <td nowrap="nowrap"><a href="/cgi-bin/browse-edgar?action=getcompany&amp;filenum=005-48851&amp;owner=exclude&amp;count=40">005-48851</a><br>14615563         </td>
         </tr>

And that's the first row of actual data in the table for filing SC 13G, Statement of acquisition of beneficial ownership by individuals Acc-no: 0001089755-14-000003 (34 Act) Size: 8 KB, submitted on 2014-02-14.

So, now you want to loop through all of the document URLs on this page and that's why you're asking what language the URLs are in? (Crawl the page, in other words?)
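If crawling the page is indeed the goal, one possible approach is to download the page source as text and pull out every "Documents" link with a regular expression. The sketch below is not the only way to do it (an HTML parser would be more robust). ExtractDocumentLinks is a hypothetical helper name, and the pattern assumes each link is written exactly as in the row above, with href followed by id="documentsbutton".

```vba
' Hypothetical helper: returns the absolute URL of every "Documents" link
' found in the HTML source of an EDGAR filing-list page.
Public Function ExtractDocumentLinks(ByVal strHtml As String) As Collection
    Const BASE_URL As String = "https://www.sec.gov"
    Dim objRegex As Object
    Dim objMatch As Object
    Dim colLinks As New Collection

    Set objRegex = CreateObject("VBScript.RegExp")
    objRegex.Global = True        ' find every match, not just the first
    objRegex.IgnoreCase = True
    ' Assumption: links look like <a href="..." id="documentsbutton">
    objRegex.Pattern = "<a\s+href=""([^""]+)""\s+id=""documentsbutton"""

    For Each objMatch In objRegex.Execute(strHtml)
        ' The href is site-relative, so prefix the host name
        colLinks.Add BASE_URL & objMatch.SubMatches(0)
    Next objMatch

    Set ExtractDocumentLinks = colLinks
End Function
```

The page source itself can be fetched the same way GetNode fetches the XML file, by reading responseText from an XMLHTTP request. Each returned URL is a filing index page, which would need one more request to locate the XML instance document it lists.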

[begin original answer]

How could we enter our "coordinates" in Excel, for example in one of our userforms or cells, and then make VBA "navigate/crawl" to the right location using only those coordinates, just as we would navigate there with a browser?

I googled "get google results as xml" while researching another question. One interesting hit that came back was this link: http://nielsbosma.se/projects/seotools/functions/

I make no representation about the merits of this tool, but it seems to have the functionality you're asking for.

Now in the previous procedure we can see we only need to use the statement strXMLSite = "http://www.sec.gov/Archives/edgar/data/10795/000119312513456802/bdx-20130930.xml" ...but this only works when we know beforehand that we want data from one specific location on the Web.

Yes, so once you have some sort of web crawling function that returns a list of XML document links, you first need to put them somewhere the user can see. My preference would be a range on a worksheet, but you could load up a list or combo box in a form as well. Either way, you would then modify Sub GetNode() to accept an input parameter based on the user's selection:

Sub GetNode(strUrl As String)
    ...
    strXMLSite = strUrl
    ...
    Worksheets("Sheet1").Range("A1").Value = objXMLNodeDIIRSP.Text
End Sub

Or, perhaps better, make it a function that returns the node's text for you to consume however you'd like:

Function GetNode(strUrl As String) As String
    ...
    strXMLSite = strUrl
    ...
    'return result
    GetNode = objXMLNodeDIIRSP.Text
End Function
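To make the idea concrete, here is a self-contained sketch of such a function plus a loop that calls it once per URL. It is one possible shape, not the code from the original answer. The names ExtractElementText, GetNodeText and PullAllNodes are hypothetical, the range Sheet1!A1:A4 is an assumed location for the URL list, and the request uses GET (the usual verb for retrieving a document) rather than the POST in the posted code. It uses late binding, so no library reference is needed.

```vba
' Hypothetical helper: returns the text of the first element with the given
' qualified name, or "" if the XML does not parse or has no such element.
Public Function ExtractElementText(ByVal strXml As String, _
                                   ByVal strElement As String) As String
    Dim objXMLDoc As Object
    Dim objNodes As Object

    Set objXMLDoc = CreateObject("MSXML2.DOMDocument")
    objXMLDoc.async = False
    If Not objXMLDoc.LoadXML(strXml) Then Exit Function

    Set objNodes = objXMLDoc.getElementsByTagName(strElement)
    If objNodes.Length > 0 Then ExtractElementText = objNodes.Item(0).Text
End Function

' Downloads one instance document and returns the requested element's text.
Public Function GetNodeText(ByVal strUrl As String, _
                            ByVal strElement As String) As String
    Dim objXMLHTTP As Object

    Set objXMLHTTP = CreateObject("MSXML2.XMLHTTP")
    objXMLHTTP.Open "GET", strUrl, False
    objXMLHTTP.send
    If objXMLHTTP.Status = 200 Then
        GetNodeText = ExtractElementText(objXMLHTTP.responseText, strElement)
    End If
End Function

' Assumption: the document URLs sit in Sheet1!A1:A4; results go to column B.
Public Sub PullAllNodes()
    Dim rngCell As Range

    For Each rngCell In Worksheets("Sheet1").Range("A1:A4").Cells
        If Len(rngCell.Value) > 0 Then
            rngCell.Offset(0, 1).Value = _
                GetNodeText(CStr(rngCell.Value), "us-gaap:CommonStockValue")
        End If
    Next rngCell
End Sub
```

Keeping the parsing in its own function means it can be checked against a small XML string without touching the network. Note that an instance document may report the same element for several periods, and this sketch only returns the first occurrence.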

Interesting question overall and I was happy to give you feedback on the code you posted. Your other questions can probably be answered by doing a bit of google searching.
