Scraping data from website using VBA


Question

I'm trying to scrape data from the website http://uk.investing.com/rates-bonds/financial-futures via VBA, such as real-time prices (e.g. German 5 YR Bobl, US 30Y T-Bond). I have tried an Excel web query, but it pulls in the whole page, and I would like to scrape only the rates. Is there a way of doing this?

Recommended answer

There are several ways of doing this. I am writing this answer hoping that anyone searching for the keywords "scraping data from website" will find all the basics of Internet Explorer automation here, but remember that nothing is worth as much as your own research (unless you want to stick to pre-written code that you are not able to customize).

Please note that this is only one way, and not the one I prefer in terms of performance (since it depends on the browser's speed), but it is a good way to understand the rationale behind Internet automation.

1) If I need to browse the web, I need a browser! So I create an Internet Explorer instance:

Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")

2) I ask the browser to navigate to the target webpage. With the ".Visible" property, I decide whether I want to see the browser doing its job or not. While building the code it is nice to have Visible = True, but once the code is working and is only used for scraping data, it is nicer not to see the browser every time, so Visible = False.

With appIE
    .Navigate "http://uk.investing.com/rates-bonds/financial-futures"
    .Visible = True
End With

3) The webpage will need some time to load. So, I will wait while the browser is busy...

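' Busy alone can turn False before the document is ready, so ReadyState is checked too
Do While appIE.Busy Or appIE.ReadyState <> 4 ' 4 = READYSTATE_COMPLETE
    DoEvents
Loop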
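The wait loop above never ends if the page fails to finish loading. A minimal sketch of the same wait with a timeout could look like the following; the names WaitForPage and timeoutSeconds are hypothetical and not part of the original answer, and the 30-second default is only an assumption.

```vba
' Hypothetical helper: waits until the browser reports the page as loaded,
' or gives up after timeoutSeconds. Returns True if the page loaded in time.
Function WaitForPage(ByVal ie As Object, Optional ByVal timeoutSeconds As Double = 30) As Boolean
    Const READYSTATE_COMPLETE As Long = 4
    Dim startTime As Double
    startTime = Timer ' seconds elapsed since midnight

    Do While ie.Busy Or ie.ReadyState <> READYSTATE_COMPLETE
        DoEvents ' keep Excel responsive while waiting
        ' Simple elapsed-time check; it does not handle a wait that crosses midnight
        If Timer - startTime > timeoutSeconds Then
            WaitForPage = False
            Exit Function
        End If
    Loop

    WaitForPage = True
End Function
```

It would be called right after .Navigate, for example: If Not WaitForPage(appIE) Then MsgBox "Page did not load in time".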

4) Well, now the page is loaded. Let's say that I want to scrape the change of the US 30Y T-Bond: I just press F12 in Internet Explorer to see the webpage's code, and then, using the element-selection pointer, I click on the element that I want to scrape to see how I can reach it.

5) What I should do is straightforward. First of all, I use the ID property to get the tr element that contains the value:

Dim allRowOfData As Object
Set allRowOfData = appIE.document.getElementById("pair_8907")

This gives me a collection of td elements (specifically, the tr is a row of data and the td elements are its cells). We are looking for the 8th cell, so I will write:

Dim myValue As String: myValue = allRowOfData.Cells(7).innerHTML

Why did I write 7 instead of 8? Because the collection of cells starts from 0, so the index of the 8th element is 7 (8-1). A short analysis of this line of code:

  • .Cells() gives me access to the td elements;
  • innerHTML is the property of the cell that contains the value we are looking for.
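If it is unclear which index holds the value, one option is to print every cell of the row and read off the right one. The sketch below assumes the row object was obtained with getElementById as above; ListRowCells is a hypothetical helper name, and it uses innerText (the cell's plain text) instead of innerHTML so that any markup inside the cell is left out.

```vba
' Hypothetical helper: prints the index and text of every cell in a table row
' to the Immediate window (Ctrl+G in the VBA editor).
Sub ListRowCells(ByVal rowElement As Object)
    Dim i As Long

    ' getElementById gives back Nothing when no element has the requested id
    If rowElement Is Nothing Then
        Debug.Print "Row not found"
        Exit Sub
    End If

    ' The cells collection is zero-based, so the last index is Length - 1
    For i = 0 To rowElement.Cells.Length - 1
        Debug.Print i, rowElement.Cells(i).innerText
    Next i
End Sub
```

Calling ListRowCells allRowOfData before the browser is closed shows which index corresponds to the column of interest.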

Once we have our value, which is now stored in the myValue variable, we can close the IE browser and release the memory by setting the object to Nothing:

appIE.Quit
Set appIE = Nothing

Well, now you have your value and you can do whatever you want with it: put it into a cell (Range("A1").Value = myValue), or into a label on a form (Me.Label1.Caption = myValue; a label exposes Caption rather than Text).
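Put together, the steps of this answer can be sketched as a single procedure. This is only an illustration under a few assumptions: Internet Explorer is available on the machine, the page still contains a row with the id pair_8907, and the value is still in the 8th cell. The procedure name ScrapeBondChange and the check for a missing row are additions that are not part of the steps above.

```vba
Sub ScrapeBondChange()
    Dim appIE As Object
    Dim allRowOfData As Object
    Dim myValue As String

    ' 1) Create the browser
    Set appIE = CreateObject("InternetExplorer.Application")

    ' 2) Navigate to the target page without showing the window
    With appIE
        .Navigate "http://uk.investing.com/rates-bonds/financial-futures"
        .Visible = False
    End With

    ' 3) Wait until the page has loaded (4 = READYSTATE_COMPLETE)
    Do While appIE.Busy Or appIE.ReadyState <> 4
        DoEvents
    Loop

    ' 5) Get the row by its id, then read the 8th cell (index 7)
    Set allRowOfData = appIE.document.getElementById("pair_8907")
    If allRowOfData Is Nothing Then
        ' Added safeguard: the id may no longer exist on the page
        MsgBox "Row pair_8907 was not found; the page layout may have changed."
    Else
        myValue = allRowOfData.Cells(7).innerHTML
        Range("A1").Value = myValue
    End If

    ' Close the browser and release the object
    appIE.Quit
    Set appIE = Nothing
End Sub
```

The browser is closed in both branches, so a missing row does not leave a hidden Internet Explorer process running.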

I'd just like to point out that this is not how StackOverflow works: here you post questions about specific coding problems, but you should do your own search first. The only reason I'm answering a question that does not show much research effort is that I have seen it asked several times, and I remember that, back when I learned how to do this, I would have liked some better support to get started with. So I hope that this answer, which is just a "study input" and not at all the best or most complete solution, can support the next user who has the same problem. I learned how to program thanks to this community, and I like to think that you and other beginners might use my input to discover the beautiful world of programming.

Enjoy your practice ;)
