Scrape text from website using Excel VBA?

Views: 154 Published: 2017/9/24 16:14:53 Tags: excel excel-vba web-scraping

Question

I am quite new to using Excel as a webpage scraper, but I found this very interesting article explaining how to scrape certain tags from a website using Excel VBA. I have the code below, which works fine, but it only gets the content from the first <p> tag that it finds:

Sub get_title_header()
    Dim wb As Object
    Dim doc As Object
    Dim sURL As String
    Dim lastrow As Long
    lastrow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row

    For i = 2 To lastrow
        Set wb = CreateObject("internetExplorer.Application")
        sURL = Cells(i, 1)

        wb.navigate sURL
        wb.Visible = True

        While wb.Busy
            DoEvents
        Wend

        'HTML document
        Set doc = wb.document

        Cells(i, 2) = doc.title

        On Error GoTo err_clear
        Cells(i, 3) = doc.GetElementsByTagName("p")(0).innerText
err_clear:
        If Err <> 0 Then
            Err.Clear
            Resume Next
        End If
        wb.Quit
        Range(Cells(i, 1), Cells(i, 3)).Columns.AutoFit
    Next i

End Sub

Now what I'd like to do is adjust the code so that the scraper gets all the content that is within a <p> tag on a webpage. So I guess a foreach functionality of some kind is missing.

Hopefully someone here is willing to help me extend the code, so that the content from multiple <p> tags will be collected.

UPDATE: Below is the working code!

Sub get_title_header()
    Dim wb As Object
    Dim doc As Object
    Dim sURL As String
    Dim lastrow As Long
    Dim i As Integer
    lastrow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row

    For i = 2 To lastrow
        Set wb = CreateObject("internetExplorer.Application")
        sURL = Cells(i, 1)

        wb.navigate sURL
        wb.Visible = True

        While wb.Busy
            DoEvents
        Wend

        'HTML document
        Set doc = wb.document

        Cells(i, 2) = doc.Title

        On Error GoTo err_clear

        Dim el As Object
        For Each el In doc.GetElementsByTagName("p")

            counter = counter + 1
            Cells(i, counter + 2).Value = Cells(counter + 1).Value & el.innerText

        Next el
        counter = 0

err_clear:
        If Err <> 0 Then
            Err.Clear
            Resume Next
        End If
        wb.Quit
        Range(Cells(i, 1), Cells(i, 10)).Columns.AutoFit
    Next i

End Sub

Solution

You're almost there! doc.GetElementsByTagName("p") returns a collection of HTMLParagraphElement objects, of which you accessed only the first entry using doc.GetElementsByTagName("p")(0). As you allude to, a For Each loop would let you access each in turn:

Sub get_title_header()
    Dim wb As Object
    Dim doc As Object
    Dim sURL As String
    Dim lastrow As Long
    Dim i As Integer
    lastrow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row

    For i = 2 To lastrow
        Set wb = CreateObject("internetExplorer.Application")
        sURL = Cells(i, 1)

        wb.navigate sURL
        wb.Visible = True

        While wb.Busy
            DoEvents
        Wend

        'HTML document
        Set doc = wb.document

        Cells(i, 2) = doc.Title

        On Error GoTo err_clear

        Dim el As Object
        For Each el In doc.GetElementsByTagName("p")
            Cells(i, 3).Value = Cells(i, 3).Value & ", " & el.innerText
        Next el

err_clear:
        If Err <> 0 Then
            Err.Clear
            Resume Next
        End If
        wb.Quit
        Range(Cells(i, 1), Cells(i, 3)).Columns.AutoFit
    Next i

End Sub
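
One side effect of the loop above: if column C starts out empty, the joined text begins with a leading ", ". A possible variation, sketched here under a few assumptions, moves the joining into a small helper. JoinParagraphText and get_title_header_joined are hypothetical names that do not appear in the original answer. The sketch also assumes the URLs sit in column A of Sheet1, reuses a single Internet Explorer instance rather than creating one per row, and waits for readyState 4 (READYSTATE_COMPLETE) in addition to Busy.

```vba
' Hypothetical helper (not part of the original answer): joins the text of
' every non-empty <p> element in a loaded document, with no leading separator.
Function JoinParagraphText(ByVal doc As Object, _
                           Optional ByVal sep As String = ", ") As String
    Dim el As Object
    Dim txt As String
    Dim result As String

    For Each el In doc.getElementsByTagName("p")
        txt = Trim$(el.innerText)
        If Len(txt) > 0 Then
            ' Add the separator only between entries, never before the first
            If Len(result) > 0 Then result = result & sep
            result = result & txt
        End If
    Next el

    JoinParagraphText = result
End Function

' Assumption: URLs are in column A of Sheet1, starting at row 2.
Sub get_title_header_joined()
    Dim wb As Object
    Dim lastrow As Long
    Dim i As Long

    lastrow = Sheet1.Cells(Sheet1.Rows.Count, "A").End(xlUp).Row
    Set wb = CreateObject("InternetExplorer.Application")

    For i = 2 To lastrow
        wb.navigate CStr(Sheet1.Cells(i, 1).Value)

        ' Wait until the page has finished loading (4 = READYSTATE_COMPLETE)
        Do While wb.Busy Or wb.readyState <> 4
            DoEvents
        Loop

        Sheet1.Cells(i, 2).Value = wb.document.Title
        Sheet1.Cells(i, 3).Value = JoinParagraphText(wb.document)
    Next i

    wb.Quit
End Sub
```

Keeping the joining logic in its own function leaves the navigation loop short and makes the separator easy to change, for example by passing vbLf as the second argument to put each paragraph on its own line within the cell.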
