Scrape table from website


Question

I have the following code, which navigates to a website, enters two names (these are just examples; the real code will pull a list of 10 names from a spreadsheet), and then searches for their records. I'm trying to pull the resulting table into a spreadsheet. I've tried a few approaches but can't get any of them to work, so I'm looking for code to go under the comment "Scrape Table Here". I know this involves accessing the site's HTML, which I can do, but I'm not familiar enough with HTML to figure this one out on my own.

Bonus question: I'd also like to add each person's ID# to the spreadsheet. In the HTML, it's listed after "MP_Details?". For example, for "Robert Jones" the value I'm looking to grab is "36481". Basically, I'd like to pull everything highlighted in red in the screenshot from the table and write it out to a spreadsheet:

Sub Input_And_Return()

'Create new instance of Internet Explorer
Dim ieApp As Object: Set ieApp = New InternetExplorer
Dim ieDoc As Object
Dim html As HTMLDocument

ieApp.Visible = True
ieApp.navigate "https://hdmaster.net/MP/MP_Public"
Do While ieApp.Busy: DoEvents: Loop
Do Until ieApp.readyState = READYSTATE_COMPLETE: DoEvents: Loop

Set ieDoc = ieApp.document
Set html = ieApp.document

'Enter names into search box and click search
With ieDoc.forms(0)
    .SearchFor.Value = "Anderson, Kelly" & Chr(10) & "Jones, Robert"
    .submit
End With

'Scrape Table Here

'Close down IE and reset status bar
Set ieApp = Nothing
Application.StatusBar = ""

End Sub

[Screenshot of the page HTML, with the values to extract highlighted in red]

Answer

You could copy the table's outerHTML to the clipboard and paste it into Excel, which is quick and easy:

Option Explicit
Public Sub GetInfo()
    Dim IE As New InternetExplorer
    Dim nameList As String
    nameList = "Anderson, Kelly" & Chr$(10) & "Jones, Robert"
    With IE
        .Visible = True
        .navigate "https://hdmaster.net/MP/MP_Public"

        While .Busy Or .readyState < 4: DoEvents: Wend

        With .document
            .querySelector("[name=SearchFor]").Value = nameList
            .querySelector("#search").Click
        End With

        While .Busy Or .readyState < 4: DoEvents: Wend

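        ' Late-bound MSForms DataObject, created by CLSID so that no
        ' project reference to Microsoft Forms 2.0 is required
        Dim clipboard As Object
        Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
        ' Put the table's HTML on the clipboard, then paste it so that
        ' Excel lays the table out as cells
        clipboard.SetText .document.querySelector(".newTable").outerHTML
        clipboard.PutInClipboard
        ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).PasteSpecial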
        .Quit
    End With
End Sub


References (VBE > Tools > References):

  1. Microsoft HTML Object Library
  2. Microsoft Internet Controls
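
The question mentions that the real names will come from a list in a spreadsheet rather than being hard-coded. As a minimal sketch of how that list could be turned into the line-feed-separated string used for the search box, the helper below joins the non-empty cells of a range. The names BuildNameList and nameCells, and the "Names" sheet and A1:A10 range in the usage line, are assumptions for illustration and do not come from the original post.

```vba
Option Explicit

' Hypothetical helper: joins the non-empty, non-error cells of a range
' into one string separated by line feeds (Chr$(10)).
Public Function BuildNameList(ByVal nameCells As Range) As String
    Dim cell As Range
    Dim entry As String
    Dim result As String

    For Each cell In nameCells.Cells
        If Not IsError(cell.Value) Then
            entry = Trim$(CStr(cell.Value))
            If Len(entry) > 0 Then
                ' Add a separator before every entry except the first
                If Len(result) > 0 Then result = result & Chr$(10)
                result = result & entry
            End If
        End If
    Next cell

    BuildNameList = result
End Function
```

It could then replace the hard-coded string, for example nameList = BuildNameList(ThisWorkbook.Worksheets("Names").Range("A1:A10")), assuming the names are stored on a sheet called "Names".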


Your code, adapted to use the same clipboard approach:

Public Sub Input_And_Return()
    Dim ieApp As Object: Set ieApp = New InternetExplorer
    Dim ieDoc As Object

    With ieApp
        .Visible = True
        .navigate "https://hdmaster.net/MP/MP_Public"
        While .Busy Or .readyState < 4: DoEvents: Wend

        With .document.forms(0)
            .SearchFor.Value = "Anderson, Kelly" & Chr$(10) & "Jones, Robert"
            .submit
            Dim clipboard As Object
            Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
            clipboard.SetText .getElementsByClassName("newTable")(0).outerHTML
            clipboard.PutInClipboard
        End With
        ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).PasteSpecial
        .Quit
    End With
End Sub

Or loop over the table's rows and cells and write each value to the sheet:

Public Sub Input_And_Return()
    Dim ieApp As Object, ieDoc As Object
    Set ieApp = New InternetExplorer
    With ieApp
        .Visible = True
        .navigate "https://hdmaster.net/MP/MP_Public"
        While .Busy Or .readyState < 4: DoEvents: Wend

        With .document.forms(0)
            .SearchFor.Value = "Anderson, Kelly" & Chr$(10) & "Jones, Robert"
            .submit
            Dim r As Long, c As Long, tr As Object, td As Object
            With .getElementsByClassName("newTable")(0)
                For Each tr In .getElementsByTagName("tr")
                    r = r + 1: c = 1
                    For Each td In tr.getElementsByTagName("td")
                        Cells(r, c).Value = td.innerText
                        c = c + 1
                    Next td
                Next tr
            End With
        End With
        .Quit
    End With
End Sub



EDIT:

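Some ugly code to get the short IDs as well. It reads each ID from the onclick handlers of the table cells, de-duplicates them in a dictionary, and writes them into the column after the table: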

Option Explicit
Public Sub Input_And_Return()
    Dim ieApp As Object, ieDoc As Object
    Set ieApp = New InternetExplorer
    With ieApp
        .Visible = True
        .navigate "https://hdmaster.net/MP/MP_Public"
        While .Busy Or .readyState < 4: DoEvents: Wend

        With .document.forms(0)
            .SearchFor.Value = "Anderson, Kelly" & Chr$(10) & "Jones, Robert"
            .submit
            Dim r As Long, c As Long, tr As Object, td As Object, hTable As Object, aNodeList As Object
            Set hTable = .getElementsByClassName("newTable")(0)
            Set aNodeList = .getElementsByClassName("newTable")(0).querySelectorAll("[align=center][onclick*='javascript:rowClick']")

            Dim idDict As Object, i As Long, tempVal As Long
            Set idDict = CreateObject("Scripting.Dictionary")

            For i = 0 To aNodeList.Length - 1
                tempVal = Split(Split(aNodeList.Item(i).onclick, "id=")(1), Chr$(39))(0)
                If Not idDict.exists(tempVal) Then idDict.Add tempVal, vbNullString
            Next i

            With hTable
                For Each tr In .getElementsByTagName("tr")
                    r = r + 1: c = 1
                    For Each td In tr.getElementsByTagName("td")
                        Cells(r, c).Value = td.innerText
                        c = c + 1
                    Next td
                Next tr
               If idDict.Count = r - 1 Then Cells(2, c).Resize(idDict.Count, 1) = Application.WorksheetFunction.Transpose(idDict.keys)
            End With
        End With
        .Quit
    End With
End Sub
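
The nested Split call in the loop above is the part that actually pulls out the ID, so it may be easier to follow in isolation. The sketch below wraps the same idea in a function: take the text after "id=", then cut it at the next single quote. The function name ExtractId and the sample handler text are assumptions for illustration; the real onclick text on the site may look different.

```vba
Option Explicit

' Hypothetical helper: returns the text between "id=" and the next
' single quote, or vbNullString when there is no "id=" marker.
Public Function ExtractId(ByVal onclickText As String) As String
    Dim parts() As String

    parts = Split(onclickText, "id=")
    If UBound(parts) < 1 Then Exit Function        ' no "id=" found
    If Len(parts(1)) = 0 Then Exit Function        ' nothing after "id="

    ' Chr$(39) is the single quote that closes the URL in the handler
    ExtractId = Split(parts(1), Chr$(39))(0)
End Function

Public Sub DemoExtractId()
    ' Assumed handler text, modelled on the "MP_Details?" example in the question
    Debug.Print ExtractId("javascript:rowClick('MP_Details?id=36481')")   ' prints 36481
End Sub
```

Returning a String rather than a Long avoids a type mismatch if the text after "id=" ever turns out not to be numeric.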
