从网上抓取Excel [英] Excel from web scraping

查看:36
本文介绍了从网上抓取Excel的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想从此所有表中提取所有6个表网站到我的工作簿中.(vs全部,vs PG,vs SG,vs SF,vs PF,vs C)当我尝试使用excel中的from web选项并选择表时,它只是拉入标题.这是为什么?目前,我有一个立即粘贴"按钮,然后转到网站,将其复制并单击使用宏创建的粘贴"按钮,以清除当前信息并粘贴新值.我想消除我不得不手动转到网站并复制表格的麻烦.除了来自网络"之外,还有另一种方法吗?

I want to pull all 6 of these tables from this website into my workbook. (vs All,vs PG,vs SG,vs SF,vs PF,vs C) When I try using the from web option in excel and selecting the table it just pulls in the headers. Why is that? Currently I have a paste now button and I goto the website, copy it and click the "Paste" button I created with a macro to clear current info and paste new values. I would like to eliminate me having to manually goto the website and copy the table. Is there another way to do it besides "From Web"

推荐答案

确保选择正确的表.有两个表元素.第一个只是标题.第二个是标题+信息.我不确定您可以使用此方法来获取所有选项卡,因为URL不会更改,并且内容是用javascript更新的.您可以查看 API 是否可以提供任何服务想要在发布API密钥之前与您交谈的人员.

Make sure you are selecting the right table. There are two table elements. The first is just headers. The second is headers + info. I am not sure you can use this method to get all the tabs though as the URL doesn't change and the content is javascript updated. You can see whether the API has anything to offer though it is gate-kept by staff who want to speak to you before issuing an API key.

任何简单的方法都是转到VBE>工具>引用>添加对Microsoft Internet Control的引用,然后使用Internet Explorer导航到该页面.

Any easy way is to go VBE > Tools > References > Add a reference to Microsoft Internet Controls then use Internet Explorer to navigate to the page.

您可以使用 CSS选择器来定位它的 id 和另一个CSS选择器类别选择器定位所有选项卡链接,以便单击它们以更新每个标签的表格.

You can use a CSS selector to target the table by its id and another CSS selector class selector to target all the tab links so as to click them to update the table for each tab.

Option Explicit

Public Sub GetInfo()
    Dim IE As New InternetExplorer, hTable As HTMLTable
    Dim ws As Worksheet
    Set ws = ThisWorkbook.Worksheets("Sheet1")

    Dim clipboard As Object, tabs As Object, iTab As Long
    Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")

    With IE
        .Visible = True
        .navigate "https://swishanalytics.com/optimus/nba/daily-fantasy-team-defensive-ranks-position"

        While .Busy Or .readyState < 4: DoEvents: Wend

        Set tabs = .document.querySelectorAll(".position.fastClick")

        For iTab = 0 To tabs.Length - 1

            If iTab > 0 Then
                tabs.item(iTab).Click
                While .Busy Or .readyState < 4: DoEvents: Wend

            End If

            clipboard.SetText .document.querySelector("#stat-table").outerHTML
            clipboard.PutInClipboard

            With ws
                .Cells(GetLastRow(ws, 1) + 2, 1).PasteSpecial
            End With
        Next
        .Quit
    End With

End Sub

Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
    With ws
        GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
    End With
End Function


如果您不想使用剪贴板复制粘贴表,则可以循环其行和行内的表格单元格.


If you don't want to use the clipboard to copy paste the table you can loop its rows and table cells within rows.

Option Explicit

Public Sub GetInfo()
    Dim IE As New InternetExplorer, ws As Worksheet, tabs As Object, iTab As Long
    Application.ScreenUpdating = False
    Set ws = ThisWorkbook.Worksheets("Sheet1")

    With IE
        .Visible = True
        .navigate "https://swishanalytics.com/optimus/nba/daily-fantasy-team-defensive-ranks-position"

        While .Busy Or .readyState < 4: DoEvents: Wend

        Set tabs = .document.querySelectorAll(".position.fastClick")

        For iTab = 0 To tabs.Length - 1
            If iTab > 0 Then
                tabs.item(iTab).Click
                While .Busy Or .readyState < 4: DoEvents: Wend
            End If

            WriteTable .document.querySelector("#stat-table"), GetLastRow(ws, 1) + 2, ws
        Next
        .Quit
    End With
    Application.ScreenUpdating = True
End Sub

Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
    With ws
        GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
    End With
End Function

Public Sub WriteTable(ByVal hTable As HTMLTable, Optional ByVal startRow As Long = 1, Optional ByVal ws As Worksheet)

    If ws Is Nothing Then Set ws = ActiveSheet

    Dim tSection As Object, tRow As Object, tCell As Object, tr As Object, td As Object, r As Long, C As Long, tBody As Object
    r = startRow
    With ws
        Dim headers As Object, header As Object, columnCounter As Long
        Set headers = hTable.getElementsByTagName("th")
        For Each header In headers
            columnCounter = columnCounter + 1
            .Cells(startRow, columnCounter) = header.innerText
        Next header
        startRow = startRow + 1
        Set tBody = hTable.getElementsByTagName("tbody")
        For Each tSection In tBody
            Set tRow = tSection.getElementsByTagName("tr")
            For Each tr In tRow
                r = r + 1
                Set tCell = tr.getElementsByTagName("td")
                C = 1
                For Each td In tCell
                    .Cells(r, C).Value = td.innerText
                    C = C + 1
                Next td
            Next tr
        Next tSection
    End With
End Sub

这篇关于从网上抓取Excel的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆