从网页检索所有Excel文件链接 [英] Retrieving all Excel file links from a webpage

查看:49
本文介绍了从网页检索所有Excel文件链接的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试从网站上获取所有可下载的Excel文件链接,但遇到了困难.请帮助指导我.谢谢.

I'm trying to get all the downloadable Excel file links from the website, but having difficulty. Please help to guide me. Thanks.

Sub TYEX()

    Dim internet As Object
    Dim internetdata As Object
    Dim div_result As Object
    Dim header_links As Object
    Dim link As Object
    Dim URL As String

    Set internet = CreateObject("InternetExplorer.Application")
    internet.Visible = True

    URL = "https://www.jpx.co.jp/markets/public/short-selling/index.html"
    internet.Navigate URL

    Do Until internet.ReadyState >= 4
        DoEvents
    Loop

    Application.Wait Now + TimeSerial(0, 0, 5)

    Set internetdata = internet.Document
    Set div_result = internetdata.getElementById("readArea")

    Set header_links = div_result.getElementsByTagName("td")

    For Each h In header_links
        Set link = h.ChildNodes.item(0)
        Cells(Range("A" & Rows.Count).End(xlUp).Row + 1, 1) = link.href
    Next

    MsgBox "done"
End Sub

推荐答案

您可以使用

You can use an attribute = value CSS selector with $ operator to say the href value must end with .xls. Then use querySelectorAll to retrieve all matched results. Using CSS selectors is a very fast and generally robust method.

Dim list As Object
Set list = ie.document.querySelectorAll("[href$='.xls']")


使用XMLHTTP比打开IE更快.请注意,然后您可以将这些链接传递给执行二进制下载的函数,或传递给URLMon进行下载.


It is much faster to use XMLHTTP as well, rather than opening IE. Note that you can then pass these links to a function to perform a binary download or to URLMon for downloading.

Option Explicit   
Public Sub Links()
    Dim sResponse As String, html As HTMLDocument, list As Object, i As Long

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.jpx.co.jp/markets/public/short-selling/index.html", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With

    Set html = New HTMLDocument
    With html
        .body.innerHTML = sResponse
        Set list = html.querySelectorAll("[href$='.xls']")
    End With
    For i = 0 To list.Length - 1
        Debug.Print Replace$(list.item(i), "about:", "https://www.jpx.co.jp")
    Next
End Sub


示例下载功能(尽管您可以重复使用现有的XMLHTTP对象-这只是为了说明):


Example download function (though you could re-use your existing XMLHTTP object - this is just to illustrate):

Public Function DownloadFile(ByVal downloadFolder As String, ByVal downloadURL As String) As String
    Dim http As Object , tempArr As Variant
    Set http =  CreateObject("WinHttp.WinHttpRequest.5.1")
    http.Open "GET", downloadURL, False
    http.send
    On Error GoTo errhand
    With CreateObject("ADODB.Stream")
        .Open
        .Type = 1
        .write http.responseBody
        tempArr = Split(downloadURL, "/")
        tempArr = tempArr(UBound(tempArr))
        .SaveToFile downloadFolder & tempArr, 2  '< "/" on enter of downloadFolder. 2 for overwrite which is Ok if no file modifications.
        .Close
    End With
    DownloadFile = downloadFolder & tempArr
    Exit Function
errhand:
    If Err.Number <> 0 Then
        Debug.Print Err.Number, Err.Description
        MsgBox "Download failed"
    End If
    DownloadFile = vbNullString
End Function


参考(VBE>工具>参考):

  1. Microsoft HTML对象库

这篇关于从网页检索所有Excel文件链接的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆