Scrape with xmlhttp


Problem description

I would like to get data from https://www.goaloong.net/football/6in1. This page contains a table.

I tried:

Sub REQUESTXML()

Dim XMLHttpRequest As xmlHttp
Dim HTMLDoc As New HTMLDocument
Dim elem As Object
Dim x As Long

Set XMLHttpRequest = New MSXML2.xmlHttp
XMLHttpRequest.Open "GET", "https://www.goaloong.net/football/6in1", False
XMLHttpRequest.send
While XMLHttpRequest.readyState = 200
    DoEvents
Wend

Debug.Print XMLHttpRequest.responseText
HTMLDoc.Body.innerHTML = XMLHttpRequest.responseText

x = 1

For Each elem In HTMLDoc.getElementsByClassName("Leaguestitle")
    Sheets("req").Range("A" & x).Value = HTMLDoc.getElementsByTagName("a")(0).innerText
    x = x + 1
Next elem

End Sub

I get no results.

Can you help me, please?

Recommended answer

The page https://www.goaloong.net/football/6in1 is dynamic: the JavaScript files are loaded first, and those scripts then load the content. An XMLHTTP request does not run scripts, so its responseText holds only the initial HTML, without the table. One approach is to load the full page in Internet Explorer and read the rendered content from there. Example below (tested):

Sub REQUESTXML()
    Dim IE As New InternetExplorer
    Dim elem As Object
    Dim x As Long
    
    IE.navigate "https://www.goaloong.net/football/6in1"
    
    Do While IE.readyState = READYSTATE_COMPLETE: DoEvents: Loop
    Do Until IE.readyState = READYSTATE_COMPLETE: DoEvents: Loop
    
    'for debug purpose
    Open ThisWorkbook.Path & "\TESTFILE.html" For Output As #1
    Print #1, IE.document.body.innerHTML
    Close #1
    
    x = 1
    For Each elem In IE.document.getElementsByClassName("Leaguestitle")
        Sheets(1).Range("A" & x).Value = elem.innerText
        x = x + 1
    Next elem

    IE.Quit
End Sub
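
Because the content is filled in by scripts, it may not yet be present at the moment readyState reports READYSTATE_COMPLETE. A minimal sketch of one way to guard against that is to poll for the target elements with a timeout. The helper name WaitForClass and its parameters are hypothetical and not part of the original answer; it assumes the document object supports getElementsByClassName, as the example above already does.

```vba
' Hypothetical helper (not from the original answer): polls the document
' until at least one element with the given class name exists, or until
' the timeout expires. Returns True if the element was found in time.
Function WaitForClass(ByVal doc As Object, ByVal className As String, _
                      ByVal timeoutSeconds As Long) As Boolean
    Dim deadline As Date

    ' Date arithmetic is used instead of Timer so the wait
    ' behaves correctly across midnight.
    deadline = DateAdd("s", timeoutSeconds, Now)

    Do
        If doc.getElementsByClassName(className).Length > 0 Then
            WaitForClass = True
            Exit Function
        End If
        DoEvents    ' keep Excel responsive while waiting
    Loop While Now < deadline

    WaitForClass = False
End Function
```

In the example above it could be called after the two readyState loops, for instance as If WaitForClass(IE.document, "Leaguestitle", 10) Then ... End If, so that the For Each loop runs only once the league titles have actually been rendered.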
