Excel-VBA: buttons and drop-down lists before scraping data on .aspx web pages


Question


I'd like to use VBA to get a table from a webpage, but I cannot access it directly because I first need to switch from one tab to another. The problem comes from the page being .aspx: the URL doesn't change when the tab changes.

URL: http://www.morningstar.fr/fr/fundquickrank/default.aspx

(Screenshot: default page after opening the URL)

(Screenshot: after moving to the "Long Terme" tab; the URL does not change)

I've managed to build the code that exports the table to an Excel worksheet, but I'm missing the "navigation" part.

Here is the HTML of the button that opens the "Long Terme" tab:

(Screenshot: HTML code of the Long Terme tab)

My starting point is here (ProcessHTMLPage does the table scraping):

Sub Browse_Morningstar()
    'Requires references: Microsoft XML, v6.0 and Microsoft HTML Object Library
    Dim XMLPage As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument

    XMLPage.Open "GET", "http://www.morningstar.fr/fr/fundquickrank/default.aspx", False
    XMLPage.send

    HTMLDoc.body.innerHTML = XMLPage.responseText

    ProcessHTMLPage HTMLDoc    'my own table-scraping routine (not shown)
End Sub

I suppose the request must be updated somehow. Sorry if I'm not being precise enough; I'm fairly new to all this.

Thanks a lot guys!

Solution

__doPostBack and onclick event:

When you inspect the HTML of the tab bar in the main content div (e.g. Long Terme), you can see that the onclick event of each bar item calls the JavaScript __doPostBack function.

Observe the HTML in question:

To quote from my link above:

The function takes the following two arguments:

eventTarget - This contains the ID of the control that caused the post back.
eventArgument - This contains any additional data associated with the control.

In any ASP.NET page the two hidden fields: __EVENTTARGET and __EVENTARGUMENT are automatically declared. When a page is posted back to the server ASP.NET inspects __EVENTTARGET and __EVENTARGUMENT values and this way it can decide which of the controls caused the page to be posted back and what is the event that has to be handled.

tl;dr:

In the "olden" days of ASP there often had to be a form to capture user inputs, and then further pages had to be created to accept those inputs (GET or POST), validate them, perform actions and the like. With ASP.NET you can declare controls on the server which accept the above arguments and post back to the same page after inspecting their values.

The first argument tells you which control fired the event, and the second provides additional information; in this case it determines which tab's information is returned.

We can see from the above that TabAction is the control and that the number following it corresponds to the tab of interest, e.g. 2 for Long Terme (the tabs are 0-indexed).
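The HTML itself is only available as a screenshot, so the exact onclick text here is an assumption. If it has the usual form `javascript:__doPostBack('TabAction','2')`, a small helper like the one below can pull out the two arguments. This is handy for checking which index belongs to which tab. `ParsePostBack` is a hypothetical name. The sketch only handles single-quoted arguments without embedded commas or quotes.

```vba
'Hypothetical helper: extracts the two __doPostBack arguments from an onclick string.
'Assumes the form __doPostBack('target','argument') with single quotes and no
'embedded commas/quotes. Returns False if the text does not have that shape.
Public Function ParsePostBack(ByVal onclickText As String, ByRef eventTarget As String, _
                              ByRef eventArgument As String) As Boolean
    Dim p As Long, q As Long, parts() As String
    p = InStr(1, onclickText, "__doPostBack(", vbTextCompare)
    If p = 0 Then Exit Function                      'no __doPostBack call found
    p = p + Len("__doPostBack(")                     'first character after the "("
    q = InStr(p, onclickText, ")")
    If q = 0 Then Exit Function                      'unterminated call
    parts = Split(Mid$(onclickText, p, q - p), ",")  'e.g. "'TabAction'" and "'2'"
    If UBound(parts) <> 1 Then Exit Function         'expect exactly two arguments
    eventTarget = Replace(Trim$(parts(0)), "'", "")
    eventArgument = Replace(Trim$(parts(1)), "'", "")
    ParsePostBack = True
End Function
```

Looping over the tab elements and passing each onclick value through this helper would list every target/argument pair the page exposes.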

In VBA we can execute this JS function in a number of ways but I will use:

.document.parentWindow.execScript "__doPostBack('EVENTTARGET', 'EVENTARGUMENT')"

This becomes:

.document.parentWindow.execScript "__doPostBack('TabAction', '2')"

I rewrite this to take the EVENTARGUMENT from a constant, OPTION_CHOSEN, so that a different tab can be retrieved by changing its value at the top.
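Two small hypothetical helpers, `TabIndex` and `PostBackScript`, sketch how this parameterisation could go one step further. The names are illustrative, not part of the answer. `TabIndex` maps a tab label to its 0-based index, using the labels and order listed in the OPTION_CHOSEN comment of the full code. `PostBackScript` builds the script string with no stray space inside the quotes, so the server receives "2" rather than " 2".

```vba
'Hypothetical helper: 0-based index of a tab label (case-insensitive), or -1 if unknown.
'Labels and order are taken from the OPTION_CHOSEN comment in GetTable.
Public Function TabIndex(ByVal label As String) As Long
    Dim labels As Variant, i As Long
    labels = Array("Aperçu", "Court terme", "Long terme", "Portefeuille", "Frais & Détails")
    TabIndex = -1                                       'default: label not found
    For i = LBound(labels) To UBound(labels)
        If StrComp(labels(i), Trim$(label), vbTextCompare) = 0 Then
            TabIndex = i - LBound(labels)               '0-based regardless of Option Base
            Exit Function
        End If
    Next i
End Function

'Hypothetical helper: the JavaScript call the tab's onclick handler makes.
Public Function PostBackScript(ByVal eventTarget As String, ByVal eventArgument As String) As String
    PostBackScript = "__doPostBack('" & eventTarget & "', '" & eventArgument & "')"
End Function
```

Inside GetTable, the call could then read `.document.parentWindow.execScript PostBackScript("TabAction", CStr(TabIndex("Long terme")))`.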

After executing the function, the code waits for the page to finish refreshing and then grabs the table by its id:

Set hTable = .document.getElementById("ctl00_ctl00_MainContent_Layout_1MainContent_gridResult")

The code then loops over the table's rows and, within each row, over its cells (the columns).
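The loop in the full code writes one worksheet cell at a time. A common alternative reads the whole table into a 2-D array and writes it to the sheet in a single assignment. It is sketched here as a hypothetical `TableToArray` helper. It uses late binding (`As Object`), so it does not depend on the HTML Object Library reference, and short rows are padded with Empty. Unlike the original code, it does not delete column A.

```vba
'Hypothetical helper: copies an HTML table into a 0-based 2-D Variant array.
'Short rows are padded with Empty; returns Empty if the table has no cells.
Public Function TableToArray(ByVal tbl As Object) As Variant
    Dim nRows As Long, nCols As Long, r As Long, c As Long
    Dim out() As Variant

    nRows = tbl.Rows.Length
    For r = 0 To nRows - 1                              'the widest row sets the column count
        If tbl.Rows.Item(r).Cells.Length > nCols Then nCols = tbl.Rows.Item(r).Cells.Length
    Next r
    If nRows = 0 Or nCols = 0 Then Exit Function        'result stays Empty

    ReDim out(0 To nRows - 1, 0 To nCols - 1)
    For r = 0 To nRows - 1
        For c = 0 To tbl.Rows.Item(r).Cells.Length - 1
            out(r, c) = tbl.Rows.Item(r).Cells.Item(c).innerText
        Next c
    Next r
    TableToArray = out
End Function

'Usage in GetTable, in place of the nested For Each loops:
'    Dim data As Variant
'    data = TableToArray(hTable)
'    If Not IsEmpty(data) Then .Range("A1").Resize(UBound(data, 1) + 1, UBound(data, 2) + 1).Value = data
```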




Full code:

Option Explicit

'Requires (VBE > Tools > References): Microsoft Internet Controls, Microsoft HTML Object Library
Public Sub GetTable()
    Dim IE As New InternetExplorer
    Const OPTION_CHOSEN As Long = 2             '0 Aperçu; 1 Court terme; 2 Long terme; 3 Portefeuille; 4 Frais & Détails

    Application.ScreenUpdating = False
    With IE
        .Visible = True
        .navigate "http://www.morningstar.fr/fr/fundquickrank/default.aspx"
        Do While .Busy Or .readyState <> 4: DoEvents: Loop   'wait for the initial page load

        'Fire the same postback as the tab's onclick handler (no space before the argument)
        .document.parentWindow.execScript "__doPostBack('TabAction', '" & OPTION_CHOSEN & "')"

        Do While .Busy Or .readyState <> 4: DoEvents: Loop   'wait for the postback to finish

        Dim hTable As HTMLTable, tRow As HTMLTableRow, tCell As HTMLTableCell

        Set hTable = .document.getElementById("ctl00_ctl00_MainContent_Layout_1MainContent_gridResult")
        If hTable Is Nothing Then .Quit: Application.ScreenUpdating = True: Exit Sub   'table not found

        Dim c As Long, r As Long
        With ActiveSheet
            For Each tRow In hTable.Rows
                For Each tCell In tRow.Cells
                    c = c + 1: .Cells(r + 1, c) = tCell.innerText   'one worksheet cell per table cell
                Next tCell
                c = 0: r = r + 1                                     'move to the next worksheet row
            Next tRow
            .Columns("A:A").Delete                                   'delete column A
            .UsedRange.Columns.AutoFit
        End With
        .Quit
    End With
    Application.ScreenUpdating = True
End Sub


References (VBE > Tools > References):

  1. Microsoft Internet Controls
  2. Microsoft HTML Object Library
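The answer drives Internet Explorer. However, a postback is an ordinary form POST carrying __EVENTTARGET, __EVENTARGUMENT and the page's hidden state fields, so the asker's original MSXML2 approach could in principle be extended instead. What follows is a hedged sketch of that idea, not tested against the live site. It GETs the page, copies every hidden `<input>` (such as __VIEWSTATE) into a form body together with the two event fields, and POSTs it back.

`FormEncode`, `PostBackFormBody` and `PostBackViaXHR` are hypothetical names. The sketch assumes ASCII field values and that the server accepts the POST without any non-hidden form fields or extra headers. `ProcessHTMLPage` is the asker's own routine.

```vba
Option Explicit

'Hypothetical sketch (untested against the live site): replay the TabAction postback
'with MSXML2 instead of Internet Explorer. Assumes the page keeps its state in
'<input type="hidden"> fields (e.g. __VIEWSTATE) and accepts a plain form POST.

'Percent-encodes a form value. Assumes ASCII input (Base64 __VIEWSTATE is ASCII).
Public Function FormEncode(ByVal s As String) As String
    Dim i As Long, ch As String, out As String
    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        If ch Like "[A-Za-z0-9]" Or InStr("-_.~", ch) > 0 Then
            out = out & ch                                       'unreserved: keep as-is
        Else
            out = out & "%" & Right$("0" & Hex$(Asc(ch)), 2)     'everything else: %XX
        End If
    Next i
    FormEncode = out
End Function

'Builds the POST body: __EVENTTARGET, __EVENTARGUMENT, then every other hidden input.
Public Function PostBackFormBody(ByVal doc As Object, ByVal eventTarget As String, _
                                 ByVal eventArgument As String) As String
    Dim inputs As Object, el As Object, i As Long, nm As String, body As String
    body = "__EVENTTARGET=" & FormEncode(eventTarget) & "&__EVENTARGUMENT=" & FormEncode(eventArgument)
    Set inputs = doc.getElementsByTagName("input")
    For i = 0 To inputs.Length - 1
        Set el = inputs.Item(i)
        nm = el.Name & ""                                        '& "" guards against Null
        If LCase$(el.Type & "") = "hidden" And Len(nm) > 0 Then
            If nm <> "__EVENTTARGET" And nm <> "__EVENTARGUMENT" Then
                body = body & "&" & FormEncode(nm) & "=" & FormEncode(el.Value & "")
            End If
        End If
    Next i
    PostBackFormBody = body
End Function

'GETs the page, then POSTs it back as if the "Long terme" tab (index 2) had been clicked.
Public Sub PostBackViaXHR()
    Const PAGE_URL As String = "http://www.morningstar.fr/fr/fundquickrank/default.aspx"
    Dim http As Object, doc As Object
    Set http = CreateObject("MSXML2.XMLHTTP.6.0")

    http.Open "GET", PAGE_URL, False
    http.send
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = http.responseText

    http.Open "POST", PAGE_URL, False
    http.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    http.send PostBackFormBody(doc, "TabAction", "2")

    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = http.responseText
    'ProcessHTMLPage doc    'the asker's table-scraping routine (not shown in the question)
End Sub
```

If the response still shows the default tab, compare this request with the one the browser sends (in the developer tools' network view). That is the quickest way to spot a missing field.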
