使用VBA提取一系列URL [英] Extracting a series of URL using VBA

查看:85
本文介绍了使用VBA提取一系列URL的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我只是尝试遍历url链接列表,但它始终显示运行时错误"91",对象变量或未设置块变量.

I just trying to run through a list of url link, but it keep showing run time error'91',object variable or with block variable not set.

我要提取的数据来自iframe.它确实显示了一些值,但是卡在了过程的中间,并出现了错误.

The data I want to extract is from iframes. It do shown some of the values but it stuck in the middle of process with error.

下面是我要从中提取值的示例URL链接: http://www.bursamalaysia.com/market/listed-companies/company-announcements/5927201

Below is the sample url link that I want to extract value from:http://www.bursamalaysia.com/market/listed-companies/company-announcements/5927201

Public Sub GetInfo()
    Dim IE As New InternetExplorer As Object
    With IE
        .Visible = False

        For u = 2 To 100

        .navigate Cells(u, 1).Value

        While .Busy Or .readyState < 4: DoEvents: Wend



        With .document.getElementById("bm_ann_detail_iframe").contentDocument
            ThisWorkbook.Worksheets("Sheet1").Cells(u, 3) = .getElementById("main").innerText
            ThisWorkbook.Worksheets("Sheet1").Cells(u, 4) = .getElementsByClassName("company_name")(0).innerText
            ThisWorkbook.Worksheets("Sheet1").Cells(u, 5) = .getElementsByClassName("formContentData")(0).innerText
            ThisWorkbook.Worksheets("Sheet1").Cells(u, 6) = .getElementsByClassName("formContentData")(5).innerText
            ThisWorkbook.Worksheets("Sheet1").Cells(u, 7) = .getElementsByClassName("formContentData")(7).innerText
            ThisWorkbook.Worksheets("Sheet1").Cells(u, 8) = .getElementsByClassName("formContentData")(8).innerText
            ThisWorkbook.Worksheets("Sheet1").Cells(u, 9) = .getElementsByClassName("formContentData")(9).innerText
            ThisWorkbook.Worksheets("Sheet1").Cells(u, 10) = .getElementsByClassName("formContentData")(10).innerText
            ThisWorkbook.Worksheets("Sheet1").Cells(u, 11) = .getElementsByClassName("formContentData")(11).innerText
       End With

    Next u
    End With
End Sub

推荐答案

tl; dr

您的错误是由于以下事实造成的:给定类名的元素数量不同,具体取决于每页的结果.因此,您不能使用固定索引.对于该页面,您通过iframe指示了该类的最后一个索引为9,即ThisWorkbook.Worksheets("Sheet1").cells(u, 9) = .getElementsByClassName("formContentData")(9).innerText. 10和11无效.下面,我展示了一种确定结果数量并从每个结果行中提取信息的方法.

Your error is due to the fact there are different numbers of elements for the given class name depending on the results per page. So you can't used fixed indexes. For the page you indicated the last index for that class, via the iframe, is 9 i.e. ThisWorkbook.Worksheets("Sheet1").cells(u, 9) = .getElementsByClassName("formContentData")(9).innerText . 10 and 11 are invalid. Below I show a way to determine the number of results and extract info from each result row.

好吧...因此,以下内容基于针对大多数信息的Details of Changes表为目标的原理.

Ok... so the following works on the principle of targeting the Details of Changes table for most of the info.

示例摘录:

更具体地说,我定位重复No, Date of Change, #Securities, Type of Transaction and Nature of Interest信息的行.这些值存储在一个数组数组中(每行信息一个数组).然后将结果数组存储在一个集合中,以便稍后将其写出到工作表中.我在目标行(父级tr中的td标签元素)中循环每个表格单元格以填充数组.

More specifically, I target the rows that repeat the info for No, Date of Change, #Securities, Type of Transaction and Nature of Interest. These values are stored in an array of arrays (one array per row of information). Then the results arrays are stored in a collection to later be written out to the sheet. I loop each table cell in the targeted rows (td tag elements within parent tr) to populate the arrays.

我在页面上方的表中添加了Name,而且,因为可能存在多个结果行,具体取决于网页,并且因为我将结果写入新的Results工作表中,我在每个结果之前添加URL,以指示信息来源.

I add in the Name from the table above on the page and also, because there can be more than one row of results, depending on the webpage, and because I am writing the results to a new Results sheet, I add in the URL before each result to indicate source of information.

待办事项:

  1. 重构代码使其更具模块化
  2. 可能会添加一些错误处理


CSS选择器:


①从Particulars of substantial Securities Holder表中选择Name元素,称为title.


CSS selectors:


① I select the Name element, which I refer to as title, from the Particulars of substantial Securities Holder table.

名称元素示例:

检查此元素的HTML显示其具有formContentLabel类,并且它是页面上具有此值的第一类.

Inspecting the HTML for this element shows it has a class of formContentLabel, and that it is the first class with this value on the page.

目标名称的HTML示例:

这意味着我可以使用类选择器.formContentLabel,以定位元素.因为它是一个元素,所以我想使用querySelector方法来应用CSS选择器.

This means I can use a class selector , .formContentLabel, to target the element. As it is a single element I want I use the querySelector method to apply the CSS selector.

②我用.ven_table tr的选择器组合定位Details of Changes表中感兴趣的行.这是后代选择器组合,其中将选择元素与tr父级为ven_table的标签.由于这些是多个元素,因此我使用querySelectorAll方法应用CSS选择器组合.

② I target the rows of interest in the Details of Changes table with a selector combination of .ven_table tr. This is descendant selector combination combining selecting elements with tr tag having parent with class ven_table. As these are multiple elements I use the querySelectorAll method to apply the CSS selector combination.

目标行的示例:

CSS选择器返回的示例结果(示例):

我感兴趣的行从1开始,例如每隔+ 4行重复一次.第5、9等行 因此,我在代码中使用了一些数学运算以仅返回感兴趣的行:

The rows I am interested start at 1 and repeat every + 4 rows after e.g. row 5 , 9 etc. So I use a little maths in the code to return just the rows of interest:

Set currentRow = data.item(i * 4 + 1)


VBA:

Option Explicit
Public Sub GetInfo()
    Dim IE As New InternetExplorer, headers(), u As Long, resultCollection As Collection
    headers = Array("URL", "Name", "No", "Date of change", "# Securities", "Type of Transaction", "Nature of Interest")
    Set resultCollection = New Collection
    Dim links()
    links = Application.Transpose(ThisWorkbook.Worksheets("Sheet1").Range("A2:A3")) 'A100

    With IE
        .Visible = True

        For u = LBound(links) To UBound(links)
            If InStr(links(u), "http") > 0 Then
                .navigate links(u)

                While .Busy Or .readyState < 4: DoEvents: Wend
                Application.Wait Now + TimeSerial(0, 0, 1) '<you may not always need this. Or may need to increase.
                Dim data As Object, title As Object
                With .document.getElementById("bm_ann_detail_iframe").contentDocument
                    Set title = .querySelector(".formContentData")
                    Set data = .querySelectorAll(".ven_table tr")
                End With

                Dim results(), numberOfRows As Long, i As Long, currentRow As Object, td As Object, c As Long, r As Long

                numberOfRows = Round(data.Length / 4, 0)
                ReDim results(1 To numberOfRows, 1 To 7)

                For i = 0 To numberOfRows - 1
                    r = i + 1
                    results(r, 1) = links(u): results(r, 2) = title.innerText
                    Set currentRow = data.item(i * 4 + 1)
                    c = 3
                    For Each td In currentRow.getElementsByTagName("td")
                        results(r, c) = Replace$(td.innerText, "document.write(rownum++);", vbNullString)
                        c = c + 1
                    Next td
                Next i
                resultCollection.Add results
                Set data = Nothing: Set title = Nothing
            End If
        Next u
        .Quit
    End With
    Dim ws As Worksheet, item As Long
    If Not resultCollection.Count > 0 Then Exit Sub

    If Not Evaluate("ISREF('Results'!A1)") Then '<==Credit to @Rory for this test
        Set ws = Worksheets.Add
        ws.NAME = "Results"
    Else
        Set ws = ThisWorkbook.Worksheets("Results")
        ws.cells.Clear
    End If

    Dim outputRow As Long: outputRow = 2
    With ws
        .cells(1, 1).Resize(1, UBound(headers) + 1) = headers
        For item = 1 To resultCollection.Count
            Dim arr()
            arr = resultCollection(item)
            For i = LBound(arr, 1) To UBound(arr, 1)
                .cells(outputRow, 1).Resize(1, 7) = Application.WorksheetFunction.Index(arr, i, 0)
                outputRow = outputRow + 1
            Next
        Next
    End With
End Sub


使用2个提供的测试URL的示例结果:

工作表1中的示例URL:

  1. http://www.bursamalaysia.com/market/listed -companies/company-announcements/5928057
  2. http://www.bursamalaysia.com/market/listed -companies/company-announcements/5927201
  1. http://www.bursamalaysia.com/market/listed-companies/company-announcements/5928057
  2. http://www.bursamalaysia.com/market/listed-companies/company-announcements/5927201

这篇关于使用VBA提取一系列URL的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆