使用VBA刮取HTML表 [英] Scraping HTML tables using VBA

查看:176
本文介绍了使用VBA刮取HTML表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用以下脚本我从以下链接成功地将HTML表中的值返回到工作簿: link1 LINK2 。但是,当我尝试使用相同的脚本进行以下 link3 ,它不返回任何东西。我认为这是由于网站上存在复杂的HTML表格结构。我认为代码要求 .Item(0)数字由于表格复杂而需要调整,请提供建议。

Using the below script I'm successfully returning values from the HTML tables into the workbook from the following links: link1 and link2. But when I'm trying to use the same script for the following link3, it does not return anything back. I think it is due to complex HTML table structure existing on the website. I believe code requires .Item(0) number to be adjusted due to table complexity, please advice.

Sub Web_Data()
    Dim http As New XMLHTTP60, html As New HTMLDocument
    Dim topic As HTMLHtmlElement

    With http
        .Open "GET", "http://www.dolphinfitness.co.uk/en/optimum-nutrition/", False
        .send
        html.body.innerHTML = .responseText
    End With

    For Each topic In html.getElementsByClassName("category-products")
        With topic.getElementsByClassName("product-name")
            If .Length Then x = x + 1: Cells(x, 1) = .Item(0).innerText
        End With
        With topic.getElementsByClassName("price")
            If .Length Then Cells(x, 2) = .Item(0).innerText
        End With
    Next topic
End Sub


推荐答案

您在帖子中提到的网站有点棘手这是为了解析不同产品的价格。很少有产品有原价,其余产品都有特价。在对表达式应用技术之前,不能同时解析它们。我写了一个能够处理它们的xpath,你将能够得到它们。这是脚本:

The very site you mentioned in your post is a bit tricky when it comes to parse the price of different products. Few products have got original price with it and the rest have got special price with them. You can't parse both of them all at once until you apply a technique with your expression. I've written an xpath which is able to deal with them and you will be able to get them all. Here is the script:

Sub Body_Building()
    Dim driver As New WebDriver, post As Object

    With driver
        .Start "chrome", "http://www.bodybuildingwarehouse.co.uk"
        .Get "/optimum-nutrition?limit=all"
    End With

    On Error Resume Next
    For Each post In driver.FindElementsByClass("grid-info")
        i = i + 1: Cells(i, 1) = post.FindElementByClass("product-name").Text
        Cells(i, 2) = post.FindElementByXPath(".//span[@class='regular-price']//span[@class='price']|.//p[@class='special-price']//span[@class='price']").Text
    Next post
End Sub

如果您在执行脚本时遇到任何问题,请告诉我。顺便说一句,与vba绑定的selenium没有任何属性可以避免On error resume next所以我把它放在循环之前。谢谢。

Let me know if you have any problem executing the script. Btw, selenium binding with vba doesn't have any property to shun "On error resume next" so i put it before the loop. Thanks.

这篇关于使用VBA刮取HTML表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆