使用组合框 VBA 中的选项抓取网站数据 [英] Scraping website data with options in combo box VBA

查看:22
本文介绍了使用组合框 VBA 中的选项抓取网站数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想从这个网站上抓取产品名称(Cohiba Robusto)、产品尺寸(单支雪茄,3 支一包,25 支一盒)和价格(33.65 英镑、90 英镑、730 英镑):

我正在使用下面的代码,它给出了一个错误(对象变量或变量未设置").

非常感谢您对此的任何帮助.

Sub getproducts()表格(JJFox").选择将 oHtml 变暗为 HTMLDocumentDim oElement 作为对象Dim 元素作为 IHTMLElementCollection将文档变暗为 HTMLDocument设置 oHtml = 新建 HTMLDocument'细胞(1, 6)=时间()lastrow = ActiveSheet.Cells(Rows.Count, A").End(xlUp).Rowcnt = 最后一行 + 1计数器 1 = cnt将 gg 调暗为字符串gg =https://www.jjfox.co.uk/cohiba-robusto-621.html"Dim objHTTP As New WinHttp.WinHttpRequest网址 = ggobjHTTP.Open "POST", url, FalseobjHTTP.setRequestHeader "Content-Type", "application/json";objHTTP.send ("{""key"":null,""from"":"""me@me.com""",""to""":null,""cc"":null,""bcc"":null,""date"":null,""subject"":""My Subject"",""body"":null,""attachments"":null}")oHtml.body.innerHTML = objHTTP.responseText'Cells(rw, 2) = oHtml.getElementsByTagName("description").innerText' 如果不是 .Document.querySelector("button[aria-label='Close']") Is Nothing Then' .Document.querySelector("button[aria-label='Close']").点击'  万一txttitle = oHtml.getElementsByClassName("productcart")(0).innerTexttxttitlehtml = oHtml.getElementsByClassName("packsize")(0).innerHTMLtxttitle = Mid(txttitle, 1, InStr(1, txttitle, Chr(10)))'Debug.Print txttitlehtml'txttitle2 = oHtml.getElementsByClassName(价格")(0).innerText将文本变暗为字符串文本 = GetHTML(gg)开始 = InStr(1, Text, spConfig =")endS = InStr(starts + 1, Text, spConfig")如果开始 = 0 那么单元格 (counter1, 1) = txttitleCells(counter1, 2) = 单"开始= InStr(开始+ 1,文本,产品价格")endl = InStr(starts + 1, Text, ,")Cells(counter1, 3) = Val(Mid(Text, starts + 14, endl - (starts + 14)))Cells(counter1, 4) = JJFox"细胞(计​​数器1,5)=现在()Cells(counter1, 7) = gg ' 页面链接计数器 1 = 计数器 1 + 1别的文本 = Mid(文本,开始,endS - 开始)'调试.打印文本'找到有多少包选项可用myTxt = 文本countTxt = 标签"bb = (Len(myTxt) - Len(replace(myTxt, countTxt, "")))/Len(countTxt) - 1'结束查找////////////////////////////////////varlabel = "class=""&Chr(34) &标签"&铬(34)开始 = InStr(1, Text, 标签") + 1文本 = Mid(文本,开始,Len(文本))对于 i = 1 到 bb开始 = InStr(1, 文本, 标签")If InStr(starts, Text, "label") Then'显示元素的属性单元格 (counter1, 1) = txttitleCells(counter1, 2) = Mid(Text, starts + 8, InStr(starts, Text, "") - (starts + 8))开始 = InStr(开始 + 1, 文本, oldPrice")endl = InStr(starts + 1, Text, ,")Cells(counter1, 3).FormulaR1C1 = Val(Mid(Text, starts + 11, endl - (starts + 11)))'Debug.Print Val(Mid(Text, startS + chrs, 6))Cells(counter1, 4) = JJFox"细胞(计​​数器1,5)=现在()开始 = 开始 + 1文本 = Mid(文本,开始,Len(文本))Cells(counter1, 7) = gg ' 页面链接计数器 1 = 计数器 1 + 1万一接下来我万一'细胞(2, 6)=时间()结束子函数 GetHTML(url As String) As String使用 CreateObject(MSXML2.ServerXMLHTTP.6.0").打开GET",网址,假.发送GetHTML = .responseText结束于结束函数

解决方案

价格和标签是从脚本标签动态提取的,您可以使用 json 解析器将其内容解析为 json.但是,您需要从 html 中获取名称.

只要对 html 和 css 有一点了解,就可以很容易地定义一个 css 模式来定位感兴趣的脚本节点:

.fieldset [type='text/x-magento-init']

寻找一个子scripttype 属性具有text/x-magento-init 属性,父类<代码>字段集.

我使用的效率稍低(你不会注意到):

For i = 1 To optionsCollection.Count

仅仅因为我知道集合很小,并且允许我用一个循环索引到两个变量.


Json 库:

我使用 jsonconverter.bas.从这里下载原始代码并添加到标准模块称为 JsonConverter .从复制的代码中删除顶部的 Attribute 行.

然后你需要去:

VBE >工具 >参考文献添加对以下内容的引用:

Microsoft 脚本运行时微软 HTML 对象库微软 XML 库.

在 json 的 VBA 中,[] 表示一个集合,而 {} 表示一个字典.


选项显式公共子 GetCigarData()'<VBE >工具 >参考:'微软脚本运行时'微软 HTML 对象库'微软 XML 库Dim json As Object, html As MSHTML.HTMLDocument, xhr As MSXML2.XMLHTTP60, ws As Worksheet设置 ws = ThisWorkbook.Worksheets("Sheet1")设置 xhr = 新 MSXML2.XMLHTTP60设置 html = 新建 MSHTML.HTMLDocument与 xhr.打开GET",https://www.jjfox.co.uk/cohiba-robusto-621.html",假.setRequestHeader用户代理",Mozilla/5.0";.发送html.body.innerHTML = .responseText结束于设置 json = jsonConverter.ParseJson(html.querySelector(".fieldset [type='text/x-magento-init']").innerHTML)("#product_addtocart_form")("configurable")(";spConfig")Dim 价格作为 Scripting.Dictionary,选项作为 Scripting.Dictionary,optionsCollection 作为集合设置价格 = json("optionPrices")设置选项 = json("属性")设置 optionsCollection = options(options.Keys(0))("options")Dim results() As Variant, headers() As Variant, i As Long, name As StringReDim 结果(1 到 optionsCollection.Count,1 到 3)name = html.querySelector(".base").innerText对于 i = 1 到 optionsCollection.Count结果(i, 1) = 名称结果(i,2)= optionsCollection.item(i)(标签")结果(i,3)=价格(prices.Keys(i - 1))(finalPrice")(amount")下一个headers = Array(名称"、尺寸"、价格")与 ws.Cells(1, 1).Resize(1, UBound(headers) + 1) = 标题.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = 结果结束于结束子


阅读有关 css 选择器的信息:

  1. https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors

Hi I am trying to scrape the Product name (Cohiba Robusto), Product Size (Single Cigar, Pack of 3, Box of 25) and prices (£33.65, £90, £730) from this website: https://www.jjfox.co.uk/cohiba-robusto-621.html

I am trying to get something like this:

I am using the code below, which gives an error ("Object variable or with variable not set").

Will appreciate any help with this.

Sub getproducts()

Sheets("JJFox").Select

Dim oHtml       As HTMLDocument
Dim oElement    As Object

Dim Elements As IHTMLElementCollection
Dim Document As HTMLDocument

Set oHtml = New HTMLDocument


'Cells(1, 6) = Time()
lastrow = ActiveSheet.Cells(Rows.Count, "A").End(xlUp).Row
cnt = lastrow + 1

counter1 = cnt

Dim gg As String

gg = "https://www.jjfox.co.uk/cohiba-robusto-621.html"


Dim objHTTP As New WinHttp.WinHttpRequest
url = gg
    objHTTP.Open "POST", url, False
    objHTTP.setRequestHeader "Content-Type", "application/json"
    objHTTP.send ("{""key"":null,""from"":""me@me.com"",""to"":null,""cc"":null,""bcc"":null,""date"":null,""subject"":""My Subject"",""body"":null,""attachments"":null}")
   
 oHtml.body.innerHTML = objHTTP.responseText
 'Cells(rw, 2) = oHtml.getElementsByTagName("description").innerText

   '    If Not .Document.querySelector("button[aria-label='Close']") Is Nothing Then
     '       .Document.querySelector("button[aria-label='Close']").Click
      '  End If
      
      
    txttitle = oHtml.getElementsByClassName("productcart")(0).innerText
txttitlehtml = oHtml.getElementsByClassName("packsize")(0).innerHTML


txttitle = Mid(txttitle, 1, InStr(1, txttitle, Chr(10)))
'Debug.Print txttitlehtml
'txttitle2 = oHtml.getElementsByClassName("price")(0).innerText

Dim Text As String
Text = GetHTML(gg)


starts = InStr(1, Text, "spConfig =")
endS = InStr(starts + 1, Text, "spConfig")

If starts = 0 Then


    Cells(counter1, 1) = txttitle
    Cells(counter1, 2) = "Single"
    starts = InStr(starts + 1, Text, "productPrice")
    endl = InStr(starts + 1, Text, ",")
    Cells(counter1, 3) = Val(Mid(Text, starts + 14, endl - (starts + 14)))
    Cells(counter1, 4) = "JJFox"
    Cells(counter1, 5) = Now()
     
    Cells(counter1, 7) = gg ' link to the page
    counter1 = counter1 + 1
   
Else


Text = Mid(Text, starts, endS - starts)
'Debug.Print Text
'find how many pack options are avaialble

myTxt = Text
countTxt = "label"

bb = (Len(myTxt) - Len(replace(myTxt, countTxt, ""))) / Len(countTxt) - 1
'End find////////////////////////////////////

varlabel = "class=" & Chr(34) & "label" & Chr(34)


starts = InStr(1, Text, "label") + 1
Text = Mid(Text, starts, Len(Text))
        
        For i = 1 To bb
        
        
        starts = InStr(1, Text, "label")
        
        If InStr(starts, Text, "label") Then
        
        'Show the element's properties
           
        
                Cells(counter1, 1) = txttitle
                Cells(counter1, 2) = Mid(Text, starts + 8, InStr(starts, Text, " ") - (starts + 8))
                
                                       
                        starts = InStr(starts + 1, Text, "oldPrice")
                        endl = InStr(starts + 1, Text, ",")
                        

                Cells(counter1, 3).FormulaR1C1 = Val(Mid(Text, starts + 11, endl - (starts + 11)))
                'Debug.Print Val(Mid(Text, startS + chrs, 6))
                Cells(counter1, 4) = "JJFox"
                Cells(counter1, 5) = Now()
                starts = starts + 1
                Text = Mid(Text, starts, Len(Text))
                Cells(counter1, 7) = gg ' link to the page
                counter1 = counter1 + 1
           End If
        
        Next i
            
End If
'Cells(2, 6) = Time()
End Sub



Function GetHTML(url As String) As String
     With CreateObject("MSXML2.ServerXMLHTTP.6.0")
        .Open "GET", url, False
        .send
        GetHTML = .responseText
    End With
End Function

解决方案

The prices and labels are pulled dynamically from a script tag who content you can parse as json with a json parser. You need to grab the name from the html however.

With a little knowledge of html and css, it is easy enough to define a css pattern to target the script node of interest with:

.fieldset [type='text/x-magento-init']

That looks for a child script with type attribute having attribute value text/x-magento-init, and a parent with class fieldset.

I have used a tiny bit less efficient (you won't notice):

For i = 1 To optionsCollection.Count

Simply because I know the collection is small and to allow me to index into two variables with a single loop.


Json library:

I use jsonconverter.bas. Download raw code from here and add to standard module called JsonConverter . Remove the top Attribute line from the copied code.

You then need to go:

VBE > Tools > References > Add references to:

Microsoft Scripting Runtime
Microsoft HTML Object Library
Microsoft XML Library. 

In VBA for json the [] denotes a collection and the {} represents a dictionary.


Option Explicit

Public Sub GetCigarData()
    '<  VBE > Tools > References:
    'Microsoft Scripting Runtime
    'Microsoft HTML Object Library
    'Microsoft XML Library
    
    Dim json As Object, html As MSHTML.HTMLDocument, xhr As MSXML2.XMLHTTP60, ws As Worksheet

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set xhr = New MSXML2.XMLHTTP60
    Set html = New MSHTML.HTMLDocument
    
    With xhr
        .Open "GET", "https://www.jjfox.co.uk/cohiba-robusto-621.html", False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
         html.body.innerHTML = .responseText
    End With

    Set json = jsonConverter.ParseJson(html.querySelector(".fieldset [type='text/x-magento-init']").innerHTML)("#product_addtocart_form")("configurable")("spConfig")
     
    Dim prices As Scripting.Dictionary, options As Scripting.Dictionary, optionsCollection As Collection
    
    Set prices = json("optionPrices")
    Set options = json("attributes")
    Set optionsCollection = options(options.Keys(0))("options")
    
    Dim results() As Variant, headers() As Variant, i As Long, name As String
    ReDim results(1 To optionsCollection.Count, 1 To 3)
    
    name = html.querySelector(".base").innerText

    For i = 1 To optionsCollection.Count
         results(i, 1) = name
         results(i, 2) = optionsCollection.item(i)("label")
         results(i, 3) = prices(prices.Keys(i - 1))("finalPrice")("amount")
    Next
    
    headers = Array("Name", "Size", "Price")
    
    With ws
        .Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
        .Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results 
    End With
   
End Sub


Read about css selectors:

  1. https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors

这篇关于使用组合框 VBA 中的选项抓取网站数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆