使用组合框VB​​A中的选项来抓取网站数据 [英] Scraping website data with options in combo box VBA

查看:65
本文介绍了使用组合框VB​​A中的选项来抓取网站数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想从以下网站上刮取产品名称(Cohiba Robusto),产品尺寸(单支雪茄,3包,25盒)和价格(33.65英镑,90英镑,730英镑):

我正在使用下面的代码,该代码给出错误(对象变量或未设置变量").

将对此表示感谢.

  Sub getproducts()表格("JJFox").选择作为HTMLDocument的Dim oHtml昏暗的oElement作为对象暗元素作为IHTMLElementCollection将Dim文档另存为HTMLDocument设置oHtml = New HTMLDocument'单元格(1,6)=时间()lastrow = ActiveSheet.Cells(Rows.Count,"A").End(xlUp).Rowcnt = lastrow + 1counter1 = cnt昏暗的gg作为字符串gg ="https://www.jjfox.co.uk/cohiba-robusto-621.html"昏暗的objHTTP作为新的WinHttp.WinHttpRequest网址= ggobjHTTP.Open"POST",网址,FalseobjHTTP.setRequestHeader"Content-Type","application/json"objHTTP.send("{""key":null,"来自":" me@me.com",到":null,"cc","null","bcc","null","date","null","subject","my Subject"","body"":null,"attachments":null}")oHtml.body.innerHTML = objHTTP.responseText'Cells(rw,2)= oHtml.getElementsByTagName("description").innerText'如果不是.Document.querySelector("button [aria-label ='Close']")则什么也没有'.Document.querySelector("button [aria-label ='Close']")).点击'  万一txttitle = oHtml.getElementsByClassName("productcart")(0).innerTexttxttitlehtml = oHtml.getElementsByClassName("packsize")(0).innerHTMLtxttitle = Mid(txttitle,1,InStr(1,txttitle,Chr(10)))'Debug.Print txttitlehtml'txttitle2 = oHtml.getElementsByClassName(价格")(0).innerText昏暗的文字作为字符串文字= GetHTML(gg)开始= InStr(1,文本,"spConfig =")endS = InStr(开始+ 1,文本,"spConfig")如果开始= 0那么单元格(counter1,1)= txttitle单元格(counter1,2)=单"开始= InStr(开始+ 1,文本,"productPrice")endl = InStr(开始+ 1,文本,,")像元(counter1,3)= Val(Mid(文本,开始+ 14,endl-(开始+ 14)))单元格(counter1,4)="JJFox"单元格(counter1,5)= Now()单元格(counter1,7)= gg'链接到页面计数器1 =计数器1 + 1别的文字=中(文字,开始,结束S-开始)'Debug.Print文本'找到多少个包装选项可用myTxt =文字countTxt ="label"bb =(Len(myTxt)-Len(替换(myTxt,countTxt,"))))/Len(countTxt)-1'结束查找/////////////////////////////////////varlabel ="class ="&Chr(34)&标签"&铬(34)开始= InStr(1,文本,标签")+ 1文字=中(文字,开头,Len(文字))对于i = 1到bb开始= InStr(1,文本,标签")如果InStr(开始,文本,标签"),则'显示元素的属性单元格(counter1,1)= txttitle单元格(counter1,2)= Mid(文本,开始+ 8,InStr(开始,文本,"\")-(开始+ 8))开始= InStr(开始+ 1,文本,"oldPrice")endl = InStr(开始+ 1,文本,,")Cells(counter1,3).FormulaR1C1 = Val(Mid(Text,starts + 11,endl-(starts + 11))))'Debug.Print Val(Mid(Text,startS + chrs,6))单元格(counter1,4)="JJFox"单元格(counter1,5)= Now()开始=开始+ 1文字=中(文字,开头,Len(文字))单元格(counter1,7)= gg'链接到页面计数器1 =计数器1 + 1万一接下来我万一'Cells(2,6)= Time()结束子函数GetHTML(URL作为字符串)作为字符串使用CreateObject("MSXML2.ServerXMLHTTP.6.0").打开"GET",URL,False.发送GetHTML = .responseText结束于结束功能 

解决方案

价格和标签是从脚本标签动态提取的,您可以使用json解析器将其内容解析为json.但是,您需要从html中获取名称.

对html和css有一点了解,就可以很容易地定义一个css模式来针对目标脚本节点:

  .fieldset [type ='text/x-magento-init'] 

查找具有属性 text/x-magento-init type 属性的子 script 和父类 fieldset .

我使用的效率稍低(您不会注意到):

 对于i = 1到optionsCollection.Count 

仅仅是因为我知道集合很小,并且允许我通过一个循环将其索引为两个变量.


Json库:

我使用jsonconverter.bas.从此处下载原始代码,并将其添加到标准模块中称为JsonConverter.从复制的代码中删除最上面的属性行.

然后您需要去

VBE>工具>参考书目>添加对以下内容的引用:

  Microsoft脚本运行时Microsoft HTML对象库Microsoft XML库. 

在VBA for json中,[]表示集合,{}表示字典.


 选项显式公共子GetCigarData()'<VBE>工具>参考:'Microsoft脚本运行时'Microsoft HTML对象库'Microsoft XML库Dim json作为对象,html作为MSHTML.HTMLDocument,xhr作为MSXML2.XMLHTTP60,ws作为工作表设置ws = ThisWorkbook.Worksheets("Sheet1")设置xhr = New MSXML2.XMLHTTP60设置html =新的MSHTML.HTMLDocument与xhr.打开"GET","https://www.jjfox.co.uk/cohiba-robusto-621.html",False.setRequestHeader用户代理","Mozilla/5.0".发送html.body.innerHTML = .responseText结束于设置json = jsonConverter.ParseJson(html.querySelector(.fieldset [type ='text/x-magento-init']").innerHTML)(#product_addtocart_form")(可配置")("spConfig")暗淡的价格作为Scripting.Dictionary,选项作为Scripting.Dictionary,options作为集合设置价格= json("optionPrices")设置选项= json(属性")设置optionsCollection = options(options.Keys(0))("options")Dim results()作为变量,headers()作为变量,i作为long,名称作为StringReDim结果(1到optionsCollection.Count,1到3)名称= html.querySelector(.base").innerText对于i = 1 To optionsCollection.Count结果(i,1)=名称results(i,2)= optionsCollection.item(i)(标签")结果(i,3)=价格(prices.Keys(i-1))("finalPrice")(金额")下一个标头= Array(名称",大小",价格")与ws.Cells(1,1).Resize(1,UBound(headers)+1)=标头.Cells(2,1).Resize(UBound(results,1),UBound(results,2))=结果结束于结束子 


了解有关CSS选择器的信息:

  1. https://developer.mozilla.org/zh-CN/docs/Web/CSS/CSS_Selectors

Hi I am trying to scrape the Product name (Cohiba Robusto), Product Size (Single Cigar, Pack of 3, Box of 25) and prices (£33.65, £90, £730) from this website: https://www.jjfox.co.uk/cohiba-robusto-621.html

I am trying to get something like this:

I am using the code below, which gives an error ("Object variable or with variable not set").

Will appreciate any help with this.

Sub getproducts()

Sheets("JJFox").Select

Dim oHtml       As HTMLDocument
Dim oElement    As Object

Dim Elements As IHTMLElementCollection
Dim Document As HTMLDocument

Set oHtml = New HTMLDocument


'Cells(1, 6) = Time()
lastrow = ActiveSheet.Cells(Rows.Count, "A").End(xlUp).Row
cnt = lastrow + 1

counter1 = cnt

Dim gg As String

gg = "https://www.jjfox.co.uk/cohiba-robusto-621.html"


Dim objHTTP As New WinHttp.WinHttpRequest
url = gg
    objHTTP.Open "POST", url, False
    objHTTP.setRequestHeader "Content-Type", "application/json"
    objHTTP.send ("{""key"":null,""from"":""me@me.com"",""to"":null,""cc"":null,""bcc"":null,""date"":null,""subject"":""My Subject"",""body"":null,""attachments"":null}")
   
 oHtml.body.innerHTML = objHTTP.responseText
 'Cells(rw, 2) = oHtml.getElementsByTagName("description").innerText

   '    If Not .Document.querySelector("button[aria-label='Close']") Is Nothing Then
     '       .Document.querySelector("button[aria-label='Close']").Click
      '  End If
      
      
    txttitle = oHtml.getElementsByClassName("productcart")(0).innerText
txttitlehtml = oHtml.getElementsByClassName("packsize")(0).innerHTML


txttitle = Mid(txttitle, 1, InStr(1, txttitle, Chr(10)))
'Debug.Print txttitlehtml
'txttitle2 = oHtml.getElementsByClassName("price")(0).innerText

Dim Text As String
Text = GetHTML(gg)


starts = InStr(1, Text, "spConfig =")
endS = InStr(starts + 1, Text, "spConfig")

If starts = 0 Then


    Cells(counter1, 1) = txttitle
    Cells(counter1, 2) = "Single"
    starts = InStr(starts + 1, Text, "productPrice")
    endl = InStr(starts + 1, Text, ",")
    Cells(counter1, 3) = Val(Mid(Text, starts + 14, endl - (starts + 14)))
    Cells(counter1, 4) = "JJFox"
    Cells(counter1, 5) = Now()
     
    Cells(counter1, 7) = gg ' link to the page
    counter1 = counter1 + 1
   
Else


Text = Mid(Text, starts, endS - starts)
'Debug.Print Text
'find how many pack options are avaialble

myTxt = Text
countTxt = "label"

bb = (Len(myTxt) - Len(replace(myTxt, countTxt, ""))) / Len(countTxt) - 1
'End find////////////////////////////////////

varlabel = "class=" & Chr(34) & "label" & Chr(34)


starts = InStr(1, Text, "label") + 1
Text = Mid(Text, starts, Len(Text))
        
        For i = 1 To bb
        
        
        starts = InStr(1, Text, "label")
        
        If InStr(starts, Text, "label") Then
        
        'Show the element's properties
           
        
                Cells(counter1, 1) = txttitle
                Cells(counter1, 2) = Mid(Text, starts + 8, InStr(starts, Text, " \") - (starts + 8))
                
                                       
                        starts = InStr(starts + 1, Text, "oldPrice")
                        endl = InStr(starts + 1, Text, ",")
                        

                Cells(counter1, 3).FormulaR1C1 = Val(Mid(Text, starts + 11, endl - (starts + 11)))
                'Debug.Print Val(Mid(Text, startS + chrs, 6))
                Cells(counter1, 4) = "JJFox"
                Cells(counter1, 5) = Now()
                starts = starts + 1
                Text = Mid(Text, starts, Len(Text))
                Cells(counter1, 7) = gg ' link to the page
                counter1 = counter1 + 1
           End If
        
        Next i
            
End If
'Cells(2, 6) = Time()
End Sub



Function GetHTML(url As String) As String
     With CreateObject("MSXML2.ServerXMLHTTP.6.0")
        .Open "GET", url, False
        .send
        GetHTML = .responseText
    End With
End Function

解决方案

The prices and labels are pulled dynamically from a script tag who content you can parse as json with a json parser. You need to grab the name from the html however.

With a little knowledge of html and css, it is easy enough to define a css pattern to target the script node of interest with:

.fieldset [type='text/x-magento-init']

That looks for a child script with type attribute having attribute value text/x-magento-init, and a parent with class fieldset.

I have used a tiny bit less efficient (you won't notice):

For i = 1 To optionsCollection.Count

Simply because I know the collection is small and to allow me to index into two variables with a single loop.


Json library:

I use jsonconverter.bas. Download raw code from here and add to standard module called JsonConverter . Remove the top Attribute line from the copied code.

You then need to go:

VBE > Tools > References > Add references to:

Microsoft Scripting Runtime
Microsoft HTML Object Library
Microsoft XML Library. 

In VBA for json the [] denotes a collection and the {} represents a dictionary.


Option Explicit

Public Sub GetCigarData()
    '<  VBE > Tools > References:
    'Microsoft Scripting Runtime
    'Microsoft HTML Object Library
    'Microsoft XML Library
    
    Dim json As Object, html As MSHTML.HTMLDocument, xhr As MSXML2.XMLHTTP60, ws As Worksheet

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set xhr = New MSXML2.XMLHTTP60
    Set html = New MSHTML.HTMLDocument
    
    With xhr
        .Open "GET", "https://www.jjfox.co.uk/cohiba-robusto-621.html", False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
         html.body.innerHTML = .responseText
    End With

    Set json = jsonConverter.ParseJson(html.querySelector(".fieldset [type='text/x-magento-init']").innerHTML)("#product_addtocart_form")("configurable")("spConfig")
     
    Dim prices As Scripting.Dictionary, options As Scripting.Dictionary, optionsCollection As Collection
    
    Set prices = json("optionPrices")
    Set options = json("attributes")
    Set optionsCollection = options(options.Keys(0))("options")
    
    Dim results() As Variant, headers() As Variant, i As Long, name As String
    ReDim results(1 To optionsCollection.Count, 1 To 3)
    
    name = html.querySelector(".base").innerText

    For i = 1 To optionsCollection.Count
         results(i, 1) = name
         results(i, 2) = optionsCollection.item(i)("label")
         results(i, 3) = prices(prices.Keys(i - 1))("finalPrice")("amount")
    Next
    
    headers = Array("Name", "Size", "Price")
    
    With ws
        .Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
        .Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results 
    End With
   
End Sub


Read about css selectors:

  1. https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors

这篇关于使用组合框VB​​A中的选项来抓取网站数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆