使用组合框 VBA 中的选项抓取网站数据 [英] Scraping website data with options in combo box VBA
问题描述
我想从这个网站上抓取产品名称(Cohiba Robusto)、产品尺寸(单支雪茄,3 支一包,25 支一盒)和价格(33.65 英镑、90 英镑、730 英镑):
我正在使用下面的代码,它给出了一个错误(对象变量或变量未设置").
非常感谢您对此的任何帮助.
Sub getproducts()表格(JJFox").选择将 oHtml 变暗为 HTMLDocumentDim oElement 作为对象Dim 元素作为 IHTMLElementCollection将文档变暗为 HTMLDocument设置 oHtml = 新建 HTMLDocument'细胞(1, 6)=时间()lastrow = ActiveSheet.Cells(Rows.Count, A").End(xlUp).Rowcnt = 最后一行 + 1计数器 1 = cnt将 gg 调暗为字符串gg =https://www.jjfox.co.uk/cohiba-robusto-621.html"Dim objHTTP As New WinHttp.WinHttpRequest网址 = ggobjHTTP.Open "POST", url, FalseobjHTTP.setRequestHeader "Content-Type", "application/json";objHTTP.send ("{""key"":null,""from"":"""me@me.com""",""to""":null,""cc"":null,""bcc"":null,""date"":null,""subject"":""My Subject"",""body"":null,""attachments"":null}")oHtml.body.innerHTML = objHTTP.responseText'Cells(rw, 2) = oHtml.getElementsByTagName("description").innerText' 如果不是 .Document.querySelector("button[aria-label='Close']") Is Nothing Then' .Document.querySelector("button[aria-label='Close']").点击' 万一txttitle = oHtml.getElementsByClassName("productcart")(0).innerTexttxttitlehtml = oHtml.getElementsByClassName("packsize")(0).innerHTMLtxttitle = Mid(txttitle, 1, InStr(1, txttitle, Chr(10)))'Debug.Print txttitlehtml'txttitle2 = oHtml.getElementsByClassName(价格")(0).innerText将文本变暗为字符串文本 = GetHTML(gg)开始 = InStr(1, Text, spConfig =")endS = InStr(starts + 1, Text, spConfig")如果开始 = 0 那么单元格 (counter1, 1) = txttitleCells(counter1, 2) = 单"开始= InStr(开始+ 1,文本,产品价格")endl = InStr(starts + 1, Text, ,")Cells(counter1, 3) = Val(Mid(Text, starts + 14, endl - (starts + 14)))Cells(counter1, 4) = JJFox"细胞(计数器1,5)=现在()Cells(counter1, 7) = gg ' 页面链接计数器 1 = 计数器 1 + 1别的文本 = Mid(文本,开始,endS - 开始)'调试.打印文本'找到有多少包选项可用myTxt = 文本countTxt = 标签"bb = (Len(myTxt) - Len(replace(myTxt, countTxt, "")))/Len(countTxt) - 1'结束查找////////////////////////////////////varlabel = "class=""&Chr(34) &标签"&铬(34)开始 = InStr(1, Text, 标签") + 1文本 = Mid(文本,开始,Len(文本))对于 i = 1 到 bb开始 = InStr(1, 文本, 标签")If InStr(starts, Text, "label") Then'显示元素的属性单元格 (counter1, 1) = txttitleCells(counter1, 2) = Mid(Text, starts + 8, InStr(starts, Text, "") - (starts + 8))开始 = InStr(开始 + 1, 文本, oldPrice")endl = InStr(starts + 1, Text, ,")Cells(counter1, 3).FormulaR1C1 = Val(Mid(Text, starts + 11, endl - (starts + 11)))'Debug.Print Val(Mid(Text, startS + chrs, 6))Cells(counter1, 4) = JJFox"细胞(计数器1,5)=现在()开始 = 开始 + 1文本 = Mid(文本,开始,Len(文本))Cells(counter1, 7) = gg ' 页面链接计数器 1 = 计数器 1 + 1万一接下来我万一'细胞(2, 6)=时间()结束子函数 GetHTML(url As String) As String使用 CreateObject(MSXML2.ServerXMLHTTP.6.0").打开GET",网址,假.发送GetHTML = .responseText结束于结束函数
价格和标签是从脚本标签动态提取的,您可以使用 json 解析器将其内容解析为 json.但是,您需要从 html 中获取名称.
只要对 html 和 css 有一点了解,就可以很容易地定义一个 css 模式来定位感兴趣的脚本节点:
.fieldset [type='text/x-magento-init']
寻找一个子script
,type
属性具有text/x-magento-init
属性,父类<代码>字段集代码>.
我使用的效率稍低(你不会注意到):
For i = 1 To optionsCollection.Count
仅仅因为我知道集合很小,并且允许我用一个循环索引到两个变量.
Json 库:
我使用 jsonconverter.bas.从这里下载原始代码并添加到标准模块称为 JsonConverter .从复制的代码中删除顶部的 Attribute 行.
然后你需要去:
VBE >工具 >参考文献添加对以下内容的引用:
Microsoft 脚本运行时微软 HTML 对象库微软 XML 库.
在 json 的 VBA 中,[] 表示一个集合,而 {} 表示一个字典.
选项显式公共子 GetCigarData()'<VBE >工具 >参考:'微软脚本运行时'微软 HTML 对象库'微软 XML 库Dim json As Object, html As MSHTML.HTMLDocument, xhr As MSXML2.XMLHTTP60, ws As Worksheet设置 ws = ThisWorkbook.Worksheets("Sheet1")设置 xhr = 新 MSXML2.XMLHTTP60设置 html = 新建 MSHTML.HTMLDocument与 xhr.打开GET",https://www.jjfox.co.uk/cohiba-robusto-621.html",假.setRequestHeader用户代理",Mozilla/5.0";.发送html.body.innerHTML = .responseText结束于设置 json = jsonConverter.ParseJson(html.querySelector(".fieldset [type='text/x-magento-init']").innerHTML)("#product_addtocart_form")("configurable")(";spConfig")Dim 价格作为 Scripting.Dictionary,选项作为 Scripting.Dictionary,optionsCollection 作为集合设置价格 = json("optionPrices")设置选项 = json("属性")设置 optionsCollection = options(options.Keys(0))("options")Dim results() As Variant, headers() As Variant, i As Long, name As StringReDim 结果(1 到 optionsCollection.Count,1 到 3)name = html.querySelector(".base").innerText对于 i = 1 到 optionsCollection.Count结果(i, 1) = 名称结果(i,2)= optionsCollection.item(i)(标签")结果(i,3)=价格(prices.Keys(i - 1))(finalPrice")(amount")下一个headers = Array(名称"、尺寸"、价格")与 ws.Cells(1, 1).Resize(1, UBound(headers) + 1) = 标题.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = 结果结束于结束子
阅读有关 css 选择器的信息:
Hi I am trying to scrape the Product name (Cohiba Robusto), Product Size (Single Cigar, Pack of 3, Box of 25) and prices (£33.65, £90, £730) from this website: https://www.jjfox.co.uk/cohiba-robusto-621.html
I am trying to get something like this:
I am using the code below, which gives an error ("Object variable or with variable not set").
Will appreciate any help with this.
Sub getproducts()
Sheets("JJFox").Select
Dim oHtml As HTMLDocument
Dim oElement As Object
Dim Elements As IHTMLElementCollection
Dim Document As HTMLDocument
Set oHtml = New HTMLDocument
'Cells(1, 6) = Time()
lastrow = ActiveSheet.Cells(Rows.Count, "A").End(xlUp).Row
cnt = lastrow + 1
counter1 = cnt
Dim gg As String
gg = "https://www.jjfox.co.uk/cohiba-robusto-621.html"
Dim objHTTP As New WinHttp.WinHttpRequest
url = gg
objHTTP.Open "POST", url, False
objHTTP.setRequestHeader "Content-Type", "application/json"
objHTTP.send ("{""key"":null,""from"":""me@me.com"",""to"":null,""cc"":null,""bcc"":null,""date"":null,""subject"":""My Subject"",""body"":null,""attachments"":null}")
oHtml.body.innerHTML = objHTTP.responseText
'Cells(rw, 2) = oHtml.getElementsByTagName("description").innerText
' If Not .Document.querySelector("button[aria-label='Close']") Is Nothing Then
' .Document.querySelector("button[aria-label='Close']").Click
' End If
txttitle = oHtml.getElementsByClassName("productcart")(0).innerText
txttitlehtml = oHtml.getElementsByClassName("packsize")(0).innerHTML
txttitle = Mid(txttitle, 1, InStr(1, txttitle, Chr(10)))
'Debug.Print txttitlehtml
'txttitle2 = oHtml.getElementsByClassName("price")(0).innerText
Dim Text As String
Text = GetHTML(gg)
starts = InStr(1, Text, "spConfig =")
endS = InStr(starts + 1, Text, "spConfig")
If starts = 0 Then
Cells(counter1, 1) = txttitle
Cells(counter1, 2) = "Single"
starts = InStr(starts + 1, Text, "productPrice")
endl = InStr(starts + 1, Text, ",")
Cells(counter1, 3) = Val(Mid(Text, starts + 14, endl - (starts + 14)))
Cells(counter1, 4) = "JJFox"
Cells(counter1, 5) = Now()
Cells(counter1, 7) = gg ' link to the page
counter1 = counter1 + 1
Else
Text = Mid(Text, starts, endS - starts)
'Debug.Print Text
'find how many pack options are avaialble
myTxt = Text
countTxt = "label"
bb = (Len(myTxt) - Len(replace(myTxt, countTxt, ""))) / Len(countTxt) - 1
'End find////////////////////////////////////
varlabel = "class=" & Chr(34) & "label" & Chr(34)
starts = InStr(1, Text, "label") + 1
Text = Mid(Text, starts, Len(Text))
For i = 1 To bb
starts = InStr(1, Text, "label")
If InStr(starts, Text, "label") Then
'Show the element's properties
Cells(counter1, 1) = txttitle
Cells(counter1, 2) = Mid(Text, starts + 8, InStr(starts, Text, " ") - (starts + 8))
starts = InStr(starts + 1, Text, "oldPrice")
endl = InStr(starts + 1, Text, ",")
Cells(counter1, 3).FormulaR1C1 = Val(Mid(Text, starts + 11, endl - (starts + 11)))
'Debug.Print Val(Mid(Text, startS + chrs, 6))
Cells(counter1, 4) = "JJFox"
Cells(counter1, 5) = Now()
starts = starts + 1
Text = Mid(Text, starts, Len(Text))
Cells(counter1, 7) = gg ' link to the page
counter1 = counter1 + 1
End If
Next i
End If
'Cells(2, 6) = Time()
End Sub
Function GetHTML(url As String) As String
With CreateObject("MSXML2.ServerXMLHTTP.6.0")
.Open "GET", url, False
.send
GetHTML = .responseText
End With
End Function
The prices and labels are pulled dynamically from a script tag who content you can parse as json with a json parser. You need to grab the name from the html however.
With a little knowledge of html and css, it is easy enough to define a css pattern to target the script node of interest with:
.fieldset [type='text/x-magento-init']
That looks for a child script
with type
attribute having attribute value text/x-magento-init
, and a parent with class fieldset
.
I have used a tiny bit less efficient (you won't notice):
For i = 1 To optionsCollection.Count
Simply because I know the collection is small and to allow me to index into two variables with a single loop.
Json library:
I use jsonconverter.bas. Download raw code from here and add to standard module called JsonConverter . Remove the top Attribute line from the copied code.
You then need to go:
VBE > Tools > References > Add references to:
Microsoft Scripting Runtime
Microsoft HTML Object Library
Microsoft XML Library.
In VBA for json the [] denotes a collection and the {} represents a dictionary.
Option Explicit
Public Sub GetCigarData()
'< VBE > Tools > References:
'Microsoft Scripting Runtime
'Microsoft HTML Object Library
'Microsoft XML Library
Dim json As Object, html As MSHTML.HTMLDocument, xhr As MSXML2.XMLHTTP60, ws As Worksheet
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set xhr = New MSXML2.XMLHTTP60
Set html = New MSHTML.HTMLDocument
With xhr
.Open "GET", "https://www.jjfox.co.uk/cohiba-robusto-621.html", False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
html.body.innerHTML = .responseText
End With
Set json = jsonConverter.ParseJson(html.querySelector(".fieldset [type='text/x-magento-init']").innerHTML)("#product_addtocart_form")("configurable")("spConfig")
Dim prices As Scripting.Dictionary, options As Scripting.Dictionary, optionsCollection As Collection
Set prices = json("optionPrices")
Set options = json("attributes")
Set optionsCollection = options(options.Keys(0))("options")
Dim results() As Variant, headers() As Variant, i As Long, name As String
ReDim results(1 To optionsCollection.Count, 1 To 3)
name = html.querySelector(".base").innerText
For i = 1 To optionsCollection.Count
results(i, 1) = name
results(i, 2) = optionsCollection.item(i)("label")
results(i, 3) = prices(prices.Keys(i - 1))("finalPrice")("amount")
Next
headers = Array("Name", "Size", "Price")
With ws
.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End With
End Sub
Read about css selectors:
这篇关于使用组合框 VBA 中的选项抓取网站数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!