How to remove quotation marks after web scraping with VBA
Problem description
I started scraping a web page and noticed the following when I copy the cells:
"
In stock
"
"
4 to 10 bus days
"
"
4 to 10 bus days
"
"
4 to 10 bus days
"
I tried to remove the quotation marks together with the extra CR LF characters, in order to get the following:
In stock
4 to 10 bus days
4 to 10 bus days
4 to 10 bus days
I tried the following, none of which works:
' Attempt 1: strip double quotes with Chr(34)
Set availability = ie.Document.querySelector(".product-section")
Dim arr() As String
arr = Split(Replace(Trim(availability.innerText), Chr(34), ""), ":")
wks.Cells(i, "D").Value = (arr(UBound(arr)))
' Attempt 2: strip double quotes with an escaped quote literal
Set availability = ie.Document.querySelector(".product-section")
Dim arr() As String
arr = Split(Replace(Trim(availability.innerText), """", ""), ":")
wks.Cells(i, "D").Value = (arr(UBound(arr)))
' Attempt 3: Trim only
Set availability = ie.Document.querySelector(".product-section")
Dim arr() As String
arr = Split(Trim(availability.innerText), ":")
wks.Cells(i, "D").Value = (arr(UBound(arr)))
Does this have to do with the web page? Other web pages give normal output.
How can I solve it?
The first URL is https://www.overshop.gr/index.php?route=product/product&product_id=11684 and the second is https://www.overshop.gr/index.php?route=product/product&product_id=1485, which says "In stock".
Recommended answer
In this case it is better to use a direct selector. However, on some links where the item is out of stock, the second class changes to .prod-stock-out, so you need a test to determine which descendant class selector to use.
CSS:
.product-section .prod-stock
VBA:
ie.document.querySelector(".product-section .prod-stock").innerText
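If a page only ever carries one of the two classes, a CSS selector list (two selectors separated by a comma) might make a separate test unnecessary. The following is a minimal sketch under that assumption. GetStockText is a hypothetical helper name that is not part of the original answer, and doc is assumed to be the document of a page that has finished loading.

```vba
Option Explicit

' Hypothetical helper: returns the stock text, or "" when neither class is found.
' doc is assumed to be a loaded HTML document (for example ie.document).
Public Function GetStockText(ByVal doc As Object) As String
    Dim node As Object
    ' A comma-separated selector list returns the first element, in document
    ' order, that matches either selector.
    Set node = doc.querySelector(".product-section .prod-stock, .product-section .prod-stock-out")
    If Not node Is Nothing Then GetStockText = node.innerText
End Function
```

It would be called as GetStockText(ie.document). If both classes could appear on the same page, an explicit test is the safer choice.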
Option Explicit
' Requires a reference to Microsoft Internet Controls (Tools > References)
Public Sub GetInfo()
    Dim ie As New InternetExplorer, wks As Worksheet
    Dim j As Long, urls()
    Set wks = ThisWorkbook.Worksheets("Sheet1")
    urls = Application.Transpose(wks.Range("A1:A2").Value) 'adjust for range containing all urls
    With ie
        .Visible = True
        For j = LBound(urls) To UBound(urls)
            .Navigate2 urls(j)
            ' Wait until the page has finished loading
            While .Busy Or .readyState < 4: DoEvents: Wend
            wks.Cells(j, "C") = .document.querySelector(".col-sm-8 h1").innerText
            ' No .prod-stock element means the item is out of stock (.prod-stock-out)
            If .document.getElementsByClassName("product-section")(0).getElementsByClassName("prod-stock").Length = 0 Then
                wks.Cells(j, "D") = .document.querySelector(".product-section .prod-stock-out").innerText
            Else
                wks.Cells(j, "D") = .document.querySelector(".product-section .prod-stock").innerText
            End If
        Next
        .Quit
    End With
End Sub
You could also use the more readable:
If .document.querySelectorAll(".product-section .prod-stock").Length = 0 Then
    wks.Cells(j, "D") = .document.querySelector(".product-section .prod-stock-out").innerText
Else
    wks.Cells(j, "D") = .document.querySelector(".product-section .prod-stock").innerText
End If
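On why the Replace attempts in the question had no visible effect: when a cell value contains line breaks, Excel usually wraps it in double quotes when the cell is copied and pasted elsewhere as text, so the quotes are probably not part of innerText at all. Assuming that is what happens here, removing the line breaks before writing to the cell should be enough. A minimal sketch follows. CleanScrapedText is a hypothetical helper name and is not part of the original answer.

```vba
Option Explicit

' Hypothetical helper: replaces line breaks, tabs and non-breaking spaces
' with spaces, strips literal double quotes, collapses repeated spaces,
' and trims the result.
Public Function CleanScrapedText(ByVal rawText As String) As String
    Dim cleaned As String
    cleaned = Replace(rawText, vbCr, " ")
    cleaned = Replace(cleaned, vbLf, " ")
    cleaned = Replace(cleaned, vbTab, " ")
    cleaned = Replace(cleaned, ChrW$(160), " ")   ' non-breaking space
    cleaned = Replace(cleaned, Chr$(34), "")      ' literal double quotes, if any
    ' Collapse runs of spaces left behind by the replacements above
    Do While InStr(cleaned, "  ") > 0
        cleaned = Replace(cleaned, "  ", " ")
    Loop
    CleanScrapedText = Trim$(cleaned)
End Function
```

For example, the write to column D could become wks.Cells(j, "D").Value = CleanScrapedText(.document.querySelector(".product-section .prod-stock").innerText).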