How to remove quotation marks after web scraping with VBA

Question

I started web scraping a webpage and noticed the following when I copied the cells:

"
In stock
"
"
4 to 10 bus days
"
"
4 to 10 bus days
"
"
4 to 10 bus days
"

I tried to remove them, together with the extra CR LF characters, so that I get the following:

In stock
4 to 10 bus days
4 to 10 bus days
4 to 10 bus days

I tried the following, none of which work:

Set availability = ie.Document.querySelector(".product-section")
Dim arr() As String
arr = Split(Replace(Trim(availability.innerText), Chr(34), ""), ":")
wks.Cells(i, "D").Value = (arr(UBound(arr)))

Set availability = ie.Document.querySelector(".product-section")
Dim arr() As String
arr = Split(Replace(Trim(availability.innerText), """", ""), ":")
wks.Cells(i, "D").Value = (arr(UBound(arr)))

Set availability = ie.Document.querySelector(".product-section")
Dim arr() As String
arr = Split(Trim(availability.innerText), ":")
wks.Cells(i, "D").Value = (arr(UBound(arr)))
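A likely explanation, offered as an assumption rather than a confirmed diagnosis: the quotation marks are probably not in the cell at all. When Excel copies a cell whose text contains line breaks to the clipboard, it wraps that text in double quotes, so Replace(..., Chr(34), "") has nothing to remove. The real culprits are the leading and trailing CR/LF characters in innerText, which VBA's Trim does not strip because it only removes spaces. Below is a minimal sketch of a helper (the name CleanCellText is hypothetical, not part of any library) that turns CR, LF, tabs and non-breaking spaces into spaces, drops any literal quotes, collapses repeated spaces and trims the result:

```vba
'Hypothetical helper: normalise scraped text before writing it to a cell.
'Assumes the unwanted characters are CR/LF, tabs and non-breaking spaces.
Public Function CleanCellText(ByVal s As String) As String
    'Turn line breaks, tabs and non-breaking spaces into ordinary spaces
    s = Replace(s, vbCr, " ")
    s = Replace(s, vbLf, " ")
    s = Replace(s, vbTab, " ")
    s = Replace(s, ChrW(160), " ")
    'Drop literal double quotes, in case the page text really contains them
    s = Replace(s, Chr(34), "")
    'Collapse runs of spaces into a single space
    Do While InStr(s, "  ") > 0
        s = Replace(s, "  ", " ")
    Loop
    'Trim only strips leading/trailing spaces, which is all that is left now
    CleanCellText = Trim$(s)
End Function
```

Usage with the variables from the attempts above: wks.Cells(i, "D").Value = CleanCellText(availability.innerText). Collapsing everything to single spaces keeps the cell on one line, which is why Excel no longer adds quotes when the cell is copied.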

Does it have to do with the webpage? Do other webpages produce normal output?

How can I fix this?

The first URL is https://www.overshop.gr/index.php?route=product/product&product_id=11684 and the second is https://www.overshop.gr/index.php?route=product/product&product_id=1485, which says "In stock".

Answer

In this case it is better to use a direct selector. However, on some pages, where the item is out of stock, the second class changes to .prod-stock-out, so you need a test to determine which descendant class selector to use.

CSS:

.product-section .prod-stock

VBA:

ie.document.querySelector(".product-section .prod-stock").innerText


Option Explicit
'Requires a reference to Microsoft Internet Controls (VBE > Tools > References)
Public Sub GetInfo()
    Dim ie As New InternetExplorer, wks As Worksheet
    Dim j As Long, urls()
    Set wks = ThisWorkbook.Worksheets("Sheet1")
    urls = Application.Transpose(wks.Range("A1:A2").Value) 'adjust for range containing all urls
    With ie
        .Visible = True

        For j = LBound(urls) To UBound(urls)
            .Navigate2 urls(j)

            'wait until the page has finished loading (readyState 4 = complete)
            While .Busy Or .readyState < 4: DoEvents: Wend

            'product title
            wks.Cells(j, "C") = .document.querySelector(".col-sm-8 h1").innerText

            'in-stock pages use .prod-stock, out-of-stock pages use .prod-stock-out
            If .document.getElementsByClassName("product-section")(0).getElementsByClassName("prod-stock").Length = 0 Then
                wks.Cells(j, "D") = .document.querySelector(".product-section .prod-stock-out").innerText
            Else
                wks.Cells(j, "D") = .document.querySelector(".product-section .prod-stock").innerText
            End If
        Next
        .Quit
    End With
End Sub

You could also use this more readable test:

If .document.querySelectorAll(".product-section .prod-stock").Length = 0 Then
    wks.Cells(j, "D") = .document.querySelector(".product-section .prod-stock-out").innerText
Else
    wks.Cells(j, "D") = .document.querySelector(".product-section .prod-stock").innerText
End If
