易趣产品刮刀 [英] eBay Product scraper

查看:20
本文介绍了易趣产品刮刀的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我对 VBA 非常有限,

I am very limited on VBA,

代码在一个模块中,代码也有一个子进程,如果我把代码贴错了,请见谅

The Code is in a Module, the code has a sub process as well, so sorry if I post the code wrong

  • A) 打开 IE
  • B) 子进程获取数据.
  1. 代码在 ebay.com 上运行良好,但不适用于 ebay.co.uk - 不知道为什么,它还将 url 转换为超链接

  1. The code works fine on ebay.com but NOT for ebay.co.uk - can't work out why, also it converts urls to hyperlinks

它只处理第一页,我需要它浏览 X 个页面 - 有一个代码但无法使其工作,所以已将其删除.

It only does the first page, I need it to go through an X amount of pages - have a code but can't get it to work so have removed it.

搜索查询可以在Ebay打开后运行吗,所以它打开,然后搜索项输入到ebay然后代码运行,或者从一个单元格运行,如果它的单元格A1提取的数据需要粘贴在 A2 及以下.

Can the search query be run AFTER Ebay opens, so it opens, then search item is input to ebay and then code runs, or to run from a cell, IF its Cell A1 the data extracted needs to be pasted in A2 and below.

<小时>

  1. 我查看了 ebay.com 和 ebay.co.uk 的元素,它们在我看来是一样的,所以无法弄清楚为什么它不起作用,因为它适用于 1 而不是另一个.

  1. I have looked at elements for ebay.com and ebay.co.uk and they look the same to me, so can't work out why its not working as it works for 1 and not the other.

我确实输入了从几个页面获取数据的代码,但它不起作用.我知道当我从 google 获取 url 时,这段代码可以正常工作

I did input the code for getting data from several pages it did not work. I know this code works as I have it for when I fetch urls from google

<小时>

Public IE As New SHDocVw.InternetExplorer
Sub GetData()

    Dim HTMLdoc As MSHTml.HTMLDocument
    Dim othwb As Variant
    Dim objShellWindows As New SHDocVw.ShellWindows

    Set IE = CreateObject("internetexplorer.application")

        With IE
            .Visible = True
            '.Navigate "https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=jackets&_sacat=0"
            .Navigate "https://www.ebay.com/sch/i.html_from=R40&_nkw=ralph+lauren&_sacat=1059&LH_TitleDesc=0&_dmd=1&rt=nc"
            While .Busy Or .readyState <> 4: DoEvents: Wend

Set HTMLdoc = IE.document
                ProcessHTMLPage HTMLdoc

            .Quit
        End With


End Sub
code here

    enter 

'''''' THIS IS THE SUB PROCESS '''''


Sub ProcessHTMLPage(HTMLPage As MSHTml.HTMLDocument)

    Dim HTMLItem As MSHTml.IHTMLElement
    Dim HTMLItems As MSHTml.IHTMLElementCollection
    Dim HTMLInput As MSHTml.IHTMLElement
    Dim rownum As Long

    rownum = 1

    Set HTMLItems = HTMLPage.getElementsByClassName("s-item__title")

    For Each HTMLItem In HTMLItems

            Cells(rownum, 1).Value = HTMLItem.innerText
            rownum = rownum + 1

    Next HTMLItem

    rownum = 1

    Set HTMLItems = HTMLPage.getElementsByClassName("s-item__price")

    For Each HTMLItem In HTMLItems

            Cells(rownum, 2).Value = HTMLItem.innerText
            rownum = rownum + 1

    Next HTMLItem

    rownum = 1

  Set HTMLItems = HTMLPage.getElementsByClassName("s-item__link")
    For Each HTMLItem In HTMLItems
             Cells(rownum, 3).Value = HTMLItem.href
            rownum = rownum + 1

    Next HTMLItem

'Converts each text hyperlink selected into a working hyperlink from C1 to 25000 rows
  Range("C1:C25000").Select
    For Each xCell In Selection
        ActiveSheet.Hyperlinks.Add Anchor:=xCell, Address:=xCell.Formula
    Next xCell
Range("C1").Select
End Sub

进入下一页的代码

pageNumber = 1
'i = 2
    If pageNumber >= 6 Then Exit Do 'the first 6 pages
    internetdata.getElementById("pnnext").click 'next web page
    Do While internet.Busy Or internet.readyState <> 4
        DoEvents
    Loop
    Set internetdata = internet.document
    pageNumber = pageNumber + 1
Loop

  1. 在 Ebay.co.uk 上不起作用 - 未提取任何结果 - 在 ebay.com 中工作正常

  1. Does not work on Ebay.co.uk - NO RESULTS ARE EXTRACTED - Works fine in ebay.com

需要它从 X 个页面中获取数据,而不仅仅是 1 个页面

Need it to get data from X amount of pages and NOT just 1 page

是否可以在 Ebay 打开后运行搜索查询,所以它打开,然后将搜索项输入到 ebay 然后代码运行,或者从一个单元格运行,如果它的单元格 A1 提取的数据需要粘贴在 A2 及以下.

Can the search query be run AFTER Ebay opens, so it opens, then search item is input to ebay and then code runs, or to run from a cell, IF its Cell A1 the data extracted needs to be pasted in A2 and below.

这是我用于谷歌搜索的代码,我已经让它工作了,所以搜索来自单元格 A1,我正在寻找这样的东西,我将看看我是否可以使用 ebay 代码.因为这也是谷歌搜索的前 25 页

This is my code for google search, I have got it working so the search comes from cell A1, I am look for something like this, I am going to see if I can use the ebay code with this. As this also does the first 25 pages in google search

enter Sub webpage()

Dim ie As Object
Dim htmlDoc As Object
Dim nextPageElement As Object
Dim div As Object
Dim link As Object
Dim url As String
Dim pageNumber As Long
Dim i As Long

' Takes seach from A1 and places it into google
url = "https://www.google.co.uk/search?q=" & Replace(Worksheets("Sheet1").Range("A1").Value, " ", "+")


Set ie = CreateObject("InternetExplorer.Application")

With ie
    .Visible = True
    .navigate url
    Do While .Busy Or .readyState <> 4
        DoEvents
    Loop
End With


Application.Wait Now + TimeSerial(0, 0, 5)

Set htmlDoc = ie.document


pageNumber = 1
i = 2
Do
    For Each div In htmlDoc.getElementsByTagName("div")
        If div.getAttribute("class") = "r" Then
            Set link = div.getElementsByTagName("a")(0)
            Cells(i, 2).Value = link.getAttribute("href")
            i = i + 1
        End If
    Next div
    If pageNumber >= 25 Then Exit Do 'the first 25 pages
    Set nextPageElement = htmlDoc.getElementById("pnnext")
    If nextPageElement Is Nothing Then Exit Do

    ' Clicks web next page
    nextPageElement.Click 'next web page
    Do While ie.Busy Or ie.readyState <> 4
        DoEvents
    Loop
    Application.Wait Now + TimeSerial(0, 0, 5)
    Set htmlDoc = ie.document
    pageNumber = pageNumber + 1
Loop


MsgBox "All Done"

Set ie = Nothing
Set htmlDoc = Nothing
Set nextPageElement = Nothing
Set div = Nothing
Set link = Nothing

结束子代码在这里

推荐答案

问题 1:为什么它适用于一个域而不适用于另一个域?

回答问题 1(其他问题应该是新帖子)- html 根本不一样.在 ebay.co.uk 中找不到适用于 ebay.com 的课程;因此,您对集合的循环不会执行任何操作,因为它们的计数为 0(如果使用 querySelectorAll,则节点列表的长度为 0).相反,您需要分支代码.根据 url 域设置选择器.

To answer question 1 (the other questions should be new posts) - the html is not the same at all. The classes which work for ebay.com are not found in ebay.co.uk; So, your loop over collections doesn't do anything because they are count 0 (or length 0 with nodeLists if using querySelectorAll). Instead, you need branched code. Set your selectors based on the url domain.

我使用了 css 选择器,因为这是选择所需元素的最简单、最快的方法,同时保持代码重构的灵活性以减少重复代码行.

I have used css selectors as this is the easiest, and fastest way, to select the required elements whilst maintaining the flexibility of a code re-factor to reduce the lines of repeated code.

旁注:

如果您不确定您的选择方法是否适用于不同的页面,您至少可以做两件事:

If you are unsure about whether your selection method will work across different pages you can do at least two things:

  1. 右键单击 > 检查元素 > 目视检查您尝试比较的元素的类名是否相同.那么,如果您正在查看产品名称,那么两个页面上 html 中的类名称是否相同?

  1. Right click > inspect element > visually check the class names are the same for the elements you are attempting to compare. So, if you are looking at product names, are the class names in the html the same on both pages?

您可以使用浏览器的搜索功能 > 通过 F12 打开元素选项卡,然后按 Ctrl+F 拉起搜索框 > 将第一页中的班级名称输入到第二页的此框中,然后按 Enter.您还可以在此处输入 css 选择器以及某些情况下的正则表达式.您将获得一个命中计数,告诉您找到了多少匹配项.您可以一直按 Enter 键循环浏览匹配项,每个匹配项都会在上面的 html 中突出显示,因此您可以轻松比较匹配的结果是否符合您的预期.

You can use the search facility of the browser > open element tab via F12 then press Ctrl+F to pull up search box > enter your class name from the first page into this box in the second page and hit enter. You can also enter css selectors here and some cases regex. You will get a hit count telling you how many matches found. You can keep pressing enter to cycle through matches and each match will be highlighted in the html above, so you can easily compare if matched results are what you expected.

点击图片放大

img 网址:https://i.stack.imgur.com/MWkEx.png

VBA:

Option Explicit

Public Sub GetData()
    Dim htmlDoc As MSHTML.HTMLDocument, ie As SHDocVw.InternetExplorer, ws As Worksheet

    Set ie = New SHDocVw.InternetExplorer
    Set htmlDoc = New MSHTML.HTMLDocument
    Set ws = ThisWorkbook.Worksheets("Sheet1")

    With ie
        .Visible = True
        '.Navigate2 "https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=jackets&_sacat=0"
        .Navigate2 "https://www.ebay.com/sch/i.html_from=R40&_nkw=ralph+lauren&_sacat=1059&LH_TitleDesc=0&_dmd=1&rt=nc"
        While .Busy Or .readyState <> 4: DoEvents: Wend

        Dim index As Long, HTMLItems As Object, rowNum As Long, xCell As Range
        Dim cssSelectors(), i As Long

        Select Case True
        Case InStr(.document.URL, "ebay.co.uk") > 0
            cssSelectors = Array(".gvtitle a", ".amt", ".gvtitle a")
        Case InStr(.document.URL, "ebay.com") > 0
            cssSelectors = Array(".s-item__title", ".s-item__price", ".s-item__link")
        End Select

        With ws
            For i = LBound(cssSelectors) To UBound(cssSelectors)
                rowNum = 1
                Set HTMLItems = ie.document.querySelectorAll(cssSelectors(i))

                For index = 0 To HTMLItems.length - 1
                    .Cells(rowNum, i + 1).Value = IIf(i = 2, HTMLItems.item(index).getAttribute("href"), HTMLItems.item(index).innerText)
                    rowNum = rowNum + 1
                Next
            Next
            For Each xCell In .Range("C1:C25000") '<= all these really?
                .Hyperlinks.Add Anchor:=xCell, Address:=xCell.Formula
            Next xCell
        End With
        .Quit
    End With
End Sub

这篇关于易趣产品刮刀的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆