eBay产品刮板 [英] eBay Product scraper

查看:112
本文介绍了eBay产品刮板的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在VBA上非常受限制,

I am very limited on VBA,

该代码位于模块中,该代码也具有子进程,因此,如果我输入的代码错误,请对不起

The Code is in a Module, the code has a sub process as well, so sorry if I post the code wrong

  • A)打开IE
  • B)子进程获取数据.
  1. 该代码在ebay.com上工作正常,但在ebay.co.uk上却不行-无法弄清楚原因,并且还将网址转换为超链接

  1. The code works fine on ebay.com but NOT for ebay.co.uk - can't work out why, also it converts urls to hyperlinks

它仅处理第一页,我需要它浏览X的页面-有代码但无法正常工作,因此将其删除.

It only does the first page, I need it to go through an X amount of pages - have a code but can't get it to work so have removed it.

是否可以在打开Ebay后打开搜索查询,将其打开,然后将搜索项输入到ebay,然后运行代码,或者从某个单元运行,如果要粘贴其单元格A1,则需要粘贴提取的数据在A2及以下版本中.

Can the search query be run AFTER Ebay opens, so it opens, then search item is input to ebay and then code runs, or to run from a cell, IF its Cell A1 the data extracted needs to be pasted in A2 and below.


  1. 我已经查看了ebay.com和ebay.co.uk的元素,它们对我来说看起来是相同的,因此无法弄清楚为什么它不起作用,因为它适用于1,而不适用于其他.

  1. I have looked at elements for ebay.com and ebay.co.uk and they look the same to me, so can't work out why its not working as it works for 1 and not the other.

我确实输入了从几个页面获取数据的代码,但是这没用.我知道当我从Google

I did input the code for getting data from several pages it did not work. I know this code works as I have it for when I fetch urls from google


Public IE As New SHDocVw.InternetExplorer
Sub GetData()

    Dim HTMLdoc As MSHTml.HTMLDocument
    Dim othwb As Variant
    Dim objShellWindows As New SHDocVw.ShellWindows

    Set IE = CreateObject("internetexplorer.application")

        With IE
            .Visible = True
            '.Navigate "https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=jackets&_sacat=0"
            .Navigate "https://www.ebay.com/sch/i.html_from=R40&_nkw=ralph+lauren&_sacat=1059&LH_TitleDesc=0&_dmd=1&rt=nc"
            While .Busy Or .readyState <> 4: DoEvents: Wend

Set HTMLdoc = IE.document
                ProcessHTMLPage HTMLdoc

            .Quit
        End With


End Sub
code here

    enter 

'''''' THIS IS THE SUB PROCESS '''''


Sub ProcessHTMLPage(HTMLPage As MSHTml.HTMLDocument)

    Dim HTMLItem As MSHTml.IHTMLElement
    Dim HTMLItems As MSHTml.IHTMLElementCollection
    Dim HTMLInput As MSHTml.IHTMLElement
    Dim rownum As Long

    rownum = 1

    Set HTMLItems = HTMLPage.getElementsByClassName("s-item__title")

    For Each HTMLItem In HTMLItems

            Cells(rownum, 1).Value = HTMLItem.innerText
            rownum = rownum + 1

    Next HTMLItem

    rownum = 1

    Set HTMLItems = HTMLPage.getElementsByClassName("s-item__price")

    For Each HTMLItem In HTMLItems

            Cells(rownum, 2).Value = HTMLItem.innerText
            rownum = rownum + 1

    Next HTMLItem

    rownum = 1

  Set HTMLItems = HTMLPage.getElementsByClassName("s-item__link")
    For Each HTMLItem In HTMLItems
             Cells(rownum, 3).Value = HTMLItem.href
            rownum = rownum + 1

    Next HTMLItem

'Converts each text hyperlink selected into a working hyperlink from C1 to 25000 rows
  Range("C1:C25000").Select
    For Each xCell In Selection
        ActiveSheet.Hyperlinks.Add Anchor:=xCell, Address:=xCell.Formula
    Next xCell
Range("C1").Select
End Sub

转到下一页的代码

pageNumber = 1
'i = 2
    If pageNumber >= 6 Then Exit Do 'the first 6 pages
    internetdata.getElementById("pnnext").click 'next web page
    Do While internet.Busy Or internet.readyState <> 4
        DoEvents
    Loop
    Set internetdata = internet.document
    pageNumber = pageNumber + 1
Loop

  1. 在Ebay.co.uk上不起作用-未提取任何结果-在ebay.com上可以正常工作

  1. Does not work on Ebay.co.uk - NO RESULTS ARE EXTRACTED - Works fine in ebay.com

需要它从X个页面而不是仅1个页面中获取数据

Need it to get data from X amount of pages and NOT just 1 page

是否可以在打开Ebay后打开搜索查询,将其打开,然后将搜索项输入到ebay,然后运行代码,或者从某个单元运行,如果要粘贴其单元格A1,则需要粘贴提取的数据在A2及以下版本中.

Can the search query be run AFTER Ebay opens, so it opens, then search item is input to ebay and then code runs, or to run from a cell, IF its Cell A1 the data extracted needs to be pasted in A2 and below.

这是我用于Google搜索的代码,它可以正常工作,因此搜索来自A1单元格,我正在寻找类似的东西,我将查看是否可以与此一起使用ebay代码.谷歌搜索的前25个页面也是如此

This is my code for google search, I have got it working so the search comes from cell A1, I am look for something like this, I am going to see if I can use the ebay code with this. As this also does the first 25 pages in google search

enter Sub webpage()

Dim ie As Object
Dim htmlDoc As Object
Dim nextPageElement As Object
Dim div As Object
Dim link As Object
Dim url As String
Dim pageNumber As Long
Dim i As Long

' Takes seach from A1 and places it into google
url = "https://www.google.co.uk/search?q=" & Replace(Worksheets("Sheet1").Range("A1").Value, " ", "+")


Set ie = CreateObject("InternetExplorer.Application")

With ie
    .Visible = True
    .navigate url
    Do While .Busy Or .readyState <> 4
        DoEvents
    Loop
End With


Application.Wait Now + TimeSerial(0, 0, 5)

Set htmlDoc = ie.document


pageNumber = 1
i = 2
Do
    For Each div In htmlDoc.getElementsByTagName("div")
        If div.getAttribute("class") = "r" Then
            Set link = div.getElementsByTagName("a")(0)
            Cells(i, 2).Value = link.getAttribute("href")
            i = i + 1
        End If
    Next div
    If pageNumber >= 25 Then Exit Do 'the first 25 pages
    Set nextPageElement = htmlDoc.getElementById("pnnext")
    If nextPageElement Is Nothing Then Exit Do

    ' Clicks web next page
    nextPageElement.Click 'next web page
    Do While ie.Busy Or ie.readyState <> 4
        DoEvents
    Loop
    Application.Wait Now + TimeSerial(0, 0, 5)
    Set htmlDoc = ie.document
    pageNumber = pageNumber + 1
Loop


MsgBox "All Done"

Set ie = Nothing
Set htmlDoc = Nothing
Set nextPageElement = Nothing
Set div = Nothing
Set link = Nothing

结束字幕 代码在这里

推荐答案

问题1:为什么它只适用于一个域而不适用于另一个域?

要回答问题1(其他问题应该是新帖子)-html根本不一样.在ebay.co.uk中找不到适用于ebay.com的类.因此,对集合的循环不执行任何操作,因为它们的计数为0(如果使用querySelectorAll,则计数为0(对于nodeLists,其长度为0)).相反,您需要分支代码.根据url域设置选择器.

To answer question 1 (the other questions should be new posts) - the html is not the same at all. The classes which work for ebay.com are not found in ebay.co.uk; So, your loop over collections doesn't do anything because they are count 0 (or length 0 with nodeLists if using querySelectorAll). Instead, you need branched code. Set your selectors based on the url domain.

我使用了css选择器,因为这是选择所需元素的最简单,最快的方法,同时保持了代码重构的灵活性,从而减少了重复代码的行数.

I have used css selectors as this is the easiest, and fastest way, to select the required elements whilst maintaining the flexibility of a code re-factor to reduce the lines of repeated code.

旁注:

如果不确定您的选择方法是否可以在不同页面上使用,您可以至少做两件事:

If you are unsure about whether your selection method will work across different pages you can do at least two things:

  1. 右键单击>检查元素>目视检查要比较的元素的类名称是否相同.因此,如果您正在查看产品名称,那么两个页面中html中的类名称是否相同?

  1. Right click > inspect element > visually check the class names are the same for the elements you are attempting to compare. So, if you are looking at product names, are the class names in the html the same on both pages?

您可以使用浏览器的搜索工具>通过 F12 打开元素选项卡,然后按 Ctrl + F 向上拉搜索框>从第一页的班级名称输入到第二页的此框,然后按Enter键.您还可以在此处输入css选择器,有时还可以输入正则表达式.您会得到一个命中次数,告诉您找到了多少个匹配项.您可以一直按Enter键以循环显示匹配项,并且每个匹配项都将在上方的html中突出显示,因此您可以轻松地比较匹配结果是否符合您的预期.

You can use the search facility of the browser > open element tab via F12 then press Ctrl+F to pull up search box > enter your class name from the first page into this box in the second page and hit enter. You can also enter css selectors here and some cases regex. You will get a hit count telling you how many matches found. You can keep pressing enter to cycle through matches and each match will be highlighted in the html above, so you can easily compare if matched results are what you expected.

点击图片放大

img网址: https://i.stack.imgur.com/MWkEx.png

VBA:

Option Explicit

Public Sub GetData()
    Dim htmlDoc As MSHTML.HTMLDocument, ie As SHDocVw.InternetExplorer, ws As Worksheet

    Set ie = New SHDocVw.InternetExplorer
    Set htmlDoc = New MSHTML.HTMLDocument
    Set ws = ThisWorkbook.Worksheets("Sheet1")

    With ie
        .Visible = True
        '.Navigate2 "https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=jackets&_sacat=0"
        .Navigate2 "https://www.ebay.com/sch/i.html_from=R40&_nkw=ralph+lauren&_sacat=1059&LH_TitleDesc=0&_dmd=1&rt=nc"
        While .Busy Or .readyState <> 4: DoEvents: Wend

        Dim index As Long, HTMLItems As Object, rowNum As Long, xCell As Range
        Dim cssSelectors(), i As Long

        Select Case True
        Case InStr(.document.URL, "ebay.co.uk") > 0
            cssSelectors = Array(".gvtitle a", ".amt", ".gvtitle a")
        Case InStr(.document.URL, "ebay.com") > 0
            cssSelectors = Array(".s-item__title", ".s-item__price", ".s-item__link")
        End Select

        With ws
            For i = LBound(cssSelectors) To UBound(cssSelectors)
                rowNum = 1
                Set HTMLItems = ie.document.querySelectorAll(cssSelectors(i))

                For index = 0 To HTMLItems.length - 1
                    .Cells(rowNum, i + 1).Value = IIf(i = 2, HTMLItems.item(index).getAttribute("href"), HTMLItems.item(index).innerText)
                    rowNum = rowNum + 1
                Next
            Next
            For Each xCell In .Range("C1:C25000") '<= all these really?
                .Hyperlinks.Add Anchor:=xCell, Address:=xCell.Formula
            Next xCell
        End With
        .Quit
    End With
End Sub

这篇关于eBay产品刮板的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆