易趣产品刮刀 [英] eBay Product scraper
问题描述
我对 VBA 非常有限,
I am very limited on VBA,
代码在一个模块中,代码也有一个子进程,如果我把代码贴错了,请见谅
The Code is in a Module, the code has a sub process as well, so sorry if I post the code wrong
- A) 打开 IE
- B) 子进程获取数据.
代码在 ebay.com 上运行良好,但不适用于 ebay.co.uk - 不知道为什么,它还将 url 转换为超链接
The code works fine on ebay.com but NOT for ebay.co.uk - can't work out why, also it converts urls to hyperlinks
它只处理第一页,我需要它浏览 X 个页面 - 有一个代码但无法使其工作,所以已将其删除.
It only does the first page, I need it to go through an X amount of pages - have a code but can't get it to work so have removed it.
搜索查询可以在Ebay打开后运行吗,所以它打开,然后搜索项输入到ebay然后代码运行,或者从一个单元格运行,如果它的单元格A1提取的数据需要粘贴在 A2 及以下.
Can the search query be run AFTER Ebay opens, so it opens, then search item is input to ebay and then code runs, or to run from a cell, IF its Cell A1 the data extracted needs to be pasted in A2 and below.
<小时>
我查看了 ebay.com 和 ebay.co.uk 的元素,它们在我看来是一样的,所以无法弄清楚为什么它不起作用,因为它适用于 1 而不是另一个.
I have looked at elements for ebay.com and ebay.co.uk and they look the same to me, so can't work out why its not working as it works for 1 and not the other.
我确实输入了从几个页面获取数据的代码,但它不起作用.我知道当我从 google 获取 url 时,这段代码可以正常工作
I did input the code for getting data from several pages it did not work. I know this code works as I have it for when I fetch urls from google
<小时>
Public IE As New SHDocVw.InternetExplorer
Sub GetData()
Dim HTMLdoc As MSHTml.HTMLDocument
Dim othwb As Variant
Dim objShellWindows As New SHDocVw.ShellWindows
Set IE = CreateObject("internetexplorer.application")
With IE
.Visible = True
'.Navigate "https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=jackets&_sacat=0"
.Navigate "https://www.ebay.com/sch/i.html_from=R40&_nkw=ralph+lauren&_sacat=1059&LH_TitleDesc=0&_dmd=1&rt=nc"
While .Busy Or .readyState <> 4: DoEvents: Wend
Set HTMLdoc = IE.document
ProcessHTMLPage HTMLdoc
.Quit
End With
End Sub
code here
enter
'''''' THIS IS THE SUB PROCESS '''''
Sub ProcessHTMLPage(HTMLPage As MSHTml.HTMLDocument)
Dim HTMLItem As MSHTml.IHTMLElement
Dim HTMLItems As MSHTml.IHTMLElementCollection
Dim HTMLInput As MSHTml.IHTMLElement
Dim rownum As Long
rownum = 1
Set HTMLItems = HTMLPage.getElementsByClassName("s-item__title")
For Each HTMLItem In HTMLItems
Cells(rownum, 1).Value = HTMLItem.innerText
rownum = rownum + 1
Next HTMLItem
rownum = 1
Set HTMLItems = HTMLPage.getElementsByClassName("s-item__price")
For Each HTMLItem In HTMLItems
Cells(rownum, 2).Value = HTMLItem.innerText
rownum = rownum + 1
Next HTMLItem
rownum = 1
Set HTMLItems = HTMLPage.getElementsByClassName("s-item__link")
For Each HTMLItem In HTMLItems
Cells(rownum, 3).Value = HTMLItem.href
rownum = rownum + 1
Next HTMLItem
'Converts each text hyperlink selected into a working hyperlink from C1 to 25000 rows
Range("C1:C25000").Select
For Each xCell In Selection
ActiveSheet.Hyperlinks.Add Anchor:=xCell, Address:=xCell.Formula
Next xCell
Range("C1").Select
End Sub
进入下一页的代码
pageNumber = 1
'i = 2
If pageNumber >= 6 Then Exit Do 'the first 6 pages
internetdata.getElementById("pnnext").click 'next web page
Do While internet.Busy Or internet.readyState <> 4
DoEvents
Loop
Set internetdata = internet.document
pageNumber = pageNumber + 1
Loop
在 Ebay.co.uk 上不起作用 - 未提取任何结果 - 在 ebay.com 中工作正常
Does not work on Ebay.co.uk - NO RESULTS ARE EXTRACTED - Works fine in ebay.com
需要它从 X 个页面中获取数据,而不仅仅是 1 个页面
Need it to get data from X amount of pages and NOT just 1 page
是否可以在 Ebay 打开后运行搜索查询,所以它打开,然后将搜索项输入到 ebay 然后代码运行,或者从一个单元格运行,如果它的单元格 A1 提取的数据需要粘贴在 A2 及以下.
Can the search query be run AFTER Ebay opens, so it opens, then search item is input to ebay and then code runs, or to run from a cell, IF its Cell A1 the data extracted needs to be pasted in A2 and below.
这是我用于谷歌搜索的代码,我已经让它工作了,所以搜索来自单元格 A1,我正在寻找这样的东西,我将看看我是否可以使用 ebay 代码.因为这也是谷歌搜索的前 25 页
This is my code for google search, I have got it working so the search comes from cell A1, I am look for something like this, I am going to see if I can use the ebay code with this. As this also does the first 25 pages in google search
enter Sub webpage()
Dim ie As Object
Dim htmlDoc As Object
Dim nextPageElement As Object
Dim div As Object
Dim link As Object
Dim url As String
Dim pageNumber As Long
Dim i As Long
' Takes seach from A1 and places it into google
url = "https://www.google.co.uk/search?q=" & Replace(Worksheets("Sheet1").Range("A1").Value, " ", "+")
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = True
.navigate url
Do While .Busy Or .readyState <> 4
DoEvents
Loop
End With
Application.Wait Now + TimeSerial(0, 0, 5)
Set htmlDoc = ie.document
pageNumber = 1
i = 2
Do
For Each div In htmlDoc.getElementsByTagName("div")
If div.getAttribute("class") = "r" Then
Set link = div.getElementsByTagName("a")(0)
Cells(i, 2).Value = link.getAttribute("href")
i = i + 1
End If
Next div
If pageNumber >= 25 Then Exit Do 'the first 25 pages
Set nextPageElement = htmlDoc.getElementById("pnnext")
If nextPageElement Is Nothing Then Exit Do
' Clicks web next page
nextPageElement.Click 'next web page
Do While ie.Busy Or ie.readyState <> 4
DoEvents
Loop
Application.Wait Now + TimeSerial(0, 0, 5)
Set htmlDoc = ie.document
pageNumber = pageNumber + 1
Loop
MsgBox "All Done"
Set ie = Nothing
Set htmlDoc = Nothing
Set nextPageElement = Nothing
Set div = Nothing
Set link = Nothing
结束子代码在这里
推荐答案
问题 1:为什么它适用于一个域而不适用于另一个域?
回答问题 1(其他问题应该是新帖子)- html 根本不一样.在 ebay.co.uk 中找不到适用于 ebay.com 的课程;因此,您对集合的循环不会执行任何操作,因为它们的计数为 0(如果使用 querySelectorAll,则节点列表的长度为 0).相反,您需要分支代码.根据 url 域设置选择器.
To answer question 1 (the other questions should be new posts) - the html is not the same at all. The classes which work for ebay.com are not found in ebay.co.uk; So, your loop over collections doesn't do anything because they are count 0 (or length 0 with nodeLists if using querySelectorAll). Instead, you need branched code. Set your selectors based on the url domain.
我使用了 css 选择器,因为这是选择所需元素的最简单、最快的方法,同时保持代码重构的灵活性以减少重复代码行.
I have used css selectors as this is the easiest, and fastest way, to select the required elements whilst maintaining the flexibility of a code re-factor to reduce the lines of repeated code.
旁注:
如果您不确定您的选择方法是否适用于不同的页面,您至少可以做两件事:
If you are unsure about whether your selection method will work across different pages you can do at least two things:
右键单击 > 检查元素 > 目视检查您尝试比较的元素的类名是否相同.那么,如果您正在查看产品名称,那么两个页面上 html 中的类名称是否相同?
Right click > inspect element > visually check the class names are the same for the elements you are attempting to compare. So, if you are looking at product names, are the class names in the html the same on both pages?
您可以使用浏览器的搜索功能 > 通过 F12 打开元素选项卡,然后按 Ctrl+F 拉起搜索框 > 将第一页中的班级名称输入到第二页的此框中,然后按 Enter.您还可以在此处输入 css 选择器以及某些情况下的正则表达式.您将获得一个命中计数,告诉您找到了多少匹配项.您可以一直按 Enter 键循环浏览匹配项,每个匹配项都会在上面的 html 中突出显示,因此您可以轻松比较匹配的结果是否符合您的预期.
You can use the search facility of the browser > open element tab via F12 then press Ctrl+F to pull up search box > enter your class name from the first page into this box in the second page and hit enter. You can also enter css selectors here and some cases regex. You will get a hit count telling you how many matches found. You can keep pressing enter to cycle through matches and each match will be highlighted in the html above, so you can easily compare if matched results are what you expected.
点击图片放大
img 网址:https://i.stack.imgur.com/MWkEx.png
VBA:
Option Explicit
Public Sub GetData()
Dim htmlDoc As MSHTML.HTMLDocument, ie As SHDocVw.InternetExplorer, ws As Worksheet
Set ie = New SHDocVw.InternetExplorer
Set htmlDoc = New MSHTML.HTMLDocument
Set ws = ThisWorkbook.Worksheets("Sheet1")
With ie
.Visible = True
'.Navigate2 "https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=jackets&_sacat=0"
.Navigate2 "https://www.ebay.com/sch/i.html_from=R40&_nkw=ralph+lauren&_sacat=1059&LH_TitleDesc=0&_dmd=1&rt=nc"
While .Busy Or .readyState <> 4: DoEvents: Wend
Dim index As Long, HTMLItems As Object, rowNum As Long, xCell As Range
Dim cssSelectors(), i As Long
Select Case True
Case InStr(.document.URL, "ebay.co.uk") > 0
cssSelectors = Array(".gvtitle a", ".amt", ".gvtitle a")
Case InStr(.document.URL, "ebay.com") > 0
cssSelectors = Array(".s-item__title", ".s-item__price", ".s-item__link")
End Select
With ws
For i = LBound(cssSelectors) To UBound(cssSelectors)
rowNum = 1
Set HTMLItems = ie.document.querySelectorAll(cssSelectors(i))
For index = 0 To HTMLItems.length - 1
.Cells(rowNum, i + 1).Value = IIf(i = 2, HTMLItems.item(index).getAttribute("href"), HTMLItems.item(index).innerText)
rowNum = rowNum + 1
Next
Next
For Each xCell In .Range("C1:C25000") '<= all these really?
.Hyperlinks.Add Anchor:=xCell, Address:=xCell.Formula
Next xCell
End With
.Quit
End With
End Sub
这篇关于易趣产品刮刀的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!