通过硒中的< div>刮擦时的不一致 [英] Inconsistency in scraping through <div>'s in Selenium

查看:71
本文介绍了通过硒中的< div>刮擦时的不一致的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在从grailed.com(

I'm working on scraping all of the Air Jordan Data off of grailed.com (https://www.grailed.com/designers/jordan-brand/hi-top-sneakers). I am storing the size, model, url, and image url in an object. I currently have a program that scrolls through the entire feed and fetches all of this. Everything works except finding the image url. I have tried many things and the issue seems to be that for some elements in the feed Selenium doesn't detect the div or url containing the image. I have gone through and manually checked these cases, and they do indeed have images in the same structure. My current code looks like this:

       feed = driver.find_elements_by_class_name('feed-item')
       for item in feed:
          # Find the div containing the image 
          img_div = item.find_element_by_class_name("listing-cover-photo ")
          img = img_div.find_element_by_tag_name('img')

我也尝试了其他几件事.问题是,有时它说它找不到带有"listing-cover-photo"的元素,即使我可以检查情况是否如此,我仍然可以找到这些元素.我应该如何调试/修复此问题,或者任何人都可以帮助您?

I have tried a couple other things as well. The issue is that sometimes it says it can't find elements with the "listing-cover-photo", even though I can check the items for which this is the case and I can still find the elements. How should I debug/fix this, or can anyone help?

推荐答案

要获取 image src值,您需要先滚动页面. 产生WebDriverWait()并等待visibility_of_all_elements_located()并跟随css选择器.

To get the image src value you need to scroll the page first. Induce WebDriverWait() and wait for visibility_of_all_elements_located() and following css selector.

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.grailed.com/designers/jordan-brand/hi-top-sneakers")
driver.execute_script("window.scrollTo(0,document.body.scrollHeight)")
images=WebDriverWait(driver,10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,".feed-item .listing-cover-photo>img")))
for image in images:
    print(image.get_attribute("src"))

输出:

https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/AYHhwtgRxSkdTtZ2fMoi
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/yPm24xb1QeyNJvmlKriU
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/0PmW3y2SOmvy9iDHr44q
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/0huJrabvQyei6H8xVZWS
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/23Bx5rr8SR2Pv53lO9Hb
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/dsdGACdNRse93DpTN9Sl
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/KQ3z8G9DQFWTjNkO6Obp
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/mF8nkq8LTzi2fTuCfAAS
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/X9tLf5KzSreO1QW2QX4w
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/gNnXP7ToTnl9hjSEiRrz
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/LMFdqBosRI2NLDCkR9Ze
https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/auto_image/cache=expiry:max/rotate=deg:exif/resize=height:320,width:240,fit:crop/output=quality:70/compress/https://cdn.fs.grailed.com/api/file/htBeZs05SNyflHqpd7pC

这篇关于通过硒中的< div>刮擦时的不一致的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆