How to handle lazy-loaded images in Selenium?

Question

Before marking this as a duplicate, please consider that I have already looked through many related Stack Overflow posts, as well as websites and articles. I have not found a solution yet.

This question is a follow-up to my earlier question, "Selenium Webdriver not finding XPATH despite seemingly identical strings". After updating the code to work in a more elegant manner, I determined that the problem did not in fact come from the XPath method:

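for item in feed:
    # Each feed item wraps its cover image in a div with class listing-cover-photo
    img_div = item.find_element(By.CLASS_NAME, 'listing-cover-photo')
    # Wait up to 10 seconds for the <img> inside that div to become visible
    img = WebDriverWait(img_div, 10).until(
            EC.visibility_of_element_located((By.TAG_NAME, 'img')))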

This works for roughly the first 5 elements, but after that it times out. By printing the inner HTML of img_div, I found that for the elements that time out there is a div with class "lazyload-placeholder" instead of the image I want. This led me to search for how to scrape lazy-loaded elements, but I could not find an answer. As you can see, I am using a WebDriverWait to give the image time to load; I also tried a site-wide wait call and a time.sleep call. Waiting does not seem to fix it. I am looking for the easiest way to handle these lazy-loaded images, preferably in Selenium, but if there are other libraries or products I can use in tandem with the Selenium code I already have, that would be great. Any help is appreciated.

Recommended answer

Your images will only load when they're scrolled into view. It's such a common requirement that the Selenium Python docs have it in their FAQ. Adapting from this answer, the below script will scroll down the page before scraping the images.

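    import time

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumption: Chrome; use whichever driver you already have
    driver.get("https://www.grailed.com/categories/footwear")

    SCROLL_PAUSE_TIME = 0.5  # seconds to wait for new content after each scroll
    i = 0  # number of scrolls that loaded new content
    last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        # Jump to the bottom so the page lazy-loads the next batch of listings
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(SCROLL_PAUSE_TIME)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # height stopped growing: nothing more was loaded
        last_height = new_height
        i += 1
        if i == 5:
            break  # stop after 5 scrolls

    driver.implicitly_wait(10)
    shoe_images = driver.find_elements(By.CSS_SELECTOR, 'div.listing-cover-photo img')

    print(len(shoe_images))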

In the interest of not scrolling through shoes (seemingly) forever, I have added a break after 5 iterations. You're free to remove the i variable and its check, in which case the script will keep scrolling down for as long as the page keeps growing.
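
As a rough sketch, the scroll loop and its iteration cap can be wrapped in a function so the cap becomes a parameter. The function name scroll_until_stable and the max_scrolls parameter are illustrative and not part of the original answer; driver is assumed to be any object with Selenium's execute_script method.

```python
import time


def scroll_until_stable(driver, pause=0.5, max_scrolls=None):
    """Scroll to the bottom repeatedly until the page height stops growing.

    max_scrolls=None means no cap (hypothetical parameter for illustration).
    Returns the number of scrolls performed.
    """
    scrolls = 0
    last_height = driver.execute_script("return document.body.scrollHeight")
    while max_scrolls is None or scrolls < max_scrolls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazy-loaded content time to arrive
        scrolls += 1
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded by the last scroll
        last_height = new_height
    return scrolls
```

Calling scroll_until_stable(driver, max_scrolls=5) should perform the same scrolling as the capped loop above, while scroll_until_stable(driver) keeps going until the height stops changing.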

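The implicit wait gives find_elements up to 10 seconds for matching images to appear. Note that it stops waiting as soon as at least one match exists, so it helps when nothing has loaded yet but does not guarantee that every image still loading has arrived.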

A test run yielded 82 images. I confirmed that it had scraped every image on the page by running the same selector in Chrome's DevTools, which highlighted 82 elements. You'll see a different number depending on how many images you allow to load.
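
If some listings still show a lazyload-placeholder after the page has been scrolled (jumping straight to the bottom may pass over items in the middle on some lazy-loading setups), another option is to scroll each listing into view before looking for its image. The following is only a sketch under that assumption: find_lazy_image, timeout and poll are hypothetical names, while scrollIntoView is a standard DOM method and "tag name" is the string value of Selenium's By.TAG_NAME.

```python
import time


def find_lazy_image(driver, container, timeout=10.0, poll=0.25):
    """Scroll `container` into view, then poll for an <img> inside it.

    Returns the first <img> element found, or None if none appears in time.
    """
    # arguments[0] is the element passed as the second argument
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", container)
    deadline = time.monotonic() + timeout
    while True:
        # "tag name" is the string value of selenium's By.TAG_NAME
        images = container.find_elements("tag name", "img")
        if images:
            return images[0]
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


# Possible use with the loop from the question (feed and driver as defined there):
# for item in feed:
#     img_div = item.find_element("class name", "listing-cover-photo")
#     img = find_lazy_image(driver, img_div)
```

Returning None instead of raising lets the caller decide whether a listing without an image is an error or can simply be skipped.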
