Python Selenium - Adjust pause_time to scroll down in infinite page

Views: 374 | Published: 2020/7/7 4:42:03 | Tags: python selenium scroll sleep sleep-mode


Problem description

I'm trying to scrape all the links available in an infinite-scroll page by scrolling down and collecting the new links as they appear. However, a fixed time.sleep() does not pause the driver for an appropriate amount of time before each new scroll.

Is there any way to adjust the code below so that it sleeps less during the first iterations (when the page still loads new content quickly) and waits as long as necessary during later iterations (when the page loads new content slowly)?

Using a simple loop such as

for i in range(1,20):
    time.sleep(i)

would not save me any time during the first iterations, and would not adjust time.sleep() efficiently after many iterations.

Here is the code I am using:

import time

from selenium import webdriver

scroll_pause_time = 5
scraped_links = []

# driver_path, url and links_filepath (an XPath expression) are defined elsewhere
driver = webdriver.Chrome(executable_path=driver_path)
driver.get(url)

# Collect the links visible on the initial page
links = driver.find_elements_by_xpath(links_filepath)
for link in links:
    if link not in scraped_links:
        scraped_links.append(link)
        print(link)

last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll to the bottom, then give the page a fixed time to load more content
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(scroll_pause_time)
    new_height = driver.execute_script("return document.body.scrollHeight")
    # Stop when the page height did not grow during the pause
    if new_height == last_height:
        break
    last_height = new_height
    # Collect any links that appeared after scrolling
    links = driver.find_elements_by_xpath(links_filepath)
    for link in links:
        if link not in scraped_links:
            scraped_links.append(link)
            print(link)

After 20-30 iterations the loop exits prematurely, because the fixed time.sleep() pause is too short compared with the time the webpage needs to load new content.

Recommended answer

If you do not want to guess how long the page takes to load and then sleep for an arbitrary number of seconds, you can use explicit waits. Example:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("http://somedomain/url_that_delays_loading")
try:
    # Wait up to 10 seconds for the element to be present in the DOM
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement"))
    )
    # do what you want after necessary elements are loaded
except TimeoutException:
    print('TimeoutException')
finally:
    driver.quit()

This solves the problem of a fixed time.sleep() becoming too short compared with the time the webpage needs to load new content.
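Applied to the scrolling loop from the question, the same idea could look roughly like the sketch below. It is only an illustration, not the asker's or the library's code: the helper names `wait_for_taller_page` and `scroll_to_end`, the 30 second timeout and the 0.5 second poll interval are all assumptions. Instead of sleeping for a fixed time, it polls `document.body.scrollHeight` and continues as soon as the page has grown, so early iterations return quickly and later ones wait only as long as needed, up to the timeout.

```python
import time


def wait_for_taller_page(driver, last_height, timeout=30.0, poll_interval=0.5):
    """Poll the page height until it exceeds last_height or the timeout elapses.

    Returns the new height, or None if the page did not grow in time.
    (Hypothetical helper; timeout and poll_interval are assumed defaults.)
    """
    deadline = time.monotonic() + timeout
    while True:
        height = driver.execute_script("return document.body.scrollHeight")
        if height > last_height:
            return height
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)


def scroll_to_end(driver, timeout=30.0, poll_interval=0.5):
    """Scroll down until the page stops growing; return the number of
    scrolls that loaded new content. (Hypothetical helper.)"""
    scrolls = 0
    last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        new_height = wait_for_taller_page(driver, last_height, timeout, poll_interval)
        if new_height is None:
            # No growth within the timeout: assume the end of the page was reached
            break
        last_height = new_height
        scrolls += 1
    return scrolls
```

With a real Selenium driver, the links could be collected after each successful wait, as in the question's loop. The polling helper could probably also be replaced by `WebDriverWait(driver, timeout).until(...)` with a callable that compares the current `scrollHeight` with `last_height`, catching `TimeoutException` to detect the end of the page.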
