Scroll down to bottom of infinite page with PhantomJS in Python


Question

I have succeeded in getting Python with Selenium and PhantomJS to reload a dynamically loading, infinitely scrolling page, as in the example below. But how could this be modified so that, instead of setting the number of reloads manually, the program stops when it reaches the bottom?

import time
from selenium import webdriver

# NOTE: url and text_file are assumed to be defined earlier in the script
reloads = 100000  # set the number of times to reload
pause = 0  # initial time interval between reloads, in seconds
driver = webdriver.PhantomJS()

# Load Twitter page and click to view all results
driver.get(url)
driver.find_element_by_link_text("All").click()

# Keep reloading and pausing to reach the bottom
for _ in range(reloads):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(pause)

# Save the loaded page source
text_file.write(driver.page_source.encode("utf-8"))
text_file.close()

Recommended answer

You can check at each step whether the scroll actually changed anything, here by comparing the page height before and after.

lastHeight = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(pause)
    newHeight = driver.execute_script("return document.body.scrollHeight")
    if newHeight == lastHeight:
        break
    lastHeight = newHeight

This uses a static wait, which is bad: you don't want to wait unnecessarily when loading finishes sooner, and you don't want the script to exit prematurely when the dynamic load is too slow for some reason.
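One way to soften the fixed pause while still comparing heights is to poll the page height until it grows or a timeout expires. The following is only a minimal sketch of that idea, not part of the original answer: the function name `scroll_until_stable` and its `timeout` and `poll` parameters are hypothetical, and it assumes only that `driver` exposes Selenium's `execute_script` method.

```python
import time


def scroll_until_stable(driver, timeout=10.0, poll=0.25):
    """Scroll down until the page height stops growing.

    Hypothetical helper: returns the number of scrolls that loaded new content.
    """
    get_height = "return document.body.scrollHeight"
    last_height = driver.execute_script(get_height)
    loads = 0
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Poll the height instead of sleeping for a fixed amount of time
        deadline = time.monotonic() + timeout
        new_height = last_height
        while time.monotonic() < deadline:
            new_height = driver.execute_script(get_height)
            if new_height > last_height:
                break  # new content arrived, no need to keep waiting
            time.sleep(poll)
        if new_height <= last_height:
            return loads  # nothing loaded within the timeout: assume the bottom
        last_height = new_height
        loads += 1
```

With this shape, a fast page costs roughly one poll interval per scroll, while a slow page gets up to `timeout` seconds before the loop gives up.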

Since such a page usually loads more elements into a list, you can record the length of the list before scrolling and then wait until the next element has loaded.

For Twitter, this could look like this:

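from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait

# browser is the WebDriver instance (named driver in the question)
while True:
    # Count the tweets currently in the list
    elemsCount = browser.execute_script("return document.querySelectorAll('.stream-items > li.stream-item').length")

    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    try:
        # Wait up to 20 seconds for the (elemsCount + 1)-th tweet to appear
        WebDriverWait(browser, 20).until(
            lambda x: x.find_element_by_xpath(
                "//*[contains(@class,'stream-items')]/li[contains(@class,'stream-item')]["+str(elemsCount+1)+"]"))
    except TimeoutException:
        # No new tweet within the timeout: assume the bottom was reached
        break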

I used an XPath expression because PhantomJS 1.x sometimes has a bug when using :nth-child() CSS selectors.
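The positional XPath used above can be built by a small helper, which makes the 1-based indexing explicit. This is an illustrative sketch only: the function name `next_item_xpath` and its parameters are hypothetical, and the class names in the usage line are the ones from the Twitter example.

```python
def next_item_xpath(list_class, item_class, loaded_count):
    """Build an XPath matching the first list item that is not loaded yet.

    XPath positions are 1-based, so with loaded_count items present
    the next item to wait for sits at position loaded_count + 1.
    """
    return "//*[contains(@class,'{0}')]/li[contains(@class,'{1}')][{2}]".format(
        list_class, item_class, loaded_count + 1)


# Usage with the class names from the Twitter example
xpath = next_item_xpath("stream-items", "stream-item", 20)
```

The resulting string can be passed to the same XPath lookup that the wait condition uses.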

Full version for reference.
