Scroll down to bottom of infinite page with PhantomJS in Python


Question

I have succeeded in getting Python with Selenium and PhantomJS to keep scrolling a dynamically loading, infinite-scroll page, as in the example below. How could this be modified so that, instead of setting the number of reloads manually, the program stops when it reaches the very bottom?

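import time
from selenium import webdriver

# `url` and `text_file` are assumed to be defined earlier in the script.
reloads = 100000  # number of times to scroll down
pause = 0  # seconds to wait between scrolls
driver = webdriver.PhantomJS()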

# Load Twitter page and click to view all results
driver.get(url)
driver.find_element_by_link_text("All").click()

# Keep reloading and pausing to reach the bottom
for _ in range(reloads):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(pause)

text_file.write(driver.page_source.encode("utf-8"))
text_file.close()

Recommended answer

You can check after every step whether the scroll actually changed anything, and stop once the page height no longer grows.

lastHeight = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(pause)
    newHeight = driver.execute_script("return document.body.scrollHeight")
    if newHeight == lastHeight:
        break
    lastHeight = newHeight

This uses a static wait, which is bad for two reasons: you wait unnecessarily when the load finishes sooner, and the script exits prematurely when the dynamic load is slower than the pause for some reason.
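One way to soften the static-wait problem is to poll the page height with a timeout instead of sleeping for a fixed interval. The following is only a minimal sketch of that idea, not part of the original answer: the helper names `wait_for_height_change` and `scroll_to_bottom`, and the `timeout` and `poll` parameters, are hypothetical, and `driver` is assumed to be any object with Selenium's `execute_script` method.

```python
import time


def wait_for_height_change(driver, last_height, timeout=20.0, poll=0.25):
    """Poll document.body.scrollHeight until it differs from last_height.

    Returns the new height, or None if nothing changed within `timeout` seconds.
    (Hypothetical helper; the names and defaults are assumptions.)
    """
    deadline = time.monotonic() + timeout
    while True:
        height = driver.execute_script("return document.body.scrollHeight")
        if height != last_height:
            return height
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


def scroll_to_bottom(driver, timeout=20.0, poll=0.25):
    """Scroll until the page height stops growing; return the final height."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        new_height = wait_for_height_change(driver, last_height, timeout, poll)
        if new_height is None:
            # No growth within the timeout: treat this as the bottom.
            return last_height
        last_height = new_height
```

With this shape the loop moves on as soon as new content arrives, and `timeout` only bounds the final wait at the bottom of the page. A slow load can still be mistaken for the end of the page if it takes longer than `timeout`.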

Since such a page usually loads more elements into a list, you can record the length of the list before scrolling and then wait until the next element has been loaded.

For Twitter, this could look like this:

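from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait

# `browser` is the WebDriver instance (called `driver` above).
while True:
    # Count the tweets currently in the stream
    elemsCount = browser.execute_script("return document.querySelectorAll('.stream-items > li.stream-item').length")

    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    try:
        # Wait up to 20 seconds for one more tweet to appear
        WebDriverWait(browser, 20).until(
            lambda x: x.find_element_by_xpath(
                "//*[contains(@class,'stream-items')]/li[contains(@class,'stream-item')]["+str(elemsCount+1)+"]"))
    except TimeoutException:
        # No new tweet arrived in time: assume the bottom was reached
        break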

I used an XPath expression because PhantomJS 1.x sometimes has a bug when using :nth-child() CSS selectors.
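The same "wait for one more element" idea can also be expressed by polling the element count through `execute_script`, which needs neither `:nth-child()` nor XPath. This is a hedged sketch rather than the answer's method: `load_all_items`, `COUNT_SCRIPT`, and the `css`, `timeout` and `poll` parameters are hypothetical names, and `driver` is assumed to be any object exposing Selenium's `execute_script(script, *args)`.

```python
import time

# The selector is passed to the page as arguments[0] by execute_script.
COUNT_SCRIPT = "return document.querySelectorAll(arguments[0]).length"


def load_all_items(driver, css, timeout=20.0, poll=0.25):
    """Scroll until no new elements matching `css` appear; return the final count.

    (Hypothetical helper; the name and defaults are assumptions.)
    """
    count = driver.execute_script(COUNT_SCRIPT, css)
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        deadline = time.monotonic() + timeout
        while True:
            new_count = driver.execute_script(COUNT_SCRIPT, css)
            if new_count > count:
                break  # at least one more element was loaded
            if time.monotonic() >= deadline:
                return count  # nothing new in time: assume the bottom
            time.sleep(poll)
        count = new_count
```

For the Twitter markup used above, a call might look like `load_all_items(driver, ".stream-items > li.stream-item")`, reusing the selector from the answer's loop.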

Full version for reference.
