Scroll over a website using PhantomJS and Selenium

Views: 99 | Published: 2020/5/26 20:02:29 | Tags: python, selenium, phantomjs

Question

I need to scroll down a web page (Twitter, for example) and scrape the new elements that appear as the page advances. I am trying to do this with Python 3.x, Selenium, and PhantomJS. This is my code:

import time
from selenium import webdriver
from bs4 import BeautifulSoup

user = 'ciroylospersas'
# Start web browser
#browser = webdriver.Firefox()
browser = webdriver.PhantomJS()
browser.set_window_size(1024, 768)
browser.get("https://twitter.com/")

# Fill username in login
element = browser.find_element_by_id("signin-email")
element.clear()
element.send_keys('your twitter user')
# Fill password in login
element = browser.find_element_by_id("signin-password")
element.clear()
element.send_keys('your twitter pass')

browser.save_screenshot('screen.png') # save a screenshot to disk

# Submit the login
element.submit()
time.sleep(5)  # wait for the login to complete

browser.save_screenshot('screen1.png') # save a screenshot to disk
# Move to the following url
browser.get("https://twitter.com/" + user + "/following")
browser.save_screenshot('screen2.png') # save a screenshot to disk

scroll_script = "var h = document.body.scrollHeight; window.scrollTo(0, h); return h;"
newHeight = browser.execute_script(scroll_script)
print(newHeight)
browser.save_screenshot('screen3.png') # save a screenshot to disk

The problem is that I can't scroll to the bottom: screen2.png and screen3.png are identical. But if I change the webdriver from PhantomJS to Firefox, the same code works fine. Why?

Recommended answer

I was able to get this to work in PhantomJS when trying to solve a similar problem:

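# Assumes `browser` and `time` from the question's code.
check_height = browser.execute_script("return document.body.scrollHeight;")
while True:
    # Scroll to the current bottom, then give the page time to load more.
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
    height = browser.execute_script("return document.body.scrollHeight;")
    if height == check_height:
        break  # height unchanged: assume everything has loaded
    check_height = height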

It scrolls to the current "bottom", waits, checks whether the page loaded more, and bails if it did not (assuming everything has loaded if the heights match).

In my original code I also checked a "max" value alongside the matching heights, because I was only interested in the first 10 or so "pages". If there were more, I wanted it to stop loading and skip them.
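The original code with the "max" check is not shown, so the following is only a minimal sketch of how such a cap might be combined with the height comparison. The function name `scroll_until_stable` and the parameters `pause` and `max_scrolls` are hypothetical names chosen for illustration; the only WebDriver call used is `execute_script`, as in the loop above.

```python
import time


def scroll_until_stable(browser, pause=5.0, max_scrolls=10):
    """Scroll to the bottom until the page height stops growing or the
    cap is reached. Returns how many scrolls loaded new content.

    `browser` is any object with a Selenium-style execute_script() method.
    `pause` and `max_scrolls` are illustrative names, not from the answer.
    """
    check_height = browser.execute_script("return document.body.scrollHeight;")
    pages_loaded = 0
    while pages_loaded < max_scrolls:
        # Jump to the current bottom and give the page time to load more.
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)
        height = browser.execute_script("return document.body.scrollHeight;")
        if height == check_height:
            break  # nothing new appeared; assume everything is loaded
        check_height = height
        pages_loaded += 1
    return pages_loaded
```

With the `browser` object from the question, this could be called as `scroll_until_stable(browser, pause=5, max_scrolls=10)`, after which `browser.page_source` can be passed to BeautifulSoup to scrape whatever has loaded. The fixed pause is a simple heuristic: if the page takes longer than `pause` seconds to load more content, the loop will stop early.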
