Scrolling a web page using Selenium Python WebDriver


Question

I am scraping usernames from a web page that loads more users as you scroll down.

URL of the page: "http://www.quora.com/Kevin-Rose/followers"

I know the number of users on the page (43812 in this case). How can I scroll the page until all the users are loaded? I have searched the internet for this, and everywhere I found almost the same line of code:

driver.execute_script("window.scrollTo(0, )")

How can I determine the vertical position that ensures all the users are loaded? Is there any other way to achieve the same thing without actually scrolling?

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import urllib

driver = webdriver.Firefox()
driver.get('http://www.quora.com/')
time.sleep(10)

wait = WebDriverWait(driver, 10)

form = driver.find_element(By.CLASS_NAME, 'regular_login')
time.sleep(10)
#add explicit wait

username = form.find_element(By.NAME, 'email')
time.sleep(10)
#add explicit wait

username.send_keys('abc@gmail.com')
time.sleep(30)
#add explicit wait

password = form.find_element(By.NAME, 'password')
time.sleep(30)
#add explicit wait

password.send_keys('def')
#add explicit wait

password.send_keys(Keys.RETURN)
time.sleep(30)

#search = driver.find_element_by_name('search_input')
search = wait.until(EC.presence_of_element_located((By.XPATH, "//form[@name='search_form']//input[@name='search_input']")))

search.clear()
search.send_keys('Kevin Rose')
search.send_keys(Keys.RETURN)

link = wait.until(EC.presence_of_element_located((By.LINK_TEXT, "Kevin Rose")))
link.click()
#Wait till the element is loaded (asynchronously loaded webpage)

handle = driver.window_handles
driver.switch_to.window(handle[1])
#switch to new window 

element = WebDriverWait(driver, 2).until(EC.presence_of_element_located((By.PARTIAL_LINK_TEXT, "Followers")))
element.click()

Recommended answer

Since nothing special appears after the last batch of followers is loaded, I would rely on two facts: you know how many followers the user has, and you know how many are loaded on each scroll down (I've inspected it: 18 per scroll). From these you can calculate how many times you need to scroll the page down.
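To make that arithmetic concrete, here is a minimal sketch of the calculation on its own. The helper name `scrolls_needed` is hypothetical (it is not part of Selenium), and it assumes the batch size of 18 observed above still holds.

```python
def scrolls_needed(followers_count, followers_per_page=18):
    """Number of scroll-downs to issue (hypothetical helper).

    Integer division counts the full batches; the extra +1 covers a
    partial last batch, so the result may be one more than strictly needed.
    """
    return followers_count // followers_per_page + 1


print(scrolls_needed(53))     # small demo profile
print(scrolls_needed(43812))  # follower count from the question
```

For the 43812 followers in the question this gives 2435 scrolls, so with a 2-second pause between scrolls the whole run would take roughly 81 minutes.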

Here's the implementation (I've used a different user with only 53 followers to demonstrate the solution):

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

followers_per_page = 18

driver = webdriver.Chrome()  # webdriver.Firefox() in your case
driver.get("http://www.quora.com/Andrew-Delikat/followers")

# get the followers count
element = WebDriverWait(driver, 2).until(EC.presence_of_element_located((By.XPATH, '//li[contains(@class, "FollowersNavItem")]//span[@class="profile_count"]')))
followers_count = int(element.text.replace(',', ''))
print(followers_count)

# scroll down the page iteratively with a delay
for _ in range(followers_count // followers_per_page + 1):
    driver.execute_script("window.scrollTo(0, 10000);")
    time.sleep(2)

Also, if there is a large number of followers, you may need to increase the 10000 Y-coordinate value based on the loop variable.
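One possible way to do that is sketched below: the target Y coordinate is the loop index multiplied by a fixed step, so each scroll reaches further down than the previous one. The function name `scroll_in_steps` and its `step` and `pause` parameters are assumptions made for illustration; the only Selenium call used is `driver.execute_script`, which passes extra arguments to the script as `arguments[0]`, `arguments[1]`, and so on.

```python
import time


def scroll_in_steps(driver, scroll_count, step=10000, pause=2.0):
    """Scroll down scroll_count times, raising the target Y on every pass.

    Hypothetical helper: `driver` is any object with an
    execute_script(script, *args) method, such as a Selenium WebDriver.
    Returns the list of Y coordinates that were requested.
    """
    targets = []
    for i in range(1, scroll_count + 1):
        y = i * step  # Y grows with the loop variable
        driver.execute_script("window.scrollTo(0, arguments[0]);", y)
        targets.append(y)
        time.sleep(pause)  # give the page time to load the next batch
    return targets
```

With the driver from the implementation above, a call might look like `scroll_in_steps(driver, followers_count // followers_per_page + 1)`. The step of 10000 is only a guess at how much height one batch adds. Scrolling to `document.body.scrollHeight` on each pass is a common alternative that avoids guessing, though it is not what this answer uses.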
