Scrolling a web page using Selenium Python WebDriver
Question
I am scraping usernames from this web page, which loads more users as you scroll down.
URL of the page: "http://www.quora.com/Kevin-Rose/followers"
I know the number of users on the page (43812 in this case). How can I scroll the page until all of the users are loaded? I searched online, and everywhere I found almost the same line of code for doing it:
driver.execute_script("window.scrollTo(0, )")
How can I determine the vertical position that ensures all the users are loaded? Is there any other way to achieve the same thing without actually scrolling?
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import urllib
driver = webdriver.Firefox()
driver.get('http://www.quora.com/')
time.sleep(10)
wait = WebDriverWait(driver, 10)
form = driver.find_element_by_class_name('regular_login')
time.sleep(10)
#add explicit wait
username = form.find_element_by_name('email')
time.sleep(10)
#add explicit wait
username.send_keys('abc@gmail.com')
time.sleep(30)
#add explicit wait
password = form.find_element_by_name('password')
time.sleep(30)
#add explicit wait
password.send_keys('def')
#add explicit wait
password.send_keys(Keys.RETURN)
time.sleep(30)
#search = driver.find_element_by_name('search_input')
search = wait.until(EC.presence_of_element_located((By.XPATH, "//form[@name='search_form']//input[@name='search_input']")))
search.clear()
search.send_keys('Kevin Rose')
search.send_keys(Keys.RETURN)
link = wait.until(EC.presence_of_element_located((By.LINK_TEXT, "Kevin Rose")))
link.click()
#Wait until the element is loaded (asynchronously loaded web page)
handle = driver.window_handles
driver.switch_to.window(handle[1])
#switch to new window
element = WebDriverWait(driver, 2).until(EC.presence_of_element_located((By.PARTIAL_LINK_TEXT, "Followers")))
element.click()
Recommended answer
Since nothing special appears after the last batch of followers is loaded, I would rely on the fact that you know how many followers the user has and how many are loaded on each scroll down (I've inspected it: 18 per scroll). Hence, you can calculate how many times you need to scroll the page down.
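As a quick check of that arithmetic, here is a minimal sketch. The helper name scrolls_needed is hypothetical, and the default of 18 followers per scroll is only the value observed above, so it may differ on other pages or change over time.

```python
def scrolls_needed(followers_count, followers_per_scroll=18):
    """Estimate how many scroll-downs are needed to load every follower.

    Integer division counts the full batches; one extra scroll is added
    to cover a final, partially filled batch.
    """
    return followers_count // followers_per_scroll + 1


print(scrolls_needed(53))     # 3
print(scrolls_needed(43812))  # 2435
```

For the 43812 followers mentioned in the question this gives 2435 scrolls, so with a 2-second pause per scroll the full run would take well over an hour.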
Here's the implementation (I've used a different user with only 53 followers to demonstrate the solution):
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
followers_per_page = 18
driver = webdriver.Chrome() # webdriver.Firefox() in your case
driver.get("http://www.quora.com/Andrew-Delikat/followers")
# get the followers count
element = WebDriverWait(driver, 2).until(EC.presence_of_element_located((By.XPATH, '//li[contains(@class, "FollowersNavItem")]//span[@class="profile_count"]')))
followers_count = int(element.text.replace(',', ''))
print(followers_count)
# scroll down the page iteratively with a delay so each batch can load
for _ in range(followers_count // followers_per_page + 1):
    driver.execute_script("window.scrollTo(0, 10000);")
    time.sleep(2)
Also, you may need to increase the 10000 Y-coordinate value based on the loop variable if the number of followers is large.
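One way to grow the Y coordinate with the loop variable is sketched below. The function name scroll_down and its step and pause parameters are hypothetical; the 10000-pixel step and 2-second pause are simply the values used above, not tuned numbers. The driver argument is assumed to be any Selenium WebDriver instance; its execute_script method passes extra arguments to the script as arguments[0], arguments[1], and so on.

```python
import time


def scroll_down(driver, scroll_count, step=10000, pause=2.0):
    """Scroll the page scroll_count times, targeting a lower Y each time.

    On iteration i (starting at 1) the page is scrolled to Y = step * i,
    so the target keeps moving down as more followers are loaded.
    """
    for i in range(1, scroll_count + 1):
        # The Y value is passed as a script argument instead of being
        # formatted into the JavaScript string.
        driver.execute_script("window.scrollTo(0, arguments[0]);", step * i)
        # Give the page time to load the next batch before scrolling again.
        time.sleep(pause)
```

With the loop count computed earlier, this could be called as scroll_down(driver, followers_count // followers_per_page + 1). If a single batch is taller than the step, the step would need to be larger.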