How to find all elements on the webpage through scrolling using Selenium WebDriver and Python
Question
I can't seem to get all the elements on a webpage, no matter what I have tried using Selenium. I am sure I am missing something. Here's my code. The URL has at least 30 elements, yet whenever I scrape, only 6 elements are returned. What am I missing?
import requests
import webbrowser
import time
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
url = 'https://www.adidas.com/us/men-shoes-new_arrivals'
res = requests.get(url, headers=headers)
page_soup = bs(res.text, "html.parser")
containers = page_soup.findAll("div", {"class": "gl-product-card-container show-variation-carousel"})
print(len(containers))

# for each container find shoe model
shoe_colors = []
for container in containers:
    if container.find("div", {'class': 'gl-product-card__reviews-number'}) is not None:
        shoe_model = container.div.div.img["title"]
        review = container.find('div', {'class': 'gl-product-card__reviews-number'})
        review = int(review.text)

driver = webdriver.Chrome()
driver.get(url)
time.sleep(5)

shoe_prices = driver.find_elements_by_css_selector('.gl-price')
for price in shoe_prices:
    print(price.text)
print(len(shoe_prices))
Accepted Answer
So there does seem to be some difference in the results when running your code trial:
- You found 30 items with requests and 6 items with Selenium
- Whereas I found 40 items with requests and 4 items with Selenium
The items on this website are generated dynamically through lazy loading, so you have to scroll down and wait for the new elements to render within the HTML DOM. You can use the following solution:
Code Block:
import requests
import webbrowser
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException, TimeoutException

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
url = 'https://www.adidas.com/us/men-shoes-new_arrivals'
res = requests.get(url, headers=headers)
page_soup = bs(res.text, "html.parser")
containers = page_soup.findAll("div", {"class": "gl-product-card-container show-variation-carousel"})
print(len(containers))

shoe_colors = []
for container in containers:
    if container.find("div", {'class': 'gl-product-card__reviews-number'}) is not None:
        shoe_model = container.div.div.img["title"]
        review = container.find('div', {'class': 'gl-product-card__reviews-number'})
        review = int(review.text)

options = Options()
options.add_argument('start-maximized')
options.add_argument('disable-infobars')
options.add_argument('--disable-extensions')
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get(url)

# wait for the initially rendered price elements, then record how many there are
myLength = len(WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "span.gl-price"))))
while True:
    driver.execute_script("window.scrollBy(0,400)", "")
    try:
        # wait until the scroll has caused more price elements to render
        WebDriverWait(driver, 20).until(lambda driver: len(driver.find_elements_by_css_selector("span.gl-price")) > myLength)
        titles = driver.find_elements_by_css_selector("span.gl-price")
        myLength = len(titles)
    except TimeoutException:
        # no new elements appeared within the timeout: the page is fully loaded
        break

print(myLength)
for title in titles:
    print(title.text)
driver.quit()
Console Output:
47
$100
$100
$100
$100
$100
$100
$180
$180
$180
$180
$130
$180
$180
$130
$180
$130
$200
$180
$180
$130
$60
$100
$30
$65
$120
$100
$85
$180
$150
$130
$100
$100
$80
$100
$120
$180
$200
$130
$130
$100
$120
$120
$100
$180
$90
$140
$100
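The heart of the solution above is a "scroll, then wait for the element count to grow" loop that stops once a scroll produces no new elements. That termination logic can be sketched independently of Selenium; in the sketch below, the `scroll` and `count_items` callables are hypothetical stand-ins for `driver.execute_script("window.scrollBy(0,400)")` and `len(driver.find_elements_by_css_selector("span.gl-price"))`:

```python
import time

def collect_lazy_loaded(scroll, count_items, timeout=5.0, poll=0.1):
    """Repeatedly scroll and wait for the item count to grow.

    Stops and returns the final count when a scroll produces no
    new items within `timeout` seconds (mirroring the TimeoutException
    branch in the Selenium version).
    """
    seen = count_items()
    while True:
        scroll()
        deadline = time.time() + timeout
        while count_items() <= seen:      # wait for new items to render
            if time.time() >= deadline:
                return seen               # no growth: assume page is done
            time.sleep(poll)
        seen = count_items()

# Simulated lazy-loading page: each scroll reveals up to 6 more of 20 items.
loaded = [6]
total = collect_lazy_loaded(
    scroll=lambda: loaded.__setitem__(0, min(loaded[0] + 6, 20)),
    count_items=lambda: loaded[0],
    timeout=0.2,
)
print(total)  # 20
```

The same structure works for any lazy-loaded list: only the two callables change when you swap the simulated page for a real driver.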