How to find all elements on the webpage through scrolling using SeleniumWebdriver and Python


Problem Description

I can't seem to get all the elements on a webpage, no matter what I try with Selenium. I am sure I am missing something. Here's my code. The URL has at least 30 elements, yet whenever I scrape, only 6 elements are returned. What am I missing?

import requests
import webbrowser
import time
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException



headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
url = 'https://www.adidas.com/us/men-shoes-new_arrivals'

res = requests.get(url, headers = headers)
page_soup = bs(res.text, "html.parser")


containers = page_soup.findAll("div", {"class": "gl-product-card-container show-variation-carousel"})


print(len(containers))
#for each container find shoe model
shoe_colors = []

for container in containers:
    if container.find("div", {'class': 'gl-product-card__reviews-number'}) is not None:
        shoe_model = container.div.div.img["title"]
        review = container.find('div', {'class':'gl-product-card__reviews-number'})
        review = int(review.text)



driver = webdriver.Chrome()
driver.get(url)
time.sleep(5)
shoe_prices = driver.find_elements_by_css_selector('.gl-price')

for price in shoe_prices:
    print(price.text)
print(len(shoe_prices))

Recommended Answer

So there seems to be some difference in the results when running your code:

  • You find 30 items with requests and 6 items with Selenium
  • Whereas I found 40 items with requests and 4 items with Selenium

The items on this website are generated dynamically through lazy loading, so you have to scroll down and wait for the new elements to render within the HTML DOM. You can use the following solution:

  • Code Block:

import requests
import webbrowser
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException, TimeoutException

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
url = 'https://www.adidas.com/us/men-shoes-new_arrivals'
res = requests.get(url, headers = headers)
page_soup = bs(res.text, "html.parser")
containers = page_soup.findAll("div", {"class": "gl-product-card-container show-variation-carousel"})
print(len(containers))
shoe_colors = []
for container in containers:
    if container.find("div", {'class': 'gl-product-card__reviews-number'}) is not None:
        shoe_model = container.div.div.img["title"]
        review = container.find('div', {'class':'gl-product-card__reviews-number'})
        review = int(review.text)
options = Options()
options.add_argument('start-maximized')
options.add_argument('disable-infobars')
options.add_argument('--disable-extensions')
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get(url)
# wait for the initially visible price elements and record how many there are
myLength = len(WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "span.gl-price"))))
while True:
    # scroll down in small steps to trigger the lazy loading
    driver.execute_script("window.scrollBy(0,400)", "")
    try:
        # wait until more price elements have rendered than were counted before
        WebDriverWait(driver, 20).until(lambda driver: len(driver.find_elements_by_css_selector("span.gl-price")) > myLength)
        titles = driver.find_elements_by_css_selector("span.gl-price")
        myLength = len(titles)
    except TimeoutException:
        # no new elements appeared within 20 seconds: the page is fully loaded
        break
print(myLength)
for title in titles:
    print(title.text)
driver.quit()

  • Console Output:

    47
    $100
    $100
    $100
    $100
    $100
    $100
    $180
    $180
    $180
    $180
    $130
    $180
    $180
    $130
    $180
    $130
    $200
    $180
    $180
    $130
    $60
    $100
    $30
    $65
    $120
    $100
    $85
    $180
    $150
    $130
    $100
    $100
    $80
    $100
    $120
    $180
    $200
    $130
    $130
    $100
    $120
    $120
    $100
    $180
    $90
    $140
    $100
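
Note that the find_elements_by_* helpers and the chrome_options / executable_path arguments used above are Selenium 3 APIs; they were removed in Selenium 4. Purely as an illustrative sketch (assuming Selenium 4.6+, where Selenium Manager resolves the chromedriver binary automatically, and assuming the page still renders its prices under the span.gl-price selector, which may have changed since this answer was written), the same scroll-and-wait idea could be written as:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

url = 'https://www.adidas.com/us/men-shoes-new_arrivals'

options = webdriver.ChromeOptions()
options.add_argument('--start-maximized')
driver = webdriver.Chrome(options=options)  # Selenium 4.6+ locates chromedriver via Selenium Manager
driver.get(url)

# wait for the first batch of lazily loaded prices (selector assumed unchanged)
prices = WebDriverWait(driver, 20).until(
    EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "span.gl-price")))
count = len(prices)

while True:
    # scroll down a little to trigger the next batch of lazy-loaded products
    driver.execute_script("window.scrollBy(0, 400)")
    try:
        WebDriverWait(driver, 20).until(
            lambda d: len(d.find_elements(By.CSS_SELECTOR, "span.gl-price")) > count)
        prices = driver.find_elements(By.CSS_SELECTOR, "span.gl-price")
        count = len(prices)
    except TimeoutException:
        break  # nothing new appeared within 20 seconds, assume the page is complete

print(count)
for price in prices:
    print(price.text)
driver.quit()

The logic is the same as in the accepted approach: keep scrolling until a wait for additional span.gl-price elements times out, and treat that timeout as the signal that everything has loaded.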
    
