Selenium not going to next page in scraper

Question

I'm writing my first real scraper and although in general it's been going well, I've hit a wall using Selenium. I can't get it to go to the next page.

Below is the head of my code. The output below this is just printing out data in terminal for now and that's all working fine. It just stops scraping at the end of page 1 and shows me my terminal prompt. It never starts on page 2. I would be so grateful if anyone could make a suggestion. I've tried selecting the button at the bottom of the page I'm trying to scrape using both the relative and full Xpath (you're seeing the full one here) but neither work. I'm trying to click the right-arrow button.

I built in my own error message to indicate whether the driver successfully found the element by Xpath or not. The error message fires when I execute my code, so I guess it's not finding the element. I just can't understand why not.
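The question doesn't show that check, but a minimal sketch of the "did the driver find it?" pattern being described might look like the following. Everything here is illustrative: `StubDriver` stands in for a real webdriver, and `NoSuchElement` mimics `selenium.common.exceptions.NoSuchElementException` so the sketch is self-contained.

```python
class NoSuchElement(Exception):
    """Stand-in for selenium's NoSuchElementException, for this demo only."""
    pass

class StubDriver:
    """Stand-in for a real webdriver: knows a fixed set of XPaths."""
    def __init__(self, known_xpaths):
        self.known_xpaths = set(known_xpaths)

    def find_element(self, by, selector):
        if selector in self.known_xpaths:
            return f"<element at {selector}>"
        raise NoSuchElement(selector)

def find_or_report(driver, by, selector):
    """Return the element, or None after printing a diagnostic, instead of crashing."""
    try:
        return driver.find_element(by, selector)
    except NoSuchElement:
        print(f"Could not locate element: {selector}")
        return None

driver = StubDriver({"//a[@rel='next']"})
find_or_report(driver, "xpath", "//a[@rel='next']")    # found, returned
find_or_report(driver, "xpath", "/html/body/div[9]/a") # absent, prints diagnostic
```

With a real driver, the same wrapper would catch `NoSuchElementException`; a long absolute XPath like the one in the question is exactly the kind of selector that silently stops matching when the page layout shifts.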

# Importing libraries
import requests
import csv
import re
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Import selenium 
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, WebDriverException
import time

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
options.add_argument('--headless')
driver = webdriver.Chrome("/path/to/driver", options=options)
# Yes, I do have the actual path to my driver in the original code

driver.get("https://uk.eu-supply.com/ctm/supplier/publictenders?B=UK")
time.sleep(5)
while True:
    try:
        driver.find_element_by_xpath('/html/body/div[1]/div[3]/div/div/form/div[3]/div/div/ul[1]/li[4]/a').click()
    except (TimeoutException, WebDriverException) as e:
        print("A timeout or webdriver exception occurred.")
        break
driver.quit()

Answer

What you can do is to set up Selenium expected conditions (visibility_of_element_located, element_to_be_clickable) and use a relative XPath to select the next page element. All of this in a loop (its range is the number of pages you have to deal with).

XPath for the next page link:

//div[@class='pagination ctm-pagination']/ul[1]/li[last()-1]/a
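The `li[last()-1]` predicate selects the second-to-last `<li>` in the list, i.e. the right-arrow link sitting just before the "last page" link. This can be checked with a toy pagination snippet; the markup below is an assumption about the site's structure, not copied from it, and since the stdlib `ElementTree` XPath engine doesn't support `last()`, negative list indexing stands in for it:

```python
from xml.etree import ElementTree as ET

# Toy pagination markup, assumed to resemble the site's structure.
html = """
<div class="pagination ctm-pagination">
  <ul>
    <li><a>1</a></li>
    <li><a>2</a></li>
    <li><a>3</a></li>
    <li><a>&gt;</a></li>
    <li><a data-current-page="3">&gt;&gt;</a></li>
  </ul>
</div>
"""
root = ET.fromstring(html)
items = root.findall("./ul/li")

# What //.../li[last()-1]/a selects: the second-to-last item (the right arrow).
next_arrow = items[-2].find("a").text

# What //.../li[last()]/a selects: the last item, carrying the page count.
last_link = items[-1].find("a")
print(next_arrow)                          # the "next page" arrow text
print(last_link.get("data-current-page"))  # the total-pages attribute
```

This is also why the answer reads `data-current-page` off `li[last()]`: on this site the final pagination item is assumed to carry the page count as an attribute.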

The code would look like:

## imports

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver.get("https://uk.eu-supply.com/ctm/supplier/publictenders?B=UK")

## count the number of pages you have

els = WebDriverWait(driver, 20).until(
    EC.visibility_of_element_located((By.XPATH, "//div[@class='pagination ctm-pagination']/ul[1]/li[last()]/a"))
).get_attribute("data-current-page")

## loop. at the end of the loop, click on the following page

for i in range(int(els)):
    # ... scrape what you want ...
    if i < int(els) - 1:  # the last page has no further page to click through to
        WebDriverWait(driver, 20).until(
            EC.element_to_be_clickable((By.XPATH, "//div[@class='pagination ctm-pagination']/ul[1]/li[last()-1]/a"))
        ).click()
