Can't find the right way to grab part numbers from a webpage using requests

Problem description

I'm trying to create a script that parses the different part numbers from a webpage using requests. If you open this link and click the Product list tab, you will see the part numbers.

(A screenshot in the original post shows where the part numbers are.)

I've tried with:

import requests

link = 'https://www.festo.com/cat/en-id_id/products_ADNH'
post_url = 'https://www.festo.com/cfp/camosHTML5Client/cH5C/HRQ'

payload = {"q":4,"ReqID":21,"focus":"f24~v472_0","scroll":[],"events":["e468~12~0~472~0~4","e468_0~6~472"],"ito":22,"kms":4}

with requests.Session() as s:
    s.headers['user-agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
    s.headers['referer'] = 'https://www.festo.com/cfp/camosHTML5Client/cH5C/go?q=2'
    s.headers['content-type'] = 'application/json; charset=UTF-8'
    r = s.post(post_url,data=payload)
    print(r.json())

When I execute the above script, I get the following result:

{'isRedirect': True, 'url': '../../camosStatic/Exception.html'}
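One detail in the snippet above may be worth checking: the `content-type` header announces JSON, but `data=payload` with a dict makes requests send a URL-encoded form body, whereas `json=payload` sends a JSON document. The sketch below uses only the standard library to approximate the two bodies; it is an illustration of the encoding difference, not a confirmed fix. Whether the server then returns part numbers is not shown here, since the endpoint may also depend on session state set up by earlier requests.

```python
import json
from urllib.parse import urlencode

# Same payload as in the question.
payload = {
    "q": 4, "ReqID": 21, "focus": "f24~v472_0", "scroll": [],
    "events": ["e468~12~0~472~0~4", "e468_0~6~472"], "ito": 22, "kms": 4,
}

# Rough approximation of the body sent for data=payload:
# URL-encoded form fields, with list items repeated under the same key.
form_body = urlencode(payload, doseq=True)

# Body sent for json=payload: a single JSON document.
json_body = json.dumps(payload)

print(form_body)
print(json_body)
```

Note that the form encoding cannot represent the empty `scroll` list at all, so that key disappears from the body. In the question's script, the corresponding change would be `s.post(post_url, json=payload)`.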

How can I fetch the part numbers from that site using requests?

With Selenium, I tried the code below to fetch the part numbers, but the script can't seem to click the Product list tab once I remove the hardcoded delay. I don't want any hardcoded delay in the script.

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
 
link = 'https://www.festo.com/cat/en-id_id/products_ADNH'
 
with webdriver.Chrome() as driver:
    driver.get(link)
    wait = WebDriverWait(driver,15)
    wait.until(EC.frame_to_be_available_and_switch_to_it(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "object")))))
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "#btn-group-cookie > input[value='Accept all cookies']"))).click()
    driver.switch_to.default_content()
    wait.until(EC.frame_to_be_available_and_switch_to_it(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "iframe#CamosIFId")))))
    
    time.sleep(10)   #I would like to get rid of this hardcoded delay
    
    item = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "[id='r17'] > [id='f24']")))
    driver.execute_script("arguments[0].click();",item)
    for elem in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "[data-ctcwgtname='tabTable'] [id^='v471_']")))[1:]:
        print(elem.text)

Solution

The difficult part for the driver is clicking the 'Product list' button, so I came up with this solution:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
from selenium.common.exceptions import TimeoutException, StaleElementReferenceException
from selenium import webdriver
import time

class NoPartsNumberException(Exception):
    pass

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)


driver.get("https://www.festo.com/cat/en-id_id/products_ADNH")
wait.until(ec.frame_to_be_available_and_switch_to_it(wait.until(ec.visibility_of_element_located((By.CSS_SELECTOR, "object")))))
wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "#btn-group-cookie > input[value='Accept all cookies']"))).click()
driver.switch_to.default_content()
wait.until(ec.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[@name='CamosIF']")))

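# Retry for up to 30 seconds: keep clicking the tab until the part-number cells appear.
endtime = time.time() + 30
while True:
    try:
        if time.time() > endtime:
            raise NoPartsNumberException('No part numbers found')
        # Click the 'Product list' tab once it is clickable.
        product_list = wait.until(ec.element_to_be_clickable((By.XPATH, "//div[@id='f24']")))
        product_list.click()
        # Wait for the part-number cells; a timeout here triggers another click attempt.
        part_numbers_elements = wait.until(ec.visibility_of_all_elements_located((By.XPATH, "//div[contains(@id, 'v471')]")))
        break
    except (TimeoutException, StaleElementReferenceException):
        pass

# The first matched element is skipped, as in the question's code.
part_numbers = [p.text for p in part_numbers_elements[1:]]
print(part_numbers)

# quit() ends the WebDriver session; close() would only close the current window.
driver.quit()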

This way, the driver keeps clicking the 'Product list' button until the view containing the part numbers opens, and you wait much less than the 10 seconds of the hardcoded time.sleep in your code.
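The retry pattern used in the answer can be separated from Selenium and sketched as a small helper. This is a minimal illustration, not part of the original answer: `retry_until`, `flaky_action`, and the sample part names are hypothetical, and the stand-in action simply fails twice before succeeding. Unlike the answer's loop, this version checks the deadline after a failed attempt rather than before the next one.

```python
import time

def retry_until(action, timeout, ignored=(Exception,), pause=0.0):
    """Call action() until it returns without raising one of `ignored`.

    Raises TimeoutError if the deadline passes without a success.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return action()
        except ignored:
            if time.monotonic() > deadline:
                raise TimeoutError(f"no success within {timeout} seconds")
            time.sleep(pause)  # optional pause between attempts

# Hypothetical stand-in for "click the tab, then read the cells":
# it fails twice and succeeds on the third call.
attempts = {"count": 0}

def flaky_action():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("cells not visible yet")
    return ["part-A", "part-B"]

result = retry_until(flaky_action, timeout=5, ignored=(LookupError,))
print(result, attempts["count"])
```

In the Selenium case, `action` would wrap the click plus the wait for the part-number cells, and `ignored` would be `(TimeoutException, StaleElementReferenceException)`, so the loop returns as soon as the cells become visible instead of always sleeping for a fixed time.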
