Can't find the right way to grab part numbers from a webpage using requests
I'm trying to create a script that parses the different part numbers from a webpage using requests. If you open this link and click on the Product list tab, you will see the part numbers.
An image in the original post shows where the part numbers are.
I've tried with:
import requests
link = 'https://www.festo.com/cat/en-id_id/products_ADNH'
post_url = 'https://www.festo.com/cfp/camosHTML5Client/cH5C/HRQ'
payload = {"q":4,"ReqID":21,"focus":"f24~v472_0","scroll":[],"events":["e468~12~0~472~0~4","e468_0~6~472"],"ito":22,"kms":4}
with requests.Session() as s:
    s.headers['user-agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
    s.headers['referer'] = 'https://www.festo.com/cfp/camosHTML5Client/cH5C/go?q=2'
    s.headers['content-type'] = 'application/json; charset=UTF-8'
    r = s.post(post_url, data=payload)
    print(r.json())
When I execute the above script, I get the following result:
{'isRedirect': True, 'url': '../../camosStatic/Exception.html'}
How can I fetch the part numbers from that site using requests?
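One detail in the script above is worth checking, whether or not it explains the redirect: the session declares a JSON content type, but the payload is passed with `data=`, which makes requests form-encode a dict instead of serializing it as JSON (that is what `json=` does). The sketch below uses only the standard library to approximate the two bodies; `urlencode(..., doseq=True)` is an assumption standing in for requests' internal form encoding, and nothing here contacts the site.

```python
import json
from urllib.parse import urlencode

# Payload copied from the script above.
payload = {"q": 4, "ReqID": 21, "focus": "f24~v472_0", "scroll": [],
           "events": ["e468~12~0~472~0~4", "e468_0~6~472"], "ito": 22, "kms": 4}

# Approximation of the body sent for data=<dict>: key=value pairs,
# list items repeated under the same key, empty lists dropped entirely.
form_body = urlencode(payload, doseq=True)

# Body that matches an 'application/json' content type (what json=<dict> sends).
json_body = json.dumps(payload)

print(form_body)
print(json_body)
```

If the endpoint really expects JSON, `s.post(post_url, json=payload)` would send the second form. The server may still reject the request for other reasons, for example because the `ReqID` and event values belong to a browser session the script never established.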
With Selenium, I tried the code below to fetch the part numbers, but the script can't seem to click the Product list tab once I remove the hardcoded delay. I don't want any hardcoded delay in the script.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
link = 'https://www.festo.com/cat/en-id_id/products_ADNH'
with webdriver.Chrome() as driver:
    driver.get(link)
    wait = WebDriverWait(driver, 15)
    wait.until(EC.frame_to_be_available_and_switch_to_it(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "object")))))
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "#btn-group-cookie > input[value='Accept all cookies']"))).click()
    driver.switch_to.default_content()
    wait.until(EC.frame_to_be_available_and_switch_to_it(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "iframe#CamosIFId")))))
    time.sleep(10)  # I would like to get rid of this hardcoded delay
    item = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "[id='r17'] > [id='f24']")))
    driver.execute_script("arguments[0].click();", item)
    for elem in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "[data-ctcwgtname='tabTable'] [id^='v471_']")))[1:]:
        print(elem.text)
The difficulty for the driver is clicking the 'Product list' button, so I found a solution:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
from selenium.common.exceptions import TimeoutException, StaleElementReferenceException
from selenium import webdriver
import time
class NoPartsNumberException(Exception):
    pass
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get("https://www.festo.com/cat/en-id_id/products_ADNH")
wait.until(ec.frame_to_be_available_and_switch_to_it(wait.until(ec.visibility_of_element_located((By.CSS_SELECTOR, "object")))))
wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "#btn-group-cookie > input[value='Accept all cookies']"))).click()
driver.switch_to.default_content()
wait.until(ec.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[@name='CamosIF']")))
endtime = time.time() + 30
while True:
    try:
        # Give up once the overall 30-second deadline has passed.
        if time.time() > endtime:
            raise NoPartsNumberException('No parts number found')
        product_list = wait.until(ec.element_to_be_clickable((By.XPATH, "//div[@id='f24']")))
        product_list.click()
        part_numbers_elements = wait.until(ec.visibility_of_all_elements_located((By.XPATH, "//div[contains(@id, 'v471')]")))
        break
    except (TimeoutException, StaleElementReferenceException):
        # The click did not take effect yet; try again.
        pass
part_numbers = [p.text for p in part_numbers_elements[1:]]
print(part_numbers)
driver.close()
This way the driver keeps clicking the 'Product list' button until the panel containing the part numbers opens, and you wait much less than the 10 seconds imposed by the hardcoded time.sleep in your code.
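The retry-until-deadline pattern used in this answer can be pulled out into a small helper, which may make it easier to reuse and to test without a browser. This is only a sketch: `retry_until`, `RetryTimeout`, and `flaky_action` are hypothetical names that do not come from Selenium or from the answer, and the returned strings are placeholders rather than real part numbers.

```python
import time


class RetryTimeout(Exception):
    """Raised when the action keeps failing until the deadline passes."""


def retry_until(action, timeout=30.0, retry_on=(Exception,), pause=0.0):
    """Call action() until it returns, or raise RetryTimeout after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return action()
        except retry_on as exc:
            # Only the listed exception types trigger a retry.
            if time.monotonic() >= deadline:
                raise RetryTimeout(f"gave up after {timeout} s") from exc
            time.sleep(pause)


# Stand-in for "click the tab, then read the cells": fails twice, then succeeds.
attempts = {"count": 0}


def flaky_action():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("panel not open yet")
    return ["placeholder-1", "placeholder-2"]


result = retry_until(flaky_action, timeout=5.0, retry_on=(LookupError,))
print(result, attempts["count"])
```

With Selenium, `action` would be a function that clicks the tab and returns the located elements, and `retry_on` would be `(TimeoutException, StaleElementReferenceException)` as in the answer's loop.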