Scrape an Ajax form with .submit() with Python and Selenium

Views: 78 Published: 2021/4/15 19:20:48 Tags: python selenium-webdriver web-scraping beautifulsoup scrapy

Problem description

I am trying to get a link from a web page. The page sends a request using JavaScript, and the server's response directly triggers a PDF download, so the new PDF is downloaded automatically by the browser. My first approach was to use Selenium to get the information:

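from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from seleniumwire import webdriver  # selenium-wire (assumed): its webdriver exposes browser.requests, used below

# Path to chromedriver & get URL
path = "/Users/my_user/Desktop/chromedriver"
browser = webdriver.Chrome(service=Service(path))
browser.get("https://www.holzwickede.de/amtsblatt/index.php")

# Accept the cookie banner
WebDriverWait(browser, 15).until(EC.element_to_be_clickable((By.XPATH, "//a[@id='cc_btn_accept_all']"))).click()

# Element to click
elem = browser.find_element(By.XPATH, "//div[@id='content']/div[7]/table//form[@name='gazette_52430']/a[@href='#gazette_52430']")
elem.click()
print(browser.current_url)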

The result was only the current URL, which points to the same web page, whereas the actual request goes directly to the server.

https://www.holzwickede.de/amtsblatt/index.php#gazette_52430

After this unsuccessful result, I tried to grab it from the requests captured by the browser.

# Access requests via the `requests` attribute
for request in browser.requests:  # captures all the requests in chronological order
    if request.response:  # skip requests that never received a response
        print(
            request.path,
            request.response.status_code,
            request.response.headers,
            request.body,
            "\n",
        )

The result still does not contain the underlying link the PDF comes from. Does anyone have an idea what I can do? Thanks in advance.

Recommended answer

I found the answer. The request submits a form via POST, so we have to extract the request headers and the parameters it sends. Once you know the parameters the form sends, you can use requests to get the link back in your console.

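import requests
# url is the form's target URL; the keys and values stand for the parameters the form sends
response = requests.get(url, params={'key1': 'value1', 'key2': 'value2'})
print(response.url)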
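As a rough illustration of how the form parameters might be found, the sketch below filters captured requests for POST submissions and decodes their bodies. It assumes the form body is URL-encoded and that each captured request has `method`, `url` and `body` attributes, as selenium-wire's request objects do. The helper name `extract_post_forms`, the example URLs and the field names are all hypothetical and not taken from the actual site.

```python
from types import SimpleNamespace
from urllib.parse import parse_qs


def extract_post_forms(captured_requests):
    """Return (url, form_fields) for every captured POST request with a form body."""
    forms = []
    for request in captured_requests:
        if request.method != "POST" or not request.body:
            continue
        # The body arrives as bytes; a URL-encoded form is assumed here.
        body = request.body.decode("utf-8", errors="replace")
        # parse_qs returns a list per key; keep only the first value of each field.
        fields = {key: values[0] for key, values in parse_qs(body).items()}
        if fields:
            forms.append((request.url, fields))
    return forms


# Stand-in objects so the sketch runs without a browser; the field names are invented.
fake_requests = [
    SimpleNamespace(method="GET", url="https://example.com/index.php", body=b""),
    SimpleNamespace(
        method="POST",
        url="https://example.com/download.php",
        body=b"gazette_id=52430&action=download",
    ),
]

for url, fields in extract_post_forms(fake_requests):
    print(url, fields)
```

In a real session one would presumably call `extract_post_forms(browser.requests)` after `elem.click()` and pass the resulting URL and dictionary to the `requests` call shown above.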

This answer also resolves the following question: Capture AJAX response with selenium python

Cheers!
