Looking for Requests equivalent of Mechanize capabilities

Question

I am interested in seeing if Requests can handle some tasks I have primarily been doing in Mechanize.

Mechanize can easily handle filling out forms and submitting forms and I am having a hard time trying to do the same thing in Requests.

For example:

import mechanize
br = mechanize.Browser()
url = "https://www.euronext.com/en/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_values%3A18d1ee939a63d459d9a2a3b07b8837a7"
br.open(url)
br.select_form(nr=1)  # second form on the page (nr is zero-based)
br.form['format'] = ['2']
br.form['date_format'] = ['2']
response = br.submit().read()

The following Requests code is not equivalent:

import requests
url = "https://www.euronext.com/en/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_values%3A18d1ee939a63d459d9a2a3b07b8837a7"
payload = {'format':'2','date_format':'2'}
r = requests.post(url, data=payload)

Does requests.post not submit the form to download the CSV embedded on the page?

Also, for additional information, here are what the forms on the page look like:

for form in br.forms():
    print(form)

<POST https://www.euronext.com/en/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_values%3A18d1ee939a63d459d9a2a3b07b8837a7  application/x-www-form-urlencoded
    <TextControl(search_block_form=)>
    <SubmitControl(op=Search) (readonly)>
    <RadioControl(search_type=[*quote, site])>
    <HiddenControl(form_build_id=form-af2eb21e9b6448ffca4e358d0b52f499) (readonly)>
    <HiddenControl(form_id=search_block_form) (readonly)>
    <HiddenControl(search_target=search_instruments) (readonly)>
    <HiddenControl(search_language=&lan=) (readonly)>>
<POST https://www.euronext.com/en/data/download?ml=nyx_pd_stocks&cmd=default&formKey=nyx_pd_filter_values%3A18d1ee939a63d459d9a2a3b07b8837a7 application/x-www-form-urlencoded
  <RadioControl(format=[*1, 2, 3])>
  <RadioControl(layout=[*2, 1])>
  <RadioControl(decimal_separator=[*1, 2])>
  <RadioControl(date_format=[*1, 2])>
  <SubmitControl(op=Go) (readonly)>
  <SubmitControl(op=Cancel) (readonly)>
  <HiddenControl(form_build_id=form-37e81285a4dbf60e091037f904bac2eb) (readonly)>
  <HiddenControl(form_id=nyx_download_form) (readonly)>>

Recommended answer

requests does not do the same thing as Mechanize.

Mechanize loads the actual HTML form and parses this, letting you fill in values for the various elements in the form. When you then ask Mechanize to submit the form, it'll use all information in the form to produce a valid request to the server. This includes any form elements you didn't provide a new value for, using default values if present. This includes any hidden form elements not visible in your browser.

Use a project like robobrowser instead; it wraps requests as well as BeautifulSoup to load webpages, parse out the form elements, help you fill out those elements and submit them back again.

If you want to use just requests, you'll need to make sure you are posting all fields defined by the form. This means you need to look at the method attribute (defaults to GET), the action attribute (defaults to the current URL), and at all the input, select, textarea and button elements. The server may also be expecting additional information in the HTTP request, such as cookies or the Referer (sic) header.
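As a rough illustration of what "all fields defined by the form" means, the sketch below uses only the standard library to collect the values a form would send if submitted untouched, then overrides two of them. The `FormDefaults` class and the `SAMPLE` markup are hypothetical: the markup is a cut-down reconstruction based on the Mechanize listing in the question, not the real page, and `select`, `textarea` and repeated field names are left out for brevity.

```python
from html.parser import HTMLParser

# Input types that are not sent unless the user activates them.
SKIPPED_TYPES = {"submit", "button", "image", "reset", "file"}


class FormDefaults(HTMLParser):
    """Collect the fields each <form> would send if submitted untouched."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per <form>, in document order
        self._current = None  # form being parsed, or None outside a form

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "method": (attrs.get("method") or "get").upper(),
                "action": attrs.get("action") or "",  # empty means current URL
                "fields": {},
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            self._add_input(attrs)

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None

    def _add_input(self, attrs):
        name = attrs.get("name")
        kind = (attrs.get("type") or "text").lower()
        if not name or "disabled" in attrs or kind in SKIPPED_TYPES:
            return
        value = attrs.get("value")
        if kind in ("radio", "checkbox"):
            if "checked" not in attrs:
                return  # unchecked options are not submitted
            if value is None:
                value = "on"  # browser default for a checked box without a value
        self._current["fields"][name] = value or ""


# Hypothetical markup modelled on the second form printed in the question.
SAMPLE = """
<form method="post" action="/en/data/download?ml=nyx_pd_stocks">
  <input type="radio" name="format" value="1" checked>
  <input type="radio" name="format" value="2">
  <input type="radio" name="date_format" value="1" checked>
  <input type="radio" name="date_format" value="2">
  <input type="submit" name="op" value="Go">
  <input type="hidden" name="form_build_id" value="form-37e81285a4dbf60e091037f904bac2eb">
  <input type="hidden" name="form_id" value="nyx_download_form">
</form>
"""

parser = FormDefaults()
parser.feed(SAMPLE)
form = parser.forms[0]

# Start from the form's own defaults, then apply the two choices from the question.
payload = dict(form["fields"], format="2", date_format="2")
print(form["method"], form["action"])
print(payload)
```

The resulting `payload` carries the hidden `form_build_id` and `form_id` values alongside the two chosen options, which is the part the bare `requests.post` call in the question leaves out.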

The Mechanize information you printed indicates, for example, that it has parsed several more fields from the forms for which you did not provide values. The form in question also contains a hidden input field named form_build_id, which the server may be relying on. Mechanize would also have captured any cookies sent with the original form request, and those cookies may also be required for the server to accept the request. robobrowser would take the same context into account.
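One way to carry the same context with plain requests is sketched below, on the assumption that BeautifulSoup is installed for the parsing step. A `requests.Session` keeps the cookies from the initial GET, the hidden fields are read fresh from the page (the `form_build_id` printed in the question is presumably regenerated per page load), and the page URL is sent as the Referer. `build_payload`, `download_csv` and `CHOICES` are hypothetical names; the radio values come from the form listing in the question, and whether the server needs `op=Go` or the Referer at all is a guess. The URL is the one from the question and may no longer serve this form.

```python
from urllib.parse import urljoin

PAGE_URL = ("https://www.euronext.com/en/data/download?ml=nyx_pd_stocks&cmd=default"
            "&formKey=nyx_pd_filter_values%3A18d1ee939a63d459d9a2a3b07b8837a7")

# Radio values taken from the form listing in the question; format and
# date_format are switched from their default "1" to "2".
CHOICES = {"format": "2", "layout": "2", "decimal_separator": "1", "date_format": "2"}


def build_payload(hidden_fields, choices, submit_value="Go"):
    """Merge server-issued hidden fields with the user's choices."""
    payload = dict(hidden_fields)
    payload.update(choices)
    payload["op"] = submit_value  # assumption: the server checks which button was used
    return payload


def download_csv(page_url=PAGE_URL):
    """Fetch the page, rebuild the download form's POST, and return the response body."""
    # Third-party imports are kept local so build_payload works without them.
    import requests
    from bs4 import BeautifulSoup

    with requests.Session() as session:  # the session keeps cookies between calls
        page = session.get(page_url)
        page.raise_for_status()
        soup = BeautifulSoup(page.text, "html.parser")

        # Locate the download form through its hidden form_id marker.
        marker = soup.find("input", attrs={"name": "form_id", "value": "nyx_download_form"})
        if marker is None:
            raise RuntimeError("download form not found; the page may have changed")
        form = marker.find_parent("form")

        hidden = {
            tag["name"]: tag.get("value", "")
            for tag in form.find_all("input", attrs={"type": "hidden"})
            if tag.get("name")
        }
        # An empty or missing action means "post back to the current URL".
        target = urljoin(page.url, form.get("action") or page.url)
        reply = session.post(target, data=build_payload(hidden, CHOICES),
                             headers={"Referer": page.url})
        reply.raise_for_status()
        return reply.content
```

Calling `download_csv()` would then return the raw response bytes, which should be checked to confirm they really are the CSV rather than the HTML page again.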
