Obtaining required keys for application/x-www-form-urlencoded


Problem description

I have been using mechanize to fill in a form on a website, but the site has now changed: some of the required fields appear to be hidden and no longer show up when I print all available forms with mechanize. I assume the form has been modified to use a more current method (application/x-www-form-urlencoded), but I have not found a way to update my script so that it can keep submitting this form programmatically.

From what I have read, I should be able to send a dict (key/value pairs) directly to the form's submit target rather than filling in the form first. Please correct me if I am wrong. However, I have not been able to find a way to work out which keys are required.

I would massively appreciate it if someone could point me in the right direction or put me straight in case this is no longer possible.

Solution

You cannot always extract every field a server expects.

The post target, the code handling the POST, is a black box. You cannot look inside the code that the server runs. The best information you have about what it expects is what the original form tells your browser to post. That original form consists not only of the HTML, but also of the headers that were sent with it (cookies for example) and any JavaScript code that is run by the browser.

In many cases, parsing the HTML sent for the form is enough; that's what Mechanize (or a more recent framework like robobrowser) does, plus a little cookie handling and making sure typical headers such as the referrer are included. But if any JavaScript code manipulates the HTML or intercepts the form submission to add or remove data, then Mechanize or other Python form parsers cannot replicate that step.
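To make the HTML-parsing case concrete, here is a minimal sketch using only the standard library's `html.parser`. It is not how Mechanize itself is implemented; the class name `FormFieldCollector` and the sample markup (including the `csrf_token` field) are hypothetical. It collects every named `<input>`, hidden ones included, which is roughly the information a form parser can recover when no JavaScript is involved.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs from <input> tags, including hidden ones."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:  # inputs without a name are not submitted with the form
            self.fields[name] = attributes.get("value") or ""


# Hypothetical markup standing in for the page the server sends.
SAMPLE_HTML = """
<form action="/submit" method="post">
  <input type="text" name="username" value="">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="submit" value="Send">
</form>
"""

collector = FormFieldCollector()
collector.feed(SAMPLE_HTML)
print(collector.fields)
```

Any field that JavaScript adds after the page loads will not appear in this dictionary, which is exactly the limitation described above.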

Your options then are to:

  • Reverse engineer what the JavaScript code does and replicate that in Python code. The development tools of your browser can help here; observe what is being posted on the network tab, for example, or use the debugger to step through the JavaScript code to see what it does.

  • Use an actual browser, controlled from Python. Selenium could do this for you; it can drive a desktop browser (Chrome, Firefox, etc.) or it can be used to drive a headless browser implementation such as PhantomJS. This is heavier on resources, but it will actually run the JavaScript code and let you post a form exactly as your browser would.
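For the first option, one possible workflow is to copy the request body shown in the browser's network tab, decode it to see which keys the browser actually sent, and then rebuild the same POST from Python. The sketch below uses only the standard library; the captured body, the field names (`username`, `csrf_token`, `client_ts`) and the `example.com` URLs are made-up placeholders, and the request is built but not sent.

```python
from urllib.parse import parse_qsl, urlencode
from urllib.request import Request

# Hypothetical request body copied from the browser's network tab.
captured_body = "username=alice&csrf_token=abc123&client_ts=1700000000"

# Decode the body to learn which keys the browser posted.
observed = dict(parse_qsl(captured_body, keep_blank_values=True))
required_keys = sorted(observed)
print(required_keys)

# Reuse the observed keys, overriding only the values you want to change.
payload = dict(observed, username="bob")
encoded = urlencode(payload).encode("ascii")

# Build the POST; the URL and Referer are placeholders for the real site.
request = Request(
    "https://example.com/submit",
    data=encoded,
    headers={
        "Content-Type": "application/x-www-form-urlencoded",
        "Referer": "https://example.com/form",
    },
    method="POST",
)
# urllib.request.urlopen(request) would send it; omitted in this sketch.
```

Values such as tokens or timestamps are often generated per page load, so in practice they usually have to be fetched or computed fresh for each submission rather than replayed from a captured request.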
