python requests POST with header and parameters


Problem description

I have a POST request which I am trying to send using requests in Python, but I get a 403 error. The request works fine through the browser.

POST /ajax-load-system HTTP/1.1
Host: xyz.website.com
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-GB,en;q=0.5
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0
Referer: http://xyz.website.com/help-me/ZYc5Yn
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With: XMLHttpRequest
Content-Length: 56
Cookie: csrf_cookie_name=a3f8adecbf11e29c006d9817be96e8d4; ci_session=ba92hlh6o0ns7f20t4bsgjt0uqfdmdtl; _ga=GA1.2.1535910352.1530452604; _gid=GA1.2.1416631165.1530452604; _gat_gtag_UA_21820217_30=1
Connection: close

csrf_test_name=a3f8adecbf11e29c006d9817be96e8d4&vID=9999


What I am trying in python is:

import requests
import json

url = 'http://xyz.website.com/ajax-load-system'

payload = {
'Host': 'xyz.website.com',
'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0',
'Accept': 'application/json, text/javascript, */*; q=0.01',
'Accept-Language': 'en-GB,en;q=0.5',
'Referer': 'http://xyz.website.com/help-me/ZYc5Yn',
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'X-Requested-With': 'XMLHttpRequest',
'Content-Length': '56',
'Cookie': 'csrf_cookie_name=a3f8adecbf11e29c006d9817be96e8d4; ci_session=ba92hlh6o0ns7f20t4bsgjt0uqfdmdtl; _ga=GA1.2.1535910352.1530452604; _gid=GA1.2.1416631165.1530452604; _gat_gtag_UA_21820217_30=1',
'Connection': 'close',
'csrf_test_name': 'a3f8adecbf11e29c006d9817be96e8d4',
'vID': '9999',
}    

headers = {}

r = requests.post(url, headers=headers, data=json.dumps(payload))
print(r.status_code)  


But this is printing a 403 error code. What am I doing wrong here?


I am expecting a return response as json:

{"status_message":"Thanks for helping.","help_count":"141","status":true}

Answer

You are confusing headers and payload, and the payload is not JSON encoded.

These are all headers:

Host: xyz.website.com
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-GB,en;q=0.5
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0
Referer: http://xyz.website.com/help-me/ZYc5Yn
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With: XMLHttpRequest
Content-Length: 56
Cookie: csrf_cookie_name=a3f8adecbf11e29c006d9817be96e8d4; ci_session=ba92hlh6o0ns7f20t4bsgjt0uqfdmdtl; _ga=GA1.2.1535910352.1530452604; _gid=GA1.2.1416631165.1530452604; _gat_gtag_UA_21820217_30=1
Connection: close

Most of these are automated and don't need to be set manually:

- requests sets Host for you based on the URL.
- Accept is set to an acceptable default.
- Accept-Language is rarely needed in these situations.
- Referer, unless using HTTPS, is often not even set or is filtered out for privacy reasons, so sites no longer rely on it being set.
- Content-Type must actually reflect the contents of your POST (and it is not JSON!), so requests sets this for you depending on how you call it.
- Content-Length must reflect the actual content length, so it is set by requests, which is in the best position to calculate it.
- Connection should definitely be handled by the library, as you don't want to prevent it from efficiently re-using connections when it can.

At best you could set X-Requested-With and User-Agent, but only if the server would not otherwise accept the request. The Cookie header reflects the values of cookies the browser holds. Your script can get its own set of cookies from the server by using a requests Session object to make an initial GET request to the URL named in the Referer header (or another suitable URL on the same site), at which point the server should set cookies on the response, and those would be stored in the session for reuse on the POST request. Use that mechanism to get your own CSRF cookie value.

Note the Content-Type header:

Content-Type: application/x-www-form-urlencoded; charset=UTF-8


When you pass in a dictionary to the data keyword of the requests.post() function, the library will encode the data to exactly that content type for you.
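Under the hood this form encoding is equivalent to what the standard library's urlencode produces; a minimal sketch, using the token value captured above as a stand-in:

```python
from urllib.parse import urlencode

# Stand-in payload values taken from the captured request
payload = {
    'csrf_test_name': 'a3f8adecbf11e29c006d9817be96e8d4',
    'vID': '9999',
}

# This is the body requests would send for data=payload
body = urlencode(payload)
print(body)
# csrf_test_name=a3f8adecbf11e29c006d9817be96e8d4&vID=9999
```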

The actual payload is:

csrf_test_name=a3f8adecbf11e29c006d9817be96e8d4&vID=9999

These are two fields, csrf_test_name and vID, that need to be part of your payload dictionary.
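You can confirm this by decoding the captured body with the standard library; a quick sketch:

```python
from urllib.parse import parse_qs

# The raw body captured from the browser request
body = 'csrf_test_name=a3f8adecbf11e29c006d9817be96e8d4&vID=9999'

# parse_qs maps each field name to a list of its values
fields = parse_qs(body)
print(fields)
# {'csrf_test_name': ['a3f8adecbf11e29c006d9817be96e8d4'], 'vID': ['9999']}
```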

Note that the csrf_test_name value matches the csrf_cookie_name value in the cookies. This is how the site protects itself from cross-site request forgery attacks, where a third party may try to post to the same URL on your behalf. Such a third party would not have access to the same cookies, so it would be blocked. Your code needs to obtain a fresh cookie; a proper CSRF implementation limits how long any CSRF cookie can be re-used.

So what would at least be needed to make it all work is:

import requests

# *optional*, the site may not care about these. If they *do* care, then
# they care about keeping out automated scripts and could in future
# raise the stakes and require more 'browser-like' markers. Ask yourself
# if you want to anger the site owners and get into an arms race.
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0',
    'X-Requested-With': 'XMLHttpRequest',
}

payload = {
    'vID': 9999,
}

url = 'http://xyz.website.com/ajax-load-system'
# the URL from the Referer header, but others at the site would probably
# also work
initial_url = 'http://xyz.website.com/help-me/ZYc5Yn'

with requests.Session() as session:
    # obtain CSRF cookie
    initial_response = session.get(initial_url)
    payload['csrf_test_name'] = session.cookies['csrf_cookie_name']

    # Now actually post with the correct CSRF cookie
    response = session.post(url, headers=headers, data=payload)

If this still causes issues, you'll need to try setting two additional headers, Accept and Accept-Language. If that turns out to be necessary, the site has already thought long and hard about how to keep automated site scrapers out. Consider contacting them and asking if they offer an API option instead.
