Python Requests Not Returning Same Header as Browser Request/cURL

Question


I'm looking to write a script that can automatically download .zip files from the Bureau of Transportation Statistics carrier website (http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=293), but I can't get the same response headers that I see in Chrome when I download the zip file. I'm looking for a response header like this:

HTTP/1.1 302 Object moved
Cache-Control: private
Content-Length: 183
Content-Type: text/html
Location: http://tsdata.bts.gov/103627300_T_T100_SEGMENT_ALL_CARRIER.zip
Server: Microsoft-IIS/8.5
X-Powered-By: ASP.NET
Date: Thu, 21 Apr 2016 15:56:31 GMT

However, when I call requests.post(url, data=params, headers=headers) with the same information that I can see in the Chrome network inspector, I get the following response:

>>> res.headers
{'Cache-Control': 'private', 'Content-Length': '262', 'Content-Type': 'text/html', 'X-Powered-By': 'ASP.NET', 'Date': 'Thu, 21 Apr 2016 20:16:26 GMT', 'Server': 'Microsoft-IIS/8.5'}

It has pretty much everything except the Location key, which I need in order to download the .zip file with all of the data I want. The Content-Length value is also different, but I'm not sure whether that matters.
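One thing worth checking first is where the 302 ends up. requests follows redirects by default (`allow_redirects=True`, for POST as well). When the POST does get a 302, the `Location` header is therefore stored on the hops in `res.history`, and `res.headers` belongs to the final response. If `res.history` is empty and the body is a small HTML page, the server most likely did not redirect at all. The helper below is a minimal sketch that looks in both places. `find_redirect` is a hypothetical name, and the helper relies only on the `status_code`, `headers` and `history` attributes of a requests `Response`.

```python
def find_redirect(response):
    """Return (status_code, Location) for the first 3xx hop, or None.

    requests follows redirects by default, so a 302 received for the POST
    is kept in response.history and response.headers describes the final
    response instead.
    """
    # Redirect hops that requests followed automatically, oldest first
    for hop in response.history:
        if 300 <= hop.status_code < 400:
            return hop.status_code, hop.headers.get("Location")
    # The response itself is a 3xx when allow_redirects=False was passed
    if 300 <= response.status_code < 400:
        return response.status_code, response.headers.get("Location")
    return None
```

Usage would be `res = requests.post(url, data=params, headers=headers)` followed by `find_redirect(res)`. Passing `allow_redirects=False` to `requests.post` instead returns the 302 itself, with `Location` directly in `res.headers`.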

I think my issue has something to do with the fact that when you click "Download" on the page, the browser actually sends two requests, which I can see in the Chrome network console. The first is a POST request that gets an HTTP 302 response with the Location in its response headers. The second is a GET request to the URL given in that Location header.
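You do not have to send the two requests by hand. requests already issues the follow-up GET to the `Location` URL and, like a browser, switches the method from POST to GET on a 302. Doing the two steps explicitly can still be useful because it makes each hop visible. The sketch below assumes `session` is a `requests.Session`, or any object with the same `post`/`get` methods. `download_via_redirect` is a hypothetical helper, not part of requests.

```python
from urllib.parse import urljoin


def download_via_redirect(session, url, form_data, headers=None):
    """POST the form without following redirects, then GET the Location URL.

    Returns the body of the second response (the .zip bytes on success).
    """
    # Step 1: the form POST; stop at the 302 instead of following it
    first = session.post(url, data=form_data, headers=headers, allow_redirects=False)
    if not 300 <= first.status_code < 400 or "Location" not in first.headers:
        raise RuntimeError(f"expected a redirect, got HTTP {first.status_code}")
    # Location may be relative, so resolve it against the POST URL
    target = urljoin(url, first.headers["Location"])
    # Step 2: the GET a browser sends automatically; with a requests.Session,
    # cookies and session-level headers carry over to this request
    second = session.get(target)
    second.raise_for_status()
    return second.content
```

For example, `download_via_redirect(requests.Session(), url, form_data, headers)` returns the bytes served at the redirect target.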

Should I really be sending two requests here? Why am I not getting the same response headers with requests as I do in the browser? FWIW, I used curl -X POST -d /*my data*/ and got this back in my terminal:

<head><title>Object moved</title></head>
<body><h1>Object Moved</h1>This object may be found <a HREF="http://tsdata.bts.gov/103714760_T_T100_SEGMENT_ALL_CARRIER.zip">here</a>.</body>
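The body curl printed is the usual IIS "Object moved" page, which repeats the redirect target as a link. curl shows this body because, unlike a browser or requests, it does not follow redirects by default. When only the body is available, the link can be extracted with the standard library's `html.parser`. This is a sketch: `object_moved_target` is a hypothetical helper, and reading the `Location` header is normally the more reliable source.

```python
from html.parser import HTMLParser


class _FirstHref(HTMLParser):
    """Collects the href of the first <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so HREF arrives as href
        if tag == "a" and self.href is None:
            self.href = dict(attrs).get("href")


def object_moved_target(html_text):
    """Return the link from an 'Object moved' page, or None if there is none."""
    parser = _FirstHref()
    parser.feed(html_text)
    parser.close()
    return parser.href
```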

Really appreciate any help!

Solution

I was able to download the zip file that I was looking for by using almost all of the headers that I could see in the Google Chrome web console. My headers looked like this:

headers = {
    'Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
    'Referer': 'http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=293',
    'Origin': 'http://www.transtats.bts.gov',
    'Upgrade-Insecure-Requests': '1',  # header values must be strings; requests rejects an int here
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36',
    'Cookie': 'ASPSESSIONIDQADBBRTA=CMKGLHMDDJIECMNGLMDPOKHC',
    'Accept-Language': 'en-US,en;q=0.8',
    'Accept-Encoding': 'gzip, deflate',
    'Content-Type': 'application/x-www-form-urlencoded',
}
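The answer does not say which of these headers the server actually needed. One fragile part is the hard-coded `Cookie`. `ASPSESSIONIDQADBBRTA` is a classic ASP session cookie tied to one browser session, so the copied value will stop working once that session expires. The minimal sketch below lets a `requests.Session` collect the cookie by itself, assuming the server issues it when the form page is loaded. `post_form_like_browser` and its parameters are hypothetical names. `post_url` stands for the form's POST target, which the question does not show.

```python
from urllib.parse import urlsplit


def post_form_like_browser(session, form_page_url, post_url, form_data, user_agent):
    """Load the form page first, then POST the form with browser-like headers.

    `session` is assumed to be a requests.Session (or anything with the same
    get/post methods); a Session keeps cookies set by the first response and
    sends them with the second request.
    """
    # Step 1: GET the form page so the server can issue its session cookie
    session.get(form_page_url).raise_for_status()

    parts = urlsplit(form_page_url)
    headers = {
        "Referer": form_page_url,
        "Origin": f"{parts.scheme}://{parts.netloc}",
        "User-Agent": user_agent,
        "Upgrade-Insecure-Requests": "1",  # str, not int
    }
    # Step 2: submit the form; requests form-encodes a dict passed as data=
    return session.post(post_url, data=form_data, headers=headers)
```

Usage would look like `post_form_like_browser(requests.Session(), 'http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=293', post_url, form_data, user_agent)`.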

And then I just wrote:

res = requests.post(url, data=form_data, headers=headers)

where form_data was copied from the "Form Data" section of the Chrome console. Once I had the response, I used the zipfile and io modules to unpack the content stored in res, like this:

import zipfile, io
# extractall() with no path argument writes the archive's members to the current working directory
zipfile.ZipFile(io.BytesIO(res.content)).extractall()

and then the file was in the directory where I ran the Python code.
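Sometimes the server answers with an HTML page instead of the archive, as happened in the question. In that case `zipfile.ZipFile` raises `zipfile.BadZipFile`, and that error says little about what went wrong. The sketch below checks the bytes first, then extracts the members into a chosen directory. `extract_zip_bytes` and its `dest` parameter are hypothetical names.

```python
import io
import zipfile


def extract_zip_bytes(content, dest="."):
    """Extract a ZIP archive held in memory (such as res.content) into dest.

    Returns the member names that were extracted.
    """
    buffer = io.BytesIO(content)
    # Guard against an HTML error page or "Object moved" body
    if not zipfile.is_zipfile(buffer):
        raise ValueError(f"response body is not a ZIP archive: {content[:60]!r}")
    buffer.seek(0)  # is_zipfile may have moved the stream position
    with zipfile.ZipFile(buffer) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

With the answer's variables, `extract_zip_bytes(res.content)` extracts into the current working directory and returns the extracted file names.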

Thanks to the users who answered on this thread.
