Python Requests Not Returning Same Header as Browser Request/cURL

Views: 187 | Published: 2017/3/6 4:59:19 | Tags: python, google-chrome, curl, http-headers, python-requests

Problem description

I want to write a script that automatically downloads .zip files from the Bureau of Transportation Statistics Carrier Website, but I can't get the same response headers that I see in Chrome when I download the zip file. I'm trying to get response headers that look like this:

HTTP/1.1 302 Object moved
Cache-Control: private
Content-Length: 183
Content-Type: text/html
Location: http://tsdata.bts.gov/103627300_T_T100_SEGMENT_ALL_CARRIER.zip
Server: Microsoft-IIS/8.5
X-Powered-By: ASP.NET
Date: Thu, 21 Apr 2016 15:56:31 GMT

However, when calling requests.post(url, data=params, headers=headers) with the same information that I can see in the Chrome network inspector I am getting the following response:

>>> res.headers
{'Cache-Control': 'private', 'Content-Length': '262', 'Content-Type': 'text/html', 'X-Powered-By': 'ASP.NET', 'Date': 'Thu, 21 Apr 2016 20:16:26 GMT', 'Server': 'Microsoft-IIS/8.5'}

It has almost everything, but it is missing the Location key that I need in order to download the .zip file with all of the data I want. The Content-Length value is also different, but I'm not sure whether that's an issue.

I think my issue has to do with the fact that clicking "Download" on the page actually sends two requests, both of which I can see in the Chrome network console. The first is a POST request that returns an HTTP 302 response with a Location header. The second is a GET request to the URL given in that Location header.
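For reference, requests follows redirects by default, so `res.headers` normally belongs to the last response in the chain, and any 302 ends up in `res.history`. That may not be what happened here (the response above is still text/html rather than a zip), but it is worth ruling out. The sketch below shows both ways of reaching the Location header. It talks to a throwaway local server instead of the real site; `DemoHandler`, the `/download` path, and `/data.zip` are made up for illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the site: POST answers 302, GET serves a file."""

    def do_POST(self):
        # Read and discard the form body before answering
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self.send_response(302, "Object moved")
        self.send_header("Location", "/data.zip")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        body = b"zip bytes would go here"
        self.send_response(200)
        self.send_header("Content-Type", "application/zip")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


# Port 0 lets the OS pick a free port
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/download" % server.server_port

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for the local demo

# Default behaviour: the 302 is followed, so `followed` is the final GET response
followed = session.post(url, data={"field": "value"})
first_hop = followed.history[0]  # the intermediate 302 response
print(followed.status_code, first_hop.status_code, first_hop.headers["Location"])

# allow_redirects=False returns the 302 itself, Location header included
raw = session.post(url, data={"field": "value"}, allow_redirects=False)
print(raw.status_code, raw.headers["Location"])

server.shutdown()
server.server_close()
```

If `res.history` is empty and the status code is 200, the server never issued the redirect, which points at the request itself (headers, cookies, or form data) rather than at redirect handling.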

Should I really be sending two requests here? Why am I not getting the same response headers using requests as I do in the browser? FWIW I used curl -X POST -d /*my data*/ and got back this in my terminal:

<head><title>Object moved</title></head>
<body><h1>Object Moved</h1>This object may be found <a HREF="http://tsdata.bts.gov/103714760_T_T100_SEGMENT_ALL_CARRIER.zip">here</a>.</body>

Really appreciate any help!

Solution

I was able to download the zip file that I was looking for by using almost all of the headers that I could see in the Google Chrome web console. My headers looked like this:

headers = {
    'Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
    'Referer': 'http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=293',
    'Origin': 'http://www.transtats.bts.gov',
    'Upgrade-Insecure-Requests': '1',  # header values must be strings, not ints
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36',
    'Cookie': 'ASPSESSIONIDQADBBRTA=CMKGLHMDDJIECMNGLMDPOKHC',
    'Accept-Language': 'en-US,en;q=0.8',
    'Accept-Encoding': 'gzip, deflate',
    'Content-Type': 'application/x-www-form-urlencoded',
}
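To see how far a request differs from what the browser sends, it can help to inspect it before it goes out. The sketch below prepares a request without sending it and prints the headers requests would use, first with its defaults and then with browser-like headers. The form field is a hypothetical placeholder, not one of the site's real fields.

```python
import requests

url = "http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=293"
form_data = {"example_field": "example_value"}  # hypothetical placeholder

browser_headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
    ),
    "Referer": url,
}

session = requests.Session()

# Nothing is sent here: prepare_request only builds the outgoing request
default_req = session.prepare_request(
    requests.Request("POST", url, data=form_data)
)
custom_req = session.prepare_request(
    requests.Request("POST", url, data=form_data, headers=browser_headers)
)

print("Default headers:")
for name, value in default_req.headers.items():
    print("  %s: %s" % (name, value))

print("Browser-like headers:")
for name, value in custom_req.headers.items():
    print("  %s: %s" % (name, value))

# The encoded form body that would be posted
print(custom_req.body)
```

By default the User-Agent identifies itself as python-requests, and no Referer, Origin, or Cookie header is sent, so a server that checks any of those can answer differently than it does for Chrome.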

And then I just wrote:

res = requests.post(url, data=form_data, headers=headers)

where form_data was copied from the "Form Data" section of the Chrome console. Once I got the response, I used the zipfile and io modules to parse its content, which is stored in res.content. Like this:

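import io
import zipfile

# Wrap the downloaded bytes in a file-like object, then unpack into the current directory
zipfile.ZipFile(io.BytesIO(res.content)).extractall()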

and then the file was in the directory where I ran the Python code.
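The in-memory extraction step can be tried without touching the network. In this minimal sketch, a small archive built on the spot stands in for `res.content`; the helper name `extract_zip_bytes` and the sample file are made up for illustration.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def extract_zip_bytes(payload, dest):
    """Extract a zip archive held in memory into dest and return its member names."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Build a tiny archive in memory to stand in for res.content
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sample.csv", "year,passengers\n2016,100\n")

with tempfile.TemporaryDirectory() as tmp:
    names = extract_zip_bytes(buffer.getvalue(), Path(tmp))
    extracted_text = (Path(tmp) / "sample.csv").read_text()

print(names)
print(extracted_text)
```

If the server returns an HTML page instead of the archive, `zipfile.ZipFile` raises `zipfile.BadZipFile`, which is a quick way to notice that the download did not succeed.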

Thanks to the users who answered on this thread.
