Python Requests Not Returning Same Header as Browser Request/cURL
Problem description
I'm looking to write a script that can automatically download .zip files from the Bureau of Transportation Statistics Carrier Website, but I'm having trouble getting the same response headers as I can see in Chrome when I download the zip file. I'm looking to get a response header that looks like this:
HTTP/1.1 302 Object moved
Cache-Control: private
Content-Length: 183
Content-Type: text/html
Location: http://tsdata.bts.gov/103627300_T_T100_SEGMENT_ALL_CARRIER.zip
Server: Microsoft-IIS/8.5
X-Powered-By: ASP.NET
Date: Thu, 21 Apr 2016 15:56:31 GMT
However, when calling requests.post(url, data=params, headers=headers) with the same information that I can see in the Chrome network inspector, I am getting the following response:
>>> res.headers
{'Cache-Control': 'private', 'Content-Length': '262', 'Content-Type': 'text/html', 'X-Powered-By': 'ASP.NET', 'Date': 'Thu, 21 Apr 2016 20:16:26 GMT', 'Server': 'Microsoft-IIS/8.5'}
It's got pretty much everything except it's missing the Location key that I need in order to download the .zip file with all of the data I want. Also the Content-Length value is different, but I'm not sure if that's an issue.
I think that my issue has something to do with the fact that when you click "Download" on the page, it actually sends two requests that I can see in the Chrome network console. The first request is a POST request that yields an HTTP response of 302 and then has the Location in the response header. The second request is a GET request to the URL specified in the Location value of the response header.
Should I really be sending two requests here? Why am I not getting the same response headers using requests as I do in the browser? FWIW I used curl -X POST -d /*my data*/ and got back this in my terminal:
<head><title>Object moved</title></head> <body><h1>Object Moved</h1>This object may be found <a HREF="http://tsdata.bts.gov/103714760_T_T100_SEGMENT_ALL_CARRIER.zip">here</a>.</body>
Really appreciate any help!
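On the two-request question: the browser does send two requests, because the 302 response only carries a Location header and the zip itself is fetched by a follow-up GET. requests follows redirects by default (allow_redirects=True), turning the follow-up of a 302 to a POST into a GET and keeping the intermediate 302 in res.history, so res.headers describes the final response rather than the redirect. curl, by contrast, does not follow redirects unless given -L, which is why the curl call above printed the "Object moved" page. The sketch below performs both steps by hand with the standard library's http.client, which never follows redirects on its own. It runs against a hypothetical local stand-in server; the paths /download and /data.zip, the form field, and the payload are invented for illustration and are not the real transtats.bts.gov endpoints.

```python
import http.client
import threading
import urllib.parse
from http.server import BaseHTTPRequestHandler, HTTPServer

PAYLOAD = b"fake zip contents"  # stands in for the real .zip bytes


class StandInHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: POST -> 302 with Location, GET on that path -> file."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the submitted form body
        self.send_response(302, "Object moved")
        self.send_header("Location", "/data.zip")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        if self.path != "/data.zip":
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/zip")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # silence per-request logging


def download_via_redirect(host, port, form):
    """Request 1: POST the form and read Location. Request 2: GET that URL."""
    conn = http.client.HTTPConnection(host, port)
    conn.request("POST", "/download", body=urllib.parse.urlencode(form),
                 headers={"Content-Type": "application/x-www-form-urlencoded"})
    first = conn.getresponse()
    first.read()  # read the small redirect body before moving on
    location = first.getheader("Location")
    conn.close()
    if first.status != 302 or location is None:
        raise RuntimeError(f"expected 302 with Location, got {first.status}")

    # On the real site Location names another host (tsdata.bts.gov), so connect to
    # whatever host it names; a relative Location falls back to the original host.
    target = urllib.parse.urlsplit(location)
    conn2 = http.client.HTTPConnection(target.netloc or f"{host}:{port}")
    path = target.path or "/"
    if target.query:
        path += "?" + target.query
    conn2.request("GET", path)
    second = conn2.getresponse()
    data = second.read()
    conn2.close()
    return first.status, location, second.status, data


server = HTTPServer(("127.0.0.1", 0), StandInHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address
status, location, final_status, payload = download_via_redirect(host, port, {"field": "value"})
server.shutdown()
server.server_close()
print(status, location, final_status, payload)
```

With requests, the same inspection of the first response would be res = requests.post(url, data=form_data, headers=headers, allow_redirects=False), followed by res.status_code and res.headers.get('Location').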
Solution
I was able to download the zip file that I was looking for by using almost all of the headers that I could see in the Google Chrome web console. My headers looked like this:
{'Connection': 'keep-alive', 'Cache-Control': 'max-age=0', 'Referer': 'http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=293', 'Origin': 'http://www.transtats.bts.gov', 'Upgrade-Insecure-Requests': '1', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8', 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36', 'Cookie': 'ASPSESSIONIDQADBBRTA=CMKGLHMDDJIECMNGLMDPOKHC', 'Accept-Language': 'en-US,en;q=0.8', 'Accept-Encoding': 'gzip, deflate', 'Content-Type': 'application/x-www-form-urlencoded'}
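The Cookie value in these headers is a session cookie copied from one browser session, so a hard-coded copy will probably stop working once that session expires. A sketch of one alternative, assuming (not verified for transtats.bts.gov) that the server hands out the session cookie when the form page is first requested: fetch the page once, let a cookie jar store whatever the server sets, then POST the form through the same client. It uses the standard library's urllib.request with http.cookiejar against a hypothetical local stand-in server; the paths /form and /download, the cookie name SESSIONID, and the form field are invented. In requests, a requests.Session() keeps cookies across calls in the same way.

```python
import http.cookiejar
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieStandIn(BaseHTTPRequestHandler):
    """Hypothetical server: any GET sets a session cookie; POST reports whether it came back."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Set-Cookie", "SESSIONID=abc123; Path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode())
        has_cookie = "SESSIONID=abc123" in (self.headers.get("Cookie") or "")
        body = f"cookie={'yes' if has_cookie else 'no'} field={form.get('field', [''])[0]}".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


server = HTTPServer(("127.0.0.1", 0), CookieStandIn)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"

# The jar records Set-Cookie headers and sends matching cookies on later requests.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
# Browser-like headers as plain strings, with no hard-coded Cookie header.
opener.addheaders = [("User-Agent", "Mozilla/5.0"), ("Referer", base + "/form")]

with opener.open(base + "/form") as resp:  # request 1: GET the form page
    resp.read()
form_data = urllib.parse.urlencode({"field": "value"}).encode()
with opener.open(base + "/download", data=form_data) as resp:  # request 2: POST the form
    result = resp.read().decode()

server.shutdown()
server.server_close()
print(result)
```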
And then I just wrote:
res = requests.post(url, data=form_data, headers=headers)
where form_data was copied from the "Form Data" section of the Chrome console. Once I got that response, I used the zipfile and io modules to parse the content of the response stored in res, like this:
import zipfile, io
zipfile.ZipFile(io.BytesIO(res.content)).extractall()
and then the file was in the directory where I ran the Python code.
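On the last step: zipfile.ZipFile(io.BytesIO(res.content)) on its own only opens the archive in memory; nothing is written to disk until members are extracted, for example with extractall(). A minimal sketch, assuming res.content holds the zip bytes; here a small archive built in memory stands in for the download, and the member name example.csv is hypothetical.

```python
import io
import os
import tempfile
import zipfile


def extract_zip_bytes(content, dest_dir):
    """Open a zip held in memory (e.g. res.content) and extract every member to dest_dir."""
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        names = archive.namelist()
        archive.extractall(dest_dir)
    return names


# Stand-in for res.content: build a small archive in memory (hypothetical member name).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("example.csv", "YEAR,MONTH\n2016,1\n")
fake_content = buf.getvalue()

with tempfile.TemporaryDirectory() as out_dir:
    extracted = extract_zip_bytes(fake_content, out_dir)
    with open(os.path.join(out_dir, extracted[0])) as f:
        first_line = f.readline().strip()

print(extracted, first_line)
```

If the CSV is only needed in memory, archive.read(name) returns a member's bytes without writing anything to disk.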
Thanks to the users who answered on this thread.