urllib2 python (Transfer-Encoding: chunked)


Question

I used the following Python code to download an HTML page:

import urllib2

# current_URL holds the address of the page to download
response = urllib2.urlopen(current_URL)
msg = response.read()
print msg

For a page such as this one, it opens the URL without error but then prints only part of the HTML page!

The HTTP headers of the page are listed below. I think the problem is due to "Transfer-Encoding: chunked".

It seems urllib2 returns only the first chunk, and I am having difficulty reading the rest. How can I read the remaining chunks?

Server: nginx/1.0.5
Date: Wed, 27 Feb 2013 14:41:28 GMT
Content-Type: text/html;charset=UTF-8
Transfer-Encoding: chunked
Connection: close
Set-Cookie: route=c65b16937621878dd49065d7d58047b2; Path=/
Set-Cookie: JSESSIONID=EE18E813EE464664EA64086D5AE9A290.tpdjo13v_3; Path=/
Pragma: No-cache
Cache-Control: no-cache,no-store,max-age=0
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Vary: Accept-Encoding
Content-Language: fr
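
As background for the "Transfer-Encoding: chunked" header above: a chunked body is sent as a series of chunks, each preceded by its size in hexadecimal, and a complete body ends with a zero-length chunk. The standard library's HTTP layer normally performs this decoding itself, so the sketch below is only an illustration of the framing; the function name `decode_chunked` and the sample bytes are made up for this example and do not come from the page in question.

```python
def decode_chunked(raw):
    """Decode a chunked HTTP body given as bytes.

    Returns (body, complete). complete is True only if the
    terminating zero-length chunk was seen.
    """
    body = b""
    pos = 0
    while True:
        line_end = raw.find(b"\r\n", pos)
        if line_end == -1:
            return body, False  # no size line: the stream was cut short
        # The chunk size is hexadecimal; drop any ";extension" suffix.
        size = int(raw[pos:line_end].split(b";")[0], 16)
        if size == 0:
            return body, True  # last-chunk marker reached
        start = line_end + 2
        chunk = raw[start:start + size]
        body += chunk
        if len(chunk) < size:
            return body, False  # the chunk data itself was truncated
        pos = start + size + 2  # skip the CRLF that follows the data


# Hypothetical sample data: one complete body and one truncated body.
full_body, full_ok = decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")
cut_body, cut_ok = decode_chunked(b"4\r\nWiki\r\n5\r\nped")
```

If the connection closes before the zero-length chunk arrives, the client holds only a partial body, which would look like the truncated page described in the question.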

Recommended answer

I've found that if the Accept-Language header is specified, the server doesn't drop the TCP connection; otherwise it does.

curl -H "Accept-Language:uk,en-US;q=0.8,en;q=0.6,ru;q=0.4" -v 'http://www.legifrance.gouv.fr/affichJuriJudi.do?oldAction=rechJuriJudi&idTexte=JURITEXT000024053954&fastReqId=660326373&fastPos=1'
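
The same header can be sent from Python. The following is a minimal sketch rather than a verified fix: the helper names `build_request` and `fetch` are hypothetical, the header value is copied from the curl command above, and the import fallback is only there so the snippet also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2 as urlrequest         # Python 2, as in the question
except ImportError:
    import urllib.request as urlrequest  # Python 3 equivalent

# Header value copied from the curl command above.
ACCEPT_LANGUAGE = "uk,en-US;q=0.8,en;q=0.6,ru;q=0.4"


def build_request(url, accept_language=ACCEPT_LANGUAGE):
    """Return a Request that carries an Accept-Language header."""
    return urlrequest.Request(url, headers={"Accept-Language": accept_language})


def fetch(url):
    """Download the page and return the raw body bytes."""
    response = urlrequest.urlopen(build_request(url))
    try:
        return response.read()
    finally:
        response.close()
```

With these helpers, the download in the question would become `msg = fetch(current_URL)`. Whether the server still behaves this way has not been checked here.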
