appengine, urlfetch, and the content-length header

Question

I have a Google Appengine app requesting pages from another server using urllib2 POSTs. I recently enabled gzip compression on the other server, which runs Apache2, and the Appengine page requests started failing with a KeyError, indicating that 'content-length' is not in the headers.
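
The question does not show the calling code, but a KeyError like this is what you get when the header mapping is indexed directly with `headers['content-length']`. A minimal Python 3 sketch of a more defensive lookup follows; it assumes the headers are available as a plain dict (the real urllib2 response exposes them through its `info()` object), and the helper names `get_header` and `body_length` are hypothetical.

```python
def get_header(headers, name):
    """Case-insensitive header lookup; returns None when the header is absent."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


def body_length(headers, body):
    """Prefer the Content-Length header, but fall back to the actual body size.

    Indexing headers['content-length'] directly raises KeyError when the
    header is missing, which is the failure described in the question.
    """
    declared = get_header(headers, "Content-Length")
    if declared is not None:
        return int(declared)
    return len(body)


# Hypothetical responses, for illustration only.
with_length = {"Content-Type": "text/html", "Content-Length": "5"}
without_length = {"Content-Type": "text/html"}

print(body_length(with_length, b"hello"))     # 5
print(body_length(without_length, b"hello"))  # 5
```

Falling back to `len(body)` works because, once the response has been fully read, the body itself is the ground truth for its size.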

I am not explicitly declaring gzip as an accepted encoding in my requests from Appengine, but it is possible Appengine is adding that header. Googling has not turned up any clear indication that Appengine's urlfetch implicitly adds a header to accept gzip encoding.

Apache2, if I recall correctly, omits content-length headers when the response is compressed, but that should not affect non-compressed responses from the same server.
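
One reason a compressing server would leave the header out: the compressed size is only known once the whole body has been compressed, so a server that streams compressed output generally cannot announce a Content-Length up front. The Python 3 sketch below illustrates that with `zlib`; it is not Apache's mod_deflate code, and the generator name `stream_gzip` is hypothetical.

```python
import gzip
import zlib


def stream_gzip(chunks):
    """Compress an iterable of byte chunks incrementally, with gzip framing."""
    # wbits = 16 + MAX_WBITS makes zlib emit a gzip header and trailer.
    compressor = zlib.compressobj(9, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:
            yield out
    # The final bytes, and therefore the total compressed size,
    # only exist after flush() has been called.
    yield compressor.flush()


page = [b"<p>row %d</p>\n" % i for i in range(1000)]
compressed = b"".join(stream_gzip(page))

# The streamed output is a valid gzip document for the whole page.
assert gzip.decompress(compressed) == b"".join(page)
print(len(b"".join(page)), len(compressed))
```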

Does anybody have any insight into what is happening, and why the content-length header is being omitted?

Answer

According to this thread on an Appengine Java newsgroup: http://groups.google.com/group/google-appengine-java/browse%5Fthread/thread/5c5f2a7e2d2beadc?pli=1 Google does generally set the 'Accept-Encoding: gzip' header on urlfetch requests, and then decompresses (ungzips) the response before handing the data to the script.

So apparently Appengine implicitly adds an 'Accept-Encoding: gzip' header to outbound requests and decompresses the response, but does not insert a content-length header giving the decompressed size. So if the outside server serves gzipped responses, the net result seen by the Appengine script (after all the pre- and post-processing by Appengine described above) is that the content-length header is missing.
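
The chain of events described above can be modelled in a few lines. This Python 3 sketch is an illustrative simulation, not App Engine's urlfetch implementation: the function name `simulate_fetch_postprocessing` is hypothetical, and it only reproduces the two behaviours reported in the thread (the body is gunzipped, and no content-length is added). Whether urlfetch also rewrites the Content-Encoding header is not stated in the thread, so the model leaves it alone.

```python
import gzip


def simulate_fetch_postprocessing(upstream_headers, raw_body):
    """Model of the reported behaviour: gunzip the body, add no Content-Length."""
    headers = dict(upstream_headers)
    if headers.get("Content-Encoding") == "gzip":
        body = gzip.decompress(raw_body)
        # Any upstream Content-Length described the compressed bytes, so it no
        # longer applies; per the thread, no replacement value is inserted.
        headers.pop("Content-Length", None)
    else:
        body = raw_body
    return headers, body


original = b"<html>" + b"x" * 500 + b"</html>"
# A compressed Apache response typically arrives without Content-Length.
upstream = {"Content-Type": "text/html", "Content-Encoding": "gzip"}
headers, body = simulate_fetch_postprocessing(upstream, gzip.compress(original))

print("Content-Length" in headers)  # False
print(len(body) == len(original))   # True
```

Under this model the script can still recover the decompressed size with `len(body)`, which is why code that reads the length from the body rather than from the header is unaffected.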