BadStatusLine exception raised when returning reply from server in Python 3

Problem description

I am trying to port a script to python 3 that submits XML feeds found here:

https://developers.google.com/search-appliance/documentation/files/pushfeed_client.py.txt

After running 2to3.py and making a few minor adjustments to remove syntax errors, the script fails with this:

(py33dev) d:\dev\workspace>python pushfeed_client.py --datasource="TEST1" --feedtype="full" --url="http://gsa:19900/xmlfeed" --xmlfilename="test.xml"
Traceback (most recent call last):
  File "pushfeed_client.py", line 108, in <module>
    main(sys.argv)
  File "pushfeed_client.py", line 56, in main
    result = urllib.request.urlopen(request_url)
  File "C:\Python33\Lib\urllib\request.py", line 156, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python33\Lib\urllib\request.py", line 469, in open
    response = self._open(req, data)
  File "C:\Python33\Lib\urllib\request.py", line 487, in _open
    '_open', req)
  File "C:\Python33\Lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\Lib\urllib\request.py", line 1268, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Python33\Lib\urllib\request.py", line 1253, in do_open
    r = h.getresponse()
  File "C:\Python33\Lib\http\client.py", line 1147, in getresponse
    response.begin()
  File "C:\Python33\Lib\http\client.py", line 358, in begin
    version, status, reason = self._read_status()
  File "C:\Python33\Lib\http\client.py", line 340, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: <!DOCTYPE html>

Why is it returning that exception with the response from the server? Here's the full response from the GSA when I sniffed the session:

<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 400 (Bad Request)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}
  </style>
  <a href=//www.google.com/><img src=//www.google.com/images/errors/logo_sm.gif alt=Google></a>
  <p><b>400.</b> <ins>That’s an error.</ins>
  <p>Your client has issued a malformed or illegal request.  <ins>That’s all we know.</ins>

And it did return an HTTP 400. I can reliably cause this issue whenever the XML payload contains a non-ASCII character (one that takes more than one byte in UTF-8). It works flawlessly when the payload is plain ASCII. Here's the most basic version of the code I can use to reliably reproduce the issue:

import http.client
http.client.HTTPConnection.debuglevel = 1
with open("GSA_full_Feed.xml", encoding='utf-8') as xdata:
    payload = xdata.read()
content_length = len(payload)
feed_path = "xmlfeed"
content_type = "multipart/form-data; boundary=----------boundary_of_feed_data$"
headers = {"Content-type": content_type, "Content-length": content_length}
conn = http.client.HTTPConnection("gsa", 19900)
conn.request("POST", feed_path, body=payload.encode("utf-8"), headers=headers)
res = conn.getresponse()
print(res.read())
conn.close()

And here's a sample XML payload that is used to cause the exception:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "gsafeed.dtd">
<gsafeed>
  <header>
    <datasource>TEST1</datasource>
    <feedtype>full</feedtype>
  </header>
  <group>
    <record action="add" mimetype="text/html" url="https://myschweetassurl.com">
      <metadata>
        <meta content="shit happens, then you die" name="description"/>
      </metadata>
      <content>wacky Umläut test of non utf-8 characters</content>
    </record>
  </group>
</gsafeed>

The only difference I can find between the Python 2 and Python 3 versions is the Content-Length header on each request. The Python 3 value is consistently smaller than the Python 2 value: 870 vs. 873.

Recommended answer

After lots of wiresharking, I figured out that both the cause of the problem and its solution lie in the way the Content-Length header was being set. In my Python 3 port of the script, I copied over the code that set the Content-Length, which is this:

headers['Content-length']=str(len(body))

That is incorrect! The correct way would be this:

headers['Content-length']=str(len(bytes(body, 'utf-8')))

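This is because the payload must be a bytes object, and once the string is encoded to bytes its length can differ from the length of the string: len() of a str counts characters, while Content-Length must count bytes. The request is then built with the encoded body: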

return urllib.request.Request(theurl, bytes(body, 'utf-8'), headers)
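As a minimal sketch of the length mismatch described above, the snippet below compares the character count and the UTF-8 byte count of the content string from the sample feed. The helper name content_length_for is hypothetical and only serves the illustration; it is not part of the original script.

```python
def content_length_for(body: str, encoding: str = "utf-8") -> str:
    """Return a Content-Length value computed from the encoded bytes.

    Hypothetical helper for illustration; not part of pushfeed_client.py.
    """
    return str(len(body.encode(encoding)))


# The content string from the sample feed contains one non-ASCII character.
sample = "wacky Umläut test of non utf-8 characters"

char_count = len(sample)                  # counts characters (code points)
byte_count = len(sample.encode("utf-8"))  # counts bytes actually sent

# "ä" takes two bytes in UTF-8, so the byte count is one larger here.
print("characters:", char_count)
print("bytes:     ", byte_count)
print("header:    ", content_length_for(sample))
```

A header built from the character count under-reports the body size by one byte for every two-byte character (more for characters that need three or four bytes), so the server sees a body that is longer than the header announced.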

You can safely omit manually setting the Content-Length header when using anything built on http.client.HTTPConnection. It checks internally for a Content-Length header and, if one is missing, sets it from the length of the request body; for a bytes body that is the byte length, which is the value the server expects.
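One way to check this behaviour locally, sketched under some assumptions: the snippet starts a throwaway HTTP server on the loopback interface that echoes back whatever Content-Length it received, then posts a bytes body without setting the header. EchoLengthHandler and post_without_content_length are hypothetical names for this illustration, and the /xmlfeed path merely mirrors the question; no GSA is involved.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoLengthHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler: replies with the Content-Length it received."""

    def do_POST(self):
        received = self.headers.get("Content-Length", "0")
        self.rfile.read(int(received))  # drain the request body
        reply = received.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # keep the example output quiet


def post_without_content_length(body_text: str) -> int:
    """POST body_text as UTF-8 bytes and return the length the server saw."""
    server = HTTPServer(("127.0.0.1", 0), EchoLengthHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
        # No Content-Length header is passed; http.client fills it in.
        conn.request("POST", "/xmlfeed", body=body_text.encode("utf-8"))
        reported = int(conn.getresponse().read())
        conn.close()
        return reported
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    text = "Umläut"
    print("characters:", len(text))
    print("Content-Length seen by server:", post_without_content_length(text))
```

The value reported by the server should match the byte length of the encoded body rather than the character count of the original string.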

The issue came down to a subtle difference between Python 2 and Python 3 in how strings are handled and encoded. At first it looked like a fluke that the plain ASCII version worked when the UTF-8 version didn't, but it follows from the same cause: every ASCII character encodes to exactly one byte in UTF-8, so for ASCII-only payloads the character count and the byte count are identical.
