Upload a large XML file with Python Requests library


Question

I'm trying to replace curl with Python and the requests library. With curl, I can upload a single XML file to a REST server using the -T option. I have been unable to do the same with the requests library.

A basic scenario works:

import requests
from requests.auth import HTTPDigestAuth

payload = '<person test="10"><first>Carl</first><last>Sagan</last></person>'
headers = {'content-type': 'application/xml'}
r = requests.put(url, data=payload, headers=headers, auth=HTTPDigestAuth("*", "*"))

When I change payload to a bigger string by reading an XML file, the .put method hangs (I use the codecs library to get a proper unicode string). For example, with a 66 KB file:

xmlfile = codecs.open('trb-1996-219.xml', 'r', 'utf-8')
headers = {'content-type': 'application/xml'}
content = xmlfile.read()
r = requests.put(url, data=content, headers=headers, auth=HTTPDigestAuth("*", "*"))

I've been looking into using the multipart option (files), but the server doesn't seem to like that.

So I was wondering whether there is a way to simulate the curl -T behaviour with the Python requests library.

UPDATE 1: The program hangs in TextMate, but throws a UnicodeEncodeError on the command line. That seems to be the problem. So the question becomes: is there a way to send unicode strings to a server with the requests library?
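One way to sidestep this kind of error is to encode the text to bytes explicitly before handing it to requests, so that no implicit encoding happens later. The following is a minimal sketch under that assumption; the helper name to_request_body and the sample XML are made up for illustration and do not come from the original post.

```python
def to_request_body(text, encoding="utf-8"):
    """Encode a text (unicode) string into the bytes sent on the wire.

    Hypothetical helper, for illustration only.
    """
    return text.encode(encoding)


# Sample document containing one non-ASCII character (e with diaeresis).
xml_text = '<person><first>Zo\u00eb</first></person>'
body = to_request_body(xml_text)

# Declaring the charset tells the server how the bytes were encoded.
headers = {"content-type": "application/xml; charset=utf-8"}
```

The resulting body is a bytes object and can be passed as the data argument of requests.put in place of the unicode string.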

UPDATE 2: Thanks to Martijn Pieters' comment, the UnicodeEncodeError went away, but a new issue turned up. With a literal (ASCII) XML string, logging shows the following lines:

2012-11-11 15:55:05,154 INFO Starting new HTTP connection (1): my.ip.address
2012-11-11 15:55:05,294 DEBUG "PUT /v1/documents?uri=/example/test.xml HTTP/1.1" 401 211
2012-11-11 15:55:05,430 DEBUG "PUT /v1/documents?uri=/example/test.xml HTTP/1.1" 201 0

It seems the server always bounces the first authentication attempt (?) but then accepts the second one.
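A note on the 401 followed by 201: HTTP digest authentication is a challenge-response scheme, so an initial 401 that carries a nonce, followed by a retried request with an Authorization header, is the normal exchange rather than a server fault. The sketch below shows roughly how a client derives the response value for qop=auth as described in RFC 2617. The credentials, realm, and nonces are invented, and requests' HTTPDigestAuth does this internally, possibly with differences in detail.

```python
import hashlib


def md5_hex(text):
    """Return the hexadecimal MD5 digest of a text string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """Compute the digest 'response' field for qop=auth (RFC 2617 style)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # identifies the user
    ha2 = md5_hex(f"{method}:{uri}")                 # identifies the request
    # The nonce comes from the server's 401 challenge, which is why the
    # first request cannot already be authenticated.
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


# Illustrative values only; none of them come from the original post.
response = digest_response(
    "user", "example-realm", "secret",
    "PUT", "/v1/documents?uri=/example/test.xml",
    nonce="server-nonce", nc="00000001", cnonce="client-nonce",
)
```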

With a file object (open('trb-1996-219.xml', 'rb')) passed to data, the log file shows:

2012-11-11 15:50:54,309 INFO Starting new HTTP connection (1): my.ip.address
2012-11-11 15:50:55,105 DEBUG "PUT /v1/documents?uri=/example/test.xml HTTP/1.1" 401 211
2012-11-11 15:51:25,603 WARNING Retrying (0 attempts remain) after connection broken by 'BadStatusLine("''",)': /v1/documents?uri=/example/test.xml

So the first attempt is blocked as before, but no second attempt is made.

According to Martijn Pieters (below), the second issue can be explained by a faulty server (it returns an empty status line). I will look into this, but if someone has a workaround (apart from using curl), I wouldn't mind hearing it.

And I am still surprised that the requests library behaves so differently for a small string and a file object. Isn't the file object serialized before it gets to the server anyway?

Recommended answer

To PUT large files, don't read them into memory. Simply pass the file object as the data keyword argument:

xmlfile = open('trb-1996-219.xml', 'rb')
headers = {'content-type': 'application/xml'}
r = requests.put(url, data=xmlfile, headers=headers, auth=HTTPDigestAuth("*", "*"))

Moreover, you were opening the file as unicode (decoding it from UTF-8). Since you are sending it to a remote server, you need raw bytes, not unicode values, so open the file in binary mode instead.
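Putting both points together, a small wrapper might look like the sketch below. It opens the file in binary mode and passes the file object itself as data, so requests can stream it from disk, and it closes the file once the request has finished. The function name put_xml_file and its parameters are hypothetical, chosen only for this example.

```python
import requests
from requests.auth import HTTPDigestAuth


def put_xml_file(url, path, username, password):
    """PUT the file at `path` to `url`, streaming it from disk.

    Hypothetical helper, for illustration only.
    """
    headers = {"content-type": "application/xml"}
    # Binary mode yields raw bytes; nothing is decoded or re-encoded.
    with open(path, "rb") as xmlfile:
        # Passing the file object (not xmlfile.read()) avoids loading
        # the whole document into memory first.
        return requests.put(url, data=xmlfile, headers=headers,
                            auth=HTTPDigestAuth(username, password))
```

A call would then look like put_xml_file(url, 'trb-1996-219.xml', user, password), with url, user, and password supplied by the caller.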
