Using Python urllib2, How can I stream between a GET and a POST?


Problem description

I want to write code to transfer a file from one site to another. This can be a large file, and I'd like to do it without creating a local temporary file.

I saw the trick of using mmap to upload a large file in Python: "HTTP Post a large file with streaming", but what I really need is a way to link up the response from the GET to creating the POST.

Has anyone done this before?

Recommended answer

You can't, or at least shouldn't.

urllib2 request objects have no way to stream data into them on the fly, period. And in the other direction, response objects are file-like objects, so in theory you can read(8192) out of them instead of read(), but for most protocols—including HTTP—it will either often or always read the whole response into memory and serve your read(8192) calls out of its buffer, making it pointless. So, you have to intercept the request, steal the socket out of it, and deal with it manually, at which point urllib2 is getting in your way more than it's helping.

urllib2 makes some things easy, some things much harder than they should be, and some things next to impossible; when it isn't making things easy, stop using it.

One solution is to use a higher-level third-party library. For example, requests gets you half-way there (it makes it very easy to stream from a response, but can only stream into a response in limited situations), and requests-toolbelt gets you the rest of the way there (it adds various ways to stream-upload).
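As a sketch of that combination, here is what the streaming relay looks like with plain requests; the URLs and chunk size are placeholders, and this relies on requests sending an iterator body with chunked transfer encoding, so the destination server must accept a chunked POST:

```python
import requests

def relay(src_url, dst_url, chunk_size=8192):
    # stream=True keeps the GET body out of memory until we read it
    with requests.get(src_url, stream=True) as src:
        src.raise_for_status()
        # iter_content yields the body in pieces; passing an iterator as
        # data= makes requests upload it chunk by chunk (chunked encoding)
        return requests.post(dst_url, data=src.iter_content(chunk_size))
```

Note that neither body is ever held whole in memory: each chunk is read from the GET socket and written to the POST socket as it arrives.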

The other solution is to use a lower-level library. And here, you don't even have to leave the stdlib. httplib forces you to think in terms of sending and receiving things bit by bit, but that's exactly what you want. On the get request, you can just call connect and request, and then call read(8192) repeatedly on the response object. On the post request, you call connect, putrequest, putheader, endheaders, then repeatedly send each buffer from the get request, then getresponse when you're done.
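Those steps can be sketched with 3.x's http.client, whose method names match httplib's; the hosts and paths are placeholders, and this assumes the GET response carries a Content-Length (a chunked GET would need Transfer-Encoding: chunked on the POST instead):

```python
import http.client

def relay(get_host, get_path, post_host, post_path, chunk=8192):
    # GET side: connect, issue the request, then read the body bit by bit
    getconn = http.client.HTTPConnection(get_host)
    getconn.request('GET', get_path)
    getresp = getconn.getresponse()

    # POST side: build the request by hand so the body can be fed in pieces
    postconn = http.client.HTTPConnection(post_host)
    postconn.putrequest('POST', post_path)
    # Reuse the length the GET reported (assumed present here)
    postconn.putheader('Content-Length', getresp.getheader('Content-Length'))
    postconn.putheader('Content-Type', 'application/octet-stream')
    postconn.endheaders()

    # Copy one buffer at a time; neither body is ever whole in memory
    while True:
        buf = getresp.read(chunk)
        if not buf:
            break
        postconn.send(buf)
    getconn.close()
    return postconn.getresponse()
```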

In fact, in Python 3.2+'s http.client (the equivalent of 2.x's httplib), the body you pass to HTTPConnection.request doesn't have to be a string; it can be any iterable, or any file-like object with read and fileno methods… which includes a response object. So, it's this simple:

import http.client

getconn = http.client.HTTPConnection('www.example.com')
getconn.request('GET', '/spam')
getresp = getconn.getresponse()

postconn = http.client.HTTPConnection('www.example.com')
postconn.request('POST', '/eggs', body=getresp)
postresp = postconn.getresponse()

… except, of course, that you probably want to craft appropriate headers (you can actually use urllib.request, the 3.x version of urllib2, to build a Request object without sending it…), pull the host and port out of the URL with urlparse instead of hardcoding them, and exhaust or at least check the response from the POST request, and so on. But this shows the hard part, and it's not hard.
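For the urlparse part, a quick sketch (the URL itself is a placeholder):

```python
from urllib.parse import urlparse

url = 'http://www.example.com:8080/spam?x=1'
parts = urlparse(url)
host = parts.hostname     # 'www.example.com'
port = parts.port or 80   # 8080 here; fall back to 80 when absent
path = parts.path or '/'  # '/spam' -- pass this, not the full URL
```

host and port go to the HTTPConnection constructor, and path is what you hand to request or putrequest.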

Unfortunately, I don't think this works in 2.x.

Finally, if you're familiar with libcurl, there are at least three wrappers for it (including one that comes with the source distribution). I'm not sure whether to call libcurl higher-level or lower-level than urllib2, it's sort of on its own weird axis of complexity. :)
