Python - Transfer a file from HTTP(S) URL to FTP/Dropbox without disk writing (chunked upload)
Question
I have a large file (500 MB to 1 GB) stored at an HTTP(S) location (say https://example.com/largefile.zip).
I have read/write access to an FTP server.
I have normal user permissions (no sudo).
Within these constraints I want to read the file from the HTTP URL via requests and send it to the FTP server without writing to disk first.
Normally, I would do:
import requests

response = requests.get('https://example.com/largefile.zip', stream=True)
with open("largefile_local.zip", "wb") as handle:
    for data in response.iter_content(chunk_size=4096):
        handle.write(data)
and then upload the local file to FTP. But I want to avoid the disk I/O. I cannot mount the FTP server as a FUSE filesystem because I don't have superuser rights.
Ideally I would do something like ftp_file.write() instead of handle.write(). Is that possible? The ftplib documentation seems to assume only local files will be uploaded, not response.content. So ideally I would like to do:
response = requests.get('https://example.com/largefile.zip', stream=True)
for data in response.iter_content(chunk_size=4096):
    ftp_send_chunk(data)
I am not sure how to write ftp_send_chunk().
There is a similar question here (Python - Upload a in-memory file (generated by API calls) in FTP by chunks). My use case requires retrieving a chunk from the HTTP URL and writing it to FTP.
P.S.: The solution provided in the answer (a wrapper around urllib.urlopen) works with Dropbox uploads as well. I had problems with my FTP provider, so I finally used Dropbox, which works reliably.
Note that Dropbox has an "add web upload" feature in its API that does the same thing (remote upload). That only works with "direct" links. In my use case the http_url came from a streaming service that was IP-restricted, so this workaround became necessary. Here's the code:
import re
import dropbox

d = dropbox.Dropbox("<ACTION-TOKEN>")  # placeholder for your own token
f = FileWithProgress(filehandle)       # wrapper class from the answer below
filesize = filehandle.length           # bytes remaining, read before any data is consumed
targetfile = '/' + fname
CHUNK_SIZE = 4 * 1024 * 1024

# Start the session with the first chunk.
upload_session_start_result = d.files_upload_session_start(f.read(CHUNK_SIZE))
num_chunks = 1
cursor = dropbox.files.UploadSessionCursor(
    session_id=upload_session_start_result.session_id,
    offset=CHUNK_SIZE * num_chunks)
commit = dropbox.files.CommitInfo(path=targetfile)

# Note: a file no larger than CHUNK_SIZE never enters this loop,
# so its session is never finished.
while CHUNK_SIZE * num_chunks < filesize:
    if (filesize - CHUNK_SIZE * num_chunks) <= CHUNK_SIZE:
        # Last chunk: finish the session and commit the file.
        print(d.files_upload_session_finish(f.read(CHUNK_SIZE), cursor, commit))
    else:
        d.files_upload_session_append(f.read(CHUNK_SIZE), cursor.session_id, cursor.offset)
    num_chunks += 1
    cursor.offset = CHUNK_SIZE * num_chunks

link = d.sharing_create_shared_link(targetfile)
url = link.url
# Turn the preview link into a direct-download link.
dl_url = re.sub(r"\?dl\=0", "?dl=1", url)
dl_url = dl_url.strip()
print('dropbox_url: ', dl_url)
I think it should even be possible to do this with Google Drive via its Python API, but using credentials with the Python wrapper is too hard for me. Check this 1 and this 2.
Recommended answer
Use urllib.request.urlopen, as it returns a file-like object, which you can pass directly to FTP.storbinary.
import urllib.request
from ftplib import FTP

ftp = FTP(host, user, passwd)
filehandle = urllib.request.urlopen(http_url)
ftp.storbinary("STOR /ftp/path/file.dat", filehandle)
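If you would rather keep using requests as in the question, the same idea can be sketched with a small adapter that exposes an iterator of byte chunks through a read() method. The class name IterStream is hypothetical and not part of requests or ftplib; the only assumption is that the consumer calls read(blocksize) until it gets empty bytes.

```python
class IterStream:
    """Minimal file-like wrapper around an iterator of bytes chunks."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buffer = b""

    def read(self, size=-1):
        # Pull chunks until the buffer can satisfy the request
        # or the underlying iterator is exhausted.
        while size < 0 or len(self._buffer) < size:
            try:
                self._buffer += next(self._chunks)
            except StopIteration:
                break
        if size < 0:
            data, self._buffer = self._buffer, b""
        else:
            data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data


# Hypothetical usage (host, user, passwd and the paths are placeholders):
# import requests
# from ftplib import FTP
# response = requests.get("https://example.com/largefile.zip", stream=True)
# ftp = FTP(host, user, passwd)
# ftp.storbinary("STOR /ftp/path/file.dat",
#                IterStream(response.iter_content(chunk_size=4096)))
```

Only one block of data is buffered at a time, so memory use stays roughly bounded by the larger of the two chunk sizes.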
If you want to monitor progress, implement a wrapper file-like object that delegates calls to the filehandle object but also displays the progress:
class FileWithProgress:

    def __init__(self, filehandle):
        self.filehandle = filehandle
        self.p = 0

    def read(self, blocksize):
        r = self.filehandle.read(blocksize)
        self.p += len(r)
        # filehandle.length is the number of bytes still to be read,
        # so bytes read so far plus length gives the total size.
        print(str(self.p) + " of " + str(self.p + self.filehandle.length))
        return r

filehandle = urllib.request.urlopen(http_url)
ftp.storbinary("STOR /ftp/path/file.dat", FileWithProgress(filehandle))
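The wrapper above relies on the length attribute of the HTTP response. As a hedged variation, the sketch below takes the total size and a reporting callback as parameters, so it can wrap any object with a read() method. ProgressReader and copy_all are hypothetical names used only for illustration; with urlopen, the total could be taken from the Content-Length header, assuming the server sends one.

```python
import io


class ProgressReader:
    """File-like wrapper that reports (bytes_read, total) after each read."""

    def __init__(self, fileobj, total, report=print):
        self.fileobj = fileobj
        self.total = total
        self.done = 0
        self.report = report

    def read(self, blocksize=-1):
        data = self.fileobj.read(blocksize)
        self.done += len(data)
        if data:  # stay silent on the final empty read
            self.report(self.done, self.total)
        return data


def copy_all(reader, blocksize):
    """Drain reader block by block, the way an uploader would consume it."""
    chunks = []
    while True:
        block = reader.read(blocksize)
        if not block:
            break
        chunks.append(block)
    return b"".join(chunks)


# Local demonstration with an in-memory source instead of a network stream.
payload = b"x" * 10
reader = ProgressReader(io.BytesIO(payload), total=len(payload))
result = copy_all(reader, 4)
```

Passing the callback in keeps the wrapper independent of how progress is shown, whether that is print, a logger, or a progress bar.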
For Python 2 use:
- urllib.urlopen, instead of urllib.request.urlopen.
- filehandle.info().getheader('Content-Length') instead of str(self.p + filehandle.length).