Download big files via FTP with Python


Problem description




I'm trying to download a daily backup file from my server to my local storage server, but I've run into some problems.

I wrote this code (with the irrelevant parts, such as the email function, removed):

import os
from time import strftime
from ftplib import FTP
import smtplib
from email.MIMEMultipart import MIMEMultipart
from email.MIMEBase import MIMEBase
from email.MIMEText import MIMEText
from email import Encoders

day = strftime("%d")
today = strftime("%d-%m-%Y")

# ftphost, ftp_user, ftp_pass, file_path, file_name, user_mail and mail()
# are defined in the parts removed above
link = FTP(ftphost)
link.login(passwd=ftp_pass, user=ftp_user)
link.cwd(file_path)
link.retrbinary('RETR ' + file_name, open('/var/backups/backup-%s.tgz' % today, 'wb').write)
link.delete(file_name)  # delete the file from the online server
link.close()
mail(user_mail, "Download database %s" % today, "Database successfully downloaded: %s" % file_name)
exit()

And I run it with a crontab entry like:

40    23    *    *    *    python /usr/bin/backup-transfer.py >> /var/log/backup-transfer.log 2>&1

It works with small files, but with the backup files (about 1.7 GB) it freezes: the downloaded file reaches about 1.2 GB and then never grows (I waited about a day), and the log file is empty.

Any ideas?

P.S.: I'm using Python 2.6.5
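A detail that turns out to matter here: ftplib sets no socket timeout by default, so a stalled data connection makes retrbinary() block forever without raising anything, which would explain the empty log. A minimal sketch of the idea, reusing the names from the script above (the accepted solution below relies on the same timeout argument):

import socket
from ftplib import FTP

# A finite timeout makes a stalled transfer raise socket.timeout
# instead of hanging forever; 60 seconds is an arbitrary choice.
link = FTP(ftphost, timeout=60)
link.login(passwd=ftp_pass, user=ftp_user)
link.cwd(file_path)
try:
    link.retrbinary('RETR ' + file_name,
                    open('/var/backups/backup-%s.tgz' % today, 'wb').write)
except socket.timeout:
    print "transfer stalled and timed out"  # now something reaches the log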

Solution

Sorry to answer my own question, but I found the solution.

I tried ftputil with no success, so I tried many approaches and finally this works:

from time import strftime
from ftplib import FTP

def ftp_connect(path):
    link = FTP(host='example.com', timeout=5)  # keep the timeout low
    link.login(passwd='ftppass', user='ftpuser')
    debug("%s - Connected to FTP" % strftime("%d-%m-%Y %H.%M"))
    link.cwd(path)
    return link

# note: 'wb' truncates the file, so resuming only works within one run
downloaded = open('/local/path/to/file.tgz', 'wb')

def debug(txt):
    print txt

# 'path' and 'filename' are defined elsewhere in the real script
link = ftp_connect(path)
file_size = link.size(filename)

max_attempts = 5  # I don't want infinite loops

while file_size != downloaded.tell():
    try:
        debug("%s while > try, run retrbinary\n" % strftime("%d-%m-%Y %H.%M"))
        if downloaded.tell() != 0:
            # retrbinary's signature is (cmd, callback, blocksize=8192,
            # rest=None): the resume offset must go in 'rest', otherwise
            # it would be taken as the block size
            link.retrbinary('RETR ' + filename, downloaded.write, rest=downloaded.tell())
        else:
            link.retrbinary('RETR ' + filename, downloaded.write)
    except Exception as myerror:
        if max_attempts != 0:
            debug("%s while > except, something went wrong: %s\n\tfile length is: %i > %i\n" %
                (strftime("%d-%m-%Y %H.%M"), myerror, file_size, downloaded.tell())
            )
            link = ftp_connect(path)
            max_attempts -= 1
        else:
            break
debug("Done with file, attempt to download md5sum")
[...]

In my log file I found:

01-12-2011 23.30 - Connected to FTP
01-12-2011 23.30 while > try, run retrbinary
02-12-2011 00.31 while > except, something went wrong: timed out
    file length is: 1754695793 > 1754695793
02-12-2011 00.31 - Connected to FTP
Done with file, attempt to download md5sum

Sadly, I have to reconnect to the FTP server even when the file has been fully downloaded, but in my case that is not a problem, because I have to download the md5sum too.
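For reference, a sketch of one way the elided md5 step ("[...]" above) could look; the '<filename>.md5' naming and the md5sum line format ('<hash>  <name>') are illustrative assumptions, not part of the original script:

import hashlib

md5_lines = []
link = ftp_connect(path)  # fresh connection, as noted above
# assumes the checksum sits next to the archive as '<filename>.md5'
link.retrlines('RETR ' + filename + '.md5', md5_lines.append)
link.quit()
remote_md5 = md5_lines[0].split()[0]  # md5sum format: '<hash>  <name>'

checksum = hashlib.md5()
downloaded.close()
with open('/local/path/to/file.tgz', 'rb') as f:
    for chunk in iter(lambda: f.read(1024 * 1024), ''):
        checksum.update(chunk)

if checksum.hexdigest() == remote_md5:
    debug("md5 matches, the backup is good")
else:
    debug("md5 mismatch, keep the remote file and retry")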

As you can see, I wasn't able to prevent the timeout; when I hit one, I simply reconnect. If someone knows how to reconnect without creating a new ftplib.FTP instance, let me know ;)
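On that last point, one untested possibility: ftplib.FTP.connect() and login() can be called again on the same object after close(), so the helper could revive the existing instance instead of building a new one. A sketch, with no guarantee that every server tolerates it:

def ftp_reconnect(link, path):
    # drop the dead control socket; close() tolerates an already-broken
    # connection better than quit(), which tries to send QUIT over it first
    link.close()
    link.connect('example.com')  # keeps the timeout given to FTP() earlier
    link.login(passwd='ftppass', user='ftpuser')
    link.cwd(path)
    return link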
