Deal with EOFError while downloading files from a server

Problem description

Use Case:

Download hundreds of thousands of XML files (from a few bytes up to 50 MB per file), structured as /year-month/year-month-day/hours/files, with ftplib. I loop through each hour folder for a given day; for each one I store all the filenames with ftp.nlst(), then I loop through each filename and download the corresponding file like this:

import ftplib
import os

try:
    with open(local_file, 'wb') as fhandle:
        ftp.retrbinary('RETR ' + filename, fhandle.write)
except EOFError:
    # The server dropped the connection mid-transfer: discard the
    # partial file, then retry once on a fresh connection.
    os.remove(local_file)
    try:
        ftp = ftplib.FTP()
        ftp.connect(self.remote_host, self.port, timeout=60)
        ftp.login(self.username, self.passwd, acct="")
        ftp.cwd(self.input_folder + '/' + subdir)
        with open(local_file, 'wb') as fhandle:
            ftp.retrbinary('RETR ' + filename, fhandle.write, 8192)
    except ftplib.all_errors:
        self.log.error('i give up !!!')
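The fallback above retries only once. A minimal sketch of a more general retry loop follows, assuming a hypothetical zero-argument `connect()` callable that returns a fresh, logged-in ftplib.FTP already inside the right folder (for example one wrapping the connect/login/cwd calls above). The function name `download_with_retry` and its `attempts`/`delay` parameters are illustrative, not part of ftplib. It catches `ftplib.all_errors`, which in Python 3 includes EOFError as well as socket errors.

```python
import ftplib
import os
import time


def download_with_retry(ftp, connect, remote_name, local_path, attempts=3, delay=5.0):
    """Download remote_name into local_path, reconnecting if the transfer fails.

    `connect` is a hypothetical zero-argument callable returning a fresh,
    logged-in ftplib.FTP already inside the right remote folder.
    Returns the FTP object that finished the transfer (it may be a new one).
    """
    for attempt in range(1, attempts + 1):
        try:
            with open(local_path, 'wb') as fhandle:
                ftp.retrbinary('RETR ' + remote_name, fhandle.write)
            return ftp
        except ftplib.all_errors:  # includes EOFError and socket errors
            # Never keep a truncated file around.
            if os.path.exists(local_path):
                os.remove(local_path)
            if attempt == attempts:
                raise
            ftp.close()        # drop the dead session without sending QUIT
            time.sleep(delay)  # give the server a moment before reconnecting
            ftp = connect()
```

Returning the (possibly new) FTP object lets the caller keep using the reconnected session for the remaining files, e.g. `ftp = download_with_retry(ftp, connect, filename, local_file)`.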

Expected:

For each day given as the input folder, download all the relevant XML files.

What I get:

EOFError

What I have already tried:

  • I have gone through all the posts I could find on the subject, on Stack Overflow and on the web in general.
  • I have tried closing and opening a new connection for each subfolder in the hour folder.
  • It doesn't seem to be one specific file that causes the problem; it is definitely not the first one. I get this EOFError while downloading files with ftp.retrbinary(). It is related to the fact that I download hundreds of thousands of XML files: I tested the script with 2,000 files and got no exceptions, but with around 287,000 files I always get it. What I don't understand is that the script downloads the same number of XML files each time, around 159,000, and it is always
  • I have tried playing with the buffer size (the blocksize argument) in

    ftp.retrbinary('RETR ' + filename, fhandle.write, 4096)
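Because the failure comes after roughly the same number of files every run, one way to check for a per-connection server limit is to log how long the current connection had been open, and how many files it had served, when EOFError fires. A minimal sketch, assuming a hypothetical helper class `ConnectionStats` (not part of ftplib); the clock is injectable only so the sketch can be exercised without waiting:

```python
import time


class ConnectionStats:
    """Hypothetical helper: tracks the age and file count of the current
    FTP connection so both can be logged when a transfer fails."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.reset()

    def reset(self):
        # Call right after opening (or reopening) a connection.
        self.opened_at = self._clock()
        self.files = 0

    def record_file(self):
        # Call after each successful retrbinary().
        self.files += 1

    def summary(self):
        # Human-readable snapshot for the error log.
        age = self._clock() - self.opened_at
        return f'{self.files} files in {age:.0f} s on this connection'
```

In the download loop one would call `stats.reset()` after each login, `stats.record_file()` after each successful transfer, and include `stats.summary()` in the EOFError log message. If the numbers are similar on every failure, a server-side limit is a more likely cause than a bad file.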

Question:

Is it possible that I have missed something? How can I handle this EOFError so that I can continue downloading all my files... without losing my sanity?

Solution

I finally found a solution to my problem. Instead of opening a connection for each subfolder, I now open a connection for each file to be downloaded. It is less performant, but I get past this EOFError. I also found out that the FTP server I download the files from has restrictions, for example on the number of parallel connections or on how long a connection may last.
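A minimal sketch of the connection-per-file approach described above, using only standard ftplib calls. The helper names `open_connection`, `_hang_up` and `download_hour_folder` are hypothetical, and the sketch assumes `nlst()` returns names relative to the hour folder (some servers return full paths, hence the `basename` for the local file name):

```python
import ftplib
import os


def open_connection(host, port, user, passwd, remote_dir):
    # Hypothetical helper mirroring the connection code in the question.
    ftp = ftplib.FTP()
    ftp.connect(host, port, timeout=60)
    ftp.login(user, passwd)
    ftp.cwd(remote_dir)
    return ftp


def _hang_up(ftp):
    # Send QUIT politely; if the server already hung up, just drop the socket.
    try:
        ftp.quit()
    except ftplib.all_errors:
        ftp.close()


def download_hour_folder(connect, local_dir):
    """Download every file of one hour folder, one fresh connection per file.

    `connect` is a zero-argument callable returning a logged-in FTP object
    already inside the hour folder, e.g.
    functools.partial(open_connection, host, port, user, passwd, remote_dir).
    """
    # A short-lived connection used only to list the folder.
    listing = connect()
    try:
        names = listing.nlst()
    finally:
        _hang_up(listing)

    for name in names:
        local_file = os.path.join(local_dir, os.path.basename(name))
        ftp = connect()  # a fresh connection for this file only
        try:
            with open(local_file, 'wb') as fhandle:
                ftp.retrbinary('RETR ' + name, fhandle.write)
        except ftplib.all_errors:
            # Don't keep a truncated XML file; let the caller decide what to do.
            if os.path.exists(local_file):
                os.remove(local_file)
            raise
        finally:
            _hang_up(ftp)
    return names
```

This trades throughput for robustness: each session stays short, so it is less likely to run into a limit on connection lifetime.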
