Deal with EOFError while downloading files from server
Use Case:
Download hundreds of thousands of XML files (from a few bytes up to 50 MB per file), structured like /year-month/year-month-day/hours/files, with ftplib. I loop through each hour folder for a given day. For each one I store all the filenames with ftp.nlst(), then I loop through the filenames and download each file like this:
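    import ftplib
    import os

    # Runs inside a method of the downloader class: self.remote_host, self.port,
    # self.username, self.passwd, self.input_folder and self.log come from it.
    with open(local_file, 'wb') as fhandle:
        try:
            ftp.retrbinary('RETR ' + filename, fhandle.write)
        except EOFError:
            # The server closed the connection: drop the partial file,
            # reconnect and retry this file once.
            try:
                fhandle.close()
                os.remove(local_file)
                ftp.close()  # release the dead socket before reconnecting
                ftp = ftplib.FTP()
                ftp.connect(self.remote_host, self.port, timeout=60)
                ftp.login(self.username, self.passwd, acct="")
                ftp.cwd(self.input_folder + '/' + subdir)
                with open(local_file, 'wb') as retry_handle:
                    ftp.retrbinary('RETR ' + filename, retry_handle.write, 8192)
            except ftplib.all_errors:  # includes EOFError and OSError
                self.log.error('i give up !!!')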
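The traversal described in the use case (one nlst() call per hour folder, then one RETR per file) can be sketched as a generator. This is a minimal sketch, assuming the /year-month/year-month-day/hours/files layout and that each hour folder contains only files; iter_day_files is a hypothetical helper, not part of ftplib. NLST output is server-dependent (bare names on some servers, full paths on others), hence the basename() calls.

```python
import ftplib
import posixpath
from typing import Iterator, Tuple


def iter_day_files(ftp: ftplib.FTP, day_dir: str) -> Iterator[Tuple[str, str]]:
    """Yield (hour_dir, filename) for every file under day_dir/<hour>/.

    Hypothetical helper: assumes every entry of day_dir is an hour folder
    and every entry of an hour folder is a file.
    """
    for hour in sorted(ftp.nlst(day_dir)):
        # Some servers return full paths from NLST, others bare names.
        hour_dir = posixpath.join(day_dir, posixpath.basename(hour))
        for name in sorted(ftp.nlst(hour_dir)):
            yield hour_dir, posixpath.basename(name)
```

A caller would hand each pair to its download code, for example ftp.retrbinary('RETR ' + posixpath.join(hour_dir, filename), fhandle.write).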
Expected:
For each day given as the input folder, download all the corresponding XML files.
What I get:
EOFError
What I have already tried:
- I have gone through every post about this subject I could find on Stack Overflow and on the web in general.
- I have tried closing the connection and opening a new one for each subfolder of the hour folder.
- It doesn't seem to be one specific file that causes the problem, and it is definitely not the first one. I get this EOFError while downloading files with ftp.retrbinary(). It is related to the sheer number of XML files I download: I tested the script with 2,000 files and got no exception, but with around 287,000 files I always get it. What I don't understand is that the script downloads about the same number of XML files every time, around 159,000, and it is always …
- I have tried playing with the buffer size in:
    ftp.retrbinary('RETR ' + filename, fhandle.write, 4096)
Question:
Is it possible that I have missed something? How can I handle this EOFError so that all my files still get downloaded, without losing my sanity?
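One way to keep going after an EOFError, instead of giving up, is to treat it like any other dropped connection: discard the partial file, reconnect, retry a bounded number of times, and only then re-raise so the caller can log that filename and move on to the next one. This is a minimal sketch, assuming a hypothetical connect() factory that returns a logged-in ftplib.FTP (for example the connect/login/cwd sequence from the question); the retries and delay defaults are arbitrary. ftplib.all_errors already includes EOFError and OSError.

```python
import ftplib
import os
import time
from typing import Callable, Optional


def _discard(path: str) -> None:
    """Delete a partially written local file, if any."""
    if os.path.exists(path):
        os.remove(path)


def download_with_retry(connect: Callable[[], ftplib.FTP],
                        ftp: Optional[ftplib.FTP],
                        remote_path: str, local_file: str,
                        retries: int = 3, delay: float = 1.0) -> ftplib.FTP:
    """Download remote_path to local_file, reconnecting after dropped connections.

    Returns the connection that succeeded so the caller can keep reusing it.
    Re-raises the last error once all `retries` reconnects have been used up.
    """
    last_error: Optional[BaseException] = None
    for attempt in range(retries + 1):
        try:
            if ftp is None:
                ftp = connect()  # hypothetical factory: connect + login + cwd
            with open(local_file, 'wb') as fhandle:
                ftp.retrbinary('RETR ' + remote_path, fhandle.write)
            return ftp
        except ftplib.error_perm:
            # 5xx replies (e.g. 550 file not found) will not succeed on retry.
            _discard(local_file)
            raise
        except ftplib.all_errors as exc:  # includes EOFError and OSError
            last_error = exc
            _discard(local_file)
            if ftp is not None:
                ftp.close()  # drop the dead socket without sending QUIT
            ftp = None
            if attempt < retries:
                time.sleep(delay)
    raise last_error
```

In the outer loop, wrapping each call in try/except ftplib.all_errors and logging the filename lets the run continue past a file that keeps failing instead of stopping there.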
Finally I found a solution to my problem. Instead of opening a connection for each subfolder, I now open a connection for each file to be downloaded. It is slower, but it gets past this EOFError.
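The per-file connection approach can be sketched like this: every file gets its own FTP session, and using ftplib.FTP as a context manager sends QUIT and closes the socket when the block exits, even on error. This is a minimal sketch; download_one and download_all are hypothetical names, the ftp_factory parameter is a hypothetical hook added only so the functions can be exercised without a real server, and the 60-second timeout is carried over from the question.

```python
import ftplib
import os
from typing import Callable, Iterable, List


def download_one(host: str, port: int, user: str, passwd: str,
                 remote_dir: str, filename: str, local_dir: str,
                 ftp_factory: Callable[[], ftplib.FTP] = ftplib.FTP) -> str:
    """Open a fresh connection, download a single file, then close it."""
    local_file = os.path.join(local_dir, filename)
    with ftp_factory() as ftp:  # FTP.__exit__ sends QUIT, then closes
        ftp.connect(host, port, timeout=60)
        ftp.login(user, passwd)
        ftp.cwd(remote_dir)
        with open(local_file, 'wb') as fhandle:
            ftp.retrbinary('RETR ' + filename, fhandle.write)
    return local_file


def download_all(host: str, port: int, user: str, passwd: str,
                 remote_dir: str, filenames: Iterable[str], local_dir: str,
                 ftp_factory: Callable[[], ftplib.FTP] = ftplib.FTP) -> List[str]:
    """One connection per file: slower, but no session lives very long."""
    return [download_one(host, port, user, passwd, remote_dir, name,
                         local_dir, ftp_factory)
            for name in filenames]
```

The cost is one connect and login round trip per file, which is why this is slower than reusing a connection for a whole subfolder.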
I also found out that the FTP server I download the files from has restrictions, for example on the number of parallel connections and on how long a connection may last.
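Given limits like these, a middle ground between one connection per subfolder and one per file is to keep a single connection (never several in parallel) and replace it proactively after a fixed number of downloads or a fixed age. This is a minimal sketch of that alternative, not the method used above; RecyclingConnection, max_files and max_age are hypothetical names, and the default thresholds are placeholders because the server's actual limits are not stated.

```python
import ftplib
import time
from typing import Callable, Optional


class RecyclingConnection:
    """Hand out one FTP connection, replacing it after max_files uses or
    max_age seconds, whichever comes first (thresholds are assumptions)."""

    def __init__(self, connect: Callable[[], ftplib.FTP], max_files: int = 1000,
                 max_age: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._connect = connect  # hypothetical factory: connect + login
        self.max_files = max_files
        self.max_age = max_age
        self._clock = clock
        self._ftp: Optional[ftplib.FTP] = None
        self._opened_at = 0.0
        self._used = 0

    def get(self) -> ftplib.FTP:
        """Return a live connection, recycling the old one if it is too used or too old."""
        expired = self._ftp is not None and (
            self._used >= self.max_files
            or self._clock() - self._opened_at >= self.max_age)
        if expired:
            self.close()
        if self._ftp is None:
            self._ftp = self._connect()
            self._opened_at = self._clock()
            self._used = 0
        self._used += 1
        return self._ftp

    def close(self) -> None:
        """Send QUIT; if the server already dropped us, just close the socket."""
        if self._ftp is not None:
            try:
                self._ftp.quit()
            except ftplib.all_errors:
                self._ftp.close()
            self._ftp = None
```

Each download would call get() once, so the session never outlives the chosen thresholds while connect and login still happen far less often than once per file.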