Ignore missing file while downloading with Python ftplib


Problem description


I am trying to download a certain file (named 010010-99999-year.gz) from an FTP server. The same file exists for different years, and each year's file resides in a different FTP directory. For instance:

ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-lite/2000/010010-99999-1973.gz
ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-lite/2001/010010-99999-1974.gz

and so on.

The file is not present in all the directories (i.e. not for all years). In that case I want the script to ignore the missing file, print "not available", and continue with the next directory (i.e. the next year). I could do this with an NLST listing, by first generating a list of the files in the current FTP directory and then checking whether my file is on that list, but that is slow, and NOAA (the organization that owns the server) does not like file listing (source). Therefore I came up with this code:

def FtpDownloader2(url="ftp.ncdc.noaa.gov"):
    ftp=FTP(url)        
    ftp.login()
    for year in range(1901,2015):
        ftp.cwd("/pub/data/noaa/isd-lite")
        ftp.cwd(str(year))
        fullStationId="010010-99999-%s.gz" % year
        try:              
            file=open(fullStationId,"wb")
            ftp.retrbinary('RETR %s' % fullStationId, file.write)
            print("File is available")
            file.close()
        except: 
            print("File not available")
    ftp.close()

This downloads the existing files (year 1973-2014) correctly, but it is also generating empty files for years 1901-1972. The file is not in the FTP for 1901-1972. Am I doing anything wrong in the use of try and except, or is it some other issue?

Solution

I took your code and modified it a little:

from ftplib import FTP, error_perm
import os

def FtpDownloader2(url="ftp.ncdc.noaa.gov"):
    ftp = FTP(url)
    ftp.login()  # anonymous login
    for year in range(1901, 2015):
        # Use the full remote path so no cwd calls are needed
        remote_file = '/pub/data/noaa/isd-lite/{0}/010010-99999-{0}.gz'.format(year)
        local_file = os.path.basename(remote_file)
        try:
            with open(local_file, "wb") as file_handle:
                ftp.retrbinary('RETR %s' % remote_file, file_handle.write)
            print('OK', local_file)
        except error_perm:
            # The server refused the request: the file is not available
            print('ERR', local_file)
            os.unlink(local_file)  # remove the empty local file
    ftp.close()

Notes

  • A bare except clause, with no specific exception class, is dangerous: it swallows every error (including unrelated bugs and KeyboardInterrupt), which makes troubleshooting hard. To fix this, I catch only error_perm, the exception ftplib raises for permanent (5xx) server errors.
  • Once the exception has occurred, I know for sure that the local file is closed, because the with statement guarantees that.
  • I remove the local file if an error_perm exception occurs, a sign that the file is not available from the server.
  • I removed the code that changes directories: calling cwd twice for each year slows down the process, and RETR accepts the full remote path.
  • range(1901, 2015) does not include 2015. If you want it, you have to specify range(1901, 2016).
  • I improved the print statements to include the file names, making it easier to track which files are available and which are not.
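
To make the first note a little more concrete: the text of an ftplib exception begins with the server's reply code, so you can check why the request was refused. The sketch below is only an illustration. The helper name is_missing_file is made up, and treating a 550 reply as "file not available" is an assumption about how this server reports a missing file.

```python
from ftplib import error_perm

def is_missing_file(exc):
    """Return True if exc looks like a 550 reply (hypothetical helper).

    Assumption: the server answers RETR for a missing file with code 550.
    """
    # error_perm carries the server's reply line, which starts with the code
    return isinstance(exc, error_perm) and str(exc).startswith("550")
```

Inside the except error_perm block you could call this helper and re-raise anything that is not a 550, so that other permanent errors (a rejected login, for example) are not silently reported as missing files.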

Update

This update answers your question about not creating empty local files (and then having to delete them). There are a couple of different ways:

  1. Query the remote file's existence before downloading. Only create the local file when the remote file exists. The problem with this approach is that querying a remote file takes longer than creating and deleting a local file.
  2. Create an in-memory buffer (StringIO, or io.BytesIO on Python 3, since the data is binary) and download to that buffer. Only create a local file when the buffer is not empty. The problem with this approach is that you write the same data twice: once to the buffer, and once from the buffer to the file.
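
A minimal sketch of the second approach, assuming Python 3 and an already connected, logged-in FTP object. The function name download_via_buffer and its return convention are my own choices for illustration, not part of ftplib. Unlike the description above, it keys off the error_perm exception rather than testing whether the buffer is empty.

```python
import io
from ftplib import error_perm

def download_via_buffer(ftp, remote_file, local_file):
    """Download remote_file into memory; write local_file only on success.

    ftp is assumed to be a connected, logged-in ftplib.FTP object.
    Returns True if the file was saved, False if the server refused it.
    """
    buffer = io.BytesIO()
    try:
        # Collect the data in memory instead of writing to disk directly
        ftp.retrbinary('RETR %s' % remote_file, buffer.write)
    except error_perm:
        return False  # nothing has been created on disk
    # The transfer succeeded, so it is now safe to create the local file
    with open(local_file, 'wb') as file_handle:
        file_handle.write(buffer.getvalue())
    return True
```

In the loop above you would replace the try/except block with a call to this function and print 'OK' or 'ERR' depending on the return value. Keep in mind that the whole file sits in memory until it is written, which is fine for small station files but may not suit large downloads.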
