Python: File download using ftplib hangs forever after file is successfully downloaded


Problem description


I have been trying to troubleshoot an issue where a file downloaded from FTP/FTPS completes successfully, but no operation is performed after the download finishes. No error occurs that could give more information about the issue. I searched Stack Overflow and found this link, which discusses a similar problem statement, and it looks like I am facing a similar issue, though I am not sure. I need a little more help in resolving it.


I tried setting the FTP connection timeout to 60 minutes, but that was of little help. Prior to this I was using retrbinary() from ftplib, but the same issue occurred there. I tried passing different block sizes and window sizes, but the issue was still reproducible.


I am trying to download a file of size ~3GB from an AWS EMR cluster. Sample code is written below.

import os
import threading
from ftplib import FTP

# logger is assumed to be configured elsewhere in the module/class.
def download_ftp(self, ip, port, user_name, password, file_name, target_path):
    try:
        os.chdir(target_path)
        ftp = FTP(host=ip)
        ftp.connect(port=int(port), timeout=3000)
        ftp.login(user=user_name, passwd=password)

        if ftp.nlst(file_name) != []:
            dir = os.path.split(file_name)
            ftp.cwd(dir[0])
            for filename in ftp.nlst(file_name):
                sock = ftp.transfercmd('RETR ' + filename)

                def background():
                    # Drain the data connection into a local file; an empty
                    # read means the server closed the data socket.
                    with open(filename, 'wb') as fhandle:
                        while True:
                            block = sock.recv(1024 * 1024)
                            if not block:
                                break
                            fhandle.write(block)
                    sock.close()

                t = threading.Thread(target=background)
                t.start()
                # Keep the control connection alive with a NOOP every
                # 60 seconds while the download thread runs.
                while t.is_alive():
                    t.join(60)
                    ftp.voidcmd('NOOP')
                logger.info("File " + filename + " fetched successfully")
            return True
        else:
            logger.error("File " + file_name + " is not present in FTP")

    except Exception as e:
        logger.error(e)
        raise


Another option suggested in the link above is to close the connection after downloading a small chunk of the file and then restart the connection. Can someone suggest how this can be achieved? I am not sure how to resume the download from the same point where it stopped before the connection was closed. Will this method be foolproof for downloading the entire file?


I don't know much about FTP server-level timeout settings, so I don't know what needs to be altered or how. I basically want to write a generic FTP downloader which can help in downloading files from FTP/FTPS.


When I use the retrbinary() method of ftplib and set the debug level to 2:

ftp.set_debuglevel(2)
ftp.retrbinary('RETR ' + filename, fhandle.write)

the following log is printed.

cmd 'TYPE I'
put 'TYPE I\r\n'
get '200 Type set to I.\r\n'
resp '200 Type set to I.'
cmd 'PASV'
put 'PASV\r\n'
get '227 Entering Passive Mode (64,27,160,28,133,251).\r\n'
resp '227 Entering Passive Mode (64,27,160,28,133,251).'
cmd 'RETR FFFT_BRA_PM_R_201711.txt'
put 'RETR FFFT_BRA_PM_R_201711.txt\r\n'
get '150 Opening BINARY mode data connection for FFFT_BRA_PM_R_201711.txt.\r\n'
resp '150 Opening BINARY mode data connection for FFFT_BRA_PM_R_201711.txt.'

Notice that the trace stops at the 150 reply: no 226 transfer-complete reply ever appears, which looks consistent with the client hanging at the end of the transfer.


Recommended answer


Before doing anything, note that there is something very wrong with your connection, and diagnosing that and getting it fixed is far better than working around it. But sometimes, you just have to deal with a broken server, and even sending keepalives doesn't help. So, what can you do?


The trick is to download a chunk at a time, then abort the download—or, if the server can't handle aborting, close and reopen the connection.


Note that I'm testing everything below with ftp://speedtest.tele2.net/5MB.zip, which hopefully won't cause a million people to start hammering their server. Of course you'll want to test with your actual server.


The entire solution of course relies on the server being able to resume transfers, which not all servers can do, especially when you're dealing with something badly broken. So we'll need to test for that. Note that this test will be very slow and very heavy on the server, so do not test with your 3GB file; find something much smaller. Also, if you can put something readable there, it will help with debugging, because you may be stuck comparing files in a hex editor.

from ftplib import FTP

def downit():
    with open('5MB.zip', 'wb') as f:
        while True:
            # Open a fresh connection for every chunk and resume from the
            # current file position via the REST command.
            ftp = FTP(host='speedtest.tele2.net', user='anonymous', passwd='test@example.com')
            pos = f.tell()
            print(pos)
            ftp.sendcmd('TYPE I')
            sock = ftp.transfercmd('RETR 5MB.zip', rest=pos)
            buf = sock.recv(1024 * 1024)
            if not buf:
                return
            f.write(buf)
            # Close both connections so the loop doesn't leak sockets.
            sock.close()
            ftp.close()


You will probably not get 1MB at a time, but instead something under 8KB. Let's assume you're seeing 1448, then 2896, 4344, etc. (1448 bytes is a typical TCP segment payload, so each recv is likely returning one segment's worth of data.)

    • If you get an exception from the REST, the server does not handle resuming—give up, you're hosed.
    • If the file goes on past the actual file size, hit ^C, and check it in a hex editor.
      • If you see the same 1448 bytes or whatever (the amount you saw it printing out) over and over again, again, you're hosed.
      • If you have the right data, but with extra bytes between each chunk of 1448 bytes, that's actually fixable. If you run into this and can't figure out how to fix it by using f.seek, I can explain—but you probably won't run into it. (A rough sketch of the idea follows below.)
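Since the answer leaves the f.seek fix unexplained, here is a rough illustration of the idea. It is only a sketch under an assumed failure mode: the server re-sends a fixed number of bytes at the start of every resumed transfer. OVERLAP and downit_with_seek are hypothetical names, and the overlap size is something you would measure yourself in a hex editor.

from ftplib import FTP

# Hypothetical: bytes the server re-sends at the start of each resumed
# transfer, as measured in a hex editor; adjust to what you actually see.
OVERLAP = 1448

def downit_with_seek():
    with open('5MB.zip', 'wb') as f:
        while True:
            ftp = FTP(host='speedtest.tele2.net', user='anonymous', passwd='test@example.com')
            pos = f.tell()
            ftp.sendcmd('TYPE I')
            sock = ftp.transfercmd('RETR 5MB.zip', rest=pos)
            # Read more than OVERLAP bytes so the file position always
            # advances; otherwise a resumed chunk could be pure overlap.
            chunks, got = [], 0
            while got <= OVERLAP:
                buf = sock.recv(1024 * 1024)
                if not buf:
                    break
                chunks.append(buf)
                got += len(buf)
            sock.close()
            ftp.close()
            data = b''.join(chunks)
            if not data:
                return
            if pos > 0:
                # The first OVERLAP bytes duplicate data already written,
                # so rewind before writing the chunk over them.
                f.seek(pos - OVERLAP)
            f.write(data)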

One thing we can do is to try aborting the download instead of reconnecting.

def downit():
    with open('5MB.zip', 'wb') as f:
        # One connection for the whole file; abort the transfer after
        # each chunk instead of reconnecting.
        ftp = FTP(host='speedtest.tele2.net', user='anonymous', passwd='test@example.com')
        while True:
            pos = f.tell()
            print(pos)
            ftp.sendcmd('TYPE I')
            sock = ftp.transfercmd('RETR 5MB.zip', rest=pos)
            buf = sock.recv(1024 * 1024)
            if not buf:
                return
            f.write(buf)
            sock.close()
            ftp.abort()

There are several variations to try (one of them is sketched after the list):

      • No sock.close.
      • No ftp.abort.
      • With sock.close after ftp.abort.
      • With ftp.abort after sock.close.
      • All four of the above repeated with TYPE I moved to before the loop instead of each time.
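To make the list concrete, here is what one variant might look like: a minimal sketch (mine, not code from the answer) with TYPE I hoisted out of the loop and ftp.abort() issued before sock.close().

from ftplib import FTP

def downit_abort_variant():
    with open('5MB.zip', 'wb') as f:
        ftp = FTP(host='speedtest.tele2.net', user='anonymous', passwd='test@example.com')
        # Variant: send TYPE I once, before the loop.
        ftp.sendcmd('TYPE I')
        while True:
            pos = f.tell()
            print(pos)
            sock = ftp.transfercmd('RETR 5MB.zip', rest=pos)
            buf = sock.recv(1024 * 1024)
            if not buf:
                return
            f.write(buf)
            # Variant: abort the transfer first, then close the data socket.
            ftp.abort()
            sock.close()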

Some will raise exceptions. Others will just appear to hang forever. If that's true for all 8 of them, we need to give up on aborting. But if any of them works, great!

The other way to speed things up is to download 1MB (or more) at a time before aborting or reconnecting. Just replace this code:

buf = sock.recv(1024 * 1024)
if buf:
    f.write(buf)

with this:

chunklen = 1024 * 1024
while chunklen:
    print('   ', f.tell())
    buf = sock.recv(chunklen)
    if not buf:
        break
    f.write(buf)
    chunklen -= len(buf)

Now, instead of reading 1448 or 8192 bytes for each transfer, you're reading up to 1MB for each transfer. Try pushing it farther. A consolidated sketch of this version follows.
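Putting the fragments together, a consolidated reconnect-per-chunk version might look like the sketch below. This is my assembly of the pieces above rather than a verbatim quote of the answer; a done flag ends the outer loop once the server closes the data connection.

from ftplib import FTP

def downit_chunked():
    with open('5MB.zip', 'wb') as f:
        done = False
        while not done:
            # Fresh connection per chunk, resuming at the current offset.
            ftp = FTP(host='speedtest.tele2.net', user='anonymous', passwd='test@example.com')
            pos = f.tell()
            print(pos)
            ftp.sendcmd('TYPE I')
            sock = ftp.transfercmd('RETR 5MB.zip', rest=pos)
            chunklen = 1024 * 1024
            while chunklen:
                buf = sock.recv(chunklen)
                if not buf:
                    # Empty read: the server closed the data connection.
                    done = True
                    break
                f.write(buf)
                chunklen -= len(buf)
            sock.close()
            ftp.close()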

If, say, your downloads were failing at 10MB, and the keepalive code in your question got things up to 512MB but that just wasn't enough for 3GB, you can combine the two: use keepalives to read 512MB at a time, then abort or reconnect and read the next 512MB, until you're done. A sketch of that combination follows.
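As a final illustration, here is a minimal sketch of that combination; it is my own assembly rather than code from the answer, and SEGMENT, the host, and the file name are assumptions to adjust. A background thread drains up to SEGMENT bytes per connection while the main thread sends NOOP keepalives, then the connection is reopened and the transfer resumed with REST.

import threading
from ftplib import FTP

SEGMENT = 512 * 1024 * 1024  # assumed per-connection segment size; tune it

def download_segmented():
    with open('bigfile.bin', 'wb') as f:
        done = False
        while not done:
            ftp = FTP(host='ftp.example.com', user='anonymous', passwd='test@example.com')
            pos = f.tell()
            ftp.sendcmd('TYPE I')
            sock = ftp.transfercmd('RETR bigfile.bin', rest=pos)

            def background():
                # Read up to SEGMENT bytes on this connection; an empty
                # read means the whole file has been downloaded.
                nonlocal done
                remaining = SEGMENT
                while remaining:
                    buf = sock.recv(min(remaining, 1024 * 1024))
                    if not buf:
                        done = True
                        break
                    f.write(buf)
                    remaining -= len(buf)

            t = threading.Thread(target=background)
            t.start()
            # Keepalives on the control connection while the segment runs.
            while t.is_alive():
                t.join(60)
                if t.is_alive():
                    ftp.voidcmd('NOOP')
            sock.close()
            ftp.close()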
