Multiprocessing FTP Uploading With A Precise Number of Connections


Problem Description


So, I've been able to use multiprocessing to upload multiple files at once to a given server with the following two functions:

import ftplib,multiprocessing,subprocess

def upload(t):
    server, user, password, service = locker.server, locker.user, locker.password, locker.service # These all just return strings representing the various fields I will need.
    ftp=ftplib.FTP(server)
    ftp.login(user=user,passwd=password,acct="")
    ftp.storbinary("STOR "+t.split('/')[-1], open(t,"rb"))
    ftp.close() # Doesn't seem to be necessary, same thing happens whether I close this or not

def ftp_upload(t=files,server=locker.server,user=locker.user,password=locker.password,service=locker.service):
    parsed_targets=parse_it(t)
    ftp=ftplib.FTP(server)
    ftp.login(user=user,passwd=password,acct="")
    remote_files=ftp.nlst(".")
    ftp.close()
    files_already_on_server=[f for f in t if f.split("/")[-1] in remote_files]
    files_to_upload=[f for f in t if not f in files_already_on_server]
    connections_to_make=3 # The maximum connections allowed by the server is 5, and this error pops up even if I use 1
    pool=multiprocessing.Pool(processes=connections_to_make)
    pool.map(upload,files_to_upload)
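One detail worth noting in `upload()` above: `ftplib.FTP.close()` only closes the local socket, while `quit()` first sends the FTP `QUIT` command so the server can release the connection promptly. A hedged variant of `upload()` that also guarantees cleanup on errors (the `connect` factory argument is my addition for illustration; the original `upload()` takes only the path and builds the connection from `locker` itself):

```python
import ftplib
import os

def upload(path, connect):
    """Upload one file.

    'connect' is a zero-argument factory returning a logged-in, FTP-like
    object; in the real script it would wrap ftplib.FTP(server) plus
    ftp.login(...).
    """
    ftp = connect()
    try:
        with open(path, "rb") as fh:
            ftp.storbinary("STOR " + os.path.basename(path), fh)
    finally:
        ftp.quit()  # sends QUIT before closing; plain close() just drops the socket
```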

My problem is that I (very regularly) end up getting errors such as:

  File "/usr/lib/python2.7/multiprocessing/pool.py", line 227, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 528, in get
    raise self._value
ftplib.error_temp: 421 Too many connections (5) from this IP

Note: There's also a timeout error that occasionally occurs, but I'm waiting for it to rear its ugly head again, at which point I'll post it.

I don't get this error when I use the command line (i.e. "ftp -inv", "open SERVER", "user USERNAME PASSWORD", "mput *.rar"), even when I have (for example) 3 instances of this running at once.

I've read through the ftplib and multiprocessing documentation, and I can't figure out what it is that is causing these errors. This is somewhat of a problem because I'm regularly backing up a large amount of data and a large number of files.

  1. Is there some way I can avoid these errors, or is there a different way of having the script do this?
  2. Is there a way I can tell the script that if it hits this error, it should wait a second and then resume its work?
  3. Is there a way I can have the script upload the files in the same order they are in the list (of course speed differences would mean they wouldn't all always be 4 consecutive files, but at the moment the order seems basically random)?
  4. Can someone explain why/how more connections are being simultaneously made to this server than the script is calling for?

So, just handling the exceptions seems to be working (except for the occasional recursion error... I still have no idea what is going on there).

As per #3, I wasn't looking for 100% ordering, only for the script to pick the next file in the list to upload; differences in process speeds would still keep the order from being completely sequential, but there would be less variability than in the current system, which seems almost unordered.
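The wait-and-resume idea in question #2 can be sketched as a small retry wrapper around any upload call. This is a minimal sketch, not code from the original thread; the delay values are arbitrary choices:

```python
import ftplib
import time

def with_retries(func, *args, **kwargs):
    """Retry 'func' when the server answers with a temporary (4yz) error,
    such as '421 Too many connections (5) from this IP'."""
    for delay in (1, 2, 4):  # seconds to wait between attempts
        try:
            return func(*args, **kwargs)
        except ftplib.error_temp:
            time.sleep(delay)
    return func(*args, **kwargs)  # final attempt; let any error propagate
```

Inside the pool workers, each upload could then be invoked as `with_retries(upload, path)` instead of `upload(path)`.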

Solution

You could try to use a single ftp instance per process:

import os
import ftplib
import multiprocessing

def init(*credentials):
    global ftp
    server, user, password, acct = credentials
    ftp = ftplib.FTP(server)
    ftp.login(user=user, passwd=password, acct=acct)

def upload(path):
    with open(path, 'rb') as file:
        try:
            ftp.storbinary("STOR " + os.path.basename(path), file)
        except ftplib.error_temp as error: # handle temporary error
            return path, error
        else:
            return path, None

def main():
    # ...
    pool = multiprocessing.Pool(processes=connections_to_make,
                                initializer=init, initargs=credentials)
    for path, error in pool.imap_unordered(upload, files_to_upload):
        if error is not None:
            print("failed to upload %s" % (path,))
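The key idea above, one connection per worker created by the pool's `initializer` and reused for every task, can be demonstrated without a server. In this sketch each worker's "connection" is just its PID, a stand-in for the `ftplib.FTP` instance built in `init()`:

```python
import multiprocessing
import os

def init():
    """Runs once in each worker process: build the per-process 'connection'.

    Here the connection is simply the worker's PID; in the real script it
    would be the logged-in ftplib.FTP instance.
    """
    global conn
    conn = os.getpid()

def upload(path):
    # Reuses the worker's single connection instead of opening a new one.
    return path, conn

def demo(files, workers=3):
    pool = multiprocessing.Pool(processes=workers, initializer=init)
    try:
        return pool.map(upload, files)
    finally:
        pool.close()
        pool.join()

if __name__ == "__main__":
    results = demo(["a.rar", "b.rar", "c.rar", "d.rar"])
    # At most 3 distinct 'connections' served all 4 files.
    assert len({pid for _, pid in results}) <= 3
```

With `processes=3`, no matter how many files are queued, the pool never holds more than three connections at once, which is exactly what keeps the script under the server's limit of five.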
