Python multiprocess/multithreading to speed up file copying


Question


I have a program which copies large numbers of files from one location to another - I'm talking 100,000+ files (I'm copying 314 GB in image sequences at this moment). They're both on huge, very fast network storage, RAID'd in the extreme. I'm using shutil to copy the files over sequentially and it is taking some time, so I'm trying to find the best way to optimize this. I've noticed that some software I use effectively multi-threads reading files off the network, with huge gains in load times, so I'd like to try doing this in Python.


I have no experience with programming multithreading/multiprocessing - does this seem like the right area to proceed? If so, what's the best way to do this? I've looked at a few other SO posts regarding threaded file copying in Python, and they all seemed to say that you get no speed gain, but I do not think this will be the case considering my hardware. I'm nowhere near my I/O cap at the moment and resources are sitting around 1% (I have 40 cores and 64 GB of RAM locally).
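For reference, the sequential shutil approach being optimized here can be sketched as follows. This is a minimal illustration, not the asker's actual script; the file list and destination directory are throwaway temporary stand-ins for the real image sequences:

```python
import os
import shutil
import tempfile
import time

def copy_sequential(file_list, dest_path):
    """Copy files one at a time with shutil, returning elapsed seconds."""
    os.makedirs(dest_path, exist_ok=True)
    start = time.perf_counter()
    for name in file_list:
        shutil.copy(name, dest_path)
    return time.perf_counter() - start

# Demo with a few small temporary files standing in for the image sequences.
src_dir = tempfile.mkdtemp()
dest_dir = os.path.join(src_dir, "dest")
files = []
for i in range(3):
    path = os.path.join(src_dir, "frame%03d.dat" % i)
    with open(path, "wb") as f:
        f.write(b"\x00" * 1024)
    files.append(path)

elapsed = copy_sequential(files, dest_dir)
print("copied %d files in %.4f s" % (len(files), elapsed))
```

Each `shutil.copy` blocks until that file's read and write finish, so on a fast network RAID the single thread spends most of its time waiting on round-trip latency rather than saturating bandwidth - which is what makes concurrency attractive here.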

  • 喷枪

Answer

UPDATE:


I never did get Gevent working (first answer) because I couldn't install the module without an internet connection, which I don't have on my workstation. However, I was able to decrease file copy times by 8x just using Python's built-in threading (which I have since learned how to use), and I wanted to post it as an additional answer for anyone interested. Here's my code below; it is probably important to note that my 8x speed-up will most likely differ from environment to environment due to your hardware/network set-up.

import os
import queue
import shutil
import threading

fileQueue = queue.Queue()
destPath = 'path/to/cop'

class ThreadedCopy:
    totalFiles = 0
    copyCount = 0
    lock = threading.Lock()  # guards copyCount across worker threads

    def __init__(self):
        with open("filelist.txt", "r") as txt:  # text file with one file path per line
            fileList = txt.read().splitlines()

        if not os.path.exists(destPath):
            os.mkdir(destPath)

        self.totalFiles = len(fileList)

        print(str(self.totalFiles) + " files to copy.")
        self.threadWorkerCopy(fileList)

    def CopyWorker(self):
        while True:
            fileName = fileQueue.get()
            shutil.copy(fileName, destPath)
            fileQueue.task_done()
            with self.lock:
                self.copyCount += 1
                percent = (self.copyCount * 100) // self.totalFiles
                print(str(percent) + " percent copied.")

    def threadWorkerCopy(self, fileNameList):
        # Start 16 daemon workers that pull file names off the shared queue.
        for i in range(16):
            t = threading.Thread(target=self.CopyWorker)
            t.daemon = True
            t.start()
        for fileName in fileNameList:
            fileQueue.put(fileName)
        fileQueue.join()  # block until every queued file has been copied

ThreadedCopy()
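On Python 3, the same worker-pool pattern can be written more compactly with `concurrent.futures`. This is a sketch under the same assumptions (a list of source paths and a destination directory, demoed here on throwaway temporary files), not the answer's original code:

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor

def threaded_copy(file_list, dest_path, workers=16):
    """Copy files concurrently. shutil.copy releases the GIL during the
    underlying I/O, so threads can overlap reads and writes."""
    os.makedirs(dest_path, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consuming the map results re-raises any worker exception here.
        list(pool.map(lambda name: shutil.copy(name, dest_path), file_list))

# Demo with small temporary files.
src_dir = tempfile.mkdtemp()
dest_dir = os.path.join(src_dir, "out")
files = []
for i in range(5):
    path = os.path.join(src_dir, "img%03d.dat" % i)
    with open(path, "wb") as f:
        f.write(b"\xff" * 512)
    files.append(path)

threaded_copy(files, dest_dir)
print("%d files copied" % len(os.listdir(dest_dir)))
```

One practical advantage over the daemon-thread/queue version above: if `shutil.copy` raises there, the worker thread dies before calling `task_done()`, so `fileQueue.join()` can block forever; the executor instead surfaces the exception when results are consumed.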

