Multithreaded file copy is far slower than a single thread on a multicore CPU


Problem Description



I am trying to write a multithreaded program in Python to accelerate the copying of (under 1000) .csv files. The multithreaded code runs even slower than the sequential approach. I timed the code with profile.py. I must be doing something wrong, but I can't see what.

The Environment:

  • Quad core CPU.
  • 2 hard drives, one containing source files. The other is the destination.
  • 1000 csv files ranging in size from several KB to 10 MB.

The Approach:

I put all the file paths in a queue, and create 4-8 worker threads that pull file paths from the queue and copy the designated file. In no case is the multithreaded code faster:

  • sequential copy takes 150-160 seconds
  • threaded copy takes over 230 seconds

I assumed this was an I/O-bound task, so multithreading should have sped the operation up.

The Code:

    import Queue
    import threading
    import cStringIO 
    import os
    import shutil
    import timeit  # time the code exec with gc disable
    import glob    # file wildcards list, glob.glob('*.py')
    import profile # 

    fileQueue = Queue.Queue() # global
    srcPath  = 'C:\\temp'
    destPath = 'D:\\temp'
    tcnt = 0
    ttotal = 0

    def CopyWorker():
        while True:
            fileName = fileQueue.get()
            shutil.copy(fileName, destPath)
            # task_done() must come after the copy, otherwise
            # fileQueue.join() can return before the copies finish
            fileQueue.task_done()
            print 'copied: ', fileName, ' of ', ttotal

    def threadWorkerCopy(fileNameList):
        global ttotal  # assign the module-level total, not a new local
        print 'threadWorkerCopy: ', len(fileNameList)
        ttotal = len(fileNameList)
        for i in range(4):
            t = threading.Thread(target=CopyWorker)
            t.daemon = True
            t.start()
        for fileName in fileNameList:
            fileQueue.put(fileName)
        fileQueue.join()

    def sequentialCopy(fileNameList):
        #around 160.446 seconds, 152 seconds
        print 'sequentialCopy: ', len(fileNameList)
        cnt = 0
        ctotal = len(fileNameList)
        for fileName in fileNameList:
            shutil.copy(fileName, destPath)
            cnt += 1
            print 'copied: ', cnt, ' of ', ctotal

    def main():
        print 'this is main method'
        fileCount = 0
        fileList = glob.glob(srcPath + '\\' + '*.csv')
        #sequentialCopy(fileList)
        threadWorkerCopy(fileList)

    if __name__ == '__main__':
        profile.run('main()')

Solution

Of course it's slower. The hard drives have to seek back and forth between the files constantly. Your belief that multithreading would make this task faster is completely unjustified. The limiting speed is how fast you can read data from or write data to the disk, and every seek from one file to another is a loss of time that could have been spent transferring data.
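For reference, the worker-queue pattern from the question can be written so that `task_done()` is signaled only after each copy completes, which makes `join()` (and therefore any timing around it) wait for the copies to actually finish. This is a minimal Python 3 sketch (the question's code is Python 2); the function name and worker count are illustrative, not from the original post:

```python
import queue
import shutil
import threading

def threaded_copy(file_names, dest_path, workers=4):
    """Copy each file in file_names into the dest_path directory
    using a pool of daemon worker threads."""
    file_queue = queue.Queue()

    def copy_worker():
        while True:
            name = file_queue.get()
            try:
                shutil.copy(name, dest_path)
            finally:
                # Signal completion only after the copy, so the join()
                # below blocks until every file has actually been written.
                file_queue.task_done()

    for _ in range(workers):
        threading.Thread(target=copy_worker, daemon=True).start()
    for name in file_names:
        file_queue.put(name)
    file_queue.join()  # returns once all queued copies are done
```

Even with the synchronization corrected, the answer's point stands: with both source and destination on spinning disks, the threads just interleave seeks, so this is not expected to beat the sequential copy.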
