About using multiprocessing to read files


Problem description


I have many files in a folder, so I thought I should use multiprocessing. I used a multiprocessing Pool to read the txt files in the folder, but when I compared the time with and without multiprocessing, I found that not using a pool is faster. I don't know why. In what situations should I use a Pool to read files (huge files?)?

using Pool
time: 0.5836s
not using Pool
time: 0.0076s

The code is:

import pandas as pd
from multiprocessing import Pool
import glob2, os, time

class PandasReadFile:

    def readFilePool(self, path):
        t = time.time()
        # using Pool (the context manager closes and joins the pool)
        with Pool(processes=1) as pp:
            df = pd.concat(pp.map(self.read_csv, glob2.iglob(os.path.join(path, "*.txt"))))
        # not using Pool
        # df = pd.concat(map(pd.read_csv, glob2.iglob(os.path.join(path, "*.txt"))))
        t = time.time() - t
        print('%.4fs' % t)
        print(df)

    @staticmethod
    def read_csv(filename):
        # a plain function (not a bound method doing extra work) so it
        # pickles cleanly when sent to worker processes
        return pd.read_csv(filename)

if __name__ == '__main__':
    p = PandasReadFile()
    p.readFilePool('D:/')
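To reproduce the comparison without depending on a particular folder, here is a minimal self-contained sketch of the same experiment. It uses the standard-library glob instead of glob2, creates a few small sample files in a temporary directory, and times both paths; the helper names (read_csv, read_folder) are illustrative, not from the original post.

```python
import glob, os, tempfile, time
from multiprocessing import Pool

import pandas as pd

def read_csv(filename):
    # Each worker only calls pd.read_csv; the work per file is tiny,
    # so process startup and result pickling dominate the pooled time.
    return pd.read_csv(filename)

def read_folder(path, use_pool):
    files = sorted(glob.glob(os.path.join(path, "*.txt")))
    t = time.time()
    if use_pool:
        with Pool(processes=2) as pp:  # context manager closes/joins the pool
            df = pd.concat(pp.map(read_csv, files))
    else:
        df = pd.concat(map(pd.read_csv, files))
    return df, time.time() - t

if __name__ == "__main__":
    # Create a handful of small sample files to time against.
    with tempfile.TemporaryDirectory() as d:
        for i in range(5):
            pd.DataFrame({"a": range(3)}).to_csv(
                os.path.join(d, "%d.txt" % i), index=False)
        df_seq, t_seq = read_folder(d, use_pool=False)
        df_par, t_par = read_folder(d, use_pool=True)
        assert len(df_seq) == len(df_par)
        print("sequential: %.4fs, pool: %.4fs" % (t_seq, t_par))
```

With small files like these, the sequential version should consistently win, which matches the timings reported above: the per-file work is far cheaper than spawning processes and shipping DataFrames back through pickling.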

Answer

You can spawn as many processes as you want, but since they all work on the same hard drive, you won't reduce the time. Worse: you will lose time.

You can use multiprocessing for CPU-intensive tasks, not for IO-intensive tasks.
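To illustrate the distinction, here is a minimal sketch of a workload where a Pool does pay off: pure computation with no disk access. The function names (cpu_heavy, run) are illustrative assumptions, not part of the answer.

```python
import time
from multiprocessing import Pool

def cpu_heavy(n):
    # Pure computation, no IO: extra processes can genuinely run in parallel.
    total = 0
    for i in range(n):
        total += i * i
    return total

def run(use_pool, jobs):
    t = time.time()
    if use_pool:
        with Pool(processes=4) as pp:
            results = pp.map(cpu_heavy, jobs)
    else:
        results = list(map(cpu_heavy, jobs))
    return results, time.time() - t

if __name__ == "__main__":
    jobs = [2_000_000] * 8
    seq, t_seq = run(False, jobs)
    par, t_par = run(True, jobs)
    assert seq == par
    print("sequential: %.2fs, pool: %.2fs" % (t_seq, t_par))
```

On a multi-core machine the pooled run should finish noticeably faster here, because each job burns CPU instead of waiting on the same disk; that is the opposite of the file-reading case in the question.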

You may reduce the time with two processes if you copy files from one drive to another. The same can apply to mounted network drives (NAS).
