About using multiprocessing to read files
Question
I have many files in a folder, so I thought I should use multiprocessing to read the txt files in it. But when I compared the run time with and without multiprocessing, I found that not using Pool is faster, and I don't know why. In what situation should I use Pool to read files (huge files?)
Using Pool: 0.5836s
Not using Pool: 0.0076s
The code is:
import glob2
import os
import time
from multiprocessing import Pool

import pandas as pd


class PandasReadFile:
    def __init__(self):
        print('123')

    def readFilePool(self, path):
        t = time.time()
        print(t)
        # Note: processes=1 starts a single worker, so nothing runs in parallel.
        pp = Pool(processes=1)
        # here is using pool
        df = pd.concat(pp.map(self.read_csv, glob2.iglob(os.path.join(path, "*.txt"))))
        # not using pool
        # df = pd.concat(map(pd.read_csv, glob2.iglob(os.path.join(path, "*.txt"))))
        t = time.time() - t
        print('%.4fs' % (t))
        print(df)
        # Shut the worker down once the timing has been taken.
        pp.close()
        pp.join()

    @staticmethod
    def read_csv(filename):
        return pd.read_csv(filename)


if __name__ == '__main__':
    p = PandasReadFile()
    p.readFilePool('D:/')
Recommended answer
You can spawn as many processes as you want, but since they all read from the same hard drive, you won't reduce the time. Worse: you will lose time.
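To see how the total splits between starting the workers and actually reading the files, the two can be timed separately. The following is only a rough sketch, not the asker's code: it uses plain file reads from the standard library instead of pandas, the helpers `make_files` and `read_size` are hypothetical names introduced for illustration, and the sample files are tiny and created in a temporary directory. The first `map` call may still include part of the worker startup, so the numbers are indicative at best.

```python
import os
import tempfile
import time
from multiprocessing import Pool


def make_files(folder, count=20):
    """Create `count` small text files (made-up sample data) and return their paths."""
    paths = []
    for i in range(count):
        path = os.path.join(folder, "sample_%d.txt" % i)
        with open(path, "w") as handle:
            handle.write("a,b\n1,2\n3,4\n")
        paths.append(path)
    return paths


def read_size(path):
    """Read one file and return the number of characters read."""
    with open(path) as handle:
        return len(handle.read())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        paths = make_files(folder)

        # Plain sequential read in the current process.
        start = time.perf_counter()
        serial = list(map(read_size, paths))
        print("serial read:  %.4fs" % (time.perf_counter() - start))

        # Time the creation of the worker processes on its own.
        start = time.perf_counter()
        pool = Pool(processes=2)
        print("pool startup: %.4fs" % (time.perf_counter() - start))

        # Time the reads dispatched through the pool.
        start = time.perf_counter()
        pooled = pool.map(read_size, paths)
        print("pooled read:  %.4fs" % (time.perf_counter() - start))
        pool.close()
        pool.join()

        # Both approaches must return the same result.
        assert serial == pooled
```

Comparing the three printed times on your own machine and files shows whether the reads themselves got any faster once the cost of creating the pool is set aside.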
You can use multiprocessing for CPU-intensive tasks, not for IO-intensive tasks.
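As an illustration of the CPU-intensive case, here is a minimal sketch that runs the same computation serially and through a `Pool`. The function `count_primes` and the workload sizes are made up for this example; whether the pooled run actually comes out faster depends on the number of cores and on how long each task takes relative to the cost of starting the workers.

```python
import time
from multiprocessing import Pool


def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


if __name__ == "__main__":
    limits = [50_000] * 8  # eight equal-sized chunks of work (arbitrary size)

    # Run every chunk one after another in the current process.
    start = time.perf_counter()
    serial = [count_primes(limit) for limit in limits]
    print("serial:   %.4fs" % (time.perf_counter() - start))

    # Spread the chunks over worker processes (the default worker count
    # is based on the number of CPUs).
    start = time.perf_counter()
    with Pool() as pool:
        parallel = pool.map(count_primes, limits)
    print("parallel: %.4fs" % (time.perf_counter() - start))

    # Both approaches must return the same counts.
    assert serial == parallel
```

Here each task spends its time computing rather than waiting on the disk, which is the situation the answer describes as suitable for multiprocessing.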
You may reduce the time with two processes if you copy files from one drive to another. It may also work with mounted network drives (NAS).