Multiprocess multiple files in a list
Question
I am trying to read N .csv files, stored in a list, simultaneously.
Right now I do the following:
import multiprocessing

- Empty list
- Append list with listdir of .csv's
- def A() -- even files (list[::2])
- def B() -- odd files (list[1::2])
- Process 1 def A()
- Process 2 def B()
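The even/odd split described in the steps above is ordinary Python slice notation; a minimal illustration (the file names are hypothetical):

```python
# Hypothetical file list for illustration.
file_list = ["a.csv", "b.csv", "c.csv", "d.csv", "e.csv"]

even_files = file_list[::2]   # indices 0, 2, 4, ...
odd_files = file_list[1::2]   # indices 1, 3, ...
```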
import glob
from multiprocessing import Process

file_list = []

def read_all_lead_files(folder):
    for files in glob.glob(folder + "*.csv"):
        file_list.append(files)

def read_even():
    file_list[::2]  # even-indexed files

def read_odd():
    file_list[1::2]  # odd-indexed files

p1 = Process(target=read_even)
p1.start()
p2 = Process(target=read_odd)
p2.start()
Is there a faster way to split up the partitioning of the list among Process functions?
Answer
I'm guessing here at your request, because the original question is quite unclear. Since os.listdir doesn't guarantee an ordering, I'm assuming your "two" functions are actually identical and you just need to perform the same process on multiple files simultaneously.
The easiest way to do this, in my experience, is to spin up a Pool, launch a process for each file, and then wait. For example:
import glob
import multiprocessing

def process(file):
    pass  # do stuff to a file

p = multiprocessing.Pool()
for f in glob.glob(folder + "*.csv"):
    # Submit a task for each file; the pool runs roughly
    # one worker process per available CPU core.
    p.apply_async(process, [f])
p.close()
p.join()  # Wait for all child processes to finish.