Multiprocess multiple files in a list


Problem Description

I am trying to read N .csv files, stored in a list, simultaneously.

Right now I do the following:

import multiprocessing

  1. Empty list
  2. Append list with listdir of .csv's
  3. def A() -- even files (list[::2])
  4. def B() -- odd files (list[1::2])
  5. Process 1 def A()
  6. Process 2 def B()

import glob
from multiprocessing import Process

def read_even(file_list):
    for f in file_list[::2]:    # even-indexed files
        pass                    # read/process the file here

def read_odd(file_list):
    for f in file_list[1::2]:   # odd-indexed files
        pass                    # read/process the file here

def read_all_lead_files(folder):
    file_list = glob.glob(folder + "*.csv")
    p1 = Process(target=read_even, args=(file_list,))
    p1.start()
    p2 = Process(target=read_odd, args=(file_list,))
    p2.start()
    p1.join()   # wait for both workers to finish
    p2.join()


Is there a faster way to split the list into partitions and hand them to the Process functions?

Recommended Answer

I'm guessing here at your request, because the original question is quite unclear. Since os.listdir doesn't guarantee an ordering, I'm assuming your "two" functions are actually identical and you just need to perform the same process on multiple files simultaneously.
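
(If the ordering of the files ever does matter to you, a one-line sketch to make it deterministic before splitting, with folder as in the question:)

file_list = sorted(glob.glob(folder + "*.csv"))  # sorted() gives a stable, deterministic order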

The easiest way to do this, in my experience, is to spin up a Pool, launch a process for each file, and then wait. For example:

import glob
import multiprocessing

def process(file):
    pass  # do stuff to a file

p = multiprocessing.Pool()
for f in glob.glob(folder + "*.csv"):  # folder as in the question
    # launch a process for each file (ish).
    # The result will be approximately one process per CPU core available.
    p.apply_async(process, [f])

p.close()
p.join()  # Wait for all child processes to close.
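
If you also need the parsed data back in the parent process, the same idea works with Pool.map, which partitions the list of files into chunks for you and collects the return values in order. A minimal sketch, where read_csv and the folder path are hypothetical stand-ins for whatever per-file parsing you actually do:

import glob
import multiprocessing

def read_csv(path):
    # Hypothetical per-file work: just return the file's lines.
    with open(path) as fh:
        return fh.readlines()

if __name__ == "__main__":
    folder = "./"  # assumed location of the .csv files
    files = glob.glob(folder + "*.csv")
    with multiprocessing.Pool() as pool:
        # map() splits the file list into chunks, farms them out to the
        # worker processes, and gathers the results in the original order.
        results = pool.map(read_csv, files)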
