parallel file parsing, multiple CPU cores
Question
I asked a related but very general question earlier (see especially this response).
This question is very specific. Here is all the code I care about:
result = {}
for line in open('input.txt'):
    key, value = parse(line)
    result[key] = value
The function parse is completely self-contained (i.e., it doesn't use any shared resources).
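The question does not show parse itself. For concreteness, a self-contained stand-in could look like this (the tab-separated "key\tvalue" format is an assumption, not part of the question):

```python
def parse(line):
    """Hypothetical stand-in: split one tab-separated 'key\\tvalue' line."""
    key, value = line.rstrip('\n').split('\t', 1)
    return key, value
```

Any pure function of one line with this shape would work the same way in the code below.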
I have an Intel i7-920 CPU (4 cores, 8 threads; I think the threads are more relevant, but I'm not sure).
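As a side note, the standard library reports the number of logical CPUs (i.e., the 8 hyper-threads on an i7-920, not the 4 physical cores), which is the usual default for sizing a worker pool:

```python
import os

# os.cpu_count() returns the number of logical CPUs, e.g. 8 on an i7-920
n = os.cpu_count()
print(n)
```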
What can I do to make my program use all the parallel capabilities of this CPU?
I assume I can open this file for reading in 8 different threads without much performance penalty, since disk access time is small relative to the total time.
Answer
CPython does not easily provide the threading model you are looking for: because of the Global Interpreter Lock, only one thread executes Python bytecode at a time, so threads won't speed up CPU-bound parsing. You can get something similar using the multiprocessing module and a process pool.
Such a solution could look something like this:
def worker(lines):
    """Build a dict from the parsed lines this worker is given."""
    result = {}
    for line in lines:  # lines is already a list of strings, no split needed
        k, v = parse(line)
        result[k] = v
    return result
import multiprocessing

if __name__ == '__main__':
    # configurable options; different values may work better
    numworkers = 8
    numlines = 100
    with open('input.txt') as f:
        lines = f.readlines()
    # create the process pool
    pool = multiprocessing.Pool(processes=numworkers)
    # map batches of lines to a list of per-batch result dicts
    result_list = pool.map(worker,
        [lines[i:i + numlines] for i in range(0, len(lines), numlines)])
    # reduce the per-batch dicts into a single dict
    result = {}
    for partial in result_list:
        result.update(partial)
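On current Python 3, the same idea can be written without slicing the line list by hand: Pool.map accepts a chunksize argument that batches items per task for you. A sketch under the same assumptions (parse must be a pure, top-level function; the tab-separated format here is again a hypothetical stand-in):

```python
import multiprocessing


def parse(line):
    """Hypothetical stand-in: split one tab-separated 'key\\tvalue' line."""
    key, value = line.rstrip('\n').split('\t', 1)
    return key, value


def build_dict(path, processes=8):
    """Parse every line of the file at `path` across a pool of processes."""
    with open(path) as f:
        lines = f.readlines()
    with multiprocessing.Pool(processes=processes) as pool:
        # chunksize batches lines per task, amortizing the IPC overhead
        pairs = pool.map(parse, lines, chunksize=100)
    return dict(pairs)
```

The IPC cost of shipping each line to a worker and the result back is real; whether this beats the single-process loop depends on how expensive parse is, so it is worth measuring on the actual workload.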