Read txt files with multiple threads in Python

Views: 1197 | Published: 2020/5/13 21:04:00 | Tags: python, multithreading, text-files

Question

I'm trying to read a file in Python (scan its lines and look for terms) and write the results, say, a counter for each term. I need to do that for a large number of files (more than 3000). Is it possible to do that with multiple threads? If yes, how?

So the scenario is as follows:

Read each file and scan its lines.
Write the counters for all the files I have read to the same output file.

My second question: does this improve the read/write speed?

Hope this is clear enough. Thanks,

Ron.

Recommended answer

I agree with @aix: multiprocessing is definitely the way to go. Either way, you will be I/O bound: you can only read so fast, no matter how many parallel processes you have running. Still, some speedup is easy to get.

Consider the following (input/ is a directory that contains several .txt files from Project Gutenberg).

import os
import time
from multiprocessing import Pool

def process_file(name):
    ''' Process one file: count number of lines and words '''
    linecount = 0
    wordcount = 0
    # errors='replace' keeps a badly encoded byte from aborting the whole run
    with open(name, 'r', errors='replace') as inp:
        for line in inp:
            linecount += 1
            # split() without arguments ignores repeated spaces and the newline
            wordcount += len(line.split())

    return name, linecount, wordcount

def list_files(top):
    ''' Collect the paths of all files below the directory top '''
    return [os.path.join(dirname, name)
            for dirname, _, names in os.walk(top)
            for name in names]

def process_files_parallel(paths):
    ''' Process the files in parallel via Pool.map() '''
    with Pool() as pool:
        return pool.map(process_file, paths)

def process_files(paths):
    ''' Process the files one after another via map() '''
    # list() forces the lazy map() to actually do the work
    return list(map(process_file, paths))

if __name__ == '__main__':
    paths = list_files('input/')

    start = time.time()
    process_files(paths)
    print("process_files()", time.time() - start)

    start = time.time()
    process_files_parallel(paths)
    print("process_files_parallel()", time.time() - start)

When I run this on my dual-core machine there is a noticeable (but not 2x) speedup:

$ python process_files.py
process_files() 1.71218085289
process_files_parallel() 1.28905105591

If the files are small enough to fit in memory, and you have a lot of processing to do that isn't I/O bound, then you should see an even better improvement.
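The question also asks for per-term counters from all files to end up in one output file, which the timing script above does not cover. The following is a minimal sketch of one possible way to do that, not part of the original answer: the term list TERMS, the output name counts.csv, and the helper names count_terms, write_counts and main are all hypothetical. Workers only return their counts, and the parent process alone writes the file, so no locking is needed.

```python
import csv
import glob
import os
from collections import Counter
from concurrent.futures import ProcessPoolExecutor
from functools import partial

TERMS = ("whale", "ship", "sea")  # hypothetical search terms


def count_terms(path, terms=TERMS):
    """Count occurrences of each term in one file (runs inside a worker)."""
    counts = Counter()
    with open(path, "r", encoding="utf-8", errors="replace") as inp:
        for line in inp:
            # Simplification: only whole, whitespace-separated, lowercased
            # tokens are matched; punctuation is not stripped.
            for word in line.lower().split():
                if word in terms:
                    counts[word] += 1
    return path, counts


def write_counts(paths, out_path, executor, terms=TERMS):
    """Fan the files out to workers; only the caller writes the output file."""
    worker = partial(count_terms, terms=terms)
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["file"] + list(terms))
        # executor.map() yields results in the same order as paths
        for path, counts in executor.map(worker, paths):
            writer.writerow([path] + [counts[t] for t in terms])


def main():
    # Assumed layout: the same input/ directory as in the script above
    paths = sorted(glob.glob(os.path.join("input", "*.txt")))
    with ProcessPoolExecutor() as executor:
        write_counts(paths, "counts.csv", executor)

# When run as a script, call main() under an `if __name__ == "__main__":` guard.
```

Because write_counts only relies on executor.map(), a concurrent.futures.ThreadPoolExecutor can be passed in instead to try the thread-based variant the question asks about; which of the two is faster depends on how much of the time goes to disk reads versus scanning the text.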
