Reading a large number of files quickly

Question

I have a large number of (>100k) relatively small files (1kb - 300kb) that I need to read in and process. I'm currently looping through all the files and using File.ReadAllText to read the content, processing it, and then reading the next file. This is quite slow and I was wondering if there is a good way to optimize it.

I have already tried using multiple threads, but since this seems to be IO-bound I didn't see any improvement.

Answer

You're most likely correct: reading that many files is probably going to limit your potential speedups, since disk I/O will be the limiting factor.

That being said, you can very likely gain a small improvement by moving the processing of the data onto a separate thread.

I would recommend trying to have a single "producer" thread that reads your files. This thread will be IO limited. As it reads a file, it can push the "processing" into a ThreadPool thread (.NET 4 tasks work great for this too) in order to do the processing, which would allow it to immediately read the next file.
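A minimal sketch of that producer/ThreadPool pattern might look like the following. The directory path, file filter, and the body of `Process` are placeholders for your own logic; this assumes .NET 4 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

class FileProcessor
{
    static void Main()
    {
        var tasks = new List<Task>();

        // Single "producer" loop: only this thread touches the disk,
        // so the reads stay sequential and IO-bound.
        foreach (var path in Directory.EnumerateFiles(@"C:\data", "*.txt"))
        {
            // Copy the loop variable so the closure below captures a
            // per-iteration value (matters on older C# compilers).
            string localPath = path;
            string content = File.ReadAllText(localPath);   // IO-bound step

            // Hand the CPU-bound work to the ThreadPool so this loop
            // can immediately start reading the next file.
            tasks.Add(Task.Factory.StartNew(() => Process(localPath, content)));
        }

        Task.WaitAll(tasks.ToArray());
    }

    static void Process(string path, string content)
    {
        // Placeholder for the real per-file processing.
        Console.WriteLine("{0}: {1} chars", path, content.Length);
    }
}
```

If the files are large enough that holding many contents in flight becomes a memory concern, a bounded `BlockingCollection<T>` between the reader and the workers is a common way to apply back-pressure.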

This will at least take the "processing time" out of the total runtime, making the total time for your job nearly as fast as the disk IO, provided you've got an extra core or two to work with...
