How to parallelize file reading and writing


Question

I have a program that reads data from two text files and then saves the result to another file. Because there is a lot of data to read and write, which causes a performance hit, I want to parallelize the reading and writing operations.

My initial thought, using two threads as an example, is to have one thread read/write from the beginning of the file and another read/write from the middle. Since my files are organized as lines rather than fixed-size records (each line may contain a different number of bytes), seeking by byte offset does not work for me. The only solution I could think of is to use getline() to skip over the preceding lines first, which is probably not efficient.

Is there a good way to seek to a specified line in a file? Or do you have any other ideas for parallelizing file reading and writing?

Environment: Win32, C++, NTFS, single hard disk

Thanks.

-Dbger

Recommended answer

Generally speaking, you do NOT want to parallelize disk I/O. Hard disks do not like random I/O because they have to seek continuously to get to the data. Assuming you are not using RAID, and you are using hard drives rather than solid-state storage, you will see severe performance degradation if you parallelize I/O. (Even with technologies like those, you can still see some performance degradation when doing lots of random I/O.)
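One way to follow this advice while still using several threads is to keep the disk access on a single thread and parallelize only the in-memory work. The sketch below assumes the per-line work is CPU-bound enough to be worth splitting; `read_all_lines`, `process_line`, and `process_in_parallel` are hypothetical names, and `process_line` is only a placeholder for the real computation. If the program is purely I/O-bound, this will not help much.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

// Read every line in one sequential pass; only this function touches the disk.
std::vector<std::string> read_all_lines(const std::string& path) {
    std::ifstream in(path);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}

// Hypothetical per-line work; stands in for whatever the real program computes.
std::size_t process_line(const std::string& line) {
    return line.size();
}

// Split the in-memory lines into contiguous chunks and process each chunk
// on its own thread. Each thread writes to a disjoint range of `results`.
std::vector<std::size_t> process_in_parallel(const std::vector<std::string>& lines,
                                             unsigned workers) {
    std::vector<std::size_t> results(lines.size());
    if (workers == 0) {
        workers = 1;
    }
    const std::size_t chunk = (lines.size() + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(lines.size(), w * chunk);
        const std::size_t end = std::min(lines.size(), begin + chunk);
        if (begin == end) {
            break;  // no lines left for this worker
        }
        pool.emplace_back([&lines, &results, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                results[i] = process_line(lines[i]);
            }
        });
    }
    for (auto& t : pool) {
        t.join();
    }
    return results;
}
```

The results can then be written out by a single thread in one sequential pass, so the disk never has to seek back and forth between competing readers and writers.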

To answer your second question, there really isn't a good way to seek to a certain line in a file. You can only seek explicitly to a byte offset, for example with seekg on a C++ stream, and then read from there (see this page for more details on how to use it).
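Because only byte offsets can be seeked to, a common workaround is to scan the file once, record the byte offset at which each line starts, and then seek to those offsets later. The following is a minimal sketch of that idea, not something taken from the answer itself; `build_line_index` and `read_line_at` are hypothetical helper names. The file is opened in binary mode so that the offsets match the bytes on disk.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Scan the file once, sequentially, and record where each line begins.
std::vector<std::streamoff> build_line_index(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<std::streamoff> offsets;
    std::string line;
    std::streamoff pos = 0;
    while (std::getline(in, line)) {
        offsets.push_back(pos);
        // In binary mode getline consumes the line plus one '\n' byte.
        pos += static_cast<std::streamoff>(line.size()) + 1;
    }
    return offsets;
}

// Jump straight to line `n` (0-based) using the precomputed index.
std::string read_line_at(const std::string& path,
                         const std::vector<std::streamoff>& offsets,
                         std::size_t n) {
    if (n >= offsets.size()) {
        throw std::out_of_range("line number past end of file");
    }
    std::ifstream in(path, std::ios::binary);
    in.seekg(offsets[n], std::ios::beg);
    std::string line;
    std::getline(in, line);
    // Files with Windows line endings leave a trailing '\r' in binary mode.
    if (!line.empty() && line.back() == '\r') {
        line.pop_back();
    }
    return line;
}
```

Building the index still costs one full pass over the file, so it only pays off when the same file is accessed by line number more than once.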
