Writing a file using multiple threads

Question

I am trying to write a single huge file in Java using multiple threads.

I have tried both the FileWriter and BufferedWriter classes in Java.

The content being written is actually an entire Postgres table, read using CopyManager and written out. Each line in the file is a single tuple from the table, and I am writing hundreds of lines at a time.

Approach to writing:

The single to-be-written file is opened by multiple threads in append mode. Each thread thereafter tries writing to the file.

Following are the issues I am facing:

  • Once in a while, the contents of the file get overwritten, i.e. one line remains incomplete and the next line starts right there. My assumption is that the writer's buffer fills up, which forces the writer to write the data to the file immediately. The data written may not be a complete line, and before the writer can write the remainder, the next thread writes its content to the file.
  • While using FileWriter, once in a while I see a single blank line in the file.

Any suggestions on how to avoid this data integrity issue?

Recommended answer

Shared Resource == Contention

Writing to a normal file is by definition a serialized operation. You gain no performance by trying to write to it from multiple threads: I/O is a finite, bounded resource with orders of magnitude less bandwidth than even the slowest or most overloaded CPU.

If you have multiple threads doing expensive calculations, then you have options. If you are using multiple threads only because you think it will speed up the writing, you will achieve the opposite. Contention for I/O always slows down access to the resource; it never speeds it up, because of lock waits and other overhead.

You need a protected critical section that allows only a single writer at a time. Look at the source code of any logging writer that supports concurrency and you will see that only a single thread writes to the file.
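
As a rough illustration of such a critical section, here is a minimal sketch in which every thread shares one writer object and whole lines are written under a single lock. The class name SynchronizedLineWriter and its writeLines method are made up for this example and are not from the question; the sketch also assumes all threads share the same instance instead of each opening the file separately.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: one shared writer guarded by one lock.
final class SynchronizedLineWriter implements AutoCloseable {
    private final BufferedWriter out;

    SynchronizedLineWriter(Path file) throws IOException {
        this.out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
    }

    // Critical section: a batch of lines is written by one thread at a time,
    // so lines from different threads cannot interleave mid-line.
    synchronized void writeLines(List<String> lines) throws IOException {
        for (String line : lines) {
            out.write(line);
            out.newLine();
        }
    }

    @Override
    public synchronized void close() throws IOException {
        out.close();
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("shared", ".txt");
        try (SynchronizedLineWriter writer = new SynchronizedLineWriter(file)) {
            Thread[] threads = new Thread[4];
            for (int t = 0; t < threads.length; t++) {
                final int id = t;
                threads[t] = new Thread(() -> {
                    try {
                        for (int i = 0; i < 1000; i++) {
                            writer.writeLines(List.of("thread-" + id + " row-" + i));
                        }
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                threads[t].start();
            }
            for (Thread thread : threads) {
                thread.join();
            }
        }
        System.out.println(Files.readAllLines(file).size() + " lines written to " + file);
        Files.delete(file);
    }
}
```

The lock keeps the output intact, but the threads still queue up behind it, so this is about correctness and not speed.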

If your application is primarily:

  1. CPU Bound: You can use some locking mechanism/data construct to let only one of the threads write to the file at a time. As a naive solution this is useless from a concurrency standpoint, but if the threads are CPU bound with little I/O it might work.

  2. I/O Bound: This is the most common case. Use a message-passing system with a queue of some sort: have all the threads post to the queue/buffer and have a single thread pull from it and write to the file. This will be the most scalable and easiest solution to implement.
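
The queue-based option could look roughly like the sketch below, which uses a bounded BlockingQueue from java.util.concurrent and one dedicated writer thread. The class name QueuedFileWriter, the submit and finish methods, the queue capacity of 1024, and the end-of-input sentinel are all assumptions made for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical helper: many producers post lines, one thread writes them.
final class QueuedFileWriter {
    // Sentinel that tells the writer thread to stop; compared by identity.
    private static final String POISON = new String("<<end-of-input>>");

    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
    private final Thread writerThread;
    private volatile IOException failure;

    QueuedFileWriter(Path file) {
        writerThread = new Thread(() -> drainTo(file), "file-writer");
        writerThread.start();
    }

    // Called by any number of producer threads; blocks while the queue is full.
    void submit(String line) throws InterruptedException {
        queue.put(line);
    }

    // Call once all producers are done; waits until everything is written.
    void finish() throws InterruptedException, IOException {
        queue.put(POISON);
        writerThread.join();
        if (failure != null) {
            throw failure;
        }
    }

    // Runs on the single writer thread: the only code that touches the file.
    private void drainTo(Path file) {
        try (BufferedWriter out = Files.newBufferedWriter(file)) {
            for (String line = queue.take(); line != POISON; line = queue.take()) {
                out.write(line);
                out.newLine();
            }
        } catch (IOException e) {
            failure = e;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("queued", ".txt");
        QueuedFileWriter writer = new QueuedFileWriter(file);
        Thread[] producers = new Thread[4];
        for (int t = 0; t < producers.length; t++) {
            final int id = t;
            producers[t] = new Thread(() -> {
                try {
                    for (int i = 0; i < 1000; i++) {
                        writer.submit("producer-" + id + " row-" + i);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            producers[t].start();
        }
        for (Thread producer : producers) {
            producer.join();
        }
        writer.finish();
        System.out.println(Files.readAllLines(file).size() + " lines written to " + file);
        Files.delete(file);
    }
}
```

The bounded queue gives back-pressure: producers slow down when the disk cannot keep up. This sketch does not unblock producers if the writer thread dies on an I/O error, which a real implementation would need to handle.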

Journaling - Async Writes

If you need to create a single very large file where the order of writes is unimportant and the program is CPU bound, you can use a journaling technique.

Have each process write to a separate file and then concatenate the files into a single large file at the end. This is a very old-school, low-tech solution that works well and has for decades.

Obviously, the more storage I/O bandwidth you have, the better the final concatenation will perform.
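
A minimal sketch of the separate-files-then-concatenate idea follows, using threads in place of processes. The class name PartFileMerge, the concat method, and the part-N.txt file naming are hypothetical; the merge step uses FileChannel.transferTo, which is one reasonable way to append files but not the only one.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: each worker writes its own part file, then parts are merged.
final class PartFileMerge {

    // Appends every part file to target, in list order.
    static void concat(List<Path> parts, Path target) throws IOException {
        try (FileChannel out = FileChannel.open(target,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (Path part : parts) {
                try (FileChannel in = FileChannel.open(part, StandardOpenOption.READ)) {
                    long size = in.size();
                    long position = 0;
                    // transferTo may copy fewer bytes than requested, so loop.
                    while (position < size) {
                        position += in.transferTo(position, size - position, out);
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("parts");
        List<Path> parts = new ArrayList<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < 4; t++) {
            final int id = t;
            Path part = dir.resolve("part-" + id + ".txt");
            parts.add(part);
            // Each worker owns its file, so no locking is needed while writing.
            Thread worker = new Thread(() -> {
                try (BufferedWriter w = Files.newBufferedWriter(part)) {
                    for (int i = 0; i < 1000; i++) {
                        w.write("worker-" + id + " row-" + i);
                        w.newLine();
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        Path merged = dir.resolve("merged.txt");
        concat(parts, merged);
        System.out.println(Files.readAllLines(merged).size() + " lines in " + merged);
    }
}
```

Because the merge copies whole files, lines are never split across workers; the cost is temporary disk space for the part files plus one extra sequential pass.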
