Writing a file using multiple threads


Problem description



    I am trying to write a single huge file in Java using multiple threads.

    I have tried both the FileWriter and BufferedWriter classes in Java.

    The content being written is an entire Postgres table, read using CopyManager. Each line in the file is a single tuple from the table, and I am writing hundreds of lines at a time.

    Approach to write:

    The single to-be-written file is opened by multiple threads in append mode. Each thread then tries to write to the file.

    Following are the issues I face:

    • Once in a while, the contents of the file get overwritten, i.e. one line remains incomplete and the next line starts right where it stopped. My assumption is that the writer's buffer fills up, which forces the writer to flush its data to the file immediately. The flushed data may not end on a complete line, and before the writer can write the remainder, another thread writes its content to the file.
    • While using FileWriter, once in a while I see a single blank line in the file.

    Any suggestions on how to avoid this data integrity issue?

    Solution

    Shared Resource == Contention

    Writing to a normal file is by definition a serialized operation. You gain no performance by trying to write to it from multiple threads: I/O is a finite, bounded resource with orders of magnitude less bandwidth than even the slowest or most overloaded CPU.

    Concurrent access to a shared resource can be complicated (and slow)

    If you have multiple threads that are doing expensive calculations, then you have options. If you are using multiple threads only because you think they will speed up the writing, you will achieve the opposite. Contention for I/O always slows down access to the resource; it never speeds it up, because of the lock waits and other overhead.

    You have to have a protected critical section that allows only a single writer at a time. Look at the source code of any logging writer that supports concurrency and you will see that only a single thread writes to the file.
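
The critical section described above can be sketched roughly as follows. This is a minimal illustration rather than the answerer's own code: the class name LockedLineWriter, its writeBatch method, and the demo helper are hypothetical names invented for this example. It assumes every thread shares one writer object and that each batch of rows is written while holding the lock, so a buffer flush cannot split a line between two threads.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper: one shared writer whose writes are guarded by a lock. */
final class LockedLineWriter implements AutoCloseable {
    private final BufferedWriter out;

    LockedLineWriter(Path file) throws IOException {
        this.out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
    }

    /** Writes a whole batch inside one critical section so lines never interleave. */
    synchronized void writeBatch(List<String> lines) throws IOException {
        for (String line : lines) {
            out.write(line);
            out.newLine();
        }
    }

    @Override
    public synchronized void close() throws IOException {
        out.close();
    }

    /** Demo: several threads share one writer; returns the lines found in the file. */
    static List<String> demo(int threads, int batchesPerThread) {
        try {
            Path file = Files.createTempFile("locked-writer", ".txt");
            try (LockedLineWriter writer = new LockedLineWriter(file)) {
                Thread[] workers = new Thread[threads];
                for (int t = 0; t < threads; t++) {
                    final int id = t;
                    workers[t] = new Thread(() -> {
                        try {
                            for (int b = 0; b < batchesPerThread; b++) {
                                String prefix = "t" + id + "-b" + b;
                                writer.writeBatch(List.of(prefix + "-row0", prefix + "-row1"));
                            }
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    });
                    workers[t].start();
                }
                for (Thread worker : workers) {
                    worker.join();
                }
            }
            List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            Files.delete(file);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Calling LockedLineWriter.demo(4, 50) should yield 400 intact lines in an unpredictable batch order. The lock keeps the data consistent, but the threads still queue up behind each other for the file, so this buys correctness rather than speed.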

    If your application is primarily:

    1. CPU bound: You can use a locking mechanism or data construct to let only one thread at a time write to the file. As a naive solution this gains nothing from a concurrency standpoint, but if the threads are CPU bound with little I/O it might work.

    2. I/O bound: This is the most common case. Use a message-passing system with a queue of some sort: have all the threads post to a queue/buffer and have a single thread pull from it and write to the file. This will be the most scalable and easiest to implement solution.
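
The queue-based option in item 2 might look something like the sketch below. It is an illustration under assumptions, not the answerer's implementation: QueuedFileWriter, its demo method, the queue capacity of 1024, and the sentinel value are all invented for this example. Producer threads only put lines on a java.util.concurrent.BlockingQueue, and one dedicated thread is the sole owner of the file.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch: many producers, one thread that owns the file. */
final class QueuedFileWriter {
    // Sentinel meaning "no more data"; compared by identity, never written out.
    private static final String POISON = new String("<<end>>");

    static List<String> demo(int producers, int linesPerProducer) {
        try {
            Path file = Files.createTempFile("queued-writer", ".txt");
            // Bounded queue: producers block when the writer falls behind.
            BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);

            // The only thread that ever touches the file.
            // Note: if this thread dies early, producers could block forever on put();
            // a production version would need to handle that case.
            Thread writer = new Thread(() -> {
                try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                    for (String line = queue.take(); line != POISON; line = queue.take()) {
                        out.write(line);
                        out.newLine();
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            writer.start();

            Thread[] workers = new Thread[producers];
            for (int p = 0; p < producers; p++) {
                final int id = p;
                workers[p] = new Thread(() -> {
                    try {
                        for (int i = 0; i < linesPerProducer; i++) {
                            queue.put("p" + id + "-row" + i);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                workers[p].start();
            }
            for (Thread worker : workers) {
                worker.join();
            }
            queue.put(POISON); // all producers are done; tell the writer to stop
            writer.join();

            List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            Files.delete(file);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Because only the writer thread calls write, no lock around the file is needed, and each queued string reaches the file as one complete line. In the question's setting, each producer would presumably enqueue rows read through CopyManager instead of the generated strings used here.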

    Journaling - Async Writes

    If you need to create a single very large file where the order of writes is unimportant and the program is CPU bound, you can use a journaling technique.

    Have each process write to a separate file and then concatenate the multiple files into a single large file at the end. This is a very old-school, low-tech solution that has worked well for decades.

    Obviously, the more storage I/O bandwidth you have, the better the final concatenation will perform.
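
A rough sketch of the separate-files approach follows, using threads rather than processes to match the question. The names PartFileWriter, concatenate, and demo are hypothetical, and using FileChannel.transferTo for the final step is just one possible way to join the parts. Each worker writes its own part file with no shared state, and the parts are appended to the target in order once all workers have finished.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one part file per worker, concatenated at the end. */
final class PartFileWriter {

    /** Appends each part file, in list order, to the target file. */
    static void concatenate(List<Path> parts, Path target) throws IOException {
        try (FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (Path part : parts) {
                try (FileChannel in = FileChannel.open(part, StandardOpenOption.READ)) {
                    long size = in.size();
                    long position = 0;
                    // transferTo may copy fewer bytes than requested, so loop until done.
                    while (position < size) {
                        position += in.transferTo(position, size - position, out);
                    }
                }
            }
        }
    }

    /** Demo: each thread writes its own part; returns the lines of the merged file. */
    static List<String> demo(int threads, int linesPerThread) {
        try {
            List<Path> parts = new ArrayList<>();
            Thread[] workers = new Thread[threads];
            for (int t = 0; t < threads; t++) {
                final int id = t;
                final Path part = Files.createTempFile("part-" + id + "-", ".txt");
                parts.add(part);
                workers[t] = new Thread(() -> {
                    List<String> rows = new ArrayList<>();
                    for (int i = 0; i < linesPerThread; i++) {
                        rows.add("p" + id + "-row" + i);
                    }
                    try {
                        Files.write(part, rows, StandardCharsets.UTF_8); // no sharing, no locks
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                workers[t].start();
            }
            for (Thread worker : workers) {
                worker.join();
            }
            Path target = Files.createTempFile("merged-", ".txt");
            concatenate(parts, target);
            List<String> lines = Files.readAllLines(target, StandardCharsets.UTF_8);
            for (Path part : parts) {
                Files.delete(part);
            }
            Files.delete(target);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

With this layout the merged file contains all of part 0, then all of part 1, and so on, which is only acceptable because the order of rows is stated to be unimportant.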
