Performance / stability of a memory-mapped file (native or MappedByteBuffer) vs. plain ol' FileOutputStream


Question

I support a legacy Java application that uses flat files (plain text) for persistence. Due to the nature of the application, these files can grow by hundreds of MB per day, and file IO is often the limiting factor in application performance. Currently, the application uses a plain ol' java.io.FileOutputStream to write data to disk.

Recently, several developers have asserted that using memory-mapped files, implemented in native code (C/C++) and accessed via JNI, would provide greater performance. However, FileOutputStream already uses native methods for its core operations (e.g. write(byte[])), so without hard data, or at least anecdotal evidence, this seems a tenuous assumption.
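To make the baseline concrete, here is a minimal sketch of the write pattern described above, assuming (hypothetically) that each record is written with its own `write(byte[])` call. Nothing is buffered on the Java side, so each call becomes one native write; with many small records, that per-call overhead can become significant. The class and method names are illustrative, not taken from the actual application.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class PlainWriter {
    // Appends each record with its own FileOutputStream.write(byte[]) call.
    // There is no Java-side buffer, so every record becomes a separate native write.
    static void writeRecords(Path path, List<String> records) throws IOException {
        try (FileOutputStream out = new FileOutputStream(path.toFile(), true)) { // true = append
            for (String rec : records) {
                out.write((rec + "\n").getBytes(StandardCharsets.UTF_8));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("plain", ".txt");
        writeRecords(file, List.of("alpha", "beta", "gamma"));
        // Prints alpha, beta and gamma on separate lines.
        System.out.print(Files.readString(file, StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```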

I have several questions about this:

1. Is this assertion really true? Will memory-mapped files always provide faster IO than Java's FileOutputStream?

2. Does MappedByteBuffer, obtained from a FileChannel, provide the same functionality as a native memory-mapped file library accessed via JNI? What does MappedByteBuffer lack that might lead you to use a JNI solution?

3. What are the risks of using memory-mapped files for disk IO in a production application, that is, one with continuous uptime and minimal restarts (once a month at most)? Real-life anecdotes from production applications (Java or otherwise) are preferred.
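For reference, this is roughly what the pure-Java route from question 2 looks like: a `MappedByteBuffer` obtained from `FileChannel.map`, written with `put` and flushed with `force`. It is a minimal sketch, not a production writer, and the helper name `writeMapped` is hypothetical. It also illustrates some operational differences relevant to question 3: the mapped region must be sized up front (a single mapping is limited to `Integer.MAX_VALUE` bytes, so 1 GB files fit but larger ones need several mappings), the file on disk becomes as large as the reserved region rather than the data actually written, and the JDK offers no public way to unmap a buffer before it is garbage-collected.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class MappedWriter {
    // Writes `data` through a read/write mapping of `reserve` bytes and returns
    // the resulting file size. The region size must be fixed before mapping and
    // may not exceed Integer.MAX_VALUE bytes for a single MappedByteBuffer.
    static long writeMapped(Path path, byte[] data, long reserve) throws IOException {
        if (data.length > reserve) {
            throw new IllegalArgumentException("data does not fit in the reserved region");
        }
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw")) {
            // Pre-size the file to the whole region. The Javadoc leaves the contents of
            // the extended part undefined; on common filesystems it reads as zeros.
            raf.setLength(reserve);
            FileChannel ch = raf.getChannel();
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, reserve);
            buf.put(data); // plain memory writes; the OS decides when pages reach disk
            buf.force();   // request that modified pages be written to the storage device
            // There is no public unmap: the mapping stays valid until `buf` is
            // garbage-collected, even after the channel is closed.
            return ch.size(); // equals `reserve`, not data.length
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped", ".txt");
        file.toFile().deleteOnExit();
        byte[] data = "alpha\nbeta\n".getBytes(StandardCharsets.US_ASCII);
        long size = writeMapped(file, data, 4096);
        System.out.println(data.length + " bytes written, file size " + size);
    }
}
```

Running `main` prints `11 bytes written, file size 4096`. For a plain-text file, that padded tail has to be trimmed (for example with `FileChannel.truncate`) before readers see the file; if the process dies first, the padding stays. Truncating a file that is still mapped is also platform-dependent (on Windows it generally fails while a mapping is open).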

Question #3 is important. I could partially answer it myself by writing a "toy" application that perf-tests IO using the various options described above, but by posting to SO I'm hoping for real-world anecdotes and data to chew on.
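A starting point for such a "toy" test might look like the sketch below. It times three hypothetical variants writing identical data: an unbuffered `FileOutputStream`, a `FileOutputStream` wrapped in a `BufferedOutputStream`, and a single `MappedByteBuffer`. None of them calls `force` or `sync`, so the timings measure handing data to the OS page cache, not durable writes. A single run like this is also subject to JIT warm-up and caching effects, so its numbers are anecdotal at best; a harness such as JMH is better suited to careful measurement.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class ToyIoBenchmark {
    interface IoTask { void run() throws IOException; }

    // Writes `count` copies of `line`; bufferSize <= 0 means an unbuffered FileOutputStream.
    static void writeStream(Path path, byte[] line, int count, int bufferSize) throws IOException {
        OutputStream raw = new FileOutputStream(path.toFile());
        try (OutputStream out = bufferSize > 0 ? new BufferedOutputStream(raw, bufferSize) : raw) {
            for (int i = 0; i < count; i++) {
                out.write(line);
            }
        }
    }

    // Writes the same bytes through one mapping sized to the exact output length.
    // A single mapping cannot exceed Integer.MAX_VALUE bytes, so larger outputs need several.
    static void writeMapped(Path path, byte[] line, int count) throws IOException {
        long total = (long) line.length * count;
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw")) {
            raf.setLength(total);
            MappedByteBuffer buf = raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, total);
            for (int i = 0; i < count; i++) {
                buf.put(line);
            }
        }
    }

    // Runs one variant and returns elapsed wall-clock milliseconds.
    static long timeMillis(IoTask task) throws IOException {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        byte[] line = new byte[100];
        Arrays.fill(line, (byte) 'x');
        line[line.length - 1] = (byte) '\n';
        int count = 200_000; // about 20 MB per file; scale toward 100 MB to 1 GB to match production

        Path unbuffered = Files.createTempFile("unbuffered", ".txt");
        Path buffered = Files.createTempFile("buffered", ".txt");
        Path mapped = Files.createTempFile("mapped", ".txt");
        for (Path p : new Path[] {unbuffered, buffered, mapped}) {
            p.toFile().deleteOnExit();
        }

        System.out.println("unbuffered FileOutputStream: "
                + timeMillis(() -> writeStream(unbuffered, line, count, 0)) + " ms");
        System.out.println("buffered FileOutputStream (64 KiB): "
                + timeMillis(() -> writeStream(buffered, line, count, 64 * 1024)) + " ms");
        System.out.println("MappedByteBuffer: "
                + timeMillis(() -> writeMapped(mapped, line, count)) + " ms");
    }
}
```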

[EDIT] Clarification: each day of operation, the application creates multiple files ranging in size from 100 MB to 1 GB. In total, the application might write out multiple GB of data per day.

Solution

You might be able to speed things up a bit by examining how your data is buffered during writes. This tends to be application-specific, as you need an idea of the expected data-writing patterns. If data consistency is important, there will be tradeoffs here.
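As an illustration of that buffering and consistency tradeoff, the sketch below appends records through a large `BufferedOutputStream` and periodically calls `flush()` followed by `FileDescriptor.sync()`. The class name, buffer size, and sync interval are all hypothetical values to tune against the real write pattern. `flush()` only moves data from the Java buffer to the OS; `sync()` asks the OS to push it to the device. Syncing more often narrows the window of data lost on a crash but adds latency to those writes; syncing less often does the opposite.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class TunedWriter {
    // Hypothetical knobs, not recommendations: tune both against the real workload.
    static final int BUFFER_SIZE = 256 * 1024;
    static final int SYNC_EVERY = 1_000;

    // Appends records through a large buffer, syncing every SYNC_EVERY records
    // and once more at the end. Returns the number of sync() calls made.
    static int appendRecords(Path path, List<String> records) throws IOException {
        int syncs = 0;
        try (FileOutputStream fos = new FileOutputStream(path.toFile(), true);
             BufferedOutputStream out = new BufferedOutputStream(fos, BUFFER_SIZE)) {
            int sinceSync = 0;
            for (String rec : records) {
                out.write((rec + "\n").getBytes(StandardCharsets.UTF_8));
                if (++sinceSync == SYNC_EVERY) {
                    out.flush();        // Java buffer -> OS page cache
                    fos.getFD().sync(); // OS page cache -> storage device
                    syncs++;
                    sinceSync = 0;
                }
            }
            out.flush();
            fos.getFD().sync();
            syncs++;
        }
        return syncs;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("tuned", ".log");
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 2_500; i++) {
            records.add("record-" + i);
        }
        System.out.println("syncs: " + appendRecords(file, records));
        Files.delete(file);
    }
}
```

With 2,500 records and `SYNC_EVERY = 1000`, `appendRecords` syncs three times: after records 1,000 and 2,000, and once at the end.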

If you are just writing out new data to disk from your application, memory-mapped I/O probably won't help much. I don't see any reason to invest time in a custom-coded native solution; from what you have provided so far, it just seems like too much complexity for your application.

If you are sure you really need better I/O performance (or just O performance, in your case, since you are only writing), I would look into a hardware solution such as a tuned disk array. From a business point of view, throwing more hardware at the problem is often more cost-effective than spending time optimizing software. It is also usually quicker to implement and more reliable.

In general, there are a lot of pitfalls in over-optimizing software. You will introduce new types of problems to your application. You might run into memory issues or GC thrashing, which would lead to more maintenance and tuning. The worst part is that many of these issues will be hard to test before going into production.

If it were my app, I would probably stick with FileOutputStream, possibly with tuned buffering. After that, I'd use the time-honored solution of throwing more hardware at it.
