Using ofstream for buffered text output to gain performance

Question

I need to write a program which will write many characters to an output file. My program will also need to write newlines for better formatting. I understand ofstream is a buffered stream, and that using a buffered stream for file I/O gains performance. However, if we use std::endl the output will be flushed, and we will lose any potential performance gain due to the buffered output.

I suppose if I use '\n' for a new line, the output will only be flushed when I use std::endl. Is this correct? And are there any tricks that can be used to gain performance during file output?

Note: I want to flush the buffered output at the completion of the file write operations. I think in this way I can minimize file I/O and thus can gain performance.

Recommended Answer

Generally, if maximum performance is wanted, the user of stream classes shouldn't mess with the stream's flushing: the streams internally flush their buffer when it is full. This is actually more efficient than waiting until all output is ready, especially with large files, because the buffered data is written while it is still likely to be in memory. If you create a huge buffer and only write it once, the virtual memory system will have put parts of the data onto disc, but not into the file. That data would then need to be read back from disc and written again.

The main point with respect to std::endl is that people abuse it as a line ending, which causes the buffer to flush, and they are unaware of the performance implications. The intention of std::endl is to give people control over flushing files at reasonable points. For this to be effective, they need to know what they are doing. Sadly, too many people who were unaware of what std::endl does advertised it as a line ending, so it is now used in many places where it is plain wrong.

That said, below are a number of things you might want to try to improve performance. I assume you need formatted output (which the use of std::ofstream::write() won't give you).


  • Obviously, don't use std::endl unless you have to. If the writing code already exists and uses std::endl in many places, some of which are possibly outside your control, you can use a filtering stream buffer that keeps an internal buffer of reasonable size and doesn't forward calls to its sync() function to the underlying stream buffer. Although this involves an extra copy, it is better than spurious flushes, which are orders of magnitude more expensive.
  • Although it shouldn't have an effect on std::ofstreams, calling std::ios_base::sync_with_stdio(false) used to affect the performance on some implementations. You'd want to look at using a different IOstream implementation if this has an effect because there are probably more things wrong with respect to performance.
  • Make sure you are using a std::locale whose std::codecvt<...> facet returns true when its always_noconv() is called. This can easily be checked using std::use_facet<std::codecvt<char, char, std::mbstate_t> >(out.getloc()).always_noconv(). You can use std::locale("C") to get hold of a std::locale for which this should be true.
  • Some locale implementations use very inefficient implementations of their numeric facets, and even if those are reasonably good, the default implementation of the std::num_put<char> facet may still do things you don't really need. This matters especially if your numeric formatting is reasonably simple, i.e. you don't keep changing formatting flags, you haven't replaced the mapping of characters (you don't use a funny std::ctype<char> facet), and so on. In that case it may be reasonable to use a custom std::num_put<char> facet: it is fairly easy to create a fast but simple formatting function for integer types, and a good formatting function for floating-point values that doesn't use snprintf() internally.

Some people have suggested using memory-mapped files, but this only works reasonably well when the size of the target file is known in advance. If it is, memory mapping is a great way to improve performance; otherwise it isn't worth the bother. Note that you can use stream formatting with memory-mapped files (or, more generally, with any kind of output interface) by creating a custom std::streambuf that uses the memory-mapping interface. I have found memory mapping sometimes effective when using it with std::istreams. In many cases, though, the differences don't really matter much.

A long time ago I wrote my own IOStreams and locales implementation, which doesn't suffer from some of the performance problems mentioned above (it is available from my site, but it is a bit stale and I haven't touched it for nearly 10 years now). There are still lots of things that could be improved over this implementation, but I don't have an up-to-date implementation which I'd be ready to post somewhere. Soon, hopefully; though that is something I have kept thinking for nearly 10 years...
