When do I use ByteString and when do I not?


Question

I've been making rather poor attempts at the PRIME1 problem on SPOJ. I discovered that using ByteString really helped performance when reading in the problem text. However, using ByteString to write out the results is actually slightly slower than using the Prelude functions. I'm trying to figure out whether I'm doing it wrong or whether this is expected.

I've conducted profiling and timing using (putStrLn.show) and the ByteString equivalents three different ways:

  1. I test each candidate to see if it is prime. If so, I add it to a list and write it out with (putStrLn . show)
  2. I make a list of all primes and write out the list using (putStrLn . unlines. show)
  3. I make a list of all primes and write out the list using map (putStrLn . show)
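For concreteness, the three strategies could be sketched with the Prelude functions roughly as follows. This is only an illustration: `isPrime` is a hypothetical trial-division test standing in for the poster's own check, and the function names are invented here. The expressions in items 2 and 3 are read as `putStr . unlines . map show` and `mapM_ (putStrLn . show)`, which is an assumption about what was meant.

```haskell
import Control.Monad (forM_, when)

-- Hypothetical trial-division test; the original code is not shown in the post.
isPrime :: Int -> Bool
isPrime n
  | n < 2     = False
  | otherwise = all (\d -> n `mod` d /= 0) (takeWhile (\d -> d * d <= n) [2 ..])

-- All primes in an inclusive range, as a (lazy) list.
primesIn :: Int -> Int -> [Int]
primesIn lo hi = filter isPrime [lo .. hi]

-- Strategy 1: test each candidate and print it immediately if it is prime.
printAsFound :: Int -> Int -> IO ()
printAsFound lo hi = forM_ [lo .. hi] $ \n -> when (isPrime n) (print n)

-- Strategy 2: build the list, render it to one String, write it once.
renderAll :: [Int] -> String
renderAll = unlines . map show

printJoined :: Int -> Int -> IO ()
printJoined lo hi = putStr (renderAll (primesIn lo hi))

-- Strategy 3: build the list, then print each element separately.
printMapped :: Int -> Int -> IO ()
printMapped lo hi = mapM_ (putStrLn . show) (primesIn lo hi)

main :: IO ()
main = do
  printAsFound 1 10
  printJoined 1 10
  printMapped 1 10
```

Because Haskell lists are lazy, strategies 2 and 3 do not necessarily materialise the whole list before output starts, so the difference between the three may be smaller than the description suggests.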

I expected numbers 2 and 3 to perform slower, as you are building a list in one function and consuming it in another. By printing the numbers as I generate them, I avoid allocating any memory for the list. On the other hand, you are making a system call with each call to putStrLn. Right? So I tested, and #1 was in fact the fastest.

The best performance was achieved with option #1 and the Prelude ([Char]) functions. I expected that my best performance would be option #1 with ByteString, but this was not the case. I only used lazy ByteStrings, but I didn't think this would matter. Would it?

Some questions:

  • Would you expect ByteStrings to perform better for writing a bunch of Integers to stdout?
  • Am I missing a pattern for generating and writing out the answers that would lead to better performance?
  • If I am only writing out numbers as text, when, if ever, is there a benefit to using ByteString?

My working hypothesis is that writing out Integers with ByteString is slower iff you aren't combining them with other text. If you are combining Integers with [Char], then you'd get better performance working with ByteStrings. I.e., the ByteString rewrite of:

putStrLn $ "the answer is: " ++ (show value)

will be much faster than the version written above. Is this true?
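One way such a rewrite might look is sketched below, using strict `Data.ByteString.Char8`. The name `answerLine` is made up for illustration. Note that `B.pack (show value)` still builds an intermediate String for the number, so this sketch by itself says nothing about whether the rewrite is faster.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical helper: the same line as the String version, as a ByteString.
answerLine :: Int -> B.ByteString
answerLine value = B.append (B.pack "the answer is: ") (B.pack (show value))

main :: IO ()
main = B.putStrLn (answerLine 42)
```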

Thanks for reading!

Answer

Bulk input is usually faster with bytestrings: because the data is dense, there is simply less of it to shuffle from disk into memory.
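As a rough illustration of bulk input, the sketch below reads all of stdin in one go and parses every whitespace-separated integer with `readInt` from `Data.ByteString.Char8`. The helper name `parseInts` is an assumption for this example, not something from the original post.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Maybe (mapMaybe)

-- Parse every whitespace-separated integer; tokens that do not start
-- with a number are silently skipped.
parseInts :: B.ByteString -> [Int]
parseInts = map fst . mapMaybe B.readInt . B.words

main :: IO ()
main = do
  input <- B.getContents          -- one bulk read of all of stdin
  print (sum (parseInts input))   -- placeholder use of the parsed numbers
```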

Writing data as output, however, is a little different. Typically, you're serializing a structure, generating many small writes, so the dense, bulk writes of bytestrings don't help you much in that case. Even regular Strings will do reasonably well at incremental output.

However, all is not lost. We can recover fast bulk writes by efficiently building up bytestrings in memory. This approach is taken by the various *-builder packages.

Instead of converting values to lots of tiny bytestrings and writing them out one at a time, we stream the conversion into an ever-growing buffer and, in turn, write that buffer in one big piece. This results in a lot less IO overhead, and (often significant) performance improvements over string IO.

This kind of approach is taken by e.g. webservers in Haskell, or the efficient HTML system, blaze.
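A minimal sketch of the builder approach, assuming a version of the bytestring package that ships `Data.ByteString.Builder` (the answer itself does not name a specific module). The function `renderNumbers` is hypothetical.

```haskell
import qualified Data.ByteString.Builder as BB
import System.IO (stdout)

-- Describe the whole output as one Builder: each number, then a newline.
-- Nothing is written until the Builder is run.
renderNumbers :: [Int] -> BB.Builder
renderNumbers = foldMap (\n -> BB.intDec n <> BB.char7 '\n')

main :: IO ()
main = BB.hPutBuilder stdout (renderNumbers [2, 3, 5, 7, 11])
```

Here `hPutBuilder` fills the handle's buffer directly instead of emitting one small write per number.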

Also, the performance, even with bulk writes, will depend on the efficiency of whatever conversion function you have between your types and bytestrings. For Integer, you could be simply copying the bit pattern in memory to output, or instead going through some inefficient decoder. As a result, you sometimes have to think a bit about the quality of the encoding function you're using, and not just whether to use Char/String or bytestring IO.
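To make the point about conversion functions concrete, here are two routes from an `Int` to the same bytes. Both helper names are invented for this sketch. The first goes through an intermediate String; the second uses the Builder's own decimal encoder. Which is faster in a given program would need to be measured.

```haskell
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy.Char8 as BL

-- Route 1: show produces a String (a list of Char), which is then packed.
viaString :: Int -> BL.ByteString
viaString = BL.pack . show

-- Route 2: the Builder encodes the decimal digits without building a String.
viaBuilder :: Int -> BL.ByteString
viaBuilder = BB.toLazyByteString . BB.intDec

main :: IO ()
main = do
  BL.putStrLn (viaString 12345)
  BL.putStrLn (viaBuilder 12345)
  print (viaString (-42) == viaBuilder (-42))
```

For `Integer` rather than `Int`, `Data.ByteString.Builder` also provides `integerDec`.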
