Haskell中有效的字符串实现 [英] Efficient String Implementation in Haskell

查看:105
本文介绍了Haskell中有效的字符串实现的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述



Haskell中默认的字符串实现是Char列表。根据真实世界Haskell ,因为每个字符都是分开分配的(我认为这意味着一个字符串基本上是Haskell中的一个链表,但我不确定。)



但是,如果默认的字符串实现对于文件I / O来说效率低下,那么在内存中使用字符串的效率也是低效的吗?为什么或者为什么不? C使用一个char数组来表示一个String,并且我认为这是大多数语言中默认的做事方式。



正如我所看到的,列表String的实现会占用更多的内存,因为每个字符都需要开销,并且还有更多的时间来迭代,因为需要指针解引用来获取下一个字符。但我很喜欢Haskell,所以我想相信默认实现是有效的。

在Haskell中使用字符串基本上是:使用Data.ByteString / Data.ByteString.Lazy。



http://hackage.haskell.org/packages/archive/bytestring/latest/doc/html/






就Haskell中默认字符串实现的效率而言,它不是。每个 Char 代表一个Unicode代码点,这意味着每个 Char 至少需要21位。由于字符串只是 [Char] ,所以 $ b 这是 Char 的链表,它意味着 String 具有较差的引用位置,并且又意味着 String s在内存中相当大,至少是 N *(21bits + Mbits)其中N是字符串和M是一个指针的大小(32,64,你有什么),而不像Haskell使用其他语言可能使用不同结构的其他地方(我在这里特别考虑控制流), String 不太可能被编译器优化为循环等。



虽然 Char 对应于一个代码点,Haskell 98报告没有指定在执行文件IO时使用的编码的任何内容,甚至没有指定更改它的方式。在实践中,GHC提供了一个扩展来做例如二进制IO,但是你仍然要在那个时候保留。



即使像pre prepending在字符串前面这样的操作,不太可能出现字符串在练习中会击败 ByteString


I'm currently teaching myself Haskell, and I'm wondering what the best practices are when working with strings in Haskell.

The default string implementation in Haskell is a list of Char. This is inefficient for file input-output, according to Real World Haskell, since each character is separately allocated (I assume that this means that a String is basically a linked list in Haskell, but I'm not sure.)

But if the default string implementation is inefficient for file i/o, is it also inefficient for working with Strings in memory? Why or why not? C uses an array of char to represent a String, and I assumed that this would be the default way of doing things in most languages.

As I see it, the list implementation of String will take up more memory, since each character will require overhead, and also more time to iterate over, because a pointer dereferencing will be required to get to the next char. But I've liked playing with Haskell so far, so I want to believe that the default implementation is efficient.

解决方案

Best practices for working with strings performantly in Haskell are basically: Use Data.ByteString/Data.ByteString.Lazy.

http://hackage.haskell.org/packages/archive/bytestring/latest/doc/html/


As far as the efficiency of the default string implementation goes in Haskell, it's not. Each Char represents a Unicode codepoint which means it needs at least 21bits per Char.

Since a String is just [Char], that is a linked list of Char, it means Strings have poor locality of reference, and again means that Strings are fairly large in memory, at a minimum it's N * (21bits + Mbits) where N is the length of the string and M is the size of a pointer (32, 64, what have you) and unlike many other places where Haskell uses lists where other languages might use different structures (I'm thinking specifically of control flow here), Strings are much less likely to be able to be optimized to loops, etc. by the compiler.

And while a Char corresponds to a codepoint, the Haskell 98 report doesn't specify anything about the encoding used when doing file IO, not even a default much less a way to change it. In practice GHC provides an extensions to do e.g. binary IO, but you're going off the reservation at that point anyway.

Even with operations like prepending to front of the string it's unlikely that a String will beat a ByteString in practice.

这篇关于Haskell中有效的字符串实现的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆