为什么Haskell的默认字符串实现是一个链接的字符列表? [英] Why is Haskell's default string implementation a linked list of chars?

查看:133
本文介绍了为什么Haskell的默认字符串实现是一个链接的字符列表?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

Haskell的默认 String 实现在速度和内存方面效率都不高是众所周知的事实。据我所知, []列表通常在Haskell中被实现为单链表和大多数小/简单数据类型(例如 Int )它似乎不是一个好主意,但是对于 String 来说,它似乎总是过度杀伤。对此事的一些看法包括:

真实世界哈斯克尔


在简单的基准测试就像这样,甚至用像Python这样的解释性语言编写的程序都可以胜过使用字符串的Haskell代码一个数量级。


Haskell中的高效字符串实现


由于String只是[Char],它是Char的链表,它意味着Strings具有较差的引用位置,同样意味着Strings在内存中相当大,至少是N * (21bits + Mbits)其中N是字符串的长度,M是指针的大小(...)。字符串不太可能被编译器优化为循环等。


我知道Haskell的 ByteString s(和 Array s),并且它们可以很好地完成这项工作,但我希望默认实施是最有效的。
$ b

TL; DR:为什么Haskell的默认 String 实现了一个单链表,尽管它非常低效,很少用于真实世界的应用程序(除了非常简单的应用程序)?有历史原因吗?为什么Haskell的默认字符串实现是单链表?为什么Haskell的默认字符串实现是单链表? / p>

由于单链表支持:


  • 通过模式匹配进行归纳

  • 具有有用的属性,例如Monad,Functor

  • 正确地参数化多态

  • 自然懒惰



等等字符串 as [char] (unicode points)意味着一种符合语言目标的字符串类型(截至1990年),基本上可以免费与列表库进行交互。



总而言之,历史上语言设计者对设计良好的核心数据类型感兴趣的多于现代文本处理问题,所以我们有一个优雅,易懂,易于教授 String 类型,这不是一个unicode文本块,也不是一个密集的,压缩的,严格的数据类型pe。

The fact that Haskell's default String implementation is not efficient both in terms of speed and memory is well known. As far as I know the [] lists in general are implemented in Haskell as singly-linked lists and for most small/simple data types (e.g. Int) it doesn't seem like a very good idea, but for String it seems like total overkill. Some of the opinions on this matter include:

Real World Haskell

On simple benchmarks like this, even programs written in interpreted languages such as Python can outperform Haskell code that uses String by an order of magnitude.

Efficient String Implementation in Haskell

Since a String is just [Char], that is a linked list of Char, it means Strings have poor locality of reference, and again means that Strings are fairly large in memory, at a minimum it's N * (21bits + Mbits) where N is the length of the string and M is the size of a pointer (...). Strings are much less likely to be able to be optimized to loops, etc. by the compiler.

I know that Haskell has ByteStrings (and Arrays) in several nice flavors and that they can do the job nicely, but I would expect the default implementation to be the most efficient one.

TL;DR: Why is Haskell's default String implementation a singly-linked list even though it is terribly inefficient and rarely used for real world applications (except for the really simple ones)? Are there historical reasons? Is it easier to implement?

解决方案

Why is Haskell's default String implementation a singly-linked list

Because singly-linked lists support:

  • induction via pattern matching
  • have useful properties, such as Monad, Functor
  • are properly parametrically polymorphic
  • are naturally lazy

and so String as [Char] (unicode points) means a string type that fits the language goals (as of 1990), and essentially come "for free" with the list library.

In summary, historically the language designers were interested more in well-designed core data types, than the modern problems of text processing, so we have an elegant, easy to understand, easy to teach String type, that isn't quite a unicode text chunk, and isn't a dense, packed, strict data type.

这篇关于为什么Haskell的默认字符串实现是一个链接的字符列表?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆