Memory footprint of Haskell data types
Question
How can I find the actual amount of memory required to store a value of some data type in Haskell (mostly with GHC)? Is it possible to evaluate it at runtime (e.g. in GHCi) or is it possible to estimate memory requirements of a compound data type from its components?
In general, if the memory requirements of types a and b are known, what is the memory overhead of algebraic data types such as:
data Uno = Uno a
data Due = Due a b
For example, how many bytes in memory do these values occupy?
1 :: Int8
1 :: Integer
2^100 :: Integer
\x -> x + 1
(1 :: Int8, 2 :: Int8)
[1] :: [Int8]
Just (1 :: Int8)
Nothing
I understand that actual memory allocation is higher due to delayed garbage collection. It may be significantly different due to lazy evaluation (and thunk size is not related to the size of the value). The question is, given a data type, how much memory does its value take when fully evaluated?
I found there is a :set +s option in GHCi to see memory stats, but it is not clear how to estimate the memory footprint of a single value.
(The following applies to GHC; other compilers may use different storage conventions.)
Rule of thumb: a constructor costs one word for a header, and one word for each field. Exception: a constructor with no fields (like Nothing or True) takes no space, because GHC creates a single instance of these constructors and shares it amongst all uses.
A word is 4 bytes on a 32-bit machine, and 8 bytes on a 64-bit machine.
So e.g.
data Uno = Uno a
data Due = Due a b
an Uno takes 2 words, and a Due takes 3.
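The rule of thumb can be written down as a small calculation. The following is only a sketch of that arithmetic, not a measurement: `constructorWords` and `wordBytes` are hypothetical helper names, and the word size is read off the size of `Int`, which in GHC matches the machine word. The result counts the constructor cell alone, not the values its fields point to.

```haskell
import Foreign.Storable (sizeOf)

-- | Bytes per machine word (4 on a 32-bit machine, 8 on a 64-bit one),
-- taken from the size of Int.
wordBytes :: Int
wordBytes = sizeOf (0 :: Int)

-- | Heap words for one constructor cell with the given number of fields.
-- Nullary constructors are shared and cost nothing; otherwise the cell
-- takes one header word plus one word per field.
constructorWords :: Int -> Int
constructorWords 0 = 0
constructorWords n = 1 + n

main :: IO ()
main = do
  putStrLn ("Uno cell: " ++ show (constructorWords 1) ++ " words")
  putStrLn ("Due cell: " ++ show (constructorWords 2) ++ " words")
  putStrLn ("Due cell: " ++ show (constructorWords 2 * wordBytes) ++ " bytes")
```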
The Int type is defined as
data Int = I# Int#
Now, Int# takes one word, so Int takes 2 in total. Most unboxed types take one word, the exceptions being Int64#, Word64#, and Double# (on a 32-bit machine), which take 2. GHC actually has a cache of small values of type Int and Char, so in many cases these take no heap space at all. A String only requires space for the list cells, unless you use Chars > 255.
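Applying the same arithmetic to boxed fields gives an estimate for fully evaluated compound values. This is a sketch with hypothetical helper names. It assumes a freshly allocated `Int` for the `Just` case (so the small-value cache does not apply) and assumes every character of the string comes from the cache.

```haskell
-- | One header word plus one word per field; nullary constructors are shared.
cellWords :: Int -> Int
cellWords 0 = 0
cellWords n = 1 + n

-- | A boxed Int (I# Int#): header plus the unboxed payload.
boxedIntWords :: Int
boxedIntWords = cellWords 1

-- | Upper bound for @Just n@ with an Int that is not served from the cache:
-- the Just cell plus the Int box it points to.
justIntWords :: Int
justIntWords = cellWords 1 + boxedIntWords

-- | A String of the given length whose characters all come from the cache:
-- only the (:) cells count, and the final [] is shared.
cachedStringWords :: Int -> Int
cachedStringWords len = len * cellWords 2

main :: IO ()
main = do
  print boxedIntWords          -- 2
  print justIntWords           -- 4
  print (cachedStringWords 5)  -- 15
```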
An Int8 has identical representation to Int. Integer is defined like this:
data Integer
= S# Int# -- small integers
| J# Int# ByteArray# -- large integers
so a small Integer (S#) takes 2 words, but a large integer takes a variable amount of space depending on its value. A ByteArray# takes 2 words (header + size) plus space for the array itself.
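As a worked example of the figures above, the size of an Integer such as 2^100 can be estimated from its bit length. This is a rough sketch under stated assumptions, not GHC's own accounting: it assumes any value that fits in an Int uses S#, assumes the magnitude of a large value is stored in whole machine words, and follows the S#/J# definition quoted here, which newer GHC releases have since replaced. `integerWords` and `bitLength` are hypothetical helper names.

```haskell
-- | Estimated heap words for an Integer under the S#/J# representation.
integerWords :: Int      -- ^ bits per machine word (32 or 64)
             -> Integer  -- ^ the value
             -> Int
integerWords wordBits n
  | fitsInInt = 2              -- S#: header + Int#
  | otherwise = 3 + 2 + limbs  -- J# cell + ByteArray# header and size + payload
  where
    fitsInInt = n >= negate (2 ^ (wordBits - 1)) && n < 2 ^ (wordBits - 1)
    limbs     = (bitLength (abs n) + wordBits - 1) `div` wordBits

-- | Number of bits needed to write a non-negative Integer in binary.
bitLength :: Integer -> Int
bitLength 0 = 0
bitLength k = 1 + bitLength (k `div` 2)

main :: IO ()
main = do
  print (integerWords 64 1)          -- 2
  print (integerWords 64 (2 ^ 100))  -- 7
  print (integerWords 32 (2 ^ 100))  -- 9
```

Under these assumptions, 2^100 comes to 7 words (56 bytes) on a 64-bit machine and 9 words (36 bytes) on a 32-bit one.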
Note that a constructor defined with newtype is free. newtype is purely a compile-time idea; it takes up no space and costs no instructions at run time.
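One way to see that a newtype shares its representation with the wrapped type is `coerce` from `Data.Coerce`, which converts between the two without rebuilding the value. A minimal sketch: `Age` and the two helper functions are hypothetical names introduced for illustration.

```haskell
import Data.Coerce (coerce)

-- | A newtype wrapper: no constructor cell exists for it at run time.
newtype Age = Age Int

-- | Unwrap a single value; this is a change of type only.
ageToInt :: Age -> Int
ageToInt = coerce

-- | Unwrap a whole list without traversing it.
agesToInts :: [Age] -> [Int]
agesToInts = coerce

main :: IO ()
main = do
  print (ageToInt (Age 30))
  print (agesToInts [Age 1, Age 2, Age 3])
```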
More details in The Layout of Heap Objects in the GHC Commentary.