Memory footprint of Haskell data types

Question

How can I find the actual amount of memory required to store a value of some data type in Haskell (mostly with GHC)? Is it possible to evaluate it at runtime (e.g. in GHCi) or is it possible to estimate memory requirements of a compound data type from its components?

In general, if memory requirements of types a and b are known, what is the memory overhead of algebraic data types such as:

data Uno a = Uno a
data Due a b = Due a b

For example, how many bytes in memory do these values occupy?

1 :: Int8
1 :: Integer
2^100 :: Integer
\x -> x + 1
(1 :: Int8, 2 :: Int8)
[1] :: [Int8]
Just (1 :: Int8)
Nothing

I understand that actual memory allocation is higher due to delayed garbage collection. It may be significantly different due to lazy evaluation (and thunk size is not related to the size of the value). The question is, given a data type, how much memory does its value take when fully evaluated?

I found there is a :set +s option in GHCi to see memory stats, but it is not clear how to estimate the memory footprint of a single value.

Recommended answer

(The following applies to GHC; other compilers may use different storage conventions.)

Rule of thumb: a constructor costs one word for a header, and one word for each field. Exception: a constructor with no fields (like Nothing or True) takes no space, because GHC creates a single instance of these constructors and shares it amongst all uses.
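
The rule of thumb can be written down as a one-line cost function. This is a minimal sketch, assuming every field is one pointer-sized word; the name constructorWords is hypothetical and not part of any GHC API. It counts only the constructor itself, not the values its fields point to.

```haskell
-- Rule of thumb: one header word plus one word per field.
-- Nullary constructors (Nothing, True, []) are shared static objects,
-- so they are counted as taking no heap space.
constructorWords :: Int -> Int
constructorWords 0 = 0
constructorWords fields = 1 + fields

main :: IO ()
main = do
  print (constructorWords 0) -- Nothing: 0
  print (constructorWords 1) -- Just x, not counting x itself: 2
  print (constructorWords 2) -- (x, y), not counting x and y: 3
```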

A word is 4 bytes on a 32-bit machine, and 8 bytes on a 64-bit machine.
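
To turn word counts into bytes on the machine you are actually running on, you can ask for the width of Word, which on GHC is exactly one machine word. The names bytesPerWord and wordsToBytes below are hypothetical helpers; finiteBitSize comes from Data.Bits in base.

```haskell
import Data.Bits (finiteBitSize)

-- On GHC, Word is one machine word wide, so its bit size
-- gives the word size of the current platform.
bytesPerWord :: Int
bytesPerWord = finiteBitSize (0 :: Word) `div` 8

-- Convert a size measured in words to bytes on this machine.
wordsToBytes :: Int -> Int
wordsToBytes w = w * bytesPerWord

main :: IO ()
main = do
  putStrLn ("bytes per word: " ++ show bytesPerWord)
  putStrLn ("a 3-word object: " ++ show (wordsToBytes 3) ++ " bytes")
```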

So, for example, given

data Uno a = Uno a
data Due a b = Due a b

a Uno takes 2 words, and a Due takes 3 (not counting the values its fields point to).
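
To estimate a compound value from its components, one way is to describe each fully evaluated value as a tree of constructors and add up the rule-of-thumb cost of every node. This is a sketch under stated assumptions: all fields are pointers except explicitly unboxed ones, no subterm is shared, and the small Int/Char cache is ignored. The types and functions Obj, Shared, Unboxed, Con, heapWords and int8 are hypothetical names introduced for illustration, with Int8 modelled as a header plus one unboxed word as described in this answer.

```haskell
-- A fully evaluated value, described only by its heap layout.
data Obj
  = Shared       -- nullary constructor (Nothing, True, []): one shared copy, no heap cost
  | Unboxed Int  -- raw unboxed field (e.g. Int#) of this many words, stored inline in its parent
  | Con [Obj]    -- constructor with fields: 1 header word + 1 word per field

-- Heap words used by a value and everything it points to.
-- Assumes no sharing between fields (a shared subterm would be counted twice).
heapWords :: Obj -> Int
heapWords (Con fields) = 1 + sum (map fieldWords fields) + sum (map heapWords fields)
  where
    fieldWords (Unboxed n) = n -- stored inline in this constructor
    fieldWords _ = 1           -- pointer to another (possibly shared) object
heapWords _ = 0

-- An Int8: a header plus one unboxed word.
int8 :: Obj
int8 = Con [Unboxed 1]

main :: IO ()
main = do
  print (heapWords (Con [Shared]))         -- Uno True: 2
  print (heapWords (Con [Shared, Shared])) -- Due True False: 3
  print (heapWords (Con [int8]))           -- Just (1 :: Int8): 4
  print (heapWords (Con [int8, int8]))     -- (1 :: Int8, 2 :: Int8): 7
  print (heapWords (Con [int8, Shared]))   -- [1] :: [Int8], i.e. 1 : []: 5
  print (heapWords Shared)                 -- Nothing: 0
```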

The Int type is defined as

data Int = I# Int#
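
The I# constructor is visible from ordinary code through GHC.Exts with the MagicHash extension. This small sketch (the function name plusOne is hypothetical) unboxes an Int, adds to the raw Int#, and boxes the result again, which makes the split between the boxed heap object and its unboxed payload concrete.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), (+#))

-- An Int is a heap object built with I#, wrapping a raw Int#.
-- Pattern matching on I# exposes the unboxed value; applying I# boxes it again.
plusOne :: Int -> Int
plusOne (I# n) = I# (n +# 1#)

main :: IO ()
main = print (plusOne 41) -- 42
```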

Now, Int# takes one word, so Int takes 2 in total. Most unboxed types take one word; the exceptions are Int64#, Word64#, and Double#, which take 2 words on a 32-bit machine. GHC actually has a cache of small values of type Int and Char, so in many cases these take no heap space at all. A String only requires space for the list cells, unless you use Chars > 255.
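
Applying this to String gives a simple per-character cost: each (:) cell is a constructor with two fields (3 words), [] is shared, and a Char up to 255 comes from the cache. This is a sketch that assumes, following the answer, that every Char above 255 needs its own 2-word C# box; the name stringWords is hypothetical.

```haskell
import Data.Char (ord)

-- Estimated heap words for a fully evaluated String:
--   each (:) cell = 1 header + 2 fields (head and tail pointers) = 3 words,
--   [] is a shared nullary constructor (0 words),
--   Chars 0..255 come from GHC's static cache (0 words),
--   any other Char is assumed to need its own C# box (2 words).
stringWords :: String -> Int
stringWords = sum . map cellWords
  where
    consCell = 3
    cellWords c
      | ord c <= 255 = consCell
      | otherwise = consCell + 2

main :: IO ()
main = do
  print (stringWords "hello") -- 15
  print (stringWords "\955x") -- 8: the lambda character needs its own box
```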

An Int8 has identical representation to Int. Integer is defined like this:

data Integer
  = S# Int#                            -- small integers
  | J# Int# ByteArray#                 -- large integers

so a small Integer (S#) takes 2 words, but a large integer takes a variable amount of space depending on its value. A ByteArray# takes 2 words (header + size) plus space for the array itself.
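
As a worked example, the sketch below computes the words used by a fully evaluated Integer under the integer-gmp definition quoted here, for a given word width in bits. It assumes the ByteArray# holds exactly as many limbs (machine words) as the magnitude needs. Newer GHC versions (9.0 and later) define Integer differently, in the ghc-bignum package, so treat these numbers as approximate there. The names integerWords and bitLength are hypothetical.

```haskell
-- Heap words for a fully evaluated Integer on a machine with wordBits-bit words:
--   S# Int#            -> 1 header + 1 word = 2
--   J# Int# ByteArray# -> 1 header + 2 fields = 3, plus the ByteArray#
--                         (1 header + 1 size word + one word per limb)
integerWords :: Int -> Integer -> Int
integerWords wordBits n
  | fitsInWord = 2
  | otherwise = 3 + 2 + limbs
  where
    bound = 2 ^ (wordBits - 1)
    fitsInWord = n >= negate bound && n < bound
    -- The sign is kept separately, so only the magnitude occupies limbs.
    limbs = (bitLength (abs n) + wordBits - 1) `div` wordBits

-- Number of bits needed to write a non-negative Integer in binary.
bitLength :: Integer -> Int
bitLength 0 = 0
bitLength m = 1 + bitLength (m `div` 2)

main :: IO ()
main = do
  print (integerWords 64 1)         -- 1 :: Integer on 64-bit: 2
  print (integerWords 64 (2 ^ 100)) -- 2^100 on 64-bit: 7
  print (integerWords 32 (2 ^ 100)) -- 2^100 on 32-bit: 9
```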

Note that a constructor defined with newtype is free. newtype is purely a compile-time idea, and it takes up no space and costs no instructions at run time.
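
To illustrate the difference, here is a sketch with two hypothetical types around the same Int: Age (a newtype) and BoxedAge (a data type). Data.Coerce.coerce from base only typechecks for the list conversion because the newtype shares its runtime representation with Int, so no new list is built.

```haskell
import Data.Coerce (coerce)

-- A newtype is erased at compile time: an Age is represented exactly
-- like the Int inside it (2 words on the heap, no extra header).
newtype Age = Age Int
  deriving (Show, Eq)

-- The equivalent data type adds its own constructor: 1 header word
-- plus 1 pointer field, on top of the boxed Int it points to.
data BoxedAge = BoxedAge Int
  deriving (Show, Eq)

-- Because the representations are identical, coerce reinterprets a whole
-- list of Ints as a list of Ages without traversing or copying it.
toAges :: [Int] -> [Age]
toAges = coerce

main :: IO ()
main = print (toAges [30, 41]) -- [Age 30,Age 41]
```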

More details are in "The Layout of Heap Objects" in the GHC Commentary.
