Why does this function use a lot of memory?

Question

I'm trying to unpack a binary vector of 140 million bits into a list. I'm checking the memory usage of this function, but it looks weird: the memory usage rises to 35 GB (GB, not MB). How can I reduce the memory usage?

sub bin2list {
    # This sub translates a binary vector to a list of "1","0" 
    my $vector = shift;
    my @unpacked = split //, (unpack "B*", $vector );
    return @unpacked;

}
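For comparison, here is a minimal sketch of one way to avoid building the list at all, assuming the caller only needs to look up individual bits: keep the vector packed (140 million bits is about 17.5 MB) and read bits on demand with Perl's built-in vec. bit_at is a hypothetical name, not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return bit $i of the
# packed vector in the same order unpack "B*" produces, i.e. the most
# significant bit of each byte first. vec() with a width of 1 numbers the
# bits of each byte from the least significant end, so the position within
# the byte is mirrored. $_[0] is used directly to avoid copying the string.
sub bit_at {
    my $i = $_[1];
    return vec($_[0], ($i & ~7) | (7 - ($i & 7)), 1);
}

my $vector = pack "B*", "1011001110001111";
print join("", map { bit_at($vector, $_) } 0 .. 15), "\n";    # prints 1011001110001111
```

Each lookup returns a plain number, so no per-bit string scalars are kept around.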

Solution

Scalars contain a lot of information.

$ perl -MDevel::Peek -e'Dump("0")'
SV = PV(0x42a8330) at 0x42c57b8
  REFCNT = 1
  FLAGS = (PADTMP,POK,READONLY,pPOK)
  PV = 0x42ce670 "0"\0
  CUR = 1
  LEN = 16

In order to keep them as small as possible, a scalar consists of two memory blocks[1]: a fixed-size head, and a body that can be "upgraded" to contain more information.

The smallest type of scalar that can contain a string (such as the ones returned by split) is an SVt_PV. (It's usually called a PV, but PV can also refer to the name of the field that points to the string buffer, so I'll stick with the name of the constant.)

The first block is the head.

  • ANY is a pointer to the body.
  • REFCNT is a reference count that allows Perl to know when the scalar can be deallocated.
  • FLAGS contains information about what the scalar actually contains. (e.g. SVf_POK means the scalar contains a string.)
  • TYPE contains information about the type of scalar (what kind of information it can contain).
  • For an SVt_PV, the last field points to the string buffer.

The second block is the body. The body of an SVt_PV has the following fields:

  • STASH is not used in the scalars in question since they're not objects.
  • MAGIC is not used for the scalars in question. Magic allows code to be called when the variable is accessed.
  • CUR is the length of the string in the buffer.
  • LEN is the length of the string buffer. Perl over-allocates to speed up concatenation.

The block on the right is the string buffer. As you might have noticed, Perl over-allocates. This speeds up concatenation.

Ignore the block on the bottom. It's an alternative to the string buffer format for special strings (e.g. hash keys).

How much does that add up to?

$ perl -MDevel::Size=total_size -E'say total_size("0")'
28   # 32-bit Perl
56   # 64-bit Perl

That's just for the scalar itself. It doesn't take into account the memory allocator's overhead for each of the three memory blocks.


These scalars are in an array. An array is really just another kind of scalar (an SVt_PVAV).

So an array has overhead too.

$ perl -MDevel::Size=total_size -E'say total_size([])'
56   # 32-bit Perl
64   # 64-bit Perl

That's an empty array. You have 140 million scalars in yours, so it needs a buffer that can hold 140 million pointers. (In this particular case, at least, the array won't be over-allocated.) Each pointer is 4 bytes on a 32-bit system and 8 bytes on a 64-bit system.

That brings the total up to:

  • 32-bit: 56 + (4 + 28) * 140,000,000 = 4,480,000,056
  • 64-bit: 64 + (8 + 56) * 140,000,000 = 8,960,000,064
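The same back-of-the-envelope estimate can be reproduced in a few lines of Perl. estimate is a hypothetical helper; the empty-array and per-scalar sizes are the Devel::Size figures quoted above, and allocator overhead is ignored just as in the bullets.

```perl
use strict;
use warnings;

# Hypothetical helper: empty-array size plus, for each element, one pointer
# in the array's buffer and one string scalar. Allocator overhead is ignored.
sub estimate {
    my ($n, $array_size, $ptr_size, $scalar_size) = @_;
    return $array_size + ($ptr_size + $scalar_size) * $n;
}

my $n = 140_000_000;
print "32-bit: ", estimate($n, 56, 4, 28), " bytes\n";    # 4480000056
print "64-bit: ", estimate($n, 64, 8, 56), " bytes\n";    # 8960000064
```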

That doesn't factor in the memory allocation overhead, but it's still very different from the numbers you gave. Why? Well, the scalars returned by split are actually different from the scalars inside the array. So for a moment, you actually have 280,000,000 scalars in memory!
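Since the cost comes from having every bit alive as its own scalar (twice, briefly), one workaround is to unpack the vector a slice at a time. A minimal sketch, assuming the caller can process the bits sequentially through a callback; for_each_bit and its parameters are hypothetical names, not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): call $callback with
# (index, bit) for every bit of $$vector_ref, in unpack "B*" order, while
# unpacking only $chunk_bytes bytes at a time. Only about 8 * $chunk_bytes
# one-character scalars from split are alive at any moment. The vector is
# passed by reference so the whole packed string is never copied.
sub for_each_bit {
    my ($vector_ref, $chunk_bytes, $callback) = @_;
    die "chunk size must be positive\n" unless $chunk_bytes > 0;
    my $len   = length $$vector_ref;
    my $index = 0;
    for (my $off = 0; $off < $len; $off += $chunk_bytes) {
        my $chunk = substr($$vector_ref, $off, $chunk_bytes);
        for my $bit (split //, unpack("B*", $chunk)) {
            $callback->($index++, $bit);
        }
    }
    return $index;    # number of bits visited
}

my $vector = pack "B*", "1011001110001111";
my $ones   = 0;
my $count  = for_each_bit(\$vector, 1, sub { $ones++ if $_[1] });
print "$count bits, $ones set\n";    # prints "16 bits, 10 set"
```

The chunk size trades peak memory against per-chunk overhead; the tiny value here is only for the demonstration.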


The rest of the memory is probably held by lexical variables in subs that aren't currently executing. Lexical variables aren't normally freed on scope exit since it's expected that the sub will need the memory the next time it's called. That means bin2list continues to use up 140MB of memory after it exits.
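If the caller really does need every bit in an array, one option is to return a reference instead of the list. A sketch, assuming the caller can work with an array reference (bin2list_ref is a hypothetical name): the caller receives @unpacked itself rather than a copy of each element, and because a reference to it survives the call, the next call starts with a fresh @unpacked instead of reusing this one. The caller can then release the array by dropping the reference; whether that memory goes back to the operating system depends on the allocator.

```perl
use strict;
use warnings;

# Variant of bin2list (hypothetical name, not from the original post) that
# returns a reference to the array instead of flattening it into a list.
# The caller gets @unpacked itself rather than a copy of every element, and
# because a reference to it outlives the call, the next call starts with a
# fresh @unpacked instead of reusing this one.
sub bin2list_ref {
    my $vector = shift;
    my @unpacked = split //, unpack("B*", $vector);
    return \@unpacked;
}

my $bits = bin2list_ref(pack "B*", "10100000");
print "@$bits\n";    # prints "1 0 1 0 0 0 0 0"
undef $bits;         # drop the only reference so Perl can free the array
```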


Footnotes

  1. Scalars that are undefined can get away without a body until a value is assigned to them. Scalars that contain only an integer can get away without allocating a memory block for the body by storing the integer in the same field where an SVt_PV stores the pointer to the string buffer.


The images are from illguts. They are protected by copyright.
