Memory-efficient way to truncate large array in Matlab
Question
I have a large (multi-GB) array in Matlab that I want to truncate¹. Naively, I thought that truncating couldn't need much memory, but then I realised that it probably can:
>> Z = zeros(628000000, 1, 'single');
>> Z(364000000:end) = [];
Out of memory. Type HELP MEMORY for your options.
Unless Matlab does some clever optimisations, before truncating Z, this code actually creates an array (of type double!) 364000000:628000000. I don't need this array, so I can do this instead:
>> Z = Z(1:363999999);
In this case, the second example works, and is fine for my purpose. But why does it work? If Z(364000000:end) = [] fails due to the memory needed for the intermediate array 364000000:628000000, then why does Z = Z(1:363999999) not fail due to the memory needed for the intermediate array 1:363999999, which is larger? Of course, I don't need this intermediate array, and would be happy with either a solution that truncates my array without any intermediate array or, failing that, knowing which method Matlab optimises.
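To put rough numbers on this, the sketch below computes how many bytes each index range would occupy if Matlab really did materialise it as a double array. That premise is the assumption made above, not something confirmed here, and the variable names are made up for illustration.

```matlab
% Sizes from the question; 'single' takes 4 bytes per element, 'double' takes 8.
N = 628000000;   % length of Z before truncation
M = 364000000;   % first index to delete

bytesZ         = N * 4;            % Z itself, stored as single
bytesIdxDelete = (N - M + 1) * 8;  % M:N as a double array (assumed to be materialised)
bytesIdxKeep   = (M - 1) * 8;      % 1:M-1 as a double array (assumed to be materialised)

fprintf('Z:           %.2f GB\n', bytesZ / 1e9);
fprintf('index M:end: %.2f GB\n', bytesIdxDelete / 1e9);
fprintf('index 1:M-1: %.2f GB\n', bytesIdxKeep / 1e9);
```

Under that assumption the index array for the version that works (about 2.9 GB) would be larger than the one for the version that fails (about 2.1 GB), which is exactly the puzzle.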
- Is there any way to truncate an array without creating an intermediate indexing array?
- If not, is either of the aforementioned methods more memory-efficient than the other (it appears it is)? If so, why? Does Matlab really create intermediate arrays in both examples?
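¹Reason: I am processing data and do not know in advance how much to preallocate. I make an educated guess, and I often allocate too much. I choose the chunk size based on available memory, because splitting into fewer chunks means faster code. So I want to avoid any needless memory use. See also this post on allocating by chunks.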
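The preallocate-then-truncate pattern described in the footnote might look roughly like the sketch below. The sizes, the loop, and the names nGuess and nUsed are hypothetical stand-ins, scaled down so the example runs anywhere.

```matlab
% Hypothetical, scaled-down version of "guess, fill, then truncate".
nGuess = 1000;                       % educated guess at the final size (too large here)
Z = zeros(nGuess, 1, 'single');      % preallocate once
nUsed = 0;                           % number of elements actually filled

for k = 1:7                          % stand-in for the real processing loop
    chunk = k * ones(100, 1, 'single');          % stand-in for one chunk of results
    Z(nUsed+1 : nUsed+numel(chunk)) = chunk;     % write into the preallocated buffer
    nUsed = nUsed + numel(chunk);
end

Z = Z(1:nUsed);                      % drop the unused tail
fprintf('kept %d of %d preallocated elements\n', numel(Z), nGuess);
```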
Recommended answer
I ran both examples on a machine with 24GB of RAM with profile('-memory','on') enabled. This profiler option shows the memory allocated and freed on each line of code. These are supposed to be gross, not net, amounts. I checked with a simple function whose allocations and frees net to zero, and it reported the gross amounts. However, it seems likely that builtin commands with no .m code to back them do not give fine-grained memory reporting to the profiler.
I ran a couple of tests for the following code:
% truncTest.m
N = 628000000;
M = 364000000;
clear Z
Z = zeros(N,1,'single');
Z(M:end) = [];
Z(1) % just because
clear Z
Z = zeros(N,1,'single');
Z = Z(1:M);
Z(1)
For what they are worth, the memory profiling results for this N and M are:
Well, both lines look the same in terms of memory allocated and freed. Maybe that's not the whole truth.
So, out of curiosity, I decreased M to 200 (just 200!) without changing N, did profile clear, and reran. Profiling claims:
Interestingly, Z=Z(1:M); is practically instantaneous now, and Z(M:end)=[]; is a little faster. Both free about 2.4GB of memory, as expected.
Finally, if we go in the other direction and set M=600000000;:
Now even Z=Z(1:M); is slow, but still about twice as fast as Z(M:end)=[];.
This suggests:
- Z=Z(1:M); just grabs the indicated elements, stores them in a new buffer or temporary variable, releases the old buffer, and assigns the new/temporary one to the array Z. I was able to make my weaker 4GB machine go from 2.45 seconds to thrashing the page file for 5 minutes just by increasing M and leaving N alone. Definitely prefer this option for small M/N, and probably in all cases.
- Z(M:end)=[]; always rewrites the buffer, and its execution time increases with M too. It was always slower in these tests, and its time seems to increase exponentially, unlike Z=Z(1:M);.
- Memory profiling does not give fine-grained information about these builtin operations. Its figures should be read as a net change, not as the total memory freed and allocated over the command's execution.
UPDATE 1: Just for fun, I timed the tests at a range of values of M:
Clearly more informative than the profiling. Neither method is a no-op. Z=Z(1:M); is the faster of the two, but it can use almost double the memory of Z when M/N is near 1.
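A timing sweep of this kind could be sketched as below. This is not the script used for the measurements above: N is scaled down so it runs on a small machine, the fractions are arbitrary, and Z(1:M-1) is used so that both methods leave the same number of elements.

```matlab
% Hypothetical timing sweep over M/N for the two truncation methods.
N = 1e7;                         % scaled-down array length (assumption)
fractions = [0.1 0.5 0.9];       % values of M/N to try (assumption)
tKeep   = zeros(size(fractions));
tDelete = zeros(size(fractions));

for k = 1:numel(fractions)
    M = round(fractions(k) * N);

    Z = zeros(N, 1, 'single');
    tic; Z = Z(1:M-1); tKeep(k) = toc;      % keep the head
    nKeep = numel(Z);

    Z = zeros(N, 1, 'single');
    tic; Z(M:end) = []; tDelete(k) = toc;   % delete the tail
    nDelete = numel(Z);

    assert(nKeep == nDelete);               % both leave M-1 elements
    fprintf('M/N = %.1f: Z=Z(1:M-1) %.4f s, Z(M:end)=[] %.4f s\n', ...
            fractions(k), tKeep(k), tDelete(k));
end
```

Single tic/toc measurements are noisy, so repeating each case several times and taking the median would give steadier numbers.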
UPDATE 2:
A relatively unknown feature called mtic (and mtoc) was available in 32-bit Windows prior to R2008b. I still have it installed on one machine, so I decided to see if it provides any more insight, with the understanding that (a) much has changed since then and (b) 32-bit MATLAB used a completely different memory manager. Still, I reduced the test size to N=128000000; M=101000000; and had a look. First, feature mtic for Z=Z(1:M-1);:
>> tic; feature mtic; Z=Z(1:M-1); feature mtoc, toc
ans =
TotalAllocated: 808011592
TotalFreed: 916009628
LargestAllocated: 403999996
NumAllocs: 86
NumFrees: 77
Peak: 808002024
Elapsed time is 0.951283 seconds.
Clearing up, recreating Z, and doing it the other way:
>> tic; feature mtic; Z(M:end) = []; feature mtoc, toc
ans =
TotalAllocated: 1428019588
TotalFreed: 1536018372
LargestAllocated: 512000000
NumAllocs: 164
NumFrees: 157
Peak: 1320001404
Elapsed time is 4.533953 seconds.
In every metric (TotalAllocated, TotalFreed, NumAllocs, etc.), Z(M:end) = []; is less efficient than Z=Z(1:M-1);. I expect it is possible to discern what is going on in memory by examining these numbers for these values of N and M, but we'd be guessing about an old MATLAB.
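As a small step in that direction, the arithmetic below compares the reported LargestAllocated values with the sizes of the full and truncated arrays. The two reported values are copied from the output above; the variable names are made up, and what the match means remains a guess.

```matlab
% Compare array sizes with the LargestAllocated values reported by feature mtoc.
N = 128000000;
M = 101000000;

bytesFull = N * 4;          % original Z as single
bytesKept = (M - 1) * 4;    % truncated Z (M-1 elements) as single

largestKeep   = 403999996;  % LargestAllocated reported for Z=Z(1:M-1);
largestDelete = 512000000;  % LargestAllocated reported for Z(M:end) = [];

fprintf('truncated size equals LargestAllocated for Z=Z(1:M-1): %d\n', ...
        bytesKept == largestKeep);
fprintf('full size equals LargestAllocated for Z(M:end)=[]:     %d\n', ...
        bytesFull == largestDelete);
```

Both comparisons come out true. That is consistent with Z=Z(1:M-1); allocating nothing larger than the result, and with Z(M:end) = []; allocating a block as large as the original array, but it does not prove what those allocations were used for.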