MATLAB: Saving several variables to "-v7.3" (HDF5) .mat-files seems to be faster when using the "-append" flag. How come?


Problem description


NOTE: This question deals with an issue observed back in 2011 with an old MATLAB version (R2009b). As per the update below from July 2016, the issue no longer seems to exist in MATLAB (tested with R2016a; scroll down to the end of the question to see the update).

I am using MATLAB R2009b and I need to write a larger script that converts the contents of a larger set of .zip files to v7.3 MAT-files (with an underlying HDF5 data model). Reading is OK. The issue is with saving, although there is actually no problem as such: my files save nicely using the save command.

My question is more in the sense: Why am I observing the following surprising (for me) behavior in MATLAB?

Let's look at my issue in general. In this test scenario I generate one output: a -v7.3 MAT-file. This .mat file contains 40 blocks as individual variables. Each variable is named "block_NNN", with NNN running from 001 to 040, and contains a struct with the fields frames and blockNo. Field frames contains a 480x240x65 sequence of uint8 image data (here just random data generated using randi). Field blockNo contains the block number.

Remark: In the real script (which I have yet to finish) I will be doing the above a total of 370 times, converting a total of 108 GB of raw data. That is why I am concerned with the following.

Anyway, first I define some general variables:

% some sizes for dummy data and loops:
num_blockCount = 40;
num_blockLength = 65;
num_frameHeight = 480;
num_frameWidth = 240;

I then generate some dummy data that has the same shape and size as the actual raw data:

% generate empty struct:
stu_data2disk = struct();

% loop over blocks:
for num_k = 1:num_blockCount

   % generate block-name:
   temp_str_blockName = sprintf('block_%03u', num_k);

   % generate temp struct for current block:
   temp_stu_value = struct();
   temp_stu_value.frames = randi( ...
      [0 255], ...
      [num_frameHeight num_frameWidth num_blockLength], ...
      'uint8' ...
   );
   temp_stu_value.blockNo = num_k;

   % using dynamic field names:
   stu_data2disk.(temp_str_blockName) = temp_stu_value;

end

I now have all my random test-data in a struct stu_data2disk. Now I would like to save the data using one of two possible methods.

Let's try the simple one first:

% save data (simple):
disp('Save data the simple way:')
tic;
save converted.mat -struct stu_data2disk -v7.3;
toc;

The file is written without problems (286MB). The output is:

Save data the simple way:
Elapsed time is 14.004449 seconds.
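As a quick sanity check on the 286 MB figure, the raw payload can be computed from the sizes defined above. This is only a back-of-the-envelope sketch: it counts the uint8 frame data and ignores the small blockNo fields and any file-format overhead.

```matlab
% Sizes taken from the test setup above.
num_blockCount  = 40;
num_blockLength = 65;
num_frameHeight = 480;
num_frameWidth  = 240;

% uint8 uses one byte per element.
bytesPerBlock = num_frameHeight * num_frameWidth * num_blockLength;
totalBytes    = num_blockCount * bytesPerBlock;

fprintf('Payload: %d bytes (%.1f MiB)\n', totalBytes, totalBytes / 2^20);
```

The payload comes to 299,520,000 bytes, or about 285.6 MiB, which is in line with the reported file size and suggests that the random data is stored essentially uncompressed.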

OK. Then I remembered that I would like to follow the save procedure block by block over the 40 blocks. So, instead of the above, I loop over the blocks and append them in sequence:

% save to file, using append:
disp('Save data using -append:')
tic;
for num_k = 1:num_blockCount

   % generate block-name:
   temp_str_blockName = sprintf('block_%03u', num_k);

   temp_str_appendToggle = '';
   if (num_k > 1)
      temp_str_appendToggle = '-append';
   end

   % generate save command:
   temp_str_saveCommand = [ ...
      'save ', ...
      'converted_append.mat ', ...
      '-struct stu_data2disk ', temp_str_blockName, ' ', ...
      temp_str_appendToggle, ' ', ...
      '-v7.3', ...
      ';' ...
   ];

   % evaluate save command:
   eval(temp_str_saveCommand);

end
toc;
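The same loop can also be written without eval by using the function form of save, which is how the repro function in the answer calls it. The following is a minimal sketch, not the original script: it uses a small stand-in struct so it runs quickly, and the output path (built with tempname) is a hypothetical scratch location.

```matlab
% Small stand-in for stu_data2disk so the sketch runs quickly.
stu_data2disk = struct();
for num_k = 1:4
    name = sprintf('block_%03u', num_k);
    stu_data2disk.(name) = struct( ...
        'frames',  randi([0 255], [8 8 3], 'uint8'), ...
        'blockNo', num_k);
end

outFile    = [tempname '.mat'];   % hypothetical scratch file
blockNames = fieldnames(stu_data2disk);

for num_k = 1:numel(blockNames)
    % Build the argument list for the function form of save.
    saveArgs = {outFile, '-struct', 'stu_data2disk', blockNames{num_k}};
    if num_k > 1
        saveArgs{end+1} = '-append';   % add to the file created in pass 1
    end
    saveArgs{end+1} = '-v7.3';
    save(saveArgs{:});
end

% Read everything back and compare with the in-memory struct.
readBack = load(outFile);
disp(isequal(readBack, stu_data2disk));
delete(outFile);
```

Building the argument list as a cell array avoids both string concatenation and passing an empty flag on the first pass.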

And again the file saves nicely (286MB). The output is:

Save data using -append:
Elapsed time is 0.956968 seconds.

Interestingly, the append method is much faster. My question is: why?

Output from dir converted*.mat:

09-02-2011  20:38       300,236,392 converted.mat
09-02-2011  20:37       300,264,316 converted_append.mat
               2 File(s)    600,500,708 bytes

The files are not identical in size, and a test with fc in Windows 7 revealed many binary differences. Perhaps the data was just shifted a bit, in which case this tells us nothing.

Does someone have an idea what is going on here? Is the appended file perhaps using a much more optimized data structure? Or maybe Windows has cached the file, which makes access to it much faster?

I also test-read from the two files. Without presenting the numbers here: the appended version was a little bit faster (which could add up in the long run, though).
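One way to weigh the caching hypothesis is to convert the reported timings into effective write throughput. The sketch below uses only the file sizes and elapsed times quoted above; treating 1 MB as 10^6 bytes is a choice made here for illustration.

```matlab
% Figures reported above (bytes from dir, seconds from tic/toc).
sizesBytes = [300236392, 300264316];   % converted.mat, converted_append.mat
elapsedSec = [14.004449, 0.956968];    % simple save, -append loop

throughputMBps = sizesBytes ./ elapsedSec / 1e6;
speedup        = elapsedSec(1) / elapsedSec(2);

fprintf('simple:  %6.1f MB/s\n', throughputMBps(1));
fprintf('append:  %6.1f MB/s\n', throughputMBps(2));
fprintf('speedup: %.1fx\n', speedup);
```

This gives roughly 21 MB/s for the simple save against roughly 314 MB/s for the append loop, a factor of about 14.6. Whether 314 MB/s is more than the disk could sustain depends on hardware that is not described here; if it is, that would point toward write caching rather than a more efficient on-disk layout.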

[EDIT]: I just tried using no format flag (defaults to -v7 on my system) and there is not much difference anymore:

Save data the simple way (-v7):
Elapsed time is 13.092084 seconds.
Save data using -append (-v7):
Elapsed time is 14.345314 seconds.

[EDIT]: I corrected the above mistake. Previously I mentioned that the stats were for -v6 but I was mistaken. I had just removed the format flag and assumed the default was -v6 but actually it is -v7.

I have created new test stats for all formats on my system using Andrew's fine framework (all formats are for the same random test data, now read from file):

15:15:51.422: Testing speed, format=-v6, R2009b on PCWIN, arch=x86, os=Microsoft Windows 7 Professional  6.1.7600 N/A Build 7600
15:16:00.829: Save the simple way:            0.358 sec
15:16:01.188: Save using multiple append:     7.432 sec
15:16:08.614: Save using one big append:      1.161 sec

15:16:24.659: Testing speed, format=-v7, R2009b on PCWIN, arch=x86, os=Microsoft Windows 7 Professional  6.1.7600 N/A Build 7600
15:16:33.442: Save the simple way:           12.884 sec
15:16:46.329: Save using multiple append:    14.442 sec
15:17:00.775: Save using one big append:     13.390 sec

15:17:31.579: Testing speed, format=-v7.3, R2009b on PCWIN, arch=x86, os=Microsoft Windows 7 Professional  6.1.7600 N/A Build 7600
15:17:40.690: Save the simple way:           13.751 sec
15:17:54.434: Save using multiple append:     3.970 sec
15:17:58.412: Save using one big append:      6.138 sec

And the sizes of the files:

10-02-2011  15:16       299,528,768 converted_format-v6.mat
10-02-2011  15:16       299,528,768 converted_append_format-v6.mat
10-02-2011  15:16       299,528,832 converted_append_batch_format-v6.mat
10-02-2011  15:16       299,894,027 converted_format-v7.mat
10-02-2011  15:17       299,894,027 converted_append_format-v7.mat
10-02-2011  15:17       299,894,075 converted_append_batch_format-v7.mat
10-02-2011  15:17       300,236,392 converted_format-v7.3.mat
10-02-2011  15:17       300,264,316 converted_append_format-v7.3.mat
10-02-2011  15:18       300,101,800 converted_append_batch_format-v7.3.mat
               9 File(s)  2,698,871,005 bytes

Thus -v6 seems to be the fastest for writing. There are also no large differences in file size. As far as I know, HDF5 does have a basic compression (deflate) method built in.

Hmm, probably some optimization in the underlying HDF5-write functions?

Currently I still think that some underlying fundamental HDF5 write function is optimized for adding datasets to an HDF5 file (which is what happens when adding new variables to a -v7.3 file). I believe I have read somewhere that HDF5 should be optimized in this very way, though I cannot be sure.
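One way to examine this hypothesis is to look at how variables are laid out inside the HDF5 file. The sketch below rests on two assumptions: that the MATLAB release in use ships h5info (R2009b does not), and that h5info can open -v7.3 MAT-files directly. The two tiny struct variables are stand-ins for the real blocks, and the scratch file name is hypothetical.

```matlab
matFile = [tempname '.mat'];   % hypothetical scratch file

% Two small struct variables: one written by a plain save, one appended.
s.block_001 = struct('blockNo', 1, 'frames', zeros(4, 4, 2, 'uint8'));
s.block_002 = struct('blockNo', 2, 'frames', ones(4, 4, 2, 'uint8'));
save(matFile, '-struct', 's', 'block_001', '-v7.3');
save(matFile, '-struct', 's', 'block_002', '-append');

% Inspect the HDF5 structure of the MAT-file.
info       = h5info(matFile);
groupNames = {info.Groups.Name};
disp(groupNames);

% Storage details of the datasets in the first group:
% whether they are chunked and how many filters (e.g. compression) apply.
ds = info.Groups(1).Datasets;
for k = 1:numel(ds)
    fprintf('%s: chunked=%d, filters=%d\n', ...
        ds(k).Name, ~isempty(ds(k).ChunkSize), numel(ds(k).Filters));
end

delete(matFile);
```

If both variables appear as sibling groups with the same storage settings, the appended file is not structured differently, which would make an explanation outside the file layout more likely.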

Other details to note:

The behavior is very system-dependent, as we see in Andrew's answer below. It also seems to matter quite a bit whether you run these things in the local scope of a function or in the "global" scope of an m-script. My first results were from an m-script where files were written to the current directory. I can still only reproduce the 1-second write for -v7.3 in the m-script. The function calls apparently add some overhead.

Update July 2016:

I found this again and thought I might test it with the newest MATLAB available to me at the moment. With MATLAB R2016a on Windows 7 x64 the problem seems to have been fixed:

14:04:06.277: Testing speed, imax=255, R2016a on PCWIN64, arch=AMD64, 16 GB, os=Microsoft Windows 7 Enterprise  Version 6.1 (Build 7601: Service Pack 1)
14:04:10.600: basic -v7.3:                    7.599 sec      5.261 GB used
14:04:18.229: basic -v7.3:                    7.894 sec      5.383 GB used
14:04:26.154: basic -v7.3:                    7.909 sec      5.457 GB used
14:04:34.096: basic -v7.3:                    7.919 sec      5.498 GB used
14:04:42.048: basic -v7.3:                    7.886 sec      5.516 GB used     286 MB file   7.841 sec mean
14:04:50.581: multiappend -v7.3:              7.928 sec      5.819 GB used
14:04:58.544: multiappend -v7.3:              7.905 sec      5.834 GB used
14:05:06.485: multiappend -v7.3:              8.013 sec      5.844 GB used
14:05:14.542: multiappend -v7.3:              8.591 sec      5.860 GB used
14:05:23.168: multiappend -v7.3:              8.059 sec      5.868 GB used     286 MB file   8.099 sec mean
14:05:31.913: bigappend -v7.3:                7.727 sec      5.837 GB used
14:05:39.676: bigappend -v7.3:                7.740 sec      5.879 GB used
14:05:47.453: bigappend -v7.3:                7.645 sec      5.884 GB used
14:05:55.133: bigappend -v7.3:                7.656 sec      5.877 GB used
14:06:02.824: bigappend -v7.3:                7.963 sec      5.871 GB used     286 MB file   7.746 sec mean

This was tested with Andrew Janke's reproMatfileAppendSpeedup function from the accepted answer below (5 passes with format 7.3). Now -append is as slow as, or slower than, a single save, as it should be. Perhaps it was a problem with an early build of the HDF5 driver used in R2009b.

Solution

Holy cow. I can reproduce. Tried the single-append variation too; it's even speedier. Looks like "-append" just magically makes HDF5-based save() 30x faster. I don't have an explanation but I wanted to share what I found.

I wrapped up your test code in a function, refactoring it to make the save logic agnostic about the test data structure so you can run it on other data sets, and added some more diagnostic output.

Don't see the big speedup everywhere. It's huge on my 64-bit XP box and a 32-bit Server 2003 box, big on my 64-bit Windows 7 box, nonexistent on a 32-bit XP box. (Though multiple appends are a huge loss on Server 2003.) R2010b is slower in many cases. Maybe HDF5 appends or save's use of it just rock on newer Windows builds. (XP x64 is actually the Server 2003 kernel.) Or maybe it's just a machine config difference. There's a fast RAID on the XP x64 machine, and the 32-bit XP has less RAM than the rest. What OS and architecture are you running? Can you try this repro too?

19:36:40.289: Testing speed, format=-v7.3, R2009b on PCWIN64, arch=AMD64, os=Microsoft(R) Windows(R) XP Professional x64 Edition 5.2.3790 Service Pack 2 Build 3790
19:36:55.930: Save the simple way:           11.493 sec
19:37:07.415: Save using multiple append:     1.594 sec
19:37:09.009: Save using one big append:      0.424 sec


19:39:21.681: Testing speed, format=-v7.3, R2009b on PCWIN, arch=x86, os=Microsoft Windows XP Professional 5.1.2600 Service Pack 3 Build 2600
19:39:37.493: Save the simple way:           10.881 sec
19:39:48.368: Save using multiple append:    10.187 sec
19:39:58.556: Save using one big append:     11.956 sec


19:44:33.410: Testing speed, format=-v7.3, R2009b on PCWIN64, arch=AMD64, os=Microsoft Windows 7 Professional  6.1.7600 N/A Build 7600
19:44:50.789: Save the simple way:           14.354 sec
19:45:05.156: Save using multiple append:     6.321 sec
19:45:11.474: Save using one big append:      2.143 sec


20:03:37.907: Testing speed, format=-v7.3, R2009b on PCWIN, arch=x86, os=Microsoft(R) Windows(R) Server 2003, Enterprise Edition 5.2.3790 Service Pack 2 Build 3790
20:03:58.532: Save the simple way:           19.730 sec
20:04:18.252: Save using multiple append:    77.897 sec
20:05:36.160: Save using one big append:      0.630 sec

This looks huge. If it holds up on other data sets, I might use this trick in a lot of places myself. It may be something to bring up with MathWorks, too. Could they use the fast append technique in normal saves or other OS versions, too?
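To put numbers on "huge", the speedup factors can be computed directly from the four runs listed above. This sketch only divides the reported timings; the short machine labels are abbreviations introduced here.

```matlab
% Timings in seconds copied from the four runs above;
% columns are [simple, multiple append, one big append].
labels = {'XP x64', 'XP x86', 'Win7 x64', 'Server 2003 x86'};
t = [11.493  1.594  0.424;
     10.881 10.187 11.956;
     14.354  6.321  2.143;
     19.730 77.897  0.630];

% Speedup relative to the simple save (values below 1 mean slower).
speedupMulti = t(:, 1) ./ t(:, 2);
speedupBig   = t(:, 1) ./ t(:, 3);

for k = 1:numel(labels)
    fprintf('%-16s multi: %5.2fx   big: %5.2fx\n', ...
        labels{k}, speedupMulti(k), speedupBig(k));
end
```

With these figures, the single big append is roughly 27x and 31x faster on the XP x64 and Server 2003 machines, about 7x faster on Windows 7 x64, and slightly slower on 32-bit XP, while multiple appends on Server 2003 are about 4x slower than the simple save.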

Here's the self-contained repro function.

function out = reproMatfileAppendSpeedup(nPasses, tests, imax, formats)
%REPROMATFILEAPPENDSPEEDUP Show how -append makes v7.3 saves much faster
%
% Examples:
% reproMatfileAppendSpeedup()
% reproMatfileAppendSpeedup(2, [], 0, {'7.3','7','6'}); % low-entropy test

if nargin < 1 || isempty(nPasses);  nPasses = 1;  end
if nargin < 2 || isempty(tests);    tests = {'basic','multiappend','bigappend'}; end
if nargin < 3 || isempty(imax);     imax = 255; end
if nargin < 4 || isempty(formats);  formats = '7.3'; end % -v7 and -v6 do not show the speedup
tests = cellstr(tests);
formats = cellstr(formats);

fprintf('%s: Testing speed, imax=%d, R%s on %s\n',...
    timestamp, imax, version('-release'), systemDescription());

tempDir = setupTempDir();
testData = generateTestData(imax);

testMap = struct('basic','saveSimple', 'multiappend','saveMultiAppend', 'bigappend','saveBigAppend');

for iFormat = 1:numel(formats)
    format = formats{iFormat};
    formatFlag = ['-v' format];
    %fprintf('%s: Format %s\n', timestamp, formatFlag);
    for iTest = 1:numel(tests)
        testName = tests{iTest};
        saveFcn = testMap.(testName);
        te = NaN(1, nPasses);
        for iPass = 1:nPasses
            fprintf('%s: %-30s', timestamp, [testName ' ' formatFlag ':']);
            t0 = tic;
            matFile = fullfile(tempDir, sprintf('converted-%s-%s-%d.mat', testName, format, iPass));
            feval(saveFcn, matFile, testData, formatFlag);
            te(iPass) = toc(t0);
            if iPass == nPasses
                fprintf('%7.3f sec      %5.3f GB used   %5.0f MB file   %5.3f sec mean\n',...
                    te(iPass), physicalMemoryUsed/(2^30), getfield(dir(matFile),'bytes')/(2^20), mean(te));
            else
                fprintf('%7.3f sec      %5.3f GB used\n', te(iPass), physicalMemoryUsed/(2^30));
            end
        end
        % Verify data to make sure we are sane
        gotBack = load(matFile);
        gotBack = rmfield(gotBack, intersect({'dummy'}, fieldnames(gotBack)));
        if ~isequal(gotBack, testData)
            fprintf('ERROR: Loaded data differs from original for %s %s\n', formatFlag, testName);
        end
    end
end

% Clean up
rmdir(tempDir, 's');

%%
function saveSimple(file, data, formatFlag)
save(file, '-struct', 'data', formatFlag);

%%
function out = physicalMemoryUsed()
if ~ispc
    out = NaN;
    return; % memory() only works on Windows
end
[u,s] = memory();
out = s.PhysicalMemory.Total - s.PhysicalMemory.Available;

%%
function saveBigAppend(file, data, formatFlag)
dummy = 0;
save(file, 'dummy', formatFlag);
fieldNames = fieldnames(data);
save(file, '-struct', 'data', fieldNames{:}, '-append', formatFlag);

%%
function saveMultiAppend(file, data, formatFlag)
fieldNames = fieldnames(data);
for i = 1:numel(fieldNames)
    if (i > 1); appendFlag = '-append'; else; appendFlag = ''; end
    save(file, '-struct', 'data', fieldNames{i}, appendFlag, formatFlag);
end


%%
function testData = generateTestData(imax)
nBlocks = 40;
blockSize = [65 480 240];
for i = 1:nBlocks
    testData.(sprintf('block_%03u', i)) = struct('blockNo',i,...
        'frames', randi([0 imax], blockSize, 'uint8'));
end

%%
function out = timestamp()
%TIMESTAMP Showing timestamps to make sure it is not a tic/toc problem
out = datestr(now, 'HH:MM:SS.FFF');

%%
function out = systemDescription()
if ispc
    platform = [system_dependent('getos'),' ',system_dependent('getwinsys')];
elseif ismac
    [fail, input] = unix('sw_vers');
    if ~fail
        platform = strrep(input, 'ProductName:', '');
        platform = strrep(platform, sprintf('\t'), '');
        platform = strrep(platform, sprintf('\n'), ' ');
        platform = strrep(platform, 'ProductVersion:', ' Version: ');
        platform = strrep(platform, 'BuildVersion:', 'Build: ');
    else
        platform = system_dependent('getos');
    end
else
    platform = system_dependent('getos');
end
arch = getenv('PROCESSOR_ARCHITEW6432');
if isempty(arch)
    arch = getenv('PROCESSOR_ARCHITECTURE');
end
try
    [~,sysMem] = memory();
catch
    sysMem.PhysicalMemory.Total = NaN;
end
out = sprintf('%s, arch=%s, %.0f GB, os=%s',...
    computer, arch, sysMem.PhysicalMemory.Total/(2^30), platform);

%%
function out = setupTempDir()
out = fullfile(tempdir, sprintf('%s - %s', mfilename, datestr(now, 'yyyymmdd-HHMMSS-FFF')));
mkdir(out);

EDIT: I modified the repro function, adding multiple iterations and parameterizing it for save styles, file formats, and imax for the randi generator.

I think filesystem caching is a big factor in the fast -append behavior. When I do a bunch of runs in a row with reproMatfileAppendSpeedup(20) and watch System Information in Process Explorer, most of them are under a second, and physical memory usage quickly ramps up by a couple of GB. Then every dozen passes, the write stalls and takes 20 or 30 seconds, and physical RAM usage slowly ramps down to about where it started. I think this means that Windows is caching a lot of writes in RAM, and something about -append makes it more willing to do so. But the amortized time including those stalls is still a lot faster than the basic save, for me.
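The amortized claim can be illustrated with a rough calculation. The values below are assumptions picked from within the ranges described above, not measurements, and the basic-save time is borrowed from the XP x64 run purely for comparison, since it is not stated which machine the caching observation was made on.

```matlab
% Illustrative values only, chosen within the ranges described above.
fastPassSec    = 1.0;     % "most of them are under a second" (upper bound)
stallPassSec   = 25;      % a stall of "20 or 30 seconds"
passesPerCycle = 12;      % "every dozen passes"
basicSaveSec   = 11.493;  % simple save from the XP x64 run (assumed baseline)

% One cycle = (passesPerCycle - 1) fast passes plus one stalled pass.
amortizedSec = ((passesPerCycle - 1) * fastPassSec + stallPassSec) / passesPerCycle;

fprintf('amortized: %.2f s per pass (basic save: %.2f s)\n', ...
    amortizedSec, basicSaveSec);
```

Under these assumptions the amortized cost is 3 seconds per pass, so even with the stalls included the append path would remain several times faster than the basic save.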

By the way, after doing multiple passes for a couple hours, I'm having a hard time reproducing the original timings.
