Why would saving to a folder called temp cause data loading to slow down in a for loop in Matlab?


Problem description


IMPORTANT UPDATE

I just discovered that, after restarting Matlab and the computer, this simplified code no longer reproduces the problem for me either... I am so sorry for taking up your time with a script that didn't work. However, the old problem still persists in my original script if I save anything to any folder (of those I have tried) in the inner 'for' loop. For my purposes, I have worked around it by simply not making this save unless I absolutely need it. The original script has the following structure in terms of for loops and use of save or load:

load() % .mat files, size 365x92x240
for day = 1:365
    load() % .mat files, size 8x92x240

    for type = 1:17
        load() % .mat files size 17x92x240
        load() % .mat files size 92x240

        for step = 1:8
            %only calculations
        end
        save() % .mat files size 8x92x240

    end 
    save() % .mat files, size 8x92x240
end

% the loads and saves outside the loop above are in for loops too, but do not seem to affect the described behavior in the above script
load() % .mat files size 8x92x240
save() % .mat files size 2920x92x240
load() 
save() % .mat files size 365x92x240
load()
save() % .mat files size 12x92x240

If run in full, the script saves approx. 10 GB and loads approx. 2 GB of data.

The entire script is rather lengthy and makes a lot of saves and loads. Unfortunately, it would be impractical to share all of it here before I have managed to reproduce the problem in a reduced version. As I frustratingly discovered, the very same code can behave differently from time to time, so finding a simplification that consistently reproduces the behavior immediately became more tedious than anticipated. I will get back as soon as I am sure I have a manageable piece of code that produces the problem.


PREVIOUS PROBLEM DESCRIPTION (NB: the code below does not reliably reproduce the described problem):

I just learnt the hard way that, in Matlab, you can't save to a folder named temp inside a for loop without slowing down data loading in the next round of the loop. My question is why?

If you are interested in reproducing the problem yourself, please see the code below. To run it, you will also need a matfile called anyData.mat to load and two folders for saving, one called temp and the other called temporary.

clear all;clc;close all;profile off;
profile on

endDay = 2; % index of the last day in the loop
tT = zeros(1,endDay+1);
tTD = zeros(1,endDay+1);

for day = 0:endDay
    tic
    T = importdata('anyData.mat'); % semicolon keeps display time out of the timing
    tT(day+1)=toc; %loading time in seconds

    tic
    TD = importdata('anyData.mat');
    tTD(day+1)=toc;

    for type = 0:1
        saveFile = ones(92,240);

        save('AnyFolder\temporary\saveFile.mat', 'saveFile') % leads to fast data loading 
        %save('AnyFolder\temp\saveFile.mat', 'saveFile') %leads to slow data loading

    end % end of type 

end% end of day

profile off
profile report

plot(tT)

On the y-axis of the plot you will see that data loading takes significantly longer when the inner for loop saves to temp rather than temporary. Does anyone know why this occurs?

Solution

I can't reproduce the problem; I suspect it's system- and data-size-specific. But here are some general comments which could help you out of the predicament:

As pointed out by commenters and the above answers, file I/O within a double for loop can be extremely parasitic, especially in cases where you only need to access part of the data in the file, where other system operations delay the process, or where the data files are large enough to require virtual memory (Windows) / swap space (Linux) to even load them. In the latter case, you could be in a situation where you're moving a file from one part of the hard disk to another when you open it!
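One way to reduce that overhead is to keep save calls out of the innermost loops and write once per outer iteration. The following is only a minimal sketch of that idea, not the asker's actual script: the output folder, the file names, the loop sizes and the placeholder calculation are all assumptions made for illustration.

```matlab
% Sketch: accumulate results in memory and save once per day,
% instead of calling save inside the inner loops.
% All names and sizes below are hypothetical.
nDays  = 2;
nTypes = 17;
nSteps = 8;

outDir = fullfile(tempdir, 'loop_io_demo');   % assumed output folder
if ~exist(outDir, 'dir')
    mkdir(outDir);
end

for day = 1:nDays
    % Preallocate one array that holds everything computed for this day
    dayResult = zeros(nTypes, nSteps, 92, 240);

    for iType = 1:nTypes
        for step = 1:nSteps
            % Placeholder for the real calculation
            dayResult(iType, step, :, :) = iType + step;
        end
    end

    % A single save per outer iteration
    save(fullfile(outDir, sprintf('day%03d.mat', day)), 'dayResult');
end
```

Whether this helps depends on whether a full day of results fits comfortably in memory; if it does not, partial writes to a single file may be a better fit.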

I assume that you're loading/saving because you don't have c. 10 GB of RAM to hold everything in memory for computation. The actual problem is not described, so I can't be certain, but I think you might find the matfile class useful (see the TMW documentation). This is used to map directly to/from a mat file. This:

  • reduces file stream opening and closing IOPS

  • allows arbitrarily large variable sizes (governed by disk size, not memory)

  • allows you to read/write partially (i.e. write only some elements of an array without loading the whole file)

  • in the case that your mat file is too large to be held in memory, avoids loading it into swap space which would be extremely cumbersome.
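The points above can be sketched with a short matfile example. This is an illustration under assumptions, not the asker's code: the file name, the variable name results and the reduced array size are hypothetical. Partial reads and writes of this kind work efficiently with version 7.3 MAT-files, which is the format matfile uses when it creates a new file.

```matlab
% Sketch: write and read slices of a large array on disk with matfile.
% File name, variable name and sizes are hypothetical.
fname = fullfile(tempdir, 'matfile_demo.mat');
if exist(fname, 'file')
    delete(fname);            % start from a clean file for the demo
end

nDays = 10;                   % reduced from 365 to keep the demo small

% Writable MatFile object; the file is created on the first assignment
m = matfile(fname, 'Writable', true);

% Allocate the full variable on disk by assigning its last element
m.results(nDays, 92, 240) = 0;

for day = 1:3
    dayData = day * ones(1, 92, 240);   % placeholder for real results
    m.results(day, :, :) = dayData;     % writes only this slice
end

% Read back a single day without loading the whole variable
slice = squeeze(m.results(2, :, :));

% Query the size on disk without loading any data
dims = size(m, 'results');
```

Note that matfile indexing requires a subscript for every dimension, so linear indexing such as m.results(5) on a 3-D variable is not supported.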

Hope this helps.

Tom
