Why does writing to an unrelated file cause the load function to be so slow?
Problem description
I've just spent a while debugging some particularly slow code and have been completely thrown off by the MATLAB profiler. This looks to me like a massive bug, so I was wondering if anyone could cast any light on to what is going on here.
Here is some code that will cause the problem:
function profiler_test

    %%% Create 20 files with random data
    count = 20;
    for i = 1 : count
        x = rand(3);
        save(sprintf('temp_file_%06d', i), 'x');
    end

    %%% Load them in a for loop
    xs = cell(1, count);
    tic;
    for i = 1 : count
        x = load(sprintf('temp_file_%06d', i), 'x');
        xs{i} = x.x;
    end
    toc

    %%% Load them in a for loop, but writing a small log file on the way
    tic;
    for i = 1 : count
        x = load(sprintf('temp_file_%06d', i), 'x');
        xs{i} = x.x;

        file = fopen(sprintf('temp_logfile_%d', i), 'w');
        fprintf(file, 'Success\n');
        fclose(file);
    end
    toc

end
The first for loop takes 0.239739 seconds, the second takes 4.411179 seconds.
Now, I should make it clear that I am aware the idea shown in the second for loop, creating a log file for each result, is sloppy. I did it because I was running on a cluster where I couldn't see the output and wanted a cheap indication of the function's progress, and this turned out to be the bottleneck. I'm fine with that.
My problem however is that I've spent a day trying to optimise the wrong line, because the MATLAB profiler says this:
          1   24  tic;
          1   25  for i = 1 : count
 4.41    20   26      x = load(sprintf('temp_file_%06d', i), 'x');
         20   27      xs{i} = x.x;
              28
         20   29      file = fopen(sprintf('temp_logfile_%d', i), 'w');
         20   30      fprintf(file, 'Success\n');
         20   31      fclose(file);
         20   32  end
          1   33  toc
It has placed the entire time taken to execute the final three lines on the line for load. In my actual program, the load was not so close to the logging code, so this didn't occur to me until I decided to distrust the profiler. My question is: what is going on here? Why has this happened, and should I be watching out for any more bizarre behaviour like this?
I'm using MATLAB 2011a. Many thanks.
Edit: I seem to be causing some confusion, apologies. Here is the situation:
- The two for loops shown above are identical, except that the second one has three lines at the bottom which write to a temporary file each iteration.
- The second loop takes substantially longer to run: the conclusion is that those last three lines are to blame for the slowdown. When they are removed, the code is fast again.
- However, the profiler does not attribute any of the time for the second loop to those final three statements. Instead, it tells me that my load function call - exactly the same call as in the first loop, which was faster - is now taking 4 seconds instead of 0.2. So either the presence of the last three lines causes the load to be slow (I had disregarded this; is that even a possibility?), or the MATLAB profiler is incorrectly reporting that load is taking 4 seconds when it is clearly not.
Either way it seems to me that something very strange is happening.
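One way to tell the two possibilities apart is to time the load call and the logging lines separately, without the profiler. The following is only a sketch under the same setup as the code above; the accumulator names loadTime and logTime are made up for illustration.

```matlab
% Sketch: time load and the log writes separately with tic/toc handles,
% as a cross-check on the profiler's per-line figures.
count = 20;

% Create the same small test files as in the question.
for i = 1 : count
    x = rand(3);
    save(sprintf('temp_file_%06d', i), 'x');
end

xs = cell(1, count);
loadTime = 0;   % hypothetical accumulator: seconds spent inside load
logTime  = 0;   % hypothetical accumulator: seconds spent writing log files

for i = 1 : count
    t = tic;
    s = load(sprintf('temp_file_%06d', i), 'x');
    loadTime = loadTime + toc(t);
    xs{i} = s.x;

    t = tic;
    file = fopen(sprintf('temp_logfile_%d', i), 'w');
    fprintf(file, 'Success\n');
    fclose(file);
    logTime = logTime + toc(t);
end

fprintf('load: %.3f s, logging: %.3f s\n', loadTime, logTime);
```

If loadTime dominates here as well, the profiler is reporting the time correctly and the log writes are slowing load down indirectly.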
Edit: Seem to have answered it myself, see below. Changed the title as it was misleading.
Recommended answer
Actually, I think I've solved it. I was wrong to jump to the conclusion that the additional processing time was occurring on the new lines, so my question is now a little misleading - the profiler is correct. However, I still didn't understand why writing to a temporary file would cause load to slow down. I had a thought, which was to try this:
file = fopen(sprintf('../temp_logfile_%d', i), 'w');
That is, write to a file in the parent directory instead of the current working directory. This removed the problem, and the loop was very fast. The reason, I am guessing, is that the current directory is in my MATLAB search path, as are a bunch of other directories. I presume that every time MATLAB uses a function which looks through the whole search path, as load does, it checks to see if any directories have been modified, and if so re-parses the whole lot to see what files are available. Writing a new file to the working directory certainly would have caused this. This may have been worse in my case since I also have a whole tree of subdirectories in the working directory which are part of the search path.
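The same workaround can be sketched with an explicit log folder instead of a relative '../' path. This assumes the folder returned by tempdir is not on the MATLAB search path (worth checking against the output of path on your own setup); logDir and the 'progress_logs' folder name are hypothetical.

```matlab
% Sketch: keep progress logs in a folder assumed to be off the search path,
% so that writing them does not touch the current working directory.
count  = 20;
logDir = fullfile(tempdir, 'progress_logs');   % hypothetical log folder
if ~exist(logDir, 'dir')
    mkdir(logDir);
end

% Create the same small test files as in the question.
for i = 1 : count
    x = rand(3);
    save(sprintf('temp_file_%06d', i), 'x');
end

xs = cell(1, count);
tic;
for i = 1 : count
    s = load(sprintf('temp_file_%06d', i), 'x');
    xs{i} = s.x;

    % Write the progress marker outside the current folder.
    logName = fullfile(logDir, sprintf('temp_logfile_%d', i));
    file = fopen(logName, 'w');
    if file == -1
        error('Could not open %s for writing.', logName);
    end
    fprintf(file, 'Success\n');
    fclose(file);
end
toc
```

If my guess about the search path is right, the timing printed by toc should be close to that of the first loop in the question; I have only confirmed this for the parent-directory variant above.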
Anyway, thanks to those who had a look and sorry that the answer turned out to be something quite different from the question. Be aware when using functions which rely on the entire search path!