Caching Matlab function results to file

Question

I'm writing a simulation in Matlab. I will eventually run this simulation hundreds of times. Each simulation run has millions of simulation cycles. In each cycle, I calculate a very complex function, which takes ~0.5 sec to finish. The function input is a long bit array (>1000 bits), i.e. an array of 0s and 1s. I hold the bit arrays as rows of a matrix of 0s and 1s, and I run the function only once for each of them: I save the result in a separate array (res) and check whether the bit array is already in the matrix before running the function:

for i=1:1000000000
    %pick a bit array somehow
    [~,indx] = ismember(bit_array,bit_matrix,'rows');
    if indx == 0
        indx = length(res) + 1;
        bit_matrix(indx,:) = bit_array;
        res(indx) = complex_function(bit_array);
    end
    result = res(indx)
    %do something with result
end

I have two questions, really:

  1. Is there a more efficient way to find the index of a row in a matrix than 'ismember'?

  2. Since I run the simulation many times, and there is a big overlap in the bit arrays I'm getting, I want to cache the matrix between runs so that I don't recalculate the function for the same bit arrays over and over again. How do I do that?

Recommended answer

The answer to both questions is to use a map (containers.Map). There are a few steps to do this.

  1. First you will need a function to turn your bit_array into either a number or a string. For example, turn [0 1 1 0 1 0] into '011010'. (A containers.Map only accepts numeric scalars or strings as keys, which is why this step is required.)

  2. Define a map object:

cachedRunMap = containers.Map;  %See edit below for more on this

  • To check whether a particular case has already been run, use isKey:

    cachedRunMap.isKey('011010');
    

  • To add the results of a run, use the indexed-assignment syntax:

    cachedRunMap('011010') = [0 1 1 0 1];  %Or whatever your result is.  
    

  • To retrieve cached results, use the values method, which returns a cell array:

    tmpResult = cachedRunMap.values({'011010'});
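The three operations above can be tried together in a few lines. This is only a minimal sketch with made-up keys and values; it also shows that `values` hands back a cell array, while indexing the map with a single key returns the stored value directly.

```matlab
% Minimal sketch of the map operations; keys and values are illustrative only.
cachedRunMap = containers.Map;              % default: char keys, values of any type
cachedRunMap('011010') = [0 1 1 0 1];       % store a result under a string key

hit  = cachedRunMap.isKey('011010');        % key was stored above
miss = cachedRunMap.isKey('111111');        % key was never stored

asCell   = cachedRunMap.values({'011010'}); % cell array with one element
fromCell = asCell{1};                       % unwrap the cell to get the value
direct   = cachedRunMap('011010');          % same value, no cell wrapper
```

For a single lookup, the direct indexing form is usually the more convenient of the two.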
    

    This should efficiently store and retrieve values until you run out of system memory.

    Putting this together, your code would now look like this:

    %Hacky magic function to convert an array into a string of '0' and '1'
    strFromBits = @(x) char((x(:)'~=0)+48); %'
    
    %Initialize the map
    cachedRunMap = containers.Map;
    
    %Loop, computing and storing results as needed
    for i=1:1000000000
        %pick a bit array somehow
        strKey = strFromBits(bit_array);
        if cachedRunMap.isKey(strKey)
            result = cachedRunMap(strKey);
        else
            result = complex_function(bit_array);
            cachedRunMap(strKey) = result;
        end
        %do something with result
    end
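The key-conversion helper from the loop above can be checked in isolation. The inputs below are made up for illustration; the point is that row vectors, column vectors, and logical arrays should all map to the same string key.

```matlab
% Same helper as in the loop: nonzero entries become '1', zeros become '0'.
strFromBits = @(x) char((x(:)' ~= 0) + 48);

keyFromRow     = strFromBits([0 1 1 0 1 0]);                   % row vector input
keyFromColumn  = strFromBits([0; 1; 1; 0; 1; 0]);              % column vector input
keyFromLogical = strFromBits(logical([0 1 1 0 1 0]));          % logical input
```

Because the helper reshapes with `x(:)'`, the orientation of the input does not change the key, which avoids accidental cache misses.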
    


    If you want a key that is not a string, the key type needs to be declared when the map is created in step 2. Some examples are:

    cachedRunMap = containers.Map('KeyType', 'char', 'ValueType', 'any');
    cachedRunMap = containers.Map('KeyType', 'double', 'ValueType', 'any');
    cachedRunMap = containers.Map('KeyType', 'uint64', 'ValueType', 'any');
    cachedRunMap = containers.Map('KeyType', 'uint64', 'ValueType', 'double');
    

    Setting a KeyType of 'char' makes the map use strings as keys. All other key types must be numeric scalars.
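As an illustration of a numeric scalar key, a short bit array can be packed into a single uint64. This is a sketch only: it works for at most 64 bits, so it would not cover the >1000-bit arrays in the question, for which string keys remain the practical choice. The packing loop and the stored value are assumptions made for the example.

```matlab
% Hypothetical example: pack a short bit array (at most 64 bits) into one uint64.
bits = [0 1 1 0 1 0];
key = uint64(0);
for k = 1:numel(bits)
    % Shift the bits collected so far left by one and append the next bit.
    key = bitor(bitshift(key, 1), uint64(bits(k) ~= 0));
end

% Map with numeric scalar keys; the stored value is a placeholder.
numericMap = containers.Map('KeyType', 'uint64', 'ValueType', 'any');
numericMap(key) = 'cached result';
```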

    Regarding issues as you scale this up (per your recent comments):

    • Saving data between sessions: There should be no issues saving this map to a *.mat file, up to the limits of your system's memory.

    • Purging old data: I am not aware of a straightforward way to add LRU (least recently used) features to this map. If you can find a Java implementation, you can use it within Matlab pretty easily. Otherwise it would take some thought to determine the most efficient way to keep track of the last time each key was used.

    • Sharing data between concurrent sessions: As you indicated, this probably requires a database to perform efficiently. The DB table would have two columns, the key and the value (three if you want to implement LRU features, adding the last-used time). If your "result" is not a type that fits easily into SQL (e.g. a non-uniform size array, or a complex structure), you will need to put additional thought into how to store it. You will also need a method to access the database (e.g. the Database Toolbox, or various tools on the MathWorks File Exchange). Finally, you will need to actually set up a database on a server (e.g. MySQL if you are cheap, like me, or whatever you have the most experience with or can find the most help with). This is not actually that hard, but it takes a bit of time and effort the first time through.
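For the first point in the list above, saving the map between sessions, a minimal sketch could look like the following. The file name and the placeholder entry are assumptions made for illustration; the idea is to load the map at startup if a cache file exists and to save it again once the run is done.

```matlab
% Hypothetical cache location; any writable path would do.
cacheFile = fullfile(tempdir, 'cachedRunMap.mat');

% Reuse the cache from a previous session if there is one, else start empty.
if exist(cacheFile, 'file') == 2
    loaded = load(cacheFile, 'cachedRunMap');
    cachedRunMap = loaded.cachedRunMap;
else
    cachedRunMap = containers.Map('KeyType', 'char', 'ValueType', 'any');
end

% ... run the simulation loop here, adding entries to cachedRunMap ...
cachedRunMap('011010') = 42;   % placeholder entry standing in for a real result

% Persist the cache so the next session can pick it up.
save(cacheFile, 'cachedRunMap');
```

If the map grows very large, the '-v7.3' option of `save` may be needed, since the default MAT-file format limits each variable to 2 GB.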

    Another approach to consider (much less efficient, but not requiring a database) would be to break up the data store into a large number of maps (e.g. thousands or millions). Save each one into a separate *.mat file, with a filename based on the keys contained in that map (e.g. the first N characters of your string key), and then load/save these files between sessions as needed. This will be pretty slow; depending on your usage it may be faster to recalculate from the source function each time. Still, it's the best way I can think of without setting up the DB (which is clearly the better answer).
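The file-per-prefix idea could be sketched roughly as below. Everything specific here is an assumption for illustration: the prefix length N, the directory name, the `shardFile` helper, and the placeholder value that stands in for a call to complex_function.

```matlab
% Hypothetical sketch: one small map per key prefix, each in its own .mat file.
N = 2;                                       % prefix length (assumption)
cacheDir = fullfile(tempdir, 'bitcache');    % hypothetical cache directory
if ~exist(cacheDir, 'dir')
    mkdir(cacheDir);
end

% File name derived from the first N characters of the key.
shardFile = @(key) fullfile(cacheDir, ['shard_' key(1:N) '.mat']);

key = '011010';
f = shardFile(key);

% Load the shard that would hold this key, or start a new one.
if exist(f, 'file') == 2
    s = load(f, 'shard');
    shard = s.shard;
else
    shard = containers.Map('KeyType', 'char', 'ValueType', 'any');
end

% Compute and store only on a miss, then write the shard back to disk.
if ~isKey(shard, key)
    shard(key) = 42;   % placeholder for complex_function(bit_array)
    save(f, 'shard');
end
result = shard(key);
```

Every miss costs a file write and every new prefix costs a file read, which is why this is expected to be slow compared with a single in-memory map.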
