MATLAB性能基准测试 [英] MATLAB performance benchmarking

查看:134
本文介绍了MATLAB性能基准测试的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这篇文章是关于测试以下问题的解决方案的性能:

This post is about testing the performances of solutions to the following problem:

给出了由S字符串组成的单元格数组,其格式为用下划线分隔的数字,例如:

A cell array of S strings formatted as numbers separated by underscores is given, e.g.:

        list_of_words = repmat({'02_04_04_52_23_14_54_672_0'},10,1);

        list_of_words = repmat({'02_04_04_52_23_14_54_672_0'},10,1);

对于所有符合以下条件的字符串均得到保证:

  1. 它们仅包含十进制数字和下划线;
  2. 下划线的数目相同;
  3. 没有两个连续的下划线.
  1. they contain only decimal digits and underscores;
  2. the number of underscores is the same;
  3. there are no two consecutive underscores.

编写MATLAB代码,将字符串的单元格数组转换为双精度(值小1000倍)的数字矩阵S× M,其中S为字符串的数量,M是字符串中的数字的数量.

Write MATLAB code that would convert the cell array of strings to a numeric matrix S×M of doubles (values 1000 time smaller), where S is the number of strings and M is the number of numbers in a string.

StackOverflow上发布了一个非常类似的问题,并提出了几种解决方案.最初的目标是提高速度性能.

A very similar problem was posted on StackOverflow, with several solutions proposed. The original goal was improving speed performance.

为速度而设计的两种解决方案似乎在从一种MATLAB安装到另一种,甚至从一种运行到另一种的运行中,都具有截然不同的性能.而且,它们的实现方式确实不同.

Two of the solutions, designed for speed, seem to have wildly varying performances from one MATLAB installation to another, or even from one run to another. Also, they have really different implementation styles.

在黑暗的角落,您将找到一种反对 MATLAB最神圣的宗旨的解决方案:eval是邪恶的,应不惜一切代价避免循环.但是,该代码尝试通过避免重复分配内存,使用将字符串转换为数字的快速方法以及执行就地操作来优化速度:

In the dark corner you'll find a solution that goes against MATLAB's most sacred tenets: eval is evil, and loops should be avoided at all costs. However, the code tries to optimise the speed by avoiding repeated memory allocations, by using the fast ways of converting strings to numbers, and by doing in-place operations:

%'eval_and_loops_solution.m'
function array_of_numbers = eval_and_loops_solution(list_of_words)

        n_numbers = 1 + sum(list_of_words{1}=='_');
        n_words   = numel(list_of_words);

        array_of_numbers = zeros(n_numbers, n_words);

        for k = 1:n_words
                temp = ['[', list_of_words{k}, ']'];
                temp(temp=='_') = ';';
                array_of_numbers(:,k) = eval(temp);
        end;

         %'this is a waste of memory, but is kind of required'
        array_of_numbers = transpose(array_of_numbers / 1000);

end

注意:原始解决方案以M× S double数组返回结果.该代码适用于S× M;但是,这种修改占用了两倍的内存.是的,我写了这个解决方案.

Note: The original solution returned the result as M×S double array. The code was adapted for S×M; however, this adaptation consumes twice the memory. And yes, I wrote this solution.

在空白处,您将找到真正符合MATLAB精神的解决方案,该解决方案避免使用循环和eval,而只需要对 all 进行一次sscanf读取.字符串(这是避免调用函数S的开销的聪明方法):

In the clear corner you'll find a solution true to MATLAB spirit, that avoids using loops and eval, favoring a single sscanf read of all the strings (which is a clever way of avoiding the overhead of calling the function S times):

%'single_sscanf_solution.m'
function array_of_numbers = single_sscanf_solution(list_of_words)

        N = 1 + sum(list_of_words{1}=='_'); %'hard-coded in the original'
        lens = cellfun(@numel,list_of_words);
        tlens = sum(lens);
        idx(1,tlens)=0;
        idx(cumsum(lens(1:end-1))+1)=1;
        idx2 = (1:tlens) + cumsum(idx);

        one_str(1:max(idx2)+1)='_';
        one_str(idx2) = [list_of_words{:}];
        delim = repmat('%d_',1,N*numel(lens));
        array_of_numbers = reshape(sscanf(one_str, delim),N,[])'./1000;

end

注意:此解决方案属于 @Divakar .

裁判由两个功能组成:一个功能生成测试用例,另一个功能执行timig.

The referee is made up of two functions: one that generates test cases, and one that does the timig.

测试用例生成器将单元格数组中下划线分隔的字符串(从1到9999之间的10个随机整数)分组(为简单起见);但是,实现应仅假定数字为正数或零,并且该数字应位于double中:

The test case generator groups in a cell array underscore-delimited strings out of 10 random integers between 1 and 9999 (for the sake of simplicity); however, an implementation should only assume that the numbers are positive or zero, and the number should fit in a double:

%'referee_test_case.m'
function list_of_words = referee_test_case(n_words)

        NUM_PER_WORD  = 10;
        NUM_MAX       = 9999;

        word_format   = [repmat('%d_', 1, NUM_PER_WORD-1), '%d'];
        list_of_words = cell(n_words,1);

        fprintf('Generating %d-words test case...\n', n_words);
        for k = 1:n_words
                list_of_words{k} = sprintf(word_format, randi(NUM_MAX, [1, NUM_PER_WORD]));
        end;

end

计时功能将测试用例和任意数量的处理功能句柄作为参数;它的实现是,例如某个功能中的错误不应打扰其他功能的执行.它使用@Divakar和@LuisMendo推荐的timeit;对于那些在默认的MATLAB安装中没有此功能的用户,可以从MATLAB Central/文件交换中下载它:

The timing function takes as arguments a test case and an arbitrary number of processing function handles; it is implemented such as errors in one function should not bother the execution of the others. It uses timeit as recommended by @Divakar and @LuisMendo; for those who don't have this function in their default MATLAB installation, it may be downloaded from MATLAB Central / File Exchange:

%'referee_timing.m'
function referee_timing(test_case, varargin)
        %' Specify the test case as a cell array of strings, then the function '
        %' handles that will process the test case.                            '
        %'                                                                     '
        %' The function uses timeit() to evaluate the performance of different '
        %' processing handles; if your MATLAB installation does not have it    '
        %' natively, download the available version from File Exchange:        '
        %'                                                                     '
        %'     http://www.mathworks.com/matlabcentral/fileexchange/18798-timeit-benchmarking-function '

        fprintf('Timing %d-words test case...\n', numel(test_case));
        for k = 1:numel(varargin)
                try
                        t = timeit(@() varargin{k}(test_case));
                        fprintf('  %s: %f[s]\n', func2str(varargin{k}), t);
                catch ME
                        fprintf('  %s: Error - %s\n', func2str(varargin{k}), ME.message);
                end;
        end;
end

问题:

如前所述,从一次安装到另一次安装,甚至从一次运行到另一次,结果似乎都不同.我提出的问题有三个方面:

The problem:

As said before, the results seem to vary from one MATLAB installation to another, and even from one run to another. The problem that I propose is three-fold:

  1. 在您的特定MATLAB安装(版本+ OS +硬件)上,这两种解决方案之间的性能如何?
  2. 是否有可能以一种可观的方式改进一种解决方案?
  3. 是否存在基于完全不同的思想/功能的甚至更快的MATLAB本机解决方案(例如,没有MEX或特殊的工具箱)?

对于第1点(基准测试),请在MATLAB路径中创建四个功能文件eval_and_loops_solution.msingle_sscanf_solution.mreferee_test_case.mreferee_timig.m和其他您可能要测试的功能(例如,答案提出的实施方式).将它们用于几个单词,例如像这样的脚本:

For point 1. (benchmarking), please create in your MATLAB path the four function files eval_and_loops_solution.m, single_sscanf_solution.m, referee_test_case.m, referee_timig.m and other functions that you might want to test (for example, the implementations proposed by answers). Use them for several numbers of words, e.g. like this script:

%'test.m'
clc;
feature accel on;  %'tune for speed'

%'low memory stress'
referee_timing( ...
   referee_test_case(10^4), ...
   @eval_and_loops_solution, ...
   @single_sscanf_solution ... %'add here other functions'
);

%'medium memory stress'
referee_timing( ...
   referee_test_case(10^5), ...
   @eval_and_loops_solution, ...
   @single_sscanf_solution ... %'add here other functions'
);

%'high memory stress'
referee_timing( ...
   referee_test_case(10^6), ...
   @eval_and_loops_solution, ...
   @single_sscanf_solution ... %'add here other functions'
);

发布结果时,请指定您的MATLAB版本,操作系统,RAM大小和CPU型号. 请注意,这些测试可能需要花费一些时间来计算大量单词!.但是,运行此脚本不会改变您当前工作空间的内容.

When posting the results, please specify your MATLAB version, the OS, the size of RAM and the CPU model. Please be aware that these tests might take some time for large number of words! However, running this script will not alter the content of your current workspace.

对于第2或第3点,您可以发布使用与eval_and_loops_solution.msingle_sscanf_solution.m相同的输入/输出约定的代码,以及支持该声明的基准测试.

For points 2. or 3. you can post code that use the same in/out conventions as eval_and_loops_solution.m and single_sscanf_solution.m, along with the benchmarking that supports the claim.

我将对每个基准测试答案进行投票,并鼓励每个人都这样做.对于那些以自己的技能,时间和处理能力做出贡献的人来说,这是最起码的事情.

I will up-vote every benchmarking answer, and I encourage everybody to do that. Is the least that one can do for people that contribute with their skills, time and processing power.

+500赏金将颁发给最快的代码的作者,无论是提议的两个代码之一,还是性能优于它们的第三个新代码.我真的希望这将与一般协议相符.奖金将在原始发布日期后的 month 周内提供.

A +500 bounty bonus will go to the author of the fastest code, be it one of the two proposed, or a third new one that outperforms them. I really hope that this will match the general agreement. The bonus will be given in a month week from the original date of posting.

推荐答案

方法1

从广义上讲,第一种方法包括两个部分.第一个为我们提供了一个很大的长字符串,该字符串具有单元格数组中的所有字符,并带有下划线分隔 两个单元,基本上用cumsum实现.基于bsxfun的第二个数字为我们提供了所需格式的所需数字输出.

This first approach in a broad sense has two parts. The first one gets us one big long string that has all the characters from the cell array with underscores separating two cells, which is essentially implemented with cumsum. The second one based on bsxfun gets us the desired numeric output in the desired format.

代码看起来像这样-

function out = solution_cumsum_bsxfun(list_of_words)

N = 1 + sum(list_of_words{1}=='_'); %// Number of "words" per cell
lens = cellfun('length',list_of_words); %// No. of chars [lengths] in each cell
tlens = sum(lens); %// Total number of characters [total of lengths]
cumlens = cumsum(lens); %// Cumulative lengths

%// Create an array of ones and zeros. 
%// The length of this array would be equal to sum of characters in all cells. 
%// The ones would be at the positions where new cells start, zeros otherwise
startpos_newcell(1,tlens)=0;
startpos_newcell(cumlens(1:end-1)+1)=1;

%// Calculate new positions for all characters in the cell array with 
%// places/positions left between two cells for putting underscores
newpos = (1:tlens) + cumsum(startpos_newcell);

%// Create one long string that has all the characters from the cell arary
%// with underscores separating cells
one_str(newpos) = [list_of_words{:}];
one_str(cumlens + [1:numel(cumlens)]') = '_'; %//'#

pos_us = find(one_str=='_'); %// positions of underscores
wordend_idx = pos_us - [1:numel(pos_us)]; %// word end indices
wordlens = [wordend_idx(1) diff(wordend_idx)]; %// word lengths
max_wordlens = max(wordlens); %// maximum word length

%// Create mask where digit characters are to be placed
mask = bsxfun(@ge,[1:max_wordlens]',[max_wordlens+1-wordlens]); %//'#

%// Create a numeric array with columns for each word and each row holding a digit.
%// The most significant digits to least significant going from top to bottom
num1(max_wordlens,size(mask,2)) = 0;
num1(mask) = one_str(one_str~='_') - 48;

%// Finally get the desired output converting each column to a single
%// number, reshaping to desired size and scaling down by a factor of 1000
out = reshape(10.^(max_wordlens-4:-1:-3)*num1,N,[]).'; %//'#

return;

方法2

方法#1略有不同,因为它使用sprintf完成了第一部分的工作.现在,这个主意是在看到它付诸实践后才出现的 @chappjc的解决方案.感谢他让我在此修改版本中使用它.

This a slight variation of approach #1 in that it does the job of first part with sprintf. Now, the idea came to me after seeing it in action in @chappjc's solution. I appreciate him on letting me use it in this modified version.

这是代码-

function out = solution_sprintf_bsxfun(list_of_words)

N = 1 + sum(list_of_words{1}=='_'); %// Number of "words" per cell

%// Create one long string that has all the characters from the cell arary
%// with underscores separating cells
one_str = sprintf('%s_',list_of_words{:});

pos_us = find(one_str=='_'); %// positions of underscores
wordend_idx = pos_us - [1:numel(pos_us)]; %// word end indices
wordlens = [wordend_idx(1) diff(wordend_idx)]; %// word lengths
max_wordlens = max(wordlens); %// maximum word length

%// Create mask where digit characters are to be placed
mask = bsxfun(@ge,[1:max_wordlens]',[max_wordlens+1-wordlens]); %//'#

%// Create a numeric array with columns for each word and each row holding a digit.
%// The most significant digits to least significant going from top to bottom
num1(max_wordlens,size(mask,2)) = 0;
num1(mask) = one_str(one_str~='_') - 48;

%// Finally get the desired output converting each column to a single
%// number, reshaping to desired size and scaling down by a factor of 1000
out = reshape(10.^(max_wordlens-4:-1:-3)*num1,N,[]).'; %//'#

return;

方法3

这是基于getByteStreamFromArray-

function out = solution_bytstrm_bsxfun(list_of_words)

allchars = [list_of_words{:}];
digits_array = getByteStreamFromArray(allchars);

%// At my 32-bit system getByteStreamFromArray gets the significant digits 
%// from index 65 onwards. Thus, we need to crop/index it accordingly
digits_array = digits_array(65:65+numel(allchars)-1) - 48;
% --------------------------------------------------------------------
N = 1 + sum(list_of_words{1}=='_'); %// Number of "words" per cell
lens = cellfun('length',list_of_words); %// No. of chars [lengths] in each cell
cumlens = cumsum(lens)'; %//'# Cumulative lengths
% ----------------------------------------------------------
pos_us = find(digits_array==47);
starts = [[1 cumlens(1:end-1)+1];reshape(pos_us+1,N-1,[])];
ends = [reshape(pos_us-1,N-1,[]) ; cumlens];
gaps = ends(:) - starts(:) + 1;

maxg = max(gaps);
mask1 = bsxfun(@lt,maxg-gaps',[1:maxg]');

num_array(maxg,numel(lens)*N)=0;
num_array(mask1) = digits_array(digits_array~=47);
out = reshape(10.^(maxg-4:-1:-3)*num_array,N,[]).';

return;


接下来列出

上述方法的GPU版本.


GPU versions of the above approaches are listed next.

方法4

function out = soln_sprintf_bsxfun_gpu(list_of_words)

N = 1 + sum(list_of_words{1}=='_'); %// Number of "words" per cell

%// Create one long string that has all the characters from the cell arary
%// with underscores separating cells. Now this one uses sprintf as
%// firstly proposed in @chappjc's solution and he has agreed to let me use
%// it, so appreciating his help on this.
digits_array = gpuArray(single(sprintf('%s_',list_of_words{:})));
digits_array = digits_array - 48;

mask_us = digits_array==47; %// mask of underscores
pos_us = find(mask_us);

wordend_idx = pos_us - gpuArray.colon(1,numel(pos_us)); %// word end indices
wordlens = [wordend_idx(1) diff(wordend_idx)]; %// word lengths
max_wordlens = max(wordlens); %// maximum word length

%// Create a numeric array with columns for each word and each row holding a digit.
%// The most significant digits to least significant going from top to bottom
num1 =  single(zeros(max_wordlens,numel(pos_us),'gpuArray')); %//'
num1(bsxfun(@ge,gpuArray.colon(1,max_wordlens)',...
    max_wordlens+1-single(wordlens))) = digits_array(~mask_us); %//'

%// Finally get the desired output converting each column to a single
%// number, reshaping to desired size and scaling down by a factor of 1000
outg = reshape(10.^(max_wordlens-4:-1:-3)*num1,N,[]).'; %//'#
out = gather(outg);

返回;

方法5

function out = soln_bytstrm_bsxfun_gpu(list_of_words)

us_ascii_num = 95;
allchars = [list_of_words{:}];

%// At my 32-bit system getByteStreamFromArray gets the significant digits 
%// from index 65 onwards. Thus, we need to crop/index it accordingly
alldigits = getByteStreamFromArray(allchars);
digits_array1 = gpuArray(alldigits(65:65+numel(allchars)-1));
% --------------------------------------------------------------------
lens = cellfun('length',list_of_words); %// No. of chars [lengths] in each cell
N = sum(digits_array1(1:lens(1))==us_ascii_num)+1; %// Number of "words" per cell
lens = gpuArray(lens);
cumlens = cumsum(lens)'; %//'# Cumulative lengths
% ----------------------------------------------------------
mask_us = digits_array1==us_ascii_num; %// mask of underscores
pos_us = find(mask_us);
starts = [[1 cumlens(1:end-1)+1];reshape(pos_us+1,N-1,[])];
ends = [reshape(pos_us-1,N-1,[]) ; cumlens];
gaps = ends(:) - starts(:) + 1;

maxg = max(gaps);
mask1 = bsxfun(@lt,maxg-gaps',[1:maxg]');

num_array = zeros(maxg,numel(lens)*N,'gpuArray'); %//'
num_array(mask1) = digits_array1(~mask_us)-48;
out = reshape(10.^(maxg-4:-1:-3)*num_array,N,[]).'; %//'
out = gather(out);

return;

这篇关于MATLAB性能基准测试的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆