Speeding up MATLAB code for FDR estimation

Problem description

I have 2 input variables:

  • a vector of p-values (p) with N elements (unsorted)
  • an N x M matrix (pr) of p-values obtained from M random permutations. N is quite large, from 10K to 100K or more; M is about 100.

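I'm estimating the False Discovery Rate (FDR) for each element of p: the number of permutation p-values (from pr) that would pass if the current p-value (from p) were used as the threshold, averaged over the M permutations and divided by the number of observed p-values that pass the same threshold.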
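Written out, the quantity computed by the one-liner in this question appears to be the following. The notation is introduced here only for illustration: pr_{kj} is the entry in row k and column j of pr, and # denotes a count.

```latex
\widehat{\mathrm{FDR}}(p_i)
  = \frac{\dfrac{1}{M}\sum_{j=1}^{M} \#\{\,k : pr_{kj} \le p_i\,\}}
         {\#\{\,k : p_k \le p_i\,\}},
  \qquad i = 1,\dots,N
```

The denominator is always at least 1, because p_i itself satisfies p_i <= p_i.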

I wrote the function with ARRAYFUN, but it takes a lot of time for large N (2 min for N=20K), comparable to a for-loop.

function pfdr = fdr_from_random_permutations(p, pr)
%# ... skipping arguments checks
pfdr = arrayfun( @(x) mean(sum(pr<=x))./sum(p<=x), p);

Any ideas how to make it faster?

Comments about statistical issues here are also welcome.

The test data can be generated as p = rand(N,1); pr = rand(N,M);.
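For reference, here is a minimal sketch of the explicit loop that the arrayfun call is equivalent to, run on a small test case. The sizes and the names pfdr_loop, perm_hits and obs_hits are chosen here for illustration only. It makes the cost visible: every iteration rescans all of pr and all of p, so the work grows roughly as N^2 * M.

```matlab
% Small test case (sizes are arbitrary, chosen so this runs quickly)
N = 200;
M = 10;
p  = rand(N, 1);
pr = rand(N, M);

% One-liner from the question
pfdr = arrayfun(@(x) mean(sum(pr <= x)) ./ sum(p <= x), p);

% Equivalent explicit loop: each iteration rescans all of pr and p
pfdr_loop = zeros(N, 1);
for i = 1:N
    x = p(i);
    perm_hits = sum(pr(:) <= x) / M;   % mean count per permutation column
    obs_hits  = sum(p <= x);           % at least 1, since x itself is in p
    pfdr_loop(i) = perm_hits / obs_hits;
end

% Largest difference between the two versions (should be essentially zero)
disp(max(abs(pfdr - pfdr_loop)));
```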

Solution

Well, the trick was indeed sorting the vectors; credit goes to @EgonGeerardyn for that. Also, there is no need to use mean: you can simply divide everything by M afterwards. Once p is sorted, the number of values less than or equal to the current x is just the running index. pr is the more interesting case: I used a running index called place to track how many of its elements are less than or equal to x.

Edit(2): Here is the fastest version I came up with:

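 function Speedup2()
    N = 10000/4;
    M = 100/4;
    p = rand(N,1); pr = rand(N,M);

    % Baseline: rescans all of p and pr for every element of p
    tic
    pfdr = arrayfun( @(x) mean(sum(pr<=x))./sum(p<=x), p);
    toc

    % Sorted version: a single sweep over the sorted values
    tic
    out = zeros(numel(p),1);
    [p,sortIndex] = sort(p);      % ascending; sortIndex maps back to the input order
    pr = sort(pr(:));             % pool all permutation p-values and sort them
    pr(end+1) = Inf;              % sentinel so the while loop always terminates
    place = 1;                    % index of the first pooled value not yet counted
    for i=1:numel(p)
        x = p(i);
        while pr(place)<=x        % advance past every pooled value <= x
            place = place+1;
        end
        % place-1 pooled values are <= x; i observed values are <= x
        % (assumes no ties in p, which holds almost surely for rand data)
        out(i) = (place-1)/i;
    end
    out(sortIndex) = out/M;       % divide by M (replaces mean) and restore input order
    toc
    disp(max(abs(pfdr-out)));     % maximum absolute difference from the baseline

end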

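Benchmark results for N = 10000/4 and M = 100/4 (the first time is the arrayfun version, the second is the sorted version, and the last line is the maximum absolute difference between the two results):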

Elapsed time is 0.898689 seconds.
Elapsed time is 0.007697 seconds.
2.220446049250313e-016

And for N = 10000 and M = 100:

Elapsed time is 39.730695 seconds.
Elapsed time is 0.088870 seconds.
2.220446049250313e-016
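As a variation on the same sorting idea, the counting can also be sketched without an explicit loop by merging pr and p into one sorted sequence and taking a cumulative sum. This is only an illustrative sketch, not part of the benchmark above, and its variable names (vals, is_pr, pr_seen and so on) are made up here. It relies on sort keeping equal elements in their original order, so that a pr value equal to a threshold is counted as passing, and like the loop version it assumes there are no ties within p.

```matlab
% Tiny hand-checkable example
p  = [0.30; 0.10; 0.20];
pr = [0.05 0.25; 0.15 0.35; 0.40 0.12];
[N, M] = size(pr);

% Reference result from the original one-liner
ref = arrayfun(@(x) mean(sum(pr <= x)) ./ sum(p <= x), p);

% Merge pooled permutation values and observed values; pr goes first so
% that, on ties, a pr value is sorted ahead of the p value it equals
vals  = [pr(:); p];
is_pr = [ones(N*M, 1); zeros(N, 1)];
[~, order] = sort(vals);

is_pr_sorted = is_pr(order);
pr_seen  = cumsum(is_pr_sorted);      % pr values encountered so far
at_p     = (is_pr_sorted == 0);       % positions holding an observed p-value
pr_count = pr_seen(at_p);             % number of pr values <= each sorted p
p_index  = order(at_p) - N*M;         % where each sorted p sits in the input

% The k-th smallest p has exactly k observed values <= it (no ties assumed)
out = zeros(N, 1);
out(p_index) = pr_count ./ (1:N)' / M;

disp(max(abs(out - ref)));
```

Whether this is faster than the running-index loop depends on N, M and the MATLAB version, so it is worth timing both with tic and toc on realistic sizes.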
