如何改善该功能的执行时间? [英] How to improve the execution time of this function?

查看:87
本文介绍了如何改善该功能的执行时间?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设 F(X,Y)是一个二元函数如下:

Suppose that f(x,y) is a bivariate function as follows:

function [ f ] = f(x,y)
    UN=(g)1.6*(1-acos(g)/pi)-0.8;
    f= 1+UN(cos(0.5*pi*x+y));
end

如何提高函数的执行时间 F(N)通过以下code:

function [VAL] = F(N)
x=0:4/N:4;
y=0:2*pi/1000:2*pi;
VAL=zeros(N+1,3);
for i = 1:N+1
    val = zeros(1,N+1);
    for j = 1:N+1
        val(j) = trapz(y,f(0,y).*f(x(i),y).*f(x(j),y))/2/pi;
    end
    val = fftshift(fft(val))/N;
    l = (length(val)+1)/2;
    VAL(i,:)= val(l-1:l+1);
end
VAL = fftshift(fft(VAL,[],1),1)/N;
L = (size(VAL,1)+1)/2;
VAL = VAL(L-1:L+1,:);
end

注意 N = 2 ^ P ,其中 P> 10 ,所以请考虑内存限制,同时优化使用code ndgrid arrayfun

Note that N=2^p where p>10, so please consider the memory limitations while optimizing the code using ndgrid, arrayfun, etc.

FYI:在code打算找到的 FFTN 中央3×3子矩阵

FYI: The code intends to find the central 3-by-3 submatrix of the fftn of

fun=@(a,b) trapz(y,f(0,y).*f(a,y).*f(b,y))/2/pi;

其中, A,B [0,4] 。关键的想法是,我们可以使用上述特别是当 N 是非常大code节省内存。但执行时间是因为嵌套循环的仍是一个问题。请参阅下图的 N = 2 ^ 2

where a,b are in [0,4]. The key idea is that we can save memory using the code above specially when N is very large. But the execution time is still an issue because of nested loops. See the figure below for N=2^2:

推荐答案

这是不是一个完整的答案,但一些可能有用的提示:

This is not a full answer, but some possibly helpful hints:

0)的简单:你确定你需要Numerics的?你能不能做运算解析?

0) The trivial: Are you sure you need numerics? Can't you do the computation analytically?

1)不要使用函数处理:

1) Do not use function handles:

function [ f ] = f(x,y)
    f= 1+1.6*(1-acos(cos(0.5*pi*x+y))/pi)-0.8
end

2)简化解析: ACOS(COS(X))相同 ABS(MOD(X + PI,2 * PI) - PI),这应该稍微快一点计算。或者,而不是抽样,然后数值积分,先整合解析和样品的结果。

2) Simplify analytically: acos(cos(x)) is the same as abs(mod(x + pi, 2 * pi) - pi), which should compute slightly faster. Or, instead of sampling and then numerically integrating, first integrate analytically and sample the result.

3)FFT是一个非常有效的算法来计算完整的DFT,但你并不需要完整的DFT。由于您只想中央3×3个系数,它可能是更有效的直接应用这些 DFT定义和评估公式只为您希望这些系数。这应该是快速和内存效率。

3) The FFT is a very efficient algorithm to compute the full DFT, but you don't need the full DFT. Since you only want the central 3 x 3 coefficients, it might be more efficient to directly apply the DFT definition and evaluate the formula only for those coefficients that you want. That should be both fast and memory-efficient.

4)如果反复做这个计算,这可能是有益的precompute DFT系数。在这里, dftmtx 从信号处理工具箱可以协助。

4) If you repeatedly do this computation, it might be helpful to precompute DFT coefficients. Here, dftmtx from the Signal Processing toolbox can assist.

5)为了摆脱循环,想想不计算指令形式的问题,而是一个矩阵操作。如果你认为你输入的N×N的矩阵与N²元素的载体,你的输出3×3矩阵为9元向量,则整个操作可以通过 trapz 和DFT通过 FFT )似乎是一个简单的线性变换,它应该是可能的前preSS作为N²×9矩阵。

5) To get rid of the loops, think about the problem not in the form of computation instructions, but a single matrix operation. If you consider your input N x N matrix as a vector with N² elements, and your output 3 x 3 matrix as a 9-element vector, then the whole operation you apply (numerical integration via trapz and DFT via fft) appears to be a simple linear transform, which it should be possible to express as an N² x 9 matrix.

这篇关于如何改善该功能的执行时间?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆