如何改善该功能的执行时间? [英] How to improve the execution time of this function?
问题描述
假设 F(X,Y)
是一个二元函数如下:
Suppose that f(x,y)
is a bivariate function as follows:
function [ f ] = f(x,y)
UN=(g)1.6*(1-acos(g)/pi)-0.8;
f= 1+UN(cos(0.5*pi*x+y));
end
如何提高函数的执行时间 F(N)
通过以下code:
function [VAL] = F(N)
x=0:4/N:4;
y=0:2*pi/1000:2*pi;
VAL=zeros(N+1,3);
for i = 1:N+1
val = zeros(1,N+1);
for j = 1:N+1
val(j) = trapz(y,f(0,y).*f(x(i),y).*f(x(j),y))/2/pi;
end
val = fftshift(fft(val))/N;
l = (length(val)+1)/2;
VAL(i,:)= val(l-1:l+1);
end
VAL = fftshift(fft(VAL,[],1),1)/N;
L = (size(VAL,1)+1)/2;
VAL = VAL(L-1:L+1,:);
end
注意 N = 2 ^ P
,其中 P> 10
,所以请考虑内存限制,同时优化使用code ndgrid
, arrayfun
等
Note that N=2^p
where p>10
, so please consider the memory limitations while optimizing the code using ndgrid
, arrayfun
, etc.
FYI:在code打算找到的 FFTN
中央3×3子矩阵
FYI: The code intends to find the central 3-by-3 submatrix of the fftn
of
fun=@(a,b) trapz(y,f(0,y).*f(a,y).*f(b,y))/2/pi;
其中, A,B
在 [0,4]
。关键的想法是,我们可以使用上述特别是当 N
是非常大code节省内存。但执行时间是因为嵌套循环的仍是一个问题。请参阅下图的 N = 2 ^ 2
:
where a,b
are in [0,4]
. The key idea is that we can save memory using the code above specially when N
is very large. But the execution time is still an issue because of nested loops. See the figure below for N=2^2
:
推荐答案
这是不是一个完整的答案,但一些可能有用的提示:
This is not a full answer, but some possibly helpful hints:
0)的简单:你确定你需要Numerics的?你能不能做运算解析?
0) The trivial: Are you sure you need numerics? Can't you do the computation analytically?
1)不要使用函数处理:
1) Do not use function handles:
function [ f ] = f(x,y)
f= 1+1.6*(1-acos(cos(0.5*pi*x+y))/pi)-0.8
end
2)简化解析: ACOS(COS(X))
相同 ABS(MOD(X + PI,2 * PI) - PI)
,这应该稍微快一点计算。或者,而不是抽样,然后数值积分,先整合解析和样品的结果。
2) Simplify analytically: acos(cos(x))
is the same as abs(mod(x + pi, 2 * pi) - pi)
, which should compute slightly faster. Or, instead of sampling and then numerically integrating, first integrate analytically and sample the result.
3)FFT是一个非常有效的算法来计算完整的DFT,但你并不需要完整的DFT。由于您只想中央3×3个系数,它可能是更有效的直接应用这些 DFT定义和评估公式只为您希望这些系数。这应该是快速和内存效率。
3) The FFT is a very efficient algorithm to compute the full DFT, but you don't need the full DFT. Since you only want the central 3 x 3 coefficients, it might be more efficient to directly apply the DFT definition and evaluate the formula only for those coefficients that you want. That should be both fast and memory-efficient.
4)如果反复做这个计算,这可能是有益的precompute DFT系数。在这里, dftmtx
从信号处理工具箱可以协助。
4) If you repeatedly do this computation, it might be helpful to precompute DFT coefficients. Here, dftmtx
from the Signal Processing toolbox can assist.
5)为了摆脱循环,想想不计算指令形式的问题,而是一个矩阵操作。如果你认为你输入的N×N的矩阵与N²元素的载体,你的输出3×3矩阵为9元向量,则整个操作可以通过 trapz $申请(数值积分C $ C>和DFT通过
FFT
)似乎是一个简单的线性变换,它应该是可能的前preSS作为N²×9矩阵。
5) To get rid of the loops, think about the problem not in the form of computation instructions, but a single matrix operation. If you consider your input N x N matrix as a vector with N² elements, and your output 3 x 3 matrix as a 9-element vector, then the whole operation you apply (numerical integration via trapz
and DFT via fft
) appears to be a simple linear transform, which it should be possible to express as an N² x 9 matrix.
这篇关于如何改善该功能的执行时间?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!