Why does my Julia code run so slowly?


Question

redim = 2;
# Loading data
iris_data = readdlm("iris_data.csv");
iris_target = readdlm("iris_target.csv");

# Center data
iris_data = broadcast(-, iris_data, mean(iris_data, 1));
n_data, n_dim = size(iris_data);

Sw = zeros(n_dim, n_dim);
Sb = zeros(n_dim, n_dim);

C = cov(iris_data);


classes = unique(iris_target);

for i=1:length(classes)
    index = find(x -> x==classes[i], iris_target);
    d = iris_data[index,:];
    classcov = cov(d);
    Sw += length(index) / n_data .* classcov;
end
Sb = C - Sw;

evals, evecs = eig(Sw, Sb);
w = evecs[:,1:redim];
new_data = iris_data * w;

This code just does LDA (linear discriminant analysis) on iris_data, reducing its dimensionality to 2. It takes about 4 seconds, but the Python (numpy/scipy) version takes only about 0.6 seconds. Why?

Recommended answer

This is from the Julia manual:

Because Julia's compiler is different from the interpreters used for languages like Python or R, you may find that Julia's performance is unintuitive at first. If you find that something is slow, we highly recommend reading through the Performance Tips section before trying anything else. Once you understand how Julia works, it's easy to write code that's nearly as fast as C.


Excerpt:

Avoid global variables

A global variable might have its value, and therefore its type, change at any point. This makes it difficult for the compiler to optimize code using global variables. Variables should be local, or passed as arguments to functions, whenever possible.

Any code that is performance critical or being benchmarked should be inside a function.

We find that global names are frequently constants, and declaring them as such greatly improves performance.
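As a minimal sketch of the two options the excerpt describes (passing values as arguments, or declaring a global as const), something like the following could be used. The names SCALE, scaled_sum and scaled_sum_const are hypothetical and chosen only for illustration; they do not appear in the manual or in the question.

```julia
# Hypothetical names, for illustration only.

# A constant global: its type can no longer change, so code that
# reads it can be specialized by the compiler.
const SCALE = 2.0

# Preferred style: everything the loop touches is a local or an argument.
function scaled_sum(xs, scale)
    total = 0.0
    for x in xs
        total += scale * x
    end
    return total
end

# Reading a `const` global from inside a function is also fine.
scaled_sum_const(xs) = scaled_sum(xs, SCALE)

result = scaled_sum([1.0, 2.0, 3.0], 2.0)
```

The same loop written at top level against a non-constant global would force the compiler to treat that global's type as unknown on every iteration, which is the situation the excerpt warns about.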


Knowing that the script style (all procedural top-level code) is so pervasive among scientific computing users, I would recommend that you at least wrap the whole file inside a let expression for starters (let introduces a new local scope), i.e.:

let

redim = 2
# Loading data
iris_data = readdlm("iris_data.csv")
iris_target = readdlm("iris_target.csv")

# Center data
iris_data = broadcast(-, iris_data, mean(iris_data, 1))
n_data, n_dim = size(iris_data)

Sw = zeros(n_dim, n_dim)
Sb = zeros(n_dim, n_dim)

C = cov(iris_data)


classes = unique(iris_target)

for i=1:length(classes)
    index = find(x -> x==classes[i], iris_target)
    d = iris_data[index,:]
    classcov = cov(d)
    Sw += length(index) / n_data .* classcov
end
Sb = C - Sw

evals, evecs = eig(Sw, Sb)
w = evecs[:,1:redim]
new_data = iris_data * w

end

But I would also urge you to refactor that into small functions and then compose a main function that calls the rest, something like this. Notice how this refactor makes your code general and reusable (and fast):

module LinearDiscriminantAnalysis

export load_data, center_data

"Returns data and target Matrices."
load_data(data_path, target_path) = (readdlm(data_path), readdlm(target_path))

function center_data(data, target)
    data = broadcast(-, data, mean(data, 1))
    n_data, n_dim = size(data)
    Sw = zeros(n_dim, n_dim)
    Sb = zeros(n_dim, n_dim)
    C = cov(data)
    classes = unique(target)
    for i=1:length(classes)
        index = find(x -> x==classes[i], target)
        d = data[index,:]
        classcov = cov(d)
        Sw += length(index) / n_data .* classcov
    end
    Sb = C - Sw
    evals, evecs = eig(Sw, Sb)
    redim = 2
    w = evecs[:,1:redim]
    return data * w
end

end


using LinearDiscriminantAnalysis

function main()
    iris_data, iris_target = load_data("iris_data.csv", "iris_target.csv")
    result = center_data(iris_data, iris_target)
    @show result
end

main()
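The listings above use the API of the Julia release that was current when this answer was written (before v0.5). On Julia 1.x several of these names have moved: readdlm lives in DelimitedFiles, mean and cov in Statistics, eig became eigen in LinearAlgebra, find became findall, and mean(x, 1) is written mean(x, dims = 1). The following is only a rough sketch of the same computation under those assumptions, not the answer author's code; the function name lda_project and the redim keyword are hypothetical.

```julia
using DelimitedFiles   # readdlm
using LinearAlgebra    # eigen
using Statistics       # mean, cov

# Hypothetical helper: same steps as `center_data` above, for Julia 1.x.
# `data` has one sample per row; `target` holds one class label per row.
function lda_project(data::AbstractMatrix, target::AbstractVector; redim::Integer = 2)
    centered = data .- mean(data, dims = 1)      # center each column
    n_data, n_dim = size(centered)
    C = cov(centered)                            # total covariance
    Sw = zeros(n_dim, n_dim)                     # within-class scatter
    for class in unique(target)
        index = findall(x -> x == class, target)
        Sw += length(index) / n_data .* cov(centered[index, :])
    end
    Sb = C - Sw                                  # between-class scatter
    evals, evecs = eigen(Sw, Sb)                 # generalized eigenproblem, as in the original
    w = evecs[:, 1:redim]
    return centered * w
end

# Assumed usage with the files from the question (not run here).
# `vec` turns the n-by-1 matrix returned by readdlm into a vector:
#   data   = readdlm("iris_data.csv")
#   target = vec(readdlm("iris_target.csv"))
#   new_data = lda_project(data, target)
```

Because everything happens inside a function with local variables, this version follows the same advice as the module above; the main difference is the updated standard-library names.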

Notes:

  • You don't need all those semicolons.
  • Anonymous functions are currently slow, but that will change in v0.5. You can use FastAnonymous for now if performance is critical.
  • In summary, read carefully and take into account all the performance tips.
  • main is just a name; it could be anything else you like.
