Why does my Julia code run so slowly?


Question

redim = 2;
# Loading data
iris_data = readdlm("iris_data.csv");
iris_target = readdlm("iris_target.csv");

# Center data
iris_data = broadcast(-, iris_data, mean(iris_data, 1));
n_data, n_dim = size(iris_data);

Sw = zeros(n_dim, n_dim);
Sb = zeros(n_dim, n_dim);

C = cov(iris_data);


classes = unique(iris_target);

for i=1:length(classes)
    index = find(x -> x==classes[i], iris_target);
    d = iris_data[index,:];
    classcov = cov(d);
    Sw += length(index) / n_data .* classcov;
end
Sb = C - Sw;

evals, evecs = eig(Sw, Sb);
w = evecs[:,1:redim];
new_data = iris_data * w;

This code performs LDA (linear discriminant analysis) on iris_data, reducing its dimensionality to 2. It takes about 4 seconds, whereas Python (numpy/scipy) takes only about 0.6 seconds. Why?

Recommended answer

This is from the Julia documentation:

Because Julia’s compiler is different from the interpreters used for languages like Python or R, you may find that Julia’s performance is unintuitive at first. If you find that something is slow, we highly recommend reading through the Performance Tips section before trying anything else. Once you understand how Julia works, it’s easy to write code that’s nearly as fast as C.


Excerpt:

Avoid global variables

A global variable might have its value, and therefore its type, change at any point. This makes it difficult for the compiler to optimize code using global variables. Variables should be local, or passed as arguments to functions, whenever possible.

Any code that is performance critical or being benchmarked should be inside a function.

We find that global names are frequently constants, and declaring them as such greatly improves performance.
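
The two remedies quoted above can be sketched as follows. This is only an illustration: the names SCALE, scale_sum, and run_example are hypothetical and do not appear in the original code. The one remaining global is declared const, and all work happens inside functions that receive their data as arguments.

```julia
# Hypothetical example: all names here are illustrative, not from the post.

# A global whose value never changes is declared const, so its type is fixed.
const SCALE = 2.0

# Performance-critical work lives in a function; data arrives as an argument.
function scale_sum(xs)
    total = 0.0
    for x in xs
        total += SCALE * x
    end
    return total
end

# The driver is a function too, so no loop runs at global scope.
function run_example()
    xs = collect(1.0:1000.0)
    return scale_sum(xs)
end

result = run_example()
```

Only the final call and the const definition sit at top level; everything else is local to a function.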


Since the script style (all procedural top-level code) is so pervasive among scientific computing users, I would recommend that you at least wrap the whole file in a let expression for starters (let introduces a new local scope), i.e.:

let

redim = 2
# Loading data
iris_data = readdlm("iris_data.csv")
iris_target = readdlm("iris_target.csv")

# Center data
iris_data = broadcast(-, iris_data, mean(iris_data, 1))
n_data, n_dim = size(iris_data)

Sw = zeros(n_dim, n_dim)
Sb = zeros(n_dim, n_dim)

C = cov(iris_data)


classes = unique(iris_target)

for i=1:length(classes)
    index = find(x -> x==classes[i], iris_target)
    d = iris_data[index,:]
    classcov = cov(d)
    Sw += length(index) / n_data .* classcov
end
Sb = C - Sw

evals, evecs = eig(Sw, Sb)
w = evecs[:,1:redim]
new_data = iris_data * w

end

But I would also urge you to refactor it into small functions and then compose a main function that calls the rest, something like this. Notice how this refactor makes your code general and reusable (and fast):

module LinearDiscriminantAnalysis

export load_data, center_data

"Returns data and target Matrices."
load_data(data_path, target_path) = (readdlm(data_path), readdlm(target_path))

function center_data(data, target)
    data = broadcast(-, data, mean(data, 1))
    n_data, n_dim = size(data)
    Sw = zeros(n_dim, n_dim)
    Sb = zeros(n_dim, n_dim)
    C = cov(data)
    classes = unique(target)
    for i=1:length(classes)
        index = find(x -> x==classes[i], target)
        d = data[index,:]
        classcov = cov(d)
        Sw += length(index) / n_data .* classcov
    end
    Sb = C - Sw
    evals, evecs = eig(Sw, Sb)
    redim = 2
    w = evecs[:,1:redim]
    return data * w
end

end


using LinearDiscriminantAnalysis

function main()
    iris_data, iris_target = load_data("iris_data.csv", "iris_target.csv")
    result = center_data(iris_data, iris_target)
    @show result
end

main()

Notes:

  • You don't need all those semicolons.
  • Anonymous functions are currently slow, but that will change in v0.5. You can use FastAnonymous for now if performance is critical.
  • In summary, read all the performance tips carefully and take them into account.
  • main is just a name; it could be anything else you like.
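
One way to sidestep the anonymous function without an extra package is a small named helper with an explicit loop. The following is a sketch under assumptions: class_indices and the sample labels are hypothetical stand-ins for the find(x -> x==classes[i], ...) call and for iris_target.

```julia
# Hypothetical helper: returns the positions in `target` equal to class `c`.
# It plays the role of find(x -> x == c, target) without creating a closure.
function class_indices(target, c)
    index = Int[]
    for i in 1:length(target)
        if target[i] == c
            push!(index, i)
        end
    end
    return index
end

# Made-up labels standing in for iris_target.
target = [0.0, 1.0, 0.0, 2.0, 1.0, 0.0]
classes = unique(target)

# One index vector per class, in the order the classes first appear.
indices = [class_indices(target, c) for c in classes]
```

Inside the loop of the refactored function, index = class_indices(target, classes[i]) would then take the place of the find call.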
