Optimizing my Backprop ANN


Problem Description

After profiling my Back propagation algorithm, I have learnt it is responsible for taking up 60% of my computation time. Before I start looking at parallel alternatives, I would like to see if there is anything further I can do.

The activate(const double input[]) function is profiled to only take ~5% of the time. The gradient(const double input) function is implemented as follows:

inline double gradient(const double input) { return (1 - (input * input)); }

The train function in question:

void train(const vector<double>& data, const vector<double>& desired, const double learn_rate, const double momentum) {
    this->activate(data);
    this->calculate_error(desired);

    // adjust weights for layers
    const auto n_layers = this->config.size();
    const auto adjustment = (1 - momentum) * learn_rate;

    for (size_t i = 1; i < n_layers; ++i) {
        // inputs to layer i are the previous layer's outputs, or the raw data for the first layer
        const auto& inputs = i - 1 > 0 ? this->outputs[i - 1] : data;
        const auto n_inputs = this->config[i - 1];
        const auto n_neurons = this->config[i];

        for (auto j = 0; j < n_neurons; ++j) {
            const auto adjusted_error = adjustment * this->errors[i][j];

            for (auto k = 0; k < n_inputs; ++k) {
                const auto delta = adjusted_error * inputs[k] + (momentum * this->deltas[i][j][k]);

                this->deltas[i][j][k] = delta;
                this->weights[i][j][k] += delta;
            }

            // bias weight is stored at index n_inputs
            const auto delta = adjusted_error * this->bias + (momentum * this->deltas[i][j][n_inputs]);

            this->deltas[i][j][n_inputs] = delta;
            this->weights[i][j][n_inputs] += delta;
        }
    }
}

The question may be better suited for http://codereview.stackexchange.com/
For those interested, the minimal code required to compile can be found here: Backprop.cpp

Recommended Answer

You can't avoid an O(n^2) algorithm if you want to train/use a NN. But it is perfectly suited for vector arithmetic. For example with clever use of SSE or AVX you could process the neurons in chunks of 4 or 8 and use a multiply-add instead of two separate instructions.
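
To make that vector-arithmetic structure explicit, here is a minimal sketch (my illustration, not part of the answer) of the inner weight-update loop pulled out as a flat kernel. It assumes the per-neuron rows deltas[i][j] and weights[i][j] are contiguous arrays of double (true for std::vector<double> via data()); each element then reduces to a multiply, a multiply-add, and an add over contiguous memory, which is exactly the shape SSE/AVX and the autovectorizer handle well:

#include <cstddef>

// Sketch only: per-neuron update as a flat kernel over n contiguous doubles.
inline void update_neuron(const double* inputs, double* deltas, double* weights,
                          std::size_t n, double adjusted_error, double momentum) {
    for (std::size_t k = 0; k < n; ++k) {
        // delta = adjusted_error * input + momentum * old_delta
        const double delta = adjusted_error * inputs[k] + momentum * deltas[k];
        deltas[k] = delta;    // remember the delta for the next momentum term
        weights[k] += delta;  // apply the delta to the weight
    }
}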

If you use a modern compiler, carefully reformulate the algorithm, and use the right switches, you might even get the compiler to autovectorize the loops for you, but your mileage may vary.

For gcc, autovectorization is activated using -O3 or -ftree-vectorize. You need a vector-capable CPU of course, something like -march=core2 -mssse4.1 or similar, depending on the target CPU. If you use -ftree-vectorizer-verbose=2 you get detailed explanations of why and where loops were not vectorized. Have a look at http://gcc.gnu.org/projects/tree-ssa/vectorization.html .
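
A hedged variant of the same kernel, restated the way the autovectorizer tends to like it: a simple counted loop over raw pointers, with GCC's __restrict promising that the arrays don't overlap. The __restrict qualifiers and the helper name are my additions, not from the answer; with the flags above, -ftree-vectorizer-verbose=2 will report whether the loop was vectorized:

// Sketch: same update as above, written so gcc can prove there is no aliasing
// between the three arrays. Build with e.g.
//   -O3 -march=core2 -mssse4.1 -ftree-vectorizer-verbose=2
void update_neuron_autovec(const double* __restrict inputs,
                           double* __restrict deltas,
                           double* __restrict weights,
                           std::size_t n, double adjusted_error, double momentum) {
    for (std::size_t k = 0; k < n; ++k) {
        const double d = adjusted_error * inputs[k] + momentum * deltas[k];
        deltas[k] = d;
        weights[k] += d;
    }
}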

Better is of course using the compiler intrinsics directly.
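
For illustration, a sketch of what the intrinsics route could look like with AVX plus FMA, processing the weights in chunks of 4 doubles with a fused multiply-add and a scalar tail for the remainder. Note that the answer only assumes SSE-era hardware; this particular sketch additionally assumes an AVX/FMA-capable CPU and the corresponding -mavx -mfma flags, and the helper name is mine:

#include <immintrin.h>
#include <cstddef>

// Sketch only: AVX + FMA version of the inner weight-update loop,
// 4 doubles per iteration. Arrays must be contiguous, non-overlapping doubles.
void update_neuron_avx(const double* inputs, double* deltas, double* weights,
                       std::size_t n, double adjusted_error, double momentum) {
    const __m256d err = _mm256_set1_pd(adjusted_error);
    const __m256d mom = _mm256_set1_pd(momentum);
    std::size_t k = 0;
    for (; k + 4 <= n; k += 4) {
        const __m256d in = _mm256_loadu_pd(inputs + k);
        const __m256d dl = _mm256_loadu_pd(deltas + k);
        const __m256d wt = _mm256_loadu_pd(weights + k);
        // delta = adjusted_error * input + momentum * old_delta, as one FMA
        const __m256d nd = _mm256_fmadd_pd(err, in, _mm256_mul_pd(mom, dl));
        _mm256_storeu_pd(deltas + k, nd);
        _mm256_storeu_pd(weights + k, _mm256_add_pd(wt, nd));
    }
    for (; k < n; ++k) {  // scalar tail for the remaining elements
        const double d = adjusted_error * inputs[k] + momentum * deltas[k];
        deltas[k] = d;
        weights[k] += d;
    }
}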
