Why is R deepnet slow? How can I speed it up?

Problem description

Recently, I use the "deepnet" to train the MNIST.

My code is :

testMNIST <- function(){

    mnist <- load.mnist("./mnist/")
    cat ("Load MNIST data succeed!", "\n")

    train_x <- mnist$train$x
    train_y <- mnist$train$y
    train_y_mat <- mnist$train$yy

    test_x <- mnist$test$x
    test_y <- mnist$test$y
    test_y_mat <- mnist$test$yy

    dnn <- dbn.dnn.train(train_x, train_y, hidden = c(1000, 500, 200), learningrate = 0.01, numepochs = 100)

    err_rate <- nn.test(dnn, test_x, test_y)
    cat ("The Error rate of training DBN with label vector:", "\n")
    print (err_rate)
}

I run the code on a Linux server (24 GB memory, 1 TB hard disk), but it is very slow: it trained just one layer in 12 hours. So, how can I improve the performance? Also, for labels like the MNIST data, which is better for training: a vector label or a matrix label?

Solution

Firstly, deepnet is written in pure R, so it is somewhat slow. The most time-consuming operations in deepnet are matrix multiplications, so adding a parallel BLAS backend helps a lot, for example Intel MKL, OpenBLAS, or even NVIDIA cuBLAS.
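
Before benchmarking, it helps to confirm which BLAS/LAPACK the R session is actually linked against. Here is a minimal check (an illustrative addition, not part of the original answer; it assumes a reasonably recent R where sessionInfo() reports the BLAS/LAPACK library paths):

# Which BLAS/LAPACK libraries is this R session using?
sessionInfo()

# Rough smoke test: a large matrix product should keep several cores busy
# (watch top/htop while it runs) once OpenBLAS or MKL is picked up.
n <- 2000
a <- matrix(rnorm(n * n), n, n)
system.time(a %*% a)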

An example from the implicit parallel mode section of the blog R with Parallel Computing with User Perspectives shows that OpenBLAS can give about a 2.5X speedup compared with native R + deepnet:

#install.packages("data.table")
#install.packages("deepnet")

library(data.table)
library(deepnet)

# download MNIST dataset in below links
# https://h2o-public-test-data.s3.amazonaws.com/bigdata/laptop/mnist/train.csv.gz
# https://h2o-public-test-data.s3.amazonaws.com/bigdata/laptop/mnist/test.csv.gz
mnist.train <- as.matrix(fread("./train.csv", header=F))
mnist.test  <- as.matrix(fread("./test.csv", header=F))

# V785 is the label
x <- mnist.train[, 1:784]/255
y <- model.matrix(~as.factor(mnist.train[, 785])-1)

system.time(
nn <- dbn.dnn.train(x,y,
                    hidden=c(64),
                    #hidden=c(500,500,250,125),
                    output="softmax",
                    batchsize=128, 
                    numepochs=100, 
                    learningrate = 0.1)
)

The results are below: 2115 seconds vs. 867 seconds.

> R CMD BATCH deepnet_mnist.R
> cat deepnet_mnist.Rout
deep nn has been trained.
 user system elapsed 
 2110.710 2.311 2115.042

> env LD_PRELOAD=/.../tools/OpenBLAS/lib/libopenblas.so R CMD BATCH deepnet_mnist.R
> cat deepnet_mnist.Rout
deep nn has been trained.
 user system elapsed 
 2197.394 10496.190 867.748
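
Once the network above has finished training, here is a minimal sketch of scoring it on the held-out test set (an illustrative addition, assuming test.csv has the same layout with V785 as the label; it uses deepnet's nn.predict, and the predicted digit is the column with the largest softmax output):

# Prepare the test set the same way as the training set.
x.test <- mnist.test[, 1:784]/255
y.test <- mnist.test[, 785]

# Column i of the softmax output corresponds to digit i - 1,
# because model.matrix() ordered the factor levels 0..9.
pred <- max.col(nn.predict(nn, x.test)) - 1
mean(pred == y.test)   # test-set accuracy

This also touches on the label-format question from the post: since the network's output layer follows the number of columns in y, training with output = "softmax" uses the one-hot matrix built with model.matrix above, while the plain 0-9 vector is convenient for scoring as in this sketch.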

Secondly, H2O and MXNet will be faster for deep learning.
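
As an illustration of the H2O route, here is a minimal sketch (not part of the original answer; it assumes the same headerless train.csv/test.csv used above and addresses the label column 785 by index, so H2O's auto-generated column names don't matter):

library(h2o)
h2o.init(nthreads = -1)            # start a local H2O cluster on all cores

train <- h2o.importFile("./train.csv")
test  <- h2o.importFile("./test.csv")

# The label must be a factor for classification.
train[, 785] <- as.factor(train[, 785])
test[, 785]  <- as.factor(test[, 785])

model <- h2o.deeplearning(x = 1:784, y = 785,
                          training_frame = train,
                          hidden = c(1000, 500, 200),
                          epochs = 10)

h2o.performance(model, newdata = test)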

Finally, please note that your network with hidden = c(1000, 500, 200) is really huge, and it will be VERY slow even with multiple cores :(
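
To make the size difference concrete, here is a back-of-the-envelope weight count for the two architectures (a rough sketch that ignores biases and the RBM pre-training pass; all of these weights sit in the matrix multiplications that dominate deepnet's run time):

# Number of weights between consecutive layers: input 784, hidden ..., output 10.
layer_weights <- function(sizes) sum(head(sizes, -1) * tail(sizes, -1))

layer_weights(c(784, 1000, 500, 200, 10))   # ~1.39 million weights
layer_weights(c(784, 64, 10))               # ~51 thousand weights

That is roughly 27x more weights than the hidden = c(64) benchmark, before even counting the extra pre-training for the deeper stack.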
