Image Recognition with Scalar Output using CNN MXnet in R


Problem Description

So I am trying to use image recognition using the mxnet package in R using a CNN to try and predict a scalar output (in my case wait time) based on the image.

However, when I do this, I get the same resultant output (it predicts the same number which is probably just the average of all of the results). How do I get it to predict the scalar output correctly.

Also, my image has already been pre-processed by greyscaling it and converting into the pixel format below. I am essentially using images to predict wait times which is why my train_y is the current wait times in seconds, hence why I didn't convert it into a [0,1] range. I would prefer a regression type output or some kind of scalar output that outputs the predicted wait time based on the image.
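For reference, mapping the target into [0, 1] and undoing the scaling after prediction takes only a couple of lines. This is a hypothetical sketch (y_max, train_y_scaled, and pred_seconds are illustrative names, not part of the original post):

## Hypothetical target scaling: map wait times in seconds into [0, 1]
y_max <- max(train_y)
train_y_scaled <- train_y / y_max   # train against this instead of train_y
pred_seconds <- pred * y_max        # invert the scaling after predict()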

What other ways would you recommend to tackle this problem, not sure if my approach is correct.

Here is my reproducible code:

library(caret)   # needed for createDataPartition()

set.seed(0)

## Simulated stand-in data: 784 pixel columns (a 28 x 28 greyscale image) plus a wait-time target in seconds
df <- data.frame(replicate(784, runif(7538)))
df$waittime <- 1000 * runif(7538)


## 90/10 train/test split, stratified on the target
training_index <- createDataPartition(df$waittime, p = .9, times = 1)
training_index <- unlist(training_index)

train_set <- df[training_index,]
dim(train_set)
test_set <- df[-training_index,]
dim(test_set)


## Prepare the train arrays: transpose so each column holds one image,
## then reshape to width x height x channels x samples (28 x 28 x 1 x N)
train_data <- data.matrix(train_set)
train_x <- t(train_data[, -785])
train_y <- train_data[, 785]
train_array <- train_x
dim(train_array) <- c(28, 28, 1, ncol(train_array))


## Same reshaping for the test set (indexing test_data, not test_set, for consistency)
test_data <- data.matrix(test_set)
test_x <- t(test_data[, -785])
test_y <- test_data[, 785]
test_array <- test_x
dim(test_array) <- c(28, 28, 1, ncol(test_x))




library(mxnet)
## Model
mx_data <- mx.symbol.Variable('data')
## 1st convolutional layer 5x5 kernel and 20 filters.
conv_1 <- mx.symbol.Convolution(data = mx_data, kernel = c(5, 5), num_filter = 20)
tanh_1 <- mx.symbol.Activation(data = conv_1, act_type = "tanh")
pool_1 <- mx.symbol.Pooling(data = tanh_1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 2nd convolutional layer 5x5 kernel and 50 filters.
conv_2 <- mx.symbol.Convolution(data = pool_1, kernel = c(5,5), num_filter = 50)
tanh_2 <- mx.symbol.Activation(data = conv_2, act_type = "tanh")
pool_2 <- mx.symbol.Pooling(data = tanh_2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 1st fully connected layer
flat <- mx.symbol.Flatten(data = pool_2)
fcl_1 <- mx.symbol.FullyConnected(data = flat, num_hidden = 500)
tanh_3 <- mx.symbol.Activation(data = fcl_1, act_type = "tanh")
## 2nd fully connected layer
fcl_2 <- mx.symbol.FullyConnected(data = tanh_3, num_hidden = 1)
## Output
#NN_model <- mx.symbol.SoftmaxOutput(data = fcl_2)
label <- mx.symbol.Variable("label")
#NN_model <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fcl_2, shape = 0) - label))
NN_model <- mx.symbol.LinearRegressionOutput(data = fcl_2, label = label)


## Device used. Sadly not the GPU :-(
#device <- mx.gpu
#Didn't work well, predicted same number continuously regardless of image
## Train on 1200 samples
model <- mx.model.FeedForward.create(NN_model, X = train_array, y = train_y,
                                     # ctx = device,
                                     num.round = 30,
                                     array.batch.size = 100,
                                     initializer = mx.init.uniform(0.002),
                                     learning.rate = 0.00001,
                                     momentum = 0.9,
                                     wd = 0.00001,
                                     eval.metric = mx.metric.rmse,
                                     epoch.end.callback = mx.callback.log.train.metric(100))



pred <- predict(model, test_array)
#gives the same numeric output 
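One quick way to confirm the collapse described above (an assumed check, not part of the original post): a collapsed regression network predicts roughly the mean of the training targets for every input.

## Assumed sanity check: near-zero spread, centred on the target mean
range(pred)     # max - min is tiny if the network has collapsed
mean(train_y)   # the constant prediction tends to sit near this value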


Recommended Answer

It appears that your network is collapsing, which could have a number of causes. I would try the following modifications:


  • Use ReLU activation instead of tanh. ReLU has proven to be a much more robust activation in conv nets than sigmoid or tanh.
  • Use batch normalization at the input of your convolutional layers (see the paper here).
  • Divide your range into sections and use softmax. If you must have regression, consider a separate regression network for each range and select the correct regression net based on the output of the softmax. Cross-entropy loss has shown more success in learning highly non-linear functions. A sketch of these changes follows this list.
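A minimal sketch of these suggestions, rewiring the question's symbol graph with ReLU activations and a BatchNorm in front of each convolution; the layer names, the choice of k, and the quantile bucketing at the end are illustrative assumptions, not a tuned solution:

library(mxnet)

mx_data <- mx.symbol.Variable('data')
## BatchNorm before the 1st convolution, ReLU instead of tanh
bn_1   <- mx.symbol.BatchNorm(data = mx_data)
conv_1 <- mx.symbol.Convolution(data = bn_1, kernel = c(5, 5), num_filter = 20)
relu_1 <- mx.symbol.Activation(data = conv_1, act_type = "relu")
pool_1 <- mx.symbol.Pooling(data = relu_1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## BatchNorm before the 2nd convolution
bn_2   <- mx.symbol.BatchNorm(data = pool_1)
conv_2 <- mx.symbol.Convolution(data = bn_2, kernel = c(5, 5), num_filter = 50)
relu_2 <- mx.symbol.Activation(data = conv_2, act_type = "relu")
pool_2 <- mx.symbol.Pooling(data = relu_2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## Fully connected layers, ReLU in place of tanh
flat   <- mx.symbol.Flatten(data = pool_2)
fcl_1  <- mx.symbol.FullyConnected(data = flat, num_hidden = 500)
relu_3 <- mx.symbol.Activation(data = fcl_1, act_type = "relu")
## Regression head, as in the question
fcl_2  <- mx.symbol.FullyConnected(data = relu_3, num_hidden = 1)
NN_model <- mx.symbol.LinearRegressionOutput(data = fcl_2)

## Bucketed-softmax variant of the third suggestion: discretize the wait
## times into k classes (k = 10 and the quantile breaks are assumptions)
k <- 10
breaks <- quantile(train_y, probs = seq(0, 1, length.out = k + 1))
train_y_class <- cut(train_y, breaks = breaks, include.lowest = TRUE, labels = FALSE) - 1  # 0-based labels
fcl_cls <- mx.symbol.FullyConnected(data = relu_3, num_hidden = k)
NN_softmax <- mx.symbol.SoftmaxOutput(data = fcl_cls)

This keeps the question's architecture intact and only swaps the pieces the list above calls out, so either head can be dropped into the existing mx.model.FeedForward.create() call (with train_y_class and mx.metric.accuracy for the softmax variant).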
