Food101 SqueezeNet Caffe2 number of iterations

Question

I am trying to classify the ETH Food-101 dataset using squeezenet in Caffe2. My model is imported from the Model Zoo and I made two types of modifications to the model:

1) Changing the dimensions of the last layer to have 101 outputs

2) The images from the database are in NHWC form and I just flipped the dimensions of the weights to match. (I plan on changing this)
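
For reference, the more conventional fix for (2) is to transpose the input tensor rather than the weight blobs; a minimal sketch, assuming the DB reader still yields NHWC data under a hypothetical blob name 'data_nhwc':

# Sketch only: convert the NHWC input to the NCHW layout SqueezeNet expects,
# instead of flipping the weight dimensions. 'data_nhwc' is a hypothetical
# blob name for the NHWC-ordered batch coming out of the DB reader.
data = train_model.net.NHWC2NCHW('data_nhwc', 'data')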

The Food101 dataset has 75,000 images for training and I am currently using a batch size of 128 and a starting learning rate of -0.01 with a gamma of 0.999 and stepsize of 1. What I noticed is that for the first 2000 iterations of the network the accuracy hovered around 1/128 and this took an hour or so to complete.
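
The AddTrainingOperators helper called further down is not shown in the post; assuming it follows the standard Caffe2 MNIST tutorial pattern, the schedule described above (plain SGD, base_lr = -0.01, step policy, stepsize 1, gamma 0.999) would look roughly like this (the structure and names are an assumption, not the poster's actual code):

from caffe2.python import brew

def AddTrainingOperators(model, softmax, label):
    # cross-entropy loss on the network's softmax output
    xent = model.LabelCrossEntropy([softmax, label], 'xent')
    loss = model.AveragedLoss(xent, 'loss')
    brew.accuracy(model, [softmax, label], 'accuracy')
    # build gradient operators for every blob registered in model.params
    model.AddGradientOperators([loss])
    # step schedule from the question: lr_i = -0.01 * 0.999 ** i
    ITER = brew.iter(model, 'iter')
    LR = model.LearningRate(ITER, 'LR', base_lr=-0.01,
                            policy='step', stepsize=1, gamma=0.999)
    ONE = model.param_init_net.ConstantFill([], 'ONE', shape=[1], value=1.0)
    # plain SGD update: param := param + LR * grad (LR is negative)
    for param in model.params:
        grad = model.param_to_grad[param]
        model.WeightedSum([param, ONE, grad, LR], param)

Note that with stepsize 1 the rate decays on every iteration: 0.999^2000 is roughly 0.135, so by iteration 2000 the effective rate is already about -0.0014.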

I added all the weights to model.params so they can get updated during gradient descent (except for the data blob), and reinitialized all weights with Xavier and all biases to a constant. I would expect the accuracy to grow fairly quickly over the first hundred to thousand iterations and then tail off as the number of iterations grows. In my case, the accuracy stays roughly constant around 0.

When I look at the gradient file I find that the average is on the order of 10^-6 with a standard deviation of 10^-7. This explains the slow learning, but I haven't been able to get the gradients to start out much larger.

These are the gradient statistics for the first convolution after a few iterations

    Min        Max          Avg       Sdev
-1.69821e-05 2.10922e-05 1.52149e-06 5.7707e-06
-1.60263e-05 2.01478e-05 1.49323e-06 5.41754e-06
-1.62501e-05 1.97764e-05 1.49046e-06 5.2904e-06
-1.64293e-05 1.90508e-05 1.45681e-06 5.22742e-06
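
These numbers were presumably gathered by fetching the gradient blob from the workspace, along these lines ('conv1_w_grad' is an assumed name, following Caffe2's convention of appending '_grad' to the parameter blob):

g = ws.FetchBlob('conv1_w_grad')  # assumed gradient blob name for conv1_w
print g.min(), g.max(), g.mean(), g.std()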

Here are the core parts of my code:

import os
import numpy as np
# Caffe2 modules matching the ws./core./model_helper usage below
from caffe2.python import workspace as ws, core, model_helper

# AddInput, AddTrainingOperators, AddBookkeepingOperators and
# update_squeeze_net are defined elsewhere in my code (not shown here).
#init_path is path to init_net protobuf
#pred_path is path to pred_net protobuf
def main(init_path, pred_path):
    ws.ResetWorkspace()
    data_folder = '/home/myhome/food101/'
    #some debug code here
    arg_scope = {"order":"NCHW"}
    train_model = model_helper.ModelHelper(name="food101_train", arg_scope=arg_scope)
    if not debug:
            data, label = AddInput(
                    train_model, batch_size=128,
                    db=os.path.join(data_folder, 'food101-train-nchw-leveldb'),
                    db_type='leveldb')
    init_net_def, pred_net_def = update_squeeze_net(init_path, pred_path)
    #print str(init_net_def)
    train_model.param_init_net.AppendNet(core.Net(init_net_def))
    train_model.net.AppendNet(core.Net(pred_net_def))
    ws.RunNetOnce(train_model.param_init_net)
    add_params(train_model, init_net_def)
    AddTrainingOperators(train_model, 'softmaxout', 'label')
    AddBookkeepingOperators(train_model)

    ws.RunNetOnce(train_model.param_init_net)
    if debug:
            ws.FeedBlob('data', data)
            ws.FeedBlob('label', label)
    ws.CreateNet(train_model.net)

    total_iters = 10000
    accuracy = np.zeros(total_iters)
    loss = np.zeros(total_iters)
    # Manually run the network for total_iters (10,000) iterations.
    for i in range(total_iters):
            #try:
            conv1_w = ws.FetchBlob('conv1_w')
            print conv1_w[0][0]
            ws.RunNet("food101_train")
            #except RuntimeError:
            #       print ws.FetchBlob('conv1').shape
            #       print ws.FetchBlob('pool1').shape
            #       print ws.FetchBlob('fire2/squeeze1x1_w').shape
            #       print ws.FetchBlob('fire2/squeeze1x1_b').shape
            #softmax = ws.FetchBlob('softmaxout')
            #print softmax[i]
            #print softmax[i][0][0]
            #print softmax[i][0][:5]
            #print softmax[64*i]
            accuracy[i] = ws.FetchBlob('accuracy')
            loss[i] = ws.FetchBlob('loss')
            print accuracy[i], loss[i]

My add_params function initializes the weights as follows

# imports assumed for the initializer helpers used below
from caffe2.python.modeling import initializers
from caffe2.python.modeling.parameter_info import ParameterTags

#ops allows me to only initialize the weights of specific ops because I initially was going to do last-layer training
def add_params(model, init_net_def, ops=[]):
    def add_param(op):
            for output in op.output:
                    if "_w" in output:
                            weight_shape = []
                            for arg in op.arg:
                                    if arg.name == 'shape':
                                            weight_shape = arg.ints
                            weight_initializer = initializers.update_initializer(
                                                    None,
                                                    None,
                                                    ("XavierFill", {}))
                            model.create_param(
                                    param_name=output,
                                    shape=weight_shape,
                                    initializer=weight_initializer,
                                    tags=ParameterTags.WEIGHT)
                    elif "_b" in output:
                            weight_shape = []
                            for arg in op.arg:
                                    if arg.name == 'shape':
                                            weight_shape = arg.ints
                            weight_initializer = initializers.update_initializer(
                                                    None,
                                                    None,
                                                    ("ConstantFill", {}))
                            model.create_param(
                                    param_name=output,
                                    shape=weight_shape,
                                    initializer=weight_initializer,
                                    tags=ParameterTags.BIAS)
    # assumed completion (the posted snippet is cut off above): register
    # params for each op in the init net, honoring the optional ops filter
    for op in init_net_def.op:
            if not ops or any(out.startswith(tuple(ops)) for out in op.output):
                    add_param(op)

I find that my loss function fluctuates when I use the full training set, but if I use just one batch and iterate over it several times, the loss goes down, though very slowly.

Answer

While SqueezeNet has 50x fewer parameters than AlexNet, it is still a very large network. The original paper does not mention a training time, but the SqueezeNet-based SQ required 22 hours to train using two Titan X graphics cards - and that was with some of the weights pre-trained! I haven't gone over your code in detail, but what you describe is expected behavior - your network is able to learn on the single batch, just not as quickly as you expected.

I suggest reusing as many of the weights as possible instead of reinitializing them, just as the creators of SQ did. This is known as transfer learning, and it works because many of the lower-level features (lines, curves, basic shapes) in an image are the same regardless of the image's content, and reusing the weights for these layers saves the network from having to re-learn them from scratch.
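
A minimal sketch of what that could look like using the question's own add_params hook, assuming 'conv10' is the name of SqueezeNet's final 1x1 convolution (the layer that was widened to 101 outputs) and that param_init_net carries the pretrained Model Zoo fills:

# Load the pretrained Model Zoo values into the workspace.
ws.RunNetOnce(train_model.param_init_net)

# Register and re-initialize only the resized final layer; every other layer
# keeps its pretrained weights and is left out of model.params (i.e. frozen).
# 'conv10' is an assumed name for SqueezeNet's final 1x1 convolution.
add_params(train_model, init_net_def, ops=['conv10'])

To fine-tune every layer rather than only the last one, register all of the parameters but skip the Xavier/constant re-fill, so the pretrained values are kept yet still updated by the SGD step.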
