How to compensate if I can't do a large batch size in neural network

Question

I am trying to run an action recognition code from GitHub. The original code used a batch size of 128 with 4 GPUs. I only have two GPUs, so I cannot match their batch size. Is there any way I can compensate for this difference in batch size? I saw somewhere that iter_size might compensate according to the formula effective_batchsize = batch_size * iter_size * n_gpu. What is iter_size in this formula? I am using PyTorch, not Caffe.

Answer

In PyTorch, when you perform the backward step (calling loss.backward() or similar), the gradients are accumulated in place. This means that if you call loss.backward() multiple times, the previously computed gradients are not replaced; instead, the new gradients are added to the previous ones. That is why, when using PyTorch, it is usually necessary to explicitly zero the gradients between minibatches (by calling optimiser.zero_grad() or similar).
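
To see this accumulation behaviour in isolation, here is a minimal sketch (not part of the original answer) using a single scalar parameter:

import torch

w = torch.tensor([1.0], requires_grad=True)  # a single trainable parameter

loss1 = (w * 2).sum()
loss1.backward()
print(w.grad)  # tensor([2.])

loss2 = (w * 3).sum()
loss2.backward()
print(w.grad)  # tensor([5.]) -- the new gradient was added to the previous one

w.grad.zero_()  # this is what optimiser.zero_grad() does for every parameter
print(w.grad)  # tensor([0.])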

If your batch size is limited, you can simulate a larger batch size by breaking a large batch up into smaller pieces, and only calling optimiser.step() to update the model parameters after all the pieces have been processed.

For example, suppose you are only able to do batches of size 64, but you wish to simulate a batch size of 128. If the original training loop looks like:

optimiser.zero_grad()
loss = model(batch_data)  # batch_data is a batch of size 128 (model here returns the loss directly)
loss.backward()
optimiser.step()

then you could change this to:

optimiser.zero_grad()

smaller_batches = batch_data[:64], batch_data[64:128]
for batch in smaller_batches:
    loss = model(batch) / 2  # divide by the number of pieces so the accumulated gradient matches the full batch
    loss.backward()          # gradients from both pieces accumulate in .grad

optimiser.step()             # one parameter update for the simulated batch of 128

and the updates to the model parameters will be the same in each case (apart, perhaps, from some small numerical error). Note that you have to rescale the loss to make the update the same.
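
More generally, the same idea can be written as a gradient-accumulation loop. The following is a sketch, not part of the original answer; data_loader and accumulation_steps are placeholder names, and model is assumed to return the loss directly, as in the snippets above:

accumulation_steps = 2  # e.g. two pieces of 64 to simulate a batch of 128

optimiser.zero_grad()
for i, batch in enumerate(data_loader):
    loss = model(batch) / accumulation_steps  # rescale so the accumulated gradient matches a full batch
    loss.backward()                           # gradients keep accumulating in .grad
    if (i + 1) % accumulation_steps == 0:
        optimiser.step()       # update once per simulated large batch
        optimiser.zero_grad()  # start accumulating for the next simulated batch

In this sketch, accumulation_steps plays the same role as iter_size in the formula from the question.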
