Intermittent "RuntimeError: CUDA out of memory" error in Google Colab Fine Tuning BERT Base Cased with Transformers and PyTorch

Views: 24 | Published: 2022/3/15 14:41:07 | Tags: python machine-learning pytorch google-colaboratory

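Problem Description

I am running the code below to fine-tune a BERT Base Cased model in Google Colab. Sometimes the code runs fine the first time, with no error. Other times, the same code with the same data results in a "CUDA out of memory" error. Previously, restarting the runtime, or exiting the notebook, going back into it, doing a factory runtime reset, and re-running the code would succeed without error. Just now, though, I tried restarting and retrying 5 times, and got the error every time.

The problem does not seem to be the combination of data and code I am using, because sometimes it works without error. So it appears to have something to do with the Google Colab runtime.

Does anyone know why this happens, why it is intermittent, and/or what I can do about it?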
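One way to check whether the runtime itself differs between sessions is to print which GPU was assigned and how much memory PyTorch already holds before training starts. Colab does not guarantee a particular GPU model, so total memory may vary from one session to the next; whether that explains the error here is an assumption. The sketch below is a minimal illustration: `describe_gpu` is a hypothetical helper name, not part of the original notebook, and it assumes a PyTorch version that provides `torch.cuda.memory_reserved` (older releases call it `torch.cuda.memory_cached`).

```python
def describe_gpu():
    """Return a dict describing GPU 0 and PyTorch's memory use.

    Returns None when PyTorch is not installed or no CUDA device is visible.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None

    gib = 1024 ** 3
    props = torch.cuda.get_device_properties(0)
    return {
        # Model name of the card assigned to this session.
        "name": props.name,
        # Physical memory on the card.
        "total_gib": round(props.total_memory / gib, 2),
        # Memory occupied by live tensors.
        "allocated_gib": round(torch.cuda.memory_allocated(0) / gib, 2),
        # Memory held by PyTorch's caching allocator (live tensors plus cache).
        "reserved_gib": round(torch.cuda.memory_reserved(0) / gib, 2),
    }


print(describe_gpu())
```

Running this at the top of the notebook in a session that works and in one that fails would show whether the two sessions got the same card and whether any memory is already in use before the model is loaded.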

I am using Huggingface's transformers library and PyTorch.

The code cell that results in the error:

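%%time
# Train the model. %%time is a cell magic, so it must be the first line of the cell.
from collections import defaultdict

import torch

history = defaultdict(list)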

for epoch in range(EPOCHS):

  print(f'Epoch {epoch + 1}/{EPOCHS}')
  print('-' * 10)

  train_acc, train_loss = train_epoch(
    model,
    train_data_loader,    
    loss_fn, 
    optimizer, 
    device, 
    scheduler, 
    train_set_length
  )

  print(f'Train loss {train_loss} accuracy {train_acc}')

  dev_acc, dev_loss = eval_model(
    model,
    dev_data_loader,
    loss_fn, 
    device, 
    evaluation_set_length
  )

  print(f'Dev   loss {dev_loss} accuracy {dev_acc}')

  history['train_acc'].append(train_acc)
  history['train_loss'].append(train_loss)
  history['dev_acc'].append(dev_acc)
  history['dev_loss'].append(dev_loss)

  model_filename = f'model_{epoch}_state.bin'
  torch.save(model.state_dict(), model_filename)

Full error:


RuntimeError                              Traceback (most recent call last)
<ipython-input-29-a13774d7aa75> in <module>()
----> 1 get_ipython().run_cell_magic('time', '', "
history = defaultdict(list)

for epoch in range(EPOCHS):

  print(f'Epoch {epoch + 1}/{EPOCHS}')
  print('-' * 10)

  train_acc, train_loss = train_epoch(
    model,
    train_data_loader,    
    loss_fn, 
    optimizer, 
    device, 
    scheduler, 
    train_set_length
  )

  print(f'Train loss {train_loss} accuracy {train_acc}')

  dev_acc, dev_loss = eval_model(
    model,
    dev_data_loader,
    loss_fn, 
    device, 
    evaluation_set_length
  )

  print(f'Dev   loss {dev_loss} accuracy {dev_acc}')

  history['train_acc'].append(train_acc)
  history['train_loss'].append(train_loss)
  history['dev_acc'].append(dev_acc)
  history['dev_loss'].append(dev_loss)
  
  model_filename = f'model_{epoch}_state.bin'
  torch.save(model.state_dict(), model_filename)")

15 frames
<decorator-gen-60> in time(self, line, cell, local_ns)

<timed exec> in <module>()

/usr/local/lib/python3.6/dist-packages/transformers/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask)
    234         # Take the dot product between "query" and "key" to get the raw attention scores.
    235         attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
--> 236         attention_scores = attention_scores / math.sqrt(self.attention_head_size)
    237         if attention_mask is not None:
    238             # Apply the attention mask is (precomputed for all layers in BertModel forward() function)

RuntimeError: CUDA out of memory. Tried to allocate 24.00 MiB (GPU 0; 7.43 GiB total capacity; 5.42 GiB already allocated; 8.94 MiB free; 5.79 GiB reserved in total by PyTorch)
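
The figures in the last line of the traceback can be broken down with a little arithmetic. The sketch below parses the message quoted above and computes two differences: memory that PyTorch has reserved but is not using for live tensors, and memory on the card that PyTorch's allocator does not account for at all. `parse_oom_message` is a hypothetical helper written for this illustration, and the regular expressions assume the exact message wording shown in this traceback, which may differ in other PyTorch versions.

```python
import re

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom_message(message):
    """Extract the sizes quoted in a PyTorch CUDA OOM message, in MiB."""
    fields = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    sizes = {}
    for name, pattern in fields.items():
        match = re.search(pattern, message)
        if match:
            sizes[name] = float(match.group(1)) * UNIT_TO_MIB[match.group(2)]
    return sizes


# The message copied from the traceback above.
message = (
    "RuntimeError: CUDA out of memory. Tried to allocate 24.00 MiB "
    "(GPU 0; 7.43 GiB total capacity; 5.42 GiB already allocated; "
    "8.94 MiB free; 5.79 GiB reserved in total by PyTorch)"
)

sizes = parse_oom_message(message)

# Reserved by PyTorch's caching allocator but not backing live tensors.
cached_unused = sizes["reserved"] - sizes["allocated"]

# On the card, neither free nor reserved by PyTorch's allocator.
outside_allocator = sizes["total"] - sizes["reserved"] - sizes["free"]

print(f"cached but unused:       {cached_unused:.2f} MiB")
print(f"outside PyTorch's cache: {outside_allocator:.2f} MiB")
```

For this message the first figure is roughly 379 MiB and the second roughly 1.6 GiB. The second figure covers memory the allocator does not track, for example the CUDA context itself or memory held by another process on the same GPU; the message alone does not say which.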
