Fine-Tuning DistilBertForSequenceClassification: Is not learning, why is loss not changing? Weights not updated?

Problem Description

I am relatively new to PyTorch and Huggingface transformers and experimented with DistilBertForSequenceClassification on this Kaggle dataset.

from transformers import DistilBertForSequenceClassification, DistilBertTokenizer
from transformers import get_linear_schedule_with_warmup
import torch
import torch.optim as optim
import torch.nn as nn

n_epochs = 5 # or whatever
batch_size = 32 # or whatever

bert_distil = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
#bert_distil.classifier = nn.Sequential(nn.Linear(in_features=768, out_features=1), nn.Sigmoid())
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(bert_distil.parameters(), lr=0.1)

X_train = []
Y_train = []

for row in train_df.iterrows():
    seq = tokenizer.encode(preprocess_text(row[1]['text']),  add_special_tokens=True, pad_to_max_length=True)
    X_train.append(torch.tensor(seq).unsqueeze(0))
    Y_train.append(torch.tensor([row[1]['target']]).unsqueeze(0))
X_train = torch.cat(X_train)
Y_train = torch.cat(Y_train)

running_loss = 0.0
bert_distil.cuda()
bert_distil.train(True)
for epoch in range(n_epochs):
    permutation = torch.randperm(len(X_train))
    j = 0
    for i in range(0,len(X_train), batch_size):
        optimizer.zero_grad()
        indices = permutation[i:i+batch_size]
        batch_x, batch_y = X_train[indices], Y_train[indices]
        batch_x.cuda()
        batch_y.cuda()
        outputs = bert_distil.forward(batch_x.cuda())
        loss = criterion(outputs[0],batch_y.squeeze().cuda())
        loss.requires_grad = True
   
        loss.backward()
        optimizer.step()
   
        running_loss += loss.item()  
        j+=1
        if j == 20:   
            #print(outputs[0])
            print('[%d, %5d] running loss: %.3f loss: %.3f ' %
              (epoch + 1, i*1, running_loss / 20, loss.item()))
            running_loss = 0.0
            j = 0

[1, 608] running loss: 0.689 loss: 0.687
[1, 1248] running loss: 0.693 loss: 0.694
[1, 1888] running loss: 0.693 loss: 0.683
[1, 2528] running loss: 0.689 loss: 0.701
[1, 3168] running loss: 0.690 loss: 0.684
[1, 3808] running loss: 0.689 loss: 0.688
[1, 4448] running loss: 0.689 loss: 0.692
etc...

Regardless of what I tried, the loss never decreased (or increased), nor did the predictions get better. It seems to me that I forgot something, so that the weights are actually not updated. Does someone have an idea?

What I tried

  • Different loss functions (each pairs differently with the model's output, as sketched below)
    • BCE
    • CrossEntropy
    • even MSE loss
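
Each of these losses expects a different pairing of model output and target, so the snippet below sketches how they would line up with the logits of DistilBertForSequenceClassification. It is only an illustration (the tensors and variable names are made up, not taken from the original post):

import torch
import torch.nn as nn

logits = torch.randn(8, 2)            # [batch, num_labels=2], as returned by the default head
labels = torch.randint(0, 2, (8,))    # integer class indices 0/1

# CrossEntropyLoss: raw two-class logits vs. integer labels
ce = nn.CrossEntropyLoss()(logits, labels)

# BCE (with logits): needs one logit per example and float 0/1 targets,
# e.g. from a classifier head with out_features=1
single_logit = torch.randn(8)
bce = nn.BCEWithLogitsLoss()(single_logit, labels.float())

# MSE: compares continuous values, e.g. a sigmoid probability vs. the 0/1 label
mse = nn.MSELoss()(torch.sigmoid(single_logit), labels.float())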

    Recommended Answer

    Looking at the running loss and the minibatch loss is easily misleading. You should look at the epoch loss instead, because the inputs to every epoch loss are the same.

    Besides, there are some problems in your code. After fixing all of them, the behavior is as expected: the loss slowly decreases after each epoch, and the model can also overfit a small minibatch. Please look at the code below; the changes include using model(x) instead of model.forward(x), calling cuda() only once, a smaller learning rate, etc.
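
    To make one of those fixes concrete: Tensor.cuda() is not in-place, so the standalone batch_x.cuda() / batch_y.cuda() lines in the question have no effect unless the result is assigned back. A tiny illustration (it assumes a CUDA device is available):

    import torch

    t = torch.zeros(3)   # lives on the CPU
    t.cuda()             # returns a new CUDA copy; the return value is discarded here
    print(t.device)      # still cpu

    t = t.cuda()         # reassign the result to actually move the tensor
    print(t.device)      # cuda:0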

    Tuning and fine-tuning ML models is difficult work.

    n_epochs = 5
    batch_size = 1
    
    bert_distil = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
    tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(bert_distil.parameters(), lr=1e-3)  # much smaller learning rate than the original 0.1
    
    X_train = []
    Y_train = []
    for row in train_df.iterrows():
        seq = tokenizer.encode(row[1]['text'],  add_special_tokens=True, pad_to_max_length=True)[:100]
        X_train.append(torch.tensor(seq).unsqueeze(0))
        Y_train.append(torch.tensor([row[1]['target']]))
    X_train = torch.cat(X_train)
    Y_train = torch.cat(Y_train)
    
    running_loss = 0.0
    bert_distil.cuda()
    bert_distil.train(True)
    for epoch in range(n_epochs):
        permutation = torch.randperm(len(X_train))
        for i in range(0,len(X_train), batch_size):
            optimizer.zero_grad()
            indices = permutation[i:i+batch_size]
            batch_x, batch_y = X_train[indices].cuda(), Y_train[indices].cuda()
            outputs = bert_distil(batch_x)
            loss = criterion(outputs[0], batch_y)
            loss.backward()
            optimizer.step()
       
            running_loss += loss.item()  
    
        print('[%d] epoch loss: %.3f' %
          (epoch + 1, running_loss / len(X_train) * batch_size))
        running_loss = 0.0
    

    Output:

    [1] epoch loss: 0.695
    [2] epoch loss: 0.690
    [3] epoch loss: 0.687
    [4] epoch loss: 0.685
    [5] epoch loss: 0.684
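
    As an aside, the question imports get_linear_schedule_with_warmup but never uses it. A common recipe for fine-tuning transformer models combines AdamW, a small learning rate, and a linear warmup schedule. The sketch below reuses the names from the code above; the learning rate and warmup fraction are typical defaults, not values taken from this answer:

    from torch.optim import AdamW
    from transformers import get_linear_schedule_with_warmup

    optimizer = AdamW(bert_distil.parameters(), lr=2e-5)
    num_training_steps = n_epochs * (len(X_train) // batch_size)
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(0.1 * num_training_steps),  # warm up over ~10% of the steps
        num_training_steps=num_training_steps,
    )

    # inside the training loop, call scheduler.step() right after optimizer.step()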
    
