火炬损失值不变 [英] pytorch loss value not change
问题描述
我根据这篇文章编写了一个模块: http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/
I wrote a module based on this article: http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/
这个想法是将输入传递到多个流中,然后连接在一起并连接到FC层。我将源代码分为3个自定义模块: TextClassifyCnnNet
>> FlatCnnLayer
>> FilterLayer
The idea is pass the input into multiple streams then concat together and connect to a FC layer. I divided my source code into 3 custom modules: TextClassifyCnnNet
>> FlatCnnLayer
>> FilterLayer
FilterLayer:
FilterLayer:
class FilterLayer(nn.Module):
def __init__(self, filter_size, embedding_size, sequence_length, out_channels=128):
super(FilterLayer, self).__init__()
self.model = nn.Sequential(
nn.Conv2d(1, out_channels, (filter_size, embedding_size)),
nn.ReLU(inplace=True),
nn.MaxPool2d((sequence_length - filter_size + 1, 1), stride=1)
)
for m in self.modules():
if isinstance(m, nn.Conv2d):
n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
m.weight.data.normal_(0, math.sqrt(2. / n))
def forward(self, x):
return self.model(x)
FlatCnnLayer:
FlatCnnLayer:
class FlatCnnLayer(nn.Module):
def __init__(self, embedding_size, sequence_length, filter_sizes=[3, 4, 5], out_channels=128):
super(FlatCnnLayer, self).__init__()
self.filter_layers = nn.ModuleList(
[FilterLayer(filter_size, embedding_size, sequence_length, out_channels=out_channels) for
filter_size in filter_sizes])
def forward(self, x):
pools = []
for filter_layer in self.filter_layers:
out_filter = filter_layer(x)
# reshape from (batch_size, out_channels, h, w) to (batch_size, h, w, out_channels)
pools.append(out_filter.view(out_filter.size()[0], 1, 1, -1))
x = torch.cat(pools, dim=3)
x = x.view(x.size()[0], -1)
x = F.dropout(x, p=dropout_prob, training=True)
return x
TextClassifyCnnNet(主模块) :
TextClassifyCnnNet (main module):
class TextClassifyCnnNet(nn.Module):
def __init__(self, embedding_size, sequence_length, num_classes, filter_sizes=[3, 4, 5], out_channels=128):
super(TextClassifyCnnNet, self).__init__()
self.flat_layer = FlatCnnLayer(embedding_size, sequence_length, filter_sizes=filter_sizes,
out_channels=out_channels)
self.model = nn.Sequential(
self.flat_layer,
nn.Linear(out_channels * len(filter_sizes), num_classes)
)
def forward(self, x):
x = self.model(x)
return x
def fit(net, data, save_path):
if torch.cuda.is_available():
net = net.cuda()
for param in list(net.parameters()):
print(type(param.data), param.size())
optimizer = optim.Adam(net.parameters(), lr=0.01, weight_decay=0.1)
X_train, X_test = data['X_train'], data['X_test']
Y_train, Y_test = data['Y_train'], data['Y_test']
X_valid, Y_valid = data['X_valid'], data['Y_valid']
n_batch = len(X_train) // batch_size
for epoch in range(1, n_epochs + 1): # loop over the dataset multiple times
net.train()
start = 0
end = batch_size
for batch_idx in range(1, n_batch + 1):
# get the inputs
x, y = X_train[start:end], Y_train[start:end]
start = end
end = start + batch_size
# zero the parameter gradients
optimizer.zero_grad()
# forward + backward + optimize
predicts = _get_predict(net, x)
loss = _get_loss(predicts, y)
loss.backward()
optimizer.step()
if batch_idx % display_step == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(x), len(X_train), 100. * batch_idx / (n_batch + 1), loss.data[0]))
# print statistics
if epoch % display_step == 0 or epoch == 1:
net.eval()
valid_predicts = _get_predict(net, X_valid)
valid_loss = _get_loss(valid_predicts, Y_valid)
valid_accuracy = _get_accuracy(valid_predicts, Y_valid)
print('\r[%d] loss: %.3f - accuracy: %.2f' % (epoch, valid_loss.data[0], valid_accuracy * 100))
print('\rFinished Training\n')
net.eval()
test_predicts = _get_predict(net, X_test)
test_loss = _get_loss(test_predicts, Y_test).data[0]
test_accuracy = _get_accuracy(test_predicts, Y_test)
print('Test loss: %.3f - Test accuracy: %.2f' % (test_loss, test_accuracy * 100))
torch.save(net.flat_layer.state_dict(), save_path)
def _get_accuracy(predicts, labels):
predicts = torch.max(predicts, 1)[1].data[0]
return np.mean(predicts == labels)
def _get_predict(net, x):
# wrap them in Variable
inputs = torch.from_numpy(x).float()
# convert to cuda tensors if cuda flag is true
if torch.cuda.is_available:
inputs = inputs.cuda()
inputs = Variable(inputs)
return net(inputs)
def _get_loss(predicts, labels):
labels = torch.from_numpy(labels).long()
# convert to cuda tensors if cuda flag is true
if torch.cuda.is_available:
labels = labels.cuda()
labels = Variable(labels)
return F.cross_entropy(predicts, labels)
似乎参数只是在每个时期稍作更新,在整个过程中精度仍然保持不变。在Tensorflow中使用相同的实现和相同的参数时,它可以正常运行。
It seems that parameters 're just updated slightly each epoch, the accuracy remains for all the process. While with the same implementation and the same params in Tensorflow, it runs correctly.
我是Pytorch的新手,所以也许我的说明有问题,请帮助我找出。谢谢!
I'm new to Pytorch, so maybe my instructions has something wrong, please help me to find out. Thank you!
Ps:我尝试使用 F.nll_loss
+ F.log_softmax
而不是 F.cross_entropy
。从理论上讲,它应该返回相同的值,但实际上会打印出另一个结果(但仍然是错误的损失值)
P.s: I try to use F.nll_loss
+ F.log_softmax
instead of F.cross_entropy
. Theoretically, it should return the same, but in fact another result is printed out (but it still be a wrong loss value)
推荐答案
我意识到Adam Optimizer中的L2_loss保持 loss
的值不变(我还没有在其他Optimizer中尝试过)。当我删除L2_loss时它会起作用:
I realised that L2_loss in Adam Optimizer make loss
value remain unchanged (I haven't tried in other Optimizer yet). It works when I remove L2_loss:
# optimizer = optim.Adam(net.parameters(), lr=0.01, weight_decay=0.1)
optimizer = optim.Adam(model.parameters(), lr=0.001)
===更新(有关更多详细信息,请参见上面的答案!)===
=== UPDATE (See above answer for more detail!) ===
self.features = nn.Sequential(self.flat_layer)
self.classifier = nn.Linear(out_channels * len(filter_sizes), num_classes)
...
optimizer = optim.Adam([
{'params': model.features.parameters()},
{'params': model.classifier.parameters(), 'weight_decay': 0.1}
], lr=0.001)
这篇关于火炬损失值不变的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!