PyTorch loss value does not change


Problem description

I wrote a module based on this article: http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/

The idea is to pass the input through multiple parallel streams, concatenate their outputs, and connect them to a fully connected layer. I divided my source code into 3 custom modules: TextClassifyCnnNet >> FlatCnnLayer >> FilterLayer

FilterLayer:

import math

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable


class FilterLayer(nn.Module):
    def __init__(self, filter_size, embedding_size, sequence_length, out_channels=128):
        super(FilterLayer, self).__init__()

        self.model = nn.Sequential(
            nn.Conv2d(1, out_channels, (filter_size, embedding_size)),
            nn.ReLU(inplace=True),
            nn.MaxPool2d((sequence_length - filter_size + 1, 1), stride=1)
        )

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))

    def forward(self, x):
        return self.model(x)
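
For reference, a quick shape check of one stream (a minimal sketch with made-up sizes, assuming a recent PyTorch; under the old Variable API the input would be wrapped in Variable first):

layer = FilterLayer(filter_size=3, embedding_size=128, sequence_length=56)
x = torch.randn(4, 1, 56, 128)   # (batch, in_channels=1, sequence_length, embedding_size)
out = layer(x)
print(out.size())                # (4, 128, 1, 1): the max-pool collapses the conv output to a single value per channel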

FlatCnnLayer:

class FlatCnnLayer(nn.Module):
    def __init__(self, embedding_size, sequence_length, filter_sizes=[3, 4, 5], out_channels=128):
        super(FlatCnnLayer, self).__init__()

        self.filter_layers = nn.ModuleList(
            [FilterLayer(filter_size, embedding_size, sequence_length, out_channels=out_channels) for
             filter_size in filter_sizes])

    def forward(self, x):
        pools = []
        for filter_layer in self.filter_layers:
            out_filter = filter_layer(x)
            # flatten each stream's (batch_size, out_channels, 1, 1) output to (batch_size, 1, 1, out_channels)
            # so the streams can be concatenated along the last dimension
            pools.append(out_filter.view(out_filter.size()[0], 1, 1, -1))
        x = torch.cat(pools, dim=3)

        x = x.view(x.size()[0], -1)
        x = F.dropout(x, p=dropout_prob, training=True)

        return x
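
Concatenating the streams gives out_channels * len(filter_sizes) features per sample, which is the size the Linear layer below must match. A quick check with the same made-up sizes (note that FlatCnnLayer.forward reads a module-level dropout_prob, so a value such as dropout_prob = 0.5 must be defined for it to run):

dropout_prob = 0.5               # assumed value; FlatCnnLayer.forward expects this global
flat = FlatCnnLayer(embedding_size=128, sequence_length=56)
x = torch.randn(4, 1, 56, 128)
print(flat(x).size())            # (4, 384): 128 out_channels * 3 filter sizes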

TextClassifyCnnNet (main module):

class TextClassifyCnnNet(nn.Module):
    def __init__(self, embedding_size, sequence_length, num_classes, filter_sizes=[3, 4, 5], out_channels=128):
        super(TextClassifyCnnNet, self).__init__()

        self.flat_layer = FlatCnnLayer(embedding_size, sequence_length, filter_sizes=filter_sizes,
                                       out_channels=out_channels)

        self.model = nn.Sequential(
            self.flat_layer,
            nn.Linear(out_channels * len(filter_sizes), num_classes)
        )

    def forward(self, x):
        x = self.model(x)

        return x
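
The full model then produces a (batch, num_classes) tensor of raw scores, which is exactly what F.cross_entropy below expects as its first argument. A sketch with the same example sizes and the same dropout_prob assumption:

net = TextClassifyCnnNet(embedding_size=128, sequence_length=56, num_classes=2)
print(net(torch.randn(4, 1, 56, 128)).size())   # (4, 2)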


def fit(net, data, save_path):
    if torch.cuda.is_available():
        net = net.cuda()

    for param in list(net.parameters()):
        print(type(param.data), param.size())

    optimizer = optim.Adam(net.parameters(), lr=0.01, weight_decay=0.1)

    X_train, X_test = data['X_train'], data['X_test']
    Y_train, Y_test = data['Y_train'], data['Y_test']

    X_valid, Y_valid = data['X_valid'], data['Y_valid']

    n_batch = len(X_train) // batch_size

    for epoch in range(1, n_epochs + 1):  # loop over the dataset multiple times
        net.train()
        start = 0
        end = batch_size

        for batch_idx in range(1, n_batch + 1):
            # get the inputs
            x, y = X_train[start:end], Y_train[start:end]
            start = end
            end = start + batch_size

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            predicts = _get_predict(net, x)
            loss = _get_loss(predicts, y)
            loss.backward()
            optimizer.step()

            if batch_idx % display_step == 0:
                print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                    epoch, batch_idx * len(x), len(X_train), 100. * batch_idx / (n_batch + 1), loss.data[0]))

        # print statistics
        if epoch % display_step == 0 or epoch == 1:
            net.eval()
            valid_predicts = _get_predict(net, X_valid)
            valid_loss = _get_loss(valid_predicts, Y_valid)
            valid_accuracy = _get_accuracy(valid_predicts, Y_valid)
            print('\r[%d] loss: %.3f - accuracy: %.2f' % (epoch, valid_loss.data[0], valid_accuracy * 100))

    print('\rFinished Training\n')

    net.eval()

    test_predicts = _get_predict(net, X_test)
    test_loss = _get_loss(test_predicts, Y_test).data[0]
    test_accuracy = _get_accuracy(test_predicts, Y_test)
    print('Test loss: %.3f - Test accuracy: %.2f' % (test_loss, test_accuracy * 100))

    torch.save(net.flat_layer.state_dict(), save_path)


def _get_accuracy(predicts, labels):
    # take the arg-max class per sample and compare it element-wise against the numpy label array
    predicts = torch.max(predicts, 1)[1].data.cpu().numpy()
    return np.mean(predicts == labels)


def _get_predict(net, x):
    # wrap them in Variable
    inputs = torch.from_numpy(x).float()
    # convert to cuda tensors if cuda flag is true
    if torch.cuda.is_available():
        inputs = inputs.cuda()
    inputs = Variable(inputs)
    return net(inputs)


def _get_loss(predicts, labels):
    labels = torch.from_numpy(labels).long()
    # convert to cuda tensors if cuda flag is true
    if torch.cuda.is_available():
        labels = labels.cuda()
    labels = Variable(labels)
    return F.cross_entropy(predicts, labels)

It seems that the parameters are only updated slightly each epoch, and the accuracy stays the same throughout the whole run. With the same implementation and the same parameters in TensorFlow, it runs correctly.

I'm new to PyTorch, so maybe something in my code is wrong; please help me find it. Thank you!

P.S.: I tried using F.nll_loss + F.log_softmax instead of F.cross_entropy. In theory they should return the same value, but in practice a different number is printed (and it is still a wrong loss value).
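
For reference, F.cross_entropy is F.nll_loss composed with F.log_softmax, so the two formulations should agree up to floating-point noise. A minimal check (assuming a recent PyTorch):

logits = torch.randn(4, 3)
target = torch.tensor([0, 2, 1, 0])
a = F.cross_entropy(logits, target)
b = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(torch.allclose(a, b))      # True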

Answer

I realised that the L2 penalty (weight_decay) in the Adam optimizer was keeping the loss value unchanged (I haven't tried other optimizers yet). It works when I remove it:

# optimizer = optim.Adam(net.parameters(), lr=0.01, weight_decay=0.1)
optimizer = optim.Adam(model.parameters(), lr=0.001)
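
A minimal sketch of why this helps (toy model and toy data, assuming a recent PyTorch): with lr=0.01 and weight_decay=0.1, the L2 penalty keeps pulling the weights back toward zero, so the loss hardly improves; dropping the decay lets the same model train normally.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

def run(weight_decay):
    torch.manual_seed(0)
    model = nn.Linear(10, 2)                      # tiny stand-in classifier
    opt = optim.Adam(model.parameters(), lr=0.01, weight_decay=weight_decay)
    x = torch.randn(64, 10)
    y = (x[:, 0] > 0).long()                      # trivially learnable labels
    for _ in range(200):
        opt.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

print(run(0.1))   # final loss typically stays much higher: the decay keeps the weights small
print(run(0.0))   # final loss drops clearly on the same data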

=== UPDATE (see the answer above for more detail!) ===

self.features = nn.Sequential(self.flat_layer)
self.classifier = nn.Linear(out_channels * len(filter_sizes), num_classes)

...

optimizer = optim.Adam([
    {'params': model.features.parameters()},
    {'params': model.classifier.parameters(), 'weight_decay': 0.1}
], lr=0.001)
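
With the parameters split into two groups like this, the weight decay is applied only to the final classifier layer, while the convolutional feature extractor is trained without the L2 penalty, so the regularization no longer prevents the whole network from learning.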
