How to do parallel processing in PyTorch


Problem Description

I am working on a deep learning problem, which I am solving with PyTorch. I have two GPUs on the same machine (16273 MiB and 12193 MiB of memory), and I want to use both of them for training on a video dataset.

I get this warning:

There is an imbalance between your GPUs. You may want to exclude GPU 1 which has less than 75% of the memory or cores of GPU 0. You can do so by setting the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES environment variable. warnings.warn(imbalance_warn.format(device_ids[min_pos], device_ids[max_pos]))
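
For reference, the two remedies the warning names look roughly like this (a sketch, not part of the original post; whether excluding the smaller GPU is worthwhile depends on the workload):

    import os

    # Option 1: hide the smaller GPU before torch initializes CUDA
    # (equivalent to launching with: CUDA_VISIBLE_DEVICES=0 python main.py).
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"

    import torch.nn as nn  # import only after the environment variable is set

    # Option 2: keep both GPUs visible but restrict DataParallel explicitly.
    # model = nn.DataParallel(model.cuda(), device_ids=[0])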

I also get this error:

raise TypeError('Broadcast function not implemented for CPU tensors') TypeError: Broadcast function not implemented for CPU tensors

if __name__ == '__main__':

    opt.scales = [opt.initial_scale]
    for i in range(1, opt.n_scales):
        opt.scales.append(opt.scales[-1] * opt.scale_step)
    opt.arch = '{}-{}'.format(opt.model, opt.model_depth)
    opt.mean = get_mean(opt.norm_value)
    opt.std = get_std(opt.norm_value)
    print("opt",opt)
    with open(os.path.join(opt.result_path, 'opts.json'), 'w') as opt_file:
        json.dump(vars(opt), opt_file)

    torch.manual_seed(opt.manual_seed)

    model, parameters = generate_model(opt)
    #print(model)

    pytorch_total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print("Total number of trainable parameters: ", pytorch_total_params)

    # Define Class weights
    if opt.weighted:
        print("Weighted Loss is created")
        if opt.n_finetune_classes == 2:
            weight = torch.tensor([1.0, 3.0])
        else:
            weight = torch.ones(opt.n_finetune_classes)
    else:
        weight = None

    criterion = nn.CrossEntropyLoss()
    if not opt.no_cuda:
        criterion = nn.DataParallel(criterion.cuda())

    if opt.no_mean_norm and not opt.std_norm:
        norm_method = Normalize([0, 0, 0], [1, 1, 1])
    elif not opt.std_norm:
        norm_method = Normalize(opt.mean, [1, 1, 1])
    else:
        norm_method = Normalize(opt.mean, opt.std)

    if not opt.no_train:
        # training_data is built earlier in the original script (omitted in this excerpt)
        train_loader = torch.utils.data.DataLoader(
            training_data,
            batch_size=opt.batch_size,
            shuffle=True,
            num_workers=opt.n_threads,
            pin_memory=True)
        train_logger = Logger(
            os.path.join(opt.result_path, 'train.log'),
            ['epoch', 'loss', 'acc', 'precision','recall','lr'])
        train_batch_logger = Logger(
            os.path.join(opt.result_path, 'train_batch.log'),
            ['epoch', 'batch', 'iter', 'loss', 'acc', 'precision', 'recall', 'lr'])

        if opt.nesterov:
            dampening = 0
        else:
            dampening = opt.dampening
        optimizer = optim.SGD(
            parameters,
            lr=opt.learning_rate,
            momentum=opt.momentum,
            dampening=dampening,
            weight_decay=opt.weight_decay,
            nesterov=opt.nesterov)
        # scheduler = lr_scheduler.ReduceLROnPlateau(
        #     optimizer, 'min', patience=opt.lr_patience)
    if not opt.no_val:
        spatial_transform = Compose([
            Scale(opt.sample_size),
            CenterCrop(opt.sample_size),
            ToTensor(opt.norm_value), norm_method
        ])

    print('run')
    for i in range(opt.begin_epoch, opt.n_epochs + 1):
        if not opt.no_train:
            adjust_learning_rate(optimizer, i, opt.lr_steps)
            train_epoch(i, train_loader, model, criterion, optimizer, opt,
                        train_logger, train_batch_logger)


I have also made the following change in my train file:

    model = nn.DataParallel(model(), device_ids=[0, 1]).cuda()
    outputs = model(inputs)

It does not seem to work properly and gives an error. Please advise; I am new to PyTorch.

Thanks

Answer

As mentioned in the link below, you have to call model.cuda() on the model before passing it to nn.DataParallel. DataParallel broadcasts the module's parameters and buffers to each device in device_ids, and that broadcast is only implemented for CUDA tensors, which is why it fails while the model is still on the CPU.

net = nn.DataParallel(model.cuda(), device_ids=[0,1])

https://github.com/pytorch/pytorch/issues/17065
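
A minimal end-to-end sketch of the fix (the toy model, batch shapes, and two-GPU setup are illustrative assumptions, not the asker's actual generate_model or dataset):

    import torch
    import torch.nn as nn

    # Tiny stand-in for the real model; any nn.Module behaves the same way.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))

    # Move the parameters to GPU 0 *first*, then wrap: DataParallel
    # replicates the module from GPU 0 onto every device in device_ids.
    model = nn.DataParallel(model.cuda(), device_ids=[0, 1])

    # The loss can stay a plain module; only the tensors passed to it
    # need to live on the GPU, so wrapping it in DataParallel is unnecessary.
    criterion = nn.CrossEntropyLoss()

    inputs = torch.randn(8, 3, 32, 32).cuda()   # the batch is split across the GPUs
    targets = torch.randint(0, 2, (8,)).cuda()

    outputs = model(inputs)                     # gathered back onto GPU 0
    loss = criterion(outputs, targets)
    loss.backward()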
