How to get mini-batches in pytorch in a clean and efficient way?


Question


I was trying to do a simple thing: train a linear model with Stochastic Gradient Descent (SGD) using torch:

import numpy as np

import torch
from torch.autograd import Variable

import pdb

def get_batch2(X,Y,M,dtype):
    X,Y = X.data.numpy(), Y.data.numpy()
    N = len(Y)
    valid_indices = np.array( range(N) )
    batch_indices = np.random.choice(valid_indices,size=M,replace=False)
    batch_xs = torch.FloatTensor(X[batch_indices,:]).type(dtype)
    batch_ys = torch.FloatTensor(Y[batch_indices]).type(dtype)
    return Variable(batch_xs, requires_grad=False), Variable(batch_ys, requires_grad=False)

def poly_kernel_matrix( x,D ):
    N = len(x)
    Kern = np.zeros( (N,D+1) )
    for n in range(N):
        for d in range(D+1):
            Kern[n,d] = x[n]**d
    return Kern

## data params
N=5 # data set size
Degree=4 # number dimensions/features
D_sgd = Degree+1
##
x_true = np.linspace(0,1,N) # the real data points
y = np.sin(2*np.pi*x_true)
y.shape = (N,1)
## TORCH
dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU
X_mdl = poly_kernel_matrix( x_true,Degree )
X_mdl = Variable(torch.FloatTensor(X_mdl).type(dtype), requires_grad=False)
y = Variable(torch.FloatTensor(y).type(dtype), requires_grad=False)
## SGD mdl
w_init = torch.zeros(D_sgd,1).type(dtype)
W = Variable(w_init, requires_grad=True)
M = 5 # mini-batch size
eta = 0.1 # step size
for i in range(500):
    batch_xs, batch_ys = get_batch2(X_mdl,y,M,dtype)
    # Forward pass: compute predicted y using operations on Variables
    y_pred = batch_xs.mm(W)
    # Compute and print loss using operations on Variables. Now loss is a Variable of shape (1,) and loss.data is a Tensor of shape (1,); loss.data[0] is a scalar value holding the loss.
    loss = (1/N)*(y_pred - batch_ys).pow(2).sum()
    # Use autograd to compute the backward pass. Now w will have gradients
    loss.backward()
    # Update weights using gradient descent; W.data is a Tensor,
    # W.grad is a Variable and W.grad.data is a Tensor.
    W.data -= eta * W.grad.data
    # Manually zero the gradients after updating weights
    W.grad.data.zero_()

#
c_sgd = W.data.numpy()
X_mdl = X_mdl.data.numpy()
y = y.data.numpy()
#
Xc_pinv = np.dot(X_mdl,c_sgd)
print('J(c_sgd) = ', (1/N)*(np.linalg.norm(y-Xc_pinv)**2) )
print('loss = ',loss.data[0])

The code runs fine, although my get_batch2 method seems really dumb/naive; that is probably because I am new to pytorch, but I have not found a good place where they discuss how to retrieve batches of data. I went through their tutorials (http://pytorch.org/tutorials/beginner/pytorch_with_examples.html) and through the data loading tutorial (http://pytorch.org/tutorials/beginner/data_loading_tutorial.html) with no luck. The tutorials all seem to assume that one already has the batch and batch size at the beginning and then proceeds to train with that data without changing it (specifically see http://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-variables-and-autograd).

So my question is: do I really need to turn my data back into numpy so that I can fetch a random sample of it, and then turn it back into a pytorch Variable in order to train in memory? Is there no way to get mini-batches with torch?
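
(For reference, pytorch also provides torch.utils.data.TensorDataset and DataLoader, which handle exactly this kind of shuffled mini-batching. Below is a minimal sketch assuming the tensors from the script above; the batch_size of 5 just mirrors M and is illustrative.)

from torch.utils.data import TensorDataset, DataLoader

# Wrap the existing tensors (the .data of the Variables above) in a dataset
# and let DataLoader handle shuffling and batching.
dataset = TensorDataset(X_mdl.data, y.data)
loader = DataLoader(dataset, batch_size=5, shuffle=True)

for batch_xs, batch_ys in loader:
    # in pre-0.4 pytorch these would still need to be wrapped in Variable
    pass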

I looked at a few functions torch provides but with no luck:

#pdb.set_trace()
#valid_indices = torch.arange(0,N).numpy()
#valid_indices = np.array( range(N) )
#batch_indices = np.random.choice(valid_indices,size=M,replace=False)
#indices = torch.LongTensor(batch_indices)
#batch_xs, batch_ys = torch.index_select(X_mdl, 0, indices), torch.index_select(y, 0, indices)
#batch_xs,batch_ys = torch.index_select(X_mdl, 0, indices), torch.index_select(y, 0, indices)

Even though the code I provided works fine, I am worried that it is not an efficient implementation, and that if I were to use GPUs there would be a considerable further slowdown (because my guess is that pulling things back into CPU memory and then fetching them again to put them on the GPU like that is silly).
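
(A minimal sketch of one way to avoid the numpy round trip, reusing the names N, M, dtype, X_mdl and y from the script above; the indices are drawn with torch.randperm, so the sampling is without replacement, and they are moved to the GPU only when the data lives there.)

# Draw M distinct indices with torch itself (no numpy round trip).
indices = torch.randperm(N)[:M]
if dtype == torch.cuda.FloatTensor:
    indices = indices.cuda()  # keep the indices on the same device as the data
batch_xs, batch_ys = X_mdl[indices], y[indices]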


I implemented a new version based on the answer that suggested using torch.index_select():

def get_batch2(X,Y,M):
    '''
    get batch for pytorch model
    '''
    # TODO fix and make it nicer, there is pytorch forum question
    #X,Y = X.data.numpy(), Y.data.numpy()
    X,Y = X, Y
    N = X.size()[0]
    batch_indices = torch.LongTensor( np.random.randint(0,N+1,size=M) )
    pdb.set_trace()
    batch_xs = torch.index_select(X,0,batch_indices)
    batch_ys = torch.index_select(Y,0,batch_indices)
    return Variable(batch_xs, requires_grad=False), Variable(batch_ys, requires_grad=False)

However, this seems to have issues because it does not work if X, Y are NOT Variables... which is really odd. I added this to the pytorch forum: https://discuss.pytorch.org/t/how-to-get-mini-batches-in-pytorch-in-a-clean-and-efficient-way/10322

Right now what I am struggling with is making this work for the GPU. My most recent version:

def get_batch2(X,Y,M,dtype):
    '''
    get batch for pytorch model
    '''
    # TODO fix and make it nicer, there is pytorch forum question
    #X,Y = X.data.numpy(), Y.data.numpy()
    X,Y = X, Y
    N = X.size()[0]
    if dtype ==  torch.cuda.FloatTensor:
        batch_indices = torch.cuda.LongTensor( np.random.randint(0,N,size=M) )# without replacement
    else:
        batch_indices = torch.LongTensor( np.random.randint(0,N,size=M) ).type(dtype)  # without replacement
    pdb.set_trace()
    batch_xs = torch.index_select(X,0,batch_indices)
    batch_ys = torch.index_select(Y,0,batch_indices)
    return Variable(batch_xs, requires_grad=False), Variable(batch_ys, requires_grad=False)

the error:

RuntimeError: tried to construct a tensor from a int sequence, but found an item of type numpy.int64 at index (0)

I don't get it. Do I really have to do:

ints = [ random.randint(0,N) for i in range(M)]

to get the integers?

It would also be ideal if the data could stay a Variable. It seems that torch.index_select does not work for Variable type data.
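
(One guess, based on the pre-0.4 convention that operations on Variables expect Variable arguments: wrap the index tensor in a Variable too. This is only a sketch of that idea, not a confirmed fix for the error above.)

# Hypothetical sketch, inside get_batch2 where X and Y are Variables:
batch_indices = torch.from_numpy(np.random.randint(0, N, size=M)).long()
batch_xs = torch.index_select(X, 0, Variable(batch_indices))
batch_ys = torch.index_select(Y, 0, Variable(batch_indices))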

This list-of-integers approach still doesn't work:

TypeError: torch.addmm received an invalid combination of arguments - got (int, torch.cuda.FloatTensor, int, torch.cuda.FloatTensor, torch.FloatTensor, out=torch.cuda.FloatTensor), but expected one of:
 * (torch.cuda.FloatTensor source, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
 * (torch.cuda.FloatTensor source, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
 * (float beta, torch.cuda.FloatTensor source, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
 * (torch.cuda.FloatTensor source, float alpha, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
 * (float beta, torch.cuda.FloatTensor source, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
 * (torch.cuda.FloatTensor source, float alpha, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
 * (float beta, torch.cuda.FloatTensor source, float alpha, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
      didn't match because some of the arguments have invalid types: (int, torch.cuda.FloatTensor, int, torch.cuda.FloatTensor, torch.FloatTensor, out=torch.cuda.FloatTensor)
 * (float beta, torch.cuda.FloatTensor source, float alpha, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
      didn't match because some of the arguments have invalid types: (int, torch.cuda.FloatTensor, int, torch.cuda.FloatTensor, torch.FloatTensor, out=torch.cuda.FloatTensor)

Solution

If I'm understanding your code correctly, your get_batch2 function appears to be taking random mini-batches from your dataset without tracking which indices you've used already in an epoch. The issue with this implementation is that it likely will not make use of all of your data.

The way I usually do batching is to create a random permutation of all the possible indices using torch.randperm(N) and loop through them in batches. For example:

n_epochs = 100 # or whatever
batch_size = 128 # or whatever

for epoch in range(n_epochs):

    # X is a torch Variable
    permutation = torch.randperm(X.size()[0])

    for i in range(0,X.size()[0], batch_size):
        optimizer.zero_grad()

        indices = permutation[i:i+batch_size]
        batch_x, batch_y = X[indices], Y[indices]

        # in case you wanted a semi-full example
        outputs = model.forward(batch_x)
        loss = lossfunction(outputs,batch_y)

        loss.backward()
        optimizer.step()

If you like to copy and paste, make sure you define your optimizer, model, and lossfunction somewhere before the start of the epoch loop.
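
(A minimal sketch of those definitions; the single linear layer, MSE loss, and learning rate below are illustrative placeholders, not part of the original answer.)

import torch
import torch.nn as nn

input_dim = 5  # placeholder: number of features in each row of X
model = nn.Linear(input_dim, 1)
lossfunction = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)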

With regards to your error, try using torch.from_numpy(np.random.randint(0,N,size=M)).long() instead of torch.LongTensor(np.random.randint(0,N,size=M)). I'm not sure if this will solve the error you are getting, but it will solve a future error.
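
(For example, a small self-contained sketch of that conversion; the sizes N and M below are placeholders.)

import numpy as np
import torch

N, M = 100, 32  # placeholder sizes
batch_indices = torch.from_numpy(np.random.randint(0, N, size=M)).long()
# move to GPU only if the tensors being indexed are on the GPU:
# batch_indices = batch_indices.cuda()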
