Creating large LMDBs for Caffe with numpy arrays

Views: 145 | Published: 2020/5/18 23:23:28 | Tags: python numpy deep-learning caffe lmdb

Question

I have two 60 x 80921 matrices, one filled with data, one with reference.
I would like to store the values as key/value pairs in two different LMDBs, one for training (say I'll slice around the 60000 column mark) and one for testing. Here is my idea; does it work?

X_train = X[:,:60000]
Y_train = Y[:,:60000]
X_test = X[:,60000:]
Y_test = Y[:,60000:]

X_train = X_train.astype(int)
X_test = X_test.astype(int)
Y_train = Y_train.astype(int)
Y_test = Y_test.astype(int)

map_size = X_train.nbytes * 10
env = lmdb.open('sensormatrix_train_lmdb', map_size=map_size)
with env.begin(write=True) as txn:  
    for i in range(60):
        for j in range(60000):
            datum = caffe.proto.caffe_pb2.Datum()
            datum.height = X_train.shape[0]
            datum.width = X_train.shape[1]
            datum.data = X_train[i,j].tobytes()
            datum.label= int(Y[i,j])
            str_id= '{:08}'.format(i)

I'm really not sure of the code. And what does the last line format(i) refer to?

Recommended answer

It's not 100% clear what you are trying to do: are you treating each entry as a separate data sample, or are you trying to train on 60K 1D vectors of dim=60...

Assuming you have 60K training samples of dim 60, you can write the training lmdbs like this:

import lmdb
import caffe

# X_train, Y_train: the 60 x 60000 arrays from the question
map_size = X_train.nbytes * 10  # as in the question; you can make map_size a little bigger
env_x = lmdb.open('sensormatrix_train_x_lmdb', map_size=map_size)
env_y = lmdb.open('sensormatrix_train_y_lmdb', map_size=map_size)
with env_x.begin(write=True) as txn_x, env_y.begin(write=True) as txn_y:
    for i in range(X_train.shape[1]):  # one record per column, i.e. per sample
        x = X_train[:, i]
        y = Y_train[:, i]

        # label=i stores the sample index, so the x/y pairing can be checked later
        datum_x = caffe.io.array_to_datum(arr=x.reshape((60, 1, 1)), label=i)
        datum_y = caffe.io.array_to_datum(arr=y.reshape((60, 1, 1)), label=i)
        keystr = '{:0>10d}'.format(i).encode('ascii')  # format an lmdb key for this entry (bytes, as Python 3 requires)
        txn_x.put(keystr, datum_x.SerializeToString())  # actual write to lmdb
        txn_y.put(keystr, datum_y.SerializeToString())
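
On the format(i) line asked about in the question: '{:08}'.format(i) and '{:0>10d}'.format(i) both turn the integer i into a fixed-width, zero-padded string that serves as the LMDB key. LMDB keeps entries sorted by key bytes, so the padding makes the stored order match the numeric order of the samples. Here is a minimal sketch in plain Python. make_key is a hypothetical helper name used only for illustration; it is not part of lmdb or Caffe.

```python
def make_key(i, width=10):
    """Return a zero-padded ASCII key for sample index i (hypothetical helper)."""
    return '{:0>{w}d}'.format(i, w=width).encode('ascii')


indices = [0, 2, 10, 100]
padded = [make_key(i) for i in indices]
unpadded = [str(i).encode('ascii') for i in indices]

# Padded keys sort in the same order as the integers they encode.
padded_sorted_ok = sorted(padded) == padded
# Unpadded keys do not: b'10' and b'100' sort before b'2'.
unpadded_sorted_ok = sorted(unpadded) == unpadded
```

Using the same key for the x record and the y record of a sample, as in the loop above, is what keeps the two databases aligned when they are read back sequentially.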

Now you have two LMDBs for training. In your 'prototxt' you should have two corresponding "Data" layers:

layer {
  name: "input_x"
  top: "x"
  top: "idx_x"
  type: "Data"
  data_param { source: "sensormatrix_train_x_lmdb" batch_size: 32 }
  include { phase: TRAIN }
}
layer {
  name: "input_y"
  top: "y"
  top: "idx_y"
  type: "Data"
  data_param { source: "sensormatrix_train_y_lmdb" batch_size: 32 }
  include { phase: TRAIN }
}

To make sure you read corresponding x/y pairs, you can add a sanity check:

layer {
  name: "sanity"
  type: "EuclideanLoss"
  bottom: "idx_x"
  bottom: "idx_y"
  top: "sanity"
  loss_weight: 0 
  propagate_down: false
  propagate_down: false
}
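
As a rough illustration of why this check works: Caffe's EuclideanLoss computes 1/(2N) times the sum of squared differences between its two bottoms, so on the index labels it is zero exactly when every x record in the batch is paired with the y record of the same index. The plain-Python sketch below mirrors that formula on made-up index batches. euclidean_loss is an illustrative stand-in, not Caffe's implementation.

```python
def euclidean_loss(a, b):
    """1/(2N) * sum((a_n - b_n)^2) over a batch of N scalar labels."""
    assert len(a) == len(b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / (2.0 * len(a))


idx_x = [0, 1, 2, 3]
aligned = euclidean_loss(idx_x, [0, 1, 2, 3])     # same indices in both batches
misaligned = euclidean_loss(idx_x, [1, 2, 3, 4])  # y stream shifted by one record
```

Because loss_weight is 0, the layer should not influence training; the "sanity" output is only there to be watched, and any nonzero value would suggest that the two Data layers have drifted out of step.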
