Select random value from row in a TF.record array, with limits on what the value can be?


Question

Say that I have a tfrecord file, and each row in the tf.records contains ints that are 0 or positive, padded with -1 so that all the rows are the same size. So something like:

0 3 43 223 23 -1 -1 -1
4 12 3  11  435 2 4 -1
9 3 11 32  34 322 9 7
...

How do I randomly select 3 numbers from each of the rows?

The numbers will act as indexes to look up values in an embedding matrix, and those embeddings will then be averaged (basically the word2vec CBOW model).

More specifically, how do I avoid selecting the padding value of -1? -1 is just what I used to pad my rows so that each row is the same size, in order to use tf.record. (If there is a way to use varying-length rows in tfrecords, let me know.)

Answer

I think you're looking for something like tf.VarLenFeature(); more specifically, you do not necessarily have to pad your rows before creating the tfrecord file. You can create the tf_example like this:

from tensorflow.train import Feature, Features, Example, Int64List
from tensorflow.python_io import TFRecordWriter  # tf.io.TFRecordWriter in TF 2.x

tf_example = Example(
    features=Features(
        feature={
            "my_feature": Feature(
                int64_list=Int64List(value=[0, 3, 43, 223, 23])
            )
        }
    )
)

# tfrecord_file_path: path to the output .tfrecord file
with TFRecordWriter(tfrecord_file_path) as tf_writer:
    tf_writer.write(tf_example.SerializeToString())

Do this for all of your rows, which can vary in length.
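As a minimal sketch, serializing a few rows of different lengths looks like this (the row values are taken from the example above; each row becomes its own Example proto, with no -1 padding needed):

```python
from tensorflow.train import Example, Feature, Features, Int64List

# Rows of different lengths, no -1 padding needed
rows = [
    [0, 3, 43, 223, 23],
    [4, 12, 3, 11, 435, 2, 4],
    [9, 3, 11, 32, 34, 322, 9, 7],
]

serialized = []
for row in rows:
    ex = Example(features=Features(feature={
        "my_feature": Feature(int64_list=Int64List(value=row))
    }))
    serialized.append(ex.SerializeToString())

# Round-trip check: the variable-length values survive serialization
recovered = [list(Example.FromString(s).features.feature["my_feature"].int64_list.value)
             for s in serialized]
```

Each serialized string is what you would hand to the TFRecordWriter above.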

You'll parse the tf_examples with something like this:

import tensorflow as tf  # TF 1.x API (tf.VarLenFeature / tf.parse_example)

def parse_tf_example(example):
    feature_spec = {
        "my_feature": tf.VarLenFeature(dtype=tf.int64)
    }
    return tf.parse_example([example], features=feature_spec)

Now, this will return your features as tf.SparseTensors. If you don't want to deal with that at this stage and would rather carry on using tensor ops as you normally would, you can simply use tf.sparse_tensor_to_dense() and work with dense tensors from there.

The returned dense tensors will be of varying lengths, so you shouldn't have to worry about selecting -1s; there won't be any. The exception is if you convert the sparse tensors to dense in batches, in which case each batch will be padded to the length of the longest tensor in it, and the padding value can be set with the default_value parameter.
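If you do end up with a padded dense batch anyway, the -1 entries can still be masked out before sampling. A minimal NumPy sketch of that selection logic (plain arrays, not TensorFlow ops; the row is the first example row from the question):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

row = np.array([0, 3, 43, 223, 23, -1, -1, -1])
valid = row[row != -1]                             # drop the -1 padding
sample = rng.choice(valid, size=3, replace=False)  # pick 3 distinct values
```

The same mask-then-sample idea carries over to tensor ops if you need to do it inside the graph.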

That covers your question about using varying-length rows in tfrecords and getting back varying-length tensors.

With regards to the lookup op, I haven't used it myself, but I think tf.nn.embedding_lookup_sparse() might help you out here. It offers the ability to look up embeddings directly from the sparse tensor, forgoing the need to convert it to a dense tensor first, and it also has a combiner parameter to specify a reduction op on those embeddings, which in your case would be 'mean'.
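The 'mean' combiner just averages the embedding rows that get looked up; in plain NumPy terms (the matrix and indices here are made up for illustration):

```python
import numpy as np

embedding_matrix = np.arange(20, dtype=np.float32).reshape(5, 4)  # 5 ids, dim 4
indices = [0, 3, 4]                                # 3 sampled ids for one row
averaged = embedding_matrix[indices].mean(axis=0)  # CBOW-style mean, shape (4,)
```

This is exactly the averaging step of the CBOW model you described.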

I hope this helps in some way. Good luck.
