What are the advantages of using tf.train.SequenceExample over tf.train.Example for variable-length features?


Question

Recently I read this guide on undocumented features in TensorFlow, as I needed to pass variable-length sequences as input. However, I found the protocol for tf.train.SequenceExample relatively confusing (especially given the lack of documentation), and I managed to build an input pipeline using tf.train.Example just fine instead.

Are there any advantages to using tf.train.SequenceExample? Using the standard Example protocol when there is a dedicated one for variable-length sequences feels like cheating, but does it have any consequences?

Recommended answer

Here are the definitions of the Example and SequenceExample protocol buffers, and all the protos they may contain:

message BytesList { repeated bytes value = 1; }
message FloatList { repeated float value = 1 [packed = true]; }
message Int64List { repeated int64 value = 1 [packed = true]; }
message Feature {
    oneof kind {
        BytesList bytes_list = 1;
        FloatList float_list = 2;
        Int64List int64_list = 3;
    }
};
message Features { map<string, Feature> feature = 1; };
message Example { Features features = 1; };

message FeatureList { repeated Feature feature = 1; };
message FeatureLists { map<string, FeatureList> feature_list = 1; };
message SequenceExample {
  Features context = 1;
  FeatureLists feature_lists = 2;
};

An Example contains a Features message, which maps each feature name to a Feature. Each Feature holds exactly one of a bytes list, a float list, or an int64 list.

A SequenceExample also contains a Features message (its context), but it additionally contains a FeatureLists message, which maps each list name to a FeatureList, which in turn holds a list of Feature messages. So it can do everything an Example can do, and more. But do you really need that extra functionality? What does it do?

Since each Feature contains a list of values, a FeatureList is a list of lists. And that's the key: if you need lists of lists of values, then you need SequenceExample.

For example, if you handle text, you can represent it as one big string:

from tensorflow.train import BytesList

BytesList(value=[b"This is the first sentence. And here's another."])

Or you could represent it as a flat list of tokens (words and punctuation):

BytesList(value=[b"This", b"is", b"the", b"first", b"sentence", b".", b"And", b"here",
                 b"'s", b"another", b"."])
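A flat token list like this one fits in a plain tf.train.Example even when its length varies from record to record, which is presumably how the pipeline in the question worked. The following is a minimal sketch, not the asker's actual code: the feature name "tokens" and the helper make_example are made up for illustration.

```python
import tensorflow as tf


def make_example(tokens):
    """Serialize one flat, variable-length list of byte strings (hypothetical helper)."""
    feature = tf.train.Feature(bytes_list=tf.train.BytesList(value=tokens))
    example = tf.train.Example(
        features=tf.train.Features(feature={"tokens": feature}))
    return example.SerializeToString()


# Two records whose "tokens" feature has different lengths.
short = make_example([b"Hello", b"."])
longer = make_example([b"This", b"is", b"longer", b"."])

# VarLenFeature tells the parser not to expect a fixed length;
# the parsed value comes back as a SparseTensor.
feature_spec = {"tokens": tf.io.VarLenFeature(tf.string)}

short_tokens = tf.io.parse_single_example(short, feature_spec)["tokens"].values
longer_tokens = tf.io.parse_single_example(longer, feature_spec)["tokens"].values
```

This works because each record still holds a single list. Sentence boundaries are lost unless you encode them yourself, which is the gap SequenceExample fills.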

Or you could represent each sentence separately. That's where you would need a list of lists:

from tensorflow.train import BytesList, Feature, FeatureList

s1 = BytesList(value=[b"This", b"is", b"the", b"first", b"sentence", b"."])
s2 = BytesList(value=[b"And", b"here", b"'s", b"another", b"."])
fl = FeatureList(feature=[Feature(bytes_list=s1), Feature(bytes_list=s2)])

Then create the SequenceExample:

from tensorflow.train import SequenceExample, FeatureLists

seq = SequenceExample(feature_lists=FeatureLists(feature_list={
    "sentences": fl
}))

You can then serialize it and, for instance, save it to a TFRecord file.

data = seq.SerializeToString()
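As a rough sketch of the saving step, the snippet below writes one serialized SequenceExample to a TFRecord file and reads the raw record back. The file name sentences.tfrecord and the temporary directory are assumptions made for the example.

```python
import os
import tempfile

import tensorflow as tf

# A small SequenceExample with two "sentences" of different lengths.
s1 = tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"Hello", b"."]))
s2 = tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"And", b"bye", b"."]))
seq = tf.train.SequenceExample(
    feature_lists=tf.train.FeatureLists(feature_list={
        "sentences": tf.train.FeatureList(feature=[s1, s2])
    }))
data = seq.SerializeToString()

# Hypothetical output location, chosen only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "sentences.tfrecord")

# Each call to write() stores one serialized record.
with tf.io.TFRecordWriter(path) as writer:
    writer.write(data)

# Reading back yields the raw serialized bytes, still unparsed.
records = [record.numpy() for record in tf.data.TFRecordDataset(path)]
```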

Later, when you read the data, you can parse it using tf.io.parse_single_sequence_example().
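A minimal sketch of that parsing step follows, assuming the same "sentences" feature list as above. Describing the list with tf.io.VarLenFeature and converting the result to a RaggedTensor is one reasonable choice, not the only one.

```python
import tensorflow as tf

# Rebuild and serialize the two-sentence SequenceExample.
s1 = tf.train.BytesList(value=[b"This", b"is", b"the", b"first", b"sentence", b"."])
s2 = tf.train.BytesList(value=[b"And", b"here", b"'s", b"another", b"."])
fl = tf.train.FeatureList(feature=[tf.train.Feature(bytes_list=s1),
                                   tf.train.Feature(bytes_list=s2)])
seq = tf.train.SequenceExample(
    feature_lists=tf.train.FeatureLists(feature_list={"sentences": fl}))
data = seq.SerializeToString()

# Sentences have different token counts, so each step is variable-length.
sequence_features = {"sentences": tf.io.VarLenFeature(tf.string)}

# Returns two dicts: parsed context features and parsed feature lists.
context, sequences = tf.io.parse_single_sequence_example(
    data, sequence_features=sequence_features)

# The feature list arrives as a 2-D SparseTensor (sentence x token);
# a RaggedTensor keeps each sentence at its own length.
sentences = tf.RaggedTensor.from_sparse(sequences["sentences"])
```

Calling sentences.to_list() then gives one inner list per sentence, which is the list-of-lists structure a plain Example cannot store directly.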
