What are the advantages of using tf.train.SequenceExample over tf.train.Example for variable length features?
Question
Recently I read this guide on undocumented features in TensorFlow, as I needed to pass variable length sequences as input. However, I found the protocol for tf.train.SequenceExample relatively confusing (especially given the lack of documentation), and I managed to build an input pipeline using tf.train.Example just fine instead.
Are there any advantages to using tf.train.SequenceExample? Using the standard example protocol when there is a dedicated one for variable length sequences feels like cheating, but does it have any consequences?
Recommended answer
Here are the definitions of the Example and SequenceExample protocol buffers, and all the protos they may contain:
message BytesList { repeated bytes value = 1; }
message FloatList { repeated float value = 1 [packed = true]; }
message Int64List { repeated int64 value = 1 [packed = true]; }
message Feature {
    oneof kind {
        BytesList bytes_list = 1;
        FloatList float_list = 2;
        Int64List int64_list = 3;
    }
};
message Features { map<string, Feature> feature = 1; };
message Example { Features features = 1; };
message FeatureList { repeated Feature feature = 1; };
message FeatureLists { map<string, FeatureList> feature_list = 1; };
message SequenceExample {
    Features context = 1;
    FeatureLists feature_lists = 2;
};
An Example contains a Features, which contains a mapping from feature name to Feature, which in turn contains either a bytes list, a float list, or an int64 list.
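Because each of those lists is a repeated field, a plain Example can already hold a variable-length sequence, which is why the pipeline in the question works. The following is a minimal sketch of that, not the asker's actual code: the feature name "token_ids" and its values are made up for illustration.

```python
import tensorflow as tf

# Hypothetical feature name and values, chosen only for illustration.
example = tf.train.Example(features=tf.train.Features(feature={
    "token_ids": tf.train.Feature(
        int64_list=tf.train.Int64List(value=[3, 1, 4, 1, 5])),
}))
serialized = example.SerializeToString()

# VarLenFeature tells the parser that the list length is not fixed;
# the parsed result is a SparseTensor.
parsed = tf.io.parse_single_example(
    serialized, {"token_ids": tf.io.VarLenFeature(tf.int64)})

# Convert the sparse result to a plain Python list for inspection.
token_ids = tf.sparse.to_dense(parsed["token_ids"]).numpy().tolist()
```

A single flat list of any length fits in one Feature; the limitation only appears once each element of the sequence needs to be a list itself.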
A SequenceExample also contains a Features (its context), but it additionally contains a FeatureLists, which contains a mapping from list name to FeatureList, which contains a list of Feature. So it can do everything an Example can do, and more. But do you really need that extra functionality? What does it do?
Since each Feature contains a list of values, a FeatureList is a list of lists. And that's the key: if you need lists of lists of values, then you need SequenceExample.
For example, if you handle text, you can represent it as one big string:
from tensorflow.train import BytesList
BytesList(value=[b"This is the first sentence. And here's another."])
Or you could represent it as a list of words and tokens:
BytesList(value=[b"This", b"is", b"the", b"first", b"sentence", b".", b"And", b"here",
b"'s", b"another", b"."])
Or you could represent each sentence separately. That's where you would need a list of lists:
from tensorflow.train import BytesList, Feature, FeatureList
s1 = BytesList(value=[b"This", b"is", b"the", b"first", b"sentence", b"."])
s2 = BytesList(value=[b"And", b"here", b"'s", b"another", b"."])
fl = FeatureList(feature=[Feature(bytes_list=s1), Feature(bytes_list=s2)])
Then create the SequenceExample:
from tensorflow.train import SequenceExample, FeatureLists
seq = SequenceExample(feature_lists=FeatureLists(feature_list={
"sentences": fl
}))
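A SequenceExample can also carry per-example context next to its feature lists, as the proto definition indicates. Here is a minimal sketch of that and of reading the nested structure back from the proto object. The helper words() and the "author" context feature are hypothetical names introduced only for this illustration.

```python
import tensorflow as tf

def words(*tokens):
    """Hypothetical helper: wrap byte strings in a Feature holding a BytesList."""
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=list(tokens)))

seq = tf.train.SequenceExample(
    # Context: ordinary features that apply to the whole example.
    context=tf.train.Features(feature={
        "author": words(b"anonymous"),  # made-up context feature
    }),
    # Feature lists: one Feature (itself a list of values) per sentence.
    feature_lists=tf.train.FeatureLists(feature_list={
        "sentences": tf.train.FeatureList(feature=[
            words(b"This", b"is", b"the", b"first", b"sentence", b"."),
            words(b"And", b"here", b"'s", b"another", b"."),
        ]),
    }),
)

# Walk the nested structure: FeatureList -> Feature -> BytesList -> values.
sentences = seq.feature_lists.feature_list["sentences"].feature
sentence_lengths = [len(f.bytes_list.value) for f in sentences]
author = seq.context.feature["author"].bytes_list.value[0]
```

The two sentences have different lengths, which is exactly the list-of-lists case that a single Feature in an Example cannot express.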
And you can serialize it and perhaps save it to a TFRecord file.
data = seq.SerializeToString()
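One way to save the serialized bytes is tf.io.TFRecordWriter, and tf.data.TFRecordDataset can read them back. The sketch below assumes a temporary directory and a file name ("sentences.tfrecord") picked purely for illustration, and it rebuilds a small SequenceExample so that it runs on its own.

```python
import os
import tempfile

import tensorflow as tf

# A small SequenceExample with one feature list of two sentences.
s1 = tf.train.BytesList(value=[b"This", b"is", b"the", b"first", b"sentence", b"."])
s2 = tf.train.BytesList(value=[b"And", b"here", b"'s", b"another", b"."])
fl = tf.train.FeatureList(feature=[
    tf.train.Feature(bytes_list=s1),
    tf.train.Feature(bytes_list=s2),
])
seq = tf.train.SequenceExample(
    feature_lists=tf.train.FeatureLists(feature_list={"sentences": fl}))
data = seq.SerializeToString()

# Hypothetical output location: a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sentences.tfrecord")

# Each call to write() stores one serialized record.
with tf.io.TFRecordWriter(path) as writer:
    writer.write(data)

# Read the raw records back; each element is a scalar string tensor.
records = [record.numpy() for record in tf.data.TFRecordDataset([path])]
```

The dataset yields the same bytes that were written, so parsing can be deferred to a map() step in the input pipeline.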
Later, when you read the data, you can parse it using tf.io.parse_single_sequence_example().
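As a rough sketch of that parsing step, assuming the "sentences" feature list built earlier: each sentence has a different length, so the feature is declared as a VarLenFeature, and the resulting SparseTensor can be converted to a RaggedTensor. This is one possible way to do it, not the only one.

```python
import tensorflow as tf

# Rebuild and serialize the SequenceExample so the sketch is self-contained.
s1 = tf.train.BytesList(value=[b"This", b"is", b"the", b"first", b"sentence", b"."])
s2 = tf.train.BytesList(value=[b"And", b"here", b"'s", b"another", b"."])
fl = tf.train.FeatureList(feature=[
    tf.train.Feature(bytes_list=s1),
    tf.train.Feature(bytes_list=s2),
])
seq = tf.train.SequenceExample(
    feature_lists=tf.train.FeatureLists(feature_list={"sentences": fl}))
data = seq.SerializeToString()

# The function returns two dicts: parsed context features and parsed feature lists.
context, feature_lists = tf.io.parse_single_sequence_example(
    data,
    sequence_features={"sentences": tf.io.VarLenFeature(tf.string)},
)

# The SparseTensor has one row per sentence; a RaggedTensor keeps the
# per-sentence lengths without padding.
sentences = tf.RaggedTensor.from_sparse(feature_lists["sentences"])
as_lists = sentences.to_list()
```

No context features were declared here, so the first returned dict is empty; the second holds the list of lists that motivated using SequenceExample in the first place.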