Tensorflow pad sequence feature column


Problem Description

How do I pad sequences in a feature column, and what does the dimension argument in feature_column do?

I am using TensorFlow 2.0 and implementing an example of text summarization. I am pretty new to machine learning, deep learning, and TensorFlow.

I came across feature_column and found it useful, as I think feature columns can be embedded in the model's processing pipeline.

In a classic scenario without feature_column, I can pre-process the text, tokenize it, convert it into a sequence of numbers, and then pad the sequences to a maxlen of, say, 100 words. I am not able to get this done when using feature_column.
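
For example, here is the classic flow I mean, as a minimal sketch (toy texts invented just for illustration):

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = ['a wiki is run using wiki software',
         'otherwise known as a wiki engine']
tokenizer = Tokenizer(num_words=100, oov_token='<OOV>')
tokenizer.fit_on_texts(texts)                    # build the vocabulary
sequences = tokenizer.texts_to_sequences(texts)  # words -> integer ids
padded = pad_sequences(sequences, maxlen=100, padding='post')
print(padded.shape)  # (2, 100)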

Below is what I have written so far.


import tensorflow as tf
import tensorflow_text as tf_text  # needed for the WhitespaceTokenizer below

train_dataset = tf.data.experimental.make_csv_dataset(
    'assets/train_dataset.csv', label_name=LABEL, num_epochs=1, shuffle=True,
    shuffle_buffer_size=10000, batch_size=1, ignore_errors=True)

vocabulary = ds.get_vocabulary()  # ds and LABEL are defined elsewhere in my code

def text_demo(feature_column):
    feature_layer = tf.keras.experimental.SequenceFeatures(feature_column)
    article, _ = next(iter(train_dataset.take(1)))

    tokenizer = tf_text.WhitespaceTokenizer()

    tokenized = tokenizer.tokenize(article['Text'])

    sequence_input, sequence_length = feature_layer({'Text':tokenized.to_tensor()})

    print(sequence_input)

def categorical_column(feature_column):
    dense_column = tf.keras.layers.DenseFeatures(feature_column)

    article, _ = next(iter(train_dataset.take(1)))

    lang_tokenizer = tf.keras.preprocessing.text.Tokenizer(
      filters='')
    lang_tokenizer.fit_on_texts(article)

    tensor = lang_tokenizer.texts_to_sequences(article)

    tensor = tf.keras.preprocessing.sequence.pad_sequences(tensor,
                                                         padding='post', maxlen=50)

    print(dense_column(tensor).numpy())


text_seq_vocab_list = tf.feature_column.sequence_categorical_column_with_vocabulary_list(key='Text', vocabulary_list=list(vocabulary))
text_embedding = tf.feature_column.embedding_column(text_seq_vocab_list, dimension=8)
text_demo(text_embedding)

numerical_voacb_list = tf.feature_column.categorical_column_with_vocabulary_list(key='Text', vocabulary_list=list(vocabulary))
embedding = tf.feature_column.embedding_column(numerical_voacb_list, dimension=8)
categorical_column(embedding)

I am also confused as to what to use here: sequence_categorical_column_with_vocabulary_list or categorical_column_with_vocabulary_list. SequenceFeatures is also not explained in the documentation, although I know it is an experimental feature.

I am also not able to understand what the dimension parameter does.

Solution

Actually, this

I am also confused as to what to use here, sequence_categorical_column_with_vocabulary_list or categorical_column_with_vocabulary_list.

should be the first question, because it affects the interpretation of the one in the topic name.

It is also not exactly clear what you mean by text summarization. What type of model/layers are you going to pass the processed texts into?

By the way, it matters, because tf.keras.layers.DenseFeatures and tf.keras.experimental.SequenceFeatures are intended for different network architectures and approaches.

As the documentation for the SequenceFeatures layer says, the output of a SequenceFeatures layer is supposed to be fed into sequence networks, such as an RNN.

DenseFeatures produces a dense Tensor as its output and so suits other types of networks.
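
To make the difference concrete, here is a minimal sketch comparing the two output shapes (a toy three-word vocabulary invented just for illustration):

import tensorflow as tf
from tensorflow import feature_column

# toy batch: two examples, three tokens each ('' is padding)
batch = {'text': tf.constant([['a', 'b', 'c'],
                              ['a', 'c', '']])}

plain_col = feature_column.categorical_column_with_vocabulary_list(
    'text', vocabulary_list=['a', 'b', 'c'])
seq_col = feature_column.sequence_categorical_column_with_vocabulary_list(
    'text', vocabulary_list=['a', 'b', 'c'])

# DenseFeatures: one combined (mean-pooled) embedding vector per example
dense_out = tf.keras.layers.DenseFeatures(
    feature_column.embedding_column(plain_col, dimension=4))(batch)
print(dense_out.shape)  # (2, 4)

# SequenceFeatures: one embedding vector per token, plus the true lengths
seq_out, seq_len = tf.keras.experimental.SequenceFeatures(
    feature_column.embedding_column(seq_col, dimension=4))(batch)
print(seq_out.shape)  # (2, 3, 4)
print(seq_len)        # [3, 2] -- the empty string is dropped as padding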

Since you perform tokenization in your code snippet, you are going to use embeddings in your model. Then you have two options:

  1. Pass the learned embeddings forward into Dense layers. This means you will not analyze word order.
  2. Pass the learned embeddings into Convolutional, Recurrent, AveragePooling, or LSTM layers, and so use word order for learning as well.

The first option requires using:

  • The tf.keras.layers.DenseFeatures with
  • one of tf.feature_column.categorical_column_*()
  • and tf.feature_column.embedding_column()

The second option requires using:

  • The tf.keras.experimental.SequenceFeatures with
  • one of tf.feature_column.sequence_categorical_column_*()
  • and tf.feature_column.embedding_column()

Here are examples. The preprocessing and training parts are the same for both options:

import tensorflow as tf
print(tf.__version__)

from tensorflow import feature_column

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import text_to_word_sequence
import tensorflow.keras.utils as ku
from tensorflow.keras.utils import plot_model

import pandas as pd
from sklearn.model_selection import train_test_split

DATA_PATH = r'C:\SoloLearnMachineLearning\Stackoverflow\TextDataset.csv'  # raw string keeps the backslashes literal

#it is just two column csv, like:
# text;label
# A wiki is run using wiki software;0
# otherwise known as a wiki engine.;1

dataframe = pd.read_csv(DATA_PATH, delimiter = ';')
dataframe.head()

# Preprocessing before feature_column includes
# - getting the vocabulary
# - tokenization, which means only splitting into tokens.
#   Encoding sentences with the vocabulary will be done by feature_column!
# - padding
# - truncating

# Build vocabulary
vocab_size = 100
oov_tok = '<OOV>'

sentences = dataframe['text'].to_list()

tokenizer = Tokenizer(num_words = vocab_size, oov_token="<OOV>")

tokenizer.fit_on_texts(sentences)
word_index = tokenizer.word_index

# if word_index is shorter than the default vocab_size, save the actual size
vocab_size=len(word_index)
print("vocab_size = word_index = ",len(word_index))

# Split sentences into tokens; here a token = a word
# text_to_word_sequence() has a good default filter for
# characters, including basic punctuation, tabs, and newlines
dataframe['text'] = dataframe['text'].apply(text_to_word_sequence)

dataframe.head()

max_length = 6

# padding and truncating sentences
# done directly on the strings, without tokenizer.texts_to_sequences()
# the feature_column will convert the strings into numbers
dataframe['text']=dataframe['text'].apply(lambda x, N=max_length: (x + N * [''])[:N])
dataframe['text']=dataframe['text'].apply(lambda x, N=max_length: x[:N])
dataframe.head()

# Define method to create tf.data dataset from Pandas Dataframe
def df_to_dataset(dataframe, label_column, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()
    #labels = dataframe.pop(label_column)
    labels = dataframe[label_column]

    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds

# Split dataframe into train and validation sets
train_df, val_df = train_test_split(dataframe, test_size=0.2)

print(len(train_df), 'train examples')
print(len(val_df), 'validation examples')

batch_size = 32
ds = df_to_dataset(dataframe, 'label',shuffle=False,batch_size=batch_size)

train_ds = df_to_dataset(train_df, 'label',  shuffle=False, batch_size=batch_size)
val_ds = df_to_dataset(val_df, 'label', shuffle=False, batch_size=batch_size)

# and small batch for demo
example_batch = next(iter(ds))[0]
example_batch

# Helper methods to print example outputs for a given feature_column

def demo(feature_column):
    feature_layer = tf.keras.layers.DenseFeatures(feature_column)
    print(feature_layer(example_batch).numpy())

def seqdemo(feature_column):
    sequence_feature_layer = tf.keras.experimental.SequenceFeatures(feature_column)
    print(sequence_feature_layer(example_batch))

Here is the first option, where we do not use word order for learning:

# Define a categorical column for our text feature,
# which has been preprocessed into lists of tokens.
# Note that the key name should be the same as the original column name in the dataframe
text_column = feature_column.categorical_column_with_vocabulary_list(
    key='text', vocabulary_list=list(word_index))
# indicator_column produces a one-hot encoding. These commented lines are just for comparison with embedding:
#print(demo(feature_column.indicator_column(payment_description_3)))
#print(payment_description_2,'\n')

# the dimension argument here is exactly the dimension of the space
# in which tokens will be represented during the model's learning
# see the tutorial at https://www.tensorflow.org/beta/tutorials/text/word_embeddings
text_embedding = feature_column.embedding_column(text_column, dimension=8)
print(demo(text_embedding))

# Then define the layers and the model itself.
# This example uses the Keras Functional API instead of Sequential just for more generality

# Define DenseFeatures layer to pass feature_columns into Keras model
feature_layer = tf.keras.layers.DenseFeatures(text_embedding)

# Define inputs for each feature column.
# See https://github.com/tensorflow/tensorflow/issues/27416#issuecomment-502218673
feature_layer_inputs = {}

# Here we have just one column
# It is important to define tf.keras.Input with a shape
# corresponding to the length of our sequence of words
feature_layer_inputs['text'] = tf.keras.Input(shape=(max_length,),
                                              name='text',
                                              dtype=tf.string)
print(feature_layer_inputs)

# Define the outputs of the DenseFeatures layer
# and actually use them as the first layer of the model
feature_layer_outputs = feature_layer(feature_layer_inputs)
print(feature_layer_outputs)

# Add the subsequent layers.
# See https://keras.io/getting-started/functional-api-guide/
x = tf.keras.layers.Dense(256, activation='relu')(feature_layer_outputs)
x = tf.keras.layers.Dropout(0.2)(x)

# This example assumes binary classification, as the labels are 0 or 1
x = tf.keras.layers.Dense(1, activation='sigmoid')(x)

model = tf.keras.models.Model(inputs=[v for v in feature_layer_inputs.values()],
                              outputs=x)

model.summary()

# This example assumes binary classification, as the labels are 0 or 1
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy']
              #run_eagerly=True
             )

# Note that the fit() method looks up features in train_ds and val_ds by the name in
# tf.keras.Input(shape=(max_length,), name='text')

# This model will of course learn nothing, because the data is fake.

num_epochs = 5
history = model.fit(train_ds,
                    validation_data=val_ds,
                    epochs=num_epochs,
                    verbose=1
                    )

And here is the second option, where we take word order into account and let the model learn from it:

# Define a sequence categorical column for our text feature,
# which has been preprocessed into lists of tokens.
# Note that the key name should be the same as the original column name in the dataframe
text_column = feature_column.sequence_categorical_column_with_vocabulary_list(
    key='text', vocabulary_list=list(word_index))

# the dimension argument here is exactly the dimension of the space
# in which tokens will be represented during the model's learning
# see the tutorial at https://www.tensorflow.org/beta/tutorials/text/word_embeddings
text_embedding = feature_column.embedding_column(text_column, dimension=8)
print(seqdemo(text_embedding))

# Then define the layers and the model itself.
# This example uses the Keras Functional API instead of Sequential
# just for more generality

# Define SequenceFeatures layer to pass feature_columns into Keras model
sequence_feature_layer = tf.keras.experimental.SequenceFeatures(text_embedding)

# Define inputs for each feature column. See
# https://github.com/tensorflow/tensorflow/issues/27416#issuecomment-502218673
feature_layer_inputs = {}
sequence_feature_layer_inputs = {}

# Here we have just one column

sequence_feature_layer_inputs['text'] = tf.keras.Input(shape=(max_length,),
                                                       name='text',
                                                       dtype=tf.string)
print(sequence_feature_layer_inputs)

# Define the outputs of the SequenceFeatures layer
# and actually use them as the first layer of the model

# Note that the SequenceFeatures layer produces a tuple of two tensors as output.
# We only need the first one to pass on to the next layer.
sequence_feature_layer_outputs, _ = sequence_feature_layer(sequence_feature_layer_inputs)
print(sequence_feature_layer_outputs)
# Add the subsequent layers. See https://keras.io/getting-started/functional-api-guide/

# Conv1D and MaxPooling1D will learn features from word order
x = tf.keras.layers.Conv1D(8,4)(sequence_feature_layer_outputs)
x = tf.keras.layers.MaxPooling1D(2)(x)
# Add the subsequent layers. See https://keras.io/getting-started/functional-api-guide/
x = tf.keras.layers.Dense(256, activation='relu')(x)
x = tf.keras.layers.Dropout(0.2)(x)

# This example assumes binary classification, as the labels are 0 or 1
x = tf.keras.layers.Dense(1, activation='sigmoid')(x)

model = tf.keras.models.Model(inputs=[v for v in sequence_feature_layer_inputs.values()],
                              outputs=x)
model.summary()

# This example assumes binary classification, as the labels are 0 or 1
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy']
              #run_eagerly=True
             )

# Note that the fit() method looks up features in train_ds and val_ds by the name in
# tf.keras.Input(shape=(max_length,), name='text')

# This model will of course learn nothing, because the data is fake.

num_epochs = 5
history = model.fit(train_ds,
                    validation_data=val_ds,
                    epochs=num_epochs,
                    verbose=1
                    )

Please find the complete Jupyter notebooks with these examples on my GitHub:

The dimension argument in feature_column.embedding_column() is exactly the dimension of the space in which tokens will be represented during the model's learning. See the tutorial at https://www.tensorflow.org/beta/tutorials/text/word_embeddings for a detailed explanation.
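
In other words, dimension sets the width of the learned lookup table: the embedding matrix has shape [vocabulary size, dimension], and every token comes out as a vector of length dimension. A tiny sketch (toy vocabulary invented just for illustration):

import tensorflow as tf
from tensorflow import feature_column

cat = feature_column.categorical_column_with_vocabulary_list(
    'text', vocabulary_list=['wiki', 'software', 'engine'])
emb = feature_column.embedding_column(cat, dimension=8)

out = tf.keras.layers.DenseFeatures(emb)({'text': tf.constant([['wiki']])})
print(out.shape)  # (1, 8) -- one 8-dimensional vector for the one token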

Also note that using feature_column.embedding_column() is an alternative to tf.keras.layers.Embedding(). As you can see, feature_column takes the encoding step out of the preprocessing pipeline, but you should still do the splitting, padding, and truncation of sentences manually.
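
For comparison, here is a minimal sketch of the tf.keras.layers.Embedding() route, where the encoding (texts_to_sequences) happens in preprocessing rather than inside the model (toy texts and hyper-parameters invented just for illustration):

import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = ['a wiki is run using wiki software',
         'otherwise known as a wiki engine']
tokenizer = Tokenizer(oov_token='<OOV>')
tokenizer.fit_on_texts(texts)
# here the encoding happens in preprocessing, not inside the model
padded = pad_sequences(tokenizer.texts_to_sequences(texts),
                       maxlen=6, padding='post')

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=len(tokenizer.word_index) + 1,
                              output_dim=8, input_length=6),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
print(model(tf.constant(padded)).shape)  # (2, 1)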
