How to create embeddings for a column that is a list of categorical values
Question
I am having some trouble deciding how to create embeddings for a categorical feature in my DNN model. The feature consists of a non-fixed set of tags.
The feature looks like this:
column = [['Adventure', 'Animation', 'Comedy'],
          ['Adventure', 'Comedy'],
          ['Adventure', 'Children', 'Comedy']]
I would like to do this with TensorFlow, so I know the tf.feature_column module should work; I just don't know which of its column functions to use.
Thanks!
Recommended answer
First, pad your features so that every row has the same length.
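import itertools
import numpy as np
# zip_longest(*column) walks the rows in parallel, filling short rows with 'UNK';
# its result is transposed, so .T turns it back into one row per example
column = np.array(list(itertools.zip_longest(*column, fillvalue='UNK'))).T
print(column)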
[['Adventure' 'Animation' 'Comedy']
['Adventure' 'Comedy' 'UNK']
['Adventure' 'Children' 'Comedy']]
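If the zip_longest/transpose round-trip looks opaque, the same padding can be done row by row. This is a minimal alternative sketch, assuming the same 'UNK' fill value; max_len and padded are illustrative names, not part of the original answer.

```python
import numpy as np

column = [['Adventure', 'Animation', 'Comedy'],
          ['Adventure', 'Comedy'],
          ['Adventure', 'Children', 'Comedy']]

# Length of the longest tag list; every row is padded up to it.
max_len = max(len(row) for row in column)

# row + [...] builds a new list, so the original rows are left unchanged.
padded = np.array([row + ['UNK'] * (max_len - len(row)) for row in column])
print(padded)
```

Either way, 'UNK' only needs to be a string that does not appear in the vocabulary list.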
Then you can use tf.feature_column.embedding_column to create embeddings for the categorical feature. The input to embedding_column must be a CategoricalColumn created by one of the categorical_column_* functions.
import tensorflow as tf

# if you have a big vocabulary stored in a file, use tf.feature_column.categorical_column_with_vocabulary_file instead
cat_fc = tf.feature_column.categorical_column_with_vocabulary_list(
    'cat_data',  # key identifying the input feature
    ['Adventure', 'Animation', 'Comedy', 'Children'],  # vocabulary list
    dtype=tf.string,
    default_value=-1)  # out-of-vocabulary strings such as 'UNK' map to -1
cat_column = tf.feature_column.embedding_column(
    categorical_column=cat_fc,
    dimension=5,
    combiner='mean')
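To see what the vocabulary column does with the padded rows, here is a pure-Python sketch of the string-to-id lookup. It is an illustration, not TensorFlow's code: it assumes each string maps to its position in the vocabulary list and every unknown string maps to default_value=-1. The names vocabulary, DEFAULT_VALUE and lookup_ids are hypothetical.

```python
vocabulary = ['Adventure', 'Animation', 'Comedy', 'Children']
DEFAULT_VALUE = -1  # hypothetical stand-in for default_value: id for unknown strings like 'UNK'

# Position of each token in the vocabulary list.
index = {token: i for i, token in enumerate(vocabulary)}

def lookup_ids(rows):
    """Map each string in each row to its vocabulary index, or DEFAULT_VALUE."""
    return [[index.get(token, DEFAULT_VALUE) for token in row] for row in rows]

padded = [['Adventure', 'Animation', 'Comedy'],
          ['Adventure', 'Comedy', 'UNK'],
          ['Adventure', 'Children', 'Comedy']]
ids = lookup_ids(padded)
print(ids)  # [[0, 1, 2], [0, 2, -1], [0, 3, 2]]
```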
categorical_column_with_vocabulary_list will ignore 'UNK', since 'UNK' is not in the vocabulary list: it is mapped to default_value (-1), and negative ids are dropped before the embedding lookup. dimension specifies the size of the embedding vector, and combiner specifies how to reduce multiple entries in a single row to one vector; 'mean' is the default in embedding_column, and 'sum' and 'sqrtn' are the other options.
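The reduction controlled by combiner can be sketched in NumPy. This is a simplified model, not TensorFlow's implementation: it assumes every entry has weight 1, drops negative ids the way the out-of-vocabulary 'UNK' entries are dropped, and uses a small hypothetical 4 x 2 table instead of the randomly initialized 4 x 5 one. combine_embeddings is a hypothetical helper name.

```python
import numpy as np

def combine_embeddings(ids, table, combiner='mean'):
    """Reduce one row of category ids to a single embedding vector (hypothetical helper)."""
    valid = [i for i in ids if i >= 0]  # drop out-of-vocabulary ids (-1), e.g. 'UNK'
    if not valid:
        # a row with no valid ids yields a zero vector
        return np.zeros(table.shape[1])
    total = table[valid].sum(axis=0)  # sum of the looked-up embedding rows
    n = len(valid)
    if combiner == 'sum':
        return total
    if combiner == 'mean':
        return total / n
    if combiner == 'sqrtn':
        # with unit weights, sqrt(sum of squared weights) equals sqrt(n)
        return total / np.sqrt(n)
    raise ValueError(f'unknown combiner: {combiner}')

# Hypothetical 4 x 2 embedding table, one row per vocabulary entry
# ('Adventure', 'Animation', 'Comedy', 'Children').
table = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [2.0, 2.0],
                  [4.0, 0.0]])
ids = [[0, 1, 2], [0, 2, -1], [0, 3, 2]]  # padded rows after vocabulary lookup

mean_vectors = np.array([combine_embeddings(row, table) for row in ids])
print(mean_vectors)
```

The second row averages only 'Adventure' and 'Comedy' because the 'UNK' padding is skipped, which is why padding with a token outside the vocabulary does not distort the mean.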
Result (the embedding weights are randomly initialized, so your numbers will differ):
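# TensorFlow 1.x graph-mode API (under TF 2.x these live in tf.compat.v1 and need graph mode)
tensor = tf.feature_column.input_layer({'cat_data': column}, [cat_column])
with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    session.run(tf.tables_initializer())  # initializes the vocabulary lookup table
    print(session.run(tensor))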
[[-0.694761 -0.0711766 0.05720187 0.01770079 -0.09884425]
[-0.8362482 0.11640486 -0.01767573 -0.00548441 -0.05738768]
[-0.71162754 -0.03012567 0.15568805 0.00752804 -0.1422816 ]]