tf.contrib.layers.embedding_column from TensorFlow


Problem Description

I am going through the TensorFlow tutorial. I would like to find a description of the following line:

tf.contrib.layers.embedding_column

I wonder if it uses word2vec or anything else, or maybe I am thinking in a completely wrong direction. I tried to click around on GitHub, but found nothing. I am guessing that looking on GitHub is not going to be easy, since the Python might refer to some C++ libraries. Could anybody point me in the right direction?

Recommended Answer

I've been wondering about this too. It's not really clear to me what they're doing, but here is what I found.

In the paper on wide and deep learning, they describe the embedding vectors as being randomly initialized and then adjusted during training to minimize error.

Normally when you do embeddings, you take some arbitrary vector representation of the data (such as one-hot vectors) and then multiply it by a matrix that represents the embedding. This matrix can be found by PCA, or learned during training by something like t-SNE or word2vec.

The actual code for embedding_column is here; it's implemented as a class called _EmbeddingColumn, which is a subclass of _FeatureColumn. It stores the embedding matrix inside its sparse_id_column attribute. Then the method to_dnn_input_layer applies this embedding matrix to produce the embeddings for the next layer.

    def to_dnn_input_layer(self,
                           input_tensor,
                           weight_collections=None,
                           trainable=True):
      # Looks up the embedding rows for the sparse ids (optionally weighted
      # via weight_tensor), creating the embedding weights if needed.
      output, embedding_weights = _create_embedding_lookup(
          input_tensor=self.sparse_id_column.id_tensor(input_tensor),
          weight_tensor=self.sparse_id_column.weight_tensor(input_tensor),
          vocab_size=self.length,
          dimension=self.dimension,
          weight_collections=_add_variable_collection(weight_collections),
          initializer=self.initializer,
          combiner=self.combiner,
          trainable=trainable)
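For context, this is roughly how the column gets used in the old tf.contrib API, as in the wide-and-deep tutorial the question refers to. This is only a sketch: the feature name "occupation", the hash bucket size, the embedding dimension, and the hidden-unit sizes are illustrative choices, not values from the tutorial.

    import tensorflow as tf

    # A sparse categorical feature whose string values are hashed into ids.
    occupation = tf.contrib.layers.sparse_column_with_hash_bucket(
        "occupation", hash_bucket_size=1000)

    # Wraps the sparse column in an _EmbeddingColumn: each of the 1000 ids
    # gets a trainable 8-dimensional vector, randomly initialized.
    occupation_emb = tf.contrib.layers.embedding_column(occupation, dimension=8)

    # The DNN part of the model consumes the embedded (dense) representation.
    estimator = tf.contrib.learn.DNNClassifier(
        hidden_units=[64, 32],
        feature_columns=[occupation_emb])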

So as far as I can see, it seems like the embeddings are formed by applying whatever learning rule you're using (gradient descent, etc.) to the embedding matrix.
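A toy TF 1.x sketch of that mechanic, assuming a made-up loss purely for illustration: the embedding matrix is just a trainable variable, and any loss defined on the looked-up rows back-propagates into it, so ordinary gradient descent adjusts the rows that were used.

    import tensorflow as tf

    # A trainable embedding matrix, randomly initialized (as the paper describes).
    embeddings = tf.Variable(tf.random_uniform([1000, 8], -1.0, 1.0))

    ids = tf.placeholder(tf.int32, shape=[None])
    vectors = tf.nn.embedding_lookup(embeddings, ids)

    # Dummy loss on the looked-up vectors; gradients flow back into the
    # rows of `embeddings` selected by `ids`.
    loss = tf.reduce_mean(tf.reduce_sum(tf.square(vectors), axis=1))
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)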

