Mixture of Experts on TensorFlow

Question

I want to implement a generic module in TensorFlow that receives a list of TensorFlow models (here denoted as experts) and builds a Mixture of Experts from them, as depicted in the figure in http://www.aclweb.org/anthology/C16-1133

The model receives an input x, which is fed to each of the experts as well as to a gating network. The final result is the ensemble output: the sum of the experts' outputs, each multiplied by its corresponding gating function gm, which comes from the gating network. All the expert networks are trained simultaneously.
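Written out, the ensemble output described above could look like the following. The symbols are assumptions made for illustration: M is the number of experts, f_m is the m-th expert, and g_m is the m-th gating output. The softmax form of the gate is also an assumption (a common choice); the question only says that g_m comes from the gating network.

```latex
y(x) = \sum_{m=1}^{M} g_m(x)\, f_m(x),
\qquad
g_m(x) = \frac{\exp\big(s_m(x)\big)}{\sum_{k=1}^{M} \exp\big(s_k(x)\big)},
\qquad
\sum_{m=1}^{M} g_m(x) = 1
```

Here s_m(x) stands for the gating network's unnormalized score for expert m.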

It is important that this module is suitable for batch training. I looked for an existing implementation and found https://github.com/AmazaspShumik/Mixture-Models, although it is not written in TensorFlow.

So right now I am looking for pointers and suggestions regarding what the best approach to build this module would be, namely regarding some already implemented TF layers or wrappers that would be particularly suitable for this application.

Recommended answer

Yes, you can do this in an all-in-one architecture by using a gating placeholder.

Let's start with a simple piece of TensorFlow concept code like this, and then build on it:

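import tensorflow as tf  # TF 1.x graph-style API
width, height = 4, 8  # example sizes, chosen only for illustration
x = tf.placeholder(tf.float32, shape=[None, width])  # batch of inputs
m = tf.Variable(tf.random_normal([width, height]), dtype=tf.float32)  # weights
b = tf.Variable(tf.zeros([height]), dtype=tf.float32)  # biases
h = tf.sigmoid(tf.matmul(x, m) + b)  # layer output, shape [batch, height]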

Imagine this is your single "expert" model architecture. I know it is fairly basic, but it will do for our purposes of illustration.

What we are going to do is store all of the expert systems in the matrices m and b and define a gating matrix.

Let's call the gating matrix g. It blocks specific neural connections, which are defined in m. This would be your new configuration:

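g = tf.placeholder(tf.float32, shape=[width, height])  # gating mask, fed at run time
m = tf.Variable(tf.random_normal([width, height]), dtype=tf.float32)  # all experts' weights
b = tf.Variable(tf.zeros([height]), dtype=tf.float32)  # biases
h = tf.sigmoid(tf.matmul(x, tf.multiply(m, g)) + b)  # blocked connections contribute 0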

g is a matrix of 1's and 0's. Insert a 1 for every neural connection you want to keep and a 0 for every one you want to block. If you have 4 expert systems, then 1/4th of the connections will be 1's and 3/4ths will be 0s.
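As a rough illustration of how such a mask might be built, here is a NumPy sketch (NumPy rather than TensorFlow, so that it runs without a session). It assumes the experts' weights are laid out side by side as column blocks of m; that layout, the helper name expert_mask, and the sizes are all hypothetical and are not specified in the answer.

```python
import numpy as np

def expert_mask(width, units_per_expert, num_experts, active):
    """Hypothetical helper: 0/1 mask of shape (width, units_per_expert * num_experts)
    that keeps only the column block belonging to expert `active`."""
    g = np.zeros((width, units_per_expert * num_experts), dtype=np.float32)
    start = active * units_per_expert
    g[:, start:start + units_per_expert] = 1.0
    return g

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
width, units_per_expert, num_experts = 3, 2, 4   # illustrative sizes
height = units_per_expert * num_experts

x = rng.normal(size=(5, width))        # batch of 5 inputs
m = rng.normal(size=(width, height))   # all experts' weights, side by side (assumed layout)
b = np.zeros(height)                   # biases

g = expert_mask(width, units_per_expert, num_experts, active=1)
h = sigmoid(x @ (m * g) + b)           # same formula as the TensorFlow line above

print(g.mean())    # fraction of connections kept
print(h.shape)
```

With 4 experts, a quarter of the entries of g are 1. Note that in this sketch a blocked unit still outputs sigmoid(b) (0.5 when b is zero) rather than 0, so the active expert's outputs may still need to be selected downstream.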

If you want them all to vote equally, then you'll want to set all values of g to 1/4th.
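For comparison, here is a minimal NumPy sketch of the batch-friendly mixture the question describes, in which the gate weights each expert's output instead of masking connections. It assumes single-layer sigmoid experts and a linear-softmax gate; the function name mixture_of_experts and all shapes are illustrative assumptions, not taken from the question, the answer, or the paper. Setting the gate parameters to zero reproduces the equal-vote case, where every expert gets a weight of 1/4.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mixture_of_experts(x, expert_weights, expert_biases, gate_weights, gate_bias):
    """x: (batch, d_in); expert_weights: (M, d_in, d_out); expert_biases: (M, d_out);
    gate_weights: (d_in, M); gate_bias: (M,). Returns (output, gates)."""
    # Every expert sees the whole batch: pre-activations have shape (batch, M, d_out).
    pre = np.einsum("bi,mio->bmo", x, expert_weights) + expert_biases
    expert_out = 1.0 / (1.0 + np.exp(-pre))              # sigmoid experts (assumed)
    gates = softmax(x @ gate_weights + gate_bias)         # (batch, M), each row sums to 1
    output = np.einsum("bm,bmo->bo", gates, expert_out)   # gate-weighted sum over experts
    return output, gates

rng = np.random.default_rng(0)
batch, d_in, d_out, M = 5, 3, 2, 4                        # illustrative sizes
x = rng.normal(size=(batch, d_in))
W = rng.normal(size=(M, d_in, d_out))
B = rng.normal(size=(M, d_out))

# Zero gate parameters give uniform gates, i.e. all experts vote equally.
y, gates = mixture_of_experts(x, W, B, np.zeros((d_in, M)), np.zeros(M))
print(gates[0])
print(y.shape)
```

Because every operation carries a leading batch dimension, the same computation applies to any batch size. In a trainable TensorFlow version the gate parameters would be variables learned jointly with the experts.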
