How to implement multivariate linear stochastic gradient descent algorithm in tensorflow?


Question

I started with a simple implementation of single-variable linear gradient descent, but I don't know how to extend it to a multivariate stochastic gradient descent algorithm.

Single variable linear regression

import tensorflow as tf
import numpy as np

# create random data
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.5

# Find values for W that compute y_data = W * x_data 
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
y = W * x_data

# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

# Before starting, initialize the variables
init = tf.global_variables_initializer()

# Launch the graph.
sess = tf.Session()
sess.run(init)

# Fit the line.
for step in range(2001):
    sess.run(train)
    if step % 200 == 0:
        print(step, sess.run(W))

Answer

There are two parts to your question:

  • How to change this problem to a higher dimensional space.
  • How to change from batch gradient descent to stochastic gradient descent.

To get a higher dimensional setting, you can define your linear problem as y = <x, w>. Then you just need to change the dimension of your Variable W to match that of w and replace the multiplication W*x_data by the matrix product tf.matmul(x_data, W), which computes the inner product <x, W> for each sample; your code should then run just fine.
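In concrete terms, only two lines of the single-variable code need to change. A minimal sketch, assuming x_data now has shape (N, d) so that each row is one sample:

# one weight per feature instead of a single scalar
W = tf.Variable(tf.random_uniform([d, 1], -1.0, 1.0))
# row-wise inner products <x, W>; the result has shape (N, 1)
y = tf.matmul(x_data, W)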

To change the learning method to stochastic gradient descent, you need to abstract the input of your cost function using tf.placeholder.
Once you have defined X and y_ to hold your input at each step, you can construct the same cost function. Then you call your training step, feeding it the proper mini-batch of your data.
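In skeleton form, the feeding mechanism looks as follows (x_batch and y_batch stand for one hypothetical slice of your data; the complete example below fills in the rest of the graph):

# placeholders decouple the graph from any fixed dataset;
# None lets the batch size vary between calls
X = tf.placeholder(tf.float32, shape=[None, d])
y_ = tf.placeholder(tf.float32, shape=[None, 1])

# each training step is fed one mini-batch through feed_dict
sess.run(train, feed_dict={X: x_batch, y_: y_batch})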

Here is a full example of how you could implement this behavior; it should show that W quickly converges to w.

import tensorflow as tf
import numpy as np

# Define dimensions
d = 10     # Size of the parameter space
N = 1000   # Number of data sample

# create random data
w = .5*np.ones(d)
x_data = np.random.random((N, d)).astype(np.float32)
y_data = x_data.dot(w).reshape((-1, 1))

# Define placeholders to feed mini_batches
X = tf.placeholder(tf.float32, shape=[None, d], name='X')
y_ = tf.placeholder(tf.float32, shape=[None, 1], name='y')

# Find values for W that compute y_data = <x, W>
W = tf.Variable(tf.random_uniform([d, 1], -1.0, 1.0))
y = tf.matmul(X, W, name='y_pred')

# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(y_ - y))
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

# Before starting, initialize the variables
init = tf.global_variables_initializer()

# Launch the graph.
sess = tf.Session()
sess.run(init)

# Fit the line.
mini_batch_size = 100
n_batch = N // mini_batch_size + (N % mini_batch_size != 0)
for step in range(2001):
    i_batch = (step % n_batch)*mini_batch_size
    batch = x_data[i_batch:i_batch+mini_batch_size], y_data[i_batch:i_batch+mini_batch_size]
    sess.run(train, feed_dict={X: batch[0], y_: batch[1]})
    if step % 200 == 0:
        print(step, sess.run(W))

Two remarks:

  • The implementation above is called mini-batch gradient descent because at each step the gradient is computed using a subset of the data of size mini_batch_size. This is a variant of stochastic gradient descent that is usually used to stabilize the gradient estimate at each step. Stochastic gradient descent proper is obtained by setting mini_batch_size = 1.

  • The dataset can be shuffled at every epoch to bring the implementation closer to the theoretical analysis (see the sketch after this list). Some recent work also considers making only a single pass through the dataset, as it prevents over-fitting. For a more mathematical and detailed explanation, see Bottou12. This can easily be changed according to your problem setup and the statistical properties you are looking for.
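A minimal sketch of per-epoch shuffling, reusing x_data, y_data, N, mini_batch_size and the graph defined above; n_epochs is an arbitrary illustrative value, and setting mini_batch_size = 1 recovers plain stochastic gradient descent:

n_epochs = 20  # illustrative value
for epoch in range(n_epochs):
    # draw a fresh random ordering of the sample indices each epoch
    perm = np.random.permutation(N)
    for i in range(0, N, mini_batch_size):
        idx = perm[i:i + mini_batch_size]
        sess.run(train, feed_dict={X: x_data[idx], y_: y_data[idx]})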
