Linear regression with TensorFlow


Problem description

I'm trying to understand linear regression... Here is the script I'm trying to understand:

'''
A linear regression learning algorithm example using TensorFlow library.
Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/
'''

from __future__ import print_function

import tensorflow as tf
import numpy
import matplotlib.pyplot as plt
rng = numpy.random

# Parameters
learning_rate = 0.0001
training_epochs = 1000
display_step = 50

# Training Data
train_X = numpy.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
                         7.042,10.791,5.313,7.997,5.654,9.27,3.1])
train_Y = numpy.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
                         2.827,3.465,1.65,2.904,2.42,2.94,1.3])

train_X=numpy.asarray(train_X)
train_Y=numpy.asarray(train_Y)
n_samples = train_X.shape[0]


# tf Graph Input
X = tf.placeholder("float")
Y = tf.placeholder("float")

# Set model weights
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")

# Construct a linear model
pred = tf.add(tf.multiply(X, W), b)


# Mean squared error
cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Fit all training data
    for epoch in range(training_epochs):
        for (x, y) in zip(train_X, train_Y):
            sess.run(optimizer, feed_dict={X: x, Y: y})

        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            c = sess.run(cost, feed_dict={X: train_X, Y:train_Y})
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c), \
                "W=", sess.run(W), "b=", sess.run(b))

    print("Optimization Finished!")
    training_cost = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
    print("Training cost=", training_cost, "W=", sess.run(W), "b=", sess.run(b), '\n')

    # Graphic display
    plt.plot(train_X, train_Y, 'ro', label='Original data')
    plt.plot(train_X, sess.run(W) * train_X + sess.run(b), label='Fitted line')
    plt.legend()
    plt.show()

My question is: what does this part represent?

# Set model weights
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")

And why are there random float numbers?

Also, could you show me some math that formally represents the cost, pred and optimizer variables?

Answer

Let's try to put together some intuition and sources along with the tf approach.

Regression as presented here is a supervised learning problem. In it, as defined in Russell & Norvig's Artificial Intelligence, the task is:

given a training set (X, y) of m input-output pairs (x1, y1), (x2, y2), ..., (xm, ym), where each output was generated by an unknown function y = f(x), discover a function h that approximates the true function f

To that end, the hypothesis function h combines each x in some way with the to-be-learned parameters, so that its output is as close as possible to the corresponding y, and this across the whole dataset. The hope is that the resulting function will be close to f.

But how are these parameters learned? To be able to learn, the model has to be able to evaluate itself. This is where the cost function (also called loss, energy, merit, ...) comes into play: it is a metric function that compares the output of h with the corresponding y and penalizes big differences.

Now it should be clear what the "learning" process actually is here: altering the parameters in order to achieve a lower value of the cost function.

The example you posted performs a parametric linear regression, optimized with gradient descent using the mean squared error as the cost function. Which means:

  • Parametric: the set of parameters is fixed. They are held in the exact same memory placeholders throughout the learning process.

  • Linear: the output of h is merely a linear (actually, affine) combination of the input x and your parameters. So if x and w are real-valued vectors of the same dimensionality and b is a real number, it holds that h(x, w, b) = w.transposed()*x + b (this is written out with the script's own variable names right after this list). Page 107 of the Deep Learning Book brings more quality insights and intuitions on that.

  • Cost function: now this is the interesting part. The mean squared error is a convex function. This means it has a single, global optimum and, furthermore, that this optimum can be found directly with the set of normal equations (also explained in the DLB). In your example, the stochastic (and/or minibatch) gradient descent method is used instead: this is the preferred method when optimizing non-convex cost functions (which is the case in more advanced models like neural networks) or when the dataset has a huge dimensionality (also explained in the DLB).

  • Gradient descent: tf deals with this for you, so it is enough to say that GD minimizes the cost function by following its derivative "downwards", in small steps, until it reaches a stationary point. If you really need to know, the exact technique applied by TF is called automatic differentiation, a kind of compromise between the numeric and symbolic approaches. For convex functions like yours this point will be the global optimum, and (provided your learning rate is not too big) it will always converge to it, so it doesn't matter which values you initialize your Variables with. Random initialization only becomes necessary in more complex architectures like neural networks. There is some extra code regarding the management of minibatches, but I won't get into that because it is not the main focus of your question.
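
To address the request for formulas directly, here is the math behind the script's pred, cost and optimizer nodes, written in LaTeX (a sketch using the script's own names, with n = n_samples and \alpha = learning_rate):

\mathrm{pred}(x) = W x + b

J(W, b) = \frac{1}{2n} \sum_{i=1}^{n} \left( W x_i + b - y_i \right)^2

\frac{\partial J}{\partial W} = \frac{1}{n} \sum_{i=1}^{n} \left( W x_i + b - y_i \right) x_i,
\qquad
\frac{\partial J}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} \left( W x_i + b - y_i \right)

W \leftarrow W - \alpha \frac{\partial J}{\partial W},
\qquad
b \leftarrow b - \alpha \frac{\partial J}{\partial b}

The optimizer node simply applies the last two update rules over and over. Note that the script feeds a single (x_i, y_i) pair per step (stochastic GD), so each step uses the single-sample version of these gradients rather than the full sums; because J is convex in (W, b), both variants keep lowering the cost until they settle near the global minimum.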

Deep Learning frameworks nowadays are about nesting lots of functions by building computational graphs (you may want to take a look at the presentation on DL frameworks that I did some weeks ago). For constructing and running the graph, TensorFlow follows a declarative style, which means that the graph has to be completely defined and compiled before it is deployed and executed. It is highly recommended to read this short wiki article on declarative programming, if you haven't yet. In this context, the setup is split into two parts:

  1. First, you define your computational Graph, where you put your dataset and parameters into memory placeholders, define the hypothesis and cost functions built on them, and tell tf which optimization technique to apply.

  2. Then you run the computation in a Session, and the library is able to (re)load the data into the placeholders and perform the optimization (the minimal sketch after this list illustrates the two phases).
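
As a minimal sketch of this two-phase, declarative style (assuming the TF 1.x API used throughout this post; the tiny graph below is illustrative and not part of the original answer):

import tensorflow as tf

# Phase 1: build the graph; nothing is computed yet
x = tf.placeholder(tf.float32, name="x")   # value supplied at run time
w = tf.Variable(2.0, name="w")             # parameter that an optimizer could update
y = w * x + 1.0                            # a computation defined on top of them

# Phase 2: run the graph inside a Session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, feed_dict={x: 3.0}))  # feeds the placeholder and evaluates y -> 7.0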

The code:

The code of the example follows this approach closely:

  1. Define the training data train_X and the labels train_Y, and prepare placeholders X and Y in the Graph for them (these are fed in through the feed_dict argument).

  2. Define W and b for the parameters. They have to be Variables (not placeholders), because they will be updated during the Session.

  3. Define pred (our hypothesis) and cost, as explained before.


From this, the rest of the code should be clearer. Regarding the optimizer, as I said, tf already knows how to deal with it, but you may want to look into gradient descent for more details (again, the DLB is a pretty good reference for that).
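
If it helps to see what the optimizer is doing under the hood, here is a plain NumPy rewrite of the same fit (my illustration, not code from the original answer). It applies the update rules given above on the full batch, whereas the TF script feeds one (x, y) pair per step; both converge for this convex cost:

import numpy as np

train_X = np.asarray([3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59, 2.167,
                      7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1])
train_Y = np.asarray([1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221,
                      2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3])
n = train_X.shape[0]

W, b = np.random.randn(), np.random.randn()  # random starting point, as in the script
learning_rate = 0.0001

for epoch in range(1000):
    pred = W * train_X + b                   # hypothesis h(x) = W*x + b
    error = pred - train_Y
    cost = (error ** 2).sum() / (2 * n)      # same cost as in the TF script
    dW = (error * train_X).sum() / n         # dJ/dW
    db = error.sum() / n                     # dJ/db
    W -= learning_rate * dW                  # gradient descent update
    b -= learning_rate * db

print("W =", W, "b =", b, "final cost =", cost)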

Cheers! Andres

This small snippet generates simple multi-dimensional datasets and tests both approaches. Notice that the normal-equations approach doesn't require looping and brings better results. For small dimensionality (DIMENSIONS < 30k) it is probably the preferred approach:

from __future__ import absolute_import, division, print_function
import numpy as np
import tensorflow as tf

####################################################################################################
### GLOBALS
####################################################################################################
DIMENSIONS = 5
f = lambda x: sum(x) # the "true" function: f = 0 + 1*x1 + 1*x2 + 1*x3 ...
noise = lambda: np.random.normal(0,10) # some noise

####################################################################################################
### GRADIENT DESCENT APPROACH
####################################################################################################
# dataset globals
DS_SIZE = 5000
TRAIN_RATIO = 0.6 # 60% of the dataset is used for training
_train_size = int(DS_SIZE*TRAIN_RATIO)
_test_size = DS_SIZE - _train_size
ALPHA = 1e-8 # learning rate
LAMBDA = 0.5 # L2 regularization factor
TRAINING_STEPS = 1000

# generate the dataset, the labels and split into train/test
ds = [[np.random.rand()*1000 for d in range(DIMENSIONS)] for _ in range(DS_SIZE)] # synthesize data
# ds = normalize_data(ds)
ds = [(x, [f(x)+noise()]) for x in ds] # add labels
np.random.shuffle(ds)
train_data, train_labels = zip(*ds[0:_train_size])
test_data, test_labels = zip(*ds[_train_size:])

# define the computational graph
graph = tf.Graph()
with graph.as_default():
  # declare graph inputs
  x_train = tf.placeholder(tf.float32, shape=(_train_size, DIMENSIONS))
  y_train = tf.placeholder(tf.float32, shape=(_train_size, 1))
  x_test = tf.placeholder(tf.float32, shape=(_test_size, DIMENSIONS))
  y_test = tf.placeholder(tf.float32, shape=(_test_size, 1))
  theta = tf.Variable([[0.0] for _ in range(DIMENSIONS)])
  theta_0 = tf.Variable([[0.0]]) # don't forget the bias term!
  # forward propagation
  train_prediction = tf.matmul(x_train, theta)+theta_0
  test_prediction  = tf.matmul(x_test, theta) +theta_0
  # cost function and optimizer
  train_cost = (tf.nn.l2_loss(train_prediction - y_train)+LAMBDA*tf.nn.l2_loss(theta))/float(_train_size)
  optimizer = tf.train.GradientDescentOptimizer(ALPHA).minimize(train_cost)
  # test results
  test_cost = (tf.nn.l2_loss(test_prediction - y_test)+LAMBDA*tf.nn.l2_loss(theta))/float(_test_size)

# run the computation
with tf.Session(graph=graph) as s:
  tf.global_variables_initializer().run()
  print("initialized"); print(theta.eval())
  for step in range(TRAINING_STEPS):
    _, train_c, test_c = s.run([optimizer, train_cost, test_cost],
                               feed_dict={x_train: train_data, y_train: train_labels,
                                          x_test: test_data, y_test: test_labels })
    if (step%100==0):
      # it should return bias close to zero and parameters all close to 1 (see definition of f)
      print("\nAfter", step, "iterations:")
      #print("   Bias =", theta_0.eval(), ", Weights = ", theta.eval())
      print("   train cost =", train_c); print("   test cost =", test_c)
  PARAMETERS_GRADDESC = tf.concat([theta_0, theta], 0).eval()
  print("Solution for parameters:\n", PARAMETERS_GRADDESC)

####################################################################################################
### NORMAL EQUATIONS APPROACH
####################################################################################################
# dataset globals
DIMENSIONS = 5
DS_SIZE = 5000
TRAIN_RATIO = 0.6 # 60% of the dataset is used for training
_train_size = int(DS_SIZE*TRAIN_RATIO)
_test_size = DS_SIZE - _train_size
f = lambda x: sum(x) # the "true" function: f = 0 + 1*x1 + 1*x2 + 1*x3 ...
noise = lambda: np.random.normal(0,10) # some noise
# training globals
LAMBDA = 1e6 # L2 regularization factor

# generate the dataset, the labels and split into train/test
ds = [[np.random.rand()*1000 for d in range(DIMENSIONS)] for _ in range(DS_SIZE)]
ds = [([1]+x, [f(x)+noise()]) for x in ds] # add x[0]=1 dimension and labels
np.random.shuffle(ds)
train_data, train_labels = zip(*ds[0:_train_size])
test_data, test_labels = zip(*ds[_train_size:])

# define the computational graph
graph = tf.Graph()
with graph.as_default():
  # declare graph inputs
  x_train = tf.placeholder(tf.float32, shape=(_train_size, DIMENSIONS+1))
  y_train = tf.placeholder(tf.float32, shape=(_train_size, 1))
  theta = tf.Variable([[0.0] for _ in range(DIMENSIONS+1)]) # implicit bias!
  # optimum
  optimum = tf.matrix_solve_ls(x_train, y_train, LAMBDA, fast=True)

# run the computation: no loop needed!
with tf.Session(graph=graph) as s:
  tf.global_variables_initializer().run()
  print("initialized")
  opt = s.run(optimum, feed_dict={x_train:train_data, y_train:train_labels})
  PARAMETERS_NORMEQ = opt
  print("Solution for parameters:\n",PARAMETERS_NORMEQ)

####################################################################################################
### PREDICTION AND ERROR RATE
####################################################################################################

# generate test dataset
ds = [[np.random.rand()*1000 for d in range(DIMENSIONS)] for _ in range(DS_SIZE)]
ds = [([1]+x, [f(x)+noise()]) for x in ds] # add x[0]=1 dimension and labels
test_data, test_labels = zip(*ds)
# define hypothesis
h_gd = lambda x: PARAMETERS_GRADDESC.T.dot(x)
h_ne = lambda x: PARAMETERS_NORMEQ.T.dot(x)
# define cost
mse = lambda pred, lab: ((pred-np.array(lab))**2).sum()/DS_SIZE
# make predictions!
predictions_gd = np.array([h_gd(x) for x in test_data])
predictions_ne = np.array([h_ne(x) for x in test_data])
# calculate and print total error
cost_gd = mse(predictions_gd, test_labels)
cost_ne = mse(predictions_ne, test_labels)
print("total cost with gradient descent:", cost_gd)
print("total cost with normal equations:", cost_ne)
