Cost Function and Gradient Seem to be Working, but scipy.optimize functions are not

Problem Description

I'm working through my Matlab code for the Andrew Ng Coursera course and turning it into Python. I am working on non-regularized logistic regression, and after writing my gradient and cost functions I needed something similar to fminunc. After some googling, I found a couple of options. They both return the same results, but those results do not match what is in Andrew Ng's expected-results code. Others seem to be getting this to work correctly, but I'm wondering why my specific code does not return the desired result when using the scipy.optimize functions, even though the cost and gradient pieces earlier in the code do.

The data I'm using can be found at the link below:

ex2data1

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as op


#Machine Learning Online Class - Exercise 2: Logistic Regression

#Load Data
#The first two columns contain the exam scores and the third column contains the label.

data = pd.read_csv('ex2data1.txt', header = None)
X = np.array(data.iloc[:, 0:2]) #100 x 2 (the intercept column is added in Part 2)
y = np.array(data.iloc[:,2])
y.shape = (len(y), 1) #100 x 1 column vector


#Creating sub-dataframes for plotting
pos_plot = data[data[2] == 1]
neg_plot = data[data[2] == 0]


#==================== Part 1: Plotting ====================
#We start the exercise by first plotting the data to understand
#the problem we are working with.

print('Plotting data with + indicating (y = 1) examples and o indicating (y = 0) examples.')

plt.plot(pos_plot[0], pos_plot[1], "+", label = "Admitted")
plt.plot(neg_plot[0], neg_plot[1], "o", label = "Not Admitted")
plt.xlabel('Exam 1 score')
plt.ylabel('Exam 2 score')
plt.legend()
plt.show()


def sigmoid(z):
    '''
    SIGMOID Compute sigmoid function
    g = SIGMOID(z) computes the sigmoid of z.
    Instructions: Compute the sigmoid of each value of z (z can be a matrix,
    vector or scalar).
    '''
    g = 1 / (1 + np.exp(-z))
    return g


def costFunction(theta, X, y):
    '''
    COSTFUNCTION Compute cost and gradient for logistic regression
    J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
    parameter for logistic regression and the gradient of the cost
    w.r.t. the parameters.
    '''
    m = len(y) #number of training examples

    h = sigmoid(X.dot(theta)) #logistic regression hypothesis
    J = (1/m) * np.sum((-y*np.log(h)) - ((1-y)*np.log(1-h)))

    #h and y are both 100x1 column vectors, so the elementwise terms form a
    #100x1 vector whose entries we sum to get the scalar logistic regression cost
    return J

def gradient(theta, X, y):
    m = len(y)
    grad = np.zeros(theta.shape)
    h = sigmoid(X.dot(theta))
    for i in range(len(theta)): #one pass per parameter in theta
        XT = X[:,i]
        XT.shape = (len(X),1) #reshape the i-th feature column to 100 x 1
        grad[i] = (1/m) * np.sum((h-y)*XT) #partial derivative w.r.t. theta[i]
    return grad
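
#For reference: when theta (and therefore h) is a 100 x 1 column vector, this
#loop is equivalent to the single vectorized expression (1/m) * X.T.dot(h - y)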


#============ Part 2: Compute Cost and Gradient ============
#In this part of the exercise, you will implement the cost and gradient
#for logistic regression. You need to complete the code in costFunction.m


#Add intercept term to X
Bias = np.ones((len(X), 1))
X = np.column_stack((Bias, X))


#Initialize fitting parameters
initial_theta = np.zeros((len(X[0]), 1))


#Compute and display initial cost and gradient
cost = costFunction(initial_theta, X, y)
grad = gradient(initial_theta, X, y)

print('Cost at initial theta (zeros): %f' % cost)
print('Expected cost (approx): 0.693\n')
print('Gradient at initial theta (zeros):')
print(grad)
print('Expected gradients (approx):\n -0.1000\n -12.0092\n -11.2628')


#Compute and display cost and gradient with non-zero theta
test_theta = np.array([[-24], [0.2], [0.2]])
cost = costFunction(test_theta, X, y)
grad = gradient(test_theta, X, y)

print('\nCost at test theta: %f' % cost)
print('Expected cost (approx): 0.218\n')
print('Gradient at test theta:')
print(grad)
print('Expected gradients (approx):\n 0.043\n 2.566\n 2.647\n')


result = op.fmin_tnc(func = costFunction, x0 = initial_theta, fprime = gradient, args = (X,y))
result[1]
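
#Note: scipy.optimize.fmin_tnc returns the tuple (x, nfeval, rc), so the fitted
#parameters live in result[0]; result[1] is only the number of function evaluations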


Result = op.minimize(fun = costFunction,
                     x0 = initial_theta,
                     args = (X, y),
                     method = 'TNC',
                     jac = gradient,
                     options = {'gtol': 1e-3, 'disp': True, 'maxiter': 1000})


theta = Result.x
theta

test = np.array([[1, 45, 85]]) 
prob = sigmoid(test.dot(theta))
print('For a student with scores 45 and 85, we predict an admission probability of %f,' % prob)
print('Expected value: 0.775 +/- 0.002\n')

Answer

This was a very difficult problem to debug, and illustrates a poorly documented aspect of the scipy.optimize interface. The documentation vaguely indicates that theta will be passed around as a vector:

Minimization of scalar function of one or more variables.

In general, the optimization problems are of the form:

minimize f(x) subject to

g_i(x) >= 0,  i = 1,...,m
h_j(x)  = 0,  j = 1,...,p 

where x is a vector of one or more variables.

What's important is that they really mean vector in the most primitive sense, a 1-dimensional array. So you have to expect that whenever theta is passed into one of your callbacks, it will be passed in as a 1-d array. But in numpy, 1-d arrays sometimes behave differently from 2-d row arrays (and, obviously, from 2-d column arrays).
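
Here is a minimal sketch of that behavioral difference (h and y are illustrative zero arrays, sized like the ones in this exercise): subtracting a 1-d array from a 2-d column array triggers numpy broadcasting, so the result is silently a 100 x 100 matrix rather than a 100 x 1 vector, and np.sum then adds up 10,000 entries without raising any error.

import numpy as np

h = np.zeros(100)       #1-d array, shape (100,): what X.dot(theta) yields once theta is 1-d
y = np.zeros((100, 1))  #2-d column array, shape (100, 1): how y is stored in the question

print((h - y).shape)    #(100, 100) -- broadcasting builds a matrix, not a vector
print(np.sum(h - y))    #sums all 10,000 entries, quietly corrupting the cost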

I don't know exactly why it's causing a problem in your case, but it's easily fixed regardless. You just have to add the following at the top of both your cost function and your gradient function:

theta = theta.reshape(-1, 1)                                           

This guarantees that theta will be a 2-d column array, as expected. Once you've done this, the results are correct.
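
Applied to the question's code, the fix looks like the sketch below; only the reshape line at the top of each function is new, the rest is the original logic:

def costFunction(theta, X, y):
    theta = theta.reshape(-1, 1) #coerce the 1-d theta scipy passes in back to a column
    m = len(y)
    h = sigmoid(X.dot(theta))
    J = (1/m) * np.sum((-y*np.log(h)) - ((1-y)*np.log(1-h)))
    return J

def gradient(theta, X, y):
    theta = theta.reshape(-1, 1) #same coercion here
    m = len(y)
    grad = np.zeros(theta.shape)
    h = sigmoid(X.dot(theta))
    for i in range(len(theta)):
        XT = X[:,i]
        XT.shape = (len(X),1)
        grad[i] = (1/m) * np.sum((h-y)*XT)
    return grad

With these two changes, the op.minimize call in the question (method = 'TNC', jac = gradient) should converge, and the 45/85 test point should fall in the expected 0.775 +/- 0.002 range.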
