Logistic regression in Julia using Optim.jl

Problem description

I'm trying to implement a simple regularized logistic regression algorithm in Julia. I'd like to use the Optim.jl library to minimize my cost function, but I can't get it to work.

My cost function and gradient are as follows:

function cost(X, y, theta, lambda)
    m = length(y)
    h = sigmoid(X * theta)
    reg = (lambda / (2*m)) * sum(theta[2:end].^2)
    J = (1/m) * sum( (-y).*log(h) - (1-y).*log(1-h) ) + reg
    return J
end

function grad(X, y, theta, lambda, gradient)
    m = length(y)
    h = sigmoid(X * theta)
    # gradient = zeros(size(theta))
    gradient = (1/m) * X' * (h - y)
    gradient[2:end] = gradient[2:end] + (lambda/m) * theta[2:end]
    return gradient
end

(Where theta is a vector of parameters for the hypothesis function and lambda is a regularization parameter.)
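
For reference, written out, the cost and gradient that the code above implements are (the sums over j start at 2 because the intercept term theta_1 is not regularized):

J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\bigl[-y_i \log h_i - (1 - y_i)\log(1 - h_i)\bigr] + \frac{\lambda}{2m}\sum_{j=2}^{n}\theta_j^2, \qquad h = \operatorname{sigmoid}(X\theta)

\frac{\partial J}{\partial \theta_1} = \frac{1}{m}\bigl[X^\top(h - y)\bigr]_1, \qquad \frac{\partial J}{\partial \theta_j} = \frac{1}{m}\bigl[X^\top(h - y)\bigr]_j + \frac{\lambda}{m}\theta_j \quad (j \ge 2)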

Then, following the instructions given at https://github.com/JuliaOpt/Optim.jl, I try to call the optimization function like this:

# these are wrapper functions I define so I can pass them as arguments:
c(theta::Vector) = cost(X, y, theta, lambda)
g!(theta::Vector, gradient::Vector) = grad(X, y, theta, lambda, gradient)

# then I do
optimize(c,some_initial_theta) 
# or maybe
optimize(c,g!,initial_theta,method = :l_bfgs) # try a different algorithm

In both cases it says that it fails to converge, and the output looks kind of awkward:

julia> optimize(c,initial_theta)
Results of Optimization Algorithm
 * Algorithm: Nelder-Mead
 * Starting Point: [0.0,0.0,0.0,0.0,0.0]
 * Minimum: [1.7787162051775145,3.4584135105727145,-6.659680628594007,4.776952006060713,1.5034743945407143]
 * Value of Function at Minimum: -Inf
 * Iterations: 1000
 * Convergence: false
   * |x - x'| < NaN: false
   * |f(x) - f(x')| / |f(x)| < 1.0e-08: false
   * |g(x)| < NaN: false
   * Exceeded Maximum Number of Iterations: true
 * Objective Function Calls: 1013
 * Gradient Call: 0

julia> optimize(c,g!,initial_theta,method = :l_bfgs)
Results of Optimization Algorithm
 * Algorithm: L-BFGS
 * Starting Point: [0.0,0.0,0.0,0.0,0.0]
 * Minimum: [-6.7055e-320,-2.235e-320,-6.7055e-320,-2.244e-320,-6.339759952602652e-7]
 * Value of Function at Minimum: 0.693148
 * Iterations: 1
 * Convergence: false
   * |x - x'| < 1.0e-32: false
   * |f(x) - f(x')| / |f(x)| < 1.0e-08: false
   * |g(x)| < 1.0e-08: false
   * Exceeded Maximum Number of Iterations: false
 * Objective Function Calls: 75
 * Gradient Call: 75

Question

Is my method (from my first code listing) incorrect? Or am I misusing Optim.jl functions? Either way, what is the proper way to define and minimize the cost function here?

It's my first time using Julia, and I'm probably doing something terribly wrong, but I can't tell what exactly. Any help will be appreciated!

X and y are the training set: X is a 90x5 matrix and y is a 90x1 vector (my training set is taken from Iris; I don't think that matters).
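
For anyone who wants to reproduce the calls above on a current Julia version, a minimal self-contained setup could look like the sketch below. The synthetic data, the seed, and the coefficient values are only illustrative stand-ins for my Iris subset:

using Optim, Random

Random.seed!(42)                           # arbitrary seed, for reproducibility only
m, d = 90, 4                               # 90 examples, 4 raw features
X = [ones(m) randn(m, d)]                  # intercept column + features -> 90x5
true_theta = [0.5, 1.0, -2.0, 1.5, 0.0]    # made-up coefficients
p = 1.0 ./ (1.0 .+ exp.(-X * true_theta))  # class-1 probabilities
y = Float64.(rand(m) .< p)                 # 0/1 labels
lambda = 1.0                               # arbitrary regularization strength
initial_theta = zeros(size(X, 2))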

Answer

Below you will find my cost and gradient computation functions for logistic regression, using closures and currying (a version for those who are used to a function that returns both the cost and the gradient):

function cost_gradient(θ, X, y, λ)
    m = length(y)
    return (θ::Array) -> begin 
        h = sigmoid(X * θ)   
        J = (1 / m) * sum(-y .* log(h) .- (1 - y) .* log(1 - h)) + λ / (2 * m) * sum(θ[2:end] .^ 2)         
    end, (θ::Array, storage::Array) -> begin  
        h = sigmoid(X * θ) 
        storage[:] = (1 / m) * (X' * (h .- y)) + (λ / m) * [0; θ[2:end]]        
    end
end

Sigmoid function implementation:

sigmoid(z) = 1.0 ./ (1.0 + exp(-z))
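
The code above targets an older Julia release, where arithmetic operators and functions like log and exp broadcast over arrays automatically. On Julia 1.x the same closures need explicit dots, and, as far as I can tell, recent Optim.jl releases expect the in-place gradient with the storage array first, g!(storage, θ). A sketch of the updated version:

function cost_gradient(θ, X, y, λ)
    m = length(y)
    cost = θ -> begin
        h = sigmoid.(X * θ)                               # elementwise sigmoid
        (1 / m) * sum(-y .* log.(h) .- (1 .- y) .* log.(1 .- h)) +
            λ / (2 * m) * sum(θ[2:end] .^ 2)
    end
    gradient! = (storage, θ) -> begin                     # storage first on recent Optim.jl
        h = sigmoid.(X * θ)
        storage[:] = (1 / m) * (X' * (h .- y)) .+ (λ / m) * [0; θ[2:end]]
    end
    return cost, gradient!
end

sigmoid(z) = 1.0 / (1.0 + exp(-z))                        # scalar; broadcast with sigmoid.(z)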

To apply cost_gradient in Optim.jl do the following:

using Optim
# ...
# Prerequisites:
# X size is (m,d), where d is the number of training-set features
# y size is (m,1)
# λ is the regularization parameter, e.g. 1.5
# ITERATIONS is the number of iterations, e.g. 1000
X = [ones(size(X,1)) X]         # add the x_0 = 1.0 column; now X size is (m,d+1)
initialθ = zeros(size(X,2),1)   # initialθ size is (d+1,1)
cost, gradient! = cost_gradient(initialθ, X, y, λ)
res = optimize(cost, gradient!, initialθ, method = ConjugateGradient(), iterations = ITERATIONS);
θ = Optim.minimizer(res);
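
If the keyword form above errors on your Optim.jl version, recent releases take the method positionally and the iteration limit through Optim.Options; the result object can then be inspected with Optim.converged, Optim.minimum, and Optim.minimizer. A sketch, assuming the cost, gradient!, and initialθ defined above:

res = optimize(cost, gradient!, initialθ, ConjugateGradient(),
               Optim.Options(iterations = ITERATIONS))
Optim.converged(res)      # true if a convergence criterion was met
Optim.minimum(res)        # cost value at the solution
θ = Optim.minimizer(res)  # fitted parameter vector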

Now, you can easily predict (e.g. training set validation):

predictions = sigmoid(X * θ) #X size is (m,d+1)
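
To turn those probabilities into hard class labels and get a rough training-set accuracy (0.5 is the usual cutoff, but the threshold is a modelling choice, not something Optim.jl gives you):

predicted_labels = predictions .>= 0.5                        # Bool per example
accuracy = sum(predicted_labels .== (y .>= 0.5)) / length(y)  # fraction classified correctly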

Either try my approach or compare it with your implementation.
