Discrepancies between R optim vs Scipy optimize: Nelder-Mead


Problem description


I wrote a script that I believe should produce the same results in Python and R, but they are producing very different answers. Each attempts to fit a model to simulated data by minimizing deviance using Nelder-Mead. Overall, optim in R is performing much better. Am I doing something wrong? Are the algorithms implemented in R and SciPy different?

Python result:

>>> res = minimize(choiceProbDev, sparams, (stim, dflt, dat, N), method='Nelder-Mead')

 final_simplex: (array([[-0.21483287, -1.        , -0.4645897 , -4.65108495],
       [-0.21483909, -1.        , -0.4645915 , -4.65114839],
       [-0.21485426, -1.        , -0.46457789, -4.65107337],
       [-0.21483727, -1.        , -0.46459331, -4.65115965],
       [-0.21484398, -1.        , -0.46457725, -4.65099805]]), array([107.46037865, 107.46037868, 107.4603787 , 107.46037875,
       107.46037875]))
           fun: 107.4603786452194
       message: 'Optimization terminated successfully.'
          nfev: 349
           nit: 197
        status: 0
       success: True
             x: array([-0.21483287, -1.        , -0.4645897 , -4.65108495])

R result:

> res <- optim(sparams, choiceProbDev, stim=stim, dflt=dflt, dat=dat, N=N,
             method="Nelder-Mead")

$par
[1] 0.2641022 1.0000000 0.2086496 3.6688737

$value
[1] 110.4249

$counts
function gradient 
     329       NA 

$convergence
[1] 0

$message
NULL

I've checked over my code, and as far as I can tell the discrepancy appears to be due to some difference between optim and minimize, because the function I'm trying to minimize (i.e., choiceProbDev) behaves identically in both languages (besides comparing the final output, I've also checked that each step within the function is equivalent). See for example:

Python choiceProbDev:

>>> choiceProbDev(np.array([0.5, 0.5, 0.5, 3]), stim, dflt, dat, N)
143.31438613033876

R choiceProbDev:

> choiceProbDev(c(0.5, 0.5, 0.5, 3), stim, dflt, dat, N)
[1] 143.3144

I've also tried to play around with the tolerance levels for each optimization function, but I'm not entirely sure how the tolerance arguments match up between the two. Either way, my fiddling so far hasn't brought the two into agreement. Here is the entire code for each.

Python:

# load modules
import math
import numpy as np
from scipy.optimize import minimize
from scipy.stats import binom

# initialize values
dflt = 0.5
N = 1

# set the known parameter values for generating data
b = 0.1
w1 = 0.75
w2 = 0.25
t = 7

theta = [b, w1, w2, t]

# generate stimuli
stim = np.array(np.meshgrid(np.arange(0, 1.1, 0.1),
                            np.arange(0, 1.1, 0.1))).T.reshape(-1,2)

# starting values
sparams = [-0.5, -0.5, -0.5, 4]


# generate probability of accepting proposal
def choiceProb(stim, dflt, theta):

    utilProp = theta[0] + theta[1]*stim[:,0] + theta[2]*stim[:,1]  # proposal utility
    utilDflt = theta[1]*dflt + theta[2]*dflt  # default utility
    choiceProb = 1/(1 + np.exp(-1*theta[3]*(utilProp - utilDflt)))  # probability of choosing proposal

    return choiceProb

# calculate deviance
def choiceProbDev(theta, stim, dflt, dat, N):

    # restrict b, w1, w2 weights to between -1 and 1
    if any([x > 1 or x < -1 for x in theta[:-1]]):
        return 10000

    # initialize
    nDat = dat.shape[0]
    dev = np.array([np.nan]*nDat)

    # for each trial, calculate deviance
    p = choiceProb(stim, dflt, theta)
    lk = binom.pmf(dat, N, p)

    for i in range(nDat):
        if math.isclose(lk[i], 0):
            dev[i] = 10000
        else:
            dev[i] = -2*np.log(lk[i])

    return np.sum(dev)


# simulate data
probs = choiceProb(stim, dflt, theta)

# randomly generated data based on the calculated probabilities
# dat = np.random.binomial(1, probs, probs.shape[0])
dat = np.array([0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1,
       0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1,
       0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1,
       0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
       0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

# fit model
res = minimize(choiceProbDev, sparams, (stim, dflt, dat, N), method='Nelder-Mead')

R:

library(tidyverse)

# initialize values
dflt <- 0.5
N <- 1

# set the known parameter values for generating data
b <- 0.1
w1 <- 0.75
w2 <- 0.25
t <- 7

theta <- c(b, w1, w2, t)

# generate stimuli
stim <- expand.grid(seq(0, 1, 0.1),
                    seq(0, 1, 0.1)) %>%
  dplyr::arrange(Var1, Var2)

# starting values
sparams <- c(-0.5, -0.5, -0.5, 4)

# generate probability of accepting proposal
choiceProb <- function(stim, dflt, theta){
  utilProp <- theta[1] + theta[2]*stim[,1] + theta[3]*stim[,2]  # proposal utility
  utilDflt <- theta[2]*dflt + theta[3]*dflt  # default utility
  choiceProb <- 1/(1 + exp(-1*theta[4]*(utilProp - utilDflt)))  # probability of choosing proposal
  return(choiceProb)
}

# calculate deviance
choiceProbDev <- function(theta, stim, dflt, dat, N){
  # restrict b, w1, w2 weights to between -1 and 1
  if (any(theta[1:3] > 1 | theta[1:3] < -1)){
    return(10000)
  }

  # initialize
  nDat <- length(dat)
  dev <- rep(NA, nDat)

  # for each trial, calculate deviance
  p <- choiceProb(stim, dflt, theta)
  lk <- dbinom(dat, N, p)

  for (i in 1:nDat){
    if (dplyr::near(lk[i], 0)){
      dev[i] <- 10000
    } else {
      dev[i] <- -2*log(lk[i])
    }
  }
  return(sum(dev))
}

# simulate data
probs <- choiceProb(stim, dflt, theta)

# same data as in python script
dat <- c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1,
         0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1,
         0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1,
         0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
         0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)

# fit model
res <- optim(sparams, choiceProbDev, stim=stim, dflt=dflt, dat=dat, N=N,
             method="Nelder-Mead")

UPDATE:

After printing the estimates at each iteration, it now appears to me that the discrepancy might stem from differences in 'step sizes' that each algorithm takes. Scipy appears to take smaller steps than optim (and in a different initial direction). I haven't figured out how to adjust this.

Python:

>>> res = minimize(choiceProbDev, sparams, (stim, dflt, dat, N), method='Nelder-Mead')
[-0.5 -0.5 -0.5  4. ]
[-0.525 -0.5   -0.5    4.   ]
[-0.5   -0.525 -0.5    4.   ]
[-0.5   -0.5   -0.525  4.   ]
[-0.5 -0.5 -0.5  4.2]
[-0.5125 -0.5125 -0.5125  3.8   ]
...

R:

> res <- optim(sparams, choiceProbDev, stim=stim, dflt=dflt, dat=dat, N=N, method="Nelder-Mead")
[1] -0.5 -0.5 -0.5  4.0
[1] -0.1 -0.5 -0.5  4.0
[1] -0.5 -0.1 -0.5  4.0
[1] -0.5 -0.5 -0.1  4.0
[1] -0.5 -0.5 -0.5  4.4
[1] -0.3 -0.3 -0.3  3.6
...
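
(One lever I have not yet tried that might control this: SciPy's Nelder-Mead accepts an initial_simplex option, so the starting simplex could in principle be enlarged to take steps closer to what optim appears to use. A minimal, untested sketch, assuming the objects from the Python script above, with the 0.4 perturbation copied straight from the R trace:)

import numpy as np
from scipy.optimize import minimize

# build a custom starting simplex: x0 plus one vertex per parameter,
# each perturbed by 0.4 (the step optim appears to take above)
x0 = np.array(sparams, dtype=float)
step = 0.4  # illustrative; chosen only to mimic optim's trace
init_simplex = np.vstack([x0] + [x0 + step * np.eye(len(x0))[i] for i in range(len(x0))])

res = minimize(choiceProbDev, x0, (stim, dflt, dat, N),
               method='Nelder-Mead',
               options={'initial_simplex': init_simplex})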

Solution

'Nelder-Mead' has always been a problematic optimization method, and its implementation in optim is not up to date. We will try three other implementations available in R packages.

To avoid passing the other parameters every time, let's define the function fn as

fn <- function(theta)
        choiceProbDev(theta, stim=stim, dflt=dflt, dat=dat, N=N)
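
(For the Python side, the analogous wrapper could be written with functools.partial; a convenience sketch, assuming the choiceProbDev and data objects from the question:)

from functools import partial

# fix the extra arguments so the objective depends only on theta,
# mirroring the R wrapper fn above
fn_py = partial(choiceProbDev, stim=stim, dflt=dflt, dat=dat, N=N)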

Then the solvers dfoptim::nmk(), adagio::neldermead(), and pracma::anms() will all return the same minimum value, 105.7843, but at different parameter positions, for instance:

dfoptim::nmk(sparams, fn)
## $par
## [1] 0.1274937 0.6671353 0.1919542 8.1731618
## $value
## [1] 105.7843

These are genuine local minima, while, for example, the Python solution 107.46038 at c(-0.21483287, -1.0, -0.4645897, -4.65108495) is not. The data in your problem are evidently not sufficient to fit the model.

You might try a global optimizer to find a global optimum within certain bounds. To me, it looks like all of the local minima have the same minimum value.
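
On the Python side, one bounded global optimizer that could be tried is scipy.optimize.differential_evolution. A minimal sketch, assuming the objects from the question's Python script; the [-1, 1] boxes on the first three parameters match the restriction inside choiceProbDev, while the range for the last parameter is only a guess:

from scipy.optimize import differential_evolution

# box bounds: b, w1, w2 in [-1, 1] as enforced by choiceProbDev;
# the range for the temperature-like parameter t is an assumption
bounds = [(-1, 1), (-1, 1), (-1, 1), (-20, 20)]

res_global = differential_evolution(choiceProbDev, bounds,
                                    args=(stim, dflt, dat, N),
                                    seed=1)
print(res_global.x, res_global.fun)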
