ValueError: A value in x_new is below the interpolation range


Problem description


This is a scikit-learn error that I get when I do

my_estimator = LassoLarsCV(fit_intercept=False, normalize=False, positive=True, max_n_alphas=1e5)

Note that if I decrease max_n_alphas from 1e5 down to 1e4 I do not get this error any more.

Anyone has an idea on what's going on?

The error happens when I call

my_estimator.fit(x, y)

I have 40k data points in 40 dimensions.

The full stack trace looks like this

  File "/usr/lib64/python2.7/site-packages/sklearn/linear_model/least_angle.py", line 1113, in fit
    axis=0)(all_alphas)
  File "/usr/lib64/python2.7/site-packages/scipy/interpolate/polyint.py", line 79, in __call__
    y = self._evaluate(x)
  File "/usr/lib64/python2.7/site-packages/scipy/interpolate/interpolate.py", line 498, in _evaluate
    out_of_bounds = self._check_bounds(x_new)
  File "/usr/lib64/python2.7/site-packages/scipy/interpolate/interpolate.py", line 525, in _check_bounds
    raise ValueError("A value in x_new is below the interpolation "
ValueError: A value in x_new is below the interpolation range.
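For context, judging from the traceback the exception itself comes from SciPy: the `fit` in `least_angle.py` interpolates over the alphas with `scipy.interpolate.interp1d`, which by default refuses to evaluate points outside the range it was built on. A minimal sketch of that behavior (the data here is made up):

```python
import numpy as np
from scipy.interpolate import interp1d

# interp1d raises bounds errors by default: querying a point below
# the smallest x reproduces the ValueError from the traceback.
x = np.array([1.0, 2.0, 3.0])
f = interp1d(x, x ** 2)
try:
    f(0.5)  # 0.5 is below the interpolation range [1.0, 3.0]
except ValueError as e:
    print(e)
```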

Solution

There must be something particular to your data. LassoLarsCV() seems to be working correctly with this synthetic example of fairly well-behaved data:

import numpy
import sklearn.linear_model

# create 40000 x 40 sample data from linear model with a bit of noise
npoints = 40000
ndims = 40
numpy.random.seed(1)
X = numpy.random.random((npoints, ndims))
w = numpy.random.random(ndims)
y = X.dot(w) + numpy.random.random(npoints) * 0.1

clf = sklearn.linear_model.LassoLarsCV(fit_intercept=False, normalize=False, max_n_alphas=1e6)
clf.fit(X, y)

# coefficients are almost exactly recovered, this prints 0.00377
print(max(abs(clf.coef_ - w)))

# alphas actually used are 41 or ndims+1
print(clf.alphas_.shape)

This is in sklearn 0.16; I don't have the positive=True option.

I'm not sure why you would want to use a very large max_n_alphas anyway. While I don't know why 1e+4 works and 1e+5 doesn't in your case, I suspect the paths you get from max_n_alphas=ndims+1 and max_n_alphas=1e+4 would be identical for well-behaved data. Also, the optimal alpha estimated by cross-validation in clf.alpha_ is going to be identical. Check out the Lasso path using LARS example for what alpha is trying to do.
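To see why pushing max_n_alphas beyond n_features+1 buys nothing for well-behaved data, one can inspect the LARS path directly with sklearn.linear_model.lars_path. A small made-up example (the sizes here are purely illustrative):

```python
import numpy as np
from sklearn.linear_model import lars_path

# With well-conditioned data, the lasso/LARS path has at most
# n_features + 1 breakpoints, no matter how fine an alpha grid
# you later ask the CV estimator for.
np.random.seed(0)
X = np.random.random((200, 5))
y = X.dot(np.random.random(5))
alphas, active, coefs = lars_path(X, y, method='lasso')
print(len(alphas))  # at most n_features + 1 = 6
```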

Also, from the LassoLars documentation

alphas_ array, shape (n_alphas + 1,)

Maximum of covariances (in absolute value) at each iteration. n_alphas is either max_iter, n_features, or the number of nodes in the path with correlation greater than alpha, whichever is smaller.

So it makes sense that we end up with alphas_ of size ndims+1 (i.e. n_features+1) above.

P.S. Tested with sklearn 0.17.1 and positive=True as well, also tested with some positive and negative coefficients, same result: alphas_ is ndims+1 or less.
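The same check can be repeated on a current sklearn with positive=True (note: the normalize argument has since been removed from LassoLarsCV, so it is left out here; the data and sizes are again just illustrative):

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV

# Synthetic linear data with non-negative true coefficients,
# so positive=True is an appropriate constraint.
np.random.seed(1)
X = np.random.random((1000, 10))
w = np.random.random(10)
y = X.dot(w) + np.random.random(1000) * 0.1

clf = LassoLarsCV(fit_intercept=False, positive=True, max_n_alphas=1000)
clf.fit(X, y)
print(clf.alphas_.shape)  # no larger than (ndims + 1,) = (11,)
```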
