Why does scikit-learn demand different data shapes for different regressors?


Question

I always find myself reshaping my data when I'm working with sklearn, and it's irritating and makes my code ugly. Why can't the library be made to work with a variety of data shapes and interpret them appropriately? For example, to work with a linear regressor I need to do:

import numpy as np
from sklearn.linear_model import LinearRegression

x = np.random.rand(10).reshape(-1, 1)  # features, shape (10, 1)
y = np.random.rand(10).reshape(-1, 1)  # targets, shape (10, 1)
regr = LinearRegression()
regr.fit(x, y)

but if I want to use a support vector regressor, then I don't reshape the dependent variable:

import numpy as np
from sklearn.svm import SVR

x = np.random.rand(10).reshape(-1, 1)  # features, shape (10, 1)
y = np.random.rand(10)                 # targets, shape (10,)
regr = SVR()
regr.fit(x, y)

I presume there is some reason why the library is designed this way; can anyone enlighten me?

Recommended answer

When you do y = np.random.rand(10), y is a one-dimensional array of shape (10,). It doesn't matter whether you think of it as a row vector or a column vector; it's just a vector with a single dimension. Take a look at this answer and this one too to understand the philosophy behind it.

It's part of the "numpy philosophy", and sklearn depends on numpy.
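To make the point about dimensions concrete, here is a minimal sketch using only numpy. The variable names col and row are chosen for illustration and are not from the question.

```python
import numpy as np

y = np.random.rand(10)   # one-dimensional: a single axis of length 10
col = y.reshape(-1, 1)   # two-dimensional column: 10 rows, 1 column
row = y.reshape(1, -1)   # two-dimensional row: 1 row, 10 columns

print(y.shape, col.shape, row.shape)  # (10,) (10, 1) (1, 10)

# Transposing a 1-D array leaves it unchanged, because there is no
# second axis to swap with; only 2-D arrays have rows and columns.
print(y.T.shape)    # (10,)
print(col.T.shape)  # (1, 10)
```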

As for your comment:

why sklearn doesn't automatically understand that if I pass it something of the shape (n,) that n_samples=n and n_features=1

From the X data alone, sklearn may not be able to infer whether it means n_samples=n and n_features=1 or the other way around (n_samples=1 and n_features=n). It might be possible when y is also passed, since y could make n_samples clear.
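The ambiguity can be sketched with numpy alone. The helper to_2d below is hypothetical, written only to illustrate the idea of using y to settle n_samples; it is not part of scikit-learn and not how the library behaves.

```python
import numpy as np

x = np.arange(6.0)              # shape (6,): six samples, or six features?

as_samples = x.reshape(-1, 1)   # reading 1: 6 samples, 1 feature
as_features = x.reshape(1, -1)  # reading 2: 1 sample, 6 features


def to_2d(x, y=None):
    """Hypothetical helper: guess the 2-D layout of a 1-D x from len(y)."""
    x = np.asarray(x)
    if x.ndim != 1:
        return x                 # already has an explicit layout
    if y is None:
        raise ValueError("1-D x is ambiguous without y; reshape it explicitly")
    n_samples = np.asarray(y).shape[0]
    if x.shape[0] == n_samples:
        return x.reshape(-1, 1)  # one feature per sample
    if n_samples == 1:
        return x.reshape(1, -1)  # a single sample with many features
    raise ValueError("x and y have inconsistent lengths")
```

Even this small sketch has to raise an error when y is missing, which is the situation at prediction time, so guessing in fit would not remove the need to reshape elsewhere.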

But that would mean changing all the code that relies on these semantics, and it could break many things, because sklearn depends heavily on numpy operations.
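The semantics in question can be checked directly. The following is a minimal sketch, assuming a reasonably recent scikit-learn release; exact error messages and warnings may differ between versions. It suggests that a 1-D y is accepted by both estimators from the question, while a 1-D X is rejected rather than guessed at.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.random((10, 1))  # 2-D: 10 samples, 1 feature
y = rng.random(10)       # 1-D: one target value per sample

# Both estimators are fitted with the same 1-D y.
lin = LinearRegression().fit(X, y)
svr = SVR().fit(X, y)
pred_lin = lin.predict(X)
pred_svr = svr.predict(X)

# With a 2-D y, LinearRegression treats it as (n_samples, n_targets)
# and its predictions keep that layout.
pred_lin_2d = LinearRegression().fit(X, y.reshape(-1, 1)).predict(X)

# A 1-D X is not interpreted automatically; fit raises ValueError.
try:
    LinearRegression().fit(X.ravel(), y)
    rejected = False
except ValueError:
    rejected = True

print(pred_lin.shape, pred_svr.shape, pred_lin_2d.shape, rejected)
```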

You may also want to check the following links, where similar issues are discussed.

  • https://github.com/scikit-learn/scikit-learn/issues/4509
  • https://github.com/scikit-learn/scikit-learn/issues/4512
  • https://github.com/scikit-learn/scikit-learn/issues/4466
  • https://github.com/scikit-learn/scikit-learn/pull/5152
