Which scikit-learn tools can handle multivariate output?

Problem description

I've been fiddling with different scikit-learn tools. The supervised-learning classes all have the same general API where you call model.fit(X, y) to fit the model. With some of these (at least ExtraTreesRegressor), I can pass in a 2-dimensional array for y and it works fine. With others, it doesn't work. It doesn't usually say why it doesn't work, though: I get shape mismatch errors that suggest that it can only predict a single output dimension without actually saying so. E.g., for stochastic gradient descent:

>>> X.shape
(77946, 24)
>>> y.shape
(77946, 24)
>>> mach = sklearn.linear_model.SGDRegressor()
>>> mach.fit(X, y)
Traceback (most recent call last):
  File "<pyshell#37>", line 1, in <module>
    mach.fit(X, y)
  File "C:\FakeProgs\Python\lib\site-packages\sklearn\linear_model\stochastic_gradient.py", line 842, in fit
    sample_weight=sample_weight)
  File "C:\FakeProgs\Python\lib\site-packages\sklearn\linear_model\stochastic_gradient.py", line 811, in _fit
    coef_init, intercept_init)
  File "C:\FakeProgs\Python\lib\site-packages\sklearn\linear_model\stochastic_gradient.py", line 752, in _partial_fit
    _check_fit_data(X, y)
  File "C:\FakeProgs\Python\lib\site-packages\sklearn\linear_model\stochastic_gradient.py", line 228, in _check_fit_data
    raise ValueError("Shapes of X and y do not match.")
ValueError: Shapes of X and y do not match.

Well, yes they do match. It works if I use just one column of y, but I don't know if this means multivariate y isn't supported, or I'm just not doing it right.

Is there explicit documentation saying which scikit classes can accept a 2-dimensional y and which cannot? How can I tell if a given kind of model supports this, without just trying to guess from the error messages?

Solution

The documentation for the fit method of SGDRegressor explicitly states that the expected target has shape (n_samples,), so y must be 1-D.

If you need SGDRegressor, you can write a for loop that fits one SGDRegressor per target. Otherwise you can try Ridge, RidgeCV, ElasticNet or ElasticNetCV.
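
A minimal sketch of both suggestions, assuming a small synthetic dataset in place of the real X and y from the question (the array sizes, the random_state values and the names models, Y_pred_sgd and ridge are illustrative, not from the original post). It fits one SGDRegressor per target column, then fits a single Ridge on the 2-D target directly:

```python
import numpy as np
from sklearn.linear_model import Ridge, SGDRegressor

# Synthetic stand-in for the data in the question (sizes are arbitrary).
rng = np.random.RandomState(0)
X = rng.randn(200, 5)
Y = X @ rng.randn(5, 3) + 0.01 * rng.randn(200, 3)  # three target columns

# Option 1: one SGDRegressor per target, each fitted on a 1-D slice of Y.
models = [
    SGDRegressor(random_state=0).fit(X, Y[:, j])
    for j in range(Y.shape[1])
]

# Stack the per-target predictions into an (n_samples, n_targets) array.
Y_pred_sgd = np.column_stack([m.predict(X) for m in models])

# Option 2: Ridge accepts a 2-D y of shape (n_samples, n_targets) directly.
ridge = Ridge(alpha=1.0).fit(X, Y)
Y_pred_ridge = ridge.predict(X)

print(Y_pred_sgd.shape, Y_pred_ridge.shape)
```

Both predictions come out with the same shape as Y. Later scikit-learn releases also ship sklearn.multioutput.MultiOutputRegressor, which wraps this one-estimator-per-target pattern; whether it is available depends on the installed version.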

Edit: also if you would like to add support for multi-target to SGDRegressor please feel free to send a pull-request.
