Which scikit-learn tools can handle multivariate output?
Problem description
I've been fiddling with different scikit-learn tools. The supervised-learning classes all share the same general API, where you call model.fit(X, y) to fit the model. With some of them (at least ExtraTreesRegressor), I can pass in a 2-dimensional array for y and it works fine. With others, it doesn't work. It usually doesn't say why, though: I get shape-mismatch errors suggesting that the model can only predict a single output dimension, without actually saying so. For example, with stochastic gradient descent:
```pycon
>>> X.shape
(77946, 24)
>>> y.shape
(77946, 24)
>>> mach = sklearn.linear_model.SGDRegressor()
>>> mach.fit(X, y)
Traceback (most recent call last):
  File "<pyshell#37>", line 1, in <module>
    mach.fit(X, y)
  File "C:\FakeProgs\Python\lib\site-packages\sklearn\linear_model\stochastic_gradient.py", line 842, in fit
    sample_weight=sample_weight)
  File "C:\FakeProgs\Python\lib\site-packages\sklearn\linear_model\stochastic_gradient.py", line 811, in _fit
    coef_init, intercept_init)
  File "C:\FakeProgs\Python\lib\site-packages\sklearn\linear_model\stochastic_gradient.py", line 752, in _partial_fit
    _check_fit_data(X, y)
  File "C:\FakeProgs\Python\lib\site-packages\sklearn\linear_model\stochastic_gradient.py", line 228, in _check_fit_data
    raise ValueError("Shapes of X and y do not match.")
ValueError: Shapes of X and y do not match.
```
Well, yes, they do match. It works if I use just one column of y, but I don't know whether this means multivariate y isn't supported or I'm just not doing it right.
Is there explicit documentation saying which scikit-learn classes can accept a 2-dimensional y and which cannot? How can I tell whether a given kind of model supports this, without just guessing from the error messages?
Answer

The fit method of SGDRegressor explicitly states that the expected target has shape (n_samples,), hence 1D.
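The fit docstring is the place to look: estimators that accept several targets typically document y as (n_samples, n_targets). If you want a quick programmatic check instead, here is a minimal sketch that fits a clone of an estimator on a tiny random 2-D target and reports whether it raised ValueError. accepts_2d_target is a hypothetical helper written for this example, not a scikit-learn function, and a ValueError could in principle have another cause, so treat a False result as a hint to read the docstring rather than as proof.

```python
import warnings

import numpy as np
from sklearn.base import clone
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import ElasticNet, Ridge, SGDRegressor


def accepts_2d_target(estimator, n_targets=2):
    """Hypothetical helper: True if `estimator` can be fit on a 2-D y.

    A clone is fitted on a tiny random problem, so the estimator passed
    in is left untouched.
    """
    rng = np.random.RandomState(0)
    X = rng.normal(size=(20, 3))
    Y = rng.normal(size=(20, n_targets))  # shape (n_samples, n_targets)
    try:
        with warnings.catch_warnings():
            # Silence e.g. convergence warnings caused by the toy data.
            warnings.simplefilter("ignore")
            clone(estimator).fit(X, Y)
    except ValueError:
        # Single-output estimators reject a 2-D y with a ValueError.
        return False
    return True


for est in [SGDRegressor(), Ridge(), ElasticNet(), ExtraTreesRegressor(n_estimators=5)]:
    print(type(est).__name__, accepts_2d_target(est))
```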
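If you need to, you can wrap a for loop to fit one SGDRegressor per target. Otherwise, you can try Ridge, RidgeCV, ElasticNet or ElasticNetCV (note that in current scikit-learn releases ElasticNetCV rejects a 2-D target; its multi-output counterpart is MultiTaskElasticNetCV).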
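Both suggestions can be sketched as follows, assuming a recent scikit-learn. The explicit loop fits one SGDRegressor per target column; MultiOutputRegressor (in sklearn.multioutput, available since 0.18) wraps that same one-model-per-target loop; Ridge handles the 2-D target natively. The synthetic data from make_regression and the StandardScaler step are assumptions added for this example (SGD is sensitive to feature scaling), not part of the original answer.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, SGDRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data with three targets: X has shape (200, 5), Y has shape (200, 3).
X, Y = make_regression(n_samples=200, n_features=5, n_targets=3,
                       noise=0.1, random_state=0)

# Option 1: explicit for loop, one SGDRegressor per target column.
per_target = []
for j in range(Y.shape[1]):
    model = make_pipeline(StandardScaler(), SGDRegressor(random_state=0))
    model.fit(X, Y[:, j])  # each fit sees a 1-D target of shape (n_samples,)
    per_target.append(model)
pred_loop = np.column_stack([m.predict(X) for m in per_target])

# Option 2: MultiOutputRegressor clones the estimator once per target column.
wrapped = MultiOutputRegressor(
    make_pipeline(StandardScaler(), SGDRegressor(random_state=0))
)
wrapped.fit(X, Y)
pred_wrapped = wrapped.predict(X)

# Option 3: an estimator with native multi-output support.
ridge = Ridge(alpha=1.0).fit(X, Y)
pred_ridge = ridge.predict(X)

print(pred_loop.shape, pred_wrapped.shape, pred_ridge.shape, ridge.coef_.shape)
```

With the same random_state, the loop and MultiOutputRegressor should give matching predictions, since the wrapper fits one fresh copy of the estimator per column. Ridge instead stores all targets in a single coef_ of shape (n_targets, n_features).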
Edit: also, if you would like to add multi-target support to SGDRegressor, please feel free to send a pull request.