ValueError: continuous-multioutput is not supported

Problem Description


I want to run several regression types (Lasso, Ridge, ElasticNet and SVR) on a dataset with around 5,000 rows and 6 features, using GridSearchCV for cross-validation. The code is extensive, but here are some of the critical parts:

def splitTrainTestAdv(df):
    y = df.iloc[:, -5:]   # last 5 columns
    X = df.iloc[:, :-5]   # all except the last 5 columns

    # Scaling and sampling
    X = StandardScaler().fit_transform(X)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=0)

    return X_train, X_test, y_train, y_test

def performSVR(x_train, y_train, X_test, parameter):
    C = parameter[0]
    epsilon = parameter[1]
    kernel = parameter[2]

    model = svm.SVR(C=C, epsilon=epsilon, kernel=kernel)
    model.fit(x_train, y_train)

    return model.predict(X_test)  # prediction for the test set

def performRidge(X_train, y_train, X_test, parameter):
    alpha = parameter[0]

    model = linear_model.Ridge(alpha=alpha, normalize=True)
    model.fit(X_train, y_train)

    return model.predict(X_test)  # prediction for the test set

MODELS = {
    'lasso': (
        linear_model.Lasso(),
        {'alpha': [0.95]}
    ),
    'ridge': (
        linear_model.Ridge(),
        {'alpha': [0.01]}
    ),
}


def performParameterSelection(model_name, feature, X_test, y_test, X_train, y_train):
    print("# Tuning hyper-parameters for %s" % feature)
    print()

    model, param_grid = MODELS[model_name]
    gs = GridSearchCV(model, param_grid, n_jobs=1, cv=5, verbose=1,
                      scoring='%s_weighted' % feature)

    gs.fit(X_train, y_train)

    print("Best parameters set found on development set:")

    print(gs.best_params_)
    print()
    print("Grid scores on development set:")
    print()
    for params, mean_score, scores in gs.grid_scores_:
        print("%0.3f (+/-%0.03f) for %r"
          % (mean_score, scores.std() * 2, params))

    print("Detailed classification report:")
    print()
    print("The model is trained on the full development set.")
    print("The scores are computed on the full evaluation set.")

    y_true, y_pred = y_test, gs.predict(X_test)
    print(classification_report(y_true, y_pred))

soil = pd.read_csv('C:/training.csv', index_col=0)
soil = getDummiedSoilDepth(soil)
np.random.seed(2015)
soil = shuffleData(soil)
soil = soil.drop('Depth', 1)

X_train, X_test, y_train, y_test = splitTrainTestAdv(soil)


scores = ['precision', 'recall']

for score in scores:
    for model in MODELS.keys():
        print '####################'
        print model, score
        print '####################'
        performParameterSelection(model, score, X_test, y_test, X_train, y_train)

You can assume that all required imports are done

I am getting this error and do not know why:

ValueError                                Traceback (most recent call last)

in ()
     18         print model, score
     19         print '####################'
---> 20         performParameterSelection(model, score, X_test, y_test, X_train, y_train)
     21

<ipython-input-27-304555776e21> in performParameterSelection(model_name,  feature, X_test, y_test, X_train, y_train)
     12     # cv=5 - constant; verbose - keep writing
     13 
---> 14     gs.fit(X_train, y_train) # Will get grid scores with outputs from ALL models described above
     15 
     16         #pprint(sorted(gs.grid_scores_, key=lambda x: -x.mean_validation_score))

C:\Users\Tony\Anaconda\lib\site-packages\sklearn\grid_search.pyc in fit(self, X, y)

C:\Users\Tony\Anaconda\lib\site-packages\sklearn\metrics\classification.pyc in _check_targets(y_true, y_pred)
     90     if (y_type not in ["binary", "multiclass", "multilabel-indicator",
     91                        "multilabel-sequences"]):
---> 92         raise ValueError("{0} is not supported".format(y_type))
     93 
     94     if y_type in ["binary", "multiclass"]:

ValueError: continuous-multioutput is not supported

I am still very new to Python and this error puzzles me. Surely it should not be because I have 6 features. I tried to follow the standard built-in functions.

Please, help

Solution

First, let's replicate the problem.

Import the libraries needed:

import numpy as np
import pandas as pd 
from sklearn.preprocessing import StandardScaler
from sklearn.cross_validation import train_test_split
from sklearn import linear_model
from sklearn.grid_search import GridSearchCV

Then create some data:

df = pd.DataFrame(np.random.rand(5000,11))
y = df.iloc[:,-5:]  # last 5 columns
X = df.iloc[:,:-5]  # Except for last 5 columns
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=0)

Now we can replicate the error, and also see an option that does not trigger it.

This runs OK:

gs = GridSearchCV(linear_model.Lasso(), {'alpha': [0.95]}, n_jobs= 1, cv=5, verbose=1)
print gs.fit(X_train, y_train) 

This does not:

gs = GridSearchCV(linear_model.Lasso(), {'alpha': [0.95]}, n_jobs= 1, cv=5, verbose=1, scoring='recall')
gs.fit(X_train, y_train) 

and indeed the error is exactly the one you got above: 'continuous-multioutput is not supported'.

If you think about the recall measure, it is defined for binary or categorical data, for which we can count things like false positives. At least in my replication of your data, the targets are continuous, so recall is simply not defined. If you use the default score it works, as you can see above.
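
To make the contrast concrete, here is a minimal sketch (my own illustration on the toy data above, not part of the original question): once the target is turned into binary labels and a classifier is tuned instead of a regressor, scoring='recall' becomes a valid choice. The median threshold and LogisticRegression are arbitrary, purely illustrative choices.

# Sketch only: threshold one target column at its median to get 0/1 labels,
# then tune a classifier; 'recall' is now a defined, supported score.
y_col = np.asarray(y_train)[:, 0]
y_binary = (y_col > np.median(y_col)).astype(int)
gs = GridSearchCV(linear_model.LogisticRegression(), {'C': [0.1, 1.0]},
                  n_jobs=1, cv=5, verbose=1, scoring='recall')
gs.fit(X_train, y_binary)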

So you probably need to look at your predictions and understand why they are continuous (i.e. decide whether you should be using a classifier instead of regression), or use a different score that is defined for regression, as sketched below.
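
If you go the "different score" route, here is a minimal sketch (my assumption of a reasonable choice, reusing the toy data above) that passes a scorer defined for continuous targets, such as 'r2'; whether a given regression scorer accepts multi-output targets can depend on your scikit-learn version.

# Sketch only: 'r2' is a regression scorer, so the continuous predictions are
# no longer checked against the classification target types.
gs = GridSearchCV(linear_model.Lasso(), {'alpha': [0.5, 0.95]},
                  n_jobs=1, cv=5, verbose=1, scoring='r2')
gs.fit(X_train, y_train)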

As an aside, if you run the regression with only one column of y values, you still get an error. This time it says, more simply, 'continuous is not supported', i.e. the problem is using recall (or precision) on continuous data, whether or not it is multi-output.
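
For completeness, a minimal sketch of that single-column case (again on the toy data above): the target type is now plain 'continuous' rather than 'continuous-multioutput', but scoring='recall' is still expected to raise a ValueError.

# Sketch only: keep a single target column; with scoring='recall' this should
# still fail, because the targets are continuous values, not class labels.
y_single = np.asarray(y_train)[:, 0]
gs = GridSearchCV(linear_model.Lasso(), {'alpha': [0.95]},
                  n_jobs=1, cv=5, verbose=1, scoring='recall')
gs.fit(X_train, y_single)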
