ValueError: continuous is not supported


Problem description


I am using GridSearchCV for cross-validation of a linear regression (not a classifier nor a logistic regression).

I also use StandardScaler for normalization of X.

My dataframe has 17 features (X) and 5 targets (y), with around 1150 rows (observations).

I keep getting the ValueError: continuous is not supported error message and have run out of options.

Here is some code (assume all imports are done properly):

soilM = pd.read_csv('C:/training.csv', index_col=0)
soilM = getDummiedSoilDepth(soilM) #transform text values in 0 and 1

soilM = soilM.drop('Depth', 1) 

soil = soilM.iloc[:,-22:]

X_train, X_test, Ca_train, Ca_test, P_train, P_test, pH_train, pH_test, SOC_train, SOC_test, Sand_train, Sand_test = splitTrainTestAdv(soil)

scores = ['precision', 'recall']


for score in scores:

    for model in MODELS.keys():

        print model, score

        performParameterSelection(model, score, X_test, Ca_test, X_train, Ca_train)

def performParameterSelection(model_name, criteria, X_test, y_test, X_train, y_train):

    model, param_grid = MODELS[model_name]
    gs = GridSearchCV(model, param_grid, n_jobs= 1, cv=5, verbose=1, scoring='%s_weighted' % criteria)

    gs.fit(X_train, y_train) 

    print(gs.best_params_)

    for params, mean_score, scores in gs.grid_scores_:
        print("%0.3f (+/-%0.03f) for %r"
          % (mean_score, scores.std() * 2, params))


    y_true, y_pred = y_test, gs.predict(X_test)
    print(classification_report(y_true, y_pred))


MODELS = {
    'lasso': (
        linear_model.Lasso(),
        {'alpha': [0.95]}
    ),
    'ridge': (
        linear_model.Ridge(),
        {'alpha': [0.01]}
    ),
    'elasticnet': (
        linear_model.ElasticNet(),
        {
            'alpha': [0.6],
            'l1_ratio': [0.4]
        }
    ),
    'svr': (
        svm.SVR(),
        {
            'C': [5.0],
            'epsilon': [0.1],
            'kernel': ['linear']
        }
    )
 }


def performLasso(X_train, y_train, X_test, parameter):

    alpha = parameter[0]

    model = linear_model.Lasso(alpha=alpha, normalize=True)  # pass alpha to Lasso
    model.fit(X_train, y_train)

    return model.predict(X_test)

def splitTrainTestAdv(df):


    y = df.iloc[:,-5:].copy()  # last 5 columns
    X1 = df.iloc[:,:-5].copy()  # Except for last 5 columns

    Ca = y['Ca'].copy()
    P = y['P'].copy()
    pH = y['pH'].copy()
    SOC = y['SOC'].copy()
    Sand = y['Sand'].copy()


    #Scaling and Sampling

    X = StandardScaler(copy=False).fit_transform(X1)

    X_train, X_test, Ca_train, Ca_test = train_test_split(X, Ca, test_size=0.2, random_state=0)


    return X_train, X_test, Ca_train, Ca_test, P_train, P_test, pH_train, pH_test, SOC_train, SOC_test, Sand_train, Sand_test

These are the main pieces of the code.

This is the main part of the error output:

ValueError                                Traceback (most recent call last)
<ipython-input-90-1315d47e2551> in <module>()
     20         print '####################'
     21         print featuresV[1]
---> 22         performParameterSelection(model, score, X_test, Ca_test,  X_train, Ca_train)
     23         print featuresV[2]
     24         performParameterSelection(model, score, X_test, P_test, X_train, P_train)

<ipython-input-41-7075e1a49412> in performParameterSelection(model_name, criteria, X_test, y_test, X_train, y_train)
     12     # cv=5 - constant; verbose - keep writing
     13 
---> 14     gs.fit(X_train, y_train) # Will get grid scores with outputs from ALL models described above
     15 
     16         #pprint(sorted(gs.grid_scores_, key=lambda x: -x.mean_validation_score))

C:\Users\Tony\Anaconda\lib\site-packages\sklearn\grid_search.pyc in fit(self, X, y)
    730 
    731         """
--> 732         return self._fit(X, y, ParameterGrid(self.param_grid))



     90     if (y_type not in ["binary", "multiclass", "multilabel-indicator",
     91                        "multilabel-sequences"]):
---> 92         raise ValueError("{0} is not supported".format(y_type))
     93 
     94     if y_type in ["binary", "multiclass"]:

 ValueError: continuous is not supported

Here is some data after using soil.head(15). It does not show all the columns, but it should behave the same way with 8 features instead of 17. As for the targets: these are the last 5 columns, but the code here computes only one of them (Ca).

    BSAN    BSAS    BSAV    CTI ELEV    EVI LSTD    LSTN    REF1    REF2    ... RELI    Subsoil Topsoil TMAP    TMFI    Ca  P   pH  SOC Sand
PIDN                                                                                    
92RkYor6    -0.405797   -0.563636   -0.806271   -0.228241   -0.691982     1.653790  -0.605889   0.627488    -0.856727   0.056586    ... -0.062181   0     1 0.896228    1.651807    -0.394962   0.031291    0.488676    -0.389042   0.630347
nPv9P04t    -0.688406   -0.709091   -0.739082   -0.189180   1.185523    0.395773    -0.381748   -0.338928   -0.774545   -0.818182   ... 2.995923    1   0   1.539208    1.618022    -0.460044   -0.366432   -0.549490   0.204798    -1.162260
oCASbXEx    -0.623188   -0.654545   -0.727884   -0.155835   0.711136    0.517493    -0.035002   -0.092554   -0.725818   -0.651206   ... -0.300034   1   0   0.286952    0.657765    0.259613    -0.407934   0.591558    -0.529688   -0.793082
xq94dGBz    -0.746377   -0.781818   -0.862262   -0.340427   0.791314    0.672741    -0.665032   -0.128613   -0.853091   -0.741187   ... -0.418960   0     1 0.276740    0.678724    -0.467854   -0.245386   -0.577548   -0.428111   -0.130845
GYSYA8Yf    -0.862319   -0.836364   -0.783875   -0.020427   4.715590    0.473032    -1.321194   -2.560069   -0.791273   -0.827458   ... 2.299354    1   0   0.583042    1.825040    1.442361    -0.328389   0.797320    -0.443738   -0.892037
G4e9Ahvi    -0.710145   -0.736364   -0.727884   -0.175122   -1.003786   0.744898    -0.678329   0.851702    -0.661818   -0.474954   ... -0.300034   1   0   1.544703    1.641861    -0.355335   -0.079380   -0.287610   -0.256209   0.287810
SHU443XO    -0.579710   -0.736364   -0.963046   -0.536744   -0.179733   1.793003    -0.914052   0.291898    -0.966545   -0.086271   ... 0.260618    0   1   1.840689    2.223996    -0.499961   0.155796    -0.886192   -0.107749   0.942435
oAeygDKu    -0.152174   -0.154545   -0.134378   1.252267    -0.796659   -0.155977   1.309391    0.642680    -0.205818   -0.341373   ... -0.537887   1   0   -0.320335   0.429981    -0.441821   -0.352598   0.339031    -0.826609   1.650344
agBvYkUI    -0.724638   -0.790909   -0.839866   0.114245    1.363697    0.726676    -1.687885   0.060034    -0.706909   -0.523191   ... 1.127081    1   0   1.254782    0.972442    -0.505456   -0.345681   -1.774712   0.071966    -1.207931
8ujcZd8d    -0.427536   -0.600000   -0.806271   -0.667808   -1.208686   2.008018    -1.276453   1.203854    -0.698182   0.224490    ... 0.107713    0   1   0.288463    0.013744    -0.362277   -0.338764   0.039740    -0.232768   0.451467
hqO5LhmQ    -0.644928   -0.690909   -0.772676   -0.195877   1.138753    0.390671    0.145537    -0.544813   -0.722909   -0.729128   ... -0.537887   0   1   0.153926    0.422784    -0.460333   -0.300721   -0.063142   -0.607825   1.208852
QsfH8CWp    -0.449275   -0.618182   -0.862262   -0.512923   -0.712027   1.537901    -0.665190   0.595265    -0.884364   -0.103896   ... -0.028203   1   0   0.896228    1.651807    -0.475953   -0.252303   -0.128612   -0.670335   0.786391
5hhEGbrX    -0.260870   -0.290909   -0.335946   -0.175122   -0.749889   0.400146    0.299908    0.567983    -0.423273   -0.244898   ... -0.520897   1   0   0.249117    0.907095    -0.142446   -0.397558   0.423206    -0.412483   -0.678903
XlJWsmdz    -0.768116   -0.800000   -0.873460   -0.737115   0.682183    1.013848    -1.013065   -0.376346   -0.837818   -0.544527   ... 1.619776    1   0   0.942437    1.482143    -0.358517   1.283256    -0.072494   -0.490620   -0.899649
FY3riRgw    -0.818841   -0.863636   -0.873460   -0.739177   1.715590    1.434402    -1.669818   -0.090647   -0.874909   -0.388683   ... 3.182807    0   1   1.254782    0.972442    -0.333063   0.020916    -0.942309   1.314342    -0.690321

15 rows × 22 columns

Solution

Your error continuous is not supported tells me you're trying to apply "something" from the classification domain to a regression problem (your targets are continuous values).
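As a minimal illustration (with made-up continuous arrays, not your soil data), handing continuous targets to a classification metric such as accuracy_score reproduces exactly this error, because sklearn detects the target type as "continuous" and refuses it:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical continuous targets and predictions (regression-style values)
y_true = np.array([0.1, 0.5, 0.9])
y_pred = np.array([0.2, 0.4, 0.8])

try:
    # accuracy_score is a classification metric; continuous y triggers the error
    accuracy_score(y_true, y_pred)
    message = ""
except ValueError as err:
    message = str(err)

print(message)
```

The same type check fires inside GridSearchCV when it evaluates a classification scorer like 'precision_weighted' against your continuous Ca values.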

Since your task is regression, at least one thing catches my eye:

 scores = ['precision', 'recall']

To start with, both have nothing to do with regression (as @zero323 pointed out in a comment to your question): they are accuracy measures for classification. Try any regression score that suits your taste from the sklearn docs, section "3.3.1.1. Common cases: predefined values".

As far as the rest of the code is concerned, I would strongly encourage you to rewrite it from scratch: a chunk for Lasso, a chunk for Ridge, a chunk for ElasticNet, and a chunk for SVM (why would you run Ridge and Lasso separately from ElasticNet, given that they are special cases of ElasticNet?). This will take you no more than 10-15 lines of code. Only after you have made sure that all of them execute, the optimal hyperparameters are found, and the desired regression metrics are computed would I attempt to optimize the code and put everything together in a loop.
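One possible shape for that rewrite, on synthetic data with illustrative parameter grids (a sketch, not your exact pipeline; MODELS is defined before the loop that uses it):

```python
from sklearn import linear_model, svm
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the soil features and one target column (e.g. Ca)
X, y = make_regression(n_samples=200, n_features=17, noise=0.5, random_state=0)

# Define MODELS before the loop that iterates over it
MODELS = {
    'elasticnet': (linear_model.ElasticNet(),
                   {'alpha': [0.1, 0.6, 1.0], 'l1_ratio': [0.2, 0.5, 0.8]}),
    'svr': (svm.SVR(),
            {'C': [1.0, 5.0], 'epsilon': [0.1], 'kernel': ['linear']}),
}

results = {}
for name, (model, param_grid) in MODELS.items():
    gs = GridSearchCV(model, param_grid, cv=5, scoring='r2')
    gs.fit(X, y)
    results[name] = gs.best_score_
    print(name, gs.best_params_, round(gs.best_score_, 3))
```

ElasticNet with l1_ratio near 0 approaches Ridge and with l1_ratio near 1 approaches Lasso, which is why a single ElasticNet grid can cover all three.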

PS:

How are these loops supposed to run:

for score in scores:
  for model in MODELS.keys():

prior to defining MODELS?
