Optimizing two estimators (dependent on each other) using Sklearn Grid Search


Question

My program runs in two stages.

In the first stage, I use sklearn's ExtraTreesClassifier together with SelectFromModel to select the most important features. Note that ExtraTreesClassifier takes many input parameters, such as n_estimators, and that SelectFromModel yields a different set of important features for different values of n_estimators. This means I can tune n_estimators to get the best features.
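To illustrate the first stage, here is a minimal sketch of how the selected feature set can change with n_estimators. The synthetic data from make_classification and the fixed random_state are assumptions for illustration only; they stand in for the real training data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in for the real data (assumption, not from the question).
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

selected = {}
for n_estimators in (10, 20, 30):
    forest = ExtraTreesClassifier(n_estimators=n_estimators, random_state=0)
    forest.fit(X, y)
    # prefit=True reuses the already fitted forest. With the default threshold,
    # features whose importance is at least the mean importance are kept.
    selector = SelectFromModel(forest, prefit=True)
    selected[n_estimators] = selector.get_support()
    print(n_estimators, "trees ->", int(selected[n_estimators].sum()), "features kept")
```

Comparing the boolean masks in selected shows whether two values of n_estimators agree on the features; whether they differ depends on the data.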

In the second stage, I train my Keras neural network on the features selected in the first stage. I use AUROC as the score for the grid search, but this AUROC is computed with the Keras-based neural network. I want to grid search over n_estimators in my ExtraTreesClassifier to optimize the AUROC of the Keras neural network. I know I have to use a Pipeline, but I am confused about how to implement both together, and I don't know where to put the Pipeline in my code. I am getting this error: TypeError: estimator should be an estimator implementing 'fit' method, <function fs at 0x0000023A12974598> was passed

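import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV, PredefinedSplit
# Keras imports as they existed in older Keras releases (the wrapper module has since moved).
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.wrappers.scikit_learn import KerasClassifier

# I concatenate the CV set and the train set so that I can select the most
# important features in both CV and train together.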

frames11 = [train_x_upsampled, cross_val_x_upsampled]
train_cv_x = pd.concat(frames11)
frames22 = [train_y_upsampled, cross_val_y_upsampled]
train_cv_y = pd.concat(frames22)


def fs(n_estimators):
  m = ExtraTreesClassifier(n_estimators=n_estimators)
  m.fit(train_cv_x,train_cv_y)
  sel = SelectFromModel(m, prefit=True)


  # The code below gets the names of the selected important features.

  feature_idx = sel.get_support()
  feature_name = train_cv_x.columns[feature_idx]
  feature_name =pd.DataFrame(feature_name)

  X_new = sel.transform(train_cv_x)
  X_new =pd.DataFrame(X_new)

  # The selected important features are now in the DataFrame X_new. In the
  # code below, I split the data into train and CV again, but this time
  # with only the selected features.

  train_selected_x = X_new.iloc[0:train_x_upsampled.shape[0], :]
  cv_selected_x = X_new.iloc[train_x_upsampled.shape[0]:train_x_upsampled.shape[0]+cross_val_x_upsampled.shape[0], :]

  train_selected_y = train_cv_y.iloc[0:train_x_upsampled.shape[0], :]
  cv_selected_y = train_cv_y.iloc[train_x_upsampled.shape[0]:train_x_upsampled.shape[0]+cross_val_x_upsampled.shape[0], :]

  train_selected_x=train_selected_x.values
  cv_selected_x=cv_selected_x.values
  train_selected_y=train_selected_y.values
  cv_selected_y=cv_selected_y.values

  # With this new data, which contains only the important features,
  # I train a neural network as below.
  def create_model():
     n_x_new=train_selected_x.shape[1]

     model = Sequential()
     model.add(Dense(n_x_new, input_dim=n_x_new, kernel_initializer='glorot_normal', activation='relu'))
     model.add(Dense(10, kernel_initializer='glorot_normal', activation='relu'))
     model.add(Dropout(0.8))

     model.add(Dense(1, kernel_initializer='glorot_normal', activation='sigmoid'))
     optimizer = keras.optimizers.Adam(lr=0.001)


     model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
     # The build function must return the compiled model.
     return model

  seed = 7
  np.random.seed(seed)

model = KerasClassifier(build_fn=create_model, epochs=20, batch_size=400, verbose=0)

n_estimators=[10,20,30]
param_grid = dict(n_estimators=n_estimators)

grid = GridSearchCV(estimator=fs, param_grid=param_grid,scoring='roc_auc',cv = PredefinedSplit(test_fold=my_test_fold), n_jobs=1)
grid_result = grid.fit(np.concatenate((train_selected_x, cv_selected_x), axis=0), np.concatenate((train_selected_y, cv_selected_y), axis=0))
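
The TypeError quoted in the question comes from passing a plain function as estimator: GridSearchCV expects an object with a fit method. A minimal sketch that reproduces this kind of failure, assuming synthetic data and a hypothetical placeholder function select_features; the exact exception message differs between scikit-learn versions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=100, n_features=5, random_state=0)


def select_features(n_estimators):
    # Hypothetical plain function standing in for fs: it has no fit method.
    return None


error = None
try:
    GridSearchCV(estimator=select_features,
                 param_grid={"n_estimators": [10, 20]},
                 cv=3).fit(X, y)
except TypeError as exc:
    # Recent releases raise a subclass of TypeError during parameter validation.
    error = exc

print(type(error).__name__)
```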

Recommended answer

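This is how I built my own custom transformer.

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

class fs(TransformerMixin, BaseEstimator):

    def __init__(self, n_estimators=10):
        self.ss = None
        self.n_estimators = n_estimators
        self.x_new = None

    def fit(self, X, y):
        # Use the constructor parameter so that a grid search over
        # n_estimators actually changes the fitted forest.
        m = ExtraTreesClassifier(n_estimators=self.n_estimators)
        m.fit(X, y)
        self.ss = SelectFromModel(m, prefit=True)
        return self

    def transform(self, X):
        # Keep only the features selected during fit.
        self.x_new = self.ss.transform(X)
        print(np.shape(self.x_new))
        return self.x_new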
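
The answer stops at the transformer itself. A minimal sketch of how such a transformer might be placed in a Pipeline and searched with GridSearchCV follows. Several things here are assumptions, not part of the answer: the data is synthetic, the class name ExtraTreesSelector is hypothetical, the train/validation split sizes are made up, and scikit-learn's MLPClassifier stands in for the Keras network so the example runs without Keras.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV, PredefinedSplit
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline


class ExtraTreesSelector(BaseEstimator, TransformerMixin):
    """Hypothetical name; same idea as the fs transformer above."""

    def __init__(self, n_estimators=10, random_state=None):
        self.n_estimators = n_estimators
        self.random_state = random_state

    def fit(self, X, y):
        forest = ExtraTreesClassifier(n_estimators=self.n_estimators,
                                      random_state=self.random_state)
        forest.fit(X, y)
        self.selector_ = SelectFromModel(forest, prefit=True)
        return self

    def transform(self, X):
        return self.selector_.transform(X)


# Synthetic stand-in data: first 300 rows play the train set, last 100 the CV set.
X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           random_state=0)
test_fold = np.r_[np.full(300, -1), np.zeros(100, dtype=int)]  # -1 = always train

pipe = Pipeline([
    ("fs", ExtraTreesSelector(random_state=0)),
    # Stand-in for the Keras model (assumption).
    ("clf", MLPClassifier(hidden_layer_sizes=(10,), max_iter=300, random_state=0)),
])

# "fs__n_estimators" routes the parameter to the step named "fs".
grid = GridSearchCV(pipe,
                    param_grid={"fs__n_estimators": [10, 20, 30]},
                    scoring="roc_auc",
                    cv=PredefinedSplit(test_fold),
                    n_jobs=1)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```

Because the selector is refit inside each split, feature selection only sees the training rows of that split. If a Keras wrapper replaces the final step, its build function has to cope with an input width that changes with every candidate; that part is not shown here. SelectFromModel is itself a transformer, so it could also be used directly as the "fs" step with the grid key fs__estimator__n_estimators.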
