Custom sklearn pipeline transformer giving "pickle.PicklingError"


Problem description

I am trying to create a custom transformer for a Python sklearn pipeline based on guidance from this tutorial: http://danielhnyk.cz/creating-your-own-estimator-scikit-learn/

Right now my custom class/transformer looks like this:

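import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestRegressor


class SelectBestPercFeats(BaseEstimator, TransformerMixin):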
    def __init__(self, model=RandomForestRegressor(), percent=0.8,
                 random_state=52):
        self.model = model
        self.percent = percent
        self.random_state = random_state


    def fit(self, X, y, **fit_params):
        """
        Find features with best predictive power for the model, and
        have cumulative importance value less than self.percent
        """
        # Check parameters
        if not isinstance(self.percent, float):
            print("SelectBestPercFeats.percent is not a float, it should be...")
        elif not isinstance(self.random_state, int):
            print("SelectBestPercFeats.random_state is not a int, it should be...")

        # If checks are good proceed with fitting...
        else:
            try:
                self.model.fit(X, y)
            except Exception:
                print("Error fitting model inside SelectBestPercFeats object")
                return self

            # Get feature importance
            try:
                feat_imp = list(self.model.feature_importances_)
                feat_imp_cum = pd.Series(feat_imp, index=X.columns) \
                    .sort_values(ascending=False).cumsum()

                # Get features whose cumulative importance is <= `percent`
                n_feats = len(feat_imp_cum[feat_imp_cum <= self.percent].index) + 1
                self.bestcolumns_ = list(feat_imp_cum.index)[:n_feats]
            except Exception:
                print("ERROR: SelectBestPercFeats can only be used with models with"
                      " a .feature_importances_ attribute")
        return self


    def transform(self, X, y=None, **fit_params):
        """
        Filter out only the important features (based on percent threshold)
        for the model supplied.

        :param X: Dataframe with features to be down selected
        """
        # bestcolumns_ only exists after a successful fit
        if getattr(self, "bestcolumns_", None) is None:
            print("Must call fit function on SelectBestPercFeats object before transforming")
        else:
            return X[self.bestcolumns_]

I am integrating this class into an sklearn pipeline like this:

from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Define feature selection and model pipeline components
rf_simp = RandomForestRegressor(criterion='mse', n_jobs=-1,
                                n_estimators=600)
bestfeat = SelectBestPercFeats(rf_simp, feat_perc)
rf = RandomForestRegressor(n_jobs=-1,
                           criterion='mse',
                           n_estimators=200,
                           max_features=0.4,
                           )

# Build Pipeline
master_model = Pipeline([('feat_sel', bestfeat), ('rf', rf)])

# define GridSearchCV parameter space to search, 
#   only listing one parameter to simplify troubleshooting
param_grid = {
    'feat_sel__percent': [0.8],  # prefix must match the pipeline step name 'feat_sel'
}

# Fit pipeline model
grid = GridSearchCV(master_model, cv=3, n_jobs=-1,
                    param_grid=param_grid)

# Search grid using CV, and get the best estimator
grid.fit(X_train, y_train)

Whenever I run the last line of code (grid.fit(X_train, y_train)) I get the following "PicklingError". Can anyone see what is causing this problem in my code?

Or is there something wrong in my Python setup? Might I be missing a package or something similar? I just checked that I can import pickle successfully.

Traceback (most recent call last):
  File "", line 5, in
  File "C:\Users\jjaaae\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\model_selection\_search.py", line 945, in fit
    return self._fit(X, y, groups, ParameterGrid(self.param_grid))
  File "C:\Users\jjaaae\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\model_selection\_search.py", line 564, in _fit
    for parameters in parameter_iterable
  File "C:\Users\jjaaae\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\externals\joblib\parallel.py", line 768, in __call__
    self.retrieve()
  File "C:\Users\jjaaae\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\externals\joblib\parallel.py", line 719, in retrieve
    raise exception
  File "C:\Users\jjaaae\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\externals\joblib\parallel.py", line 682, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Users\jjaaae\AppData\Local\Programs\Python\Python36\lib\multiprocessing\pool.py", line 608, in get
    raise self._value
  File "C:\Users\jjaaae\AppData\Local\Programs\Python\Python36\lib\multiprocessing\pool.py", line 385, in _handle_tasks
    put(task)
  File "C:\Users\jjaaae\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\externals\joblib\pool.py", line 371, in send
    CustomizablePickler(buffer, self._reducers).dump(obj)
_pickle.PicklingError: Can't pickle : attribute lookup SelectBestPercFeats on builtins failed

Recommended answer

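pickle stores a class by reference (its module and name), not by copying its code, so the custom class needs to be defined in a separate module that can be imported. With n_jobs=-1, GridSearchCV sends the pipeline to worker processes, and each of them must be able to look the class up under that name. So, create another Python file (e.g. transformation.py), move the class into it, and import it like this: from transformation import SelectBestPercFeats. That will resolve the pickling error.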
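The difference can be sketched with the standard library alone. The snippet below is only an illustration under stated assumptions: the class body is a hypothetical stand-in for the real transformer, transformation.py is written to a temporary directory so the example is self-contained, and setting `__module__` to "builtins" is an artificial way to mimic the lookup failure in the traceback, not a diagnosis of how the asker's session got into that state.

```python
import importlib
import pickle
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for transformation.py; the real file would hold the
# full SelectBestPercFeats class together with its sklearn/pandas imports.
MODULE_SOURCE = textwrap.dedent(
    '''
    class SelectBestPercFeats:
        def __init__(self, percent=0.8):
            self.percent = percent
    '''
)


def import_transformer(directory):
    """Write transformation.py into `directory` and import the class from it."""
    Path(directory, "transformation.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, str(directory))
    importlib.invalidate_caches()
    return importlib.import_module("transformation").SelectBestPercFeats


# Case 1: a class that pickle cannot find by name. Pointing __module__ at
# "builtins" is an artificial trick to trigger the failed attribute lookup.
class Unreachable:
    pass


Unreachable.__module__ = "builtins"

try:
    pickle.dumps(Unreachable())
    lookup_failed = False
except pickle.PicklingError as exc:
    lookup_failed = True
    print("PicklingError:", exc)

# Case 2: a class defined in an importable module round-trips cleanly,
# because pickle can re-import "transformation" and find the class there.
SelectBestPercFeats = import_transformer(tempfile.mkdtemp())
restored = pickle.loads(pickle.dumps(SelectBestPercFeats(percent=0.5)))
print(type(restored).__module__, restored.percent)
```

In a real project, transformation.py would simply sit next to the script or notebook that builds the pipeline, so that both the main process and the worker processes can import it.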
