Can I add outlier detection and removal to a Scikit-learn Pipeline?

Question

I want to create a Pipeline in Scikit-Learn in which one specific step is outlier detection and removal, so that the transformed data can be passed on to the other transformers and the final estimator.

I have searched SE but can't find an answer to this anywhere. Is this possible?

Recommended answer

Yes. Subclass TransformerMixin and build a custom transformer. Here is an extension of one of the existing outlier detection methods, LocalOutlierFactor:

import numpy as np
from sklearn.base import TransformerMixin
from sklearn.neighbors import LocalOutlierFactor
from sklearn.pipeline import Pipeline

class OutlierExtractor(TransformerMixin):
    def __init__(self, **kwargs):
        """
        Create a transformer to remove outliers. A threshold is set for selection
        criteria, and further arguments are passed to the LocalOutlierFactor class

        Keyword Args:
            neg_conf_val (float): The threshold for excluding samples with a lower
               negative outlier factor.

        Returns:
            object: to be used as a transformer method as part of Pipeline()
        """

        # Pop the threshold; everything else is forwarded to LocalOutlierFactor.
        self.threshold = kwargs.pop('neg_conf_val', -10.0)

        self.kwargs = kwargs

    def transform(self, X, y):
        """
        Uses LocalOutlierFactor class to subselect data based on some threshold

        Returns:
            tuple of ndarray: subsampled X and the matching entries of y

        Notes:
            X should be of shape (n_samples, n_features)
        """
        X = np.asarray(X)
        y = np.asarray(y)
        lof = LocalOutlierFactor(**self.kwargs)
        lof.fit(X)
        # Keep only samples whose negative outlier factor is above the threshold.
        mask = lof.negative_outlier_factor_ > self.threshold
        return X[mask, :], y[mask]

    def fit(self, *args, **kwargs):
        # Nothing to learn ahead of time; the LOF model is fitted in transform().
        return self

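Then create a pipeline:

pipe = Pipeline([('outliers', OutlierExtractor()), ...])

One caveat: during fit, scikit-learn's own Pipeline calls each step's transform(X) without y and expects a single array back, so a transform(X, y) that returns a filtered (X, y) pair raises a TypeError there. The class does work when its transform(X, y) is called directly.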
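As a minimal sketch of using the same idea alongside a standard Pipeline, the outlier rows can be dropped from X and y first, and the cleaned arrays then passed to the pipeline. The helper name `remove_outliers`, the toy data, and the `StandardScaler` and `LogisticRegression` steps are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import LocalOutlierFactor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def remove_outliers(X, y, threshold=-10.0, **lof_kwargs):
    """Hypothetical helper: keep rows whose negative outlier factor exceeds threshold."""
    X = np.asarray(X)
    y = np.asarray(y)
    lof = LocalOutlierFactor(**lof_kwargs)
    lof.fit(X)
    mask = lof.negative_outlier_factor_ > threshold
    return X[mask], y[mask]


# Toy data (assumed): two clusters plus one point far away from both.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 2)),
    rng.normal(5.0, 1.0, size=(50, 2)),
    [[1000.0, 1000.0]],
])
y = np.array([0] * 50 + [1] * 50 + [1])

# Filter X and y together so they stay aligned, then fit the pipeline.
X_clean, y_clean = remove_outliers(X, y, threshold=-10.0, n_neighbors=20)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])
pipe.fit(X_clean, y_clean)
```

Filtering outside the pipeline means the removal is applied only to the training data, which is usually what is wanted at prediction time. If the removal has to live inside the pipeline itself, the imbalanced-learn package offers its own Pipeline class and a FunctionSampler wrapper intended for steps that resample X and y together; that may be worth checking against its documentation.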
