Ensemble of different kinds of regressors using scikit-learn (or any other python framework)

Problem Description

I am trying to solve a regression task. I found that 3 models work nicely for different subsets of the data: LassoLARS, SVR and Gradient Tree Boosting. I noticed that when I make predictions using all 3 models and then tabulate the 'true output' against the outputs of my 3 models, each time at least one of the models is really close to the true output, though the other 2 may be relatively far away.

When I compute the minimal possible error (taking the prediction from the 'best' predictor for each test example) I get an error which is much smaller than the error of any model alone. So I thought about trying to combine the predictions from these 3 different models into some kind of ensemble. The question is, how to do this properly? All my 3 models are built and tuned using scikit-learn; does it provide some kind of method which could be used to pack models into an ensemble? The problem here is that I don't want to just average the predictions from all three models; I want to do this with weighting, where the weights should be determined based on the properties of the specific example.
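
That "minimal possible error" is an oracle bound. A quick way to compute it, assuming preds is an (n_samples, 3) array holding the three models' predictions and y_true the true outputs (both names are placeholders):

    import numpy as np

    # For each example keep the prediction closest to the truth; averaging these
    # gives the unattainable best case a perfect per-example weighting could reach.
    errors = np.abs(preds - y_true[:, None])  # shape (n_samples, 3)
    oracle_error = errors.min(axis=1).mean()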

Even if scikit-learn does not provide such functionality, it would be nice if someone knows how to properly address this task of figuring out the weighting of each model for each example in the data. I think it might be done by a separate regressor built on top of all these 3 models, which would try to output optimal weights for each of the 3 models, but I am not sure whether this is the best way of doing it.
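
For reference, newer scikit-learn releases (0.22+) do ship sklearn.ensemble.StackingRegressor, which fits a final meta-regressor on cross-validated predictions of the base models. A minimal sketch (the estimator choices and the X_train/y_train names are placeholders):

    from sklearn.ensemble import StackingRegressor, GradientBoostingRegressor
    from sklearn.linear_model import LassoLars, LinearRegression
    from sklearn.svm import SVR

    # The final_estimator is trained on out-of-fold predictions of the base
    # models, learning one fixed weight per model.
    stack = StackingRegressor(
        estimators=[
            ('lasso_lars', LassoLars()),
            ('svr', SVR()),
            ('gbr', GradientBoostingRegressor()),
        ],
        final_estimator=LinearRegression(),
    )
    stack.fit(X_train, y_train)
    predictions = stack.predict(X_test)

Note that this learns one global weight per model rather than per-example weights; the answer below explains why the out-of-fold training of the combiner matters.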

Recommended Answer

This is a known, interesting (and often painful!) problem with hierarchical predictions. The problem with training a number of predictors on the train data, and then training a higher-level predictor over them, again using the train data, has to do with the bias-variance decomposition.

Suppose you have two predictors, one essentially an overfitting version of the other; then the former will appear better than the latter on the train set. The combining predictor will favor the former for no true reason, just because it cannot distinguish overfitting from truly high-quality prediction.

The known way of dealing with this is to prepare, for each row in the train data and for each of the predictors, a prediction for that row based on a model that was not fitted on that row. For the overfitting version, e.g., this won't produce a good result for the row, on average. The combining predictor will then be able to better assess a fair model for combining the lower-level predictors.
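
In newer scikit-learn releases the same out-of-fold idea is available directly as sklearn.model_selection.cross_val_predict. A minimal sketch of a combiner trained this way (the estimator choices and X_train/y_train names are placeholders):

    import numpy as np
    from sklearn.model_selection import cross_val_predict
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.linear_model import LinearRegression

    # Each column holds predictions for rows the model was *not* fitted on,
    # so the combiner sees honest estimates of each model's quality.
    z = np.column_stack([
        cross_val_predict(GradientBoostingRegressor(), X_train, y_train, cv=5),
        cross_val_predict(LinearRegression(), X_train, y_train, cv=5),
    ])
    combiner = LinearRegression().fit(z, y_train)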

Shahar Azulay & I wrote a transformer stage for dealing with this:

import warnings

import numpy as np
import sklearn.exceptions
import sklearn.model_selection


class Stacker(object):
    """
    A transformer fitting a predictor `pred` to data in a way
        that will allow a higher-up predictor to build a model utilizing both this
        and other predictors correctly.

    The fit_transform(self, x, y) of this class will create a column matrix, in which
        each row contains the prediction of `pred` fitted on rows other than that one.
        This allows a higher-level predictor to correctly fit a model on this, and other
        column matrices obtained from other lower-level predictors.

    The fit(self, x, y) and transform(self, x_) methods will fit `pred` on all
        of `x`, and transform `x_` (which may or may not be `x`) using the fitted
        `pred`.

    Arguments:    
        pred: A lower-level predictor to stack.

        cv_fn: Function taking `x` and returning an iterable of (train, test) index
            pairs. In `fit_transform` the train and test indices will be iterated over;
            for each iteration, `pred` will be fitted to the rows of `x` and `y`
            corresponding to the train indices, and the predictions for the test
            indices will be obtained by predicting on the corresponding rows of `x`.
    """
    def __init__(self, pred,
                 cv_fn=lambda x: sklearn.model_selection.LeaveOneOut().split(x)):
        self._pred, self._cv_fn = pred, cv_fn

    def fit_transform(self, x, y):
        x_trans = self._train_transform(x, y)

        self.fit(x, y)

        return x_trans

    def fit(self, x, y):
        """
        Same signature as any sklearn transformer.
        """
        self._pred.fit(x, y)

        return self

    def transform(self, x):
        """
        Same signature as any sklearn transformer.
        """
        return self._test_transform(x)

    def _train_transform(self, x, y):
        x_trans = np.nan * np.ones((x.shape[0], 1))

        all_te = set()
        for tr, te in self._cv_fn(x):
            all_te = all_te | set(te)
            x_trans[te, 0] = self._pred.fit(x[tr, :], y[tr]).predict(x[te, :]) 
        if all_te != set(range(x.shape[0])):
            warnings.warn('Not all indices covered by Stacker', sklearn.exceptions.FitFailedWarning)

        return x_trans

    def _test_transform(self, x):
        return self._pred.predict(x)
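
On larger datasets the default leave-one-out splitting is slow; a K-fold cv_fn can be passed instead (a sketch, assuming the Stacker class above is in scope):

    from sklearn import ensemble, model_selection

    # 5-fold splits instead of the default leave-one-out
    kfold_fn = lambda x: model_selection.KFold(n_splits=5).split(x)
    stacker = Stacker(ensemble.GradientBoostingRegressor(), cv_fn=kfold_fn)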


Here is an example of the improvement for the setting described in @MaximHaytovich's answer.

First, some settings:

    import numpy as np
    from sklearn import linear_model
    from sklearn import ensemble
    from sklearn import metrics

    y = np.random.randn(100)
    x0 = (y + 0.1 * np.random.randn(100)).reshape((100, 1))
    x1 = (y + 0.1 * np.random.randn(100)).reshape((100, 1))
    x = np.zeros((100, 2))

Note that x0 and x1 are just noisy versions of y. We'll use the first 80 rows for train, and the last 20 for test.

These are the two predictors: a higher-variance gradient booster, and a linear predictor:

    g = ensemble.GradientBoostingRegressor()
    l = linear_model.LinearRegression()

Here is the methodology suggested in the answer:

    g.fit(x0[: 80, :], y[: 80])
    l.fit(x1[: 80, :], y[: 80])

    x[:, 0] = g.predict(x0)
    x[:, 1] = l.predict(x1)

    >>> metrics.r2_score(
        y[80: ],
        linear_model.LinearRegression().fit(x[: 80, :], y[: 80]).predict(x[80: , :]))
    0.940017788444

Now, with stacking:

    x[: 80, 0] = Stacker(g).fit_transform(x0[: 80, :], y[: 80])[:, 0]
    x[: 80, 1] = Stacker(l).fit_transform(x1[: 80, :], y[: 80])[:, 0]

    u = linear_model.LinearRegression().fit(x[: 80, :], y[: 80])

    x[80: , 0] = Stacker(g).fit(x0[: 80, :], y[: 80]).transform(x0[80:, :])
    x[80: , 1] = Stacker(l).fit(x1[: 80, :], y[: 80]).transform(x1[80:, :])

    >>> metrics.r2_score(
        y[80: ],
        u.predict(x[80:, :]))
    0.992196564279

The stacking prediction does better. It realizes that the gradient booster is not that great.
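
One way to see this, continuing the example above, is to inspect the weights the combiner assigned to each lower-level predictor:

    >>> u.coef_  # column 0: gradient booster, column 1: linear model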
