Combining random forest models in scikit-learn


Question

I have two fitted RandomForestClassifier models that I would like to combine into one meta model. They were trained on similar, but not identical, data. How can I do this?

rf1 #this is my first fitted RandomForestClassifier object, with 250 trees
rf2 #this is my second fitted RandomForestClassifier object, also with 250 trees

I want to create big_rf, a single 500-tree model that contains all the trees from both forests.

Recommended answer

I believe this is possible by modifying the estimators_ and n_estimators attributes on the RandomForestClassifier object. Each tree in the forest is stored as a DecisionTreeClassifier object, and the list of these trees is stored in the estimators_ attribute. So that the object stays internally consistent, it also makes sense to update n_estimators to match the new number of trees.
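To see the two attributes mentioned above, a small sketch like the following can help. It is only an illustration: the forest size and random_state are arbitrary choices, not values from the question.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A deliberately tiny forest so the attributes are easy to inspect.
rf = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)

# estimators_ holds the fitted trees; n_estimators is the requested count.
print(type(rf.estimators_))
print(len(rf.estimators_), rf.n_estimators)
print(all(isinstance(tree, DecisionTreeClassifier) for tree in rf.estimators_))
```

After fitting, the two counts agree, which is why both need to be updated together when trees are added by hand.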

The advantage of this method is that you could build a number of small forests in parallel across multiple machines and then combine them.
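A rough sketch of that workflow, under some assumptions: train_on_worker is a hypothetical job function, the "workers" run sequentially in one process here, and pickle stands in for whatever transport would move a fitted model between machines. Each worker gets its own random_state, since identical seeds on identical data would just produce duplicate trees.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

def train_on_worker(seed, X, y):
    """Hypothetical worker job: fit a small forest and return it as bytes."""
    rf = RandomForestClassifier(n_estimators=5, random_state=seed)
    rf.fit(X, y)
    return pickle.dumps(rf)

X, y = load_iris(return_X_y=True)

# In a real setup each call would run on a different machine.
payloads = [train_on_worker(seed, X, y) for seed in range(4)]

# Coordinator side: deserialize the forests and merge their trees.
forests = [pickle.loads(payload) for payload in payloads]
merged = forests[0]
for other in forests[1:]:
    merged.estimators_ += other.estimators_
merged.n_estimators = len(merged.estimators_)

print(merged.n_estimators, merged.score(X, y))
```

Unpickling assumes the same scikit-learn version on every machine; mixing versions is not something this sketch tries to handle.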

Here's an example using the iris data set:

from functools import reduce

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def generate_rf(X_train, y_train, X_test, y_test):
    # fit one small forest and report its accuracy on the held-out split
    rf = RandomForestClassifier(n_estimators=5, min_samples_leaf=3)
    rf.fit(X_train, y_train)
    print("rf score ", rf.score(X_test, y_test))
    return rf

def combine_rfs(rf_a, rf_b):
    # append rf_b's trees to rf_a (rf_a is modified in place) and keep n_estimators in sync
    rf_a.estimators_ += rf_b.estimators_
    rf_a.n_estimators = len(rf_a.estimators_)
    return rf_a

iris = load_iris()
X, y = iris.data[:, [0, 1, 2]], iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.33)
# in the line below, we create 10 random forest classifier models
rfs = [generate_rf(X_train, y_train, X_test, y_test) for _ in range(10)]
# in this step below, we combine the list of random forest models into one giant model
rf_combined = reduce(combine_rfs, rfs)
# the combined model scores better than *most* of the component models
print("rf combined score", rf_combined.score(X_test, y_test))
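Note that combine_rfs in the example above changes its first argument in place. If the original forests should stay untouched, a copying variant along these lines may be preferable. This is a sketch, not part of the original answer: combine_rfs_copy is a hypothetical helper, and the classes_ check is an added safeguard that assumes both forests must have seen the same labels (they must also have been fitted on the same feature columns, which is not checked here).

```python
from copy import deepcopy

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

def combine_rfs_copy(rf_a, rf_b):
    """Hypothetical helper: return a new forest holding copies of both inputs' trees."""
    if not np.array_equal(rf_a.classes_, rf_b.classes_):
        raise ValueError("forests were fitted on different class labels")
    combined = deepcopy(rf_a)                         # rf_a is left unchanged
    combined.estimators_ += deepcopy(rf_b.estimators_)
    combined.n_estimators = len(combined.estimators_)
    return combined

X, y = load_iris(return_X_y=True)
# Two forests trained on similar but different halves of the data.
rf_a = RandomForestClassifier(n_estimators=5, random_state=0).fit(X[::2], y[::2])
rf_b = RandomForestClassifier(n_estimators=5, random_state=1).fit(X[1::2], y[1::2])
big_rf = combine_rfs_copy(rf_a, rf_b)

# With equally sized forests, the merged probabilities should equal the
# average of the two forests' probabilities.
mean_proba = (rf_a.predict_proba(X) + rf_b.predict_proba(X)) / 2
print(np.allclose(big_rf.predict_proba(X), mean_proba))
```

The final check relies on the forest averaging its trees' class probabilities, so it is a quick way to confirm the merged model behaves like the two originals voting together.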
