Updating a Decision Tree With New Data

Question

I am new to decision trees. I am planning to build a large decision tree that I would like to update later with additional data. What is the best approach to this? Can any decision tree be updated later?

Recommended answer

Decision trees are most often trained in batch mode on all available data. That is, when you have new data, you retrain the entire tree on the old and new data combined. Since training a single tree is very fast, this is usually not a problem. If the data is too big to fit in memory, you can often get around it by subsampling the rows of the training set, since tree-based models often give good results without the full dataset.
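As a minimal sketch of the retrain-from-scratch approach, assuming scikit-learn and NumPy are available: the synthetic dataset, the `max_depth=5` setting, and the `max_rows` cap are illustrative assumptions, not part of the original answer. Everything fits in memory here, so the subsampling step only shows the mechanics.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-ins for the original data and a batch that arrives later.
X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
X_old, y_old = X[:2000], y[:2000]
X_new, y_new = X[2000:], y[2000:]

# Initial model, trained on the data available at the time.
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_old, y_old)

# "Updating" here means refitting a fresh tree on old + new rows combined.
X_all = np.vstack([X_old, X_new])
y_all = np.concatenate([y_old, y_new])

# Optional row subsampling (max_rows is an arbitrary, hypothetical cap).
rng = np.random.default_rng(0)
max_rows = 1500
idx = rng.choice(len(X_all), size=min(max_rows, len(X_all)), replace=False)

tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_all[idx], y_all[idx])
```

The old tree is simply discarded; nothing from the first fit is reused in the second.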

Note that decision trees are quite vulnerable to overfitting, so you should consider Random Forest or another ensemble method. With bagging, different trees can be trained on different subsets of the data.
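One way to act on this when new data arrives, sketched here under the assumption that scikit-learn is used: with `warm_start=True`, a later call to `fit` keeps the trees already in a `RandomForestClassifier` and trains only the additional ones on whatever data is passed in. The dataset and tree counts below are made up for illustration. This grows the ensemble batch by batch rather than updating any individual tree, and it assumes every batch contains all class labels.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-ins for the original data and a later batch.
X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
X_old, y_old = X[:2000], y[:2000]
X_new, y_new = X[2000:], y[2000:]

# First 50 trees are trained on the original data.
forest = RandomForestClassifier(n_estimators=50, warm_start=True, random_state=0)
forest.fit(X_old, y_old)
first_trees = list(forest.estimators_)

# Raise the tree count, then fit on the new batch: the existing 50 trees
# are kept and 50 more are trained on X_new only.
forest.set_params(n_estimators=100)
forest.fit(X_new, y_new)

predictions = forest.predict(X_new[:5])
```

Trees trained on older batches never see the newer data, so if the data distribution drifts, periodically retraining on everything may still be the safer choice.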

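Incremental and online learning methods for decision trees also exist. Examples include incremental variants of CART and ID3 (such as ID4 and ID5R) and the VFDT learner (Very Fast Decision Tree, also known as a Hoeffding tree), which is designed for streaming data. Note that standard CART and ID3 are themselves batch algorithms.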
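To give a feel for how an online learner such as VFDT decides when it has seen enough examples to split a node, here is a small sketch of the Hoeffding bound it relies on. The function names `hoeffding_bound` and `should_split` and the numbers used are hypothetical, chosen for illustration; a real learner also handles ties, memory limits, and per-leaf statistics.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean
    observed over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split once the observed gap between the two best attributes
    exceeds the Hoeffding bound for the n examples seen at this leaf."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Illustrative values: information gain for two classes has range 1.0.
eps_100 = hoeffding_bound(1.0, 1e-7, 100)
eps_1000 = hoeffding_bound(1.0, 1e-7, 1000)

# The same observed gap of 0.15 is not enough after 100 examples,
# but becomes enough after 1000 because the bound shrinks with n.
early = should_split(0.40, 0.25, 1.0, 1e-7, 100)
later = should_split(0.40, 0.25, 1.0, 1e-7, 1000)
```

Because the bound shrinks as more examples reach a leaf, the tree can commit to a split without storing or revisiting past data, which is what makes this family of learners suitable for updates over time.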
