Help Understanding Cross Validation and Decision Trees

Question

I've been reading up on Decision Trees and Cross Validation, and I understand both concepts. However, I'm having trouble understanding Cross Validation as it pertains to Decision Trees. Essentially, Cross Validation allows you to alternate between training and testing when your dataset is relatively small, so that you make the best use of the data when estimating your error. A very simple algorithm goes something like this:

  1. Decide how many folds you want (k)
  2. Subdivide your dataset into k folds
  3. Use k-1 folds as a training set to build a tree
  4. Use the held-out fold as a test set to estimate statistics about the error in your tree
  5. Save your results for later
  6. Repeat steps 3-5 k times, leaving out a different fold for your test set
  7. Average the errors across your iterations to predict the overall error (sketched in code below)
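
For concreteness, here is a minimal sketch of steps 1-7 in Python, assuming scikit-learn's DecisionTreeClassifier as the tree learner and its bundled iris data as a stand-in dataset (both are illustrative choices, not part of the question):

    # Sketch of the k-fold procedure above; scikit-learn and the iris
    # dataset are assumptions made for the sake of a runnable example.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import KFold
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    k = 5                                                 # step 1: pick k
    kf = KFold(n_splits=k, shuffle=True, random_state=0)  # step 2: make folds
    fold_errors = []

    for train_idx, test_idx in kf.split(X):               # repeat for each fold
        tree = DecisionTreeClassifier(random_state=0)
        tree.fit(X[train_idx], y[train_idx])              # step 3: train on k-1 folds
        error = 1.0 - tree.score(X[test_idx], y[test_idx])  # step 4: test-fold error
        fold_errors.append(error)                         # step 5: save for later

    print("estimated error: %.3f" % np.mean(fold_errors)) # step 7: average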

The problem I can't figure out is that at the end you'll have k decision trees that could all be slightly different, because they might not split the same way, etc. Which tree do you pick? One idea I had was to pick the one with minimal errors (although that doesn't make it optimal, just that it performed best on the fold it was given - maybe using stratification will help, but everything I've read says it only helps a little bit).

As I understand cross validation, the point is to compute in-node statistics that can later be used for pruning. So really each node in the tree will have statistics calculated for it based on the test set given to it. What's important are these in-node stats, but if you're averaging your error, how do you merge these stats within each node across the k trees when each tree could vary in what it chooses to split on, etc.?

What's the point of calculating the overall error across each iteration? That's not something that could be used during pruning.

Any help with this little wrinkle would be much appreciated.

Answer

"The problem I can't figure out is that at the end you'll have k decision trees that could all be slightly different, because they might not split the same way, etc. Which tree do you pick?"

The purpose of cross validation is not to help select a particular instance of the classifier (or decision tree, or whatever automatic learning application) but rather to qualify the model, i.e. to provide metrics such as the average error ratio and the deviation relative to this average, which can be useful in asserting the level of precision one can expect from the application. One of the things cross validation can help assert is whether the training data is big enough.
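
As an illustration, and assuming scikit-learn again (the answer itself names no library), the qualification step boils down to reporting a mean error and its spread across folds rather than keeping any single fitted tree:

    # Hypothetical illustration of qualifying the model: the outputs are
    # an average error ratio and its deviation, not any one fitted tree.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
    errors = 1.0 - scores  # per-fold error ratios

    print("average error ratio: %.3f" % errors.mean())
    print("deviation from the average: %.3f" % errors.std())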

With regard to selecting a particular tree, you should instead run yet another training pass on 100% of the training data available, as this will typically produce a better tree. (The downside of the cross-validation approach is that we need to divide the [typically small] amount of training data into "folds", and as you hint in the question this can lead to trees that are either overfit or underfit for particular data instances.)
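
A sketch of that recommendation, under the same scikit-learn assumption: once cross validation has qualified the model, the k per-fold trees are discarded and the tree actually deployed is trained on all of the data:

    # The k trees built during cross validation are thrown away; the final
    # tree is fit on 100% of the available training data.
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    final_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    print(final_tree.predict(X[:3]))  # e.g. predictions from the final tree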

In the case of decision trees, I'm not sure what your reference to statistics gathered in the node and used to prune the tree pertains to. Maybe a particular use of cross-validation-related techniques?...
