Machine Learning Training & Test data split method

Problem Description

I was running a random forest classification model and initially divided the data into train (80%) and test (20%). However, the predictions had too many false positives, which I think was because there was too much noise in the training data, so I decided to split the data differently. Here's how I did it.

Since I thought the high false-positive rate was due to the noise in the training data, I made the training data have an equal number of each target class. For example, if I have 10,000 rows of data and the target variable is 8,000 (0) and 2,000 (1), I made the training data a total of 4,000 rows, including 2,000 (0) and 2,000 (1), so that the training data now has a stronger signal.
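The balanced split described above can be sketched as a downsampling of the majority class. This is a minimal illustration, assuming a pandas DataFrame with a 0/1 column named `target` (the column name and function name are illustrative, not from the original post):

```python
import pandas as pd

def balanced_train_split(df, target_col="target", seed=42):
    """Downsample the majority class (0) so both classes are equally represented."""
    minority = df[df[target_col] == 1]
    majority = df[df[target_col] == 0]
    # Keep only as many majority rows as there are minority rows.
    majority_down = majority.sample(n=len(minority), random_state=seed)
    # Concatenate and shuffle the balanced result.
    return pd.concat([minority, majority_down]).sample(frac=1, random_state=seed)

# Example: 10,000 rows with 8,000 zeros and 2,000 ones -> 4,000 balanced rows.
df = pd.DataFrame({"target": [0] * 8000 + [1] * 2000})
train = balanced_train_split(df)
print(len(train), train["target"].mean())  # 4000 rows, mean 0.5
```

Note that the test set should still reflect the original class distribution, so that evaluation metrics match what the model will see in production.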

When I tried this new splitting method, it predicted much better, increasing the recall for the positive class from 14% to 70%.

I would love to hear your feedback on whether I am doing anything wrong here. I am concerned that I may be making my training data biased.

Recommended Answer

When you have an unequal number of data points in each class in the training set, the baseline (random prediction) changes.

By "noisy data," I think you mean that the number of training points for one class is much larger than for the other. This is not really called noise; it is actually bias.

For example: you have 10,000 data points in the training set, 8,000 of class 0 and 2,000 of class 1. I can predict class 0 all the time and already get 80% accuracy. This induces a bias, and the baseline for 0-1 classification will not be 50%.
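A quick numeric check of the baseline argument: with 8,000 class-0 and 2,000 class-1 points, a classifier that always predicts the majority class is already right 80% of the time without learning anything.

```python
# Majority-class baseline on an 80/20 imbalanced label set.
labels = [0] * 8000 + [1] * 2000
majority_accuracy = sum(1 for y in labels if y == 0) / len(labels)
print(majority_accuracy)  # 0.8
```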

To remove this bias, you can either intentionally balance the training set as you did, or you can change the error function by giving each class a weight inversely proportional to its number of points in the training set.
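The second option, inverse-frequency weighting, can be sketched as follows. Each class gets weight `n_samples / (n_classes * n_samples_in_class)`, which is the same formula scikit-learn applies when you pass `class_weight="balanced"` to a classifier such as `RandomForestClassifier`:

```python
from collections import Counter

# 10,000 labels: 8,000 of class 0, 2,000 of class 1.
labels = [0] * 8000 + [1] * 2000
counts = Counter(labels)
n, k = len(labels), len(counts)

# Weight each class inversely to its frequency.
weights = {c: n / (k * cnt) for c, cnt in counts.items()}
print(weights)  # {0: 0.625, 1: 2.5}
```

With these weights, each misclassified minority point costs four times as much as a misclassified majority point, so the model is no longer rewarded for simply predicting the majority class. Unlike downsampling, this keeps all 10,000 training rows.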
