How to generate a train-test-split based on a group id?

Views: 136 Published: 2020/11/21 1:00:24 python-3.x pandas machine-learning grouping train-test-split

Question

I have the following data:

import pandas as pd

df = pd.DataFrame({'Group_ID': [1, 1, 1, 2, 2, 2, 3, 4, 5, 5],
                   'Item_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Target': [0, 0, 1, 0, 1, 1, 0, 0, 0, 1]})

   Group_ID Item_id Target
0         1       1      0
1         1       2      0
2         1       3      1
3         2       4      0
4         2       5      1
5         2       6      1
6         3       7      0
7         4       8      0
8         5       9      0
9         5      10      1

I need to split the dataset into a training set and a test set based on "Group_ID", so that 80% of the data goes into the training set and 20% into the test set, with all rows of a group kept on the same side of the split.

That is, I need my training set to look something like this:

    Group_ID Item_id Target
0          1       1      0
1          1       2      0
2          1       3      1
3          2       4      0
4          2       5      1
5          2       6      1
6          3       7      0
7          4       8      0

Test set:

   Group_ID Item_id Target
8         5       9      0
9         5      10      1

What would be the simplest way to do this? As far as I know, the standard train_test_split function in sklearn does not support splitting by groups in a way that also lets me specify the size of the split (e.g. 80/20).
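One possible approach, offered here as a minimal sketch rather than the accepted answer (which is not included on this page), is scikit-learn's GroupShuffleSplit. It keeps every row of a group on the same side of the split. Note the assumption baked into it: test_size is interpreted as a share of groups, not of rows, so with 5 groups and test_size=0.2 exactly one group lands in the test set, and the row-level ratio is only approximately 80/20. The names splitter, train and test, and the random_state value, are chosen for illustration.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

df = pd.DataFrame({'Group_ID': [1, 1, 1, 2, 2, 2, 3, 4, 5, 5],
                   'Item_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Target': [0, 0, 1, 0, 1, 1, 0, 0, 0, 1]})

# test_size is the share of *groups* (not rows) assigned to the test set.
# random_state is an arbitrary seed, fixed only to make the split repeatable.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)

# split() yields positional index arrays; groups tells it which rows belong together.
train_idx, test_idx = next(splitter.split(df, groups=df['Group_ID']))

train = df.iloc[train_idx]
test = df.iloc[test_idx]

print(train)
print(test)
```

Which group ends up in the test set depends on the seed, so the result will not necessarily match the example above (group 5 in the test set). If a row-level 80/20 ratio matters more than simplicity, one option is to try several seeds and keep the split whose row proportion is closest to the target.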
