Randomly split data by criterion into training and testing data sets using R

Views: 535 · Posted: 2017/3/26 0:56:34 · Tags: r, split, dataframe, random-sample

Problem description

G'day,

I'm looking for a way to randomly split a data frame (e.g. a 90/10 split) into training and testing sets for a model, while respecting a grouping criterion.

Imagine I have a data frame like this:

> test[1:20,]
   companycode year    expenses
1           C1    1     8.47720
2           C1    2     8.45250
3           C1    3     8.46280
4           C2    1 14828.90603
5           C3    1   665.21565
6           C3    2   290.66596
7           C3    3   865.56265
8           C3    4  6785.03586
9           C3    5   312.02617
10          C3    6   760.48740
11          C3    7  1155.76758
12          C4    1  4565.78313
13          C4    2  3340.36540
14          C4    3  2656.73030
15          C4    4  1079.46098
16          C5    1    60.57039
17          C6    1  6282.48118
18          C6    2  7419.32720
19          C7    1   644.90571
20          C8    1 58332.34945

What I'm trying to do is split this data frame into a training set and a testing set using a defined splitting criterion. With the data above, I want the split to keep each company entirely in one set: no company may appear in both data frames, so data set 1 contains different companies than data set 2.

For a 90/10 split, an ideal result would look like this:

> data_90split
   companycode year    expenses
4           C2    1 14828.90603
12          C4    1  4565.78313
13          C4    2  3340.36540
14          C4    3  2656.73030
15          C4    4  1079.46098
16          C5    1    60.57039
5           C3    1   665.21565
6           C3    2   290.66596
7           C3    3   865.56265
8           C3    4  6785.03586
9           C3    5   312.02617
10          C3    6   760.48740
11          C3    7  1155.76758
17          C6    1  6282.48118
18          C6    2  7419.32720
1           C1    1     8.47720
2           C1    2     8.45250
3           C1    3     8.46280

> data_10split
   companycode year    expenses
20          C8    1 58332.34945
19          C7    1   644.90571

I hope I've made clear what I'm looking for. Thanks for your feedback.
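
For illustration, one possible way to get such a split in base R is to sample whole companies rather than individual rows. The sketch below is not taken from the original thread: the helper name `split_by_group` and its arguments are hypothetical, and the `test` data frame is rebuilt from the 20 rows printed above. It assumes the 90/10 ratio applies to the number of companies, so the share of rows in each set will only be approximately 90/10.

```r
# Rebuild the 20 example rows from the printout in the question.
test <- data.frame(
  companycode = rep(c("C1", "C2", "C3", "C4", "C5", "C6", "C7", "C8"),
                    times = c(3, 1, 7, 4, 1, 2, 1, 1)),
  year = c(1:3, 1, 1:7, 1:4, 1, 1:2, 1, 1),
  expenses = c(8.47720, 8.45250, 8.46280, 14828.90603,
               665.21565, 290.66596, 865.56265, 6785.03586,
               312.02617, 760.48740, 1155.76758,
               4565.78313, 3340.36540, 2656.73030, 1079.46098,
               60.57039, 6282.48118, 7419.32720, 644.90571, 58332.34945),
  stringsAsFactors = FALSE
)

# Hypothetical helper: assign every group (here, company) to exactly one set.
split_by_group <- function(df, group_col, train_frac = 0.9) {
  groups <- unique(df[[group_col]])
  # Number of groups for training; keep at least one group on each side
  # whenever there are two or more groups.
  n_train <- max(1, min(length(groups) - 1, round(train_frac * length(groups))))
  # Sample group indices, not rows, so no group is split across the sets.
  train_groups <- groups[sample.int(length(groups), n_train)]
  in_train <- df[[group_col]] %in% train_groups
  list(train = df[in_train, , drop = FALSE],
       test  = df[!in_train, , drop = FALSE])
}

set.seed(42)  # fixed seed only to make the example reproducible
parts <- split_by_group(test, "companycode", train_frac = 0.9)
data_90split <- parts$train
data_10split <- parts$test

# TRUE when the two sets share no company.
length(intersect(data_90split$companycode, data_10split$companycode)) == 0
```

With eight companies, this puts seven in the training set and one in the testing set. Because companies have different numbers of rows, hitting an exact 90/10 share of rows (as in the ideal split shown above) would need an extra step, for example repeating the sampling until the row proportion is close enough.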
