How to randomly split a data frame into three smaller ones with given numbers of rows
Question
Using R, I want to randomly split a data frame into three smaller data frames. The first one has 80% of the total observations. The second and the third have, respectively, 15% and 5% of the total observations. The three data frames cannot have any overlaps. Do you have any suggestions?
Recommended answer
Here is a quick function that splits a data frame into an arbitrary number of groups: one group per value you specify in the `props` parameter. It should be fairly self-explanatory.
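One step that is less obvious than the rest is the `which.adjust` correction. Rounding each group size on its own does not always produce sizes that add up to `nrow(dat)`. The sketch below repeats the function's rounding steps by hand, using the built-in `mtcars` data (32 rows, chosen here only as an example) and the default proportions:

```r
# Worked arithmetic for the rounding step, using mtcars (32 rows) as an example
n <- nrow(mtcars)              # 32
props <- c(.8, .15, .05)
props <- props / sum(props)    # normalise so the proportions sum to 1
ns <- round(n * props)         # 25.6, 4.8 and 1.6 round to 26, 5 and 2
sum(ns)                        # 33: one row too many
# Let group 1 absorb the rounding error so that the sizes add up to n
ns[1] <- n - sum(ns[-1])
ns                             # 25 5 2
```

With the default `which.adjust = 1`, the first group (also the largest one with these proportions) takes up the difference, so `split_data(mtcars)` yields groups of 25, 5 and 2 rows.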
#' Splits a data.frame into an arbitrary number of random, non-overlapping groups
#'
#' @param dat The data.frame to split into groups
#' @param props Numeric vector. What proportion of the data should
#'   go in each group?
#' @param which.adjust Numeric. Which group size should we 'fudge' to
#'   make sure that the group sizes add up to exactly nrow(dat)?
#' @return A list of data.frames, one per group
split_data <- function(dat, props = c(.8, .15, .05), which.adjust = 1){
  # Make sure proportions are non-negative (and not all zero)
  # and the adjustment group isn't larger than the number
  # of groups specified
  stopifnot(all(props >= 0), sum(props) > 0, which.adjust <= length(props))
  # Could check that the proportions sum to 1,
  # but normalising them is easier
  props <- props/sum(props)
  n <- nrow(dat)
  # How large should each group be?
  ns <- round(n * props)
  # Rounding each group separately can give sum(ns) != n
  # (e.g. 33 instead of 32 for mtcars), so force the group
  # specified in which.adjust to take whatever makes sum(ns) == n
  ns[which.adjust] <- n - sum(ns[-which.adjust])
  # If the other groups rounded up too far, the adjusted size would be negative
  stopifnot(ns[which.adjust] >= 0)
  ids <- rep(seq_along(props), ns)
  # Shuffle ids so that the groups are randomized.
  # Indexing with sample.int() avoids sample()'s special case when
  # ids is a single number (sample(2) would return a permutation of 1:2)
  which.group <- ids[sample.int(length(ids))]
  # Use a factor so that empty groups are kept as empty data.frames
  split(dat, factor(which.group, levels = seq_along(props)))
}
split_data(mtcars)             # 80% / 15% / 5%: groups of 25, 5 and 2 rows
split_data(mtcars, c(.7, .3))  # 70% / 30%: groups of 22 and 10 rows
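`split_data()` returns the data frames themselves. If the same split has to be applied to more than one object (say, a data frame and a separate vector of labels), splitting row indices instead can be handier. The following is a minimal sketch of that idea; `split_indices()` and the variable names `train`, `valid` and `test_set` are hypothetical, written only for illustration and not part of the original answer or any package. It reuses the same rounding rule and checks that the groups do not overlap:

```r
# Hypothetical variant of split_data(): returns row indices instead of data frames
split_indices <- function(n, props = c(.8, .15, .05), which.adjust = 1) {
  stopifnot(all(props >= 0), sum(props) > 0, which.adjust <= length(props))
  props <- props / sum(props)
  sizes <- round(n * props)
  # Let one group absorb the rounding error so the sizes add up to n
  sizes[which.adjust] <- n - sum(sizes[-which.adjust])
  stopifnot(all(sizes >= 0))
  labels <- rep(seq_along(props), sizes)
  # Shuffle the labels; sample.int(n) avoids the sample(x) pitfall for length-one x
  shuffled <- labels[sample.int(n)]
  # Each row position gets exactly one label, so the index sets cannot overlap
  split(seq_len(n), factor(shuffled, levels = seq_along(props)))
}

set.seed(123)  # only to make the example reproducible
idx <- split_indices(nrow(mtcars))
lengths(idx)   # 25, 5 and 2 indices
# No row index appears in more than one group
stopifnot(anyDuplicated(unlist(idx)) == 0)

train    <- mtcars[idx[[1]], ]
valid    <- mtcars[idx[[2]], ]
test_set <- mtcars[idx[[3]], ]
```

Because every row position receives exactly one shuffled label, the index sets form a partition of `1:n`: no overlaps and no rows left out, which is the requirement stated in the question.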