Taking a disproportionate sample from a dataset in R

Question

If I have a large dataset in R, how can I take a random sample of the data that takes the distribution of the original data into account, particularly when the data are skewed, only 1% of cases belong to a minority class, and I want a biased (disproportionate) sample?

Answer

The function sample(x, size, replace = FALSE, prob = NULL) draws a sample of the given size from the elements of x. The sample can be drawn with or without replacement, and each element can either have the same probability of being selected (the default, prob = NULL) or a weight supplied by the user through the prob vector. The weights do not need to sum to 1.
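To make the three modes concrete, here is a small sketch on a hypothetical character vector x (the names x, s1, s2 and s3 are illustrative only). The draws are random, so the printed values depend on the seed; only their structure is guaranteed.

```r
set.seed(1)  # make the draws reproducible

x <- c("a", "b", "c", "d", "e")

# 3 distinct elements, each equally likely (replace = FALSE, prob = NULL)
s1 <- sample(x, 3)

# 10 draws with replacement: elements may repeat
s2 <- sample(x, 10, replace = TRUE)

# weighted draws: "a" is 6 times as likely as each other element on every draw;
# the weights do not have to sum to 1
s3 <- sample(x, 10, replace = TRUE, prob = c(6, 1, 1, 1, 1))

print(s1)
print(s2)
print(s3)
```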

If you want a sample of 50 cases in which every row has the same probability of being selected, all you have to do is

n <- 50
# df is your data frame; draw n row indices uniformly at random, without replacement
smpl <- df[sample(nrow(df), n), ]
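Uniform row sampling only preserves the class distribution on average. With a 1% minority class, a sample of 50 rows has about a 60% chance (0.99^50) of containing no minority rows at all. If the sample should reproduce the original proportions exactly, one option (a technique swapped in here, proportional stratified sampling, not part of the answer above) is to sample within each class. This is a minimal sketch assuming a hypothetical data frame df with a column class; the toy data, the column names and n = 500 are all illustrative assumptions.

```r
set.seed(42)

# hypothetical toy data: 10,000 rows, about 1% in the minority class "rare"
df <- data.frame(
  id    = seq_len(10000),
  class = sample(c("common", "rare"), 10000, replace = TRUE, prob = c(0.99, 0.01))
)

n <- 500  # total sample size (illustrative assumption)

# each class gets a share of n equal to its share of the data (rounded)
shares <- prop.table(table(df$class))
k <- round(n * shares)

idx <- unlist(lapply(names(k), function(cl) {
  rows <- which(df$class == cl)
  # index via sample.int so a class with a single row is handled correctly
  rows[sample.int(length(rows), k[[cl]])]
}))
smpl <- df[idx, ]

print(table(smpl$class))
```

Because of rounding, the total can differ from n by a row; the per-class counts, however, are fixed rather than random.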

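If instead you want rows to have different chances of being selected, for example weight 0.25 for rows whose sex is M and 0.75 for rows whose sex is F, pass a weight vector through prob. When replace = FALSE, these weights are applied sequentially (each draw is proportional to the weights of the rows not yet chosen), so they act as per-draw selection weights rather than exact final inclusion probabilities. You should do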

n <- 50
# weight 0.25 for rows with sex "M", 0.75 for all other rows (here "F")
prb <- ifelse(df$sex == "M", 0.25, 0.75)
smpl <- df[sample(nrow(df), n, prob = prb), ]
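For the skewed case in the question, one way to choose prb is to make the weights inversely proportional to class size, so that every class carries the same total weight and each draw is roughly equally likely to come from either class. A fixed number of rows per class (stratified sampling with equal allocation, a different technique shown for comparison) makes the class counts exact instead. This is a sketch under stated assumptions: the toy data frame df, its columns id and class, and n = 100 are hypothetical.

```r
set.seed(7)

# hypothetical skewed data: about 1% of rows belong to the minority class "rare"
df <- data.frame(
  id    = seq_len(10000),
  class = sample(c("common", "rare"), 10000, replace = TRUE, prob = c(0.99, 0.01))
)

n <- 100  # desired sample size (illustrative assumption)

# Option 1: weights inversely proportional to class size; every class then has
# the same total weight, so the minority class is heavily oversampled
counts <- table(df$class)
w <- as.vector(1 / counts[df$class])
smpl_w <- df[sample(nrow(df), n, prob = w), ]

# Option 2: fixed number of rows per class (equal allocation)
per_class <- n / length(counts)
idx <- unlist(lapply(names(counts), function(cl) {
  rows <- which(df$class == cl)
  # index via sample.int so a class with a single row is handled correctly
  rows[sample.int(length(rows), min(per_class, length(rows)))]
}))
smpl_eq <- df[idx, ]

print(table(smpl_w$class))
print(table(smpl_eq$class))
```

Option 1 keeps the class counts random (and, without replacement, only roughly balanced); option 2 guarantees them as long as each class has at least per_class rows. Either way, estimates computed from such a sample must be reweighted if they are meant to describe the original population.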
