Random sample of rows from subset of an R dataframe


Question





Is there a good way of getting a sample of rows from part of a dataframe?

If I just have data such as

gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
age    <- c(23, 25, 27, 29, 31, 33, 35, 37)

then I can easily sample the ages of three of the Fs with

sample(age[gender == "F"], 3)

and get something like

[1] 31 35 29

but if I turn this data into a dataframe

mydf <- data.frame(gender, age) 

I cannot use the obvious

sample(mydf[mydf$gender == "F", ], 3)

though I can concoct something convoluted with an absurd number of brackets like

mydf[sample((1:nrow(mydf))[mydf$gender == "F"], 3), ]

and get what I want which is something like

  gender age
7      F  35
4      F  29
1      F  23

Is there a better way that takes me less time to work out how to write?

Solution

Your convoluted way is pretty much how to do it - I think all the answers will be variations on that theme.
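For context, here is a small sketch of why the "obvious" call fails. It rebuilds the data from the question; the names females and result are only illustrative. A data frame is a list of columns, so sample() sees two elements (the columns) rather than five rows, and asking for three of them without replacement raises an error.

```r
gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
age    <- c(23, 25, 27, 29, 31, 33, 35, 37)
mydf   <- data.frame(gender, age)

# Illustrative name: the subset of rows the question wants to sample from.
females <- mydf[mydf$gender == "F", ]

# A data frame is a list of columns, so length() counts columns, not rows.
print(length(females))  # number of columns
print(nrow(females))    # number of rows

# sample() works on the list elements (columns), so requesting 3 of them
# without replacement fails; the error message is captured instead of
# stopping the script.
result <- tryCatch(
  sample(females, 3),
  error = function(e) conditionMessage(e)
)
print(result)
```

This is why the row numbers have to be sampled first and then used to index the data frame.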

For example, I like to generate the indices of the rows where mydf$gender=="F" first:

idx <- which(mydf$gender=="F")

Then I sample from that:

mydf[ sample(idx,3), ]

The same thing in one line (although splitting it across several lines reduces the pile-up of brackets and arguably makes the code easier to understand):

mydf[ sample( which(mydf$gender=='F'), 3 ), ]

While the "wheee I'm a hacker!" part of me prefers the one-liner, the sensible part of me says that even though the two-liner is two lines, it is much more understandable - it's just your choice.
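If this comes up often, the two-liner can be wrapped in a small function. The sketch below is one possible way to do that, not part of the original answer: sample_rows_where is a hypothetical name, and the guard is an added assumption about how too few matches should be handled. It samples positions within idx rather than idx itself, because sample(x, n) treats a single number x as "sample from 1:x", which would give wrong rows when only one row matches.

```r
# Hypothetical helper (not from the original answer): return n randomly
# chosen rows of df among those where the logical vector keep is TRUE.
sample_rows_where <- function(df, keep, n) {
  idx <- which(keep)  # matching row numbers; which() also drops NA
  if (length(idx) < n) {
    stop("only ", length(idx), " matching rows, cannot sample ", n)
  }
  # Sample positions within idx, so a single match (say row 7) is not
  # misread by sample() as "sample from 1:7".
  df[idx[sample.int(length(idx), n)], , drop = FALSE]
}

mydf <- data.frame(
  gender = c("F", "M", "M", "F", "F", "M", "F", "F"),
  age    = c(23, 25, 27, 29, 31, 33, 35, 37)
)

# Three random rows among the Fs.
picked <- sample_rows_where(mydf, mydf$gender == "F", 3)
print(picked)

# Edge case: exactly one matching row should come back as that row.
one <- sample_rows_where(mydf, mydf$age == 35, 1)
print(one)
```

The call site then reads close to the original wish, sample(mydf[mydf$gender == "F", ], 3), while the brackets stay inside the function.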

