Random sample of rows from subset of an R dataframe

Posted: 2017/3/26 0:05:43. Tags: r, dataframe, sample

Question

Is there a good way of getting a sample of rows from part of a dataframe?

If I just have data such as

gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
age    <- c(23, 25, 27, 29, 31, 33, 35, 37)

then I can easily sample the ages of three of the Fs with

sample(age[gender == "F"], 3)

and get something like

[1] 31 35 29

but if I turn this data into a dataframe

mydf <- data.frame(gender, age)

I cannot use the obvious

sample(mydf[mydf$gender == "F", ], 3)

though I can concoct something convoluted with an absurd number of brackets like

mydf[sample((1:nrow(mydf))[mydf$gender == "F"], 3), ]

and get what I want which is something like

  gender age
7      F  35
4      F  29
1      F  23

Is there a better way that takes me less time to work out how to write?

Answer

Your convoluted way is pretty much how to do it - I think all the answers will be variations on that theme.

For example, I like to generate the indices of the rows where mydf$gender=="F" first:

idx <- which(mydf$gender=="F")

Then I sample from that:

mydf[ sample(idx,3), ]
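
Put together with the data from the question, the two steps can be run end to end. This is only a minimal sketch: the call to set.seed() is an addition, included so that the draw is repeatable, and the seed value 42 is arbitrary.

```r
# Data from the question
gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
age    <- c(23, 25, 27, 29, 31, 33, 35, 37)
mydf   <- data.frame(gender, age)

set.seed(42)  # arbitrary seed (an addition), only to make the draw repeatable

# Step 1: row numbers of the rows that satisfy the condition
idx <- which(mydf$gender == "F")   # 1 4 5 7 8

# Step 2: draw 3 of those row numbers, then use them to pick whole rows
picked <- mydf[sample(idx, 3), ]
print(picked)
```

Because idx holds row numbers rather than values, the same idx can be reused to sample any other column, or the full rows, of the same subset.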

So in one line (although, you reduce the absurd number of brackets and possibly make your code easier to understand by having multiple lines):

mydf[ sample( which(mydf$gender=='F'), 3 ), ]

While the "wheee I'm a hacker!" part of me prefers the one-liner, the sensible part of me says that even though the two-liner is two lines, it is much more understandable - it's just your choice.
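
If this comes up often, the idiom can be wrapped in a small function. The sketch below is one possible way to do it, not a standard R function: the name sample_rows and its arguments df, keep and n are made up for illustration. It draws positions within idx with sample.int() instead of passing idx to sample() directly, because sample(x, n) treats a single number x >= 1 as 1:x, so a subset with exactly one matching row would otherwise be sampled from the wrong rows.

```r
# Hypothetical helper (not part of base R): sample n rows of df where keep is TRUE
sample_rows <- function(df, keep, n) {
  idx <- which(keep)  # row numbers that satisfy the condition (NAs are dropped)
  if (n > length(idx)) {
    stop("only ", length(idx), " rows match, cannot sample ", n)
  }
  # sample.int() draws positions within idx, so a length-one idx is handled correctly
  df[idx[sample.int(length(idx), n)], , drop = FALSE]
}

# Data from the question
gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
age    <- c(23, 25, 27, 29, 31, 33, 35, 37)
mydf   <- data.frame(gender, age)

three_f <- sample_rows(mydf, mydf$gender == "F", 3)  # three random rows with gender F
one_row <- sample_rows(mydf, mydf$age == 35, 1)      # always row 7, the only match
print(three_f)
print(one_row)
```

For comparison, mydf[sample(which(mydf$age == 35), 1), ] would call sample(7, 1) and return a random row from 1 to 7 rather than row 7.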
