Sample a single row, per column, within a subset of a data frame in R, while following conditions

Problem description

As an example of my data, I have a data frame in which GROUP 1 has three rows and GROUP 2 has two rows:

GROUP   VARIABLE 1   VARIABLE 2   VARIABLE 3 
    1            2            6            5 
    1            4           NA            1 
    1           NA            3            8
    2            1           NA            2      
    2            9           NA           NA 
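
To make the example reproducible, the table can be typed in as an R data frame. The column names VARIABLE_1 to VARIABLE_3 are an assumption, since the headers above contain spaces; df1 matches the name used in the answer below.

```r
# Example data from the question (column names assumed: spaces replaced by underscores)
df1 <- data.frame(
  GROUP      = c(1, 1, 1, 2, 2),
  VARIABLE_1 = c(2, 4, NA, 1, 9),
  VARIABLE_2 = c(6, NA, 3, NA, NA),
  VARIABLE_3 = c(5, 1, 8, 2, NA)
)
str(df1)
```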

I would like to sample a single value per column from GROUP 1 to make a new row representing GROUP 1. I do not want to sample one complete row from GROUP 1; instead, the sampling needs to happen independently for each column. I would like to do the same for GROUP 2. Also, the sampling should ignore NAs, unless all of that group's rows for a variable are NA (such as GROUP 2, VARIABLE 2, above), in which case the result is NA.
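
The per-column rule can be written as a small helper; sample_one is a hypothetical name, not part of any package. It drops NAs, returns an NA of the column's own type only when nothing is left, and picks by index with sample.int(), because sample(x, 1) on a single number x >= 1 draws from 1:x instead of returning x.

```r
# Hypothetical helper: draw one non-NA value from a column vector.
sample_one <- function(x) {
  x <- x[!is.na(x)]                            # ignore NAs
  if (length(x) == 0L) return(x[NA_integer_])  # all NA: return NA of the same type
  x[sample.int(length(x), 1L)]                 # safe even when only one value remains
}

set.seed(1)
sample_one(c(4, NA, 1))   # either 4 or 1
sample_one(c(2, NA))      # always 2
sample_one(c(NA, NA))     # NA
```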

For example, one possible result of the sampling is:

GROUP   VARIABLE 1   VARIABLE 2   VARIABLE 3 
    1            4            6            1 
    2            9           NA            2 

Only GROUP 2, VARIABLE 2 can produce NA here. In reality I have 39 groups, 50,000+ variables, and many NAs. I would appreciate code that builds a new data frame with one row per group containing the sampled values.

Recommended answer

We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'GROUP', and loop through the columns (lapply(.SD, ...)): if all of the elements are NA we return NA, otherwise we return one randomly sampled non-NA element.

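library(data.table)
# For each GROUP, draw one non-NA value per column; all-NA columns stay NA (same type)
setDT(df1)[, lapply(.SD, function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0L) x[NA_integer_] else x[sample.int(length(x), 1L)]  # sample(x, 1) would draw from 1:x if only one number is left
}), by = GROUP]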
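
For readers without data.table, the same per-group, per-column sampling can be sketched in base R with aggregate(), which splits every column by GROUP and applies a function to each piece. This is a swapped-in technique, not the answer's method; sample_one and the column names are assumptions for illustration.

```r
# Base-R alternative (not the data.table answer): aggregate() applies sample_one()
# to each column, separately within each GROUP.
sample_one <- function(x) {
  x <- x[!is.na(x)]                            # ignore NAs
  if (length(x) == 0L) return(x[NA_integer_])  # all NA: return NA of the same type
  x[sample.int(length(x), 1L)]                 # pick one remaining value at random
}

# Example data from the question (column names assumed)
df1 <- data.frame(
  GROUP      = c(1, 1, 1, 2, 2),
  VARIABLE_1 = c(2, 4, NA, 1, 9),
  VARIABLE_2 = c(6, NA, 3, NA, NA),
  VARIABLE_3 = c(5, 1, 8, 2, NA)
)

set.seed(123)
value_cols <- setdiff(names(df1), "GROUP")
result <- aggregate(df1[value_cols], by = list(GROUP = df1$GROUP), FUN = sample_one)
result
```

With 50,000+ columns the data.table version is probably faster; treat this as an illustration rather than a tuned solution.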
