How to subset data by filtering and grouping efficiently in R

Question

I'm working on a project and am looking for some help to make my code run more efficiently. I've searched for similar problems but can't find anything quite as granular as this one. The solution I've come up with is extremely clunky, and I'm confident there must be a more efficient way to do this with a package like dplyr or data.table.

Problem: I have 3 columns of data: 'ids', 'x.group', and 'time'. I need to extract the first 3 unique 'ids' that appear in each 'time' block for each 'x.group'.

However, I do not want to include any rows where 'ids' or 'x.group' equals "0". The output at the bottom of my code yields the correct values, but in my opinion it's a rather embarrassing way of getting there.

Note: In the code example below, I am using x.groups = ['A','B','0'], but in my actual project, these can take on many values, so they won't always be 'A' or 'B', but '0's will always be present (e.g., I could have ['A','K','0'] or ['M','W','0'], etc.). You can find the example dataset at the bottom of this post.

# find x.groups
xs <- unique(myDF$x.group)[unique(myDF$x.group) != "0"]

# DF without '0's as x.group entries
ps <- unique(myDF[which(myDF$x.group %in% xs) , c("ids","x.group","time")])

first3.x1.t1 <- ps[ps$x.group == xs[1] & ps$ids != "0" & ps$time == "1", ]$ids[1:3]
first3.x2.t1 <- ps[ps$x.group == xs[2] & ps$ids != "0" & ps$time == "1", ]$ids[1:3]
first3.x1.t2 <- ps[ps$x.group == xs[1] & ps$ids != "0" & ps$time == "2", ]$ids[1:3]
first3.x2.t2 <- ps[ps$x.group == xs[2] & ps$ids != "0" & ps$time == "2", ]$ids[1:3]
first3.x1.t3 <- ps[ps$x.group == xs[1] & ps$ids != "0" & ps$time == "3", ]$ids[1:3]
first3.x2.t3 <- ps[ps$x.group == xs[2] & ps$ids != "0" & ps$time == "3", ]$ids[1:3]

# First 3 unique ids from time block 1 for each x.group
> first3.x1.t1; first3.x2.t1;
[1] "2"  "17" "11"
[1] "5"  "10" "4"

# First 3 unique ids from time block 2 for each x.group
> first3.x1.t2; first3.x2.t2;
[1] "9"  "6"  "16"
[1] "8"  "13" "7" 

# First 3 unique ids from time block 3 for each x.group
> first3.x1.t3; first3.x2.t3;
[1] "11" "2"  "10"
[1] "1"  "3"  "13"

Data:

# create data frame
ids <- c("2","0","15","5","17","10","4","2","3","11","11","18","10","8","13","9","6","16","7","14",
     "16","7","11","12","14","5","1","11","3","2","10","17","3","13","10","17","2","10","16","10")
x.group <- c("A","A","0","B","A","B","B","A","B","A","A","0","B","B","B","A","A","A","B","B",
         "A","A","0","B","A","B","B","A","B","A","A","0","B","B","B","A","A","A","B","B")
time <- c(rep("1",13), rep("2",13), rep("3",14))

myDF <- as.data.frame(cbind(ids, x.group, time), stringsAsFactors = FALSE)
> myDF
   ids x.group time
1    2       A    1
2    0       A    1
3   15       0    1
4    5       B    1
5   17       A    1
6   10       B    1
7    4       B    1
8    2       A    1
9    3       B    1
10  11       A    1
11  11       A    1
12  18       0    1
13  10       B    1
14   8       B    2
15  13       B    2
16   9       A    2
17   6       A    2
18  16       A    2
19   7       B    2
20  14       B    2
21  16       A    2
22   7       A    2
23  11       0    2
24  12       B    2
25  14       A    2
26   5       B    2
27   1       B    3
28  11       A    3
29   3       B    3
30   2       A    3
31  10       A    3
32  17       0    3
33   3       B    3
34  13       B    3
35  10       B    3
36  17       A    3
37   2       A    3
38  10       A    3
39  16       B    3
40  10       B    3

Recommended answer

aggregate(ids ~ ., myDF, function(x) unique(x)[1:3], subset = x.group != "0" & ids != "0")
  x.group time ids.1 ids.2 ids.3
1       A    1     2    17    11
2       B    1     5    10     4
3       A    2     9     6    16
4       B    2     8    13     7
5       A    3    11     2    10
6       B    3     1     3    13
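
The `ids.1`, `ids.2`, `ids.3` headers in the printout above are not three separate columns. As far as I can tell, `aggregate()` stores them as one matrix inside the `ids` column, because the function returns a vector of length 3 for each group. A minimal sketch to check this, rebuilding the question's data (the name `res` is introduced here for illustration only):

```r
# Rebuild the example data from the question.
ids <- c("2","0","15","5","17","10","4","2","3","11","11","18","10","8","13","9","6","16","7","14",
         "16","7","11","12","14","5","1","11","3","2","10","17","3","13","10","17","2","10","16","10")
# In the question's data the same 20-value pattern appears twice.
x.group <- rep(c("A","A","0","B","A","B","B","A","B","A",
                 "A","0","B","B","B","A","A","A","B","B"), 2)
time <- c(rep("1", 13), rep("2", 13), rep("3", 14))
myDF <- data.frame(ids, x.group, time, stringsAsFactors = FALSE)

# Same call as in the answer; `res` is a hypothetical name.
res <- aggregate(ids ~ ., myDF, function(x) unique(x)[1:3],
                 subset = x.group != "0" & ids != "0")

ncol(res)           # 3: x.group, time, ids
is.matrix(res$ids)  # TRUE: the ids column is itself a matrix
dim(res$ids)        # 6 3: one row per group/time block, three ids each
```

Note that `unique(x)[1:3]` pads with `NA` when a block has fewer than three distinct ids, so every row of the matrix always has three entries.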

This returns a nested data frame: the three ids for each group are stored together in a single matrix column. You can unnest it as:

a <- aggregate(ids ~ ., myDF, function(x) unique(x)[1:3], subset = x.group != "0" & ids != "0")
# The unnested data frame:
b <- do.call(data.frame, a)
b
  x.group time ids.1 ids.2 ids.3
1       A    1     2    17    11
2       B    1     5    10     4
3       A    2     9     6    16
4       B    2     8    13     7
5       A    3    11     2    10
6       B    3     1     3    13
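
Since the question mentions dplyr, here is a rough equivalent as a sketch, not part of the original answer. It assumes dplyr 1.0.0 or later is installed (for `slice_head()`), and the name `first3` is introduced here for illustration. The result is in long format, one row per id, rather than the wide layout above.

```r
library(dplyr)

# Rebuild the example data from the question.
ids <- c("2","0","15","5","17","10","4","2","3","11","11","18","10","8","13","9","6","16","7","14",
         "16","7","11","12","14","5","1","11","3","2","10","17","3","13","10","17","2","10","16","10")
# In the question's data the same 20-value pattern appears twice.
x.group <- rep(c("A","A","0","B","A","B","B","A","B","A",
                 "A","0","B","B","B","A","A","A","B","B"), 2)
time <- c(rep("1", 13), rep("2", 13), rep("3", 14))
myDF <- data.frame(ids, x.group, time, stringsAsFactors = FALSE)

first3 <- myDF %>%
  filter(x.group != "0", ids != "0") %>%  # drop the "0" placeholders
  distinct(x.group, time, ids) %>%        # keep the first occurrence of each id per block
  group_by(time, x.group) %>%
  slice_head(n = 3) %>%                   # up to three ids per block, in order of appearance
  ungroup()

# Ids for one block, e.g. group "A" in time block "1"
first3$ids[first3$x.group == "A" & first3$time == "1"]
```

One difference in behaviour: a block with fewer than three distinct ids simply yields fewer rows here, whereas `unique(x)[1:3]` pads the result with `NA`. Because nothing is hard-coded, this also works when the group labels are something other than 'A' and 'B'.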
