How to conditionally partition observations into groups?
Problem description
I have the following input:
C1 C2
1 1
1 1
1 2
1 3
1 4
2 1
. .
C1 and C2 are grouping variables, with C2 nested within C1. I'd like to build subgroups within each C1 group that have a minimum size of 2. The groups in C2 must not be split, and I'd like to end up with as many groups as possible. Doing this by hand, I would first look at the group C1 = 1, keep subgroup 1 (C2 = 1) as a group of its own (G = 1), and join subgroups 2, 3 and 4 together (G = 2). The expected output would be (where G is the grouping I am trying to create):
C1 C2 G
1 1 1
1 1 1
1 2 2
1 3 2
1 4 2
2 1 3
. . .
I hope it's clear what I mean. Any help is highly appreciated.
Recommended answer
Using data.table:
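library(data.table)
# Step 1 (per C1): label rows in consecutive pairs; an odd leftover row joins the last pair.
# Step 2: rleid() renumbers the runs of G into one running group id over the whole table.
setDT(mydf)[, G := {r <- rep(1:floor(.N/2), each = 2); if(length(r) != .N) c(r, tail(r,1)) else r}
            , by = C1
            ][, G := rleid(G)][]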
This gives:
C1 C2 G
1: 1 1 1
2: 1 1 1
3: 1 2 2
4: 1 3 2
5: 1 4 2
6: 2 1 3
7: 2 1 3
8: 2 2 4
9: 2 3 4
10: 2 4 4
11: 3 1 5
12: 3 2 5
13: 3 3 6
14: 3 4 6
15: 3 5 6
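To see what the expression inside the first `[` produces, here is a minimal base R sketch that applies it to a single group size. The helper name `pair_labels` is hypothetical and not part of the answer; the body is the answer's expression with `.N` replaced by an argument `n`.

```r
# Hypothetical helper: the answer's labelling expression for one group of n rows.
pair_labels <- function(n) {
  r <- rep(1:floor(n / 2), each = 2)           # 1 1 2 2 ... for the first 2 * floor(n/2) rows
  if (length(r) != n) c(r, tail(r, 1)) else r  # an odd leftover row reuses the last label
}

pair_labels(4)  # 1 1 2 2
pair_labels(5)  # 1 1 2 2 2
pair_labels(3)  # 1 1 1
```

Two things follow from this, as far as I can tell. The labels depend only on row position, so C2 is never consulted; the result matches the expected output here because no C2 subgroup in the sample data straddles a pair boundary. Also, a C1 group with a single row is not handled (`1:floor(1/2)` is `1:0`, so the expression returns 5 values for 1 row), and two adjacent C1 groups of 2 or 3 rows each would both be labelled all 1s and then merged by `rleid(G)`. Calling `rleid(C1, G)` instead should avoid that merge.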
Used data:
mydf <- structure(list(C1 = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L),
C2 = c(1L, 1L, 2L, 3L, 4L, 1L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L)),
.Names = c("C1", "C2"), class = "data.frame", row.names = c(NA, -15L))
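If C2 subgroups can contain more than a couple of rows, a variant that looks at C2 explicitly may be safer. The following is only a sketch in base R, not the answer's method: it walks the C2 subgroups of each C1 block in order and closes a group as soon as it holds at least `min_size` rows, merging a too-small remainder into the previous group. The names `group_block` and `min_size` are hypothetical, and it assumes the data are sorted by C1 and C2 and that only neighbouring C2 subgroups may be joined.

```r
# Hypothetical helper: group ids for one C1 block, never splitting a C2 value.
# Assumes the block is sorted by C2 and min_size >= 1.
group_block <- function(c2, min_size = 2) {
  runs <- rle(c2)                        # consecutive C2 subgroups and their sizes
  id <- integer(length(runs$lengths))
  current <- 1L                          # id of the group currently being filled
  filled <- 0L                           # rows collected in that group so far
  for (i in seq_along(runs$lengths)) {
    id[i] <- current
    filled <- filled + runs$lengths[i]
    if (filled >= min_size) {            # large enough: close it, start a new one
      current <- current + 1L
      filled <- 0L
    }
  }
  # A too-small remainder is merged into the previous group, if there is one.
  if (filled > 0L && current > 1L) id[id == current] <- current - 1L
  rep(id, runs$lengths)                  # expand back to one id per row
}

mydf <- data.frame(
  C1 = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L),
  C2 = c(1L, 1L, 2L, 3L, 4L, 1L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L)
)

# Ids within each C1 block, then one running id in order of first appearance.
local_id <- ave(mydf$C2, mydf$C1, FUN = group_block)
key <- paste(mydf$C1, local_id)
mydf$G <- match(key, unique(key))
mydf
```

On the sample data this yields the same G column as the data.table answer. The difference shows up with a block such as C2 = 1, 1, 1, 2: the position-based pairing would split the three rows with C2 = 1, while this sketch keeps them together and places all four rows in one group.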