Randomly draw rows from dataframe based on unique values and column values
I have a dataframe with several descriptor variables (trt, individual, session). I want to randomly select a subset of the possible trt x individual combinations while controlling for the session variable, so that no two sampled rows share a session number. Here is what my dataframe looks like:
trt <- c(rep(c(rep("A", 3), rep("B", 3), rep("C", 3)), 9))
individual <- rep(c("Bob", "Nancy", "Tim"), 27)
session <- rep(1:27, each = 3)
data <- rnorm(81, mean = 4, sd = 1)
df <- data.frame(trt, individual, session, data)
df
trt individual session data
1 A Bob 1 3.72013685581385
2 A Nancy 1 3.97225419000673
3 A Tim 1 4.44714175686225
4 B Bob 2 5.00024599458127
5 B Nancy 2 3.43615965145765
6 B Tim 2 6.7920094635501
7 C Bob 3 4.36315054477571
8 C Nancy 3 5.07117348146375
9 C Tim 3 4.38503325758969
10 A Bob 4 4.30677162933005
11 A Nancy 4 1.89311687510669
12 A Tim 4 3.09084920968413
13 B Bob 5 3.10436190897144
14 B Nancy 5 3.59454992439722
15 B Tim 5 3.40778069131207
16 C Bob 6 4.00171937800892
17 C Nancy 6 0.14578811080644
18 C Tim 6 4.20754733296227
19 A Bob 7 3.69131009783284
20 A Nancy 7 4.7025756891679
21 A Tim 7 4.46196017363017
22 B Bob 8 3.97573281432736
23 B Nancy 8 4.5373185942686
24 B Tim 8 2.40937847038141
25 C Bob 9 4.57519884980087
26 C Nancy 9 5.19143914630448
27 C Tim 9 4.83144732833874
28 A Bob 10 3.01769965527235
29 A Nancy 10 5.17300616827746
30 A Tim 10 4.65432284571663
31 B Bob 11 4.50892032922527
32 B Nancy 11 3.38082717995663
33 B Tim 11 4.92022245677209
34 C Bob 12 4.54149796547394
35 C Nancy 12 3.21992774137179
36 C Tim 12 3.74507360931023
37 A Bob 13 3.39524949548056
38 A Nancy 13 4.17518916890901
39 A Tim 13 3.02932375225388
40 B Bob 14 3.59660910672907
41 B Nancy 14 2.08784850191654
42 B Tim 14 3.98446125755258
43 C Bob 15 4.01837496797085
44 C Nancy 15 3.40610126858125
45 C Tim 15 4.57107635588582
46 A Bob 16 3.15839276840723
47 A Nancy 16 2.19932140340504
48 A Tim 16 4.77588798035668
49 B Bob 17 4.3524768657397
50 B Nancy 17 4.49071625925856
51 B Tim 17 4.02576463486266
52 C Bob 18 3.74783360762117
53 C Nancy 18 2.84123227236184
54 C Tim 18 3.2024114782253
55 A Bob 19 4.93837445490921
56 A Nancy 19 4.7103051496802
57 A Tim 19 6.22083635045134
58 B Bob 20 4.5177747677824
59 B Nancy 20 1.78839270771153
60 B Tim 20 5.07140678136995
61 C Bob 21 3.47818616035335
62 C Nancy 21 4.28526474048439
63 C Tim 21 4.22597602946575
64 A Bob 22 1.91700925257901
65 A Nancy 22 2.96317997587458
66 A Tim 22 2.53506974227672
67 B Bob 23 5.52714403395316
68 B Nancy 23 3.3618513551059
69 B Tim 23 4.85869007113978
70 C Bob 24 3.4367068543959
71 C Nancy 24 4.47769879000349
72 C Tim 24 5.77340483757836
73 A Bob 25 4.78524317734622
74 A Nancy 25 3.55373702554664
75 A Tim 25 2.88541465503637
76 B Bob 26 4.62885302019139
77 B Nancy 26 3.59430293369092
78 B Tim 26 2.29610255924296
79 C Bob 27 4.38433001299722
80 C Nancy 27 3.77825207859976
81 C Tim 27 2.12163194694365
How do I pull out 2 rows for each trt x individual combination, each with a unique session number? This is an example of what I want the dataframe to look like:
trt individual session data
1 A Bob 1 3.72013685581385
5 B Nancy 2 3.43615965145765
7 C Bob 3 4.36315054477571
12 A Tim 4 3.09084920968413
15 B Tim 5 3.40778069131207
17 C Nancy 6 0.14578811080644
19 A Bob 7 3.69131009783284
29 A Nancy 10 5.17300616827746
31 B Bob 11 4.50892032922527
34 C Bob 12 4.54149796547394
39 A Tim 13 3.02932375225388
40 B Bob 14 3.59660910672907
47 A Nancy 16 2.19932140340504
51 B Tim 17 4.02576463486266
54 C Tim 18 3.2024114782253
59 B Nancy 20 1.78839270771153
71 C Nancy 24 4.47769879000349
81 C Tim 27 2.12163194694365
I have tried a couple things with no luck.
I tried randomly selecting two rows for each trt x individual combination, but I end up with duplicate session values:
library(data.table)
setDT(df)
df[ , .SD[sample(.N, 2)] , keyby = .(trt, individual)]
trt individual session data
1: A Bob 25 2.7560788894668
2: A Bob 19 4.12040841647523
3: A Nancy 4 5.35362338127901
4: A Nancy 19 5.51636882737692
5: A Tim 19 5.10553640201998
6: A Tim 1 2.77380671625473
7: B Bob 23 3.50585105164409
8: B Bob 8 3.58167259470814
9: B Nancy 23 2.85301307507985
10: B Nancy 8 2.85179395539781
11: B Tim 26 2.40666507132474
12: B Tim 20 3.31276311351286
13: C Bob 24 3.19076007024549
14: C Bob 3 3.59146613276121
15: C Nancy 9 4.46606667880457
16: C Nancy 15 2.25405252536256
17: C Tim 12 4.43111661206133
18: C Tim 27 4.23868848646589
I also tried randomly selecting one row for each session number and then pulling 2 rows per trt x individual combination, but it typically comes back with an error, since the random selection doesn't grab an equal number of trt x individual combinations:
ind <- sapply( unique(df$session ) , function(x) sample( which(df$session == x) , 1) )
df.unique <- df[ind, ]
df.sub <- df.unique[, .SD[sample(.N, 2)] , by = .(trt, individual)]
Error in `[.data.frame`(df.unique, , .SD[sample(.N, 2)], by = .(trt, individual)) :
unused argument (by = .(trt, individual))
Thanks in advance for your help!
Perhaps there is a clever way to sample, but here's a straightforward idea to get you started in the meantime:
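setDT(df)
setkey(df, session)
usedsessions = 0 # some value that's not a session number
df[, {
  # drop rows whose session is already used (not-join on the key), then draw 2
  res = .SD[!.(usedsessions)][sample(.N, 2)]
  # record the sessions just drawn so later groups skip them
  usedsessions = c(usedsessions, res$session)
  res
}, by = .(trt, individual)]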
# trt individual session data
# 1: A Bob 7 4.256668
# 2: A Bob 25 2.431821
# 3: A Nancy 16 4.785859
# 4: A Nancy 19 4.865248
# 5: A Tim 4 3.303689
# 6: A Tim 13 3.550261
# 7: B Bob 26 3.987136
# 8: B Bob 17 3.283055
# 9: B Nancy 14 3.177226
#10: B Nancy 2 3.639542
#11: B Tim 8 2.168447
#12: B Tim 5 3.521123
#13: C Bob 21 3.284245
#14: C Bob 12 5.773098
#15: C Nancy 24 4.624428
#16: C Nancy 9 3.235467
#17: C Tim 18 4.001395
#18: C Tim 27 5.002110
You'll probably need to add corner-case handling (e.g. for when a group does not have two unused sessions left to sample from).
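One way to make that corner case explicit is a plain loop in base R. This is only a sketch, not the data.table approach above: the helper name sample_unique_sessions and its n argument are made up for illustration, and it assumes the column names trt, individual and session from the question. It draws n distinct, still-unused sessions per group and stops with an error when a group has fewer than n left.

```r
# Hypothetical helper (not part of the original answer): draw n rows per
# trt x individual group so that no session number appears twice overall.
sample_unique_sessions <- function(df, n = 2) {
  key <- interaction(df$trt, df$individual, drop = TRUE)
  used <- df$session[0]    # sessions already drawn (empty, same type as session)
  picked <- integer(0)     # row indices that make up the result
  for (g in sample(levels(key))) {    # visit the groups in random order
    candidates <- which(key == g & !(df$session %in% used))
    sess <- unique(df$session[candidates])
    if (length(sess) < n) {
      stop("fewer than ", n, " unused sessions left for group ", g)
    }
    chosen <- sess[sample.int(length(sess), n)]
    # pick one row per chosen session, in case a group repeats a session
    rows <- vapply(chosen, function(s) {
      r <- candidates[df$session[candidates] == s]
      r[sample.int(length(r), 1)]
    }, integer(1))
    used <- c(used, chosen)
    picked <- c(picked, rows)
  }
  df[sort(picked), ]
}

# Example data with the same layout as in the question
set.seed(42)
df <- data.frame(
  trt        = rep(c(rep("A", 3), rep("B", 3), rep("C", 3)), 9),
  individual = rep(c("Bob", "Nancy", "Tim"), 27),
  session    = rep(1:27, each = 3),
  data       = rnorm(81, mean = 4, sd = 1)
)

res <- sample_unique_sessions(df, n = 2)
anyDuplicated(res$session)          # 0 means every session is unique
table(res$trt, res$individual)      # each combination should appear twice
```

Because the groups are processed greedily, the function can stop with an error even when a valid assignment exists; in that case, rerunning it (the group order is random) may succeed. With the example data this cannot happen, since each treatment has nine sessions and only six are needed.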