python pandas: assign control vs. treatment groupings randomly based on %
Problem description
I am working on an experiment design where I need to split a DataFrame df into a control group and a treatment group by percentage, within pre-existing groupings.
This is the DataFrame df:
df.head()
customer_id | Group | many other columns
ABC 1
CDE 1
BHF 2
NID 1
WKL 2
SDI 2
pd.pivot_table(df,index=['Group'],values=["customer_id"],aggfunc=lambda x: len(x.unique()))
Group 1 : 55394
Group 2 : 34889
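For reference, the same per-group count of distinct customers can also be obtained with groupby and nunique, which avoids the lambda. A small sketch using only the six sample rows shown above (so the counts differ from the full data); the variable name unique_customers is illustrative:

```python
import pandas as pd

# Only the sample rows from df.head() in the question; the real df is much larger.
df = pd.DataFrame({
    "customer_id": ["ABC", "CDE", "BHF", "NID", "WKL", "SDI"],
    "Group": [1, 1, 2, 1, 2, 2],
})

# Number of distinct customer_id values within each Group.
unique_customers = df.groupby("Group")["customer_id"].nunique()
print(unique_customers)
```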
Now I need to add a column labeled "Flag" to df. For Group 1, I want to randomly assign 50% "Control" and 50% "Test". For Group 2, I want to randomly assign 40% "Control" and 60% "Test".
The output I am looking for:
customer_id | Group | many other columns | Flag
ABC 1 Test
CDE 1 Control
BHF 2 Test
NID 1 Test
WKL 2 Control
SDI 2 Test
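Recommended answer
We can use np.random.choice() inside groupby().transform(), passing each group's probabilities from a dict keyed by group (numpy is assumed to be imported as np):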
In [8]: df
Out[8]:
customer_id Group
0 ABC 1
1 CDE 1
2 BHF 2
3 NID 1
4 WKL 2
5 SDI 2
6 XXX 3
7 XYZ 3
8 XXX 3
In [9]: d = {1:[.5,.5], 2:[.4,.6], 3:[.2,.8]}
In [10]: df['Flag'] = \
...: df.groupby('Group')['customer_id'] \
...: .transform(lambda x: np.random.choice(['Control','Test'], len(x), p=d[x.name]))
...:
In [11]: df
Out[11]:
customer_id Group Flag
0 ABC 1 Test
1 CDE 1 Test
2 BHF 2 Control
3 NID 1 Control
4 WKL 2 Control
5 SDI 2 Test
6 XXX 3 Test
7 XYZ 3 Test
8 XXX 3 Test
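Note that np.random.choice draws each label independently, so the realized split within a group only matches the target percentages on average (in the small example above, Group 3 came out entirely Test). If the design needs the counts to match the targets as closely as rounding allows, one option is to build the exact number of labels per group and shuffle them. The following is a minimal sketch of that idea, not part of the original answer: the helper name assign_exact, the control_share dict, the synthetic data, and the seed are all illustrative, and it assumes one row per customer.

```python
import numpy as np
import pandas as pd

def assign_exact(df, control_share, group_col="Group", seed=None):
    """Return a Series of 'Control'/'Test' labels where, within each group,
    the number of Control rows is round(share * group size)."""
    rng = np.random.default_rng(seed)
    flags = pd.Series(index=df.index, dtype=object)
    # .groups maps each group key to the index labels of its rows.
    for name, idx in df.groupby(group_col).groups.items():
        n = len(idx)
        n_control = int(round(control_share[name] * n))
        labels = np.array(["Control"] * n_control + ["Test"] * (n - n_control),
                          dtype=object)
        rng.shuffle(labels)  # random order, fixed counts
        flags.loc[idx] = labels
    return flags

# Synthetic stand-in for the real data (hypothetical customer IDs).
df = pd.DataFrame({
    "customer_id": [f"C{i:04d}" for i in range(1000)],
    "Group": [1] * 600 + [2] * 400,
})
control_share = {1: 0.5, 2: 0.4}  # share of each group assigned to Control

df["Flag"] = assign_exact(df, control_share, seed=0)
print(pd.crosstab(df["Group"], df["Flag"]))
```

If the same customer_id can appear on several rows, the labels would need to be assigned per unique customer and merged back, otherwise one customer could end up in both arms.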