python pandas: assign control vs. treatment groupings randomly based on %


Question

I am working on an experiment design where I need to split a dataframe df into control and treatment groups, using a different percentage split for each pre-existing group.

Here is the dataframe df:

df.head()

customer_id | Group | many other columns
ABC             1
CDE             1
BHF             2
NID             1
WKL             2
SDI             2

pd.pivot_table(df,index=['Group'],values=["customer_id"],aggfunc=lambda x: len(x.unique()))

Group 1  : 55394
Group 2  : 34889

Now I need to add a column labeled "Flag" to df. For Group 1, I want to randomly assign 50% "Control" and 50% "Test". For Group 2, I want to randomly assign 40% "Control" and 60% "Test".

The output I am looking for:

customer_id | Group | many other columns | Flag
ABC             1                          Test
CDE             1                          Control
BHF             2                          Test
NID             1                          Test
WKL             2                          Control
SDI             2                          Test

Recommended answer

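We can use np.random.choice() inside a groupby(...).transform(...) call, with a dict that maps each group to its [Control, Test] probabilities (this assumes numpy has been imported as np):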

In [8]: df
Out[8]:
  customer_id  Group
0         ABC      1
1         CDE      1
2         BHF      2
3         NID      1
4         WKL      2
5         SDI      2
6         XXX      3
7         XYZ      3
8         XXX      3

In [9]: d = {1:[.5,.5], 2:[.4,.6], 3:[.2,.8]}

In [10]: df['Flag'] = \
    ...: df.groupby('Group')['customer_id'] \
    ...:   .transform(lambda x: np.random.choice(['Control','Test'], len(x), p=d[x.name]))
    ...:

In [11]: df
Out[11]:
  customer_id  Group     Flag
0         ABC      1     Test
1         CDE      1     Test
2         BHF      2  Control
3         NID      1  Control
4         WKL      2  Control
5         SDI      2     Test
6         XXX      3     Test
7         XYZ      3     Test
8         XXX      3     Test
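
One point worth spelling out: np.random.choice with p= draws each row independently, so the realized shares in each group only approximate the target percentages (in the sample output above, Group 1 happens to come out 50/50, Group 2 2/1, and Group 3 all "Test"). The sketch below is not part of the original answer. It shows a seeded version of the same approach next to a variant that fixes the number of "Control" rows per group and then shuffles, so the split is exact up to rounding. The names SHARES, assign_independent, and assign_exact are hypothetical, and the exact variant assumes the dataframe index is unique.

```python
import numpy as np
import pandas as pd

# Hypothetical share table: group -> [P(Control), P(Test)], like the dict `d` above.
SHARES = {1: [0.5, 0.5], 2: [0.4, 0.6]}
LABELS = np.array(["Control", "Test"])


def assign_independent(df, shares, seed=0):
    """Draw each row's flag independently (same idea as the answer, but seeded)."""
    rng = np.random.default_rng(seed)
    return df.groupby("Group")["customer_id"].transform(
        lambda x: rng.choice(LABELS, size=len(x), p=shares[x.name])
    )


def assign_exact(df, shares, seed=0):
    """Fix the Control count per group, then shuffle labels within the group."""
    rng = np.random.default_rng(seed)
    flags = pd.Series(index=df.index, dtype=object)
    # .groups maps each group key to the index labels of its rows
    # (assumes df.index is unique).
    for group, idx in df.groupby("Group").groups.items():
        n = len(idx)
        n_control = int(round(n * shares[group][0]))
        labels = np.array(
            ["Control"] * n_control + ["Test"] * (n - n_control), dtype=object
        )
        flags.loc[idx] = rng.permutation(labels)
    return flags


# Small made-up frame: 10 customers in each of two groups.
df = pd.DataFrame({
    "customer_id": [f"C{i:03d}" for i in range(20)],
    "Group": [1] * 10 + [2] * 10,
})
df["Flag_random"] = assign_independent(df, SHARES, seed=42)
df["Flag_exact"] = assign_exact(df, SHARES, seed=42)

# Realized counts per group for each method.
print(df.groupby(["Group", "Flag_random"]).size())
print(df.groupby(["Group", "Flag_exact"]).size())
```

With this toy frame, Flag_exact always yields 5 "Control" / 5 "Test" in Group 1 and 4 "Control" / 6 "Test" in Group 2, while the Flag_random counts vary with the seed. For groups as large as those in the question (tens of thousands of customers), the independent draws will usually land close to the targets, so which variant to prefer depends on whether the design needs exact group sizes.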
