Assign control vs. treatment groupings randomly based on % for more than 2 groups

Question

Piggybacking off my own previous question, "python pandas: assign control vs. treatment groupings randomly based on %".

Thanks to @maxU, I know how to assign random control/treatment groupings to 2 groups, but what if I have 3 or more groups?

For example:

df.head()

customer_id | Group | many other columns
ABC             1
CDE             3
BHF             2
NID             1
WKL             3
SDI             2
JSK             1
OSM             3
MPA             2
MAD             1

pd.pivot_table(df,index=['Group'],values=["customer_id"],aggfunc=lambda x: len(x.unique()))

Group 1  : 270
Group 2  : 180
Group 3  : 330

I have a great answer for the case where I only have two groups:

df['Flag'] = df.groupby('Group')['customer_id']\
             .transform(lambda x: np.random.choice(['Control','Test'], len(x), 
                                                  p=[.5,.5] if x.name==1 else [.4,.6]))

But what if I want to split it this way:

  • Group 1: 50% Control & 50% Test
  • Group 2: 40% Control & 60% Test
  • Group 3: 20% Control & 80% Test

@MaxU's answer is great, but unfortunately the split is not exact:

d = {1:[.5,.5], 2:[.4,.6], 3:[.2,.8]}

df['Flag'] = df.groupby('Group')['customer_id'] \
             .transform(lambda x: np.random.choice(['Control','Test'], len(x), p=d[x.name]))

When I test it, I don't get exact splits.

pd.pivot_table(df,index=['Group'],values=["customer_id"],columns=['Flag'], aggfunc=lambda x: len(x.unique()))

           Control  Treatment
Group 1:    138       132
Group 2:    78        102
Group 3:    79        251

Group 1 should be 135/135.
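
The inexact counts are expected with this approach: np.random.choice draws each label independently, so the number of 'Control' rows in a group of n customers follows a binomial distribution instead of being fixed at n * p. Here is a minimal sketch of that effect, assuming Group 1's size of 270 and its 50/50 target. The seed, the number of repetitions, and the use of np.random.default_rng in place of np.random.choice are illustrative choices, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
n, p_control = 270, 0.5         # Group 1 size and its target control share

# Repeat the independent-draw assignment many times and count 'Control' rows.
control_counts = np.array([
    (rng.choice(['Control', 'Test'], size=n, p=[p_control, 1 - p_control]) == 'Control').sum()
    for _ in range(2000)
])

# Theoretical spread of a binomial count: sqrt(n * p * (1 - p)), about 8.2 here.
expected_std = np.sqrt(n * p_control * (1 - p_control))
print(control_counts.mean(), control_counts.std(), expected_std)
```

With a typical deviation of roughly 8 customers, a 138/132 result is an ordinary outcome of independent draws rather than a bug.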

Recommended answer

It sounds like you're looking for a way to split your customer_ids into exact proportions rather than relying on chance. Here's one way to do that using pandas.qcut and np.random.permutation.

In [228]: df = pd.DataFrame({'customer_id': np.random.normal(size=10000), 
                             'group': np.random.choice(['a', 'b', 'c'], size=10000)})

In [229]: proportions = {'a':[.5,.5], 'b':[.4,.6], 'c':[.2,.8]}

In [230]: df.head()
Out[230]:
   customer_id group
0       0.6547     c
1       1.4190     a
2       0.4205     a
3       2.3266     a
4      -0.5691     b

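In [231]: def assigner(gp):
     ...:     # All rows in gp share one group label; use it to look up the proportions
     ...:     group = gp['group'].iloc[0]
     ...:     # qcut on 0..n-1 yields bins sized by the proportions, labelled 0 (control), 1 (treatment)
     ...:     cut = np.asarray(pd.qcut(
     ...:         np.arange(gp.shape[0]),
     ...:         q=np.cumsum([0] + proportions[group]),
     ...:         labels=list(range(len(proportions[group])))
     ...:     ))  # np.asarray replaces Categorical.get_values(), removed in pandas 1.0
     ...:     # Shuffle the bin labels so the assignment is random within the group
     ...:     return pd.Series(cut[np.random.permutation(gp.shape[0])], index=gp.index, name='assignment')
     ...: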

In [232]: df['assignment'] = df.groupby('group', group_keys=False).apply(assigner)

In [233]: df.head()
Out[233]:
   customer_id group  assignment
0       0.6547     c           1
1       1.4190     a           1
2       0.4205     a           0
3       2.3266     a           1
4      -0.5691     b           0

In [234]: (df.groupby(['group', 'assignment'])
             .size()
             .unstack()
             .assign(proportion=lambda x: x[0] / (x[0] + x[1])))
Out[234]:
assignment     0     1  proportion
group
a           1659  1658      0.5002
b           1335  2003      0.3999
c            669  2676      0.2000

What's going on here?

  1. Within each group we call the function assigner.
  2. assigner grabs the group name, looks up its proportions in the predefined dictionary, and calls pd.qcut to split the row positions into 0 (control) and 1 (treatment).
  3. np.random.permutation then shuffles the assignments.
  4. The result is stored as a new column in the original dataframe.
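
The same idea (fix the counts first, then shuffle) can also be sketched without pd.qcut, using the question's column names and its 270/180/330 group sizes. This is only an illustrative variant under stated assumptions: the helper assign_exact, its parameters, and the synthetic customer_id values are hypothetical and not from the answer, and the dataframe index is assumed to be unique.

```python
import numpy as np
import pandas as pd

def assign_exact(df, group_col, proportions, labels=('Control', 'Test'), seed=None):
    """Return labels whose per-group counts match `proportions` as closely
    as whole rows allow (hypothetical helper, assumes a unique index)."""
    rng = np.random.default_rng(seed)
    out = pd.Series(index=df.index, dtype=object, name='Flag')
    for name, idx in df.groupby(group_col).groups.items():
        n = len(idx)
        # Round cumulative cut points to integers so the counts always sum to n.
        edges = np.round(np.cumsum(proportions[name]) * n).astype(int)
        edges[-1] = n
        counts = np.diff(np.concatenate([[0], edges]))
        # Build the exact number of each label, then shuffle their order.
        out.loc[idx] = rng.permutation(np.repeat(labels, counts))
    return out

# Synthetic data mirroring the group sizes in the question.
df = pd.DataFrame({
    'customer_id': [f'C{i:04d}' for i in range(780)],
    'Group': np.repeat([1, 2, 3], [270, 180, 330]),
})
d = {1: [.5, .5], 2: [.4, .6], 3: [.2, .8]}
df['Flag'] = assign_exact(df, 'Group', d, seed=42)
print(pd.crosstab(df['Group'], df['Flag']))
```

Because the counts are computed before shuffling, the seed only changes which customers land in each arm, not how many: Group 1 comes out 135/135, Group 2 72/108, and Group 3 66/264.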
