Sample with different sample sizes per customer


Problem description

I have a dataframe like this:

    Customer   Day
0.    A         1
1.    A         1
2.    A         1
3.    A         2
4.    B         3
5.    B         4

I want to sample from it, but with a different sample size for each customer. The size for each customer is stored in another dataframe. For example (here the Day column holds the number of rows to sample):

    Customer   Day
0.    A         2
1.    B         1

Suppose I want to sample per customer per day. So far I have this function:

def sampling(frame,a): 
    return np.random.choice(frame.Id,size=a) 

grouped = frame.groupby(['Customer','Day'])
sampled = grouped.apply(sampling, a=??).reset_index()

If I set the size parameter to a global constant, it runs without problems. But I don't know how to set it when the different values live in a separate dataframe.

Recommended answer

You can build a mapper (a dict from customer to sample size) from df1, the dataframe holding the sample sizes, and look up each group's size in it:

mapper = df1.set_index('Customer')['Day'].to_dict()

df.groupby('Customer', as_index=False).apply(lambda x: x.sample(n = mapper[x.name]))


       Customer Day
0   3   A       2
    2   A       1
1   4   B       3

This returns a MultiIndex; you can flatten it with reset_index:

df.groupby('Customer').apply(lambda x: x.sample(n = mapper[x.name])).reset_index(drop = True)


    Customer    Day
0   A           1
1   A           1
2   B           3
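
For reference, here is a minimal self-contained sketch of the same idea. It rebuilds the two example frames from the question and, as a variation not in the answer above, loops over the groups explicitly instead of calling apply, so the group key is available directly rather than through x.name. The helper name sample_per_customer, the seed argument, and the cap on the sample size are assumptions added for illustration.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "Customer": ["A", "A", "A", "A", "B", "B"],
    "Day": [1, 1, 1, 2, 3, 4],
})
# Sample sizes per customer (the "Day" column holds the size).
df1 = pd.DataFrame({"Customer": ["A", "B"], "Day": [2, 1]})

# Customer -> sample size, as in the answer.
mapper = df1.set_index("Customer")["Day"].to_dict()


def sample_per_customer(frame, sizes, seed=None):
    """Hypothetical helper: sample sizes[customer] rows from each group."""
    parts = []
    for customer, group in frame.groupby("Customer"):
        # Customers missing from `sizes` get 0 rows; the size is capped at
        # the group length because sample() without replacement raises
        # ValueError when n exceeds the number of rows.
        n = min(sizes.get(customer, 0), len(group))
        parts.append(group.sample(n=n, random_state=seed))
    # ignore_index=True yields a flat 0..k-1 index, like reset_index(drop=True).
    return pd.concat(parts, ignore_index=True)


sampled = sample_per_customer(df, mapper, seed=0)
print(sampled)
```

Passing a fixed seed makes the draw reproducible; leave it as None for a fresh random sample on each call.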
