Sample with different sample sizes per customer

Views: 61 Published: 2021/6/13 20:56:12 Tags: python python-3.x pandas sampling


Question

I have a dataframe like this:

    Customer   Day
0.    A         1
1.    A         1
2.    A         1
3.    A         2
4.    B         3
5.    B         4

I want to sample from it, but with a different sample size for each customer. The sample size for each customer is stored in another dataframe. For example (here the Day column holds the sample size):

    Customer   Day
0.    A         2
1.    B         1

Suppose I want to sample per customer per day. So far I have this function:

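import numpy as np

def sampling(frame, a):
    # Draw `a` values from the group's Id column (np.random.choice samples with replacement by default)
    return np.random.choice(frame.Id, size=a)

grouped = frame.groupby(['Customer', 'Day'])
sampled = grouped.apply(sampling, a=??).reset_index()  # a=?? : this is the value I don't know how to pass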

If I set the size parameter to a global constant, it runs without problems. But I don't know how to set it when the sizes differ per customer and live in a separate dataframe.

Recommended answer

You can build a mapper (a dict from customer to sample size) from df1, the dataframe that holds the sample sizes, and look up the size for each group:

mapper = df1.set_index('Customer')['Day'].to_dict()

df.groupby('Customer', as_index=False).apply(lambda x: x.sample(n = mapper[x.name]))


       Customer Day
0   3   A       2
    2   A       1
1   4   B       3

This returns a MultiIndex. You can flatten it with reset_index:

df.groupby('Customer').apply(lambda x: x.sample(n = mapper[x.name])).reset_index(drop = True)

    Customer    Day
0   A           1
1   A           1
2   B           3
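
The same idea can be written without `apply`, which may be easier to adapt to the "per customer per day" case from the question. The following is only a sketch under some assumptions: the helper name `sample_groups` and its parameters are made up for illustration, the two dataframes are rebuilt from the tables above, and the sample size is capped at the group size when sampling without replacement (`DataFrame.sample` raises a `ValueError` if `n` exceeds the number of rows and `replace=False`).

```python
import pandas as pd

# Data rebuilt from the tables in the question.
df = pd.DataFrame({
    "Customer": ["A", "A", "A", "A", "B", "B"],
    "Day": [1, 1, 1, 2, 3, 4],
})
# Sample sizes per customer; as in the question, the size column is named "Day".
df1 = pd.DataFrame({"Customer": ["A", "B"], "Day": [2, 1]})

mapper = df1.set_index("Customer")["Day"].to_dict()


def sample_groups(frame, sizes, keys, replace=False, random_state=None):
    """Hypothetical helper: sample each group with its customer's size."""
    parts = []
    for _, group in frame.groupby(keys):
        # Every row in a group shares the same customer, so read it from the rows.
        customer = group["Customer"].iloc[0]
        n = sizes[customer]
        if not replace:
            # Assumption: cap at the group size instead of raising an error.
            n = min(n, len(group))
        parts.append(group.sample(n=n, replace=replace, random_state=random_state))
    return pd.concat(parts).reset_index(drop=True)


# One sample per customer, as in the answer above.
per_customer = sample_groups(df, mapper, "Customer", random_state=0)

# One sample per (customer, day); with replacement, like np.random.choice.
per_customer_day = sample_groups(
    df, mapper, ["Customer", "Day"], replace=True, random_state=0
)

print(per_customer)
print(per_customer_day)
```

Passing the same integer `random_state` to every group is only for reproducibility here. Leave it as `None` if independent draws per group matter.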
