Pandas sample with weights


Question

I have a DataFrame df and I'd like to draw a sample from it that respects the distribution of some variable. Let's say df['type'].value_counts(normalize=True) returns:

0.3 A
0.5 B
0.2 C

I'd like to do something like sampledf = df.sample(weights=df['type'].value_counts()) so that sampledf['type'].value_counts(normalize=True) returns almost the same distribution. How can I pass a dict of frequencies here?

Recommended answer

weights has to be a Series of the same length as the original df (one weight per row), so the simplest option is to add it as a column:

df['freq'] = df.groupby('type')['type'].transform('count')
sampledf = df.sample(weights = df.freq)
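A minimal self-contained sketch of the approach above. The toy data (300 A, 500 B, 200 C), n=100 and random_state=0 are illustrative assumptions, not part of the original answer. Note that sample() returns a single row unless n or frac is given.

```python
import pandas as pd

# Toy data (hypothetical): 30% A, 50% B, 20% C
df = pd.DataFrame({"type": ["A"] * 300 + ["B"] * 500 + ["C"] * 200})

# One weight per row: the size of the group that row belongs to
df["freq"] = df.groupby("type")["type"].transform("count")

# Ask for n rows explicitly; the default is a single row
sampledf = df.sample(n=100, weights=df["freq"], random_state=0)

# Inspect the resulting distribution of the sampled rows
print(sampledf["type"].value_counts(normalize=True))
```

The weights do not need to sum to 1; pandas normalizes them before drawing.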

Or, without adding a column:

sampledf = df.sample(weights = df.groupby('type')['type'].transform('count'))
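On the question of passing a dict: a dict keyed by category has one entry per category rather than one per row, so it cannot be used as weights directly, but Series.map can expand it to per-row weights. The sketch below assumes a toy DataFrame and a hypothetical freq dict. One caveat worth checking on your own data: weighting each row by its group's frequency makes rows from larger groups more likely to be drawn, so the sample can over-represent them. If the goal is simply to keep the original proportions, drawing the same fraction from every group with groupby(...).sample (available in pandas 1.1 and later) is an alternative.

```python
import pandas as pd

# Toy data (hypothetical): 30% A, 50% B, 20% C
df = pd.DataFrame({"type": ["A"] * 300 + ["B"] * 500 + ["C"] * 200})

# Hypothetical per-category weights, keyed by the values in df["type"]
freq = {"A": 0.3, "B": 0.5, "C": 0.2}

# Expand the dict to one weight per row so it lines up with df's index
row_weights = df["type"].map(freq)
weighted = df.sample(n=100, weights=row_weights, random_state=0)

# Alternative: take the same fraction of every group (pandas >= 1.1),
# which keeps the group proportions of the original frame
stratified = df.groupby("type").sample(frac=0.1, random_state=0)
print(stratified["type"].value_counts(normalize=True))
```

With this toy data the stratified sample contains 30 A, 50 B and 20 C rows, because each group size times 0.1 is a whole number.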
