Pandas sample with weights
Question
I have a DataFrame df and I'd like to draw a sample from it that preserves the distribution of one variable. Let's say df['type'].value_counts(normalize=True) returns:
0.3 A
0.5 B
0.2 C
I'd like to do something like sampledf = df.sample(weights=df['type'].value_counts()) so that sampledf['type'].value_counts(normalize=True) returns almost the same distribution. How can I pass a dict of frequencies here?
Recommended answer
The weights argument needs one weight per row of df (for example, a Series aligned with df's index), not one weight per category, so the easiest approach is to add the weights as a column:
df['freq'] = df.groupby('type')['type'].transform('count')
sampledf = df.sample(frac=0.1, weights=df['freq'])  # frac=0.1 is illustrative; without frac or n, sample returns a single row
Or, without adding a column:
sampledf = df.sample(frac=0.1, weights=df.groupby('type')['type'].transform('count'))
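One caveat worth checking: weighting each row by its group's count does not reproduce the original distribution. A type with n rows gets a total weight of n × n, so the chance that a single draw lands on that type is proportional to the squared count (with the 0.3/0.5/0.2 split above, roughly 0.24/0.66/0.11). The sketch below checks that arithmetic on a hypothetical 100-row toy frame and shows two alternatives: dividing a target frequency dict by the group size, so that each type's total weight equals its target share, and stratified sampling with DataFrameGroupBy.sample (available in pandas 1.1 and later). The toy data, the target dict, and the sample sizes (frac=0.2, n=1000) are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical toy data with the distribution from the question:
# 30% A, 50% B, 20% C (100 rows in total).
df = pd.DataFrame({"type": ["A"] * 30 + ["B"] * 50 + ["C"] * 20})

# Per-row group size, as in the answer above.
counts = df.groupby("type")["type"].transform("count")

# Share of the total weight held by each type when weighting by count:
# each type contributes n * n, so the shares follow the squared counts.
count_share = counts.groupby(df["type"]).sum() / counts.sum()
# A: 900/3800 ≈ 0.237, B: 2500/3800 ≈ 0.658, C: 400/3800 ≈ 0.105

# Alternative 1: turn a frequency dict into per-row weights.
# Dividing by the group size makes each type's total weight equal its target.
target = {"A": 0.3, "B": 0.5, "C": 0.2}  # hypothetical desired share per type
weights = df["type"].map(target) / counts
weight_share = weights.groupby(df["type"]).sum() / weights.sum()
weighted = df.sample(n=1000, replace=True, weights=weights, random_state=0)

# Alternative 2: stratified sampling keeps the proportions exactly
# (up to rounding within each group). Requires pandas >= 1.1.
stratified = df.groupby("type").sample(frac=0.2, random_state=0)

print(count_share.round(3).to_dict())
print(stratified["type"].value_counts(normalize=True).to_dict())
```

With the question's own frequencies as the target, every row ends up with the same weight (0.01 here), which is another way of seeing that a plain df.sample(frac=...) already preserves the distribution in expectation; dict-based weights only matter when the target differs from the observed distribution.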