Stratified Sampling in Pandas
Problem description
I've looked at the sklearn stratified sampling docs and the pandas docs, as well as the questions "Stratified samples from Pandas" and "sklearn stratified sampling based on a column", but none of them address this issue.
I'm looking for a fast pandas/sklearn/numpy way to generate a stratified sample with n rows per group from a dataset. However, for groups with fewer rows than the specified sample size, it should take all of the entries.
Concrete example:
Thanks! :)
Recommended answer
Use min when passing the number of rows to sample, so that the requested size never exceeds the size of the group. Consider the dataframe df:
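import pandas as pd

df = pd.DataFrame(dict(
    A=[1, 1, 1, 2, 2, 2, 2, 3, 4, 4],
    B=range(10)
))
# Sample up to 2 rows per value of A; min(len(x), 2) caps the request
# at the group size, so groups with fewer than 2 rows are kept whole.
df.groupby('A', group_keys=False).apply(lambda x: x.sample(min(len(x), 2)))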
One possible result (the rows are drawn at random, so the output varies between runs):
   A  B
1  1  1
2  1  2
3  2  3
6  2  6
7  3  7
9  4  9
8  4  8
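Since the question asks for a fast approach, here is a minimal sketch of an alternative that avoids calling a Python lambda once per group: shuffle the whole frame once, then keep the first n rows of each group. The helper name stratified_sample and its parameters are hypothetical (they are not part of pandas or of the original answer), and whether it is actually faster depends on the data, so treat it as something to benchmark rather than a guaranteed improvement.

```python
import pandas as pd


def stratified_sample(df, by, n, random_state=None):
    """Return up to n randomly chosen rows for each group of column `by`.

    Hypothetical helper for illustration. Groups with fewer than n rows
    are returned in full.
    """
    # Shuffle every row once; frac=1 returns all rows in random order.
    shuffled = df.sample(frac=1, random_state=random_state)
    # head(n) keeps the first n rows of each group in the shuffled order,
    # or the whole group when it has fewer than n rows.
    picked = shuffled.groupby(by).head(n)
    # Restore the original row order for readability.
    return picked.sort_index()


df = pd.DataFrame(dict(
    A=[1, 1, 1, 2, 2, 2, 2, 3, 4, 4],
    B=range(10)
))

sample = stratified_sample(df, by='A', n=2, random_state=0)
print(sample)
```

With n=2, the groups A=1 and A=2 each contribute two randomly chosen rows, while A=3 (one row) and A=4 (two rows) are returned in full. Passing random_state makes the draw reproducible.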