Python/Pandas - partitioning a pandas DataFrame into 10 disjoint, equally-sized subsets

Problem description

I want to partition a pandas DataFrame into ten disjoint, equally-sized, randomly composed subsets.

I know I can randomly sample one tenth of the original DataFrame df using:

partition_1 = df.sample(frac=1/10)

However, how can I obtain the other nine partitions? If I call sample(frac=1/10) again, the new sample may share rows with the first one, so the subsets are not guaranteed to be disjoint.

Thanks for your help!

Recommended answer

Start with this DataFrame:

import numpy as np
import pandas as pd
dfm = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'] * 2,
                    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'] * 2})

     A      B
0   foo    one
1   bar    one
2   foo    two
3   bar  three
4   foo    two
5   bar    two
6   foo    one
7   foo  three
8   foo    one
9   bar    one
10  foo    two
11  bar  three
12  foo    two
13  bar    two
14  foo    one
15  foo  three

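Usage:
The example splits the shuffled frame into 4 parts; change 4 to 10 for ten partitions and index the resulting list with [i] to get the i-th slice. Shuffle once and keep that list: each call to np.random.permutation draws a new order, so slices taken from separate calls (as in the two calls below) come from different permutations and are not guaranteed to be disjoint.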
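The same recipe for ten partitions can be sketched as follows, assuming a hypothetical 100-row frame df. The names df, k, rng and partitions are illustrative, and NumPy's Generator (np.random.default_rng) is swapped in for the legacy np.random.seed. The permutation is drawn once and every slice is cut from it. The sketch splits row positions and selects with iloc instead of passing the DataFrame itself to np.array_split, which, depending on the pandas/NumPy versions, may emit a deprecation warning.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: 100 rows with a single column.
df = pd.DataFrame({"value": range(100)})

k = 10                           # number of partitions
rng = np.random.default_rng(32)  # seeded for reproducible results

# Shuffle the row positions once; every partition is cut from this one permutation.
shuffled_positions = rng.permutation(len(df))

# Split the positions into k nearly equal chunks
# (sizes differ by at most one when len(df) is not divisible by k).
chunks = np.array_split(shuffled_positions, k)

# Select rows by position; the original index labels are preserved.
partitions = [df.iloc[chunk] for chunk in chunks]

print([len(p) for p in partitions])  # prints [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
```

partitions[i] then plays the role of the [i] slice from the answer, and because all ten slices come from a single permutation, no row can appear in two of them.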

np.random.seed(32) # for reproducible results.
np.array_split(dfm.reindex(np.random.permutation(dfm.index)),4)[1]
      A    B
2   foo  two
5   bar  two
10  foo  two
12  foo  two

np.array_split(dfm.reindex(np.random.permutation(dfm.index)),4)[3]

     A      B
13  foo    two
11  bar  three
0   foo    one
7   foo  three
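An alternative that stays closer to the sample call from the question is to draw each partition only from the rows that have not been assigned yet. The helper sample_partitions below is hypothetical, not part of pandas; it is a minimal sketch that assumes df has a unique index, because chosen rows are removed by label with drop.

```python
import numpy as np
import pandas as pd


def sample_partitions(df, k=10, seed=None):
    """Hypothetical helper: split df into k disjoint random subsets by
    repeatedly sampling from the rows not yet assigned.
    Assumes df has a unique index, since rows are removed by label."""
    rng = np.random.RandomState(seed)
    # Target sizes: as equal as possible, differing by at most one row.
    sizes = [len(c) for c in np.array_split(np.arange(len(df)), k)]
    remaining = df
    partitions = []
    for size in sizes:
        # Sample without replacement, from the remaining rows only.
        part = remaining.sample(n=size, random_state=rng)
        partitions.append(part)
        # Drop the chosen rows so later draws cannot pick them again.
        remaining = remaining.drop(part.index)
    return partitions


df = pd.DataFrame({"value": range(95)})
parts = sample_partitions(df, k=10, seed=0)
print(sorted(len(p) for p in parts))  # prints [9, 9, 9, 9, 9, 10, 10, 10, 10, 10]
```

This repeats a drop on every iteration, so shuffling once and splitting, as in the answer, is simpler and cheaper; the loop mainly shows why repeated sampling becomes disjoint once each draw excludes the rows already taken.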
