Python Pandas: Choosing a Random Sample of Groups from a Groupby

Question

What is the best way to get a random sample of the groups in a groupby? As I understand it, a groupby is just an iterable over groups.

The standard way I would do this for an iterable, if I wanted to select N = 200 elements, is:

rand = random.sample(data, N)  

If you attempt the above where data is a groupby object, the elements of the resulting list are tuples. This is because iterating over a groupby yields (key, group) pairs rather than the group dataframes alone.
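To make the tuple behaviour concrete, here is a minimal sketch. The toy dataframe and its values are assumptions made up for illustration; only the column name some_key comes from the post.

```python
import pandas as pd

# Hypothetical toy data; 'some_key' mirrors the column name used in this post.
df = pd.DataFrame({'some_key': [0, 1, 0, 1], 'val': [10, 20, 30, 40]})
grouped = df.groupby('some_key')

# Iterating over a groupby yields (key, sub-DataFrame) pairs,
# so any sampling done on the iteration returns tuples.
pairs = list(grouped)
for key, group in pairs:
    print(key, type(group).__name__, len(group))
```

Unpacking each pair, as in the loop above, gives access to the group key and the group's rows separately.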

I found the example below for randomly selecting the groups of a single-key groupby, but it does not work with a multi-key groupby. It comes from "How to access pandas groupby dataframe by key":

Create the groupby object:

grouped = df.groupby('some_key')

Pick N dataframes and grab their keys (random.sample needs a sequence, so the dict keys are wrapped in a list):

sampled_df_i = random.sample(list(grouped.indices), N)

Grab the groups using the groupby object's get_group method:

df_list = [grouped.get_group(df_i) for df_i in sampled_df_i]

Optionally, turn it all back into a single dataframe object:

sampled_df = pd.concat(df_list, axis=0, join='outer')
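The steps above can be put together in a self-contained sketch for the single-key case. The data, the value N = 2, and the fixed seed are assumptions chosen only to keep the demo small and repeatable.

```python
import random
import pandas as pd

# Hypothetical data: 4 groups of 3 rows each.
df = pd.DataFrame({'some_key': [0, 1, 2, 3] * 3,
                   'val': range(12)})
N = 2  # number of groups to sample (assumed value for the demo)

grouped = df.groupby('some_key')

# random.sample needs a sequence, so materialise the group keys as a list.
rng = random.Random(0)  # seeded only to make the demo repeatable
sampled_keys = rng.sample(list(grouped.groups), N)

# Fetch each sampled group and stack them back into one DataFrame.
sampled_df = pd.concat([grouped.get_group(k) for k in sampled_keys])
```

The result contains every row of the N sampled groups and no rows from the others.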

Recommended answer

You can take a random sample of the unique values returned by df.some_key.unique(), use it to slice the df, and finally group the result:

In [337]:

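import random
import pandas as pd

df = pd.DataFrame({'some_key': [0,1,2,3,0,1,2,3,0,1,2,3],
                   'val':      [1,2,3,4,1,5,1,5,1,6,7,8]})
In [338]:

# Sample 2 of the unique keys, keep only their rows, then group
sampled_keys = random.sample(df.some_key.unique().tolist(), 2)
print(df[df.some_key.isin(sampled_keys)].groupby('some_key').mean())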
               val
some_key          
0         1.000000
2         3.666667

If there is more than one groupby key:

In [358]:

df = pd.DataFrame({'some_key1':[0,1,2,3,0,1,2,3,0,1,2,3],
                   'some_key2':[0,0,0,0,1,1,1,1,2,2,2,2],
                   'val':      [1,2,3,4,1,5,1,5,1,6,7,8]})
In [359]:

gby = df.groupby(['some_key1', 'some_key2'])
In [360]:

# .ix has been removed from pandas; .loc accepts a list of key tuples
print(gby.mean().loc[random.sample(list(gby.indices.keys()), 2)])
                     val
some_key1 some_key2     
1         1          5.0
3         2          8.0
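If the goal is to keep the rows of the sampled groups rather than an aggregate, one possible sketch numbers the groups with GroupBy.ngroup() and filters on those numbers. The helper name sample_groups and its seed parameter are hypothetical, and the sketch assumes the key columns contain no missing values.

```python
import random
import pandas as pd

def sample_groups(df, keys, n, seed=None):
    """Return all rows of n randomly chosen groups (hypothetical helper)."""
    gby = df.groupby(keys)
    group_ids = gby.ngroup()      # integer label 0..ngroups-1 for each row
    rng = random.Random(seed)     # seed is optional, for repeatable draws
    chosen = rng.sample(range(gby.ngroups), n)
    return df[group_ids.isin(chosen)]

df = pd.DataFrame({'some_key1': [0, 1, 2, 3] * 3,
                   'some_key2': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
                   'val':       [1, 2, 3, 4, 1, 5, 1, 5, 1, 6, 7, 8]})
subset = sample_groups(df, ['some_key1', 'some_key2'], n=2, seed=0)
```

Because the sampling works on integer group labels, the same function handles one key or several without special-casing tuples.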

But if you are just going to get the values of each group, you don't even need groupby; a MultiIndex will do:

In [372]:

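# from_product builds every combination of the two key columns (all 12 occur in this data);
# random.sample needs a sequence, so the set of unique keys is converted to a list
keys = list(set(pd.MultiIndex.from_product((df.some_key1, df.some_key2)).tolist()))
idx = random.sample(keys, 2)
print(df.set_index(['some_key1', 'some_key2']).loc[idx])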
                     val
some_key1 some_key2     
2         0            3
3         1            5
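Note that from_product enumerates every combination of key values, including combinations that may not occur in other datasets. A hedged alternative, not part of the original answer, is to sample only from the key pairs actually observed, using drop_duplicates, DataFrame.sample, and an inner merge. The random_state value is an assumption used to make the demo repeatable.

```python
import pandas as pd

df = pd.DataFrame({'some_key1': [0, 1, 2, 3] * 3,
                   'some_key2': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
                   'val':       [1, 2, 3, 4, 1, 5, 1, 5, 1, 6, 7, 8]})
keys = ['some_key1', 'some_key2']

# Key combinations that actually occur in the data, one row per group.
observed = df[keys].drop_duplicates()

# DataFrame.sample draws rows without replacement by default.
picked = observed.sample(n=2, random_state=0)

# An inner merge keeps only the rows whose keys were picked.
subset = df.merge(picked, on=keys, how='inner')
```

This avoids looking up key pairs that are absent from the data, at the cost of resetting the row index in the merged result.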
