How to get a random (bootstrap) sample from a pandas MultiIndex

Question

I'm trying to create a bootstrapped sample from a MultiIndex dataframe in pandas. Below is some code that generates the kind of data I need.

import pandas as pd
import numpy as np

# Build a small frame and index it by (group1, group2)
df = pd.DataFrame({'group1': [1, 1, 1, 2, 2, 3],
                   'group2': [13, 18, 20, 77, 109, 123],
                   'value1': [1.1, 2, 3, 4, 5, 6],
                   'value2': [7.1, 8, 9, 10, 11, 12]})
df = df.set_index(['group1', 'group2'])

print(df)

The df dataframe looks like this:

                   value1  value2
group1 group2                
1      13         1.1     7.1
       18         2.0     8.0
       20         3.0     9.0
2      77         4.0    10.0
       109        5.0    11.0
3      123        6.0    12.0

I want to get a random sample from the first index level. For example, suppose the random draw np.random.randint(1, 4, size=3) produces [3, 2, 2]. I'd like the resulting dataframe to look like:

                   value1  value2
group1 group2                
3      123        6.0    12.0
2      77         4.0    10.0
       109        5.0    11.0
2      77         4.0    10.0
       109        5.0    11.0

I've spent a lot of time researching this and have been unable to find a similar example where the MultiIndex values are integers, the secondary index has variable length, and the primary index samples repeat. This is how I think an appropriate bootstrap implementation would work.

Recommended answer

Try:

df.unstack().sample(3, replace=True).stack()
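
For comparison, here is a more explicit sketch that is not part of the original answer. It draws first-level labels with replacement and concatenates the matching blocks of rows, so it never reshapes the data to wide format. The helper names take_groups and bootstrap_groups, and the level, n and seed parameters, are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group1': [1, 1, 1, 2, 2, 3],
    'group2': [13, 18, 20, 77, 109, 123],
    'value1': [1.1, 2, 3, 4, 5, 6],
    'value2': [7.1, 8, 9, 10, 11, 12],
}).set_index(['group1', 'group2'])


def take_groups(frame, drawn, level='group1'):
    """Stack the row blocks for each label in `drawn`, in order (hypothetical helper)."""
    # drop_level=False keeps both index levels on every block
    pieces = [frame.xs(g, level=level, drop_level=False) for g in drawn]
    return pd.concat(pieces)


def bootstrap_groups(frame, level='group1', n=None, seed=None):
    """Draw n labels of `level` with replacement and return their rows (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    groups = frame.index.get_level_values(level).unique().to_numpy()
    n = len(groups) if n is None else n
    drawn = rng.choice(groups, size=n, replace=True)
    return take_groups(frame, drawn, level=level)


# Reproduce the draw [3, 2, 2] from the question, then take a random draw
print(take_groups(df, [3, 2, 2]))
print(bootstrap_groups(df, seed=0))
```

Splitting the draw from the row selection makes it possible to check the selection against a known draw such as [3, 2, 2] before relying on random ones.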
