Random blocks of data in Pandas


Question

I need to get random blocks of data from my data frame df. I tried df.sample(10), but it returns individual rows rather than contiguous blocks. Is there a way to sample random blocks (for instance, blocks of 6 consecutive data points)?

Here is a sample of the data frame:

Year_DoY_Hour
2015-11-20 12:00:00         NaN
2015-11-20 12:30:00         NaN
2015-11-20 13:00:00         NaN
2015-11-20 13:30:00         NaN
2015-11-20 14:00:00         NaN
2015-11-20 14:30:00         NaN
2015-11-20 15:00:00    0.083298
  ...
2016-04-30 13:00:00    0.055639
2016-04-30 13:30:00    0.030809
2016-04-30 14:00:00    0.079277
2016-04-30 14:30:00    0.040736
2016-04-30 15:00:00    0.066980
2016-04-30 15:30:00    0.076448
2016-04-30 16:00:00    0.066822
2016-04-30 16:30:00    0.073143
2016-04-30 17:00:00         NaN
2016-04-30 17:30:00         NaN
2016-04-30 18:00:00         NaN
2016-04-30 18:30:00         NaN
2016-04-30 19:00:00         NaN
2016-04-30 19:30:00         NaN

So from df I need to create 3 randomly chosen blocks of 6 rows each.

Example:

block1

2016-04-30 15:00:00    0.066980
2016-04-30 15:30:00    0.076448
2016-04-30 16:00:00    0.066822
2016-04-30 16:30:00    0.073143
2016-04-30 17:00:00         NaN
2016-04-30 17:30:00         NaN

block2

2016-04-30 09:30:00    0.036728
2016-04-30 10:00:00    0.036108
2016-04-30 10:30:00    0.031045
2016-04-30 11:00:00    0.031762
2016-04-30 11:30:00    0.033714
2016-04-30 12:00:00    0.042499

block3

2015-11-20 04:30:00         NaN
2015-11-20 05:00:00         NaN
2015-11-20 05:30:00         NaN
2015-11-20 06:00:00         NaN
2015-11-20 06:30:00         NaN
2015-11-20 07:00:00         NaN

The blocks should be in random order, but the data within each block must stay in sequence. I have not found any function that does this.

Recommended answer

You can generate random start positions and slice the data frame at each one with iloc. Keeping each start at or below len(df) - block_size ensures that every block has the full block_size rows.

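import numpy as np
import pandas as pd

# create a fake data frame
# (pd.date_range is used because current pandas no longer accepts start/end in pd.DatetimeIndex)
index = pd.date_range(start='2015-11-20', end='2016-04-30', freq='30min')
df = pd.DataFrame(np.random.normal(loc=10, size=len(index)), index=index, columns=['vals'])

# set the block size and the number of samples
block_size = 6
num_samples = 3

# draw start positions that leave room for a full block, then slice each block
starts = np.random.randint(len(df) - block_size + 1, size=num_samples)
samples = [df.iloc[x:x + block_size] for x in starts]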

# check results
samples[0]
                          vals
2016-01-06 00:30:00  10.313824
2016-01-06 01:00:00   9.445082
2016-01-06 01:30:00  11.952581
2016-01-06 02:00:00   9.496415
2016-01-06 02:30:00  10.404322
2016-01-06 03:00:00   8.506910

samples[1]
                          vals
2015-12-23 02:00:00  10.472048
2015-12-23 02:30:00  10.276933
2015-12-23 03:00:00  10.013481
2015-12-23 03:30:00  11.293218
2015-12-23 04:00:00  10.258379
2015-12-23 04:30:00   9.543600

samples[2]
                          vals
2016-01-10 06:00:00  10.809594
2016-01-10 06:30:00   8.953594
2016-01-10 07:00:00  10.254928
2016-01-10 07:30:00   9.911142
2016-01-10 08:00:00  10.377016
2016-01-10 08:30:00  11.907871
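
The same idea can be wrapped in a small helper. The following is only a sketch, not part of the original answer: the function name sample_blocks and its seed and allow_overlap parameters are hypothetical, introduced here for illustration. It assumes you may want reproducible draws (via a seeded NumPy generator) and, optionally, blocks that never share a row. The non-overlapping variant restricts the start positions to a fixed grid of multiples of block_size, which is a simplification: it cannot produce every possible set of non-overlapping blocks.

```python
import numpy as np
import pandas as pd


def sample_blocks(df, block_size, num_samples, seed=None, allow_overlap=True):
    """Return a list of num_samples contiguous blocks of block_size rows each.

    Hypothetical helper for illustration; not a pandas function.
    """
    if block_size > len(df):
        raise ValueError("block_size is larger than the data frame")

    rng = np.random.default_rng(seed)
    # Number of start positions that still leave room for a full block.
    n_starts = len(df) - block_size + 1

    if allow_overlap:
        # Independent draws: two blocks may share rows or even coincide.
        starts = rng.integers(0, n_starts, size=num_samples)
    else:
        # Assumption: only starts on a block_size-aligned grid are considered,
        # drawn without replacement, so no two blocks share a row.
        grid = np.arange(0, n_starts, block_size)
        if num_samples > len(grid):
            raise ValueError("not enough rows for that many non-overlapping blocks")
        starts = rng.choice(grid, size=num_samples, replace=False)

    # Rows inside each block keep their original order; the blocks themselves
    # come back in the random order in which the starts were drawn.
    return [df.iloc[s:s + block_size] for s in starts]


# Usage on a fake data frame shaped like the one in the answer.
index = pd.date_range(start="2015-11-20", end="2016-04-30", freq="30min")
values = np.random.default_rng(0).normal(loc=10, size=len(index))
df = pd.DataFrame({"vals": values}, index=index)

blocks = sample_blocks(df, block_size=6, num_samples=3, seed=42, allow_overlap=False)
```

Passing the same seed returns the same blocks, which helps when the sampled blocks feed into something you want to rerun, such as a bootstrap.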
