Pandas: Timeseries data: How to select rows of an hour or a day or a minute?

Published: 2020/5/24 3:52:14. Tags: python, python-2.7, pandas, time-series

Question

I have a huge time-series dataset in a .csv file. There are two columns in the file:

values: These are sample values.
dttm_utc: These are the timestamps when the samples are collected.

I've imported the data into pandas using pd.read_csv(..., parse_dates=["dttm_utc"]). When I print the first 50 rows of the dttm_utc column, they look like this:

0    2012-01-01 00:05:00
1    2012-01-01 00:10:00
2    2012-01-01 00:15:00
3    2012-01-01 00:20:00
4    2012-01-01 00:25:00
5    2012-01-01 00:30:00
6    2012-01-01 00:35:00
7    2012-01-01 00:40:00
8    2012-01-01 00:45:00
9    2012-01-01 00:50:00
10   2012-01-01 00:55:00
11   2012-01-01 01:00:00
12   2012-01-01 01:05:00
13   2012-01-01 01:10:00
14   2012-01-01 01:15:00
15   2012-01-01 01:20:00
16   2012-01-01 01:25:00
17   2012-01-01 01:30:00
18   2012-01-01 01:35:00
19   2012-01-01 01:40:00
20   2012-01-01 01:45:00
21   2012-01-01 01:50:00
22   2012-01-01 01:55:00
23   2012-01-01 02:00:00
24   2012-01-01 02:05:00
25   2012-01-01 02:10:00
26   2012-01-01 02:15:00
27   2012-01-01 02:20:00
28   2012-01-01 02:25:00
29   2012-01-01 02:30:00
30   2012-01-01 02:35:00
31   2012-01-01 02:40:00
32   2012-01-01 02:45:00
33   2012-01-01 02:50:00
34   2012-01-01 02:55:00
35   2012-01-01 03:00:00
36   2012-01-01 03:05:00
37   2012-01-01 03:10:00
38   2012-01-01 03:15:00
39   2012-01-01 03:20:00
40   2012-01-01 03:25:00
41   2012-01-01 03:30:00
42   2012-01-01 03:35:00
43   2012-01-01 03:40:00
44   2012-01-01 03:45:00
45   2012-01-01 03:50:00
46   2012-01-01 03:55:00
47   2012-01-01 04:00:00
48   2012-01-01 04:05:00
49   2012-01-01 04:10:00
Name: dttm_utc, dtype: datetime64[ns]
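The following is a minimal sketch of the same import that also moves dttm_utc straight into the index, which is where resampling and partial-string selection expect it. The inline CSV text and its numbers are made up for illustration. Only the two column names come from the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real .csv file; only the column names
# come from the question, the numbers are made up
csv_text = """values,dttm_utc
1.5,2012-01-01 00:05:00
2.0,2012-01-01 00:10:00
3.5,2012-01-01 00:15:00
"""

# parse_dates converts dttm_utc to datetimes; index_col moves it into the index
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["dttm_utc"], index_col="dttm_utc")
print(df)
```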

Now, here is what I want to achieve:

Basically, I would like to downsample the data to hourly resolution. I would like to sum and average the samples of the first hour, the second hour, and so on. For example, I would like to sum and average the values of rows 0-10 because they were collected in the first hour. Next, I would like to sum and average the data in rows 11-22, and so on. How can I achieve this using pandas commands?
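The title also asks how to select the rows of a single hour or day. This sketch assumes the timestamps are in a DatetimeIndex. Under that assumption, pandas' partial-string indexing with .loc does the selection: a date string coarser than the timestamps selects every row in that period. The values column is synthetic (each value equals its row number) so that the row numbers match the listing above.

```python
import pandas as pd

# Synthetic data shaped like the listing above: 50 readings, 5 minutes apart.
# The 'values' column is an assumption: each value equals its row number (0..49).
idx = pd.date_range("2012-01-01 00:05:00", periods=50,
                    freq=pd.offsets.Minute(5), name="dttm_utc")
df = pd.DataFrame({"values": range(50)}, index=idx)

# A string coarser than the timestamps selects a whole period
first_hour = df.loc["2012-01-01 00"]   # rows 0-10  (00:05 .. 00:55)
second_hour = df.loc["2012-01-01 01"]  # rows 11-22 (01:00 .. 01:55)
whole_day = df.loc["2012-01-01"]       # all 50 rows

print(second_hour["values"].sum(), second_hour["values"].mean())  # 198 16.5
```

Selecting one period at a time is handy for inspection. To get a sum and mean for every hour at once, resampling (as in the answer below) is the better fit.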

Right now the sampling is done every 5 minutes. If it changes to, say, every 2 or 10 minutes, I would like my solution to still work.
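Time-based resampling should meet this requirement, because the bins are defined by the clock rather than by a fixed number of rows. The sketch below checks this under stated assumptions. It uses synthetic data covering the same two hours, sampled every 2, 5, and 10 minutes, with every reading set to 1.0. The helper name hourly_stats is hypothetical.

```python
import pandas as pd


def hourly_stats(df):
    """Hypothetical helper: sum and mean of the 'values' column per clock hour.

    The bins come from the timestamps, not from a fixed row count,
    so the sampling interval of the input does not matter.
    """
    return df.resample(pd.offsets.Hour())["values"].agg(["sum", "mean"])


# Synthetic data: two hours sampled every 2, 5 and 10 minutes,
# with every reading equal to 1.0 (an assumption for illustration)
results = {}
for minutes in (2, 5, 10):
    idx = pd.date_range("2012-01-01 00:00:00", "2012-01-01 01:59:00",
                        freq=pd.offsets.Minute(minutes), name="dttm_utc")
    df = pd.DataFrame({"values": 1.0}, index=idx)
    results[minutes] = hourly_stats(df)
    print("every", minutes, "min:")
    print(results[minutes])
```

Each run yields the same two hourly rows (00:00 and 01:00). Only the per-hour sums differ, because there are 30, 12, or 6 readings per hour.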

Recommended answer

Your example data is a Series, but your question asks about summing and averaging the values of rows. Without example data for the values column, I'm unclear on what exactly you're trying to sum and average.

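I think what you're interested in is resampling. resample works on a DatetimeIndex, so the usual approach is to move the datetime column (dttm_utc) into the index first. On a DataFrame, newer pandas versions also let you name the column instead, with resample(..., on='dttm_utc').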
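Here is a minimal sketch of both routes on a small DataFrame shaped like the question's CSV. The values column is synthetic (0..49), standing in for data the question does not show. Route 1 moves dttm_utc into the index with set_index. Route 2 keeps it as a column and passes it to resample through the on= argument, assuming a pandas version recent enough to support it.

```python
import pandas as pd

# Synthetic DataFrame shaped like the question's CSV (values are made up: 0..49)
df = pd.DataFrame({
    "values": range(50),
    "dttm_utc": pd.date_range("2012-01-01 00:05:00", periods=50,
                              freq=pd.offsets.Minute(5)),
})

# Route 1: move the timestamps into the index, then resample hourly
by_index = (df.set_index("dttm_utc")
              .resample(pd.offsets.Hour())["values"]
              .agg(["sum", "mean"]))

# Route 2: keep dttm_utc as a column and tell resample which column to use
by_column = df.resample(pd.offsets.Hour(), on="dttm_utc")["values"].agg(["sum", "mean"])

print(by_index)
```

Both routes put the rows from 00:05 to 00:55 in the 00:00 bin and the rows from 01:00 to 01:55 in the 01:00 bin. That is the split described in the question.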

import pandas as pd

s = pd.Series(pd.date_range(start='2012-01-01 00:05:00', periods=50,
                            freq=pd.offsets.Minute(n=5)), name='dttm_utc')
# reset_index() adds an integer 'index' column (0..49) that stands in for the values
s.reset_index().set_index('dttm_utc').resample(pd.offsets.Hour()).agg(['sum', 'mean'])

This gives you the following, but the result has multi-level (MultiIndex) columns, which makes things more complicated:

                    index      
                      sum  mean
dttm_utc                       
2012-01-01 00:00:00    55   5.0
2012-01-01 01:00:00   198  16.5
2012-01-01 02:00:00   342  28.5
2012-01-01 03:00:00   486  40.5
2012-01-01 04:00:00   144  48.0

If you wanted to remove the multi-index (multi-level columns), you could do this:

new_s = s.reset_index().set_index('dttm_utc').resample(pd.offsets.Hour()).agg(['sum', 'mean'])
new_s.columns = new_s.columns.droplevel(level=0)

                     sum  mean
dttm_utc                      
2012-01-01 00:00:00   55   5.0
2012-01-01 01:00:00  198  16.5
2012-01-01 02:00:00  342  28.5
2012-01-01 03:00:00  486  40.5
2012-01-01 04:00:00  144  48.0
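The title asks about an hour, a day, or a minute, so here is a sketch of the same aggregation at several granularities. It assumes a DataFrame with a DatetimeIndex and a synthetic values column (0..49). Only the offset changes between granularities. Selecting the values column before .agg() is an alternative to droplevel(): the result then has flat sum/mean columns from the start.

```python
import pandas as pd

# Synthetic data matching the question's timestamps; values 0..49 are made up
idx = pd.date_range("2012-01-01 00:05:00", periods=50,
                    freq=pd.offsets.Minute(5), name="dttm_utc")
df = pd.DataFrame({"values": range(50)}, index=idx)

# Only the offset changes between granularities; selecting 'values'
# before .agg() keeps the result's columns flat ('sum', 'mean')
offsets = {"15 minutes": pd.offsets.Minute(15),
           "hour": pd.offsets.Hour(),
           "day": pd.offsets.Day()}
results = {name: df.resample(off)["values"].agg(["sum", "mean"])
           for name, off in offsets.items()}

print(results["day"])
```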
