Combining pandas dataframes of different sampling rates


Question

I have three pandas dataframes containing data that was recorded during a test. One frame is for temperature, one for vacuum, and the third for voltage.

The data was captured independently, so that time values for each frame don't line up. Only occasionally does a time stamp from one frame have a duplicate in another frame.

What I would like to do is combine these into one data frame and then interpolate the missing values such that I have a complete dataframe.

I'm new to pandas and have been poking around, but I don't feel like I've gotten anywhere, or that I'm even on the right path.

Answer

import pandas as pd
import numpy as np

rng1 = pd.date_range(
    '1/1/2012', 
    periods=10, 
    freq='h'  # hourly; 'H' is deprecated as of pandas 2.2
)

s1 = pd.Series(
    np.arange(10),
    index=rng1
)

df1 = pd.DataFrame(
    {'temp': s1}
)

s2 = pd.Series(
    np.arange(5, 10),
    # Parse the strings into timestamps so this index is a DatetimeIndex
    # like df1's; otherwise the join below mixes strings with Timestamps.
    index=pd.to_datetime(['1/1/2012 01:20:00',
                          '1/1/2012 01:40:00',
                          '1/1/2012 02:00:00',
                          '1/1/2012 05:30:00',
                          '1/1/2012 06:00:00'])
)

df2 = pd.DataFrame(
    {'voltage': s2},
)

print(df1)
print(df2)

--output:--
                     temp
2012-01-01 00:00:00     0
2012-01-01 01:00:00     1
2012-01-01 02:00:00     2
2012-01-01 03:00:00     3
2012-01-01 04:00:00     4
2012-01-01 05:00:00     5
2012-01-01 06:00:00     6
2012-01-01 07:00:00     7
2012-01-01 08:00:00     8
2012-01-01 09:00:00     9

                     voltage
2012-01-01 01:20:00        5
2012-01-01 01:40:00        6
2012-01-01 02:00:00        7
2012-01-01 05:30:00        8
2012-01-01 06:00:00        9


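# Outer join keeps the union of both time indexes; rows missing from
# one frame get NaN in that frame's column.
combined = df1.join(df2, how='outer')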
print(combined)

--output:--
                     temp  voltage
2012-01-01 00:00:00   0.0      NaN
2012-01-01 01:00:00   1.0      NaN
2012-01-01 01:20:00   NaN      5.0
2012-01-01 01:40:00   NaN      6.0
2012-01-01 02:00:00   2.0      7.0
2012-01-01 03:00:00   3.0      NaN
2012-01-01 04:00:00   4.0      NaN
2012-01-01 05:00:00   5.0      NaN
2012-01-01 05:30:00   NaN      8.0
2012-01-01 06:00:00   6.0      9.0
2012-01-01 07:00:00   7.0      NaN
2012-01-01 08:00:00   8.0      NaN
2012-01-01 09:00:00   9.0      NaN

# Interpolate every column, weighting by the actual time between rows
combined = combined.interpolate(method='time')

print(combined)

--output:--
                         temp   voltage
2012-01-01 00:00:00  0.000000       NaN
2012-01-01 01:00:00  1.000000       NaN
2012-01-01 01:20:00  1.333333  5.000000
2012-01-01 01:40:00  1.666667  6.000000
2012-01-01 02:00:00  2.000000  7.000000
2012-01-01 03:00:00  3.000000  7.285714
2012-01-01 04:00:00  4.000000  7.571429
2012-01-01 05:00:00  5.000000  7.857143
2012-01-01 05:30:00  5.500000  8.000000
2012-01-01 06:00:00  6.000000  9.000000
2012-01-01 07:00:00  7.000000  9.000000
2012-01-01 08:00:00  8.000000  9.000000
2012-01-01 09:00:00  9.000000  9.000000
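Using `'time'` matters here because the combined rows are unevenly spaced. The small sketch below contrasts pandas' default `'linear'` method, which ignores the index and treats rows as equally spaced, with `'time'`, which weights by elapsed time. It reuses the voltage values from the output above; the standalone Series is only for illustration.

```python
import numpy as np
import pandas as pd

# Voltage readings from the answer: the 02:00 -> 05:30 gap is 3.5 hours,
# but it contains three intermediate rows (03:00, 04:00, 05:00).
idx = pd.to_datetime([
    "2012-01-01 02:00:00",
    "2012-01-01 03:00:00",
    "2012-01-01 04:00:00",
    "2012-01-01 05:00:00",
    "2012-01-01 05:30:00",
])
voltage = pd.Series([7.0, np.nan, np.nan, np.nan, 8.0], index=idx)

# 'linear' splits the 7 -> 8 step evenly across rows
by_row = voltage.interpolate(method="linear")
# 'time' places each value according to its timestamp
by_time = voltage.interpolate(method="time")

print(pd.DataFrame({"linear": by_row, "time": by_time}))
```

With `'linear'` the 03:00 row gets 7.25 (three equal steps between 7 and 8); with `'time'` it gets about 7.285714, matching the answer's output, because 03:00 is 1 hour into a 3.5-hour gap.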

# Fill the leading NaNs with the next valid value
# (fillna(method='backfill') is deprecated in recent pandas)
print(combined.bfill())

--output:--
                         temp   voltage
2012-01-01 00:00:00  0.000000  5.000000
2012-01-01 01:00:00  1.000000  5.000000
2012-01-01 01:20:00  1.333333  5.000000
2012-01-01 01:40:00  1.666667  6.000000
2012-01-01 02:00:00  2.000000  7.000000
2012-01-01 03:00:00  3.000000  7.285714
2012-01-01 04:00:00  4.000000  7.571429
2012-01-01 05:00:00  5.000000  7.857143
2012-01-01 05:30:00  5.500000  8.000000
2012-01-01 06:00:00  6.000000  9.000000
2012-01-01 07:00:00  7.000000  9.000000
2012-01-01 08:00:00  8.000000  9.000000
2012-01-01 09:00:00  9.000000  9.000000
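The question has three frames rather than two. Here is a minimal sketch of the same steps for three sensors, assuming each frame has a DatetimeIndex and one numeric column. The vacuum readings and the helper name `combine_sensors` are hypothetical. It uses `pd.concat(..., axis=1)` instead of chained `join` calls; both outer-align on the index.

```python
import numpy as np
import pandas as pd

# Hypothetical readings; each sensor was sampled on its own clock
temp = pd.DataFrame(
    {"temp": np.arange(10.0)},
    index=pd.date_range("2012-01-01", periods=10, freq="h"),
)
vacuum = pd.DataFrame(
    {"vacuum": [1.0e-3, 5.0e-4, 2.0e-4]},
    index=pd.to_datetime(["2012-01-01 00:30", "2012-01-01 03:15",
                          "2012-01-01 07:45"]),
)
voltage = pd.DataFrame(
    {"voltage": [5.0, 6.0, 7.0, 8.0, 9.0]},
    index=pd.to_datetime(["2012-01-01 01:20", "2012-01-01 01:40",
                          "2012-01-01 02:00", "2012-01-01 05:30",
                          "2012-01-01 06:00"]),
)


def combine_sensors(frames):
    """Hypothetical helper: outer-align frames on time, then fill gaps."""
    # axis=1 places the frames side by side on the union of their indexes;
    # shared timestamps (e.g. 02:00) collapse into a single row.
    combined = pd.concat(frames, axis=1).sort_index()
    # Time-weighted interpolation for interior gaps; values after a
    # sensor's last sample are held at that last value.
    combined = combined.interpolate(method="time")
    # Leading NaNs (before a sensor's first sample) take its first value.
    return combined.bfill()


combined = combine_sensors([temp, vacuum, voltage])
print(combined)
```

`interpolate` holds the last observed value after a sensor's final sample rather than extrapolating, and `bfill` copies the first observed value backward. If those edge values are not acceptable, they can be left as NaN instead, for example by passing `limit_area='inside'` to `interpolate` and skipping `bfill`.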
