合并两个Pandas数据帧,在一个时间列上重新采样,进行插值 [英] Combine two Pandas dataframes, resample on one time column, interpolate

查看:105
本文介绍了合并两个Pandas数据帧,在一个时间列上重新采样,进行插值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是我关于stackoverflow的第一个问题.放开我吧!

我有两个数据集,它们是由不同的采集系统以不同的采样率同时采集的.一个非常规律​​,而另一个则不然.我想创建一个包含两个数据集的单个数据框,并使用规则间隔的时间戳(以秒为单位)作为这两个数据集的参考.不规则采样的数据应插在规则间隔的时间戳上.

以下是一些玩具数据,证明了我正在尝试做的事情:

import pandas as pd
import numpy as np

# evenly spaced times
t1 = np.array([0,0.5,1.0,1.5,2.0])
y1 = t1

# unevenly spaced times
t2 = np.array([0,0.34,1.01,1.4,1.6,1.7,2.01])
y2 = 3*t2

df1 = pd.DataFrame(data={'y1':y1,'t':t1})
df2 = pd.DataFrame(data={'y2':y2,'t':t2})

df1和df2看起来像这样:

df1:
    t   y1
0  0.0  0.0
1  0.5  0.5
2  1.0  1.0
3  1.5  1.5
4  2.0  2.0

df2:
    t    y2
0  0.00  0.00
1  0.34  1.02
2  1.01  3.03
3  1.40  4.20
4  1.60  4.80
5  1.70  5.10
6  2.01  6.03

我正在尝试合并df1和df2,将y2插值到df1.t上.理想的结果是:

df_combined:
     t   y1   y2
0  0.0  0.0  0.0
1  0.5  0.5  1.5
2  1.0  1.0  3.0
3  1.5  1.5  4.5
4  2.0  2.0  6.0

我一直在阅读pandas.resample的文档,以及搜索以前的stackoverflow问题,但是无法找到解决我特定问题的方法.有任何想法吗?似乎应该很容易.

更新: 我想出了一个可能的解决方案:首先插值第二个序列,然后附加到第一个数据帧:

from scipy.interpolate import interp1d
f2 = interp1d(t2,y2,bounds_error=False)
df1['y2'] = f2(df1.t)

给出:

df1:
    t   y1   y2
0  0.0  0.0  0.0
1  0.5  0.5  1.5
2  1.0  1.0  3.0
3  1.5  1.5  4.5
4  2.0  2.0  6.0

那行得通,但是如果有更好的方法,我仍然愿意接受其他解决方案.

解决方案

如果从Series构造单个DataFrame,则使用时间值作为索引,如下所示:

>>> t1 = np.array([0, 0.5, 1.0, 1.5, 2.0])
>>> y1 = pd.Series(t1, index=t1)

>>> t2 = np.array([0, 0.34, 1.01, 1.4, 1.6, 1.7, 2.01])
>>> y2 = pd.Series(3*t2, index=t2)

>>> df = pd.DataFrame({'y1': y1, 'y2': y2})
>>> df
       y1    y2
0.00  0.0  0.00
0.34  NaN  1.02
0.50  0.5   NaN
1.00  1.0   NaN
1.01  NaN  3.03
1.40  NaN  4.20
1.50  1.5   NaN
1.60  NaN  4.80
1.70  NaN  5.10
2.00  2.0   NaN
2.01  NaN  6.03

您可以简单地插入,并仅选择定义了y1的部分:

>>> df.interpolate('index').reindex(y1)
      y1   y2
0.0  0.0  0.0
0.5  0.5  1.5
1.0  1.0  3.0
1.5  1.5  4.5
2.0  2.0  6.0

This is my first question on stackoverflow. Go easy on me!

I have two data sets acquired simultaneously by different acquisition systems with different sampling rates. One is very regular, and the other is not. I would like to create a single dataframe containing both data sets, using the regularly spaced timestamps (in seconds) as the reference for both. The irregularly sampled data should be interpolated on the regularly spaced timestamps.

Here's some toy data demonstrating what I'm trying to do:

import pandas as pd
import numpy as np

# evenly spaced times
t1 = np.array([0,0.5,1.0,1.5,2.0])
y1 = t1

# unevenly spaced times
t2 = np.array([0,0.34,1.01,1.4,1.6,1.7,2.01])
y2 = 3*t2

df1 = pd.DataFrame(data={'y1':y1,'t':t1})
df2 = pd.DataFrame(data={'y2':y2,'t':t2})

df1 and df2 look like this:

df1:
    t   y1
0  0.0  0.0
1  0.5  0.5
2  1.0  1.0
3  1.5  1.5
4  2.0  2.0

df2:
    t    y2
0  0.00  0.00
1  0.34  1.02
2  1.01  3.03
3  1.40  4.20
4  1.60  4.80
5  1.70  5.10
6  2.01  6.03

I'm trying to merge df1 and df2, interpolating y2 on df1.t. The desired result is:

df_combined:
     t   y1   y2
0  0.0  0.0  0.0
1  0.5  0.5  1.5
2  1.0  1.0  3.0
3  1.5  1.5  4.5
4  2.0  2.0  6.0

I've been reading documentation for pandas.resample, as well as searching previous stackoverflow questions, but haven't been able to find a solution to my particular problem. Any ideas? Seems like it should be easy.

UPDATE: I figured out one possible solution: interpolate the second series first, then append to the first data frame:

from scipy.interpolate import interp1d
f2 = interp1d(t2,y2,bounds_error=False)
df1['y2'] = f2(df1.t)

which gives:

df1:
    t   y1   y2
0  0.0  0.0  0.0
1  0.5  0.5  1.5
2  1.0  1.0  3.0
3  1.5  1.5  4.5
4  2.0  2.0  6.0

That works, but I'm still open to other solutions if there's a better way.

解决方案

If you construct a single DataFrame from Series, using time values as index, like this:

>>> t1 = np.array([0, 0.5, 1.0, 1.5, 2.0])
>>> y1 = pd.Series(t1, index=t1)

>>> t2 = np.array([0, 0.34, 1.01, 1.4, 1.6, 1.7, 2.01])
>>> y2 = pd.Series(3*t2, index=t2)

>>> df = pd.DataFrame({'y1': y1, 'y2': y2})
>>> df
       y1    y2
0.00  0.0  0.00
0.34  NaN  1.02
0.50  0.5   NaN
1.00  1.0   NaN
1.01  NaN  3.03
1.40  NaN  4.20
1.50  1.5   NaN
1.60  NaN  4.80
1.70  NaN  5.10
2.00  2.0   NaN
2.01  NaN  6.03

You can simply interpolate it, and select only the part where y1 is defined:

>>> df.interpolate('index').reindex(y1)
      y1   y2
0.0  0.0  0.0
0.5  0.5  1.5
1.0  1.0  3.0
1.5  1.5  4.5
2.0  2.0  6.0

这篇关于合并两个Pandas数据帧,在一个时间列上重新采样,进行插值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆