Replacing the values of a time series with the values of another time series in pandas

Question

I have two DataFrames:

s1: 
         time     X1
0  1234567000  96.32
1  1234567005  96.01
2  1234567009  96.05

s2: 
         time     X2
0  1234566999  23.88
1  1234567006  23.96

I would like to replace the values of the first time series/DataFrame with the second DataFrame while keeping the timestamp, to obtain:

frame: 
         time     X2
0  1234567000  23.88
1  1234567005  23.88
2  1234567009  23.96

The output (frame) should have the timestamps of s1 but the values of s2. `time` is an integer (it isn't a UNIX timestamp). `X1` and `X2` are floats.
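Judging from the expected output, the rule is: each s1 timestamp takes the X2 value of the latest s2 row whose time is at or before it (1234567005 still gets 23.88 because 1234567006 comes later). A minimal NumPy sketch of that rule, assuming s2 is already sorted by `time`; it only illustrates the semantics and is not the asker's or the answerer's code:

```python
import numpy as np
import pandas as pd

s1 = pd.DataFrame({'time': [1234567000, 1234567005, 1234567009],
                   'X1': [96.32, 96.01, 96.05]})
s2 = pd.DataFrame({'time': [1234566999, 1234567006],
                   'X2': [23.88, 23.96]})

# side='right' counts how many s2 times are <= each s1 time;
# subtracting 1 gives the position of the last such s2 row (assumes s2 is sorted).
pos = np.searchsorted(s2['time'].to_numpy(), s1['time'].to_numpy(), side='right') - 1

# pos == -1 means no s2 time at or before that s1 time: leave NaN there
# (this is the case the asker's back_fill option deals with).
x2 = np.where(pos >= 0, s2['X2'].to_numpy()[np.clip(pos, 0, None)], np.nan)
frame = pd.DataFrame({'time': s1['time'], 'X2': x2})
print(frame)
```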

Is there any neat way to do it with pandas?

I currently use a chain of outer join/merge + fillna + inner join/merge + del columns, but that doesn't seem efficient.

from __future__ import print_function
import pandas as pd

def merge_dataframes(s1, s2, common_column, back_fill=False, verbose=False):
    if verbose: print('s1: \n{0}'.format(s1))
    if verbose: print('s2: \n{0}'.format(s2))
    # Outer merge on the key so every timestamp from both frames is present
    frame = pd.merge(s1, s2, on=common_column, how='outer').sort_values(by=common_column)
    if verbose: print('frame: \n{0}'.format(frame))
    # Carry the last known value forward (fillna(method='ffill') is deprecated)
    frame = frame.ffill()
    if verbose: print('frame: \n{0}'.format(frame))
    # Keep only the rows whose key appears in s1 (join on the key alone,
    # not on the float X1 column as well)
    frame = pd.merge(frame, s1[[common_column]], on=common_column, how='inner')
    if verbose: print('frame: \n{0}'.format(frame))
    # Drop s1's own value columns; use != because common_column is a string
    # and `in` would perform a substring test
    for column_name in s1.columns:
        if column_name != common_column and column_name not in s2.columns:
            del frame[column_name]
    if back_fill:
        frame = frame.bfill()
        if verbose: print('frame: \n{0}'.format(frame))
    return frame

def main():
    '''
    Demonstrate the use of merge_dataframes(s1, s2, common_column)
    '''
    s1 = pd.DataFrame({
        'time':[1234567000,1234567005,1234567009],
        'X1':[96.32,96.01,96.05]
    },columns=['time','X1'])  

    s2 = pd.DataFrame({
        'time':[1234566999,1234567006],
        'X2':[23.88,23.96]
    },columns=['time','X2'])  

    common_column = 'time'
    frame = merge_dataframes(s1, s2, common_column, verbose=True)
    print('frame: \n{0}'.format(frame))

if __name__ == "__main__":
    main()
    # import cProfile; cProfile.run('main()')  # if you want to do some profiling

Answer

pd.merge_asof works for me on your sample

pd.merge_asof(s1,s2,on='time')
Out[108]: 
         time     X1     X2
0  1234567000  96.32  23.88
1  1234567005  96.01  23.88
2  1234567009  96.05  23.96
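To get exactly the `frame` from the question (only `time` and `X2`), a sketch that passes only the key column of s1 so there is no X1 to delete afterwards. `merge_asof` requires both inputs to be sorted on the `on` key. With the default `direction='backward'` an s1 time earlier than every s2 time gets NaN; the `early` frame below is a made-up example showing that a trailing `bfill()` plays the role of the question's `back_fill=True`:

```python
import pandas as pd

s1 = pd.DataFrame({'time': [1234567000, 1234567005, 1234567009],
                   'X1': [96.32, 96.01, 96.05]})
s2 = pd.DataFrame({'time': [1234566999, 1234567006],
                   'X2': [23.88, 23.96]})

# Only s1's timestamps go in, so the result is (time, X2) with s1's times.
frame = pd.merge_asof(s1[['time']], s2, on='time')
print(frame)

# Hypothetical s1 whose first time precedes every s2 time: that row gets NaN,
# and bfill() copies the next available X2 value into it.
early = pd.DataFrame({'time': [1234566990, 1234567000]})
filled = pd.merge_asof(early, s2, on='time').bfill()
print(filled)
```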

Edit: a solution that merges on the nearest timestamp (smallest absolute difference)

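import numpy as np
import pandas as pd

def Matcher2(value, mat):
    # Position of the s2 time closest to `value` (smallest absolute difference)
    return np.argmin(np.absolute(mat - value))

mat = s2.time.to_numpy()  # as_matrix() was removed in pandas 1.0
s1['dex'] = s1.time.apply(lambda t: Matcher2(t, mat))
# Join each s1 row to the s2 row at position `dex` (right_index works because s2
# has the default 0..n-1 index); both frames have 'time', hence time_x / time_y
mg = pd.merge(s1, s2, left_on='dex', right_index=True, how='left')
print(mg[['time_x', 'X1', 'X2']])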

       time_x     X1     X2
0  1234567000  96.32  23.88
1  1234567005  96.01  23.96
2  1234567009  96.05  23.96
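`merge_asof` can also do the nearest match directly with `direction='nearest'`, without the helper column `dex`. A sketch on the same sample data that should reproduce the X2 column above. I have not verified that exact ties are broken the same way as `np.argmin` (which picks the first, i.e. earlier, s2 row), so treat tie handling as an open assumption:

```python
import pandas as pd

s1 = pd.DataFrame({'time': [1234567000, 1234567005, 1234567009],
                   'X1': [96.32, 96.01, 96.05]})
s2 = pd.DataFrame({'time': [1234566999, 1234567006],
                   'X2': [23.88, 23.96]})

# For each s1 time, take the s2 row whose time is closest in absolute terms.
# Both frames must be sorted by 'time'.
mg = pd.merge_asof(s1, s2, on='time', direction='nearest')
print(mg[['time', 'X1', 'X2']])
```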
