日期时间 Pandas 的列操作 [英] Column Manipulations with date-Time Pandas

查看:39
本文介绍了日期时间 Pandas 的列操作的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试同时对行和列进行一些列操作,包括 Pandas 中的日期和时间序列.传统上没有系列的 python 词典很棒.但是对于 Pandas 来说,这对我来说是一件新鲜事.

I am trying to do some column manipulations with row and column at same time including date and time series in Pandas. Traditionally with no series python dictionaries are great. But with Pandas it a new thing for me.

输入文件:N 个.

File1.csv, File2.csv, File3.csv, ........... Filen.csv 

Ids,Date-time-1    Ids,Date-time-2  Ids,Date-time-1
56,4568          645,5545         25,54165
45,464           458,546        

我正在尝试将所有文​​件的 Date-time 列合并到一个关于 Ids 的大数据文件中

I am trying to merge the Date-time column of all the files into a big data file with respect to Ids

Ids,Date-time-ref,Date-time-1,date-time-2
56,100,4468,NAN
45,150,314,NAN
645,50,NAN,5495
458,200,NAN,346
25,250,53915,NAN

  1. 检查 date-time 列 - 如果不匹配,则创建一个,然后通过减去当前 date- 填充与 Ids 相关的值-时间值与相应Idsdate-time-ref的值.

  1. Check for date-time column - If not matched create one and then fill the values with respect to Ids by Subtracting the current date-time value with the value of date-time-ref of that respective Ids.

NAN 填充空白处,如果下一个文件具有该值,则用 NAN

Fill in empty place with NAN and if next file has that value then replace the new value with NAN

如果是直列减法,它非常容易,但与 date-time seriesIds 同步似乎有点混乱.

If it were straight column subtract it was pretty much easy but in sync with date-time series and with respect to Ids seems a bit confusing.

感谢一些建议.提前致谢.

Appreciate some suggestions to begin with. Thanks in advance.

推荐答案

这是一种方法.

import pandas as pd
import numpy as np
from StringIO import StringIO

# your csv file contents
csv_file1 = 'Ids,Date-time-1\n56,4568\n45,464\n'
csv_file2 = 'Ids,Date-time-2\n645,5545\n458,546\n'
# add a duplicated Ids record for testing purpose
csv_file3 = 'Ids,Date-time-1\n25,54165\n645, 4354\n'
csv_file_all = [csv_file1, csv_file2, csv_file3]

# read csv into df using list comprehension
# I use buffer here, replace stringIO with your file path
df_all = [pd.read_csv(StringIO(csv_file)) for csv_file in csv_file_all]


# processing
# =====================================================
# concat along axis=0, outer join on axis=1
merged = pd.concat(df_all, axis=0, ignore_index=True, join='outer').set_index('Ids')


  Out[206]: 
        Date-time-1  Date-time-2
   Ids                          
   56          4568          NaN
   45           464          NaN
   645          NaN         5545
   458          NaN          546
   25         54165          NaN
   645         4354          NaN

# custom function to handle/merge duplicates on Ids (axis=0)
def apply_func(group):
    return group.fillna(method='ffill').iloc[-1]

# remove Ids duplicates
merged_unique = merged.groupby(level='Ids').apply(apply_func)


  Out[207]: 
        Date-time-1  Date-time-2
   Ids                          
   25         54165          NaN
   45           464          NaN
   56          4568          NaN
   458          NaN          546
   645         4354         5545

# do the subtraction
master_csv_file = 'Ids,Date-time-ref\n56,100\n45,150\n645,50\n458,200\n25,250\n'
df_master = pd.read_csv(io.StringIO(master_csv_file), index_col=['Ids']).sort_index()

# select matching records and horizontal concat
df_matched = pd.concat([df_master,merged_unique.reindex(df_master.index)], axis=1)

# use broadcasting
df_matched.iloc[:, 1:] = df_matched.iloc[:, 1:].sub(df_matched.iloc[:, 0], axis=0)



   Out[208]: 
        Date-time-ref  Date-time-1  Date-time-2
   Ids                                         
   25             250        53915          NaN
   45             150          314          NaN
   56             100         4468          NaN
   458            200          NaN          346
   645             50         4304         5495

这篇关于日期时间 Pandas 的列操作的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆