日期时间 Pandas 的列操作 [英] Column Manipulations with date-Time Pandas
问题描述
我正在尝试同时对行和列进行一些列操作,包括 Pandas 中的日期和时间序列.传统上没有系列的 python 词典很棒.但是对于 Pandas 来说,这对我来说是一件新鲜事.
I am trying to do some column manipulations with row and column at same time including date and time series in Pandas. Traditionally with no series python dictionaries are great. But with Pandas it a new thing for me.
输入文件:N 个.
File1.csv, File2.csv, File3.csv, ........... Filen.csv
Ids,Date-time-1 Ids,Date-time-2 Ids,Date-time-1
56,4568 645,5545 25,54165
45,464 458,546
我正在尝试将所有文件的 Date-time
列合并到一个关于 Ids
的大数据文件中
I am trying to merge the Date-time
column of all the files into a big data file with respect to Ids
Ids,Date-time-ref,Date-time-1,date-time-2
56,100,4468,NAN
45,150,314,NAN
645,50,NAN,5495
458,200,NAN,346
25,250,53915,NAN
检查
date-time
列 - 如果不匹配,则创建一个,然后通过减去当前date- 填充与
与相应Ids
相关的值-时间值Ids
的date-time-ref
的值.
Check for
date-time
column - If not matched create one and then fill the values with respect toIds
by Subtracting the currentdate-time value
with the value ofdate-time-ref
of that respectiveIds
.
用 NAN
填充空白处,如果下一个文件具有该值,则用 NAN
Fill in empty place with NAN
and if next file has that value then replace the new value with NAN
如果是直列减法,它非常容易,但与 date-time series
和 Ids
同步似乎有点混乱.
If it were straight column subtract it was pretty much easy but in sync with date-time series
and with respect to Ids
seems a bit confusing.
感谢一些建议.提前致谢.
Appreciate some suggestions to begin with. Thanks in advance.
推荐答案
这是一种方法.
import pandas as pd
import numpy as np
from StringIO import StringIO
# your csv file contents
csv_file1 = 'Ids,Date-time-1\n56,4568\n45,464\n'
csv_file2 = 'Ids,Date-time-2\n645,5545\n458,546\n'
# add a duplicated Ids record for testing purpose
csv_file3 = 'Ids,Date-time-1\n25,54165\n645, 4354\n'
csv_file_all = [csv_file1, csv_file2, csv_file3]
# read csv into df using list comprehension
# I use buffer here, replace stringIO with your file path
df_all = [pd.read_csv(StringIO(csv_file)) for csv_file in csv_file_all]
# processing
# =====================================================
# concat along axis=0, outer join on axis=1
merged = pd.concat(df_all, axis=0, ignore_index=True, join='outer').set_index('Ids')
Out[206]:
Date-time-1 Date-time-2
Ids
56 4568 NaN
45 464 NaN
645 NaN 5545
458 NaN 546
25 54165 NaN
645 4354 NaN
# custom function to handle/merge duplicates on Ids (axis=0)
def apply_func(group):
return group.fillna(method='ffill').iloc[-1]
# remove Ids duplicates
merged_unique = merged.groupby(level='Ids').apply(apply_func)
Out[207]:
Date-time-1 Date-time-2
Ids
25 54165 NaN
45 464 NaN
56 4568 NaN
458 NaN 546
645 4354 5545
# do the subtraction
master_csv_file = 'Ids,Date-time-ref\n56,100\n45,150\n645,50\n458,200\n25,250\n'
df_master = pd.read_csv(io.StringIO(master_csv_file), index_col=['Ids']).sort_index()
# select matching records and horizontal concat
df_matched = pd.concat([df_master,merged_unique.reindex(df_master.index)], axis=1)
# use broadcasting
df_matched.iloc[:, 1:] = df_matched.iloc[:, 1:].sub(df_matched.iloc[:, 0], axis=0)
Out[208]:
Date-time-ref Date-time-1 Date-time-2
Ids
25 250 53915 NaN
45 150 314 NaN
56 100 4468 NaN
458 200 NaN 346
645 50 4304 5495
这篇关于日期时间 Pandas 的列操作的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!